
Brainstorming rare diseases is one thing; making a diagnosis and providing treatment grounded in medical science is another.

If I ask GPT4 about some arcane math concept it’ll wax lyrical about how it has connections to 20 other areas of math. But it fails at simple arithmetic.



Proof-based higher math and being good at calculating the answers to arithmetical formulas are two pretty unrelated things that just happen to both be called "math".

One of my better math professors in a very good pure math undergraduate program added 7 + 9 and got 15 during a lecture; that really doesn't say anything about his ability as a mathematician, though.


That's sorta my point: diagnosing well-studied diseases and providing precise treatment is different from speculating about causes for rare diseases.

Who knows, OP could be a paint sniffer and that’s their root issue. Brainstorming these things requires creativity and even hallucination. But that’s not what doctors do.


I thought all math was similar, in that working with it requires decent working memory. Both mental arithmetic and conceptually complex material from theory require excellent working memory, which is a function of IQ.


You still have to practice arithmetic to be good at it, and a lot of mathematicians don't.


> But it fails at simple arithmetic.

Does it though? When you allow LLMs to use their own outputs as a form of state, they can very much succeed up to 14 digits with > 99.9% accuracy, and it holds up to 18 digits without deteriorating significantly [1].

That really isn't a good argument, because you are asking it to do in one shot something that 99.999% of humans can't.

https://arxiv.org/abs/2211.09066


Try asking it to combine some simple formulas involving unit conversions. It does not do math. You can ask it questions that let it complete patterns more easily.


It does not have to do math in one shot, and neither do humans. The model only needs to decompose the problem into subcomponents and solve those. If it can do so recursively via an agent-based approach, then by all means it can do it.

The cited paper covers this to some extent. Instead of asking the LLM to do multiplication of large integers directly, they ask it to break the task into 3-digit numbers, do the multiplications, add the carries, and then sum everything up. It does quite well.
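
Roughly, that decomposition looks like the sketch below (plain Python standing in for the steps the model writes out as text; my paraphrase of the idea, not the paper's actual prompt format):

    # Multiply two large integers by splitting them into 3-digit chunks,
    # multiplying the chunks, and summing the shifted partial products.
    # Each chunk product is small enough to do "from memory".
    def chunks_of_3(n):
        out = []
        while n:
            out.append(n % 1000)
            n //= 1000
        return out or [0]

    def multiply_via_chunks(a, b):
        total = 0
        for i, x in enumerate(chunks_of_3(a)):
            for j, y in enumerate(chunks_of_3(b)):
                total += x * y * 1000 ** (i + j)  # 1000**(i+j) is the shift
        return total

    assert multiply_via_chunks(123456789, 987654321) == 123456789 * 987654321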


What do you mean one-shot? Hasn't ChatGPT been trained on hundreds of maths textbooks?


When I ask a human to do 13-digit addition, 99.999% of them will do the addition in steps; almost nobody will immediately blurt out a correct answer without doing intermediate steps in their head. Addition requires carries, and we work from the least to the most significant digit, propagating the carries. That is what 1-shot refers to.

If we allow LLMs to do the same, instead of producing the output in a single textual response, then they do just fine, according to the cited paper.

Average humans can do multiplication in one step for small numbers because they have memorized the times tables. So can LLMs. Humans need multiple steps for addition, and so do LLMs.
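
For concreteness, the intermediate steps in question are just schoolbook addition (a minimal Python sketch; an LLM doing multi-step addition writes out equivalent steps as text):

    # Digit-by-digit addition, least significant first, tracking the carry.
    def add_with_carries(a, b):
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            s = int(da) + int(db) + carry
            digits.append(str(s % 10))  # one digit of the answer
            carry = s // 10             # carry into the next column
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    assert add_with_carries("9999999999999", "1") == "10000000000000"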


Ok. In the context of AI, 1-shot generally means that the system was trained on only one example (or a few examples).

Regarding the number of steps it takes an LLM to get the right answer: isn't it more important that it gets the right answer, since LLMs are faster than humans anyway?


I am well aware of what it means; I used 1-shot in the same sense that we humans say we gave something "a shot", meaning a single attempt.

LLMs get the right answer and do so faster than humans. The only real limitation here is the back and forth because of the chat interface and implementation. Ultimately, it all boils down to giving prompts that achieve the same thing as shown in the paper.

Furthermore, this is a weird boundary/goalpost: humans get stuff wrong all the time, and we created tools to make our lives easier. If we let LLMs use tools, they do even better.


LLMs are not for doing arithmetic. Don’t use a hammer to drive screws.


It’s an irregularity in their performance profile. Arithmetic is a known issue. How many such irregularities exist but are not measurable?


They are terrible at synthesizing knowledge.

If a search engine result says water is wet, they’ll tell you about it.

If not, then we should consider all the issues around water and wetness, but note that water is a great candidate for wetting things, though it is important to remember that it has severe limitations with respect to wetting things, and, at all costs, some other alternatives should be considered, including [list of paragraphs about tangential buzzwords such as buckets and watering cans goes here].


Is arithmetic based on language? Should an LLM be expected to handle one plus one ad infinitum? Makes no sense, since it's not built for it.


Why does this apply to math but not to being a doctor? It can do basic math, but you say that of course it can't do math, because math isn't language. The fact that it can do some basic diagnosis does not mean it's good at doctor things, or even that it's better than WebMD.


Arithmetic requires step-by-step execution of an algorithm. LLMs don't do that implicitly. What they do is a vector adjacency search in an absurdly high-dimensional space. This makes them good at giving you things related to what you wrote, but it's the opposite of executing arbitrary algorithms.

Or, look at it this way: the LLM doesn't have a "voice in its head" in any form other than a back-and-forth with you. If I gave you any arithmetic problem less trivial than the times table, you wouldn't suddenly come up with the right answer; you'd run some sequence of steps in your head. If you let an LLM voice the steps, it gets better at procedural tasks too.
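
A minimal sketch of what "letting it voice the steps" means in practice, assuming some hypothetical ask_llm() completion call (the function name and loop are mine, purely illustrative):

    # Instead of asking for the answer in one response, keep appending the
    # model's own output to the prompt so its intermediate steps become its
    # working memory. ask_llm is a hypothetical stand-in for a real API call.
    def ask_llm(prompt):
        raise NotImplementedError("stand-in for a real completion call")

    def solve_step_by_step(problem, max_steps=20):
        transcript = "Solve step by step, one step per line.\nProblem: " + problem + "\n"
        for _ in range(max_steps):
            step = ask_llm(transcript)   # the model sees its own prior steps
            transcript += step + "\n"
            if "Answer:" in step:        # stop once it commits to an answer
                break
        return transcript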


Despite the article, I don’t think it would be a good doctor.

I read a report of a doctor who tried it on his case files from the ER (I'm sure it was here on HN). It called some of the cases correctly, missed a few others, and would have killed one woman. I'm sure it has its place, but use a real doctor if your symptoms are in any way concerning.


If you don't have insurance, it might be the only chance you get at a doctor's diagnosis in some parts of the civilised world.


> If I ask GPT4 about some arcane math concept it’ll wax lyrical about how it has connections to 20 other areas of math. But it fails at simple arithmetic.

The only reason failing at basic arithmetic indicates something when discussing a human is that you can reasonably expect any human to have been taught arithmetic in school first. Otherwise, those things are hardly related. Now, LLMs don't go to school.


Most humans fail at doing simple arithmetic in their head. At the very least, I'd say GPT-4 is superior to 99% of people at mental math. And because it can explain its work step by step, it's easy to find where the flaw in its reasoning is and fix it. In my experience, GPT-4 is capable of self-correction with the right prompts.


Being bad at arithmetic and making diagnoses are entirely separate things.

If that's your best argument, you don't have an argument.


You're completely wrong; look at the Wikipedia page for differential diagnosis: https://en.wikipedia.org/wiki/Differential_diagnosis

Literally the majority of the page is basic arithmetic, mostly Bayes. Diagnosis is a process of determining (sometimes quantitatively, sometimes qualitatively) the relative incidences of different diseases and all the possible ways they can present. Could this be rare virus X, or is it common virus Y presenting atypically?
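
As a toy illustration of the kind of arithmetic involved (all numbers below are made up purely for illustration):

    # Toy Bayes update of the sort used in differential diagnosis.
    prior = {"common_virus": 0.95, "rare_virus": 0.05}       # baseline prevalence
    likelihood = {"common_virus": 0.10, "rare_virus": 0.80}   # P(presentation | disease)

    evidence = sum(prior[d] * likelihood[d] for d in prior)   # P(presentation)
    posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

    for disease, p in posterior.items():
        print(disease, round(p, 2))
    # -> common_virus 0.7, rare_virus 0.3: even with an atypical presentation,
    #    the common virus can remain the more probable explanation.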



