Regarding safety, no benchmark showed 0% misalignment. The best we had was "safest model so far" marketing speech.
Regarding predicting the future (in general, but also around AI), I'm not sure why anyone would think anything is certain, or why you would trust anyone who thinks that.
Humanity is a complex system that doesn't always produce predictable output for a given input (like AI advancing). And here even the input is very uncertain (we may reach "AGI" in 2 years or in 100).
I guess that it generally has a 50/50 chance of drive/walk, but some prompts nudge it toward one or the other.
Btw, explanations don't matter that much. Since it writes the answer first, the only thing that matters is what it decides for the first token. If the first token is "walk" (or "wa", or however it's tokenized), it has no choice but to make up an explanation to defend the answer.
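A toy sketch of this point (not a real LLM; the probabilities and templates are made up): in autoregressive decoding, once the first answer token is sampled, every later token is conditioned on it, so the "explanation" is generated to fit whatever the answer already was.

```python
import random

def sample_first_token(p_walk=0.5):
    """Sample the answer token first; prompt wording would shift p_walk."""
    return "walk" if random.random() < p_walk else "drive"

def generate_explanation(answer):
    """All later tokens are conditioned on the already-emitted answer,
    so the rationale is constructed to defend it, not the other way around."""
    templates = {
        "walk": "walk, because it's close and the weather is nice.",
        "drive": "drive, because it's faster and it might rain.",
    }
    return templates[answer]

answer = sample_first_token()
print(generate_explanation(answer))
```

The explanation function never gets to overrule the sampled answer, which is the whole point of the comment above.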
Can one solution be always doing two scans, N months apart, before drawing any conclusions (excluding things that can be reliably detected from a single scan)? Initial scan could affect N (if you find something potentially aggressive, you can schedule the second scan sooner). And then do a follow up every M years.
That should exclude benign or very slow-growing things.
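The two-scan idea above can be sketched as simple scheduling logic. Everything here is illustrative: the default intervals and the "pull the second scan in if it looks aggressive" rule are made-up assumptions, not clinical guidance.

```python
from datetime import date, timedelta

def schedule_scans(first_scan, looks_aggressive, n_months=12, m_years=5):
    """Sketch of the proposed protocol: a second scan N months after the
    first (sooner if the initial finding looks aggressive), then a routine
    follow-up every M years. Intervals are hypothetical."""
    if looks_aggressive:
        n_months = max(1, n_months // 4)  # e.g. bring the second scan forward
    second_scan = first_scan + timedelta(days=30 * n_months)
    next_follow_up = second_scan + timedelta(days=365 * m_years)
    return second_scan, next_follow_up

second, follow_up = schedule_scans(date(2025, 1, 1), looks_aggressive=False)
print(second, follow_up)
```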
The question is how you can know whether it needs treatment or not. I guess you either need to do a biopsy, or check whether it's grown after N months (leaving the patient scared and anxious during that time). Neither is great if most cases end up not needing treatment.
If the test provides zero information about whether it needs treating, then it was never a useful test. Presumably it's more like "there's an X% chance this needs treatment", in which case you just set reasonable thresholds for X. E.g. if it's 5% you monitor it, 10% you do a biopsy, 70% you operate, etc.
This is much more sensible than just not testing at all and letting people die from cancer.
> leaving patient scared and anxious during that time
This seems to be the actual motivation. We don't want to scare people with test results so we're just not going to test them. I think that should be up to the patient.
> This is much more sensible than just not testing at all and letting people die from cancer.
This is not what happens. You're assuming that if the cancer does not get detected by the screening then it never gets detected. What actually happens is that the test gives information that might be redundant and obtainable in a less risky way. What the studies show is that waiting until there are other, more specific signs and symptoms of prostate cancer results in the same survival rates.
See https://pubmed.ncbi.nlm.nih.gov/38926075/. I was not aware of the ERSPC, which came out late last year and gives better outcomes for screening, but overall the evidence is not super clear yet. There are possibly certain groups that can benefit from PSA screening more than others. Also, modern, more effective treatments might allow for later diagnosis with the same clinical results.
I don't see the interface changing much in 3-6 months, and definitely not fundamentally.
Sure, there will probably be some changes around MCP, skills, AGENTS.md and similar, but I don't see them as big changes, and you can use the tools now without those things.
> I don't see the interface changing much in 3-6 months, and definitely not fundamentally.
This is as insightful as a fellow noting that both a caulk gun and a shotgun have a fixed handle and movable trigger and genuinely wondering why an expert user of the former would ever have even a moment's trouble learning to use the latter.
Everything involves performing and actually proving what you know. If this is such an issue, then it's something you need to fix. I have never actually met anyone who has this "performance anxiety" where they are so brilliant but do poorly on tests because of it. I think it's a myth used to attack the rigor of academics. For knowledge workers, eventually you have to go into court, perform surgery, do the taxes, give the presentation, or sit through the high-pressure meeting. If anxiety is truly debilitating to a person in all of these situations, they'll be doomed anyway, so filter them out.
Keep the huge, complex business logic on the server whenever possible.
That doesn't work for webapps that are effectively entirely based on client-side reactivity, like Figma, though the list of projects that need to work like that is extremely short. Even for that style of app, I do wonder how far something like Phoenix LiveView might go toward the end goal.