Regarding safety, no benchmark showed 0% misalignment. The best we had was "safest model so far" marketing speech.
Regarding predicting the future (in general, but also around AI), I'm not sure why anyone would think anything is certain, or why you would trust anyone who thinks that.
Humanity is a complex system that doesn't always produce predictable output for a given input (like AI advancing). And here even the input is very uncertain (we may reach "AGI" in 2 years or in 100).
I guess that it generally has a 50/50 chance of drive/walk, but some prompts nudge it toward one or the other.
Btw, explanations don't matter that much. Since it writes the answer first, the only thing that matters is what it decides for the first token. If the first token is "walk" (or "wa", or however it's tokenized), it has no choice but to make up an explanation to defend the answer.
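A toy sketch of this point (not a real LLM; the probabilities and templates are made up): in autoregressive decoding, once the first answer token is sampled, every later token is conditioned on it, so the "explanation" is generated to fit whatever the answer already was.

```python
import random

def sample_first_token(p_walk=0.5):
    """Sample the answer token first; prompt wording would shift p_walk."""
    return "walk" if random.random() < p_walk else "drive"

def generate_explanation(answer):
    """All later tokens are conditioned on the already-emitted answer,
    so the rationale is constructed to defend it, not the other way around."""
    templates = {
        "walk": "walk, because it's close and the weather is nice.",
        "drive": "drive, because it's faster and it might rain.",
    }
    return templates[answer]

answer = sample_first_token()
print(generate_explanation(answer))
```

The explanation function never gets to overrule the sampled answer, which is the whole point of the comment above.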
Can one solution be always doing two scans, N months apart, before drawing any conclusions (excluding things that can be reliably detected from a single scan)? Initial scan could affect N (if you find something potentially aggressive, you can schedule the second scan sooner). And then do a follow up every M years.
That should exclude benign or very slow-growing things.
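The two-scan idea above can be sketched as simple scheduling logic. Everything here is illustrative: the default intervals and the "pull the second scan in if it looks aggressive" rule are made-up assumptions, not clinical guidance.

```python
from datetime import date, timedelta

def schedule_scans(first_scan, looks_aggressive, n_months=12, m_years=5):
    """Sketch of the proposed protocol: a second scan N months after the
    first (sooner if the initial finding looks aggressive), then a routine
    follow-up every M years. Intervals are hypothetical."""
    if looks_aggressive:
        n_months = max(1, n_months // 4)  # e.g. bring the second scan forward
    second_scan = first_scan + timedelta(days=30 * n_months)
    next_follow_up = second_scan + timedelta(days=365 * m_years)
    return second_scan, next_follow_up

second, follow_up = schedule_scans(date(2025, 1, 1), looks_aggressive=False)
print(second, follow_up)
```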
The question is how you can know whether it needs treatment or not. I guess you either need to do a biopsy, or check whether it's grown after N months (leaving the patient scared and anxious during that time). Neither is great if most cases end up not needing treatment.
If the test provides zero information about whether it needs treating, then it was never a useful test. Presumably it's more like "there's an X% chance this needs treatment", in which case you just set reasonable thresholds for X. E.g. if it's 5% you monitor it, 10% you do a biopsy, 70% you operate, etc.
This is much more sensible than just not testing at all and letting people die from cancer.
> leaving patient scared and anxious during that time
This seems to be the actual motivation. We don't want to scare people with test results so we're just not going to test them. I think that should be up to the patient.
> This is much more sensible than just not testing at all and letting people die from cancer.
This is not what happens. You're assuming that if the cancer does not get detected by the screening then it never gets detected. What actually happens is that the test gives information that might be redundant and obtainable in a less risky way. What the studies show is that waiting until there are other, more specific signs and symptoms of prostate cancer results in the same survival rates.
See https://pubmed.ncbi.nlm.nih.gov/38926075/. I was not aware of the ERSPC, which came out late last year and gives better outcomes for screening, but overall the evidence is not super clear yet. There are possibly certain groups that can benefit from PSA screening more than others. Also, modern, more effective treatments might allow for later diagnosis with the same clinical results.
I don't see the interface changing much in 3-6 months, and definitely not fundamentally.
Sure, there will probably be some changes around MCP, skills, AGENTS.md and similar, but I don't see them as big changes, and you can use the tools now without those things.
> I don't see the interface changing much in 3-6 months, and definitely not fundamentally.
This is as insightful as a fellow noting that both a caulk gun and a shotgun have a fixed handle and movable trigger and genuinely wondering why an expert user of the former would ever have even a moment's trouble learning to use the latter.
Everything involves performing and actually proving what you know. If this is such an issue, then it's something you need to fix. I have never actually met anyone who has this "performance anxiety" where they are so brilliant but do poorly on tests because of it. I think it's a myth used to attack the rigor of academics. For knowledge workers, eventually you have to go into court, perform surgery, do the taxes, give the presentation, or sit through the high-pressure meeting. If anxiety is truly debilitating to a person in all of these situations, they'll be doomed anyway, so filter them out.
Keep the huge, complex business logic on the server whenever possible.
That doesn't work for webapps that are effectively entirely based on client-side reactivity, like Figma, though the list of projects that need to work like that is extremely short. Even for that style of app, I do wonder how far something like Phoenix LiveView might go toward the end goal.