I find it interesting that this thread is full of pragmatic posts that seem to honestly reflect the real limits of current Gen-AI.
Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".
I actually enjoy writing specifications. So much so that I made it a large part of my consulting work for a huge part of my career. So it makes sense that working with Gen-AI that way is enjoyable for me.
The more detailed I am in breaking down chunks, the easier it is for me to verify, and the more likely I am to get output that isn't 30% wrong.
> "The biggest issue I see is Microsoft's entire mentality around AI adoption that focuses more on "getting the numbers up" then actually delivering a product people want to use."
That succinctly describes 90% of the economy right now if you just change a word and remove a couple:
The biggest issue I see is the entire mentality that focuses more on "getting the numbers up" than actually delivering a product people want to use.
KPI infection. You see projects whose goal is, say, "repos with AI code review turned on" vs "code review suggestions that were accepted". And then if you do get adoption (like, say, a Claude Code trial), the VPs balk at the price. If it's actually expensive now, it's because people are actually using it all the time!
The same kind of logic that led companies to migrate from Slack to Teams. The metrics don't look at actual, positive impact, because nobody picks a risky KPI; everybody picks a useless one that can't miss.
"The company can be held vicariously liable" means that in this analogy, the company represents the human who used AI inappropriately, and the employee represents the AI model that did something it wasn't directly told to do.
Nobody tries to jail the automobile when it hits a pedestrian while on cruise control. The driver is responsible for knowing the limits of the tool and adjusting accordingly.
Can you help me understand where you are coming from? Is it that you think the benchmark is flawed or overly harsh? Or that you interpret the tone as blaming AI for failing a task that is inherently tricky or poorly specified?
My takeaway was more "maybe AI coding assistants today aren’t yet good at this specific, realistic engineering task"....
In my experience many OTEL libraries are awful to use, and most of the "official" ones are the worst offenders, as they are largely codegened. That typically makes them feel clunky to use, and they exhibit code patterns that are non-native to the language used, which would be one explanation of why AI systems struggle with the benchmark.
I think you would see similar results if you tasked an AI to e.g. write gRPC/Protobuf systems using only the built-in/official protobuf codegen.
Where I think the benchmark is quite fair is in the solutions. It looks like for each of the languages (at least the ones I'm familiar with), the "better" options were chosen, e.g. using `tracing-opentelemetry` rather than `opentelemetry-sdk` directly in Rust.
However, the one-shot nature of the benchmark also isn't that reflective of actual utility. In my experience, if you have the initial framework setup done in your repo plus a handful of examples, these tools do a great job of applying OTEL tracing to the majority of your project.
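To make the distinction concrete, here's a minimal sketch of the `tracing-opentelemetry` layer approach in Rust. It's illustrative only: the opentelemetry crates change their APIs between versions (so exact builder names may differ), the stdout exporter is a stand-in for a real OTLP exporter, and the service/function names are made up.

```rust
use opentelemetry::trace::TracerProvider as _; // brings .tracer() into scope
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;

fn main() {
    // Build a tracer provider; a real setup would swap the stdout
    // exporter for an OTLP exporter pointed at your collector.
    let provider = opentelemetry_sdk::trace::TracerProvider::builder()
        .with_simple_exporter(opentelemetry_stdout::SpanExporter::default())
        .build();
    let tracer = provider.tracer("example-service");

    // Bridge `tracing` spans into OpenTelemetry via the layer,
    // instead of driving the opentelemetry-sdk span API directly.
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    handle_request(42);
}

// Ordinary `tracing` instrumentation; the layer exports it as an OTEL span.
#[tracing::instrument]
fn handle_request(user_id: u64) {
    tracing::info!("doing some work");
}
```

Once this kind of setup and one or two instrumented functions exist in the repo, "add tracing to the rest of the module" is exactly the sort of pattern-following task the assistants tend to do well.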
Where I work, we are looking at a lot of our documentation and implementations to see where AI has a hard time.
This almost always correlates with customers having similar issues in getting things working.
This has led us to rewrite a lot of documentation to be more consistent and clear. In addition we set out a series of examples, from simple to complex. This shows up as fewer tickets later, and more complex implementations being set up by customers without the need for support.
I did similar for about 25 years. I had one injury from overtraining (I basically ran 20 miles every Sunday morning for 6 months, in addition to two shorter runs each week) that ended up as plantar fasciitis, and I had to take 4-5 months off.
I stopped doing that sort of weekly long run after that and did a lot more in the 6-10 mile range.
Then during and immediately post-COVID shutdowns, I just started running every time I felt stressed about something, and I started to neglect all the other holistic movements that complement running.
This ended up leading to a weird twinge in my hip that 2 years of focused strength training hasn't eliminated. The doctor says there is nothing structural, but I don't run any more and I miss it often. There is a flow state I seem to get into somewhere between just under and just over an hour into a run.
The only other time I ever get into that wonderful flow state is every once in a while when playing guitar, but it's rare.
There's a great analog with this in chess as well.
~1200 - omg chess is so amazing and hard. this is great.
~1500 - i'm really starting to get it! i can beat most people i know easily. i love studying this complex game!
~1800 - this game really isn't that hard. i can beat most people at the club without trying. really I think the only thing separating me from Kasparov is just a lot of opening prep and study
~2300 - omg this game is so friggin hard. 2600s are on an entirely different plane, let alone a Kasparov or a Carlsen.
Magnus Carlsen - "Wow, I really have no understanding of chess." - Said without irony after playing some game and going over it with a computer on stream. A fairly frequent happening.
IMO both perspectives have their place. Sometimes what's missing is the information, sometimes what's lacking is the ability to communicate it and/or the willingness to understand it. So in different circumstances either viewpoint may be appropriate.
What's missing more often than not, across fields of study as well as levels of education, is the overall commitment to conceptual integrity. From this we observe people's habitual inability or unwillingness to be definite about what their words mean - and their consequent fear of abstraction.
If one is in the habit of using one's set of concepts in the manner of bludgeons, one will find many ways and many reasons to bludgeon another with them - such as if a person turned out to be using concepts as something more akin to clockwork.
Simple counterexample: chess. The rules are simple enough we regularly teach them to young children. There's basically no randomness involved. And yet, the rules taken together form a game complex enough that no human alive can fully comprehend their consequences.
This is actually insightful: we usually don't know the question we are trying to answer. The idea that you can "just" find the right question is naive.
Sure, you can put it this way, with the caveat that reality at large isn't strongly definable.
You can sort of see this with good engineering: half of it is strongly defining a system simple enough to be reasoned about and built up, the other half is making damn sure that the rest of reality can't intrude, violate your assumptions and ruin it all.
Some of these are obviously related to the closing of some of the retail businesses. And some might simply be middle management bloat that happens often at tech companies.
But imagine you're one of the people who remain (e.g., not impacted by the eliminated companies or products) and now there are fewer people to do the same amount of work? I've seen that movie and it usually has an economic impact 6-9 months later when people burn out.
It's almost like you can write the script:
Month 0–3:
Survivors are relieved, grateful, and over-perform. Leadership reads this as “proof the cuts worked.”
Month 3–6:
Context loss shows up. Decision latency increases. Domain knowledge walked out the door.
Month 6–9:
Burnout, attrition, and quality failures begin. The “hidden layoffs” start as top performers quietly leave.
Month 9–12:
Rehiring or contracting resumes (usually at higher cost).
The key misunderstanding here is assuming AI substitutes for organizational slack and human coordination. It doesn’t.
And sometimes middle management "bloat" is misdiagnosed. Remove them without redesigning decision rights and workflows, and the load doesn't disappear; it redistributes to the ICs.
Watch for Amazon "strategic investments" in early Q4 2026 (this will be a cover for the rehiring).
I haven't detected over-performance after the layoffs I have seen. It was the other way round: the people who remained were sad, depressed, and demotivated. What happened was a general slowdown of the remaining people, plus organizational chaos while people had not yet figured out who should fill in for the missing positions and how.
I've noticed at my company after a lot of layoffs and restructuring and moving people between projects, when I ask "who is responsible for X now?" there can be a lot of confusion getting the answer to that question.
That is not an answer to the "who is responsible for X now" question. Laid-off Bill is not responsible for X now.
Also, it is not a useful answer at all; it is an uncooperative answer. Whoever is asking about the responsible person is trying to work. They have a legitimate question about who they should contact about X, and sending them to someone who no longer works there is less than useless.
But it doesn't change that Bill was the person who was responsible, and now is gone. So what exactly are they supposed to say? In the context of the GP's post, that seems to be the point - there is no longer anybody there who is responsible for X.
There are several options, and pretty much all of them involve being actually cooperative rather than intentionally unhelpful. If Bill was part of some other team, point to that team or its leader.
If he was on your team, you or your leader can ask what the person wants and move from there. Maybe you can actually answer the question. Maybe the proper reaction involves raising a Jira ticket. Maybe the answer is "we are probably not going to do that anymore". It all depends on what the person who came with the question wants.
> But it doesn't change that Bill was the person who was responsible, and now is gone.
The other people are still there. And the team IS responsible for X. And without doubt, they are fully responsible for helping figure out who should be contacted now and what should be done.
That is a normal part of work after any reorganization.
I have seen it many times that when Bill leaves, the thing he was responsible for doesn't get picked up by anyone.
It doesn't necessarily even mean that the organization is "abnormal". Perhaps the reason Bill was let go was because X was not considered business-critical any more.
> I have seen it many times that when Bill leaves, the thing he was responsible for doesn't get picked up by anyone.
I LITERALLY offered the "we are probably not going to do that anymore" option. In your situation, you can scratch the probably away. That answer is still actually helpful unlike the original answer.
HN is where I keep hearing the “50× more productive” claims the most.
I’ve been reading 2024 annual reports and 2025 quarterlies to see whether any of this shows up on the other side of the hype.
So far, the only company making loud, concrete claims backed by audited financials is Klarna, and once you dig in, their improved profitability lines up far more cleanly with layoffs, hiring freezes, business simplification, and a cyclical rebound than with Gen-AI magically multiplying output. AI helped support a smaller org that eliminated more complicated financial products with edge cases, but it didn't create a step-change in productivity.
If Gen-AI were making tech workers even 10× more productive at scale, you’d expect to see it reflected in revenue per employee, margins, or operating leverage across the sector.
I have friends who make such 50x productivity claims. They are correct if we define productivity as creating untested apps and games and their features that will never ship --- or be purchased, even if they were to ship. Thus, “productivity” has become just another point of contention.
100% agree. There are far more half-baked, incomplete "products" and projects out there now that it is easier to generate code. Even generously, that doesn't necessarily equate to productivity.
I agree that the last 10% of a project is the hardest part, and that's the part that Gen-AI sucks at (hell, maybe the last 30%).
> If Gen-AI were making tech workers even 10× more productive at scale, you’d expect to see it reflected in revenue per employee, margins, or operating leverage across the sector.
If we’re even just talking a 2x multiplier, it should show up in some externally verifiable numbers.
I agree, and we might be seeing this, but there is so much noise and so many other factors, and we're in the midst of capital re-asserting control after a temporary loss of leverage, which might also account for part of any productivity boost (people are scared, so they are working harder).
The issue is that I'm not a professional financial analyst and I can't spend all day on comps so I can't tell through the noise yet if we're seeing even 2x related to AI.
But, if we're seeing 10x, I'd be finding it in the financials. Hell, a blind squirrel would, and it's simply not there.
Yes, I think there are many issues in a big company that could hide a 2x productivity increase for a little while. But I'd expect it to be very visible in small companies and projects. Looking at things like the number of games released on Steam, new products launched on new-product sites, or issues fixed in popular open source repos, you'd expect a 2x bump to be visible.