Maybe it is language specific? Maybe LLMs have a lot of good JavaScript/TypeScript samples for training, so it works well for those devs (e.g. me). I've heard that Scala devs have problems with LLMs writing code too. I am puzzled by good devs not managing to get LLMs to work for them.
I definitely think it's language specific. My memory may deceive me here, but I believe that LLMs are infinitely better at pumping out Python scripts than Java. Now, I have much, much more experience with Java than Python, so maybe it's just a case of what you don't know. However, the tools it writes in Python just work for me, and I can incrementally improve them; the tools get measurably better and more aligned with what I want.
I then ask it to do the same thing in Java, and it spends half an hour trying to do the same job and gets caught on some bit of trivia, for instance how to convert HTML escape characters: something like s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\""); (the exact snippet got mangled in transit). It endlessly compiles and fails over and over, never able to figure out what it has done wrong, and never decides to give up on the minutia and continue with the more important parts.
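For reference, here is a minimal sketch of the kind of entity unescaping described above. The class and method names are illustrative, not from the original comment; the one real subtlety is replacement order, which is exactly the sort of detail a model can loop on.

```java
public class Unescape {
    // Decode the most common HTML entities by chained replacement.
    // "&amp;" must be handled LAST: otherwise "&amp;lt;" would first
    // become "&lt;" and then wrongly decode to "<".
    static String unescapeHtml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&"); // must come last
    }

    public static void main(String[] args) {
        System.out.println(unescapeHtml("&lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;"));
        // prints: <a href="x">&</a>
    }
}
```

In production code you would reach for a library such as Apache Commons Text's StringEscapeUtils rather than hand-rolling the replacement chain.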
Maybe it's because there's no overall benefit to these things.
There's been a lot of talk about it for the past few years, but we're just not seeing impacts. Oh sure, management talks it up a lot, but where's the corresponding increase in feature delivery? Software stability? Gross profit? EBITDA?
Give me something measurable and I'll consider it.
When I used it before Christmas (free trial), it very visibly paused for a bit every so often, telling me that it was compressing/summarising its too-full context window.
I forget the exact phrasing, but it was impossible to miss unless you'd put everything in the equivalent of a Ralph loop and gone AFK or put the terminal in the background for extended periods.
However, I run something like 3 concurrent sessions that each do multiple compacts throughout, for around 8 hrs/day, and I go through a 20x subscription in about half a week. So I'm extremely skeptical of these negative claims.
Edit: However I stay on top of my prompting efficiency, maybe doing some incredibly wasteful task is... wasteful?
This is where engineering practices help. Based on 1.5 years of data from my team, I can say that I see about a 30% performance increase on a mature system (about a 9-year-old code base), maybe more. The interesting part: LLMs are leverage; the better an engineer you are, the more you benefit from an LLM.
I guess I am kind of an "AI evangelist" in my circles (team, ecosystem, etc.). I personally see benefits in "AI" both for side projects and my main work. However, according to my latest measurements, the improvement is not dramatic: it is big (about 30%), but not dramatic. I share my insights purely to have less on my shoulders (if my team members can do more, there is less for me to do).
In some cases, that's true, but sometimes you need to update the cutting rules because of law changes, or because you found a different way of cutting, for example. There are cases where this is not a one-time investment. What I agree with is that cutting it yourself has become significantly cheaper.
2. If using Cursor (as I usually am), this isn't what it always does by default, though you can invoke something like it using "plan" mode. Its default is to keep todo items in a nice little todo list, but that isn't the same thing as a spec.
3. I've found that Claude Code doesn't always do this, for reasons unknown to me.
4. The prompt is completely fungible! It's really just an example of the idea.