It's a test. Like all tests, it's more or less synthetic and focused on specific expected behavior. I am pretty far from LLMs now, but this seems like a very good test to see how genuine this behavior actually is (or repeat it 10x with some scrambling to go deeper).
This thread is about the find-and-replace, not the evaluation. Gambling on whether the first AI replaces the right spells just so the second one can try finding them is unnecessary when find-and-replace is faster, easier and works 100%.
... I'm not sure if you're trolling or if you missed the point again. The point is to test the LLM's contextual ability and correctness when performing actions that are hopefully guaranteed not to be in the training data.
It has nothing to do with the performance of the string replacement.
The initial "find" is to see how well it actually finds all the "spells" in this case, and then replaces them. Then, maybe using a separate context, evaluate whether the results are the same or whether they are skewed in favour of the training data.
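A minimal sketch of how such a test could be scored, assuming the replacement step is done deterministically so the ground truth is known. The spell names, the made-up replacements, and `found_by_model` (a stand-in for whatever list the model returns) are all hypothetical; the LLM call itself is left out.

```python
import re

def replace_terms(text, mapping):
    """Deterministically swap each known term for its made-up replacement."""
    # Longest terms first, so a longer name wins over a shorter one it contains.
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

def score(found, expected):
    """Compare what the model reported against the known ground truth."""
    found, expected = set(found), set(expected)
    return {
        "missed": sorted(expected - found),
        "spurious": sorted(found - expected),
        "recall": len(found & expected) / len(expected) if expected else 1.0,
    }

# Hypothetical example data, not taken from any real evaluation.
mapping = {"Expelliarmus": "Vorlandis", "Lumos": "Kethira"}
original = "She cast Lumos, then shouted Expelliarmus."
scrambled = replace_terms(original, mapping)

# Stand-in for the list a model returns when asked to find the spells in `scrambled`.
found_by_model = ["Kethira"]
result = score(found_by_model, mapping.values())
```

Repeating this with different random replacement names would be one way to do the "10x with some scramble" variant mentioned earlier in the thread.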
> Why not just extend the OpenAPI specification to skills?
Because approximately none of what exists in the existing OpenAPI specification is relevant to the task, and nothing needed for the tasks is relevant to the current OpenAPI use case, so trying to jam one use case into a tool designed for the other would be pure nonsense.
It’s like needing to drive nails and asking why grab a hammer when you already have a screwdriver.
The language itself was not invented for the purpose: it was the language spoken in Florence, then adopted by the literary movement and then selected as the national language.
It seems like the best tradeoff between information density and understandability actually comes from the deep Latin roots of the language.
I was honestly surprised to find it in the first place, because I assumed English to be at first place given the simpler grammar and the huge dataset available.
I agree with your belief: other languages have either lower density (e.g. German) or lower understandability (e.g. English).
English has a ton of homophones, way more sounds that differ slightly (long/short vowels), and major pronunciation differences across major "official" variants (think Australia/US/Canada/UK).
Italian has one official Italian (two, if you count IT_ch, but the difference is minor), doesn't pay much attention to stress and vowel length, and only has a few "confusable" sounds (gl/l, gn/n, double consonants, stuff you get wrong in primary school). Italian dialects would be a disaster tho :)
Sure, it's a dumpster fire. But human engineers work on it just fine without investing man-decades into refactoring it into some shrine to the software engineer's craft.
The whole point of AI, in our parent company's eyes, is for no one to mention "code quality" as something impeding the delivery of features, yesterday, ever.
Claude, with a modicum of guidance from an engineer familiar with your monolith, could write comprehensive unit tests of your existing system, then refactor it into coherent composable parts, in a day.
Not doing so while senior management demands the use of AI augmentation seems odd.
It baffles me how much the discourse over native apps rarely takes this into consideration.
You reduce development effort by a third. It is OK to debate whether a company so big should invest in a better product anyway, but it is pretty clear why they are doing this.
That might be true (although you do add in the mess of web frameworks), but I strongly believe that resource usage must factor into these calculations too. It's a net negative to end users if you can develop an app a bit quicker but require the end users to have several times more RAM, CPU, etc.
Part of this (especially the CPU) is teams under-optimizing their Electron apps. See the multi-X speedup examples when they look into it and move hot code to C et al.
It might be a cynical take, but I don't think there is a single person in these companies that cares about end user resource usage.
They might care if the target were less tech-savvy people who are likely to have some laptop barely holding up with just Win11. But for a developer with a MacBook, what is one more Electron window?
I agree. I also find it interesting that many developers don't mind using Docker to run Redis / PostgreSQL and other services on Mac that are very simple to install and run directly. That's fine, but then they don't get to complain about Electron.
There are valid use cases for Docker with those types of software, but most users just use Docker for convenience or because "everyone else" uses it. Maybe influenced by Linux users, where Docker has lower overhead. It's convenient for sure, but it also has a cost on Mac/Windows.
Especially given how fast things progress, timeline and performance are a tradeoff where I'd say swaying things in favour of the latter is not by definition a net positive.
Microsoft gets largely pilloried on every UI rethink, Apple’s Liquid Glass just annoyed everyone I’ve heard comment on it, and, fwiw, YouTube Music asking if it feels outdated is an unnecessary annoyance.
Every time I read "save effort with Electron", I go back to a Win2K VM, poke around, and realize how much faster everything is than on an M4 Max, just because value is value, and Electron saves some effort.
There are cross platform GUI toolkits out there so while I am in team web for lots of reasons, generally it’s because web apps are faster and cheaper to iterate.
Cross platform GUIs might not have the same level of support and distributed knowledge as HTML/CSS/JS. If that vendor goes away or the OSS maintainers go a different direction, now you have an unsupported GUI platform.
I mean the initial release of Qt predates JavaScript by a few months and CSS by more than a year. GTK is only younger by a few years and both remain actively maintained.
Argument feels more like FUD than something rooted in factual reality.
The real question is how much better are native apps compared to Electron apps.
Yes, that would take more disk space, but whether it takes 50 MB or 500 MB isn't noticeable for most users. Same goes for memory: there is a gain for sure, but unless you open your system monitor you wouldn't know.
So even if it's something the company could afford, is it even worth it?
Also it's not just about cost but opportunity cost. If a feature takes longer to implement natively compared to Electron, that can cause costly delays.
It absolutely is noticeable the moment you have to run several of these Electron “apps” at once.
I have a MacBook with 16GB of RAM and I routinely run out of memory from just having Slack, Discord, Cursor, Figma, Spotify and a couple of Firefox tabs open. I went back to listening to mp3s with a native app to have enough memory to run Docker containers for my dev server.
Come on, I could listen to music, program, chat on IRC or Skype, do graphic design, etc. with 512MB of DDR2 back in 2006, and now you couldn’t run a single one of those Electron apps with that amount of memory. How can a billion dollar corporation doing music streaming not have the resources to make a native app, but the Songbird team could do it for free back in 2006?
I’ve shipped cross platform native UIs by myself. It’s not that hard, and with skyrocketing RAM prices, users might be coming back to 8GB laptops. There’s no justification for a big corporation not to have a native app other than developer negligence.
On that note, I could also comfortably fit a couple of chat windows (skype) on a 17'' CRT (1024x768) back in those days. It's not just the "browser-based resource hog" bit that sucks - non-touch UIs have generally become way less space-efficient.
I think the comparison between native apps and Electron apps is conflating two things:
- Native apps integrate well with the native OS look and feel and native OS features. I'd say it's nice to have, but not a must have, especially considering that the same app can run on multiple platforms.
- Native apps use much less RAM than Electron apps. I believe this one is a real issue for many users. Running Slack, Figma, Linear, Spotify, Discord, Obsidian, and others at the same time consumes a lot of memory for no good reason.
Which makes me wonder: Is there anything that could be removed from Electron to make it lighter, similar to what Qt does?
If I understood correctly, the same can be done in VS Code with the GitHub plugins (for GitHub PRs).
It's pretty straightforward: you check out a PR, move around, and either make some edits (that you can commit and push to the feature branch) or add comments.
I don't understand what kind of evidence you expect to receive.
There are plenty of examples from talented individuals, like Antirez or Simonw, and an ocean of examples from random individuals online.
I can tell you that some tasks that would take me a day to complete are done in 2h of agentic coding and 1h of code review, with the additional benefit that during the 2h of agentic coding I can do something else. Is this the kind of evidence you are looking for?
The problem with jQuery is that, being imperative, it quickly becomes complex when you need to handle more than one thing, because you need to cover all cases imperatively.
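To make that concrete, here is a language-agnostic sketch, written in Python rather than jQuery or React, with a plain dict standing in for the DOM. The shopping-cart scenario and every name in it (`on_add_imperative`, `on_remove_imperative`, `render`) are hypothetical and only meant to show the shape of the problem.

```python
# Imperative style: each event handler patches the "DOM" by hand,
# so every handler must know about every piece of UI that depends on the state.
def on_add_imperative(dom, item):
    dom["items"].append(item)
    dom["count_label"] = f"{len(dom['items'])} items"
    dom["empty_message_visible"] = False
    dom["checkout_enabled"] = True

def on_remove_imperative(dom, item):
    dom["items"].remove(item)
    dom["count_label"] = f"{len(dom['items'])} items"
    # Easy to forget: restoring the empty state when the last item goes away.
    if not dom["items"]:
        dom["empty_message_visible"] = True
        dom["checkout_enabled"] = False

# Declarative style: handlers only change state; one function derives the whole view.
def render(items):
    return {
        "items": list(items),
        "count_label": f"{len(items)} items",
        "empty_message_visible": not items,
        "checkout_enabled": bool(items),
    }

# Both styles should agree after the same sequence of events.
dom = render([])
on_add_imperative(dom, "apple")
on_add_imperative(dom, "pear")
on_remove_imperative(dom, "apple")

state = []
state.append("apple")
state.append("pear")
state.remove("apple")
view = render(state)
```

In the imperative version, every new piece of UI that depends on the cart has to be added to every handler; in the declarative version it is added once, in `render`.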
Yeah, that's the other HN koan about "You probably don't need React if..." But if you are using jQuery/vanilla to shove state into your HTML, you probably actually do need something like React.
After having some time to think about it, I've seen some really perverse DOM stuff in jQuery. Like $(el).next().children(3) type stuff. So I think this stuff really fell over when there was 'too much state' for the DOM.
I think if you want to go high-dom manipulation a la jQuery, and want some form of complex state, storing the state _on_ the DOM might make sense? Things like data attributes and such, but I also feel like that’s itching for something more like htmx or maybe svelte (I’ve not looked into either enough, so I may be completely off base).
I do agree with the notion that jQuery is easy to mishandle when logic grows beyond a pretty narrow (mostly stateless) scope. It’s fantastic up until that point, and incredibly easy to footgun beyond it.
Yeah, that's the thing, it might make sense in some simple 1-dimensional case, but beyond that it turns into spaghetti code (or a homebrew 'framework'). The big thing is that if you want to re-jigger some of the DOM, React is actually a lot nicer than jQuery.
I’ll die on the “give me Vue over React any day” hill in that case, admittedly because I think React’s template/code mix is atrocious. I also _feel_ like React suffers from “why not do everything” syndrome, and that’s from a very naive perspective, so take it with a grain or mountain of salt.
Part of me feels the same way, and ~2015 me was full on SPA believer, but nowadays I sigh a little sigh of relief when I land on a site with the aesthetic markers of PHP and jQuery and not whatever Facebook Marketplace is made out of. Not saying I’d personally want to code in either of them, but I appreciate that they work (or fail) predictably, and usually don’t grind my browser tab to a halt. Maybe it’s because sites that used jQuery and survived, survived because they didn’t exceed a very low threshold of complexity.
It feels like shooting a fly with a bazooka