Hacker News | theshrike79's comments

It makes the stock go up and people earn crazy amounts of money.

iff "earn" == "receive"

In the absence of force or fraud, those are the same, more or less.

Like any other mega-scaler, they're just playing Money Chicken.

Everyone is spending crazy amounts of money in the hopes that the competition will tap out because they can't afford it anymore.

Then they can cool down their spending and raise prices to a sustainable level, because they'll have an effective monopoly.


Money Chicken is the best term I've seen for this!

Working with and communicating with offshored teams is a specific skill too.

There are tips and tricks on how to manage them, and not knowing them will bite you later on. Like the basic rule of never asking yes-or-no questions, because in some cultures saying "no" isn't a thing. They'd rather just default to yes and effectively lie than admit failure.


Skills can contain scripts, making them a lot more versatile than just a document.

Of course any LLM can write any script based on a document, but that's not very deterministic.

A good example is Anthropic's PDF creator skill. It has the basic English instructions as well as actual Python code to generate PDFs.
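
For a concrete picture, here's a minimal sketch of the kind of Python script a skill can bundle alongside its instructions (illustrative only, using reportlab; not necessarily what Anthropic's actual skill ships):

    # make_pdf.py - hypothetical helper a skill could bundle next to SKILL.md
    # (illustrative sketch; the real skill's script may use a different library)
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import A4

    def make_pdf(path, title, lines):
        # Create a one-page PDF with a title and a few body lines.
        c = canvas.Canvas(path, pagesize=A4)
        width, height = A4
        c.setFont("Helvetica-Bold", 16)
        c.drawString(72, height - 72, title)
        c.setFont("Helvetica", 11)
        y = height - 110
        for line in lines:
            c.drawString(72, y, line)
            y -= 16
        c.save()

    if __name__ == "__main__":
        make_pdf("report.pdf", "Example report",
                 ["Generated by a bundled script,", "not improvised by the model."])

The point is that the agent runs this script instead of re-deriving PDF generation from prose every time, which is what makes the output repeatable.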


This strikes me as entirely logical in the short run, and an insane way of packaging software that we will certainly regret in the long run.

"Just a document" can certainly contain a script or code or whatever.

Of course, but the agent can't run a code block in a readme.

It _can_ run a PEP 723 script without any specific setup (as long as uv and Python are installed). uv will automatically create a virtual environment AND install all dependencies, all with a single command, without polluting the context with tons of setup.
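
For reference, a PEP 723 script is just a regular Python file with an inline metadata block at the top. A minimal example (the requests dependency is only for illustration) that uv can run directly with "uv run example.py":

    # /// script
    # requires-python = ">=3.11"
    # dependencies = ["requests"]
    # ///
    # uv reads the block above, creates an isolated venv with requests
    # installed, and then runs the script - no manual setup steps.
    import requests

    resp = requests.get("https://example.com")
    print(resp.status_code)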


How is this different from a README.md with a code block?

The code block isn't an executable script?

The same way you'd test a human following written instructions over time.

Check the results.


More like open local models are becoming "good enough".

I got stuff done with Sonnet 3.7 just fine. It needed a bunch of babysitting, but it was still a net positive for productivity. Now local models are at that level, closing in on the current SOTA.

When "anyone" can run an Opus 4.5 level model at home, we're going to be getting diminishing returns from closed online-only models.


See, the market is investing like _that will never happen_.

I'm just riding the VC powered wave of way-too-cheap online AI services and building tools and scaffolding to prepare for the eventual switch to local models =)

Should be possible with optimised models: just drop all the "generic" stuff and focus on coding performance.

There's no reason for a coding model to contain all of AO3 and Wikipedia =)


There is: It works (even if we can't explain why right now).

If we knew how to create a SOTA coding model by just putting coding stuff in there, that is how we would build SOTA coding models.


I think I like coding models that know a lot about the world. They can disambiguate my requirements and build better products.

I generally prefer a coding model that can google for the docs, but separate models for /plan and /build are also a thing.

> separate models for /plan and /build

I had not considered that, seems like a great solution for local models that may be more resource-constrained.


You can configure aider that way. You get three, in fact: an architect model, a code editor model, and a quick model for things like commit messages. Although I'm not sure if it's got doc searching capabilities.

That's what Meta thought initially too, training codellama and chat llama separately, and then they realized they were idiots and that adding the other half of the data vastly improves both models. As long as it's quality data, more of it doesn't do harm.

Besides, programming is far more than knowing how to autocomplete syntax; you need a model that's proficient in the fields the automation is placed in, otherwise it'll be no help in actually automating them.


But as far as I know, that was way before tool calling was a thing.

I'm more bullish about small and medium-sized models + efficient tool calling than I am about LLMs too large to be run at home without $20k of hardware.

The model doesn't need to have the full knowledge of everything built into it when it has the toolset to fetch, cache and read any information available.
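
As a rough illustration (hypothetical names, not any particular agent's built-in tooling), the fetch-and-cache part of such a toolset can be as simple as:

    # Hypothetical fetch-and-cache tool a small model could call instead of
    # relying on memorized knowledge (illustrative sketch only).
    import hashlib
    import urllib.request
    from pathlib import Path

    CACHE_DIR = Path.home() / ".cache" / "model-docs"

    def fetch_docs(url: str) -> str:
        """Return the page at `url`, caching it on disk for later reads."""
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        cached = CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()
        if cached.exists():
            return cached.read_text()
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("utf-8", errors="replace")
        cached.write_text(text)
        return text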


But... but... I need my coding model to be able to write fanfiction in the comments...

Now I wonder how strong the correlation between coding performance and AO3 knowledge is in human programmers. Maybe we are on to something here /s

Tesla was a decent car with a very good computer in it.

They never bothered to improve on the car part, causing Teslas across the western world to fail inspections at staggering rates when the very basic car bits couldn't handle the torque of an EV.

Now the old manufacturers have caught up on the computer front, China is blowing past at a crazy rate, and Tesla is legitimately in trouble.

The very high-profile CEO cosplaying as an efficiency edgelord with the American president didn't help the company's image at all either.


It's just realism.

macOS is Unix under the hood, so the models can just use bash and CLI tools easily instead of dealing with WSL or PowerShell.

macOS has built-in sandboxing at a better level than Windows (afaik the Codex app is delayed for Windows due to sandboxing complexities).

Also, the vast majority of devs use MacBooks, unless they work for Microsoft or at a company where employees are locked to Windows for some reason (usually software-related).


> Also the vast majority of devs use MacBooks unless

It takes 15 seconds to verify this isn't even true in webdev, let alone everywhere else.


Codex runs in a stupidly tight sandbox and because of that it refuses to run anything.

But using the same model through pi, for example, it's super smart because pi just doesn't have ANY safeguards :D


I'll take this as my sign to give Pi a shot then :D Edit: I don't want to speak too soon, but this Pi thing is really growing on me so far… Thank you!

Wait until you figure out you can just say "create a skill to do..." and it'll just do it, write it in the right place and tell you to /reload

Or "create an extension to..." and it'll write the whole-ass extension and install it :D

