Working and communicating with offshored teams is a specific skill too.
There are tips and tricks to managing them, and not knowing them will bite you later on. A basic one: never ask yes-or-no questions, because in some cultures saying "no" isn't done. People would rather default to "yes" and effectively lie than admit failure.
Of course, but the agent can't run a code block in a readme.
It _can_ run a PEP 723 script without any specific setup (as long as uv and Python are installed). It will automatically create a virtual environment AND install all dependencies, all with a single command, without polluting the context with tons of setup.
More like, open local models are becoming "good enough".
I got stuff done with Sonnet 3.7 just fine; it needed a bunch of babysitting, but it was still a net positive for productivity. Local models are now at that level, and closing in on the current SOTA.
When "anyone" can run an Opus 4.5 level model at home, we're going to be getting diminishing returns from closed online-only models.
I'm just riding the VC powered wave of way-too-cheap online AI services and building tools and scaffolding to prepare for the eventual switch to local models =)
You can configure aider that way. You get three, in fact: an architect model, a code editor model, and a quick model for things like commit messages. Although I'm not sure if it's got doc searching capabilities.
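If I recall aider's docs correctly, the three-model setup maps to `--architect`, `--model`, `--editor-model`, and `--weak-model`, or the matching keys in a `.aider.conf.yml`. A sketch, with placeholder model names:

```yaml
# .aider.conf.yml -- sketch only; the model names are placeholders
architect: true                  # plan with one model, edit with another
model: provider/big-reasoner     # the architect / main model
editor-model: provider/coder     # applies the actual code edits
weak-model: provider/cheap-fast  # commit messages, summaries, etc.
```

The split lets you pay for a strong reasoning model only on the planning step while a cheaper model does the mechanical edits.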
That's what Meta thought initially too, training CodeLlama and chat Llama separately, and then they realized they'd been idiots: adding the other half of the data vastly improves both models. As long as it's quality data, more of it does no harm.
Besides, programming is far more than autocompleting syntax; you need a model that's proficient in the field the automation is placed in, otherwise it'll be no help in actually automating anything.
But as far as I know, that was way before tool calling was a thing.
I'm more bullish about small and medium-sized models plus efficient tool calling than I am about LLMs too large to run at home without $20k of hardware.
The model doesn't need to have the full knowledge of everything built into it when it has the toolset to fetch, cache and read any information available.
Tesla was a decent car with a very good computer in it.
They never bothered to improve the car part, and now Teslas across the Western world fail inspections at staggering rates because the basic mechanical bits couldn't handle the torque of an EV.
Now the old manufacturers have caught up on the computer front, China is blowing past everyone at a crazy rate, and Tesla is legitimately in trouble.
The very high-profile CEO cosplaying as an efficiency edgelord alongside the American president didn't help the company's image either.
macOS is Unix under the hood, so the models can just use bash and CLI tools easily instead of dealing with WSL or PowerShell.
macOS has better built-in sandboxing than Windows (afaik the Codex app is delayed on Windows due to sandboxing complexities).
Also, the vast majority of devs use MacBooks unless they work for Microsoft or at a company where most employees are locked to Windows for some reason (usually software related).