cmrdporcupine's comments | Hacker News

I know we're not supposed to complain about comment quality, but -- I came here looking for interesting technical analysis, and instead it's Slashdot-level snipes about Microsoft the company. And yes, I also dislike Windows and Microsoft generally, but this looks like a very interesting project and I'm frankly frustrated at the level of discussion here; it's juvenile. This has nothing to do with Windows, and it looks like most people didn't even read past the title.

I'll play with this later today after work and see how mature it is and hopefully have something concrete and constructive to say. Hopefully others will, too.


I am with you on that. HN is becoming a "14-year-old edgy mini-tech" Facebook.

"Microsoft bad, Linux good" kind of comments are all over the place. There is no more in depth discussions about projects anymore. Add the people linking their blogs only to sell you thier services for an imaginary problem, and you get HN 2026.

Maybe it's time to find another tech forum. If you know of one, I would be glad to hear about it.


I never saw this movie back in the day, but now I want to.

Just listening to Halcyon & On & On is putting a lump in my chest. That era was just so fantastic, and I don't think it's just because I was 21 and utopian.

I think I could perma-stay in 1995/96, Groundhog Day style. Just relive those same "halcyon" days over and over, perfecting and absorbing everything.

"We have to go back!"


Honestly, it's about creating guardrails, and forcing the machine to stay within them.

You create the pattern. You describe the constraints. And then make it do the gruntwork.

If it's starting from nothing and you let it be "creative", you will hate the results.

It's just a tool like any other. Hold the power drill firmly in hand and make sure you have your safety goggles on when playing with the band saw.


I tried the FP8 in vLLM on my Spark, and although it fit in memory, the machine started swapping once I actually tried to run any queries, and, yeah, I couldn't use a context larger than 8k.

I figured out later that this is because vLLM apparently de-quantizes to BF16 at runtime, so it's pointless to run the FP8?

I get about 30-35 tok/sec using llama.cpp and a 4-bit quant. And a 200k+ context, using only 50GB of RAM.
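For anyone wanting to reproduce this, the invocation is roughly the following shape -- the model filename is a placeholder and -ngl 999 ("offload everything") is an assumption; the quant, context size, and the --no-mmap / --fa flags are the parts I actually set:

    # placeholder filename for the unsloth UD-Q4_K_XL 4-bit quant
    # -c sets the ~200k context; -ngl 999 offloads all layers into the unified memory
    llama-server -m ./model-UD-Q4_K_XL.gguf -c 200000 -ngl 999 --no-mmap --fa on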


Running llama.cpp rather than vLLM, it's happy enough to run the FP8 variant with a 200k+ context using about 90GB of VRAM.

Yeah, what did you get for tok/sec there, though? Memory bandwidth is the limitation with these devices. With 4-bit I didn't get over 35-39 tok/sec, and averaged more like 30 when doing actual tool use with opencode. I can't imagine FP8 being faster.
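For rough intuition: decode speed is bounded by memory bandwidth divided by the bytes of weights read per generated token. Taking the Spark's ~273 GB/s and assuming, purely for illustration, ~10B active parameters at ~4.5 bits/weight:

    273 GB/s / (10e9 params x ~0.56 bytes/param) ~= ~48 tok/sec ceiling

Real-world decode gets a fraction of that ceiling, so 30-35 tok/sec at 4-bit is about what you'd expect, and FP8, at roughly twice the bytes per weight, would only be slower.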

Nobody has been saying they'd be dethroned. We're saying they're often "good enough" for many use cases, and that they're doing a good job of stopping the Big Guys from creating a giant expensive moat around their businesses.

Chinese labs are acting as a disruption against Altman et al.'s attempt to create big-tech monopolies, and that's why some of us cheer for them.


"Nobody says X" is as presumptuous and wrong (both metaphorically and literally) as "LLMs can't do X". It is one of the worst thought terminating cliches.

Thousands have been saying this, you aren't paying attention.


As thought-terminating as "HN Thought [insert strawman here]"

C'mon.


It feels more like Haiku level than Sonnet 4.5 from my playing with it.

This is going to be a crazy month because the Chinese labs are all trying to get their releases out prior to their holidays (Lunar New Year / Spring Festival).

So we've seen a series of big ones already -- GLM 4.7 Flash, Kimi 2.5, StepFun 3.5, and now this. Still to come is likely a new DeepSeek model, which could be exciting.

And then I expect the Big 3 (OpenAI/Google/Anthropic) will try to clog the airspace at the same time, to get in front of the potential competition.


And it's worth pointing out that Claude Code now dispatches "subagents" from Opus->Sonnet and Opus->Haiku ... all the time, depending on the problem.

Running this thing locally on my Spark with a 4-bit quant, I'm getting 30-35 tokens/sec in opencode, but it doesn't feel any "stupider" than Haiku, that's for sure. Haiku can be dumb as a post. This thing is smarter than that.

It feels somewhere around Sonnet 4 level, and I am finding it genuinely useful at 4-bit even. Though I have paid subscriptions elsewhere, so I doubt I'll actually use it much.

I could see configuring OpenCode somehow to use paid Kimi 2.5 or Gemini for the planning/analysis & compaction, and this for the task execution. It seems entirely competent.
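Something like this shape in opencode.json is what I have in mind -- purely a sketch from memory, I haven't verified the current key names, and the local provider/model IDs here are made up:

    {
      "agent": {
        "plan":  { "model": "google/gemini-2.5-pro" },
        "build": { "model": "local-llama/this-model" }
      }
    }

i.e. point the plan agent at a hosted frontier model and let the build agent hammer on the local one.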


I'm getting similar numbers on the NVIDIA Spark: around 25-30 tokens/sec output, 251 tokens/sec prompt processing... but I'm running with the Q4_K_XL quant. I'll try the Q8 next, but that would leave less room for context.

I tried FP8 in vLLM and it used 110GB and then my machine started to swap when I hit it with a query. Only room for 16k context.
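If anyone wants to poke at the same setup, the relevant vLLM knobs are --max-model-len and --gpu-memory-utilization, something like this (model name elided, numbers illustrative):

    vllm serve <model> --max-model-len 16384 --gpu-memory-utilization 0.90

Capping --max-model-len limits how much KV cache vLLM tries to reserve, which seems to be what was pushing the machine into swap here.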

I suspect there will be some optimizations over the next few weeks that will pick up the performance on this type of machine.

I have it writing some Rust code and it's definitely slower than using a hosted model, but it actually seems pretty competent. These are the first results I've had from a locally hosted model that I could see myself actually using, though only once the speed picks up a bit.

I suspect the API providers will offer this model for nice and cheap, too.


llama.cpp is giving me ~35 tok/sec with the unsloth quants (UD-Q4_K_XL, elsewhere in this thread) on my Spark. FWIW, my understanding and experience is that llama.cpp gives slightly better performance for "single user" workloads, but I'm not sure why.

I'm asking it to do some analysis/explain some Rust code in a rather large open source project and it's working nicely. I agree this is a model I could possibly, maybe use locally...


Yeah, I got 35-39 tok/sec for one-shot prompts, but for real-world, longer-context interactions through opencode it seems to average out to 20-30 tok/sec. I tried both MXFP4 and Q4_K_XL; no big difference, unfortunately.

The --no-mmap and --fa on options seemed to help, but not dramatically.

As with everything Spark, memory bandwidth is the limitation.

I'd like to be impressed with 30 tok/sec, but it's sort of a "leave it overnight and come back to the results" kind of experience; it wouldn't replace my normal agent use.

However I suspect in a few days/weeks DeepInfra.com and others will have this model (maybe Groq, too?), and will serve it faster and for fairly cheap.


I almost wonder if this kind of thing will be an impetus for GNU Hurd to get more momentum. I saw an update recently that they're now finally properly supporting 64-bit, and it sounds like there's active development going on there again.

It apparently uses SysVinit.


Others have been reminding us of the *BSD init systems, and I'll note that SysVinit is not going away from Linux while projects like Devuan and others continue. GNU Hurd is another other-than-systemd learning opportunity.

I would somewhat doubt it; the negative aspects of Mach’s design are a technical albatross around the neck of any kernel.

Apple has had to invest reams of engineering effort in mitigating Mach’s performance and security issues in XNU; systemd dissatisfaction alone seems unlikely to shift the needle towards Hurd.


I've heard of Hurd, but never felt tempted to try it. That could be an interesting option.

Hurd's init is a lot like systemd architecturally; it just gets to use kernel-provided IPC rather than having to manage its own. If your objection to systemd is its architecture, you don't want anything to do with Hurd.

Did they finally add USB support?
