
> On throughput-focused benchmarks, Tokasaurus can outperform vLLM and SGLang by up to 3x+.

Looks like they don't compare against TensorRT-LLM throughput numbers, which, last I checked, were SOTA among open-source engines.



Calling TensorRT-LLM open source is a stretch: all the important kernels are loaded from precompiled cubins.


Yeah, you're right (although they've started to open-source some of that recently, IIRC). I meant SOTA among inference engines we can actually download and build ourselves.


It also appears that the headline number came from a sampling benchmark, which isn't representative of typical workloads.

On the generation benchmark it was only about 5% faster than SGLang.



