
> On throughput-focused benchmarks, Tokasaurus can outperform vLLM and SGLang by up to 3x+.

Looks like they don't compare against TensorRT-LLM throughput numbers, which, last I checked, were SOTA among open-source engines.



Calling TensorRT-LLM open source is a stretch: all the important kernels are loaded from precompiled cubins.


Yeah, you're right (although they've started to open-source some of that recently, IIRC). I meant SOTA among inference engines we can actually download and build ourselves.


It also appears that the headline number came from a sampling benchmark, which isn't representative of typical workloads.

On the generation benchmark it was only about 5% faster than SGLang.



