
A 3090 (or any GPU with >=20GB VRAM) can run StarCoder with int8 quantization at about 12 tokens per second, or about 33 tokens per second with assisted generation, which will be available for StarCoder in the coming days.

When 4-bit quantization comes out, I would expect a GPU with 12GB VRAM to be able to run it.
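
As a rough sanity check on those VRAM figures, here is a back-of-the-envelope sketch of weight-only memory at different precisions. It assumes StarCoder's published size of roughly 15.5B parameters (not stated above), and the helper name `weight_memory_gb` is made up for illustration. It ignores activations, the KV cache, and framework overhead, which is why the practical VRAM requirement is higher than the raw weight size.

```python
# Weight-only memory estimate. Real usage is higher: activations,
# KV cache, and framework overhead are not counted here.

# Assumption: StarCoder has roughly 15.5 billion parameters.
STARCODER_PARAMS = 15.5e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(STARCODER_PARAMS, bits)
        print(f"{bits}-bit weights: ~{gb:.2f} GB")
```

Under that assumption, int8 weights come to about 15.5 GB and 4-bit weights to about 7.75 GB, which is consistent with the >=20GB and 12GB estimates once some headroom is left for everything else.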

Disclaimer: I work at Hugging Face


