A 3090 (or any GPU with at least 20GB of VRAM) can run StarCoder with int8 quantization at about 12 tokens per second, or about 33 tokens per second with assisted generation, which should be available for StarCoder in the coming days.
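For anyone who wants to try this, here is a rough sketch of what int8 loading and assisted generation look like with the `transformers` library. It assumes a CUDA GPU, the `transformers`, `accelerate` and `bitsandbytes` packages, access to the gated `bigcode/starcoder` checkpoint, and a `transformers` release in which assisted generation works for this model. The assistant checkpoint name and the helper functions `load_int8` and `complete` are illustrative choices of mine, not an official recipe.

```python
# Sketch only: needs a CUDA GPU plus transformers, accelerate and bitsandbytes.
MODEL_ID = "bigcode/starcoder"  # gated checkpoint on the Hugging Face Hub
# Assumption: a small draft model that shares StarCoder's tokenizer.
ASSISTANT_ID = "bigcode/tiny_starcoder_py"


def load_int8(model_id: str = MODEL_ID):
    """Load the tokenizer and an int8-quantized copy of the model."""
    # Imports are deferred so this file can be imported on a machine without a GPU.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",  # let accelerate place the weights on the GPU
    )
    return tokenizer, model


def complete(prompt: str, max_new_tokens: int = 64, assisted: bool = False) -> str:
    """Generate a completion, optionally using a small assistant (draft) model."""
    from transformers import AutoModelForCausalLM

    tokenizer, model = load_int8()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    extra = {}
    if assisted:
        # The assistant proposes candidate tokens that the large model verifies.
        assistant = AutoModelForCausalLM.from_pretrained(ASSISTANT_ID).to(model.device)
        extra["assistant_model"] = assistant

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, **extra)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("def fibonacci(n):", assisted=True)` would exercise the assisted path. Throughput will depend on the GPU, the prompt, and how well the assistant's guesses match the large model.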
When 4-bit quantization comes out, I would expect a GPU with 12GB VRAM to be able to run it.
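The VRAM figures follow roughly from the size of the weights. As a back-of-the-envelope sketch, taking StarCoder's parameter count as about 15.5 billion and counting only the weights (activations, the KV cache and quantization overhead come on top, which is why some headroom is needed):

```python
PARAMS = 15.5e9  # StarCoder's approximate parameter count


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gb(PARAMS, bits):.2f} GB")
```

This gives about 31 GB at 16-bit, 15.5 GB at int8 and 7.75 GB at 4-bit, which is consistent with int8 fitting on a 24GB card and 4-bit plausibly fitting in 12GB.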
Disclaimer: I work at Hugging Face.