Have you run the model in full FP16? It is possible a lot of performance is lost...

tarruda on Sept 5, 2024 | on: Yi-Coder: A Small but Mighty LLM for Code

Have you run the model in full FP16? It is possible a lot of performance is lost when running quantized versions.