
Yeah, I'm using the default training script with int8 quantisation. It uses PEFT with LoRA, but this still requires 26 GB.
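For reference, a minimal sketch of what an int8 + LoRA setup usually looks like with transformers/peft; the checkpoint name, rank, and target modules here are placeholders, not the actual defaults of that script:

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Hypothetical checkpoint; the real script's model and hyperparameters will differ.
    model = AutoModelForCausalLM.from_pretrained(
        "some/causal-lm",
        load_in_8bit=True,   # int8 weights via bitsandbytes
        device_map="auto",
    )
    # Casts norms/lm_head to fp32 and enables gradient checkpointing by default.
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # typical attention projections; varies by architecture
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the LoRA adapters are trainable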


I'm not sure about this model specifically, but training with 4-bit quantization has been a thing with LLaMA for a while now, although the setup involves manual hacks of various libraries.
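These days the 4-bit path is integrated into transformers via bitsandbytes, so at least the loading side no longer needs library hacks. A rough sketch of that integrated route (checkpoint path is a placeholder); you'd still pair it with a LoRA adapter as above, since the quantised base weights stay frozen:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # QLoRA-style NormalFloat4
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )

    # Placeholder checkpoint path.
    model = AutoModelForCausalLM.from_pretrained(
        "some/llama-checkpoint",
        quantization_config=bnb_config,
        device_map="auto",
    )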


Is it possible to offload some layers to CPU and still train in a reasonable amount of time?
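If it helps, the usual way to try that is a device_map plus per-device memory caps so accelerate spills the overflow layers to CPU RAM; a sketch below, with made-up memory numbers. The catch is that offloaded weights get shuttled to the GPU on every forward pass, so training throughput drops a lot.

    from transformers import AutoModelForCausalLM

    # Placeholder checkpoint; the memory caps are illustrative only.
    model = AutoModelForCausalLM.from_pretrained(
        "some/causal-lm",
        load_in_8bit=True,
        device_map="auto",                        # let accelerate place layers across devices
        max_memory={0: "16GiB", "cpu": "48GiB"},  # whatever doesn't fit on GPU 0 goes to CPU RAM
    )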


There's also that pruning tool that was on HN in the last couple of weeks. It seemed to work really well on the larger models and could reduce size by 30-50%.
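Not the same tool, but for anyone curious about the basic idea, unstructured magnitude pruning is built into PyTorch. A purely illustrative sketch; note the zeros only actually shrink memory if you use sparse storage or structured pruning:

    import torch
    import torch.nn.utils.prune as prune

    # Generic magnitude pruning on a single linear layer, not the tool from the HN post.
    layer = torch.nn.Linear(4096, 4096)
    prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero the 50% smallest-magnitude weights
    prune.remove(layer, "weight")                            # bake the mask into the weight tensor
    print(f"sparsity: {(layer.weight == 0).float().mean():.2%}")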



