
What are hardware requirements to run this?

I see the mixture-of-experts model is ~300 GB and was trained on 256 GPUs.

I assume distilled versions can easily be run on one GPU.



We release several smaller models as well, at 1.3B and 615M parameters: https://github.com/facebookresearch/fairseq/tree/nllb/exampl... These are usable on smaller GPUs. To create these smaller models while retaining good performance, we use knowledge distillation. If you're curious to learn more, we describe the process and results in Section 8.6 of our paper: https://research.facebook.com/publications/no-language-left-...
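
For anyone wanting to try the distilled checkpoints outside fairseq: they were also published on Hugging Face, and a minimal sketch with the transformers library looks like the snippet below. The model ID (facebook/nllb-200-distilled-600M) and the FLORES-200 language codes are assumptions based on that public release, not details from this thread.

    # Minimal sketch: translate English -> French with a distilled NLLB
    # checkpoint via Hugging Face transformers. The model ID and the
    # FLORES-200 language codes ("eng_Latn", "fra_Latn") are assumptions
    # based on the public Hugging Face release, not this thread.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-600M", src_lang="eng_Latn"
    )
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "facebook/nllb-200-distilled-600M"
    )

    inputs = tokenizer("No language left behind.", return_tensors="pt")
    # Force the decoder to start with the target-language token.
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
        max_length=64,
    )
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])

Back-of-envelope: 615M parameters at fp16 is ~1.2 GB of weights (the 1.3B model is ~2.6 GB), so both fit comfortably on a single consumer GPU.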


"All models are licensed under CC-BY-NC 4.0" :

So, to clarify, does this mean that companies cannot use these models in the course of business, or is it more about selling the translation results directly?



