But 40 GB? I feel like that includes training data or something. A model can't just be 40 GB, or else all of the audio would have to be passed through all 40 GB of the model during inference. That seems huge.
40 GB is maybe too much, but some space is wasted when building the Docker images. The image size after building is around 20 GB (12 GB MaryTTS, 3 GB Kaldi de, 6 GB Kaldi en).
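
As a rough sanity check, the arithmetic can be sketched in a few lines of Python. The component sizes are the ones quoted above; the variable names are only illustrative, and treating the remainder of the 40 GB as build headroom is an assumption, not something measured.

```python
# Component sizes in GB, as quoted in the post above.
components_gb = {"marytts": 12, "kaldi de": 3, "kaldi en": 6}

# Total size of the built image according to those figures.
total_gb = sum(components_gb.values())

# The 40 GB figure under discussion; whatever is left over is assumed
# (not measured) to be headroom for space wasted during the build.
stated_gb = 40
headroom_gb = stated_gb - total_gb

print(f"built image: ~{total_gb} GB, headroom: ~{headroom_gb} GB")
```

So the built image accounts for roughly half of the 40 GB, and the rest would be slack for the build itself.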