I'm hoping torch.compile is a gateway to "easy" non-Nvidia accelerator support in PyTorch.

Also, I have been using torch.compile for the Stable Diffusion unet/vae since February, to good effect. I'm guessing similar optimizations will pop up for LLaMA.



Is there somewhere I can see your Stable Diffusion + torch.compile code? I'm interested in how you integrated it.


In `diffusers` implementations (like InvokeAI) it's pretty easy: https://github.com/huggingface/diffusers/blob/42beaf1d23b5cc...

But I also compile the VAE and some other modules; I'll reply again later when I can look at my local code. Some modules (like face restoration or the scheduler) still don't like torch.compile.
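
The basic pattern looks something like this (from memory, with a placeholder model id; the actual diff is in the link above, and compiling the VAE's decode method is my own habit, not necessarily what InvokeAI does):

    import torch
    from diffusers import StableDiffusionPipeline

    # Placeholder model id; any diffusers SD pipeline works the same way.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Compile the heavy modules. The first call pays the (slow) compilation
    # cost; later calls with the same shapes reuse the compiled graph.
    pipe.unet = torch.compile(pipe.unet)
    pipe.vae.decode = torch.compile(pipe.vae.decode)

    image = pipe("a photo of an astronaut riding a horse").images[0]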

For the Automatic1111 repo (and presumably other original Stability AI implementations), I just add `m.model = torch.compile(m.model)` here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob...
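
torch.compile hands back a drop-in wrapper around the module, which is why the one-line patch is enough and nothing else in the webui has to change. Roughly (with a toy stand-in model, not the webui's actual one):

    import torch
    import torch.nn as nn

    # Toy stand-in for the loaded diffusion model, just to show the pattern.
    model = nn.Sequential(
        nn.Conv2d(4, 8, 3, padding=1),
        nn.SiLU(),
        nn.Conv2d(8, 4, 3, padding=1),
    )

    model = torch.compile(model)            # same shape as the `m.model = torch.compile(m.model)` patch
    out = model(torch.randn(1, 4, 64, 64))  # first call triggers compilation, then it's cached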

I tried changing the options in the config dict one by one, but TBH nothing seems to make a significant difference beyond the default settings in my benchmarks.
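
These are the kinds of knobs I mean (the standard torch.compile arguments, not the exact keys in the webui config, so treat this as illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 16)

    # None of these beat the defaults by much in my benchmarks.
    compiled = torch.compile(
        model,
        mode="max-autotune",  # also "reduce-overhead" or the default
        fullgraph=False,      # allow graph breaks instead of erroring on them
        dynamic=False,        # assume static input shapes
        backend="inductor",   # the default backend
    )
    out = compiled(torch.randn(2, 16))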

I haven't messed with compiling LoRA training yet, as I don't train much and it is sufficiently fast, but I'm sure it could be done.


Here is the InvokeAI code, minus the codeformer/gfpgan changes that don't work yet:

https://gist.github.com/brucethemoose/ea64f498b0aa51adcc88f5...

I intend to start some issues for this on the repo soon(TM).


Could you give a bit more details about this? Do you have a link?


See the above reply ^



