
The thing I'm looking forward to most is having Flash Attention built in. Right now you have to use xformers or similar, but that dependency has been a nightmare: it breaks, it requires specific concoctions of dependencies or conda will barf, and it's impossible to pin because I have to use -dev releases, which they constantly drop from the repositories.

PyTorch 2.0 comes with a few different efficient transformer implementations built in. Unlike 1.13, they work during training and don't require specific configurations, and they seemed to work just fine in my pre-release testing. Having them built into PyTorch might also mean more pressure to keep them optimized; as-is, xformers targets the A100 primarily, with other archs as an afterthought.
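For anyone curious what "built-in" looks like in practice, here is a minimal sketch (not the commenter's code) using PyTorch 2.0's `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a fused Flash-style or memory-efficient kernel when one is available and falls back to the plain math implementation otherwise; the tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# One call replaces the usual softmax(q @ k.T / sqrt(d)) @ v pipeline;
# PyTorch picks the fastest available backend for the hardware.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```

No xformers install is needed for this path, which is what removes the dependency-pinning pain described above.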

And, as promised, `torch.compile` worked out of the box, providing (IIRC) a nice ~20% speedup on a ViT without any other tuning.

I did have to do some dependency fiddling on the pre-release version. Been looking forward to the "stable" release before using it more extensively.

Anyone else seeing nice boosts from `torch.compile`?



I really wish compiling cuda extensions worked better out of the box. Is there a reason they can't bundle nvcc alongside pytorch outside of complexity/expense?


Legal reasons.

Filesize.

Platform compatibility.


Interesting, I had not considered these points besides file size! Do you think they could be overcome, or is the chance zero?


I work on xFormers and we definitely appreciate the candid feedback:

- We partnered with our PyTorch colleagues, and some of the PyTorch 2.0 kernels for efficient attention actually originated from xFormers, so I'm glad to read that having this built into PyTorch is something users are eager for.

- While xFormers originally targeted a pure researcher audience, we were aware of the installation problems: late last year we started gradually making the library easier to set up and use (both internally and externally). We have recently introduced non-dev conda packages and pip wheels, and are also trying to release more often.

- We very much welcome hearing about any issue with the library and would love to discuss the specifics of your experience (or that of others reading this) if you have time, maybe via our GitHub to start with. Thanks again for the feedback here!


What size of ViT? I’ve tried it with both a UNet and an LM and didn’t see any benefit with the default args (and got a CUDA error after 30 minutes of processing when trying to compile an AR generation routine with all optimizations turned on).


B/16



