Hacker News

These benchmarks are on Ampere, where FA3 has no performance benefits over FA2.

On Hopper, FlexAttention currently reaches about 80% of FlashAttention3's performance (about 500 TFLOPs peak).
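For scale, a rough back-of-the-envelope check of those numbers, assuming the ~500 TFLOPs figure refers to FlexAttention's peak (not FA3's):

```python
# Sketch only: both numbers come from the comment above; the
# assumption that 500 TFLOPs is FlexAttention's figure is mine.
flex_tflops = 500   # FlexAttention peak on Hopper (per the comment)
ratio = 0.80        # FlexAttention / FlashAttention3 throughput ratio

implied_fa3_tflops = flex_tflops / ratio
print(f"Implied FlashAttention3 peak: {implied_fa3_tflops:.0f} TFLOPs")
# → Implied FlashAttention3 peak: 625 TFLOPs
```

So the 80% figure would put FlashAttention3 somewhere around 600+ TFLOPs on the same hardware.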



Not bad.



