
This was already done here as well: https://arxiv.org/abs/2507.04239




Sounds interesting, but...

> these models dominate both exponential attention and linear attention at long-context training

There is no such thing as exponential attention; standard attention scales quadratically with sequence length. Strange mistake.
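For anyone unfamiliar with the distinction: the quadratic cost comes from the n x n score matrix that standard attention materializes, while linear attention reassociates the matrix products to avoid it. A minimal NumPy sketch (arbitrary sizes, and using elu(x)+1 as one common choice of feature map, not necessarily the one from the paper):

```python
import numpy as np

n, d = 512, 64  # sequence length, head dimension (arbitrary values)
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Standard (softmax) attention: the Q @ K.T score matrix is n x n,
# so time and memory grow quadratically with sequence length.
scores = Q @ K.T / np.sqrt(d)                      # shape (n, n)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out_quadratic = weights @ V                        # shape (n, d)

# Linear attention: reassociating (phi(Q) @ phi(K).T) @ V as
# phi(Q) @ (phi(K).T @ V) keeps every intermediate at most
# n x d or d x d -- linear in n.
phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
KV = phi(K).T @ V                                  # shape (d, d)
Zn = phi(K).sum(axis=0)                            # shape (d,)
out_linear = (phi(Q) @ KV) / (phi(Q) @ Zn)[:, None]

assert out_quadratic.shape == out_linear.shape == (n, d)
```

The two outputs differ (the kernel only approximates softmax); the point is purely the asymptotic cost of the intermediates.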



