
The analogy I like most is that they've found the "shape" of languages in high dimensions: if you rotate the shape for English the right way, you get an unreasonably good fit for the shape of Spanish, and likewise for all the other languages.
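The "rotate the shape" idea matches a known technique: learning an orthogonal map between two word-embedding spaces (the orthogonal Procrustes problem). Below is a toy sketch, not how any particular system does it. It assumes synthetic random vectors stand in for real word embeddings, and it assumes we already know which rows are translations of each other. Unsupervised methods have to discover that pairing, which this sketch skips. The helper name `procrustes_rotation` is hypothetical.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y|| (Frobenius norm).

    X, Y: (n, d) arrays whose i-th rows are a word and its assumed translation.
    """
    # The SVD of the cross-covariance X^T Y gives the optimal rotation: W = U @ Vt.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
n, d = 200, 50

# Hypothetical "English" embeddings: random points, not real word vectors.
english = rng.normal(size=(n, d))

# Hypothetical "Spanish" embeddings: the same shape under an unknown rotation, plus small noise.
true_rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
spanish = english @ true_rotation + 0.01 * rng.normal(size=(n, d))

# Recover the rotation and map the English points onto the Spanish space.
W = procrustes_rotation(english, spanish)
aligned = english @ W

# Nearest-neighbour "translation": for each aligned English row, pick the closest Spanish row.
dists = ((aligned[:, None, :] - spanish[None, :, :]) ** 2).sum(axis=2)
predicted = dists.argmin(axis=1)
accuracy = (predicted == np.arange(n)).mean()
print(f"nearest-neighbour accuracy: {accuracy:.2f}")
```

The orthogonality constraint is what makes this a rotation (or reflection). It preserves distances and angles within the English cloud, so the alignment works only to the extent that the two shapes really are congruent. Real embedding spaces are only approximately so.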

We're at a point where it's now possible to determine the shape of every language, provided there are enough speakers of the language left who are both able and willing to help.

<Snark> Once done, Facebook can then commodify their dissent, and sell it back to them in their native language. </Snark>



Anyone who knows or is learning another language can easily tell you that the "warping" methodology of machine translation (MTL) is insufficient. There's a really good video by Tom Scott [1] about this, but the short version is that critical bits of language live in context and are inferred by speakers. Any accurate MTL needs nearly full context, both on the page and in the cultural moment, and probably also needs to ask the author questions.

[1]: https://www.youtube.com/watch?v=GAgp7nXdkLU


So, if I had a corpus of all the literature from 1800-1850 digitized, the context would be sufficiently different as to be a new language?

It seems to me that the happy accident of doing this research at the start of getting all human knowledge digitized is part of the unreasonable effectiveness of this overall technique.

Had it happened in 200 years, it might not have worked, right?


Darmok and Jalad at Tanagra.


The shape analogy doesn't really apply to modern language models. Each word gets its own context-dependent, high-dimensional point. With everything context-dependent, simple transformations like rotations are impossible. A more accurate picture is that any concept expressible in language now has its own high-dimensional representation, which can then be decoded into any other language.



