
They don't claim to support Polish, but they do support Russian.

> The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.

I wonder how much having languages with shared roots (e.g. the Romance languages in the list above, or multiple Slavic languages) affects the parameter count and the training set. Do you need more training data to differentiate between similar languages? How would swapping, for example, Hindi (fairly distinct from the other 12 supported languages) for Ukrainian and Polish (both of which share roots with Russian) affect the parameter count?





Nobody ever supports Polish. It's the worst. They'll support, like, Swahili, but not Polish.

edit: I stand corrected lol. I'll go with "Gaelic" instead.


Swahili is a subcontinental lingua franca spoken by 200M people and growing quickly. Polish is spoken by a shrinking population in one country where English is understood anyways.

> where English is understood anyways.

It's popular, but not that popular: you couldn't assume a random person over 30 on the street would be able to hold a conversation in it.


200 million people speak Swahili.

39 million people speak Polish, and most of those also speak English or another more common language.


You could say the same about Dutch, to be fair. 90-95% of Dutch people speak English, which I bet is way higher than in Poland.

As an American, my perspective is that Dutch people speak better English than a large percentage of English people and Americans.

As a Dutch person, I'm very doubtful that's the case, but I'm willing to bet a good ESL speaker is more aware of common grammatical errors than some native speakers. For example, the your/you're mixup makes no sense if you've had to explicitly learn about English contractions in the first place.

Heh, based on my anecdotal and probably wrong experience, Dutch people and Swedes are the best non-native English speakers in terms of both accent and fluency.

Those, and Icelandic people. But there's a fun correlation: look at how much US media content is played compared to local content in each country, and at which countries use subtitles rather than dubbing or voiceovers in cinemas and on TV. https://publications.europa.eu/resource/cellar/e4d5cbf4-a839...

If you're exposed to English media from a young age and don't get a translation, you learn pretty quickly.


Just a side note: remember that this is a mini model. It's very small and yet supports 13 languages.

I guess a European version could be created, but for now it's aimed at worldwide distribution.


I guess I'll check Korean. OpenAI's audio mini model is not bad, but I always have to have GPT check and fix the transcription.


