Do we know if this is better than Nvidia Parakeet V3? That has been my go-to mod...

m1el · 2026-02-04T18:29:05 1770229745

I've been using nemotron ASR with my own ported inference, and happy about it:

https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...

https://github.com/m1el/nemotron-asr.cpp https://huggingface.co/m1el/nemotron-speech-streaming-0.6B-g...

Multicomp · 2026-02-04T19:19:59 1770232799

I'm so amazed to find out just how close we are to the start trek voice computer.

I used to use Dragon Dictation to draft my first novel, had to learn a 'language' to tell the rudimentary engine how to recognize my speech.

And then I discovered [1] and have been using it for some basic speech recognition, amazed at what a local model can do.

But it can't transcribe any text until I finish recording a file, and then it starts work, so very slow batches in terms of feedback latency cycles.

And now you've posted this cool solution which streams audio chunks to a model in infinite small pieces, amazing, just amazing.

Now if only I can figure out how to contribute to Handy or similar to do that Speech To Text in a streaming mode, STT locally will be a solved problem for me.

[1] https://github.com/cjpais/Handy

m1el · 2026-02-04T21:32:49 1770240769

you should check out

https://github.com/pipecat-ai/nemotron-january-2026/

discovered through this twitter post:

https://x.com/kwindla/status/2008601717987045382

kwindla · 2026-02-04T22:00:06 1770242406

Happy to answer questions about this (or work with people on further optimizing the open source inference code here). NVIDIA has more inference tooling coming, but it's also fun to hack on the PyTorch/etc stuff they've released so far.

pstroqaty · 2026-02-05T07:35:27 1770276927

Thank you for sharing! Does your implementation allow running the Nemotron model on Vulkan? Like whisper.cpp? I'm curious to try other models, but I don't have Nvidia, so my choices are limited.

d4rkp4ttern · 2026-02-05T02:44:08 1770259448

I’m curious about this too. On my M1 Max MacBook I use the Handy app on macOS with Parakeet V3 and I get near instant transcription, accuracy slightly less than slower Whisper models, but that drop is immaterial when talking to CLI coding agents, which is where I find the most use for this.

https://github.com/cjpais/Handy

tylergetsay · 2026-02-04T17:01:15 1770224475

I've been using Parakeet V3 locally and totally ancedotaly this feels more accurate but slightly slower

czottmann · 2026-02-04T17:54:16 1770227656

I liked Parakeet v3 a lot until it started to drop whole sentences, willy-nilly.

cypherpunks01 · 2026-02-04T22:29:14 1770244154

Yeah, I think the multilingual improvements in V3 caused some kind of regression for English - I've noticed large blocks occasionally dropped as well, so reverted to v2 for my usage. Specifically nvidia/parakeet-tdt-0.6b-v2 vs nvidia/parakeet-tdt-0.6b-v3

d4rkp4ttern · 2026-02-05T02:46:28 1770259588

I didn’t see that but I do get a lot of stutters (words or syllables repeated 5+ times), not sure if it’s a model problem or post processing issue in the Handy app.

WXLCKNO · 2026-02-05T00:56:54 1770253014

Oh god am I glad to read this. Thought it was my microphone or something.

moffkalast · 2026-02-04T22:34:22 1770244462

Parakeet is really good imo too, and it's just 0.6B so it can actually run on edge devices. 4B is massive, I don't see Voxtral running realtime on an Orin or fitting on a Hailo. An Orin Nano probably can't even load it at BF16.

whinvik · 2026-02-04T18:01:27 1770228087

Came here to ask the same question!