This is actually an insane amount of gaslighting from #LLM.
The #Whisper transcription model, after detecting another language or accent, automatically translates the user's speech into that language, even if the person is speaking English! Basically it outputs every transcription in the wrong language, but with the correct meaning.
I'm just trying to transcribe some subtitles for #38c3 talks with English audio and it's giving me German, with no way to turn this off because it's baked into the model.
https://community.openai.com/t/whisper-is-translating-my-audios-for-some-reason/86468
=> More information about this toot | More toots from h4sh@infosec.exchange
Can anyone recommend a local-only transcription software that doesn't use LLMs?
=> More information about this toot | More toots from h4sh@infosec.exchange
@h4sh You can use OpenAI's Whisper with their local models, so nothing leaves your device: https://github.com/openai/whisper
For Mac I can recommend https://goodsnooze.gumroad.com/l/macwhisper as an easy-to-use version of it.
If you want to avoid LLMs completely (even offline), I suppose you won't find comparable solutions.
=> More information about this toot | More toots from ron@chaos.social
@ron Yea, MacWhisper is where I encountered the bug; the OpenAI forum links are users hitting the same issue with Whisper. Using the Turbo model instead of the older "small" Whisper model fixed the issue.
Kind of sad that there aren't good non-LLM alternatives offline, but I guess transcription software has always been based on machine learning and transformers, so it makes sense that the latest ones are all LLMs.
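For anyone hitting the same thing, a minimal sketch of the fix described above, assuming the `openai-whisper` CLI is installed (`pip install openai-whisper`) and `talk.mp3` is a placeholder for your audio file:

```shell
# Pinning the language and task stops Whisper's auto-detection from
# switching the transcript into another language (German, in this case).
# "turbo" is the newer model that avoided the bug in this thread.
whisper talk.mp3 --model turbo --language en --task transcribe
```

The same options exist in the Python API (`model.transcribe("talk.mp3", language="en", task="transcribe")`) if you're scripting it rather than using the CLI.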
=> More information about this toot | More toots from h4sh@infosec.exchange
@h4sh Oh, I see. I thought the issue reports referred to the OpenAI API only. So my idea was that the older/smaller local models might be less „clever".
Good to know that using another model helps mitigate this issue.
=> More information about this toot | More toots from ron@chaos.social
@ron Yea.. I think the OpenAI API for Whisper just uses the same models they open-source online. Compared to ChatGPT, Whisper is a lot more special-purpose and lightweight (it takes less than 5% of my MacBook's battery and about 10 minutes to transcribe an hour of audio).
Whereas the new o1 and o3 models would probably burn up a third of my battery for a single query (if they even ran locally).
=> More information about this toot | More toots from h4sh@infosec.exchange