Ancestors

Toot

Written by hok@lemmy.dbzer0.com on 2024-12-07 at 14:39

Fish Speech 1.5, an open source voice cloning TTS that's actually good

https://lemmy.dbzer0.com/post/32923869

Descendants

Written by PriorityMotif@lemmy.world on 2024-12-07 at 14:56

For a minute I thought there were actually recordings of fish noises from underwater and that someone had put them into TTS.

Written by Hule@lemmy.world on 2024-12-08 at 06:48

But their logo is a whale!

Written by PerogiBoi@lemmy.ca on 2024-12-07 at 15:54

How do you run this locally? What program does one use? I know you can take LLMs and throw them into Ollama or GPT4All. What about this?

Written by hok@lemmy.dbzer0.com on 2024-12-07 at 17:07

I followed their instructions here: speech.fish.audio

I am using the API server to do inference: speech.fish.audio/inference/#http-api-inference
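
A minimal sketch of calling that HTTP API from Python using only the standard library, assuming the server was started locally and listens on 127.0.0.1:8080. The `/v1/tts` path and the `text`/`format` JSON fields are assumptions that may differ between versions; check the request schema on the server's own docs page before relying on them. `build_tts_request` and `synthesize` are hypothetical helper names, not part of Fish Speech.

```python
import json
import urllib.request

# Assumption: local server on 127.0.0.1:8080; the path is hypothetical,
# verify it against the inference docs for your version.
DEFAULT_URL = "http://127.0.0.1:8080/v1/tts"


def build_tts_request(text: str, url: str = DEFAULT_URL,
                      audio_format: str = "wav") -> urllib.request.Request:
    """Build a JSON POST request asking the server to synthesize `text`.

    The field names "text" and "format" are assumptions about the schema.
    """
    payload = {"text": text, "format": audio_format}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, url: str = DEFAULT_URL) -> None:
    """Send the request and write the returned audio bytes to `out_path`."""
    request = build_tts_request(text, url)
    # Generation can be slow on CPU, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, `synthesize("Hello there", "hello.wav")` would save the response to disk, provided the server replies with raw audio bytes.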

I don’t know about other ways. To be clear, this is not (necessarily) an LLM; it’s just for speech synthesis, so you don’t run it in Ollama. That said, I think it does technically use Llama under the hood, since there are two models: one for encoding text and the other for decoding to audio. Honestly, the paper is terrible, but it explains the architecture somewhat: arxiv.org/pdf/2411.01156
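
The two-stage split described above (one model turning text into discrete tokens, a second model decoding those tokens into audio) can be sketched with toy stand-ins. This is only an illustration of the pipeline shape under stated assumptions: `text_to_codes`, `codes_to_audio`, and the constants are hypothetical placeholders, not Fish Speech's real models; see the linked paper for the actual architecture.

```python
import math
from typing import List

# Hypothetical toy constants; nothing here comes from Fish Speech itself.
CODEBOOK_SIZE = 16      # number of discrete "audio codes" in the toy
SAMPLE_RATE = 8000      # samples per second of the toy waveform
SAMPLES_PER_CODE = 80   # each code becomes 10 ms of audio at 8 kHz


def text_to_codes(text: str) -> List[int]:
    """Stage 1 (toy): emit one discrete code per character, autoregressively.

    Each code depends on the current character and on the previously
    emitted code, mimicking a model that conditions on its own output.
    """
    codes: List[int] = []
    prev = 0
    for ch in text:
        code = (ord(ch) + 3 * prev) % CODEBOOK_SIZE
        codes.append(code)
        prev = code
    return codes


def codes_to_audio(codes: List[int]) -> List[float]:
    """Stage 2 (toy decoder): turn each code into a short sine tone."""
    samples: List[float] = []
    for code in codes:
        freq = 200.0 + 50.0 * code  # map the code index to a pitch in Hz
        for n in range(SAMPLES_PER_CODE):
            samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def tts(text: str) -> List[float]:
    """Chain the two stages: text -> discrete codes -> waveform samples."""
    return codes_to_audio(text_to_codes(text))
```

In the real system the first stage is a learned transformer and the second a neural decoder, but the interface between them is the same idea: a sequence of discrete tokens.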

Written by modulus@lemmy.ml on 2024-12-07 at 22:01

From the link:

You should mention that the content is released under a CC BY-NC-SA 4.0 licence.

So which is it, open source or CC-BY-NC-SA? NC restrictions are not compatible with either the free software or the open source definitions.

Written by hok@lemmy.dbzer0.com on 2024-12-08 at 19:12

You are right. Their description of “SOTA Open Source TTS” caused me to assume it was open source, but it’s clear that

This codebase and all models are released under CC-BY-NC-SA-4.0 License.

So, it’s “source available” and not released under an open source licence.
