Ancestors

Toot

Written by Michael Kennedy on 2025-01-02 at 03:27

Need some advice:

I'm looking for an open source text-to-speech library I can run locally or on a server. It needs to be pretty pleasant to listen to, even if generation takes a bit longer. Ideally #python as the language, but I'm open to looking around.

Recommendations? Found StyleTTS2 so far.

=> More informations about this toot | More toots from mkennedy@fosstodon.org

Descendants

Written by dis0x on 2025-01-02 at 03:32

@mkennedy What about https://github.com/rhasspy/piper ?

=> More informations about this toot | More toots from dis0@c.im

Written by Michael Kennedy on 2025-01-02 at 05:30

@dis0 Looks good, thanks!

=> More informations about this toot | More toots from mkennedy@fosstodon.org

Written by Jer Warren on 2025-01-02 at 04:37

@mkennedy it's been 20+ years since I used it, but Festival was really nice back then.

github.com/festvox/festival

=> More informations about this toot | More toots from nyquildotorg@fedia.social

Written by Jer Warren on 2025-01-02 at 04:39

@mkennedy I remember there being some really excellent existing voices, and the ability to create one of your own by recording yourself reading "training" phrases

=> More informations about this toot | More toots from nyquildotorg@fedia.social

Written by Jonathan Hartley on 2025-01-02 at 13:39

@nyquildotorg @mkennedy I tried to get festival working on Ubuntu a couple of weeks ago, and it was very hard work. This is an unmaintained academic project from decades ago. Many central sites and downloadable voices are 404

=> More informations about this toot | More toots from tartley@mastodon.social

Written by Jonathan Hartley on 2025-01-02 at 13:46

@mkennedy I tried this myself recently, and found it harder than I expected. Lots of people saying "just use one of the many amazing modern AI generated systems, download a voice you like, and you are good". But the options I tried were all a mess and hard to install.

I settled on piper, which only works against Python <= 3.10 (eye roll). I have the deadsnakes PPA installed, so this Python is just an apt install for me...

=> More informations about this toot | More toots from tartley@mastodon.social

Written by Jonathan Hartley on 2025-01-02 at 13:47

@mkennedy On Ubuntu/Pop!OS, I ended up writing my own script to wrap the installation of piper and its dependencies, downloading a voice, and the ultimate invocation:

https://github.com/tartley/dotfiles/blob/main/bin/say

=> More informations about this toot | More toots from tartley@mastodon.social

Written by Jonathan Hartley on 2025-01-02 at 13:48

@mkennedy I would love to hear if there are better options or I'm doing it wrong.

=> More informations about this toot | More toots from tartley@mastodon.social

Written by Jonathan Hartley on 2025-01-02 at 13:53

@mkennedy It's possible to make a less clumsy invocation of piper in bash which pipes the output directly into aplay, but this fails if your text to speak includes a period, because piper does some sort of seek on the output if it contains multiple sentences.
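
A minimal sketch of one way around the piping problem, in Python rather than bash: have piper write a complete WAV file first, then play that file. It assumes the `piper` and `aplay` executables are on the PATH and that piper accepts `--model` and `--output_file` as shown in its README. The function names, `voice.onnx`, and `speech.wav` are placeholders for illustration and are not taken from the linked script.

```python
import shutil
import subprocess


def build_piper_command(model_path, wav_path):
    """Return the argument list for a piper run that writes a WAV file.

    Assumption: the piper CLI takes --model and --output_file.
    """
    return ["piper", "--model", str(model_path), "--output_file", str(wav_path)]


def say(text, model_path, wav_path="speech.wav"):
    """Synthesize text to a WAV file with piper, then play it with aplay."""
    for tool in ("piper", "aplay"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")
    # piper reads the text to speak from standard input.
    subprocess.run(
        build_piper_command(model_path, wav_path),
        input=text.encode("utf-8"),
        check=True,
    )
    # Playing a finished file means the player never reads a stream
    # that piper is still writing or seeking in.
    subprocess.run(["aplay", str(wav_path)], check=True)
    return wav_path
```

Calling `say("Hello. This is a second sentence.", "voice.onnx")` would then go through the intermediate file, which should sidestep the multi-sentence failure at the cost of a short delay before playback starts.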

=> More informations about this toot | More toots from tartley@mastodon.social

Written by Jonathan Hartley on 2025-01-02 at 13:53

@mkennedy (eye roll again)

=> More informations about this toot | More toots from tartley@mastodon.social

Written by Michael Kennedy on 2025-01-02 at 15:40

@tartley Thanks for all of this, Jonathan! I'm starting to get the same feeling: clumsy academic projects that are either not installable or very hard to install.

=> More informations about this toot | More toots from mkennedy@fosstodon.org

Written by João S. O. Bueno on 2025-01-02 at 14:19

@mkennedy my timeline, just by coincidence, has an article about "audiogenipy".

=> More informations about this toot | More toots from gwidion@floss.social

Written by Deborah Hartmann Preuss, pcc on 2025-01-02 at 22:57

@ChristineMalec maybe you know someone who can help? It's geek to me, LOL.

/cc @mkennedy

Edit: oh, what a missed opportunity! I have now changed "greek" to "geek" 🤦‍♀️

=> More informations about this toot | More toots from deborahh@cosocial.ca

Written by neoluddite on 2025-01-02 at 23:05

@mkennedy this sounds very much like @KathyReid's area

=> More informations about this toot | More toots from neoluddite@aus.social

Written by Kathy Reid on 2025-01-02 at 23:12

@neoluddite @mkennedy thx for the tag - have you seen

https://github.com/coqui-ai/TTS

Also this post - very out of date as Mozilla TTS is no longer supported and hasn't been for 3 years.

The work Mike Hansen did for Mimic 3 is now in Home Assistant, and I have been very impressed by it

https://www.datacamp.com/blog/best-open-source-text-to-speech-tts-engines

=> More informations about this toot | More toots from KathyReid@aus.social

Written by Michael Kennedy on 2025-01-02 at 23:18

@KathyReid @neoluddite Thanks! I did check that one out. But decided against it because their company's website has this message as the H1. :(

Coqui is shutting down.

Thank you for all your support! ❤️

I don't want to build on a foundation that's about to vanish.

=> More informations about this toot | More toots from mkennedy@fosstodon.org

Written by Kathy Reid on 2025-01-03 at 00:30

@mkennedy @neoluddite fair point - there has also been a general decline in open source TTS across the board, similar to STT with the release of Whisper.

Interested to hear what you decide on.

=> More informations about this toot | More toots from KathyReid@aus.social

Written by Prayson W. Daniel on 2025-01-06 at 19:08

@mkennedy 🤗

If you go the ML route, you can pick one of the leading open source models: https://huggingface.co/spaces/TTS-AGI/TTS-Arena

=> More informations about this toot | More toots from proteusiq@fosstodon.org

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113756632582656961