Need some advice:
I'm looking for an open source text-to-speech library I can run locally or on a server. It needs to be pretty pleasant to listen to, even if generation takes a bit longer. Ideally #python as the language, but I'm open to looking around.
Recommendations? Found StyleTTS2 so far.
=> More informations about this toot | More toots from mkennedy@fosstodon.org
@mkennedy What about https://github.com/rhasspy/piper ?
=> More informations about this toot | More toots from dis0@c.im
@dis0 Looks good, thanks!
=> More informations about this toot | More toots from mkennedy@fosstodon.org
@mkennedy it's been 20+ years since I used it, but Festival was really nice back then.
github.com/festvox/festival
=> More informations about this toot | More toots from nyquildotorg@fedia.social
@mkennedy I remember there being some really excellent existing voices, and the ability to create one of your own by recording yourself reading "training" phrases
=> More informations about this toot | More toots from nyquildotorg@fedia.social
@nyquildotorg @mkennedy I tried to get Festival working on Ubuntu a couple of weeks ago, and it was very hard work. This is an unmaintained academic project from decades ago. Many central sites and downloadable voices now return 404.
=> More informations about this toot | More toots from tartley@mastodon.social
@mkennedy I tried this myself recently, and found it harder than I expected. Lots of people saying "just use one of the many amazing modern AI generated systems, download a voice you like, and you are good". But the options I tried were all a mess and hard to install.
I settled on piper, which only works against Python <= 3.10 (eye roll). I have the deadsnakes PPA installed, so this Python is just an apt install for me...
=> More informations about this toot | More toots from tartley@mastodon.social
@mkennedy On Ubuntu/Pop!_OS, I ended up writing my own script to wrap the installation of piper and its dependencies, the download of a voice, and the ultimate invocation:
https://github.com/tartley/dotfiles/blob/main/bin/say
=> More informations about this toot | More toots from tartley@mastodon.social
@mkennedy I would love to hear if there are better options or I'm doing it wrong.
=> More informations about this toot | More toots from tartley@mastodon.social
@mkennedy It's possible to make a less clumsy invocation of piper in bash which pipes the output directly into aplay, but this fails if your text to speak includes a period, because piper does some sort of seek on the output if it contains multiple sentences.
=> More informations about this toot | More toots from tartley@mastodon.social
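A minimal Python sketch of the workaround described above: have piper write a complete WAV file first, then play that file, instead of piping piper's output directly into aplay. This is an illustration, not the script linked earlier in the thread. It assumes `piper` and `aplay` are both on the PATH and that piper accepts `--model` and `--output_file` and reads its text from standard input, as shown in the piper README. The helper names `piper_command` and `say` are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def piper_command(model: str, wav_path: str) -> list[str]:
    """Build the argument list for a piper run that writes a WAV file."""
    # Assumption: the piper CLI takes --model and --output_file.
    return ["piper", "--model", model, "--output_file", wav_path]


def say(text: str, model: str) -> None:
    """Synthesize `text` to a temporary WAV file, then play it with aplay."""
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = str(Path(tmp) / "speech.wav")
        # piper reads the text to speak from standard input.
        subprocess.run(
            piper_command(model, wav_path),
            input=text.encode("utf-8"),
            check=True,
        )
        # Play the finished file rather than a pipe, so multi-sentence
        # text does not depend on how piper writes its output stream.
        subprocess.run(["aplay", wav_path], check=True)
```

Calling `say("Hello there. This is a second sentence.", "en_US-lessac-medium.onnx")` would speak both sentences, provided a voice model with that (assumed) file name has already been downloaded next to the script.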
@mkennedy (eye roll again)
=> More informations about this toot | More toots from tartley@mastodon.social
@tartley Thanks for all of this, Jonathan! I'm starting to get the same feeling: clumsy academic projects that are either not installable or very hard to install.
=> More informations about this toot | More toots from mkennedy@fosstodon.org
@mkennedy my timeline, just by coincidence, has an article about "audiogenipy".
=> More informations about this toot | More toots from gwidion@floss.social
@ChristineMalec maybe you know someone who can help? It's geek to me, LOL.
/cc @mkennedy
Edit: oh, what a missed opportunity! I have now changed "greek" to "geek" 🤦‍♀️
=> More informations about this toot | More toots from deborahh@cosocial.ca
@mkennedy this sounds like it's very much @KathyReid's area
=> More informations about this toot | More toots from neoluddite@aus.social
@neoluddite @mkennedy thx for the tag - have you seen
https://github.com/coqui-ai/TTS
Also this post, though it is very out of date, as Mozilla TTS is no longer supported and hasn't been for 3 years.
The work Mike Hansen did for Mimic 3 is now in Home Assistant, and I have been very impressed by it
https://www.datacamp.com/blog/best-open-source-text-to-speech-tts-engines
=> More informations about this toot | More toots from KathyReid@aus.social
@KathyReid @neoluddite Thanks! I did check that one out. But decided against it because their company's website has this message as the H1. :(
Coqui is shutting down.
Thank you for all your support! ❤️
I don't want to build on a foundation that's about to vanish.
=> More informations about this toot | More toots from mkennedy@fosstodon.org
@mkennedy @neoluddite fair point - there has also been a general decline in open source TTS across the board, similar to STT with the release of Whisper
Interested to hear what you decide on.
=> More informations about this toot | More toots from KathyReid@aus.social
@mkennedy 🤗
Going ML, you can pick one of the leading open source models: https://huggingface.co/spaces/TTS-AGI/TTS-Arena
=> More informations about this toot | More toots from proteusiq@fosstodon.org