Google Gemini – Text to Cloned Voice

I created a new app with Google Gemini called the “Monk’s Insight Generator”. It takes a news story (or any piece of content) and turns it into a witty two-liner in the voice of a monk. The app then converts that text into speech using my cloned voice. It’s my second attempt at making an “agentic” app with multiple features.

I made this because a client had requested a similar proof of concept for an upcoming talk. To make the app work, I had to plug in the Gemini API (for text processing) and the ElevenLabs API (for audio processing). I had cloned my voice with ElevenLabs last year, and it was finally put to good use here.

Frankly, I didn’t expect Gemini to get the app working, but it really surprised me. Gemini also taught me how to package this as a local app on my computer instead of a web app. It was no walk in the park, though: the coding took about five hours because I built it bit by bit and kept tweaking different visual and audio settings.

I could have made this more “agentic” by having the app do both the text and audio processing in one go, but I decided to split the two functions so I could show the demo in stages.
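For anyone curious about the plumbing, here is a minimal Python sketch of how the two stages could be kept separate. It is not the app’s actual code. It assumes the public REST endpoints of the Gemini API (`generateContent`) and the ElevenLabs text-to-speech API, and the model names, prompt wording, voice ID and environment-variable names (`GEMINI_API_KEY`, `ELEVENLABS_API_KEY`, `ELEVENLABS_VOICE_ID`) are hypothetical placeholders. Reading the keys from environment variables also keeps them out of the code itself.

```python
import json
import os
import urllib.request

# Assumed REST endpoints; the model name and voice ID are placeholders.
GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
ELEVENLABS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

# Hypothetical prompt wording for the "witty monk" persona.
MONK_PROMPT = "Rewrite this story as a witty two-line insight from a monk:\n\n{story}"


def build_gemini_request(story, api_key, model="gemini-1.5-flash"):
    """Stage 1 request: ask Gemini to turn the story into a monk's two-liner."""
    body = {"contents": [{"parts": [{"text": MONK_PROMPT.format(story=story)}]}]}
    return urllib.request.Request(
        GEMINI_URL.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_gemini_text(response):
    """Pull the generated text out of a generateContent JSON response."""
    return response["candidates"][0]["content"]["parts"][0]["text"]


def build_tts_request(text, api_key, voice_id, model_id="eleven_multilingual_v2"):
    """Stage 2 request: ask ElevenLabs to speak the text in a (cloned) voice."""
    body = {"text": text, "model_id": model_id}
    return urllib.request.Request(
        ELEVENLABS_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
            "xi-api-key": api_key,
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and return the raw response bytes."""
    with urllib.request.urlopen(request) as resp:
        return resp.read()


def monk_insight(story):
    """Run stage 1 on its own; the key comes from a hypothetical env var."""
    req = build_gemini_request(story, os.environ["GEMINI_API_KEY"])
    return extract_gemini_text(json.loads(send(req)))


def speak(text, out_path="insight.mp3"):
    """Run stage 2 on its own and save the returned MP3 audio to disk."""
    req = build_tts_request(
        text, os.environ["ELEVENLABS_API_KEY"], os.environ["ELEVENLABS_VOICE_ID"]
    )
    with open(out_path, "wb") as f:
        f.write(send(req))
    return out_path
```

Calling `monk_insight()` and `speak()` separately mirrors the staged demo, while chaining them as `speak(monk_insight(story))` would be the single-step, more “agentic” version.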

Unfortunately, I can’t share the app with you because it contains my API keys (which cost money to use), so here’s the demo video instead!