An application with the voices of over 40 celebrities, mimicked by artificial intelligence.
/ Task
Develop a voice synthesis technology, build voice impersonation on top of it, package it into a mobile application, and release it within a short time frame.
/ Solution
The neural network was first trained on a large dataset (more than 30 hours of speech recordings) from a single speaker who is not a public figure. Further training on a selected celebrity's recordings then takes no more than 2 hours. In essence, the network has its "own" voice and overlays the intonation and speech characteristics of the person being parodied on top of it. The synthesis technology is packaged into a mobile application built with React Native.
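The pretrain-then-fine-tune idea can be illustrated with a deliberately tiny stand-in: a two-parameter linear model trained with gradient descent instead of Tacotron 2. This is a toy sketch, not the team's training code. The weights `BASE_W` and `CELEB_W`, the dataset sizes (300 and 20 samples), the learning rate and the tolerance are all hypothetical. It shows why starting from the base speaker's weights reaches a nearby target in fewer steps than starting from zero.

```python
import math

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def make_data(true_w, n):
    # Inputs evenly spaced on the unit circle: a well-conditioned toy dataset.
    data = []
    for k in range(n):
        theta = 2 * math.pi * k / n
        x = (math.cos(theta), math.sin(theta))
        data.append((x, predict(true_w, x)))
    return data

def train(w, data, lr=0.1, tol=1e-6, max_steps=10_000):
    """Full-batch gradient descent on MSE; returns (weights, update steps used)."""
    w = list(w)
    for step in range(max_steps):
        if mse(w, data) < tol:
            return w, step
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w, max_steps

BASE_W = (1.0, -2.0)   # hypothetical "base speaker" parameters
CELEB_W = (1.1, -1.9)  # hypothetical celebrity parameters, close to the base voice

base_data = make_data(BASE_W, 300)   # larger base corpus (arbitrary size)
celeb_data = make_data(CELEB_W, 20)  # smaller target set (arbitrary size)

# Stage 1: pretrain from zero on the base speaker.
pretrained, _ = train([0.0, 0.0], base_data)
# Stage 2: continue training from the pretrained weights on the target data.
finetuned, finetune_steps = train(pretrained, celeb_data)
# Baseline: train on the target data from scratch.
scratch, scratch_steps = train([0.0, 0.0], celeb_data)
print(finetune_steps, scratch_steps)
```

Both target runs use the same learning rate and stopping tolerance, so the difference in step counts comes only from the starting point.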
We built an MVP in 3 months and managed to occupy a new niche. Choosing a cross-platform technology and moving fast saved 50% of the development budget. To date, the application has been downloaded by more than 9 million users.
/ Technologies
Python, React Native
An NLP frontend that includes neural text normalization and a model for placing pauses and stress. Tacotron 2, which takes characters as input. An autoregressive WaveNet running in real time on the CPU.
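The text side of this pipeline can be sketched as a chain of small functions that turn raw text into the character IDs a Tacotron 2 style model consumes. This is a hedged illustration under stated assumptions: the neural normalization and pause models are replaced by simple rules (`number_to_words`, `normalize`, `place_pauses` and `frontend` are hypothetical names), stress placement is not modeled at all, and the pause token `|` and the English symbol set are assumptions rather than the product's actual inventory.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

PAUSE = "|"  # hypothetical pause token emitted by the frontend
# Assumed English symbol set; "_" is padding and gets ID 0.
SYMBOLS = ["_"] + list(" abcdefghijklmnopqrstuvwxyz'") + [PAUSE]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

def number_to_words(n: int) -> str:
    """Spell out 0-99; larger numbers are read digit by digit in this toy."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    return " ".join(ONES[int(d)] for d in str(n))

def normalize(text: str) -> str:
    """Rule-based stand-in for neural text normalization."""
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    return text.lower()

def place_pauses(text: str) -> str:
    """Rule-based stand-in for the pause model: punctuation becomes a pause token."""
    text = re.sub(r"\s*[,.;:!?]+\s*", f" {PAUSE} ", text)
    return re.sub(r"\s+", " ", text).strip()

def to_char_ids(text: str) -> list[int]:
    """Map characters to integer IDs, dropping anything outside the symbol set."""
    return [SYMBOL_TO_ID[c] for c in text if c in SYMBOL_TO_ID]

def frontend(text: str) -> list[int]:
    return to_char_ids(place_pauses(normalize(text)))

ids = frontend("Over 40 voices, mimicked by AI.")
print("".join(SYMBOLS[i] for i in ids))
```

Keeping each stage a separate function means any rule-based stage could later be swapped for a trained model without changing the character encoding the acoustic model sees.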