1. Record dry vocals
Dry vocals are vocals recorded without effects, captured with as much clarity and fidelity as possible. Ditch fancy effects like reverb or delay and aim for simplicity. If you are recording samples using your own software, make sure that you or your chosen vocalist sing or rap into the microphone without affectation, that is, without attempting to imitate accents or other vocal mannerisms. The aim is to provide recordings that are as representative of the vocalist as possible, demonstrating their voice to the full. Not sure what recording software to use? Windows offers Sound Recorder out of the box, and Apple devices come with Voice Memos. For more control over your recording, we recommend free software like Audacity, which will allow you to edit your recording afterwards.
2. Provide variation in your recording
We recommend starting with at least 10 minutes of vocals and then testing to see whether the results work for you. You can always delete an existing model and train a new one with additional samples. As for variation, we suggest you provide several samples that are 1-3 minutes long, and include different sentences and vocal registers. Done well, this will provide the AI with enough diversity of information to create a high-quality model.
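If your samples are saved as uncompressed WAV files, a short script can check them against these numbers before you upload. The following is a minimal sketch using only Python's standard library; the function names and the dictionary layout are illustrative assumptions, not part of any particular training tool.

```python
import wave

MIN_TOTAL_SECONDS = 10 * 60   # "at least 10 minutes of vocals"
MIN_SAMPLE_SECONDS = 1 * 60   # samples that are "1-3 minutes long"
MAX_SAMPLE_SECONDS = 3 * 60


def wav_duration_seconds(path):
    """Return the length of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize_samples(durations):
    """Check a {file name: seconds} mapping against the guidelines above.

    Hypothetical helper: reports the total length, whether it reaches
    the 10-minute starting point, and which samples fall outside the
    suggested 1-3 minute range.
    """
    total = sum(durations.values())
    out_of_range = sorted(
        name
        for name, seconds in durations.items()
        if not MIN_SAMPLE_SECONDS <= seconds <= MAX_SAMPLE_SECONDS
    )
    return {
        "total_seconds": total,
        "enough_audio": total >= MIN_TOTAL_SECONDS,
        "out_of_range": out_of_range,
    }
```

To use it, build the mapping by calling wav_duration_seconds on each of your files and pass the result to summarize_samples. Note that this only measures length; it cannot tell you whether the samples cover different sentences and vocal registers.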
3. Ensure a quality recording
The higher the quality of the vocals you provide, the better the training output will be. A quality recording should have a minimum of distortion, compression and background noise. Vocals should be clear, without any slurring that could muddy the results. You don't need an expensive microphone to do this (although if you do have a decent microphone, feel free to make use of it!). Recording in a small room with carpets and other items that absorb sound is an easy way to keep ambient noise to a minimum. If you find that your recordings do have background noise, you can use free software like Audacity to do some clean-up.
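One common source of distortion is clipping, where the signal is recorded too loud and hits the maximum level. As a rough check, you can count how many samples in a recording sit at or near full scale. The sketch below assumes a 16-bit PCM WAV file and uses only Python's standard library; the threshold and function names are illustrative assumptions, and the check says nothing about background noise or compression.

```python
import array
import sys
import wave

# Hypothetical threshold: just under the 16-bit full-scale value of 32767.
CLIP_LEVEL = 32767 * 0.999


def read_pcm16(path):
    """Read all samples from a 16-bit PCM WAV file (channels interleaved)."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    return samples


def clipped_fraction(samples, clip_level=CLIP_LEVEL):
    """Return the share of samples whose magnitude reaches clip_level."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return clipped / len(samples)
```

A result well above zero suggests the recording level was too high, in which case lowering the input gain and recording again is usually easier than trying to repair the file afterwards.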
4. Use vocal exercises
Vocal exercises, like singing scales, are extremely useful as training data. They allow the model to develop a more complete understanding of a given vocalist's unique sound. Vocal exercises can also be used to demonstrate a complete repertoire of phonemes, which in turn creates a more accurate and authentic voice.
In summary:
- Provide at least 10 minutes of audio
- Vocals should have as much variety as possible
- Look to vocal exercises as a way to fill out your samples
- No effects on your samples; keep things clean
- Better quality samples = a higher quality model
