Friday, October 17, 2014

Hatsune Miku: Virtual Vocals and Synthetic Singing

During a recent Facebook scrolling session, an odd link popped up on my news feed. It was this video of a musical performance on the Late Show with David Letterman.


You don't need to be the most observant person in the world to realise that the performer, Hatsune Miku, or 初音ミク, as her name is written in Japanese, is not a real person. Hatsune Miku is not the first virtual performer; other popular virtual acts include Alvin and the Chipmunks, The Archies, and Gorillaz. However, Hatsune Miku can do something that other acts can't do: sing.

You may think that her high-pitched singing is not as good as the sped-up singing of Alvin, Simon, and Theodore, and you may be right. However, the Chipmunks, much like other virtual acts, relied on human singers whose vocals were recorded in advance (and, in the Chipmunks' case, sped up). Hatsune Miku's vocals, by contrast, are synthesised using Yamaha's VOCALOID2 and VOCALOID3 vocal synthesisers.

If you're familiar with Japanese, you may recognise the components of Hatsune Miku's name, which translates as "the first sound from the future": Hatsu (初) means "first", Ne (音) means "sound", and Miku (ミク) is derived from 未来 (mirai), meaning "future".

Sapporo, Japan, the hometown of Hatsune Miku.
While 16-year-old Hatsune Miku could be said to be from Sapporo, home of her developer Crypton Future Media, the technology that allows her to sing was conceived in Spain, as part of a research project at Pompeu Fabra University in Barcelona.

Hatsune Miku's voice isn't purely synthesised: it is generated from phonemes prerecorded by Japanese voice actress Saki Fujita. Initially, only Japanese phonemes were recorded; Miku "learned" English for a later release, again from Saki Fujita's recordings. This allows her to sing in both languages, albeit with a Japanese accent when she sings in English.

The process that turns these phonemes into song is known as concatenative synthesis. Short recorded sound samples, known as units, are selected and joined end to end, and each unit can be manipulated along the way. This allows the user to modify a range of qualities, including each unit's length, pitch, and timbre.
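To make the idea a little more concrete, here is a minimal sketch in Python, assuming that each unit is simply a list of audio samples. The function names, the sample rate, and the sine tones standing in for Saki Fujita's recordings are all hypothetical. The naive resampling used here changes pitch and length together, much like speeding up a tape (the Chipmunks trick); VOCALOID itself uses more sophisticated processing so that pitch, length, and timbre can be adjusted independently, and timbre isn't modelled here at all.

```python
import math

SAMPLE_RATE = 8000  # hypothetical low rate to keep the example small


def make_unit(freq_hz, duration_s, sr=SAMPLE_RATE):
    """Hypothetical stand-in for a prerecorded unit: a plain sine tone."""
    n = round(duration_s * sr)
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]


def resample(unit, factor):
    """Play a unit back `factor` times faster using linear interpolation.

    factor > 1 raises the pitch and shortens the unit (the "sped-up" effect);
    factor < 1 lowers the pitch and lengthens it.
    """
    out_len = int(len(unit) / factor)
    out = []
    for j in range(out_len):
        pos = j * factor
        i = int(pos)
        frac = pos - i
        # Reuse the last sample when interpolating past the end of the unit.
        nxt = unit[i + 1] if i + 1 < len(unit) else unit[i]
        out.append(unit[i] * (1 - frac) + nxt * frac)
    return out


def concatenate(units, overlap):
    """Join units end to end, crossfading `overlap` samples at each seam."""
    out = list(units[0])
    for unit in units[1:]:
        k = min(overlap, len(out), len(unit))
        for i in range(k):
            w = (i + 1) / (k + 1)  # fade-in weight for the incoming unit
            idx = len(out) - k + i
            out[idx] = out[idx] * (1 - w) + unit[i] * w
        out.extend(unit[k:])
    return out


if __name__ == "__main__":
    a = make_unit(220, 0.1)                    # 800 samples at 220 Hz
    b = resample(make_unit(220, 0.1), 2.0)     # 400 samples, one octave higher
    song = concatenate([a, b], overlap=80)
    print(len(song))  # 1120 (800 + 400 - 80 overlapping samples)
```

The short crossfade at each seam is there because butting two recordings directly against each other tends to produce audible clicks; smoothing the joins is one of the central problems in any concatenative system.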

Since anyone who owns the software can synthesise speech and vocals, Hatsune Miku is "technically" the performer of thousands of songs. She's not alone, though: other virtual performers sing in other languages and language combinations, such as Spanish and Chinese. Other languages can also be approximated using preexisting phonemes, with differing levels of success.
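As a rough illustration of that kind of approximation, the Python sketch below maps the sounds of a target word onto a limited phoneme inventory, substituting a similar sound where an exact match is missing. Both the inventory and the substitution table are hypothetical simplifications (for example, rendering "l" as "r" and "v" as "b", as is common in Japanese loanwords), and the "exact ratio" is only a crude proxy for how successful the approximation is.

```python
# Hypothetical, simplified inventory of Japanese-style phonemes.
AVAILABLE = {"a", "i", "u", "e", "o", "k", "s", "t", "n", "h",
             "m", "r", "b", "z", "d", "g", "p", "w", "y", "sh", "ch"}

# Hypothetical substitutes for sounds missing from the inventory.
SUBSTITUTES = {"l": "r", "v": "b", "th": "s", "dh": "z"}


def approximate(phonemes):
    """Return (mapped, exact_ratio).

    mapped: each phoneme as-is if available, otherwise its substitute,
    or None when no substitute is defined.
    exact_ratio: share of phonemes that needed no substitution.
    """
    mapped = []
    exact = 0
    for p in phonemes:
        if p in AVAILABLE:
            mapped.append(p)
            exact += 1
        else:
            mapped.append(SUBSTITUTES.get(p))  # None if nothing suitable exists
    ratio = exact / len(phonemes) if phonemes else 1.0
    return mapped, ratio


if __name__ == "__main__":
    print(approximate(["th", "i", "n", "k"]))  # (['s', 'i', 'n', 'k'], 0.75)
```

The more sounds a language has that are missing from the recorded inventory, the more substitutions pile up, which is one way to picture why some languages come out far more convincingly than others.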