Related Projects

The audio recordings in the Tone Perfect database are related to the following studies oriented around human perception of tone in language, particularly in Mandarin, conducted by Catherine Ryu of Michigan State University’s Department of Linguistics and Languages.

The recordings in the Tone Perfect database have been used to create other works listed on this page.


Lingua Incognita

"The aim of Lingua Incognita is to explore the musical possibilities of speech while also simulating the confusion of the novice language learner. [...] It’s up to the listener to decide on the meaning or lack thereof." Read more about this piece at Ben Fuhrman's Lingua Incognita page.


Heart Doubt

Heart Doubt is a piece of experimental electronic music. It uses both the audio assets from the Tone Perfect database and dramatic utterances of “I love you” by two native speakers (male and female), expressing a wide range of emotions such as anger, irony, tenderness, irritation, solemnity, and playfulness. As such, this piece infuses the richness of Mandarin lexical and emotional tones into a novel sonic experience. To fully appreciate the acoustic intricacy of this piece, a high-quality headset is recommended.

Voice actors: Zijin Liu and Haitian Yan

Program Notes: Benjamin R. Fuhrman

Commissioned by Catherine Ryu of Michigan State University’s Department of Linguistics and Languages, Heart Doubt was created out of the Mandarin syllables “wo ai ni” (trans. “I love you”) to show off the creative possibilities of the Tone Perfect: Multimodal Database for Mandarin Chinese (https://tone.lib.msu.edu). Using these syllables, this piece explores the initial stage of interest and doubt at the beginning of a relationship. The male and female speakers are essentially engaged in an examination of their feelings for each other in a classic “she loves me, she loves me not” type game, trying to make sense of their feelings as outside thoughts intrude on their inner monologues.

The sounds in Heart Doubt were created by first quantizing the speech segments to begin on a controlled pitch (still allowing for the pitch contour changes in Mandarin), and then running the sounds into multiple instances of the IRCAM Multi Granular and IRCAM Stretch sample oscillators in UVI’s Falcon instrument in order to create different sonic textures. Additional textural work was created using PaulStretch to create the “monk choir” chanting the syllables that control the action in each of the three sections of the piece: wo (0:00-2:30), ai (2:30-3:30), and ni (3:30-5:00). Enveloping of each section was roughly mapped to the pitch contour of the respective control syllable in terms of both volume and effects automation.

Additional sounds were created on a modular synthesizer by feeding the DAW output to an envelope follower and using its output to control triggering, additional envelopes, and oscillators, creating the bell and chime sounds heard at different points throughout the piece.
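The envelope follower idea can be illustrated in software. The following is a minimal sketch, not the composer's actual modular patch: it assumes a mono signal stored as a list of floats, and the function names, the attack and release coefficients, and the trigger threshold are all hypothetical choices made for illustration.

```python
import math


def envelope_follower(samples, attack=0.5, release=0.01):
    """Track the amplitude envelope of a signal.

    The envelope rises quickly (attack) when the rectified input exceeds it
    and decays slowly (release) otherwise. Coefficients are illustrative.
    """
    envelope = []
    level = 0.0
    for sample in samples:
        rectified = abs(sample)
        coeff = attack if rectified > level else release
        level += coeff * (rectified - level)
        envelope.append(level)
    return envelope


def rising_triggers(envelope, threshold=0.3):
    """Return the indices where the envelope crosses the threshold upward."""
    triggers = []
    above = False
    for index, level in enumerate(envelope):
        if level >= threshold and not above:
            triggers.append(index)
        above = level >= threshold
    return triggers


# Synthetic input: silence, a 440 Hz burst, then silence again (8 kHz rate).
rate = 8000
burst = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
signal = [0.0] * 400 + burst + [0.0] * 400

env = envelope_follower(signal)
triggers = rising_triggers(env)
```

In a patch like the one described, each trigger index would correspond to a moment where the loudness of the incoming audio fires a new bell or chime event, while the envelope itself could modulate an oscillator's level.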


Mandarin Sound Table

This sound table contains all monosyllabic sounds in Mandarin Chinese in four tones. If you do not know what kinds of sounds exist in Mandarin Chinese, explore them through this sound table.

Explore the full sound table


KUAI App

The KUAI app teaches basic Mandarin pronunciation by presenting random syllables and asking the user to transcribe them in pinyin.
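The quiz loop this describes can be sketched in a few lines. This is only an illustration of the general idea, not the app's actual code: the syllable list, the tone-number pinyin notation (for example `ma3`), and all function names are assumptions.

```python
import random

# A tiny, hypothetical syllable inventory; the real app's list is not given here.
SYLLABLES = ["ma1", "ma2", "ma3", "ma4", "shi4", "ni3", "hao3"]


def pick_syllable(rng):
    """Choose a random syllable for the learner to hear."""
    return rng.choice(SYLLABLES)


def normalize(answer):
    """Lowercase the answer and drop surrounding whitespace."""
    return answer.strip().lower()


def check_answer(target, answer):
    """Report whether the syllable and the tone number were each transcribed correctly."""
    answer = normalize(answer)
    return {
        "syllable_ok": answer[:-1] == target[:-1],
        "tone_ok": answer[-1:] == target[-1:],
    }
```

Checking the syllable and the tone separately lets a quiz tell the learner which part of the transcription went wrong, rather than only marking the whole answer right or wrong.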


Mandarin Tone Machine Learning Project

Mandarin Chinese is a tonal language in which detecting tone is crucial to understanding a word’s meaning. However, for people with cochlear implants and for non-native speakers of Chinese, tone identification is a challenging task, which creates a need for an automated tone recognition system. This study trained a convolutional neural network (CNN) to classify Mandarin tones from audio recordings. The training data was a monosyllabic Mandarin Chinese dataset of 9,860 audio files. The network was trained on male, female, or combined data, and for each dataset split, mel-frequency cepstral coefficients (MFCCs), mel-spectrograms, or pitch contours were extracted from the audio files and fed into the CNN as input features. The highest test accuracy achieved in this research is 99.8%, which outperforms models trained on monosyllabic Mandarin datasets reported in previous literature. Results indicate that separating audio files by gender when training the neural network yields the highest test accuracies, which has important implications for future research in tone recognition.
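The experimental design described here, three speaker splits crossed with three feature types, can be laid out as a small grid. The sketch below only enumerates the configurations and selects recordings per split; it does not extract features or train a network. The metadata records, field names, and function names are hypothetical and are not taken from the study.

```python
from itertools import product

SPLITS = ("male", "female", "combined")
FEATURES = ("mfcc", "mel_spectrogram", "pitch_contour")


def select_recordings(recordings, split):
    """Return the recordings that belong to a speaker split."""
    if split == "combined":
        return list(recordings)
    return [rec for rec in recordings if rec["gender"] == split]


def experiment_grid():
    """Enumerate every (split, feature) pair: 3 splits x 3 feature types."""
    return list(product(SPLITS, FEATURES))


# Hypothetical metadata records; field names are illustrative only.
recordings = [
    {"file": "ma1_speaker_a.mp3", "gender": "female", "tone": 1},
    {"file": "ma2_speaker_b.mp3", "gender": "male", "tone": 2},
    {"file": "ma3_speaker_a.mp3", "gender": "female", "tone": 3},
]

for split, feature in experiment_grid():
    subset = select_recordings(recordings, split)
    # A real pipeline would extract `feature` from each file and train a CNN here.
    print(f"{split:9s} {feature:16s} {len(subset)} recordings")
```

Under this reading of the description, each of the nine pairs corresponds to one trained model whose test accuracy can then be compared against the others.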


CCC (Chinese Consonant Classifier)

CCC (Chinese Consonant Classifier) is a Mandarin learning assistant that specializes in showing how close a user's consonant pronunciation is to native pronunciation. The project is intended as a tool for language learners who struggle to distinguish similar Mandarin consonants, especially sounds that do not exist in their native languages. Learners can pronounce more like native speakers if their mistakes are pointed out early, and we hope that CCC will assist in this part of the learning process.


Watch Your Tone

Watch Your Tone is a self-testing program to help Mandarin learners develop their ear for the different tones.


Supervised Learning Models for Classifying Tones in Mandarin Chinese

This open-source project focuses on creating auditory feedback by categorizing user speech into one of the four tones of Mandarin Chinese, enhancing language learning with real-time pronunciation feedback. There is already a plethora of resources to aid in memorizing vocabulary and understanding grammar, yet few offer real-time pronunciation feedback without an instructor or a large price tag. This project serves as a supplementary resource that can provide quick and informative auditory feedback for tonal monosyllabic words.


The Interactive Fiction for Second Language Learners Collection

The Interactive Fiction for Second Language Learners Collection, a project by Jeremy A. Robinson at Gustavus Adolphus College, contains various interactive fiction stories designed for use by beginning or intermediate second language students. Smile from the Heart (发自内心的微笑) by Runyan Xu, A Polar Bear Living under Climate Change (初识气候变化的北极熊) by Runyan Xu, and A Cat (一只猫) by Linda Ruan are three of these stories. In order to help the reader with pronunciation of unknown characters, each character in the stories is conveniently linked to the Tone Perfect: Multimodal Database for Mandarin Chinese.


Open Source Mandarin Tone Practice App

As all people learning Chinese know, pronouncing tones is one of the most difficult aspects of the language. Pronouncing the tones of individual characters is hard enough, but pronouncing whole phrases correctly is much more difficult.

The Open Source Mandarin Tone Practice App addresses this problem. Users can record themselves saying entire phrases and a machine learning model will tell them which tones they are using. Users can additionally practice on each character separately before trying to tackle the entire phrase.

The application is open-source, and all phrases were written using an HSK 1 vocabulary.


Voice-Changing Detection with Convolutional Neural Network

This was a project by a Chinese high school student, Chuntung Zhuang Leo, to determine whether tonality and/or expression of tone is a key factor in helping machine learning programs differentiate between real human voices and artificial or altered ones. His hypothesis was based on findings that models trained on Chinese outperformed those trained on English at identifying fake voices. Early efforts at evaluating the hypothesis seem to indicate that it is correct.

Mandarin Trainer

"Mandarin Trainer" is a web-based tool designed to help non-native speakers practice and improve their Mandarin pronunciation, with a focus on tones and syllables. The project uses a deep learning model based on the HuBERT architecture, fine-tuned on the Michigan State University Tone Perfect dataset, which contains recordings of six different speakers pronouncing each of the possible Mandarin tone/syllable combinations.

Using two separately fine-tuned audio classification models, the project achieves 100% accuracy in tone classification and 98% accuracy in syllable identification. To achieve this, various data augmentation techniques, including time stretching, white noise addition, low-pass filtering, and random silence insertion, were applied to the Tone Perfect dataset to enhance the robustness and generalization capabilities of the models.
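The four augmentation techniques named above can be illustrated with a minimal pure-Python sketch. It is not the project's actual pipeline: the function names and parameter values are assumptions, signals are assumed to be non-empty lists of floats, and the time stretch shown is a naive resampling that also shifts pitch, unlike the pitch-preserving stretch that audio libraries typically offer.

```python
import random


def add_white_noise(samples, noise_level, rng):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    return [s + rng.gauss(0.0, noise_level) for s in samples]


def low_pass(samples, alpha):
    """One-pole low-pass filter; smaller alpha removes more high frequencies."""
    filtered = []
    previous = 0.0
    for s in samples:
        previous += alpha * (s - previous)
        filtered.append(previous)
    return filtered


def insert_silence(samples, length, rng):
    """Insert a run of zeros at a random position."""
    position = rng.randint(0, len(samples))
    return samples[:position] + [0.0] * length + samples[position:]


def time_stretch(samples, factor):
    """Naive stretch by linear-interpolation resampling.

    This changes duration but also shifts pitch, so it only illustrates
    the idea of varying the length of a clip.
    """
    new_length = max(1, int(round(len(samples) * factor)))
    stretched = []
    for i in range(new_length):
        position = i / factor
        left = min(int(position), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        fraction = position - left
        stretched.append(samples[left] * (1 - fraction) + samples[right] * fraction)
    return stretched


# Chain the four steps on a toy clip, with a seeded generator for repeatability.
rng = random.Random(0)
clip = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
stretched = time_stretch(clip, 1.5)
noisy = add_white_noise(stretched, 0.01, rng)
smoothed = low_pass(noisy, 0.5)
augmented = insert_silence(smoothed, 4, rng)
```

Applying randomized variants of each recording during training exposes a classifier to conditions closer to learner recordings, such as background noise, pauses, and different speaking rates.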

Mandarin Trainer allows users to make pronunciation attempts for a given tone/syllable combination and receive immediate feedback on their tone and syllable accuracy by running audio classification against both the tone model and the syllable model. By providing an accessible, interactive platform for targeted practice, Mandarin Trainer aims to address the challenges faced by learners in mastering the tonal aspects of Mandarin Chinese, which are crucial for effective communication in the language.

The project demonstrates the application of state-of-the-art machine learning techniques to language learning, showcasing how AI can be leveraged to create personalized, data-driven tools for improving pronunciation skills in tonal languages.

The use of DNN as a tool to evaluate Mandarin pronunciation for CFL speakers (with Russian native speakers as test subjects)

The experiment is part of an MA thesis. The main task is to explore the possibility of using a deep neural network (trained on a standard Mandarin audio corpus using supervised learning) as a tool to evaluate the quality of second language learners' Mandarin tone pronunciation, so that the probability estimate of the class prediction reflects the quality of the test subject's tone pronunciation.
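One way to read "probability estimate of class prediction" is as the softmax probability that a classifier assigns to the tone the learner intended. The sketch below illustrates that reading only; it is an assumption about the approach, and the logits, function names, and the choice of softmax are hypothetical rather than taken from the thesis.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tone_score(logits, intended_tone):
    """Return the probability assigned to the intended tone (1-4) as a score."""
    probabilities = softmax(logits)
    return probabilities[intended_tone - 1]


# Hypothetical logits for tones 1-4 from a classifier given one learner recording.
logits = [0.2, 3.1, 0.4, -1.0]
score = tone_score(logits, intended_tone=2)
```

A score near 1 would suggest the recording sounds clearly like the intended tone to the model, while a low score would suggest it resembles another tone more closely.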