SingingFace

This work was supported in part by the National Key R&D Program of China (Grant No. 2021YFC3300403), the National Natural Science Foundation (Grant No. 62072382), the National Science Foundation (OAC-2007661), and the Yango Charitable Foundation.

SingingFace dataset.

SingingFace contains over 600 Chinese and English singing videos of 6 human subjects, totaling 40 hours at 30 FPS. Each video was shot from a stable camera position under appropriate lighting conditions. We built the dataset by recording the singing videos ourselves. Specifically, we first collected a set of singing audio tracks, then recorded the face region of each subject singing the song while the music was played. Finally, we automatically aligned each video to its corresponding music audio using SyncNet to ensure audio-visual synchronization. The following are some sample videos.

Rendered 3D singing face video.
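
As an illustration of the alignment step described above, the following sketch shows one way an audio-visual offset could be applied to an audio track once it has been estimated. It does not call SyncNet: the offset is assumed to be already known, and the function names, the sign convention, and the sample rate used in the usage lines are hypothetical choices for illustration, not part of the dataset tooling.

```python
FPS = 30  # frame rate stated for the SingingFace videos


def offset_to_samples(offset_frames: int, sample_rate: int, fps: int = FPS) -> int:
    """Convert an audio-visual offset given in video frames to audio samples."""
    return round(offset_frames * sample_rate / fps)


def shift_audio(samples: list, offset_frames: int, sample_rate: int, fps: int = FPS) -> list:
    """Shift audio so that it lines up with the video, keeping its length fixed.

    Assumed sign convention (hypothetical): a positive offset means the audio
    lags the video, so leading samples are dropped and the tail is zero-padded;
    a negative offset delays the audio by zero-padding the start.
    """
    n = offset_to_samples(abs(offset_frames), sample_rate, fps)
    n = min(n, len(samples))  # never shift by more than the track length
    if n == 0:
        return list(samples)
    if offset_frames > 0:
        return list(samples[n:]) + [0] * n
    return [0] * n + list(samples[:-n])


# Usage with toy values: 60 samples per second at 30 FPS gives 2 samples per frame.
advanced = shift_audio([1, 2, 3, 4, 5, 6], offset_frames=1, sample_rate=60)
delayed = shift_audio([1, 2, 3, 4, 5, 6], offset_frames=-1, sample_rate=60)
```

Keeping the track length fixed means the shifted audio still covers the same number of video frames, which is convenient when each video frame is paired with a fixed-size audio window.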

Details of dataset files.

The dataset files are structured as follows:

For the download link of the full dataset, please contact zengming@xmu.edu.cn.