We introduce a novel method for synthesizing dance motions that follow the emotion and content of a piece of music. Our method takes a learning-based approach to modeling the music-to-motion mapping embodied in example dance motions and their accompanying background music. A key step is to train a rating function for music-to-motion matching quality by learning the mapping exhibited in synchronized music and dance motion data captured from professional human dance performances. To generate an optimal sequence of dance motion segments for a piece of music, we introduce a constraint-based dynamic programming procedure that considers both the music-to-motion matching quality and the visual smoothness of the resulting dance motion sequence. We also introduce a two-way evaluation strategy, coupled with a GPU-based implementation, that lets us execute the dynamic programming process in parallel, resulting in a significant speedup. To evaluate the effectiveness of our method, we quantitatively compare the dance motions it synthesizes with the results of several peer methods, using motions captured from professional human dancers' performances as the gold standard. We also conducted several medium-scale user studies to explore whether our method is perceived to outperform existing methods at synthesizing dance motions that match a piece of music. These user studies produced very positive results for our music-driven dance motion synthesis experiments on several Asian dance genres, confirming the advantages of our method.
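The trade-off between matching quality and visual smoothness can be illustrated with a minimal Viterbi-style sketch. It is not the procedure described above: it omits the constraints, the two-way evaluation strategy, and the GPU parallelization. The names `select_motion_sequence`, `match_score`, `transition_cost`, and `smooth_weight` are hypothetical. The sketch assumes that a trained rating function has already produced a score for every pair of music segment and candidate motion segment, and that a visual discontinuity cost is available for every ordered pair of motion segments.

```python
def select_motion_sequence(match_score, transition_cost, smooth_weight=1.0):
    """Choose one candidate motion segment per music segment.

    match_score[t][j]: assumed rating of motion j for music segment t
                       (higher means a better match).
    transition_cost[i][j]: assumed visual discontinuity cost of playing
                           motion j immediately after motion i.
    Returns (path, total), where path[t] is the chosen motion index.
    """
    n_music = len(match_score)
    n_motion = len(match_score[0])

    # best[t][j]: best total score of any sequence ending in motion j at step t.
    best = [list(match_score[0])]
    # back[t][j]: predecessor motion that achieves best[t][j].
    back = [[0] * n_motion]

    for t in range(1, n_music):
        row_best, row_back = [], []
        for j in range(n_motion):
            # Score of arriving at j from each possible predecessor i.
            scores = [
                best[t - 1][i] - smooth_weight * transition_cost[i][j]
                for i in range(n_motion)
            ]
            i_star = max(range(n_motion), key=scores.__getitem__)
            row_back.append(i_star)
            row_best.append(scores[i_star] + match_score[t][j])
        best.append(row_best)
        back.append(row_back)

    # Trace the best path backwards from the best final motion.
    last = max(range(n_motion), key=best[-1].__getitem__)
    path = [last]
    for t in range(n_music - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

With `smooth_weight=0` the sketch simply picks the best-rated motion for each music segment; raising the weight makes it prefer sequences with cheaper transitions even when individual matches are rated lower.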
We have implemented a prototype demo system. Users can import a music piece from our CAPG music dance database; the corresponding dance motion is then imported automatically and displayed on the left. Next, users choose a segmentation method, a music feature, and a motion feature. Finally, clicking the generate button invokes the generation process, which calls the pre-trained model. When the generated motion is ready, it is displayed to the right of the original motion.