Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings

Zhihuan Kuang; Shi Zong; Jianbing Zhang; Jiajun Chen; Hongfu Liu

doi:10.48550/arxiv.2210.00434

Preprint

10/02/2022

DOI: https://doi.org/10.48550/arxiv.2210.00434

Abstract

In this paper, we consider a novel research problem: music-to-text synaesthesia. Unlike the classical music tagging problem, which classifies a music recording into pre-defined categories, music-to-text synaesthesia aims to generate descriptive text from music recordings for further understanding. Although this is a new and interesting application for the machine learning community, to the best of our knowledge, existing music-related datasets do not contain semantic descriptions of music recordings and therefore cannot serve the music-to-text synaesthesia task. In light of this, we collect a new dataset that contains 1,955 aligned pairs of classical music recordings and text descriptions. Based on this dataset, we build a computational model that generates sentences describing the content of a music recording. To tackle the highly non-discriminative nature of classical music, we design a group topology-preservation loss for our computational model, which considers more samples as a group reference and preserves the relative topology among different samples. Extensive experimental results demonstrate, both qualitatively and quantitatively, the effectiveness of our proposed model over five heuristic or pre-trained competitive methods and their variants on our collected dataset.

Details

Title: Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Creators: Zhihuan Kuang; Shi Zong; Jianbing Zhang; Jiajun Chen; Hongfu Liu (Brandeis University, Michtom School of Computer Science)
Identifiers: 9924208486201921
Academic Unit: Benjamin and Mae Volen National Center for Complex Systems; Michtom School of Computer Science
Language: English
Resource Type: Preprint
