Abstract: Automatic singing transcription (AST), the task of inferring note onsets, offsets, and pitches from singing audio, is of great significance in music information retrieval. Most AST models use a convolutional neural network to extract spectral features and predict onsets and offsets separately: frame-level probabilities are inferred first, and note-level transcription results are then obtained through post-processing. In this paper, a new AST framework called MusicYOLO is proposed, which obtains note-level transcription results directly. Onset/offset detection is based on the object detection model YOLOX, and pitch labeling is completed by a spectrogram peak search. Compared with previous methods, MusicYOLO detects note objects rather than isolated onset/offset moments, which greatly enhances transcription performance. On the sight-singing vocal dataset (SSVD) established in this paper, MusicYOLO achieves an 84.60% transcription F1-score, a state-of-the-art result.
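The pitch-labeling step mentioned above can be illustrated with a minimal spectrogram peak search. This is a hedged sketch, not the paper's implementation: the function name `estimate_note_pitch`, the STFT parameters, and the vocal frequency band are all assumptions chosen for illustration.

```python
import numpy as np

def estimate_note_pitch(audio, sr=16000, n_fft=2048, hop=512,
                        fmin=80.0, fmax=1000.0):
    """Label a detected note's pitch via a spectrogram peak search.

    Illustrative sketch only; all parameter values are assumptions,
    not the settings used by MusicYOLO.
    """
    # Hann-windowed short-time Fourier transform magnitude.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, bins)

    # Restrict the search to a plausible singing-voice band.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    band = (freqs >= fmin) & (freqs <= fmax)

    # Average the spectrum over the note's frames, then take the peak bin.
    mean_spec = mag.mean(axis=0)
    peak_hz = freqs[band][np.argmax(mean_spec[band])]

    # Quantize the peak frequency to the nearest MIDI note number.
    midi = int(round(69 + 12 * np.log2(peak_hz / 440.0)))
    return peak_hz, midi
```

In a full pipeline, the audio segment between a detected onset and offset would be passed to a routine like this to assign the note's pitch.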