This study introduces a technique for automatic singer identification that analyses musical signals and groups songs by vocal resemblance. The technique employs a statistical model to distinguish the singing voice from other sounds and to identify the singer: it extracts audio features and matches them against per-singer voice models, achieving an accuracy of 80%. The method also draws on vocal separation and pattern classification algorithms to evaluate singer identification strategies in polyphonic music.
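A minimal sketch of this classic statistical pipeline follows: frame-level MFCCs are extracted from each recording, a Gaussian mixture model is trained per singer, and an unknown clip is assigned to the singer whose model gives the highest average log-likelihood. File paths, the number of coefficients, and the number of mixture components are illustrative assumptions, not the published configuration.

```python
# Per-singer GMMs over frame-level MFCCs, scored by mean log-likelihood.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, n_mfcc=20):
    y, sr = librosa.load(path, sr=22050, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_singer_models(train_files):
    """train_files: dict mapping singer name -> list of audio paths."""
    models = {}
    for singer, paths in train_files.items():
        feats = np.vstack([mfcc_frames(p) for p in paths])
        models[singer] = GaussianMixture(
            n_components=16, covariance_type='diag').fit(feats)
    return models

def identify(path, models):
    feats = mfcc_frames(path)
    # Pick the singer whose model best explains the observed frames.
    return max(models, key=lambda s: models[s].score(feats))
```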
The work also investigates timbral features for capturing a vocalist's voice quality and middle-level features for capturing perceptual properties of music. On the Artist20 benchmark dataset, the proposed convolutional recurrent neural network (CRNN) achieves an average F1 score of 0.81.
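The sketch below shows the general shape of such a CRNN: convolutional blocks summarise local time-frequency structure in a log-mel spectrogram, a recurrent layer integrates over time, and a linear head predicts the singer. Layer sizes and pooling factors are illustrative assumptions, not the published Artist20 architecture.

```python
# A minimal CRNN over log-mel spectrograms for singer classification.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=128, n_singers=20):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((4, 2)),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((4, 2)),
        )
        freq_out = n_mels // 16                 # after two (4, 2) poolings
        self.gru = nn.GRU(64 * freq_out, 128, batch_first=True)
        self.head = nn.Linear(128, n_singers)

    def forward(self, x):                       # x: (batch, 1, n_mels, frames)
        z = self.conv(x)                        # (batch, 64, n_mels//16, frames//4)
        z = z.permute(0, 3, 1, 2).flatten(2)    # (batch, time, features)
        _, h = self.gru(z)                      # h: (1, batch, 128)
        return self.head(h[-1])                 # logits over singers
```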
The study also examines how performers of popular songs can be recognised from vibrato traits. One method computes cepstral coefficients from vibrato-motivated features and leverages high-level musical knowledge of song structure. Related work models the characteristics of the singing voice in polyphonic musical audio signals and employs MFCC and LPC coefficients to identify artists in Indian video songs.
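As a hedged sketch of the two feature families named above, the snippet below extracts MFCCs and frame-wise LPC coefficients with librosa; the file name, frame length, and analysis orders are illustrative assumptions.

```python
import numpy as np
import librosa

y, sr = librosa.load('song_clip.wav', sr=16000, mono=True)  # hypothetical file

# MFCCs: 13 coefficients per frame, capturing the spectral envelope.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # (13, frames)

# LPC: one coefficient set per short frame, modelling vocal tract resonances.
frame_len = 512
frames = librosa.util.frame(y, frame_length=frame_len, hop_length=frame_len)
lpc = np.array([librosa.lpc(frames[:, i], order=12)
                for i in range((frames.shape[1]))])
# lpc has shape (frames, 13): an order-12 filter plus the leading 1.0.
```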
Further work models the characteristics of the singing voice within polyphonic musical audio signals. For domain adaptation, one method employs gradient reversal, the contrastive adaptation network (CAN), and maximum mean discrepancy (MMD). Another study examines 32 Mel-frequency cepstral coefficients split into two subsets: low-order MFCCs, which represent vocal tract resonances, and high-order MFCCs, which are linked to the glottal wave shape.
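The following is a minimal sketch of the MMD term used in such domain adaptation: an RBF-kernel estimate of the distance between source and target feature distributions, added to the classification loss to pull the two domains' embeddings together. The kernel bandwidth is an illustrative assumption.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """x: (n, d) source features; y: (m, d) target features."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)           # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    # Biased MMD^2 estimate: within-domain similarity minus cross-domain.
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```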
The study additionally considers how the characteristics of singing voices in Sri Lankan music can be modelled from polyphonic audio signals. A deep neural network model for SID called KNN-Net learns representations of local timbre features from both the vocalist's voice and the background music.
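A sketch of the nearest-neighbour idea behind such a model: a trained network maps clips to timbre embeddings, and a k-nearest-neighbour vote over labelled embeddings assigns the singer. Here `embed` stands in for any trained embedding model; it, the distance metric, and k are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_singer_id(embed, train_clips, train_labels, test_clips, k=5):
    X_train = np.stack([embed(c) for c in train_clips])  # (n, d) embeddings
    X_test = np.stack([embed(c) for c in test_clips])
    knn = KNeighborsClassifier(n_neighbors=k, metric='cosine')
    knn.fit(X_train, train_labels)
    return knn.predict(X_test)                           # one label per clip
```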
The authors present novel methods for learning classifiers directly from polyphonic musical signals to improve MIR systems. These techniques aim to increase the precision and reliability of singer identification in metaverse applications.
As music databases grow, solutions are needed to efficiently classify and retrieve items from digital music libraries. Automatic singer identification (SID), which recognises the singing performer in an audio sample, is an essential part of MIR systems. Traditional methods such as the Hidden Markov Model (HMM), Support Vector Machine (SVM), and Gaussian Mixture Model (GMM) have known limitations. Researchers have suggested using Open-Unmix, an open-source tool, to separate the singing voice from the accompaniment. Mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs) are employed for speech and music recognition, respectively. To improve SID accuracy, Nasrullah and Zhao propose a novel model based on a convolutional recurrent neural network (CRNN) framework.
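A hedged sketch of using Open-Unmix to isolate vocals before feature extraction follows. The torch.hub entry point is taken from the project's README, but model names, expected sample rate (the pretrained models assume 44.1 kHz), and tensor shapes should be checked against the installed version; the file name is hypothetical.

```python
import torch
import torchaudio

audio, rate = torchaudio.load('mixture.wav')        # (channels, samples)
separator = torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq')

with torch.no_grad():
    # The Separator expects (batch, channels, samples) and returns
    # estimates of shape (batch, targets, channels, samples).
    estimates = separator(audio.unsqueeze(0))

# to_dict maps the target axis back to names such as 'vocals'.
vocals = separator.to_dict(estimates)['vocals'][0]  # (channels, samples)
```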
The growth of online music archives has driven the emergence of content-based music retrieval algorithms, in which automatic artist identification plays a crucial role. Singer identification (SingerID), a central MIR problem, involves three primary stages: identifying singing segments, extracting features from the vocal segments, and building a singer classifier from those feature parameters. Recent studies have concentrated on perceptually motivated characteristics of the singing voice, such as vibrato and harmonic acoustic features, for music content processing and analysis.
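A skeleton of this three-stage pipeline is sketched below; each stage is injected as a callable placeholder to be filled with a concrete method (e.g. an energy- or classifier-based vocal detector, MFCC extraction, and any trained classifier), so the structure itself is the point, not the specific components.

```python
def singer_id_pipeline(audio, detect_vocals, extract_features, classify):
    """Illustrative skeleton: all three stages are supplied as callables."""
    segments = detect_vocals(audio)                              # 1. locate sung regions
    labels = [classify(extract_features(s)) for s in segments]   # 2-3. features + decision
    return max(set(labels), key=labels.count)                    # majority vote per song
```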
This study uses higher-order cepstral coefficients, which relate to pitch and fine spectral structure, to characterise singing voices by their individuality rather than their formant organisation. Defining singing patterns, classifying music by author, and automatic singer recognition are suggested as the main directions for future study.
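The low- versus high-order cepstral split discussed here and above reduces to slicing the MFCC matrix: the low orders track the coarse spectral envelope (vocal-tract resonances), while the higher orders carry finer spectral detail tied to the glottal source. The split point of 12 and the file name are illustrative assumptions.

```python
import librosa

y, sr = librosa.load('vocal_clip.wav', sr=16000)    # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=32)  # (32, frames)
low_order = mfcc[:12]    # coarse spectral envelope / formant structure
high_order = mfcc[12:]   # finer structure linked to the glottal source
```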
Conclusion
The article surveys automatic singer recognition methods that improve accuracy in polyphonic music and on separated vocals, including approaches that approximate the Kullback-Leibler divergence. The surveyed approaches focus on middle-level musical features and employ convolutional and recurrent networks. Among them is the hybrid singer identifier (HSI), a technique offering strong effectiveness, scalability, efficiency, and robustness for identifying singers in large music libraries.
The paper also presents a novel dataset for singer identification using CMS, a technique for recognising vocalists from harmonic, vibrato, and timbre information, a MIR system based on vocal timbre similarity, and a computational method for vocalist recognition in music retrieval tasks. Although the uncertainty-based approach is general and applicable to other GMM-based MIR classification problems, its vector Taylor series (VTS) uncertainty propagation equations must be adapted.
References
[1] Zhang, Tong. "Automatic singer identification." 2003 International Conference on Multimedia and Expo (ICME '03). Vol. 1. IEEE, 2003.
[2] Mesaros, Annamaria, Tuomas Virtanen, and Anssi Klapuri. "Singer identification in polyphonic music using vocal separation and pattern recognition methods." ISMIR, 2007.
[3] Zhang, Xulong, et al. "Singer identification for metaverse with timbral and middle-level perceptual features." 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022.
[4] Fujihara, Hiromasa, et al. "Singer identification based on accompaniment sound reduction and reliable frame selection." ISMIR, 2005.
[5] Hsieh, Tsung-Han, et al. "Addressing the confounds of accompaniments in singer identification." ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
[6] Shen, Jialie, et al. "Towards efficient automated singer identification in large music databases." Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.
[7] Nwe, Tin Lay, and Haizhou Li. "Exploring vibrato-motivated acoustic features for singer identification." IEEE Transactions on Audio, Speech, and Language Processing 15.2 (2007): 519-530.
[8] Holzapfel, Andre, and Yannis Stylianou. "Singer identification in rembetiko music." Conference on Sound and Music Computing (SMC). Sound and Music Computing Network, 2007.
[9] Ratanpara, Tushar, and Narendra Patel. "Singer identification using MFCC and LPC coefficients from Indian video songs." Emerging ICT for Bridging the Future: Proceedings of the 49th Annual Convention of the Computer Society of India (CSI), Volume 1. Springer International Publishing, 2015.
[10] Kalayar Khine, Swe Zin, Tin Lay Nwe, and Haizhou Li. "Exploring perceptual based timbre feature for singer identification." Computer Music Modeling and Retrieval. Sense of Sounds: 4th International Symposium, CMMR 2007, Copenhagen, Denmark, August 27-31, 2007, Revised Papers. Springer Berlin Heidelberg, 2008.
[11] Fujihara, Hiromasa, et al. "A modeling of singing voice robust to accompaniment sounds and its application to singer identification and vocal-timbre-similarity-based music information retrieval." IEEE Transactions on Audio, Speech, and Language Processing 18.3 (2010): 638-648.
[12] Maddage, Namunu Chinthaka, Changsheng Xu, and Ye Wang. "Singer identification based on vocal and instrumental models." Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004). Vol. 2. IEEE, 2004.
[13] Liu, Chih-Chin, and Chuan-Sung Huang. "A singer identification technique for content-based classification of MP3 music objects." Proceedings of the Eleventh International Conference on Information and Knowledge Management, 2002.
[14] Zhang, Xulong, et al. "MetaSID: Singer identification with domain adaptation for metaverse." 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022.
[15] Mesaros, Annamaria, and Jaakko Astola. "The Mel-frequency cepstral coefficients in the context of singer identification." ISMIR, 2005.
[16] Amarasinghe, Rajitha, and Lakshman Jayaratne. "Supervised learning approach for singer identification in Sri Lankan music." European Journal of Computer Science and Information Technology 4.6 (2016): 1-14.
[17] Zhang, Xulong, et al. "Singer identification using deep timbre feature learning with KNN-Net." ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.
[18] Lagrange, Mathieu, Alexey Ozerov, and Emmanuel Vincent. "Robust singer identification in polyphonic music using melody enhancement and uncertainty-based learning." 13th International Society for Music Information Retrieval Conference (ISMIR), 2012.
[19] Sharma, Bidisha, Rohan Kumar Das, and Haizhou Li. "On the importance of audio-source separation for singer identification in polyphonic music." INTERSPEECH, 2019.
[20] Zhang, Xulong, et al. "MDCNN-SID: Multi-scale dilated convolution network for singer identification." 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022.