Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: G. Hasini Varma, J. Vishnu Varun , K. Ganesh Reddy , T. Radhika Reddy , N. Omkar Sainath , CH. Chandu , Mr. D Arivazhagan
DOI Link: https://doi.org/10.22214/ijraset.2023.57496
Music plays a very important role in people’s lives. It brings like-minded people together and is the glue that holds communities together. Communities can be recognized by the type of songs that they compose or listen to. The purpose of our project and research is to find a machine learning algorithm that predicts the genre of songs better than the pre-existing models. Genres can be defined as categorical labels created by humans to identify or characterize the style of music. The concept of automatic music genre classification has become very popular in recent years as a result of the rapid growth of the digital entertainment industry, offering the potential to enhance a wide range of applications.
I. INTRODUCTION
A. Overview Of Project
Music genres are a set of descriptive keywords that convey high-level information about a music clip (jazz, classical, rock...). Music classification is considered a very challenging task because it depends on selecting and extracting appropriate audio features. While unlabeled data is readily available, music tracks with appropriate genre tags are scarce. Genre classification is a task that aims to predict the music genre from the audio signal. Automating the detection of musical tags makes it possible to create interesting content for the user, such as music discovery and playlist creation, and for the content provider, such as music labeling and ordering. Building this system requires extracting acoustic features that are good estimators of the genres we are interested in, followed by a single-label or multi-label classification stage or, in some cases, a regression stage. Conventionally, feature extraction relies on a signal processing front-end to compute relevant features from a time-domain or frequency-domain audio representation. The features are then used as input to the machine learning stage. Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves. Machine learning algorithms are used to classify and predict the genre of a piece of music.
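As a rough illustration of the signal processing front-end described above, the following sketch computes two common time-domain features, the zero-crossing rate and the RMS energy. The function names and the synthetic sine tones are assumptions made for this example; they stand in for real audio clips and are not the feature set used in the project.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (time-domain feature)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def rms_energy(samples):
    """Root-mean-square amplitude of the clip (time-domain feature)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine_wave(freq_hz, sample_rate=8000, duration_s=1.0):
    """Synthetic test tone; a hypothetical stand-in for a real audio clip."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

low_tone = sine_wave(100)    # low-pitched tone
high_tone = sine_wave(1000)  # high-pitched tone

# Each clip is reduced to a small feature vector for the learning stage.
features_low = (zero_crossing_rate(low_tone), rms_energy(low_tone))
features_high = (zero_crossing_rate(high_tone), rms_energy(high_tone))
```

The two tones have the same energy but very different zero-crossing rates, which shows how a single feature can separate clips that look alike under another feature.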
II. LITERATURE REVIEW
III. AIM AND SCOPE OF PRESENT INVESTIGATION
A. Aim Of The Project
Embarking on a deep learning project for the automatic classification of musical genres from audio files holds significant importance in various practical scenarios. The focus of our project is to utilize low-level features related to frequency and time domain in audio data for this classification task. To ensure consistency in our analysis, we are in need of a dataset comprising audio tracks with similar sizes and frequency ranges.
The GTZAN genre classification dataset emerges as the most recommended choice for our endeavor. Widely regarded as a benchmark in the field, this dataset offers a diverse collection of audio tracks spanning various genres. Its comprehensive nature makes it an ideal foundation for training and evaluating deep learning models. By tapping into the GTZAN dataset, our objective is to construct a resilient and efficient deep learning system capable of precisely categorizing music genres based on their distinctive acoustic characteristics.
B. Scope And Objective
a. Develop a machine learning model that classifies music into genres, showing that music can be classified automatically from its features instead of entering the genre manually.
b. Another objective is to reach a good accuracy so that the model classifies new music into its genre correctly.
c. This model should be better than at least a few preexisting models.
C. System Requirements
1. Hardware Requirements
The most common set of requirements defined by any operating system or software application is the physical computer resources, also known as hardware. The minimal hardware requirements are as follows:
a. Processor : Pentium IV, 2.4 GHz
b. RAM : 8 GB
c. Hard Disk Drive : 1 TB
d. Keyboard : 104 Keys
2. Software Requirements
Software requirements define the resources and prerequisites that need to be installed on a computer for an application to function.
The minimal software requirements are as follows:
a. Front end : Python
b. Dataset : GTZAN.csv
c. IDE : Command Prompt
d. Operating System : Windows 11
IV. DESIGN METHODOLOGY
A. Existing System
In the existing system, k-nearest neighbor (k-NN) is used to classify the genres. This does not provide a reasonable comparison of learning strategies for music genre classification. It applies filter modeling before Piecewise Gaussian Modeling. However, the resulting improvements are not statistically significant. This procedure does not increase classification accuracy and does not achieve efficient prediction.
B. Proposed System
Support Vector Machines (SVMs) have been actively used for various music classification tasks such as music tagging, genre classification, and user-item latent feature prediction for recommendation. The proposed system uses SVM and KNN classifiers, which operate on feature vectors extracted from the audio. The classifiers learn to perform the given task during supervised training.
C. Module Description
1. Working of SVM Algorithm
An SVM model represents the different classes as regions of a multidimensional space separated by a hyperplane. SVM generates the hyperplane iteratively so that the classification error is minimized. The goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH).
a. Support Vectors: The data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
b. Hyperplane: It is a decision plane or space that divides a set of objects belonging to different classes.
c. Margin: It may be defined as the gap between the two lines drawn through the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors.
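The three terms above can be made concrete with a small numeric sketch. It assumes a hyperplane that has already been found, written as w.x + b = 0; the weights, bias, and four 2-D points are hypothetical values chosen for illustration, not output of the project's trained SVM.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane x1 + x2 - 3 = 0, centered between the two classes.
w = [1.0, 1.0]
b = -3.0

# Hypothetical labeled points: (feature vector, class label).
points = [
    ([1.0, 1.0], -1),
    ([2.0, 0.0], -1),
    ([2.0, 2.0], +1),
    ([4.0, 3.0], +1),
]

distances = [distance_to_hyperplane(w, b, x) for x, _ in points]
closest = min(distances)

# Support vectors are the points closest to the hyperplane.
support_vectors = [
    x for (x, _), d in zip(points, distances) if math.isclose(d, closest)
]

# Gap between the two parallel lines through the closest points of each class.
margin_width = 2 * closest

# Every point should fall on the side that matches its label.
correctly_separated = all(
    (decision_value(w, b, x) > 0) == (label > 0) for x, label in points
)
```

Here three of the four points lie at the minimum distance and therefore act as support vectors; the fourth point could be moved further away without changing the hyperplane.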
2. Working of KNN Algorithm
The K-nearest neighbors (KNN) algorithm uses ‘feature similarity’ to predict the values of new data points, which means that a new data point is assigned a value based on how closely it matches the points in the training set. We can understand its working with the help of the following steps:
a. Feature Extraction: The process of feature extraction is useful when you need to reduce the number of resources needed for processing without losing important or relevant information. Feature extraction can also reduce the amount of redundant data for a given analysis. Also, the reduction of the data and the machine’s efforts in building variable combinations (features) facilitate the speed of learning and generalization steps in the machine learning process.
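The feature-similarity idea behind KNN can be sketched in a few lines. This is a minimal illustration that assumes features have already been extracted into short numeric vectors; the two-dimensional vectors, the genre labels, and the choice of k = 3 with Euclidean distance are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set: two extracted features per track.
train = [
    ([0.10, 0.20], "classical"),
    ([0.15, 0.25], "classical"),
    ([0.20, 0.15], "classical"),
    ([0.80, 0.90], "rock"),
    ([0.85, 0.80], "rock"),
    ([0.90, 0.85], "rock"),
]

prediction_a = knn_predict(train, [0.12, 0.22])
prediction_b = knn_predict(train, [0.88, 0.82])
```

Because KNN compares raw distances, features on very different scales would normally be normalized first so that no single feature dominates the vote.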
D. Model Training
A training model is a dataset that is used to train an ML algorithm. It consists of sample output data and the corresponding sets of input data that influence the output. The training model is used to run the input data through the algorithm to correlate the processed output against the sample output. The result of this correlation is used to modify the model. This iterative process is called “model fitting”. The accuracy of the training dataset or the validation dataset is critical for the precision of the model. Model training in machine learning is the process of feeding an ML algorithm with data to help it identify and learn good values for all attributes involved. There are several types of machine learning, of which the most common are supervised and unsupervised learning. Supervised learning is possible when the training data contains both the input and output values. Each set of data that has the inputs and the expected output is called a supervisory signal. Training is driven by the deviation of the processed result from the documented result when the inputs are fed into the model.
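The fitting loop described above, in which the deviation between the processed and the expected output modifies the model, can be illustrated with a perceptron. The perceptron is a deliberately simple stand-in chosen for this sketch and is not the SVM or KNN model of the project; the samples, labels, learning rate, and epoch limit are likewise assumed values.

```python
def predict(w, b, x):
    """Processed output: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def fit_perceptron(samples, labels, epochs=20, lr=0.1):
    """Iteratively adjust weights using the deviation from the expected output."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            deviation = y - predict(w, b, x)  # expected minus processed output
            if deviation != 0:
                # Move the model toward the expected output.
                w = [wi + lr * deviation * xi for wi, xi in zip(w, x)]
                b += lr * deviation
                errors += 1
        if errors == 0:  # every training sample is reproduced correctly
            break
    return w, b

# Hypothetical supervised training data: inputs with their expected outputs.
samples = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
labels = [0, 0, 1, 1]

w, b = fit_perceptron(samples, labels)
fitted_outputs = [predict(w, b, x) for x in samples]
```

Reproducing the training labels only shows that the fit converged; accuracy on data held out from training is what indicates how the model handles new music.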
E. Trained Classifier
Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels, or categories. Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y).
F. Prediction
The song genre is predicted from the input we provided.
VII. FUTURE ENHANCEMENTS
In our future endeavors, we aim to explore different deep learning methods, particularly those specialized for time series data, such as RNN models like GRU and LSTM, considering their potential for improved performance. Additionally, we're intrigued by the generative aspects of the project, contemplating genre conversion akin to generative adversarial networks transforming photos into artistic styles, but tailored for music. This opens possibilities for creative applications and unique user experiences.
Moreover, we anticipate opportunities for transfer learning, envisioning applications in classifying music by artist or decade. Leveraging pre-trained models and knowledge from one task to enhance performance in another could offer efficiency and accuracy gains in our music genre classification project.
To address the multi-class problem, we propose an ensemble approach, combining SVM and KNN models to create confidence vectors for each input. This ensemble strategy allows for nuanced predictions and could enhance a streaming service, enabling multiple genre inputs. Additionally, experimenting with diverse machine learning models, beyond SVM and KNN, aims to expose different implicit features in the dataset. This exploration contributes to a more comprehensive understanding of music genres and can potentially improve classification accuracy.
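One possible reading of the proposed ensemble is a weighted average of per-genre confidence vectors from the two models. The sketch below assumes each model already outputs a confidence per genre; the function names, the equal weighting, the 0.3 threshold, and the sample confidences are all hypothetical and only illustrate the idea.

```python
def combine_confidences(svm_conf, knn_conf, svm_weight=0.5):
    """Blend two per-genre confidence vectors into one normalized vector."""
    genres = sorted(set(svm_conf) | set(knn_conf))
    combined = {
        g: svm_weight * svm_conf.get(g, 0.0)
        + (1 - svm_weight) * knn_conf.get(g, 0.0)
        for g in genres
    }
    total = sum(combined.values())
    if total == 0:
        raise ValueError("both confidence vectors are empty or all zero")
    return {g: v / total for g, v in combined.items()}

def top_genres(conf, threshold=0.3):
    """Genres whose combined confidence reaches the threshold, best first."""
    ranked = sorted(conf.items(), key=lambda kv: kv[1], reverse=True)
    return [g for g, v in ranked if v >= threshold]

# Hypothetical model outputs for a single track.
svm_conf = {"rock": 0.6, "metal": 0.3, "jazz": 0.1}
knn_conf = {"rock": 0.4, "metal": 0.5, "jazz": 0.1}

combined = combine_confidences(svm_conf, knn_conf)
suggested = top_genres(combined)
```

Returning every genre above a threshold, instead of a single best label, is what would let a track carry more than one genre.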
Copyright © 2023 G. Hasini Varma, J. Vishnu Varun , K. Ganesh Reddy , T. Radhika Reddy , N. Omkar Sainath , CH. Chandu , Mr. D Arivazhagan . This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET57496
Publish Date : 2023-12-11
ISSN : 2321-9653
Publisher Name : IJRASET