A decade ago, many computer vision problems faced significant barriers to achieving high accuracy. The emergence of deep learning techniques brought about a dramatic change that greatly improved accuracy on these problems. Among them, image classification stands out as a key task: the challenge of accurately assigning images to their corresponding categories, such as dogs and cats. This research aims to improve classification accuracy by applying state-of-the-art deep learning methods.
To this end, a great deal of effort has gone into building a robust convolutional neural network (CNN) designed specifically for image classification. The principal goal is to leverage the capabilities of modern deep learning techniques to achieve significant improvements in image classification accuracy.
I. INTRODUCTION
Image classification is a fundamental problem in machine learning and is important for many real-world applications. The main goal of this project is to classify photographs of dogs and cats, a well-known computer vision challenge. The dataset contains 25,000 photographs in total, split evenly between 12,500 images of cats and 12,500 images of dogs.
The project uses 20,000 photographs for training, following a deliberate division of the images between training and validation. This gives the machine learning model the opportunity to learn the complex patterns and features necessary for distinguishing between images of cats and dogs.
A separate set of 5,000 photographs is reserved for validation, which is essential for assessing how well the model performs on unseen data and for verifying that it generalizes.
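As an illustration of this split, the sketch below shows one way the training and validation datasets could be prepared with Keras. The directory path, image size, batch size, and random seed are assumptions not stated in the paper; the 80/20 validation_split simply reproduces the 20,000/5,000 division described above.

import tensorflow as tf
from tensorflow import keras

# Hypothetical directory containing the 25,000 images (path is an assumption).
DATA_DIR = "dogs_vs_cats/train"

# An 80/20 split of 25,000 images yields 20,000 training and 5,000 validation images.
train_ds = keras.utils.image_dataset_from_directory(
    DATA_DIR,
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(256, 256),   # assumed input resolution
    batch_size=32,
    label_mode="binary",     # single 0/1 label: cat vs. dog
)
validation_ds = keras.utils.image_dataset_from_directory(
    DATA_DIR,
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(256, 256),
    batch_size=32,
    label_mode="binary",
)

# Scale pixel values to [0, 1], a common preprocessing step for CNN inputs.
def normalize(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

train_ds = train_ds.map(normalize)
validation_ds = validation_ds.map(normalize)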
Convolutional Neural Networks (CNNs), a powerful deep learning technique for processing visual data, form the foundation of this project. CNNs are particularly good at learning the hierarchical representations present in images, which makes it possible to extract the subtle features required for precise categorization.
The main objective of this project is to create a reliable and accurate model that can discriminate between photos of cats and dogs by utilizing CNNs and a carefully selected dataset.
The emphasis on using 20,000 photographs for training and 5,000 for validation is meant to ensure that the model can learn effectively from the given data and perform well on new, unseen images of dogs and cats. The project aims to produce insights and solutions useful across a variety of sectors and applications, while contributing to ongoing advances in machine learning, particularly in image classification.
II. CONVOLUTIONAL NEURAL NETWORK
In contemporary machine learning, Convolutional Neural Networks (CNNs) are an essential component, especially for tasks involving images. These specialized deep learning models are designed to automatically and adaptively learn hierarchical representations of data, which makes them particularly well suited to processing visual data such as photographs. CNNs excel at understanding and categorizing complex visual input because they combine convolutional layers, which capture local patterns and hierarchies, pooling layers, which downsample feature maps while retaining key characteristics, and fully connected layers for classification. Their scalability, robustness to spatial variation, and capacity to learn complex patterns make them a preferred method in many computer vision applications, including object detection and image recognition.
III. MODEL ARCHITECTURE
The code uses Keras' Sequential API to define a CNN model. The first layer is a 2D convolutional layer (Conv2D) with 32 filters, a (3,3) kernel size, and a ReLU activation function. Max-pooling (MaxPooling2D) and batch normalization layers follow each convolutional layer. Three convolutional blocks are stacked, with filter counts of 32, 64, and 128 in ascending order. The model then flattens the output and passes it through two dense layers of 128 and 64 units, each using ReLU activation and followed by dropout to lessen overfitting. The final layer is a dense layer with a single unit and a sigmoid activation function, appropriate for binary classification.
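As a concrete illustration, the sketch below expresses such a model with Keras' Sequential API. The input resolution (256x256x3), the dropout rates, and the exact ordering of batch normalization and pooling within each block are assumptions, since the description above does not fix them.

from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the described architecture; input shape and dropout rates are assumed.
model = keras.Sequential([
    # Block 1: 32 filters
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2)),

    # Block 2: 64 filters
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2)),

    # Block 3: 128 filters
    layers.Conv2D(128, kernel_size=(3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2)),

    # Classifier head: flatten, two dense layers with dropout, sigmoid output.
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.1),   # assumed rate
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.1),   # assumed rate
    layers.Dense(1, activation='sigmoid'),
])

model.summary()  # prints the layer-by-layer parameter counts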
IV. MODEL COMPILATION AND TRAINING
The model is compiled with the binary cross-entropy loss function and the Adam optimizer, with accuracy selected as the evaluation metric. Using the training dataset (train_ds) for learning and the validation dataset (validation_ds) for validation, the training loop runs for 15 epochs (model.fit(train_ds, epochs=15, validation_data=validation_ds)). The fit function uses backpropagation to update the model's weights during training, guided by the given optimizer and loss function.
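A minimal sketch of this step, assuming the model, train_ds, and validation_ds objects defined earlier:

# Compile with the Adam optimizer, binary cross-entropy loss, and accuracy metric.
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train for 15 epochs; the returned History object stores the per-epoch metrics
# that are plotted in the evaluation step below.
history = model.fit(train_ds,
                    epochs=15,
                    validation_data=validation_ds)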
V. MODEL EVALUATION
Following training, the code uses Matplotlib to plot the accuracy and loss curves for the training and validation datasets (plt.plot(history.history['accuracy'], ...) and plt.plot(history.history['loss'], ...)) in order to visualize the performance of the model. These charts show how the accuracy and loss metrics evolve across epochs, offering insight into the model's learning and generalization. By analyzing these curves, one can judge whether the model performs well on unobserved data and whether it is overfitting or underfitting.
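A sketch of this plotting step, assuming the History object returned by model.fit above; the colors and labels are illustrative choices:

import matplotlib.pyplot as plt

# Accuracy curves for training and validation across epochs.
plt.plot(history.history['accuracy'], color='red', label='train')
plt.plot(history.history['val_accuracy'], color='blue', label='validation')
plt.title('Accuracy per epoch')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# Loss curves for training and validation across epochs.
plt.plot(history.history['loss'], color='red', label='train')
plt.plot(history.history['val_loss'], color='blue', label='validation')
plt.title('Loss per epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()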
VI. CONCLUSION
This study's 98.57% accuracy rate is a significant achievement in using Convolutional Neural Networks (CNNs) to classify images as cat or dog. This high accuracy demonstrates the model's effectiveness in differentiating between photos of cats and dogs in the dataset and reflects the soundness of the architecture and procedures used in this study. The work represents notable progress in image classification, particularly in distinguishing photographs of cats from dogs, with accuracy exceeding previously reported baselines.