Image Classification using Deep Learning and Tensorflow

Authors: Ankit Yadav, Ibtesaam Rais, Manoj Kumar, Anurag Sharma, Abhay Kushwaha

DOI Link: https://doi.org/10.22214/ijraset.2022.43385

Abstract

Image classification is one of the classical problems of image processing. This paper studies image classification with deep neural networks (DNNs), also known as deep learning, using the TensorFlow framework. Python is used as the programming language because it works well with the TensorFlow framework. Image classification is used to narrow the gap between computer vision and human vision, so that a machine can identify images in the same way a human does; its task is to assign each image to a class. We therefore propose a system, Image Classification using Deep Learning, that classifies given images using a neural network classifier. The system is built using Python as the programming language and TensorFlow to create the neural networks.

I. INTRODUCTION

Classification is the systematic division of items into groups and categories based on their characteristics. Image classification has emerged to narrow the gap between computer vision and human vision by training computers with data. It is achieved by assigning images to predetermined categories based on the content of the image. Motivated by [1], this article describes a study of image classification using deep learning.

Traditional image classification methods belong to the field of artificial intelligence (AI) formally known as machine learning. A machine learning system consists of a feature extraction engine that extracts important features such as edges and textures, and a classification engine that classifies based on the extracted features. The main limitation of this approach is that the two engines are separate: the extractor can only pull out specific, predefined features of the image and cannot learn discriminative features from the training dataset. This shortcoming is eliminated by using deep learning [2].

Deep learning (DL) is a subfield of machine learning that learns through its own computational method. Deep learning models are designed to continually analyze information with a structure similar to the one humans use. To achieve this, deep learning uses a layered structure of multiple algorithms, represented as an artificial neural network (ANN). The architecture of an ANN is modeled on the biological neural network of the human brain. This makes deep learning more powerful than standard machine learning models [3, 4]. In this work, deep learning is used to study neural networks that identify images based on the features of the image. The aim is to build a complete feature extraction model that can overcome the difficulties faced by traditional methods. The integrated feature extractor should learn to accurately extract discriminative features from the training set of images.

II. ARTIFICIAL NEURAL NETWORK

A neural network is a combination of hardware and software whose operation is modeled on the neuron, the basic unit of the human brain. As an alternative to the above scenario, a multi-layered neural network can be proposed. The number of training image samples should be more than nine times the number of parameters that must be tuned for classical classification at high resolution. The construction of a multi-layered neural network in real-world implementations is quite complicated [5-8]. Deep learning is the current term for a multi-layered neural network.

We train the model by feeding it data, which is then passed through hidden layers that divide each picture into custom grids, extract data from each region, and inform the network about its output. Neural networks are described by the number of layers between input and output, that is, by the depth of the network. The most well-known architecture built from such hidden layers is the convolutional neural network (CNN), which pools and pads the data to prepare it for the training model; the trained model is then evaluated on a test dataset.
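As a rough illustration of the padding and pooling steps mentioned above, the sketch below zero-pads a small grid and computes how large a feature map is after a sliding window passes over it. The function names `zero_pad` and `output_size` are hypothetical helpers written for this example, not part of TensorFlow, and the size formula is the standard one for a square window, assumed here rather than taken from the paper.

```python
def zero_pad(image, pad):
    """Surround a 2D grid (a list of rows) with `pad` rings of zeros."""
    width = len(image[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    bottom = [[0] * width for _ in range(pad)]
    body = [[0] * pad + list(row) + [0] * pad for row in image]
    return top + body + bottom


def output_size(n, kernel, pad=0, stride=1):
    """Size along one axis after a kernel-wide window slides over n pixels."""
    return (n + 2 * pad - kernel) // stride + 1


padded = zero_pad([[1, 2], [3, 4]], pad=1)
for row in padded:
    print(row)

# A 5x5 convolution shrinks a 28-pixel axis unless it is padded by 2.
print(output_size(28, kernel=5))          # no padding
print(output_size(28, kernel=5, pad=2))   # padded, size preserved
print(output_size(28, kernel=2, stride=2))  # 2x2 pooling halves the axis
```

Padding is what lets a convolution keep the spatial size of its input, while pooling with a stride deliberately reduces it.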

III. CONVOLUTIONAL NEURAL NETWORK

Convolutional Neural Networks (CNNs) are used in many tasks and perform very well across different applications. Recognition of handwritten digits [9] was one of the first applications in which a CNN architecture was successfully implemented. Since the creation of the CNN, networks have improved continuously through the introduction of new layers and the incorporation of different computer vision techniques [10]. CNNs are widely used in the ImageNet Challenge and with various combinations of sketch datasets [11]. A few studies have compared the detection abilities of human subjects and trained networks on image datasets. The comparison showed that humans achieved a 73.1% accuracy rate on the dataset, whereas the trained network achieved 64% [12]. Similarly, when a CNN was applied to the same dataset it yielded an accuracy of 74.9%, outperforming the human accuracy rate [13]. The methods used mostly rely on the order of the strokes to achieve a better accuracy rate. Ongoing studies aim to understand the behavior of deep neural networks in diverse situations [12]. These studies show how small changes made to a picture can severely change the classification result. The same work also presents images that are completely unrecognizable to human beings but are classified with high confidence by the trained networks [12]. There has been a lot of progress in the field of feature detectors and descriptors, and various algorithms and strategies for object and scene categorization have been created. The resemblance between object detectors, texture filters, and filter banks is often striking.

IV. METHODOLOGY

The main aim of our work is to understand the performance of the networks on static images as well as live video feeds. The first step is to perform transfer learning on the networks with image datasets. This is followed by checking the prediction rate for the same object on static images and on real-time video feeds. The resulting accuracy rates are observed, recorded, and presented in the tables given in later sections. The third essential criterion for assessing performance was whether prediction accuracy differed among the CNNs used in the study.

Note that videos are not used as a training dataset; they are used only as testing datasets. We are therefore trying to find the best image classifier in cases where the object is the main attribute for classifying the scene category. The layers of the convolutional neural network used are:

Input Layer: The first layer of each CNN used is the 'input layer', which takes images and resizes them before passing them on to further layers for feature extraction.
Convolution Layer: The next few layers are 'convolution layers', which operate as image filters, extracting features from the images and computing feature match points during testing.
Pooling Layer: The extracted feature sets are then passed to the 'pooling layer'. This layer takes large images and shrinks them down while preserving the most important information in them. It keeps the maximum value from each window, and so preserves the best fit of each feature within the window.
Rectified Linear Unit Layer: The next layer, the 'Rectified Linear Unit' or ReLU layer, replaces every negative number in the output of the pooling layer with 0. This keeps learned values from getting stuck near 0 or blowing up toward infinity, helping the CNN remain mathematically stable.
Fully Connected Layer: The final layers are the fully connected layers, which take the high-level filtered images and translate them into labeled categories.
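The layer sequence above can be traced by hand on a tiny example. The following is a minimal pure-Python sketch, not the authors' TensorFlow implementation: the 5x5 image, the 2x2 kernel, and the fully connected weights are made-up values chosen only so that each stage is easy to follow, and all function names are hypothetical.

```python
import math


def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


def relu(fmap):
    """Replace every negative value with 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def dense_softmax(features, weights, biases):
    """Fully connected layer followed by softmax class probabilities."""
    logits = [sum(w * x for w, x in zip(ws, features)) + b
              for ws, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy input: brightness falls and then rises from left to right.
image = [[2, 1, 0, 1, 2] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # made-up filter: responds to left-to-right brightening

feature_map = conv2d(image, kernel)   # 4x4 map
pooled = max_pool(feature_map)        # 2x2 map
activated = relu(pooled)              # negatives become 0
features = [v for row in activated for v in row]

# Made-up weights for two illustrative classes.
weights = [[0.0, 0.5, 0.0, 0.5], [0.0, -0.5, 0.0, -0.5]]
biases = [0.0, 0.0]
probs = dense_softmax(features, weights, biases)
print(pooled, activated, probs)
```

The sketch follows the order given in the list (pooling before ReLU). Because max pooling and ReLU are both monotonic, applying them in the opposite order gives the same result.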

The image classification workflow, summarized in the flowchart, is implemented using TensorFlow. Classification begins with exploring and understanding the data. Once the input pipeline has been created, the CNN is trained on the data. For the CNN, testing depends on images of leaves. If the output does not match expectations, the CNN must be retrained to obtain accurate results. The process ends when the output falls into the specified category.

VI. PROPOSED SYSTEM

Our system will operate according to the system architecture diagram, capturing images either through a digital camera or from a database. In the next step, each image is normalised to a predetermined size. To reduce dimensionality, we employ feature extraction approaches such as M-BTC (Block Truncation Coding), histogram equalization, and others; the features extracted from a picture in this way form its feature vector. The processed image is then given to the neural network for use in the classification process.
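Of the preprocessing steps named above, histogram equalization is simple enough to sketch directly. The example below is a minimal pure-Python version under stated assumptions: pixels are 8-bit grayscale values in a flat list, the mapping is the common cumulative-distribution formula, and `equalize_histogram` is a hypothetical name. It is not necessarily the variant the authors used.

```python
def equalize_histogram(pixels, levels=256):
    """Spread gray levels over the full range using the cumulative histogram.

    pixels: flat list of integers in [0, levels - 1].
    """
    if not pixels:
        return []

    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: number of pixels at or below each level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return list(pixels)  # constant image, nothing to stretch

    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# A low-contrast strip: values bunched between 52 and 80.
strip = [52, 52, 60, 60, 70, 70, 80, 80]
print(equalize_histogram(strip))  # [0, 0, 85, 85, 170, 170, 255, 255]
```

After equalization the same four gray levels span the whole 0 to 255 range, which is the contrast stretch the preprocessing step relies on.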

There are many open source frameworks for implementing deep learning. One of the most well-known is TensorFlow, which is the framework used here. Deep learning is about enabling a computer to recognize objects, shapes, and language. It can be thought of in the same way as machine learning, with one difference: in traditional applications, humans manually teach the computer how to recognize the distinguishing properties of objects, whereas deep learning builds a neural network that takes on the task of characterizing an image. A neural network has an input layer, n hidden layers, and an output layer. An image entering the network passes through the n hidden layers, each responsible for performing a particular operation, and the result is produced at the output layer. In this way, instead of manually working out how to classify images, we ask the system to learn by finding patterns across different images and assigning the appropriate classes. Our system also handles the creation of different types of neural networks, which train themselves by observing the patterns in the data. Currently, our system focuses on only four classes (indoor, outdoor, cat, dog). Our system is developed using Python and the TensorFlow framework.
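To make the description concrete, here is a minimal sketch of how a four-class network of this kind might be declared with the Keras API that ships with TensorFlow. The paper does not list its architecture, so the input size (128x128 RGB), the number of filters, and the layer counts below are assumptions chosen for illustration, and `build_classifier` is a hypothetical helper name.

```python
import tensorflow as tf

CLASS_NAMES = ["indoor", "outdoor", "cat", "dog"]  # the four classes named in the text


def build_classifier(input_shape=(128, 128, 3), num_classes=len(CLASS_NAMES)):
    """Small CNN: input, convolution + pooling blocks, then fully connected layers."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),  # scale pixel values to [0, 1]
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model


model = build_classifier()
model.summary()
```

Training would then call `model.fit` on batches of images with integer labels 0 to 3; the dataset and training settings are left out because the paper does not specify them.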

VII. COMPARISON OF IMAGE CLASSIFICATION MODEL AND RESULT

Deep learning includes a variety of image classification models used in real-world applications. Many methods have been developed, and more are emerging. The basics of some other models are therefore summarized here and compared with advanced CNNs.

Deep Neural Networks (DNNs) are used for regression and classification. DNN performance on images is poor because of its low accuracy.
Convolutional Neural Networks (CNNs) have proven very successful in image classification, object identification, recognition, and similar tasks. Here, the results are much better than with a DNN. However, with CNNs the validation loss is high, which indicates overfitting.
Transfer learning is another approach, used to reuse acquired knowledge. Models that have already been trained on large datasets are reused to obtain good results on a related task. Accuracy is high here, and training time is shorter than with the other approaches.

Accuracy and training time can be improved further by adding data augmentation, more epochs, and, most importantly, more layers. An advanced CNN is therefore a good replacement for all of these.
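One reading of "adding data extensions" is data augmentation, in which extra training samples are derived from the existing ones. The sketch below shows the simplest case, a horizontal flip, on images stored as nested lists. The function names are hypothetical, and the paper does not say which augmentations, if any, were applied.

```python
def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus a flipped copy of each."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((horizontal_flip(image), label))
    return augmented


sample = [[1, 2, 3],
          [4, 5, 6]]
dataset = [(sample, "cat")]
bigger = augment(dataset)
print(len(bigger))      # 2
print(bigger[1][0])     # [[3, 2, 1], [6, 5, 4]]
```

The label is unchanged by the flip, so the training set doubles in size without any new annotation work.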

Model	Accuracy rate	Training time	Error rate	Validation loss
DNN	70-80%	6.4 hrs	Very high	7.8
CNN	90%	5.4 hrs	High	3.3
Transfer Learning	92%	12 mins	Low	0.64
Advanced CNN	More than 95%	8 mins	Very low	0.3

Out of 10,118 test cases, our model misclassified a total of 661 images after 300 epochs, giving a detection rate of 93.47%, as shown in Table 1. This is a good result for such a simple model trained on a CPU with a comparatively short training time (about 3 hours).

The test accuracy of the model is 93.47 percent, so the model predicts well. Accuracy grows with the size of the training set: the more data the training set contains, the smaller the training and test errors become, and so the accuracy improves.
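The reported figure follows directly from the counts given in the text, 661 misclassified images out of 10,118 test cases. A short check, with `accuracy_percent` as a hypothetical helper name:

```python
def accuracy_percent(total_cases, misclassified):
    """Share of correctly classified cases, as a percentage."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * (total_cases - misclassified) / total_cases


reported = accuracy_percent(10118, 661)
print(f"{reported:.2f}%")  # 93.47%
```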

Conclusion

In conclusion, this study focuses on image classification using TensorFlow and deep learning. It set three (3) objectives, all of which have been met. The objectives are closely linked to the conclusions, since the conclusions determine whether or not all of the objectives have been achieved. The findings obtained so far are very encouraging. The deep neural network (DNN) is the main focus of this study, particularly in the field of image classification. The DNN technique was investigated in depth, beginning with the assembly of the network and continuing through model training and the classification of images into categories. The number of epochs used in training the DNN made it possible to control accuracy while also preventing issues such as overfitting. The use of TensorFlow to implement deep learning yielded positive results, as it was able to simulate, train, and classify five (5) different varieties of flowers with up to 90% accuracy. Finally, Python was selected as the programming language throughout this study since it is compatible with the TensorFlow framework, which allows the complete system to be designed in Python.

References

[1] https://in.mathworks.com/matlabcentral/fileexchange/59133-neural-network-toolbox-tm--model-for-alexnet-networ
[2] H. Lee, R. Grosse, R. Ranganath, and A.Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.
[3] Deep Learning with MATLAB – matlab expo2018.
[4] Introducing Deep Learning with the MATLAB – Deep Learning E-Book provided by the mathworks.
[5] KISHORE, P.V.V., KISHORE, S.R.C. and PRASAD, M.V.D., 2013. Conglomeration of hand shapes and texture information for recognizing gestures of indian sign language using feed forward neural networks. International Journal of Engineering and Technology, 5(5), pp. 3742-3756.
[6] RAMKIRAN, D.S., MADHAV, B.T.P., PRASANTH, A.M., HARSHA, N.S., VARDHAN, V., AVINASH, K., CHAITANYA, M.N. and NAGASAI, U.S., 2015. Novel compact asymmetrical fractal aperture Notch band antenna. Leonardo Electronic Journal of Practices and Technologies, 14(27), pp. 1-12.
[7] KARTHIK, G.V.S., FATHIMA, S.Y., RAHMAN, M.Z.U., AHAMED, S.R. and LAY-EKUAKILLE, A., 2013. Efficient signal conditioning techniques for brain activity in remote health monitoring network. IEEE Sensors Journal, 13(9), pp. 3273-3283.
[8] KISHORE, P.V.V., PRASAD, M.V.D., PRASAD, C.R. and RAHUL, R., 2015. 4-Camera model for sign language recognition using elliptical fourier descriptors and ANN, International Conference on Signal Processing and Communication Engineering Systems - Proceedings of SPACES 2015, in Association with IEEE 2015, pp. 34-38.
[9] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998) "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86(11): 2278-2324.
[10] Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014) "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15(1): 1929-1958.
[11] Eitz, M., Hays, J., & Alexa, M. (2012) "How do humans sketch objects?" ACM Trans. Graph., 31(4).
[12] Ballester, P., & de Araújo, R. M. (2016, February) "On the Performance of GoogLeNet and AlexNet Applied to Sketches." in AAAI.
[13] Yang, Y., & Hospedales, T. M. (2015) "Deep neural networks for sketch recognition".

Copyright

Copyright © 2022 Ankit Yadav, Ibtesaam Rais, Manoj Kumar, Anurag Sharma, Abhay Kushwaha. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET43385

Publish Date : 2022-05-26

ISSN : 2321-9653

Publisher Name : IJRASET
