Billing in a retail outlet typically relies on pre-made barcodes or RFID tags. These are used to track items and identify them during billing, but the approach is time-consuming and labour-intensive because every item must be affixed with a barcode before it can be sold. Most small to medium scale retail stores also make a large part of their sales from items of variable quantity, which cannot carry pre-made barcodes because their quantity is determined dynamically according to the customer’s needs. In this paper, we explore various technologies, including but not limited to computer vision, object detection and image recognition, which have advanced rapidly with the recent boom in machine learning and deep learning. We summarise our findings on technologies that can be used to build efficient and novel systems that avoid the labour-intensive, redundant and time-consuming steps of existing systems, and we explore ways to incorporate items that are packed in variable quantities.
I. INTRODUCTION
Barcodes and RFIDs are widely used to identify products when automating the billing process. While this reduces the time needed to bill an item compared with manual billing, it is not fully efficient because it requires a lot of preparatory work. Every product has to be individually affixed with a distinct barcode. This process is tedious, time-consuming and labour-intensive. Most stores also sell items in variable quantities. These items are packaged during checkout, so they cannot be billed with pre-made barcodes, and creating a new barcode for each package is again a time-consuming and redundant process. Barcodes are so widely used because they make it possible for a machine to identify an entity: a machine can read and differentiate barcodes far more easily than it can interpret natural visual data about the physical appearance of the entity. Machines have a tough time recognizing images and other visual data using conventional technologies, so we have to look for a new paradigm of techniques that changes the way machines work. This is where machine learning comes in. Machine learning addresses the problem because systems built with it do not need hand-written rules for classification and identification tasks. This underlying philosophy can be used to create a billing system that learns to classify and identify the actual product, rather than working with a barcode that is in turn used for product identification. In the rest of this paper, we discuss some of the terminology and mechanisms that can be used to create such a billing system using machine learning and its sub-fields.
II. LITERATURE REVIEW
M. I. Jordan et al. [1] describe machine learning as addressing the question of how to build computers that improve automatically through experience. Bill Gates of Microsoft has been quoted as saying, “A breakthrough in machine learning would be worth ten Microsofts”. Machine learning is seeing a lot of traction and development at present, mainly because of advancements in Big Data [4] and other data analysis and data science fields. Even though machine learning had a relatively early start with the work of Arthur Samuel in 1952, it did not take off at the time because of the lack of data needed to train effective systems. Wo Chang et al. [4] remarked that never before have humans created so much digital data, and that the collective digital footprint of humanity continues to grow almost exponentially. All these factors have led to an unprecedented drive in machine learning, in which the data we currently hold is used to train models that can in turn learn and grow.
Machine learning uses a different working philosophy from conventional programming. In conventional programming, we write rules that the machine uses to turn the input into the desired output. In machine learning, we give the computer enough examples of inputs and outputs that it can create its own rules based on the input-output associations and the trends that accompany them. The system then applies those rules to new data that is similar to the data it was trained on [1]. This gave rise to software systems that create rules for programs by studying data and evolving. Machine learning has seen further advancements in its subfields, especially in deep learning.
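The contrast between hand-written and learned rules can be illustrated with a toy sketch. It is not part of any surveyed system: the product names, the weights and the midpoint-threshold learner are all assumptions chosen only to show a rule being derived from input-output pairs instead of being fixed by the programmer.

```python
from typing import List, Tuple


def handwritten_rule(weight_g: float) -> str:
    """Conventional programming: the programmer fixes the rule (500 g) by hand."""
    return "melon" if weight_g > 500.0 else "apple"


def learn_threshold(samples: List[Tuple[float, str]]) -> float:
    """Toy learner: place the decision threshold midway between the
    mean weights of the two labelled classes."""
    apples = [w for w, label in samples if label == "apple"]
    melons = [w for w, label in samples if label == "melon"]
    return (sum(apples) / len(apples) + sum(melons) / len(melons)) / 2.0


def learned_rule(weight_g: float, threshold: float) -> str:
    """Apply the rule that was derived from the training data."""
    return "melon" if weight_g > threshold else "apple"


# Hypothetical labelled input-output pairs (weight in grams, label).
TRAINING = [(150.0, "apple"), (170.0, "apple"), (1200.0, "melon"), (1400.0, "melon")]
THRESHOLD = learn_threshold(TRAINING)
```

With this made-up data the learned threshold falls between the two groups, so the rule changes automatically if the training examples change, whereas the hand-written rule does not.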
Machine learning can be broadly classified into three categories: reinforcement learning, unsupervised learning and, the most widely used, supervised learning [1]. Our desired system requires supervised learning so that we can implement computer vision and object detection. Supervised learning is widely used in classification tasks, and since image recognition is also a form of classification, supervised learning is the category we need to look into. Machine learning is currently used extensively to build systems in the fields of web search, computational biology, finance, e-commerce, space exploration, robotics, information extraction, social networks and system debugging [2]. Thus, we need to look into how machine learning can be used in the field of billing and inventory management.
Deep learning is a subfield of machine learning that deals with algorithms inspired by the structure and function of the human brain, called artificial neural networks. Yann LeCun et al. [5] explain that deep learning allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have substantially advanced the state of the art in speech recognition, visual object recognition, object detection [2] and many other domains, including drug discovery and genomics.
Ajeet Ram Pathak et al. [2] suggest that object detection is the first step in any visual recognition activity. It is the process of determining the instance of the class to which an object belongs and estimating the location of the object by outputting a bounding box around it. Detecting a single instance of a class in an image is called single-class object detection, while detecting the classes of all objects present in the image is known as multi-class object detection. Various challenges, such as partial or full occlusion, varying illumination conditions, poses and scale, need to be handled while performing object detection. After an object has been successfully detected, the next step is to classify it.
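As a rough illustration of what a detector returns, the sketch below models a detection as a class label, a confidence score and a bounding box, and separates the single-class and multi-class views described above. The `Detection` type, the helper names and the 0.5 score cut-off are assumptions made for this example and are not taken from [2].

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Detection:
    label: str                       # predicted class of the object
    score: float                     # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


def single_class(detections: List[Detection], target: str) -> List[Detection]:
    """Single-class view: keep only the instances of one class."""
    return [d for d in detections if d.label == target]


def multi_class(detections: List[Detection], min_score: float = 0.5) -> Set[str]:
    """Multi-class view: all classes present with sufficient confidence."""
    return {d.label for d in detections if d.score >= min_score}


def box_area(d: Detection) -> int:
    """Area of the bounding box in pixels (0 for degenerate boxes)."""
    x0, y0, x1, y1 = d.box
    return max(0, x1 - x0) * max(0, y1 - y0)


# Hypothetical detector output for one checkout image.
FRAME = [
    Detection("soap_bar", 0.91, (10, 20, 110, 80)),
    Detection("loose_rice", 0.78, (150, 40, 300, 200)),
    Detection("soap_bar", 0.30, (400, 400, 420, 410)),
]
```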
Deepika et al. [6] describe image classification as the technique by which a computer analyses an image and identifies the 'class' the image falls under. A class is an abstract set of rules that can be used to distinguish images from one another. Images are first analysed together to find trends and form classes. These classes can later be used to classify a new image. The class is decided by picking the one with the highest probability.
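The final step, picking the class with the highest probability, can be sketched as follows. The use of softmax to turn raw scores into probabilities is an assumption on our part (it is one common choice and is not stated in [6]), and the class names and scores are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1.
    Subtracting the maximum first keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: Dict[str, float]) -> str:
    """Return the label whose probability is highest."""
    labels = list(scores)
    probs = softmax([scores[label] for label in labels])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]


# Hypothetical raw classifier scores for one image.
RAW_SCORES = {"rice": 0.2, "lentils": 2.5, "sugar": 1.1}
PREDICTED = classify(RAW_SCORES)
```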
One of the most famous deep neural networks is the Convolutional Neural Network (CNN). It is a widely explored area of computer vision and, with the help of deep learning, has achieved noteworthy results in worldwide competitions such as ILSVRC, PASCAL VOC and Microsoft COCO [3]. It takes its name from a mathematical linear operation between matrices called convolution. CNNs perform very well on machine learning problems, especially in applications that deal with image data, such as the largest image classification dataset (ImageNet), computer vision, object detection and natural language processing, and the results achieved have been impressive [3]. The results on ILSVRC 2010 show that a system built on convolutional neural networks was considerably better than any previous classification system, achieving a top-5 error rate of 17%, the lowest recorded at the time [3].
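To make the convolution operation concrete, the sketch below slides a small kernel over an image matrix and sums the element-wise products at each position. It is a minimal, unoptimised illustration under stated assumptions: no padding, a stride of 1, and the kernel is not flipped (strictly a cross-correlation, which is how the operation is usually computed in CNN layers). The function name and the example values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    matrix of summed element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# A 3x3 example image and a 2x2 kernel that responds to diagonal change.
IMAGE = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
KERNEL = [[1.0, 0.0],
          [0.0, -1.0]]
FEATURE_MAP = conv2d_valid(IMAGE, KERNEL)
```

In a CNN the kernel values are not chosen by hand as they are here; they are learned during training.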
R. Girshick [8] proposed an upgraded version of the convolutional neural network called Fast R-CNN, which stands for Fast Region-based Convolutional Neural Network. The technique uses selective search to extract only about 2000 regions from the image, which are called region proposals. Instead of attempting to classify an immense number of regions, the network works with just these 2000 regions, which reduces the workload significantly.
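The scale of this saving can be seen with some simple arithmetic. The sketch below counts the candidate windows of an exhaustive sliding-window search and compares the total with the 2000 region proposals mentioned above. The image size, window sizes and stride are assumptions picked for illustration and do not come from [8].

```python
from typing import Iterable


def count_sliding_windows(width: int, height: int,
                          window_sizes: Iterable[int], stride: int) -> int:
    """Number of square windows visited by an exhaustive sliding-window
    search over an image, summed over all window sizes."""
    total = 0
    for size in window_sizes:
        if size > width or size > height:
            continue  # window does not fit in the image
        cols = (width - size) // stride + 1
        rows = (height - size) // stride + 1
        total += cols * rows
    return total


REGION_PROPOSALS = 2000  # figure quoted in the text

# Hypothetical setting: 640x480 image, three window sizes, stride of 4 pixels.
EXHAUSTIVE = count_sliding_windows(640, 480, (32, 64, 128), 4)
```

Under these assumed settings the exhaustive search visits about 44,000 windows, roughly 22 times as many as the region proposals.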
S. Ren et al. [9], a team that included R. Girshick, proposed a further upgrade to Fast R-CNN called Faster R-CNN (Faster Region-based Convolutional Neural Network). It addressed the drawbacks of R-CNN and can be used for object detection with faster runtimes. The methodology is similar to the R-CNN algorithm. However, rather than feeding the region proposals to the CNN, the input image is fed to the CNN to create a convolutional feature map, and the region proposals are then predicted from this feature map by a region proposal network.
Kaiming He et al. [10], together with R. Girshick [8, 9] and other researchers, created another improved variant of the convolutional neural network called Mask R-CNN. It quickly became the state of the art for image segmentation problems. It is a deep neural network variant that detects objects in an image and generates a segmentation mask for each instance. Mask R-CNN thus uses image segmentation to implement object detection effectively.
Golnaz Ghiasi et al. [11] proposed another deep neural network design called Neural Architecture Search Feature Pyramid Network, or NAS-FPN. A feature pyramid is a representation for deep learning that combines low-resolution but semantically strong features with semantically weak but high-resolution features through top-down and lateral connections. NAS-FPN is an automated neural architecture search method that focuses on finding optimal connections between different layers for pyramidal representations. It was trained on the MS COCO dataset and therefore generalizes well to a large number of natural use cases.
III. PROPOSED SYSTEM
After a thorough analysis of the technologies that could replace the current system of barcodes and RFIDs, we propose a system that uses machine learning and convolutional neural networks to perform product classification. Products will be identified from a live video feed. The product need not be changed or altered in order to be identified, because all of the processing work needed to identify it is done once and remains valid until the product changes its recognizable appearance. This system is less labour-intensive and time-intensive, and it reduces the cost of maintenance and operation for small to medium scale retail outlets. Many grocery shops that sell non-pre-packaged products can now incorporate them into the billing process, which would not have been possible with the current system.
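A minimal sketch of how the billing step of such a system might combine a classifier's label with a catalogue is given below. It is an illustration under stated assumptions, not the implementation of the proposed system: the catalogue, the prices, the function names and the idea that a variable-quantity item arrives with a weight reading (for example from a checkout scale) are all hypothetical, and the label is assumed to come from a trained CNN classifier that is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Product:
    name: str
    unit_price: int        # price in minor currency units (e.g. cents)
    sold_by_weight: bool   # True: unit_price is per kilogram


# Hypothetical catalogue keyed by the label the classifier would output.
CATALOGUE: Dict[str, Product] = {
    "soap_bar": Product("soap_bar", 3000, sold_by_weight=False),
    "loose_rice": Product("loose_rice", 6000, sold_by_weight=True),
}


def line_total(label: str, weight_g: Optional[int] = None) -> int:
    """Price of one billed item. Variable-quantity items are priced by the
    measured weight; pre-packaged items use the fixed unit price."""
    product = CATALOGUE[label]
    if product.sold_by_weight:
        if weight_g is None:
            raise ValueError(f"{label} is sold by weight; a weight is required")
        return round(product.unit_price * weight_g / 1000)
    return product.unit_price


def bill(items: List[Tuple[str, Optional[int]]]) -> int:
    """Total for a list of (classifier label, weight in grams or None)."""
    return sum(line_total(label, weight_g) for label, weight_g in items)


# Example basket: one packaged item and 750 g of a loose item.
TOTAL = bill([("soap_bar", None), ("loose_rice", 750)])
```

Keeping the catalogue lookup separate from the classifier means a new product only needs a catalogue entry and training images, not a physical label on every unit.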
IV. CONCLUSION
With the continuous development of powerful computing equipment, object detection technology based on deep learning has also developed rapidly. This helps people implement more efficient versions of systems we already have. One of the areas we are focusing on is semi-autonomous billing. After reviewing some of the commonly used detection algorithms, the technologies needed for our proposed system and the related work, we conclude that a system which uses machine learning to detect and identify products during billing can be implemented and used efficiently to bill items and to incorporate variable-quantity items into the semi-automated process.
REFERENCES
[1] Jordan, M.I., Mitchell, T.M. (2015), “Machine learning: trends, perspectives, and prospects”. In Science (vol. 349, issue 6245).
[2] Ajeet Ram Pathak, Manjusha Pandey, Siddharth Rautaray (2018), “Application of Deep Learning for Object Detection”. In Procedia Computer Science (vol. 132, pages 1706-1717).
[3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. (2012). “Imagenet Classification with Deep Convolutional Neural Networks”. In Advances in Neural Information Processing Systems, (pages 1097–1105).
[4] Chang, Wo L (2018), “NIST Big Data Interoperability Framework: Volume 3, Use Cases and General Requirements”. In National Institute of Standards and Technology (NIST) Special Publication 1500-3r1
[5] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015), “Deep learning”. In Nature (vol. 521, pages 436-444).
[6] Jaswal, Deepika & Vishvanathan, Sowmya & Kp, Soman (2014), “Image Classification Using Convolutional Neural Networks”. In International Journal of Scientific and Engineering Research (vol. 5, pages 1661-1668).
[7] Meiyin Wu and Li Chen (2015), “Image recognition based on deep learning”. In 2015 Chinese Automation Congress (CAC) (pages 542-546).
[8] R. Girshick (2015), “Fast R-CNN”. In Proc. IEEE Int. Conf. Comput. Vis. (ICCV) (pages 1440-1448).
[9] S. Ren, K. He, R. Girshick, and J. Sun (2017), “Faster R-CNN: Towards real-time object detection with region proposal networks”. In IEEE Trans. Pattern Anal. Mach. Intell. (vol. 39, no. 6, pages 1137-1149).
[10] K. He, G. Gkioxari, P. Dollár, and R. Girshick (2017), “Mask R-CNN”. In Proc. IEEE ICCV (pages 2980-2988).
[11] G. Ghiasi, T.-Y. Lin, R. Pang, and Q. V. Le (2019), “NAS-FPN: Learning scalable feature pyramid architecture for object detection”. In Proc. IEEE/CVF CVPR (pages 7029-7038).