Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Preeti Bailke, Krisha Patel, Aayushi Patel, Rohan More, Sudhanshu Pathrabe, Shreyash Patil
DOI Link: https://doi.org/10.22214/ijraset.2023.53481
In recent years, demand for automated food recognition systems has grown, driven by the large variety of dishes available, especially in Indian cuisine, and by increasing awareness of the importance of a healthy diet. Identifying multiple food items in an image is a challenging task, particularly for Indian cuisine, which is known for its diverse range of dishes and ingredients. The goal of this paper is to build a food dish predictor for five common Indian dishes using the state-of-the-art object detection algorithm YOLOv4. The five foods selected for this study are Aloo paratha, Biryani, Poha, Khichdi, and Chapati. A dataset was created by collecting images of the selected dishes from various platforms, such as social media, and organising them into classes; this dataset was then used to train the YOLOv4 model. To train the model to identify the dishes accurately, the dataset was manually labeled. The proposed YOLOv4-based food dish predictor could be put to use in a number of applications. In food delivery services, it can help ensure correct order fulfillment by recognising food plates in real time from customer-provided photos. The predictor's capacity to extract dish information automatically can support menu recognition software, saving restaurant personnel time and enabling rapid menu revisions. Meal recommendation engines can use the predictor to provide personalized meal choices based on user preferences and previous eating experiences. Overall, the YOLOv4-based food dish predictor can help multiple sectors of the food business improve productivity, consumer experience, and decision-making. The findings of this investigation show how well YOLOv4 can detect and identify food items, particularly Indian food items.
I. INTRODUCTION
India has a rich and diverse culinary heritage; with a vast variety of flavors and spices, Indian cuisine is both rich and eclectic. With the growth of meal delivery services and restaurant recommendation systems, there is increasing demand to automatically identify and recognise the food items present in a dish from an image of that dish, especially for Indian cuisine. Such systems can be integrated with several applications to enhance the user experience, such as restaurant recommendation systems, food ordering apps, and recipe websites. For restaurants and catering services, these systems can help with menu planning and cost estimation: by predicting the food items in a meal, chefs and caterers can better plan the ingredients and quantities required, leading to better cost estimation and less food waste. Since ingredients are identified from meals, inventory can be managed according to sales volume, and knowing the size of the inventory further aids cost estimation.
With the help of these systems, users can simply take a picture of a meal and receive information about all the dishes in it, along with their ingredients and nutritional information. By analyzing the food items in a meal and their nutritional information, these systems can provide personalized health and nutrition recommendations and can help people with dietary restrictions, allergies, or health conditions identify suitable meal options. These systems can also educate people about different Indian dishes and their origins, promoting cultural understanding and appreciation. The accuracy and effectiveness of such a system depend on the quality of the models and algorithms used. Systems that predict food dish names from an image can greatly enhance our interactions with food and make food-related tasks more efficient.
Indian food is usually served as a cluster of multiple items, which makes it even more difficult to recognise the name of each item accurately. Several object detection algorithms, such as YOLO, SSD, and R-CNN, have been extensively used to identify multiple objects in photos, including food dishes. These models can identify the location and type of different objects in an image by generating bounding boxes around them.
In this study, a food dish predictor is proposed that uses YOLOv4 to detect five well-known Indian dishes: Aloo paratha, Biryani, Poha, Khichdi, and Chapati. These dishes were chosen for their widespread appeal and variety within Indian cuisine. The proposed system trains a YOLOv4 model to reliably detect and recognise the five food items using a sizable dataset of annotated photos. The dataset was manually annotated to ensure the model is trained on high-quality data, which is essential for accurate predictions.
The YOLOv4-based food dish predictor presented in this research holds significant potential for various applications, such as automated restaurant menu recognition and meal delivery services. By leveraging the capabilities of YOLOv4, this predictor offers an efficient and accurate solution for streamlining food-related processes in diverse settings, ultimately enhancing efficiency and user experience in the domain of Indian food recognition.
II. LITERATURE REVIEW
III. METHODOLOGY
The project started with data preparation: converting the images, removing the unwanted classes from the dataset, and adding the label files so that the YOLOv4 model could use the data for training. The original dataset contained 10 classes of Indian dishes; to keep only the 5 selected classes, we wrote functions to remove the unwanted images, delete the class IDs assigned to the removed classes, and renumber the remaining IDs in the correct order for training. Around 1,000 images per class were taken for training, and the cfg file was changed accordingly: the number of filters and the maximum number of training iterations were set for 5 classes, and the learning-rate steps were set to 80% and 90% of the maximum iterations respectively.
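As a hedged illustration of these cfg edits, the values below follow the darknet repository's published conventions for a 5-class custom model; the exact numbers used in this paper's configuration may differ:

```text
# yolov4-custom.cfg — edits for 5 classes (illustrative, per darknet conventions)
max_batches = 10000        # convention: classes * 2000
steps = 8000,9000          # 80% and 90% of max_batches

# In each of the three [yolo] layers:
classes = 5
# In the [convolutional] layer directly before each [yolo] layer:
filters = 30               # (classes + 5) * 3
```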
After completing the data pre-processing for training, the next step was creating the custom input data files, obj.data and obj.names.
The first file contains the paths needed by the training run: the training list, the testing list, and the directory in which to store the backup weights files. The second file, obj.names, lists the classes in the same order in which they are numbered in the labels. Any additional class or incorrect ordering would lead to wrong training.
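As an illustrative sketch, an obj.data and obj.names pair for the five classes described here might look as follows (the file paths are assumptions, not taken from the paper):

```text
# obj.data
classes = 5
train  = data/train.txt
valid  = data/test.txt
names  = data/obj.names
backup = backup/

# obj.names — one class per line, in label-ID order
Aloo_paratha
Biryani
Poha
Khichdi
Chapati
```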
To split the data into training and testing sets, all image paths are added to a text file. Using pandas, these files are then loaded into data frames, which are divided into 70% for training and 30% for testing.
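The split described above can be sketched in a few lines of pandas. This is a minimal illustration, not the paper's code; the file names and the in-memory path list are assumptions (the paper first collects the paths into a text file, which loads the same way):

```python
import pandas as pd

# Assumed stand-in for the collected image paths (one per line in practice).
image_paths = [f"data/obj/img_{i:04d}.jpg" for i in range(100)]
df = pd.DataFrame({"path": image_paths})

# Shuffle for an unbiased split, then take the first 70% for training.
df = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
cut = int(0.7 * len(df))
train_df, test_df = df.iloc[:cut], df.iloc[cut:]

# Darknet expects plain text files listing one image path per line, e.g.:
# train_df["path"].to_csv("data/train.txt", index=False, header=False)
```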
The Google Colab platform is used for training; YOLOv4 can be trained there using the darknet repository. To train YOLOv4 on a custom dataset, certain files must be changed: all images are placed in the data folder under a sub-folder called obj. This obj folder is later referenced by the names and data files. Once this is done and the obj.data and obj.names files are created, training of the model is started. A GPU is required for training the YOLOv4 model.
A. Algorithm
A CNN can be combined with the predictions of YOLOv4. After YOLOv4 has completed object detection, its output can be fed into a CNN model. One strategy is to use the bounding boxes produced by YOLOv4 to crop the identified items from the original image, and then run the CNN model on the cropped images to extract features.
The following approach can be implemented:
The YOLOv4 algorithm's final output is a set of bounding boxes, each of which represents an object found in the image or video. Each bounding box also carries a class label and a confidence score expressing how certain the network is about the object's classification.
By using a CNN model to extract more specific and discriminative features from the objects, this method may increase the accuracy and specificity of the object detection pipeline. However, because the CNN model must analyse numerous cropped images, it may significantly raise computational cost and memory use.
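The cropping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name is hypothetical, and the boxes are assumed to arrive in YOLO's centre/width/height pixel convention:

```python
import numpy as np

def crop_detections(image, boxes):
    """Crop each YOLOv4 bounding box out of the image for a second-stage CNN.

    `image` is an H x W x 3 array; `boxes` is a list of (x, y, w, h) tuples in
    pixel coordinates, where (x, y) is the box centre (the convention YOLO uses).
    """
    h_img, w_img = image.shape[:2]
    crops = []
    for (x, y, w, h) in boxes:
        # Convert centre/size to corner coordinates, clamped to the image bounds.
        x1 = max(int(x - w / 2), 0)
        y1 = max(int(y - h / 2), 0)
        x2 = min(int(x + w / 2), w_img)
        y2 = min(int(y + h / 2), h_img)
        crops.append(image[y1:y2, x1:x2])
    return crops
```

Each returned crop can then be resized to the CNN's input size and passed through the network for feature extraction.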
B. Flowchart
Figure 1 represents the overall flow of how the names of the food items present in a given image are detected. In step 1, the input image is fed to the YOLOv4 model and the results are stored; the bounding box drawn by YOLOv4 around each detected object is then cropped and serves as input to the second model, a CNN. Finally, the results of both models are combined into one final output.
IV. RESULTS AND DISCUSSION
With the use of customised datasets, the powerful object detection system YOLOv4 can be trained to recognise specific objects. The standard measure of an object detection model's accuracy is mean average precision (mAP), used to evaluate models such as R-CNN and YOLO. The mAP computes a score by comparing each detected box to the ground-truth bounding box; the higher the score, the more precise the model's detections.
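The box comparison underlying mAP is Intersection over Union (IoU): a detection is counted as correct when its IoU with a ground-truth box exceeds a threshold (commonly 0.5). A minimal sketch, assuming corner-coordinate (x1, y1, x2, y2) boxes for illustration:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Intersection area is zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Averaging precision over recall levels per class, and then over classes, yields the mAP figures reported below.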
Figure 2 shows the progress of training: the mAP curve rising and the loss curve falling over the first 3,600 iterations. In this model, mAP is calculated only after the first 1,000 iterations, so the figure shows it starting at a 22% mAP score at 1,000 iterations. Continuing in figure 3, the mAP score reaches 78% after 7,000 iterations and rises to 87% by 9,000 iterations; figure 4, which shows the last 1,000 iterations of training, shows the mAP score settling at 86-87%.
The loss, which gradually decreases across all iterations, can be seen in all three figures 2, 3 and 4. The blue curve represents the training error or loss (more specifically, the Complete Intersection-over-Union, or CIoU, loss used by YOLOv4) on the training dataset.
V. FUTURE SCOPE
Extension of the food dish predictor to recognise other Indian foods and perhaps other cuisines could be the subject of future research. Using larger datasets and implementing additional pre-processing methods to increase the quality of the training images could also help the model perform even better.
VI. CONCLUSION
We have created a food dish predictor using YOLOv4 to recognise five well-known Indian dishes: Aloo paratha, Biryani, Poha, Khichdi, and Chapati. Trained on a sizable dataset of annotated photos, the proposed model demonstrated high accuracy in detecting and identifying Indian food items, localising each item within the image.

Table 1. Loss and mAP values
Iteration   Loss   mAP (%)
1k          3.9    22
2k          3.8    50
3.6k        2.5    69

Table 1 shows the loss and mAP values. Training ran for 3,600 iterations, with the loss and mAP recorded roughly every 1,000 iterations. As training progresses, the loss decreases and the mAP improves, indicating that the model is becoming better at identifying the dishes. The findings of this investigation show how well YOLOv4 recognises and classifies food items, particularly Indian cuisine items. Potential applications of the proposed food dish predictor include meal delivery services, restaurant menu recognition, and food recommendation systems. Overall, a YOLOv4-based food dish predictor has significant implications for the food sector and has the potential to change the way we recognise and identify food dishes.
Copyright © 2023 Preeti Bailke, Krisha Patel, Aayushi Patel, Rohan More, Sudhanshu Pathrabe, Shreyash Patil. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET53481
Publish Date : 2023-05-31
ISSN : 2321-9653
Publisher Name : IJRASET