Identification of Medicinal Plants using Deep Learning

Authors: R. Upendar Rao, M. Sai Lahari, K. Pavana Sri, K. Yaminee Srujana, D. Yaswanth

DOI Link: https://doi.org/10.22214/ijraset.2022.41190

Abstract

Correct identification of the medicinal plants that go into the preparation of a medicine is very important in the ayurvedic, folk and herbal medicine industries. The main features required to identify a medicinal plant are its leaf shape, color and texture. Color and texture from both sides of the leaf contain distinguishing parameters for identifying the species. In this project we explore feature vectors from both the front and back sides of a green leaf, along with morphological features, to arrive at an optimum combination of features that maximizes the identification rate. A database of medicinal plant leaves is created from scanned images of the front and back sides of leaves of commonly used medicinal plants. The leaves are classified based on a combination of shape and dimension features. It is expected that this system for the automatic identification of medicinal plants will help local communities develop their knowledge of medicinal plants, help taxonomists develop more efficient species identification techniques, and contribute significantly to pharmaceutical drug manufacturing.

I. INTRODUCTION

A. Introduction

The world bears thousands of plant species, many of which have medicinal value, others of which are close to extinction, and still others of which are harmful to humans. Plants are not only an essential resource for human beings; they also form the base of all food chains. Medicinal plants are used mostly in the manufacture of herbal, ayurvedic and folk medicines.

Herbal plants are plants that can be used as natural alternatives for curing diseases. About 80% of people in the world still depend on traditional medicine. Herbal plants are plants whose parts (leaves, stems, or roots) have properties that can be used as raw materials in making modern or traditional medicines. These medicinal plants are often found in the forest. There are various types of herbal plants, and they can be identified in several ways, one of which is identification through the leaves. To protect plant species, it is crucial to study and classify plants correctly. More than 8000 plants of Indian origin have been found to be of medicinal value. Combinations of a small subset of about 1500 of these plants are used in the herbal medicines of the different systems of India. Specifically, commercial Ayurvedic preparations use 500 of these plants. Over 80% of the plants used in ayurvedic formulations are collected from forests and wastelands, whereas the remainder are cultivated on agricultural land.

B. Motivation

In the ancient past, Ayurvedic physicians themselves picked the medicinal plants and prepared the medicines for their patients. Today only a few practitioners follow this practice. The manufacturing and marketing of Ayurvedic drugs has become a thriving industry whose turnover exceeds Rs. 4000 crores. The number of licensed Ayurvedic medicine manufacturers in India easily exceeds 8500. This commercialization of the Ayurvedic sector has brought into focus several questions regarding the quality of the raw materials used for Ayurvedic medicines. Today the plants are collected from forest areas by women and children who are not professionally trained in identifying the correct medicinal plants. Manufacturing units often receive incorrect or substituted medicinal plants, and most of these units lack adequate quality control mechanisms to screen them. In addition, confusion due to variations in local names is rampant. Some plants arrive in dried form, which makes manual identification much more difficult. Incorrect use of medicinal plants makes the Ayurvedic medicine ineffective and may also produce unpredictable side effects. In this situation, strict quality control measures must be enforced on Ayurvedic medicines and on the raw materials used by the industry, in order to sustain the present growth of the industry by maintaining the efficacy and credibility of the medicines.

A trained botanist looks at all the available features of a plant, such as leaves, flowers, seeds, roots and stems, to identify it. Except for the leaf, all of these are 3D objects and increase the complexity of analysis by computer. Plant leaves, however, are 2D objects and carry sufficient information to identify the plant. Leaves can be collected easily, and image acquisition may be carried out using inexpensive digital cameras, mobile phones or document scanners. Leaves are available at any time of the year, in contrast to flowers and seeds. A leaf acquires a specific colour, texture and shape as it grows, and subsequent changes are relatively insignificant. Plant recognition based on leaves depends on finding exact descriptors and extracting feature vectors from them. The feature vectors of the training samples are then compared with the feature vector of the test sample to find the degree of similarity, using an appropriate classifier.

C. Problem Definition

Deep learning is one of the major subfields of machine learning. It is the study and design of algorithms inspired by the structure of the human brain. Deep learning is becoming more popular in fields such as robotics, artificial intelligence (AI), audio and video recognition, and image recognition. The artificial neural network is the core of deep learning methodologies. Deep learning is supported by various libraries such as Theano, TensorFlow, Caffe and MXNet. Keras is one of the most powerful and easy-to-use Python libraries for creating deep learning models; it is built on top of popular deep learning libraries such as TensorFlow and Theano. Detection of the correct medicinal leaves can help botanists, taxonomists and drug manufacturers produce quality drugs and can reduce the side effects caused by delivery of the wrong drug. To identify the leaves of the plants, a type of artificial neural network called a convolutional neural network (CNN) is used. The architecture used here is DenseNet121, a convolutional neural network capable of achieving high accuracy on challenging datasets.
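As an illustration of how a DenseNet121 classifier might be assembled in Keras, the following is a minimal sketch and not the authors' code. The helper name `build_leaf_classifier`, the 224 x 224 input size, training from scratch (`weights=None`) and the categorical cross-entropy loss are assumptions; the 50 output classes, the Softmax output layer, the Adam optimizer and the learning rate of 0.001 follow the description elsewhere in this paper.

```python
def build_leaf_classifier(num_classes=50, image_size=224):
    """Build a DenseNet121-based classifier (hypothetical helper, not the authors' code)."""
    # TensorFlow is imported lazily so this module can be loaded without it.
    import tensorflow as tf

    # DenseNet121 backbone without its ImageNet classification head.
    # weights=None trains from scratch; the paper does not state whether
    # pretrained weights were used, so this is an assumption.
    backbone = tf.keras.applications.DenseNet121(
        include_top=False,
        weights=None,
        input_shape=(image_size, image_size, 3),
        pooling="avg",  # global average pooling gives one feature vector per image
    )
    model = tf.keras.Sequential([
        backbone,
        # One Softmax output per plant species.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",  # assumed loss for one-hot labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_leaf_classifier()` requires TensorFlow to be installed; the returned model could then be trained with `model.fit` on resized leaf images and one-hot species labels.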

II. SYSTEM ANALYSIS

A. Existing System

To overcome the problems of conventional classification algorithms in recognizing medicinal plants, Kan et al. [2017] suggested an automated classification approach based on leaf images of medicinal herbs. The technique first preprocesses the medicinal plant leaf images, then calculates 10 shape features and 5 texture features, and finally identifies the leaves using a support vector machine (SVM) classifier. The classification model was used to identify leaf images of twelve different medicinal plants, with an average success rate of 93.3%. The findings demonstrate that, by combining multi-feature extraction from leaf images with an SVM, it is possible to categorize medicinal herbs automatically. The report presents a useful conceptual framework for the research and development of medicinal herb classification models.

Alimboyong et al. [2018] suggested a computer vision method for recognizing ayurvedic medicinal plant species found in India's Western Ghats. The suggested approach uses a combination of surface plot and HOG features derived from leaf images, together with a k-NN classifier. The reported results appear adequate for developing applications for real-world use. Prasad and Singh [2017] developed a method for transferring information from object recognition to plant genetic analysis, using deep features to describe the original plant leaf image. In experiments, these deep features were shown to outperform the current state of the art in plant species identification. The study also demonstrated a novel and effective leaf acquisition method. The image is translated into the device-independent Lab color space, which is then used to construct the VGG-16 feature map. To boost the effectiveness of species identification, this feature map is re-projected onto a PCA subspace. The study uses two kinds of plant leaf collections to demonstrate robustness.

Turkoglu and Hanbay [2019] employed image processing methods based on color, vein features, Fourier descriptors, and the Gray-Level Co-occurrence Matrix (GLCM) to extract features from images. Rather than obtaining features for the entire leaf, the study advises using features collected from leaves divided into two or four pieces. An Extreme Learning Machine (ELM) classifier is used to measure the separate and combined performance of each feature extraction approach. The Flavia leaf database was used to test the proposed method. The efficiency of the technique was evaluated using 10-fold cross-validation and was then compared, in tabulated form, with approaches from other studies. On the Flavia leaf database, the accuracy of the suggested method was evaluated to be 99.10%.

Nuril Aslina, Nursuriati Jamil et al. used the Scale Invariant Feature Transform (SIFT) as a shape descriptor, together with colour moments. The image is decomposed into HSV planes and each plane is divided into 9 grids. Colour moments are calculated for each grid of every plane and used as the feature vector. The least Euclidean distance between the test and training sets is used for identification. The database was created by the authors by acquiring 40 leaf images of Malaysian herbs from their natural habitat in natural light. An accuracy of 87.5% is obtained, independent of scaling and rotation of the images. SIFT is computationally intensive when used to extract key point features.
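The colour-moment matching described above can be sketched in plain Python. This is an illustrative sketch, not the cited authors' implementation: it assumes the three colour moments are the mean, standard deviation and skewness, that the 9 grids form a 3 x 3 layout, and that each HSV plane is given as a 2D list of numbers. All function names are hypothetical.

```python
import math

def grid_cells(plane, rows=3, cols=3):
    """Split a 2D list (one colour plane) into rows*cols rectangular cells."""
    height, width = len(plane), len(plane[0])
    cells = []
    for r in range(rows):
        for c in range(cols):
            r0, r1 = r * height // rows, (r + 1) * height // rows
            c0, c1 = c * width // cols, (c + 1) * width // cols
            cells.append([plane[y][x] for y in range(r0, r1) for x in range(c0, c1)])
    return cells

def colour_moments(values):
    """Return (mean, standard deviation, skewness) of a list of pixel values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # Signed cube root keeps the skewness in the same units as the pixel values.
    skewness = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, math.sqrt(variance), skewness

def feature_vector(planes):
    """Concatenate the moments of every grid cell of every plane (H, S and V)."""
    features = []
    for plane in planes:
        for cell in grid_cells(plane):
            features.extend(colour_moments(cell))
    return features

def nearest_label(test_vector, training_set):
    """Return the label of the training vector with the least Euclidean distance."""
    best = min(training_set, key=lambda item: math.dist(test_vector, item[1]))
    return best[0]
```

With three planes, nine cells per plane and three moments per cell, each image is described by 81 numbers; `nearest_label` then takes a list of `(label, vector)` pairs built from the training images.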

GLCM: A plant identification method was developed using texture features alone. Ten texture features are extracted from the GLCM of the leaf image. These features are used for classification by a dissimilarity method. The model was trained with 63 samples and tested with 33 leaves. As it was tested only on a limited number of samples, the accuracy was low, and the method was not fully invariant to rotation of the leaf.
Root Mean Square Error: In this method the leaf shape features and edges are extracted using predefined structural elements. The root mean square error is computed between the feature vector of the captured image and those of the images in the trained database. Ten different plant species are considered, with 40 leaf samples in the dataset. The identification rate is too low for practical use.
SIFT: SIFT is used as a shape descriptor, together with color moments. The image is divided into HSV planes and each plane into 9 parts. Color moments are used as feature vectors. The least Euclidean distance between the training and testing data sets is used for identification. 40 leaf species are considered.
Geometric Features: In this method five geometric features and a boundary descriptor named the Directional Fragment Histogram are used for identification. 3070 scanned images and 897 scan-like images are used. Another method used four geometrical features, hue-invariant methods and polar Fourier transform coefficients as shape descriptors for identification. The colour descriptors used are the mean, standard deviation, skewness and kurtosis of the RGB color planes. 100 samples are considered, of which 50 images are used for testing. This method is not fully invariant to rotation.

B. Proposed System

A novel method is proposed for identifying medicinal plants from images taken at different angles of both the front and back sides of the leaves. The work is based on a database of leaf images of medicinal plants. Unique combinations of texture, shape and morphological features that maximize the identification rate of green leaves have been identified. When an image of any plant leaf is given to the system, it reports whether the leaf belongs to a medicinal plant and displays the local name, the scientific name, and the properties of the leaf or the disease it cures, along with the image of the leaf. The DenseNet type of convolutional neural network (CNN) is used because of its several compelling advantages: it strengthens feature propagation and encourages feature reuse, which in turn increases efficiency and decreases the validation loss. Keras is used to train the model on the data.

Data: Leaf samples of medicinal plants were collected, with at least 30 leaves for each of 50 different medicinal plant species. After basic manual sampling to remove severely damaged leaves, 30 leaves of each species were selected for scanning. Both the top and bottom faces of each leaf were captured, generating 60 image samples per medicinal plant species. The table below lists the leaves collected.

S.No	Local name	Scientific name
1.	Asthma plant	Euphorbia hirta
2.	Avaram	Senna auriculata
3.	Balloon vine	Cardiospermum halicacabum
4.	Bellyache bush (Green)	Jatropha gossypiifolia
5.	Benghal day flower	Commelina benghalensis
6.	Big caltrops	Gokru bada
7.	Black honey shrub	Phyllanthus reticulatus
8.	Bristly wild grape	Cyphostemma setosum
9.	Butterfly pea	Clitoria ternatea
10.	Cape gooseberry	Physalis peruviana
11.	Coat buttons	Tridax procumbens
12.	Common wireweed	Sida acuta
13.	Country mallow	Sida cordifolia
14.	Crown flower	Calotropis gigantea
15.	Green chireta	Andrographis paniculata
16.	Heart leaved moon seed	Tinospora cordifolia
17.	Holy basil	Ocimum tenuiflorum
18.	Indian copperleaf	Acalypha indica
19.	Indian jujube	Ziziphus mauritiana
20.	Indian sarsaparilla	Hemidesmus indicus
21.	Indian stinging nettle	Urtica dioica
22.	Indian thorn apple	Datura inoxia
23.	Indian worm weed	Artemisia absinthium
24.	Ivy gourd	Coccinia grandis
25.	Kokilaksha	Hygrophila auriculata
26.	Land caltrops	Tribulus terrestris
27.	Madagascar periwinkle	Catharanthus roseus
28.	Madras pea pumpkin	Cucumis maderaspatanus
29.	Malabar catmint	Anisomeles indica
30.	Mexican mint	Coleus amboinicus
31.	Mexican prickly poppy	Argemone mexicana
32.	Mountain knotgrass	Aerva lanata
33.	Nalta jute	Corchorus olitorius
34.	Night blooming cereus	Selenicereus grandiflorus
35.	Panicled foldwing	Dicliptera paniculata
36.	Prickly chaff flower	Achyranthes aspera
37.	Punarnava	Boerhavia diffusa
38.	Purple fruited eggplant	Solanum torvum
39.	Purple tephrosia	Tephrosia purpurea
40.	Rosary pea	Abrus precatorius
41.	Shaggy button weed	Spermacoce hispida Linn.
42.	Small water clover	Marsilea minuta
43.	Spider wisp	Cleome gynandra
44.	Square stalked vine	Cissus quadrangularis
45.	Stinking passion flower	Passiflora foetida
46.	Sweet basil	Ocimum basilicum
47.	Sweet flag	Acorus calamus
48.	Tinnevelly senna	Alexandrian senna
49.	Trellis vine	Liana
50.	Velvet bean	Mucuna pruriens

Table 2.1 Leaf dataset used for training

III. ARCHITECTURE

A. Algorithm

Step 1: Input is taken in two ways: from a camera or from stored images.
Step 2: The front and back sides of the leaves are scanned using a scanner at the maximum possible resolution. These images are stored in the leaf image dataset.
Step 3: The images are pre-processed. The dimensions of the images in the dataset are set to the required size.
Step 4: The pre-processed dataset is divided into a training dataset and a testing dataset.
Step 5: The training dataset is fed as input to the convolutional neural network.
Step 6: The output of the CNN, along with the testing dataset, is provided as input for the performance assessment. In this step the accuracy and loss of the model on the training and validation sets are considered, and the accuracy and loss graphs are plotted accordingly, together with the confusion matrix.
Step 7: The result of the output layer of the convolutional neural network is displayed with the image.
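Step 4 of the algorithm above can be illustrated with a short sketch. The paper does not state the split ratio, so the 80/20 split, the fixed random seed, the function name `split_dataset` and the file names are all assumptions made for illustration.

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle (path, label) pairs and split them into training and testing lists."""
    shuffled = list(samples)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file names: 3 species with 60 scanned images each.
samples = [(f"leaves/species_{s}/scan_{i}.png", s) for s in range(3) for i in range(60)]
train, test = split_dataset(samples)
print(len(train), len(test))
```

A per-species (stratified) split would keep the class balance identical in both subsets; the simple shuffle shown here only approximates it.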

B. Flowchart

IV. SOFTWARE

Python 3.9 is used, with Jupyter Notebook as the IDE. Keras is used to train the model. It is a high-level neural network library that trains the deep learning model over a number of epochs using backpropagation. In each epoch the training data is passed through the network in batches; training proceeds iteratively, seeking minimum loss and maximum accuracy. DenseNet is the type of CNN used, and TensorFlow is the library used for the numerical computations in DenseNet. TensorFlow is an open-source software library that performs computations using dataflow graphs and provides multiple application interfaces. The activation function used in the first layer of the CNN is ReLU, and the activation function used in the last layer is Softmax. ReLU is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise. Softmax scales its input values to the range 0 to 1 so that they sum to 1, i.e., it normalizes the output into class probabilities. The optimizer used is Adam, with a learning rate of 0.001. Adam is a stochastic gradient descent method based on adaptive estimation of first-order and second-order moments.
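The activation functions and the optimizer named above can be written out in a few lines of plain Python. This is a sketch for illustration only, not the Keras implementation: the function names are hypothetical, and the values of `beta1`, `beta2` and `eps` are commonly used defaults that the paper does not specify. Only the learning rate of 0.001 comes from the text.

```python
import math

def relu(x):
    """Output the input if it is positive, otherwise zero."""
    return x if x > 0 else 0.0

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

print(softmax([1.0, 2.0, 3.0]))
print(adam_step(1.0, 2.0, 0.0, 0.0, 1))
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.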

V. OUTPUT

A. Outputs For Image Dataset

1. Training the Data: The model is trained on the dataset of 3777 images over a number of epochs, and the accuracy and loss are calculated for each epoch. The number of epochs is the number of times the learning algorithm sees the complete dataset; one epoch is completed when the entire dataset has been passed forward and backward through the neural network once. Loss is the summation of errors made by the model, and accuracy is the ratio of correct predictions to total predictions. The model weights are updated through backpropagation.

2. Displaying the Training Images: During the training process, some of the images used for training are displayed.

3. Input and Output: The path of an image from the test data is supplied and the program is run. The output image is displayed together with its local name, its scientific name, and the properties of the leaf or the disease it cures.

4. Output 1

Local name – Butterfly pea

Scientific name – Clitoria ternatea

Uses – rich in antioxidants

5. Output 2

Local name – Balloon vine

Scientific name – Cardiospermum halicacabum

Uses – treats nervous disorders and stiffness in limbs

6. Output 3

Local name – Bellyache bush

Scientific name – Jatropha gossypiifolia

Uses – contains antimicrobial, antidiabetic properties

7. Output 4

Local name – Purple fruited pea eggplant

Scientific name – Solanum torvum

Uses – rich in antioxidants, used in herbal teas and cosmetics

8. Output 5

Local name – Asthma plant

Scientific name – Euphorbia hirta

Uses – cures breathing disorders

9. Output 6

Local name – Big caltrops

Scientific name – Gokru bada

Uses – Used in treatment of asthma, cough, edema

B. Output For Camera Capturing

1. Input - Capturing the Leaf Using Camera: When the code is run, the capture window opens so that the leaf can be captured, as shown in the figure below.

2. Capturing the Frames: The leaf is placed in front of the camera, and the frame is saved as input for the model when the spacebar is pressed. When a frame is saved, the message “opencv_frame_0.png written!” is displayed. A maximum of three frames can be captured.

3. Displaying the Output: After the frames have been captured and read into the model, the escape key is pressed to close the camera window. The closure of the camera window is reported as “escape hit, closing…”. The leaf name is displayed immediately after the camera window is closed, along with the number of frames captured.

4. Output-1

A Bellyache bush (Green) leaf is considered here and is captured using the camera. The camera displays the output as shown in the figure.

5. Output-2

Mexican mint leaf is considered here and is captured using camera. The camera displays the output as shown in the figure.

6. Output-3

A Nalta jute leaf is considered here and is captured using the camera. The camera displays the output as shown in the figure.
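The capture procedure described in this subsection (spacebar to save a frame, escape to close, at most three frames) might look like the following OpenCV sketch. It is a hedged reconstruction based only on the messages quoted above, not the authors' script; the function names, the window title and the use of camera index 0 are assumptions.

```python
ESC_KEY, SPACE_KEY = 27, 32
MAX_FRAMES = 3  # the text states that at most three frames can be captured

def handle_key(key, frames_written, max_frames=MAX_FRAMES):
    """Map a key code to an action: 'quit', 'save', or 'ignore'."""
    if key == ESC_KEY:
        return "quit"
    if key == SPACE_KEY and frames_written < max_frames:
        return "save"
    return "ignore"

def capture_frames(camera_index=0):
    """Show a live preview and save a frame each time the spacebar is pressed."""
    import cv2  # imported lazily; OpenCV is only needed when a camera is used

    cam = cv2.VideoCapture(camera_index)
    saved = []
    try:
        while True:
            ok, frame = cam.read()
            if not ok:  # no camera or the stream ended
                break
            cv2.imshow("capture", frame)
            action = handle_key(cv2.waitKey(1) & 0xFF, len(saved))
            if action == "quit":
                print("escape hit, closing...")
                break
            if action == "save":
                name = f"opencv_frame_{len(saved)}.png"
                cv2.imwrite(name, frame)
                print(f"{name} written!")
                saved.append(name)
    finally:
        cam.release()
        cv2.destroyAllWindows()
    return saved
```

Keeping the key handling in a separate function makes the three-frame limit easy to check without a camera attached; the saved file names returned by `capture_frames` would then be passed to the classifier.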

C. Training And Validation Accuracy Graph

After training, the training accuracy and validation accuracy are plotted against the 30 epochs. Here the blue line represents validation accuracy and the red line represents training accuracy. Accuracy is calculated using the formula below.

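$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

Number of correct predictions – positive predictions

Total number of predictions – sum of positive and negative predictions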

Here positive predictions are the correctly predicted values according to the desired input and negative predictions are the error values or the incorrectly predicted values.
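As a small worked example of this ratio, the sketch below counts matching labels in two lists. The function name and the sample labels are hypothetical and chosen only to illustrate the calculation.

```python
def accuracy(true_labels, predicted_labels):
    """Ratio of correct (positive) predictions to the total number of predictions."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

truth = ["Holy basil", "Ivy gourd", "Sweet flag", "Holy basil"]
preds = ["Holy basil", "Ivy gourd", "Holy basil", "Holy basil"]
print(accuracy(truth, preds))  # 3 correct out of 4 predictions
```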

The values of the training accuracy and validation accuracy for each epoch are listed in the table below.

Epoch	Model accuracy	Validation accuracy
1	0.178	0.444
2	0.436	0.600
3	0.557	0.665
4	0.620	0.739
5	0.682	0.772
6	0.701	0.784
7	0.736	0.796
8	0.748	0.798
9	0.770	0.804
10	0.782	0.808
11	0.793	0.821
12	0.811	0.8240
13	0.832	0.8246
14	0.830	0.829
15	0.849	0.846
16	0.835	0.813
17	0.863	0.843
18	0.867	0.839
19	0.876	0.843
20	0.871	0.836
21	0.887	0.847
22	0.874	0.837
23	0.895	0.854
24	0.891	0.839
25	0.898	0.826
26	0.908	0.848
27	0.908	0.845
28	0.904	0.844
29	0.902	0.843
30	0.916	0.841

Table 5.1 Epoch vs model accuracy and validation accuracy
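The values in Table 5.1 can be examined with a few lines of Python to find where validation accuracy peaks and how far it trails training accuracy at the end of training. This is an illustrative sketch rather than part of the original experiment; the epoch-30 training value is taken as 0.916.

```python
# Accuracy per epoch, copied from Table 5.1 (epoch 30 training value read as 0.916).
train_acc = [0.178, 0.436, 0.557, 0.620, 0.682, 0.701, 0.736, 0.748, 0.770, 0.782,
             0.793, 0.811, 0.832, 0.830, 0.849, 0.835, 0.863, 0.867, 0.876, 0.871,
             0.887, 0.874, 0.895, 0.891, 0.898, 0.908, 0.908, 0.904, 0.902, 0.916]
val_acc = [0.444, 0.600, 0.665, 0.739, 0.772, 0.784, 0.796, 0.798, 0.804, 0.808,
           0.821, 0.8240, 0.8246, 0.829, 0.846, 0.813, 0.843, 0.839, 0.843, 0.836,
           0.847, 0.837, 0.854, 0.839, 0.826, 0.848, 0.845, 0.844, 0.843, 0.841]

# Epoch numbers are 1-based, list indices are 0-based.
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
final_gap = train_acc[-1] - val_acc[-1]

print(f"best validation accuracy {val_acc[best_epoch - 1]:.3f} at epoch {best_epoch}")
print(f"training minus validation accuracy at epoch 30: {final_gap:.3f}")
```

Validation accuracy stays in a narrow band after about epoch 15 while training accuracy keeps rising, a pattern that is often read as the onset of overfitting; stopping at the best validation epoch is one common response.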

D. Training And Validation Loss Graph

Formula to calculate loss:

Yi – The actual output

Ŷi – The predicted output

n – number of inputs

i – Iteration
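The loss formula itself is not reproduced here, so the following sketch is an assumption rather than the authors' definition. The symbols listed above are consistent with a mean squared error over n outputs; categorical cross-entropy is shown alongside it because it is the loss commonly paired with a Softmax output layer. Both function names are hypothetical.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted outputs."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n

def categorical_cross_entropy(actual, predicted, eps=1e-12):
    """Cross-entropy between a one-hot target and a predicted probability vector."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(y * math.log(max(y_hat, eps)) for y, y_hat in zip(actual, predicted))

target = [0.0, 1.0, 0.0]   # one-hot label: the second class is correct
output = [0.1, 0.8, 0.1]   # Softmax probabilities from a model
mse = mean_squared_error(target, output)
cce = categorical_cross_entropy(target, output)
print(mse, cce)
```

Both losses shrink toward zero as the predicted probability of the correct class approaches 1.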

The values of the training loss and validation loss for each epoch are listed in the table below.

VI. FUTURE SCOPE

The proposed methods are not suitable for tiny leaves or for plants without a proper leaf. Efforts may be made to develop methods to identify these types of plants. The algorithms may be implemented on a standalone single-board computer connected to a scanner, and a portable system may be developed for field use. In future research on plant identification, improved machine learning classifiers combined with pre-processing and feature selection models will be used to address accuracy-related issues and enhance performance.

Conclusion

Plants are necessary for human survival. Herbs in particular have been employed by indigenous populations as folk medicines since ancient times. Herbs are typically recognized by practitioners on the basis of decades of close sensory or olfactory experience. Recent improvements in analytical technology have made it much easier to identify herbs on the basis of scientific evidence. This helps many individuals, particularly those who are not used to recognising herbs. However, in addition to being time-consuming, laboratory-based analysis requires expertise in sample preparation and data interpretation. As a result, a simple and reliable method for identifying herbs is required. Herbal identification is expected to benefit from the combination of computation and statistical analysis. This non-destructive technique will be the preferred approach for quickly identifying herbs, especially for individuals who are unable to use expensive analytical equipment. This work reviews different methods for plant recognition, along with their advantages and disadvantages.

Copyright

Copyright © 2022 R. Upendar Rao, M. Sai Lahari, K. Pavana Sri, K. Yaminee Srujana, D. Yaswanth. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET41190

Publish Date : 2022-04-03

ISSN : 2321-9653

Publisher Name : IJRASET
