Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Alfy Thomas, Kavya Shaji, Kalyani C Nair
DOI Link: https://doi.org/10.22214/ijraset.2023.55333
Global cancer statistics show that lung cancer was the second most common cancer and the leading cause of cancer-related mortality worldwide in 2020. Histopathology image analysis is the prominent technique for cancer diagnosis, but the assessment of whole-slide histopathology images remains a limiting factor for timely treatment. Automating histopathology analysis is a much-needed way to ease the workload and to address the scarcity of medical personnel in underserved regions and populations. To make cancer-type classification less formidable, this report presents a comparative analysis of deep learning algorithms for the efficient classification of lung cancer types. We compared six pre-trained convolutional neural networks (CNNs) to find the one best suited to the task: ResNet50, VGG-16, EfficientNet-B0, InceptionV3, DenseNet121, and NasNetLarge. The models were trained on histopathology images from the LC25000 dataset and evaluated with four metrics: accuracy, precision, recall, and F1-score. EfficientNetB0 achieved the highest accuracy, 99.77%, while ResNet50 yielded 99.66% and VGG-16 gave 93.50%. Overall, this research can aid in implementing models that increase the efficiency and accuracy of lung cancer type classification in biomedical fields.
I. INTRODUCTION
Lung cancer ranks as the second most prevalent cancer globally. It is the most common cancer in men and the second most common in women. In 2020 alone, there were over 2.2 million new cases of lung cancer [1]. Smoking, air pollution, and radon gas exposure, among other factors, are considered causes of lung cancer. The two types of lung cancer are small cell lung cancer (15%) and non-small cell lung cancer (85%) [2]. Adenocarcinomas (ADCs) and squamous cell carcinomas (SCCs) are the most common subtypes of non-small cell lung cancer. Among diagnostic techniques such as histopathology, X-ray, CT scans, and magnetic resonance imaging (MRI), histopathological image examination is the most preferred. Treatment is decided by the type of cancer the patient has, so identifying the cancer type is crucial. This is done by histopathological image examination, in which pathologists check hematoxylin and eosin (H&E) stained lung tissue slides under a microscope for abnormalities. The task takes a lot of time and effort, and in developing countries such as India and Bangladesh, skilled medical professionals are scarce, which delays cancer diagnosis and treatment. Automating cancer-type identification from histopathology images addresses this problem, and deep learning models based on convolutional neural networks (CNNs) can be used for it.
II. LITERATURE SURVEY
In [3], Huan Yang et al. developed a deep learning-based six-type classifier for histopathological WSI classification of lung adenocarcinoma, squamous cell carcinoma, small cell lung carcinoma, tuberculosis, pneumonia, and normal lung. The EfficientNet-B5-based model performed better than ResNet-50, achieving AUCs of 0.970, 0.918, 0.963, and 0.978 when tested on 1067 samples obtained from four different medical centers, respectively. The model showed strong agreement with the pathologists' findings and the ground truth, with high correlation coefficients of 0.87.
Xi Wang et al. presented a weakly supervised approach in [4] for fast and effective classification of whole-slide lung cancer images. Their method classified whole-slide lung cancer images with fewer annotations than existing methods. They found that fully convolutional networks (FCNs) are efficient at finding cancer-prone regions in the images. They also proposed different context-aware block selection and feature aggregation strategies to obtain an effective holistic feature representation of the WSI. Their method outperformed the state-of-the-art methods on the TCGA and SUCC datasets.
In [5], Chi-Long Chen et al. proposed a method for training a neural-network-based lung cancer type classifier on entire WSIs using only slide-level knowledge. The method achieved AUCs of 0.9594 and 0.9414 for adenocarcinoma and squamous cell carcinoma classification, respectively, on the testing data of 9662 WSIs. It outperformed multiple-instance learning and showed good localization results for small lesions through class activation mapping.
In [6], Mishra S. et al. used the EfficientNet-B0 model as a feature extractor and added further layers on top of it to classify lung cancer images. The model achieved an accuracy of 99.15% on the training set, 99.14% on the validation set, and 98.67% on the test set. They used the LC25000 dataset, which includes 25000 color images in five classes.
In [7], Baranwal et al. proposed a three-class classification of lung cancer images using ResNet50, VGG-19, Inception-ResNetV2, and DenseNet121 for feature extraction, with a triplet loss guiding the CNN to increase inter-cluster distance and reduce intra-cluster distance. Inception-ResNetV2 outperformed VGG-19, ResNet50, and DenseNet121 with 99.7% test accuracy, against 92%, 99%, and 99.4%, respectively.
Abbas et al. [8] studied pre-trained convolutional neural networks for classifying histopathological slides into three classes (benign lung tissue, squamous cell carcinoma, and adenocarcinoma) using the LC25000 dataset. On the test dataset, AlexNet, VGG-19, ResNet-18, ResNet-34, ResNet-50, and ResNet-101 achieved F1-scores of 0.973, 0.997, 0.986, 0.992, 0.999, and 0.999, respectively.
Wenqing Sun et al. [9] demonstrated the feasibility of using deep structured algorithms in lung cancer image diagnosis. They evaluated three deep learning algorithms: a CNN, a Deep Belief Network (DBN), and a Stacked Denoising Autoencoder (SDAE). The DBN performed best, with an accuracy of 0.8119, higher than the 0.79 of a traditional CADx system. This research showed the promising performance of deep learning algorithms in lung cancer image classification and their potential for medical imaging applications.
III. METHODOLOGY
A. Block Diagram of the System
B. Dataset Description
In this research, the publicly available LC25000 dataset [10] is used. It includes 25000 color histopathological images in five classes of 5000 images each: three classes of lung tissue and two classes of colon tissue. The dataset originally contained 750 images of lung tissue (250 benign lung tissue, 250 lung adenocarcinomas, and 250 lung squamous cell carcinomas) and 500 images of colon tissue (250 benign colon tissue and 250 colon adenocarcinomas) captured from pathology glass slides. All images were cropped to 768 x 768 pixels from the original 1024 x 768 pixels, and the set was then expanded to 25,000 images with the following augmentations: left and right rotations (up to 25 degrees) and horizontal and vertical flips.
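The counts in the dataset description imply a fixed expansion factor per class. The short sketch below only restates that arithmetic; the dictionary keys are illustrative labels, not identifiers taken from the dataset itself.

```python
# Original image counts per class, as given in the dataset description.
original_counts = {
    "benign lung tissue": 250,
    "lung adenocarcinoma": 250,
    "lung squamous cell carcinoma": 250,
    "benign colon tissue": 250,
    "colon adenocarcinoma": 250,
}
augmented_per_class = 5000  # images per class after augmentation

# How many augmented images each original image contributes on average.
expansion_factor = {
    name: augmented_per_class // count for name, count in original_counts.items()
}

total_original = sum(original_counts.values())
total_augmented = augmented_per_class * len(original_counts)

print(total_original, total_augmented, expansion_factor["lung adenocarcinoma"])
```

Each class therefore grows twentyfold, which means many images in the dataset are rotated or flipped variants of the same 250 source images per class.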
1) Lung benign tissue (non-cancer): A benign lung tumor is an abnormal growth of tissue that serves no purpose and is not cancerous. Such tumors may grow from many different structures in the lung. Determining whether a nodule is a benign tumor or an early stage of cancer is very important. A benign tumor grows very slowly, stops growing after some time, and does not spread to other parts of the body.
D. Proposed Architectures
In this study, we used pre-trained models such as EfficientNetB0 [11] and InceptionV3 [12]. These models are already trained on the ImageNet dataset, a large-scale image collection (about 14 million images in its full form; the widely used pre-training subset covers 1000 classes). Through transfer learning, we apply the features these models have already learned to our problem. Pre-trained models also reduce the training time and the amount of data needed for the new task. We used Python's TensorFlow Keras framework to implement the models.
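As an illustration of the transfer-learning setup described above, the following is a minimal sketch in TensorFlow Keras. It is not the authors' exact architecture: the function name `build_classifier`, the 224 x 224 input size, the frozen base, the dropout rate, the optimizer settings, and the single dense classification head are all assumptions made for the example.

```python
import tensorflow as tf


def build_classifier(num_classes=3, input_shape=(224, 224, 3), weights="imagenet"):
    """Hypothetical helper: EfficientNetB0 base plus a small classification head."""
    # Pre-trained convolutional base without its original 1000-class head.
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # assumption: keep the pre-trained features frozen at first

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run the base in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)  # assumed regularization, not from the paper
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier()` downloads the ImageNet weights on first use; passing `weights=None` builds the same graph with random initialization. Fine-tuning would typically continue by setting some or all base layers trainable and recompiling with a lower learning rate.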
IV. RESULT ANALYSIS
Accuracy, precision, recall, and F1-score are used in this paper as performance metrics to evaluate the models.
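For reference, the four metrics can be computed from a confusion matrix as in the sketch below. The function name `per_class_metrics` and the example matrix are hypothetical and chosen only for illustration; they are not results from this study.

```python
def per_class_metrics(confusion):
    """Compute per-class precision, recall, F1 and overall accuracy.

    confusion[i][j] is the number of images of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    report = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])                           # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report.append({"precision": precision, "recall": recall, "f1": f1})

    accuracy = correct / total if total else 0.0
    return report, accuracy


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
example = [
    [90, 0, 10],
    [0, 100, 0],
    [5, 0, 95],
]
report, accuracy = per_class_metrics(example)
```

Precision penalizes images wrongly assigned to a class, recall penalizes images of that class that were missed, and F1 is their harmonic mean; accuracy is the share of all images classified correctly.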
EfficientNetB0 is fine-tuned for 18 epochs. Both the training accuracy and the validation accuracy increase with the epochs, and both the training and validation loss curves show a steady decrease. The validation accuracy reaches 99.85% at the end of the 18th epoch, and the corresponding model weights are saved. The model correctly classified all 300 images of adenocarcinoma and all 300 images of normal lung tissue. Only 2 images of squamous cell carcinoma are misclassified.
We used 20 epochs for fine-tuning ResNet50. Training accuracy rapidly increased to 0.93 in the first epoch and gradually increased to 0.99 over 20 epochs, and validation accuracy increased from 0.97 to 0.99. The training loss decreased sharply after the first epoch and then slowly declined to 0.017 over 20 epochs. The validation loss gradually decreased from 0.07 to 0.013. The model weights at the 13th epoch are taken as the best model, since that epoch has the minimum validation loss of 0.01, with a validation accuracy of 99.7%.
VGG-16 is retrained for 15 epochs. The training accuracy increased from 36% to 94.66% over the 15 epochs, while the validation accuracy rose sharply after the first epoch and then gradually increased to 95.96%. The training loss dropped suddenly after the first epoch and decreased steadily in later epochs; the validation loss decreased more steadily throughout training. The best model weights are saved based on validation accuracy: the model reaches its maximum validation accuracy of 95.96% at the 15th epoch, and the corresponding weights are saved.
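Two checkpoint-selection rules appear in these results: minimum validation loss (ResNet50) and maximum validation accuracy (VGG-16). The sketch below shows both rules on made-up per-epoch values; the helper `best_epoch` and the numbers are hypothetical, not the training curves from this study.

```python
def best_epoch(values, mode):
    """Return the 1-based epoch whose monitored value is best.

    mode="min" selects the lowest value (e.g. validation loss);
    mode="max" selects the highest value (e.g. validation accuracy).
    Ties resolve to the earliest epoch.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    best_index = pick(range(len(values)), key=lambda i: values[i])
    return best_index + 1


# Hypothetical validation curves over five epochs.
val_loss = [0.070, 0.041, 0.030, 0.012, 0.019]
val_accuracy = [0.970, 0.981, 0.988, 0.997, 0.995]

epoch_by_loss = best_epoch(val_loss, "min")
epoch_by_accuracy = best_epoch(val_accuracy, "max")
```

In Keras the same behaviour is usually obtained with a checkpoint callback that monitors `val_loss` or `val_accuracy` and keeps only the best weights; the two criteria can pick different epochs when loss and accuracy do not peak together.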
We trained InceptionV3 for 20 epochs. Training accuracy increased from 90% to 92.3% over the 20 epochs, and validation accuracy rose to the same value. Training loss and validation loss both decreased. The best model is saved at the 13th epoch, where the validation accuracy reaches its maximum of 91%. Out of the 300 adenocarcinoma images, 283 are correctly classified and 17 misclassified. 284 images of benign lung tissue and 264 images of squamous cell carcinoma are correctly classified, while 16 images of benign lung tissue and 36 images of squamous cell carcinoma are misclassified.
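The per-class counts reported for InceptionV3 are enough to recompute its recall values and overall test accuracy. The sketch below does that arithmetic; precision cannot be recovered this way, because the text does not say which classes the misclassified images were assigned to.

```python
# Correctly classified test images per class for InceptionV3, from the text above.
correct = {
    "adenocarcinoma": 283,
    "benign lung tissue": 284,
    "squamous cell carcinoma": 264,
}
images_per_class = 300

# Recall per class, in percent: correct predictions over all images of that class.
recall = {name: 100 * count / images_per_class for name, count in correct.items()}

# Overall accuracy, in percent: all correct predictions over all test images.
accuracy = 100 * sum(correct.values()) / (images_per_class * len(correct))

for name, value in recall.items():
    print(f"{name}: recall {value:.2f}%")
print(f"overall accuracy {accuracy:.2f}%")
```

The result, 831 of 900 images or about 92.33%, agrees with the accuracy listed for InceptionV3 in Table III, and the recomputed recalls round to the 94, 95, and 88 given there.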
DenseNet201 is fine-tuned for 20 epochs. The training accuracy increased slowly during retraining, while the validation accuracy rose rapidly after the first epoch and then increased gradually over the remaining epochs to 96.8%. Training and validation loss both decreased with the epochs. The best model is saved based on validation accuracy: the validation accuracy reaches its maximum at the 18th epoch, and the corresponding model weights are saved. Out of 300 adenocarcinoma images, only 4 are predicted incorrectly, while all 300 images of benign lung tissue are classified correctly. 189 squamous cell carcinoma images were correctly identified, but 8 of the same class were misclassified.
NasNetLarge is retrained for 30 epochs, during which training and validation accuracy increased and both training and validation loss decreased. The model weights at the 27th epoch are taken as the best, since they give the maximum validation accuracy of 92%. Out of 300 adenocarcinoma images, 270 are correctly predicted by the model. Only 9 of the 300 benign lung tissue images are misclassified. Of the 300 squamous cell carcinoma images, 284 are identified correctly and 16 incorrectly.
TABLE III
Measures For Different Algorithms In %

| Model | Category of Lung Cancer | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%) |
|---|---|---|---|---|---|
| ResNet50 | Adenocarcinoma | 99 | 100 | 100 | 99.66 |
| | Benign Tissue | 100 | 100 | 100 | |
| | Squamous cell carcinoma | 100 | 99 | 99 | |
| VGG-16 | Adenocarcinoma | 93 | 94 | 94 | 95.77 |
| | Benign Tissue | 98 | 99 | 99 | |
| | Squamous cell carcinoma | 96 | 94 | 95 | |
| EfficientNetB0 | Adenocarcinoma | 99 | 100 | 100 | 99.77 |
| | Benign Tissue | 100 | 100 | 100 | |
| | Squamous cell carcinoma | 100 | 99 | 100 | |
| InceptionV3 | Adenocarcinoma | 84 | 94 | 89 | 92.33 |
| | Benign Tissue | 99 | 95 | 97 | |
| | Squamous cell carcinoma | 95 | 88 | 91 | |
| DenseNet201 | Adenocarcinoma | 96 | 99 | 98 | 98.33 |
| | Benign Tissue | 100 | 100 | 100 | |
| | Squamous cell carcinoma | 99 | 96 | 97 | |
| NasNetLarge | Adenocarcinoma | 92 | 90 | 91 | 93.88 |
| | Benign Tissue | 99 | 97 | 98 | |
| | Squamous cell carcinoma | 91 | 95 | 93 | |
In this study, we experimented with six fine-tuned deep learning models (EfficientNetB0, InceptionV3, ResNet50, VGG-16, DenseNet121, and NasNetLarge) for the classification of lung cancer types. All six models achieved a testing accuracy greater than 90%. EfficientNetB0 outperformed the rest with a testing accuracy of 99.77%, and ResNet50 was just behind with 99.66%. InceptionV3 performed worst, with 92.33% accuracy. This study indicates that EfficientNet-B0 is the best suited of the six for building a lung cancer type classifier.
[1] WCRF International. (2022, April 14). Lung cancer statistics. World Cancer Research Fund International. https://www.wcrf.org/cancer-trends/lung-cancer-statistics/
[2] Collins, Hanes, Perkel Enck, R. E. (n.d.). Lung Cancer: Diagnosis and Management. AAFP. https://www.aafp.org/pubs/afp/issues/2007/0101/p56.html
[3] Yang, H., Chen, L., Cheng, Z. et al. Deep learning-based six-type classifier for lung cancer and mimics from histopathological whole slide images: a retrospective study. BMC Med 19, 80 (2021). https://doi.org/10.1186/s12916-021-01953-2
[4] Wang X, Chen H, Gan C, Lin H, Dou Q, Tsougenis E, Huang Q, Cai M, Heng PA. Weakly Supervised Deep Learning for Whole Slide Lung Cancer Image Analysis. IEEE Trans Cybern. 2020 Sep;50(9):3950-3962. doi: 10.1109/TCYB.2019.2935141. Epub 2019 Sep 2. PMID: 31484154.
[5] Chen, CL., Chen, CC., Yu, WH. et al. An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning. Nat Commun 12, 1193 (2021). https://doi.org/10.1038/s41467-021-21467-y
[6] Mishra S, et al. Lung Cancer Detection (LCD) from Histopathological Images using Fine-Tuned Deep Neural Network. Ann Med Health Sci Res. 2022;12:1-7.
[7] Baranwal, N., Doravari, P., Kachhoria, R. (2021). Classification of Histopathology Images of Lung Cancer Using Convolutional Neural Network (CNN). ArXiv. /abs/2112.13553
[8] Abbas, M., Abdollahi, M., Syed, A., Shah, S. a. A. (2020). The Histopathological Diagnosis of Adenocarcinoma Squamous Cells Carcinoma of Lungs by Artificial intelligence: A comparative study of convolutional neural networks. medRxiv (Cold Spring Harbor Laboratory).
[9] Sun, Wenqing, Bin Zheng, and Wei Qian. "Computer aided lung cancer diagnosis with deep learning algorithms." Medical imaging 2016: computer-aided diagnosis. Vol. 9785. SPIE, 2016.
[10] Borkowski, A. A., Bui, M. M., Thomas, L. B., Wilson, C. P., DeLand, L. A., Mastorides, S. M. (2019). Lung and Colon Cancer Histopathological Image Dataset (LC25000). ArXiv. /abs/1912.12142
[11] Tan, M., Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ArXiv. /abs/1905.1194
[12] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision. ArXiv. /abs/1512.00567
Copyright © 2023 Alfy Thomas, Kavya Shaji, Kalyani C Nair. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET55333
Publish Date : 2023-08-13
ISSN : 2321-9653
Publisher Name : IJRASET