With the internet now a major medium for everything from interpersonal communication to business transactions, research and development of cyber security techniques has become increasingly important. Phishing is one of the most common social engineering attacks faced by public internet users, governments, and businesses. In its several forms, users are directed to fake websites that resemble legitimate ones; these websites collect user credentials and use them for illegal purposes. In this project, we propose a three-fold model for detecting phishing websites that uses a Gradient Booster for lexical and domain features, a Long Short-Term Memory (LSTM) network for hyperlink features, and a Convolutional Neural Network (CNN) for content-based features. The model is deployed as a Google Chrome extension that classifies each website and alerts the user when they are directed to one that is not legitimate.
I. INTRODUCTION
Phishing is currently one of the most serious and dangerous online threats in the domain of cybersecurity. The use of social networks, e-commerce, electronic banking, and other online services has increased immensely due to the rapid development of internet technologies. At present, internet penetration stands at 59.5 percent, which gives phishing attackers an opportunity to make money by blackmailing internet users and stealing their confidential information. The attacker develops a fraudulent website and distributes links to it through online platforms such as Facebook, Twitter, and email, conveying a message of panic, urgency, or a financial offer and instructing the recipient to take immediate action. When a user unwittingly clicks the link and enters any sensitive credentials, the attacker gains access to the user’s information, such as financial data, personal information, usernames, and passwords. This stolen information is used by cybercriminals for a variety of illegal activities, including blackmailing victims. Many researchers have recently developed machine learning and deep learning models to detect phishing sites. However, earlier systems have limitations: blacklists cannot cover temporary or dynamic URLs, content-based checks are easily evaded by restructuring the HTML, and visual-similarity checks are defeated when a legitimate website is cloned. In this paper, we propose a solution for phishing detection using lexical, hyperlink, and content-based feature extraction. Our proposed system uses a Google Chrome extension named Openmenot, which alerts the user when a phishing URL is detected.
This paper discusses the limitations of earlier URL phishing detection systems, presents our proposed system, and evaluates its performance using metrics such as accuracy, precision, recall, and F1 score.
II. LITERATURE SURVEY
[1] Ala Mughaid, Shadi AlZubi, and Salah Tamanna used machine learning to propose a detection model, splitting each dataset to train the model and validating the results on the test data. The model captures inherent characteristics of the email text, along with other features, to classify emails as phishing or non-phishing. Comparing three different datasets, they found that using more features gave the most accurate and efficient results. The best accuracies, obtained with a boosted decision tree, were 0.88, 1.00, and 0.97 on the three datasets, respectively.
[2] Das Guptta S., Shahirar K. T., and Alqatani H. proposed a hybrid feature-based anti-phishing strategy that extracts features from the URL and hyperlink information on the client side only. They also developed a new dataset for conducting experiments with popular machine learning classification techniques. Their experimental results show that the proposed approach is more effective than traditional approaches, reaching a detection accuracy of 99.17% with the XGBoost technique.
[3] Aswathi A. and Goel conducted their study in three phases. They first performed classification using base classifiers, then using ensemble classifiers, and then tested the ensemble classifiers with and without cross-validation. Finally, they analyzed the performance and presented the results to support future research.
III. LIMITATIONS IN THE EXISTING SYSTEM
URLs Are Temporary or Dynamic: In blacklisting, phishing websites are stored in a database after they have been analyzed. The blacklist method fails because most phishing URLs are temporary and dynamic.
Tedious Work for the End User: In whitelisting, end users are asked to check the domain and IP address each time they visit a site, which is not always practical.
Manipulated by Programmers: In the content-based approach, detection is easily evaded by restructuring HTML elements without changing the appearance of the site.
Easily Duplicated: In the visual-similarity-based approach, real websites are cloned so that the difference between phishing and legitimate websites cannot be found easily.
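Two of the limitations above can be illustrated with a minimal Python sketch. It is not part of the proposed system: the blacklist entry, the example URLs, the two HTML snippets, and the helper names `is_blacklisted` and `summarize` are all hypothetical. The first half shows an exact-match blacklist missing a slightly altered URL; the second shows two pages with the same visible text but different tag structure, so a check keyed on structure treats them as different pages.

```python
import hashlib
from html.parser import HTMLParser

# Hypothetical blacklist holding one known phishing URL.
BLACKLIST = {"http://secure-login.example.com/verify"}


def is_blacklisted(url):
    """Exact-match lookup, the simplest form of blacklisting."""
    return url in BLACKLIST


class TagAndTextCollector(HTMLParser):
    """Collects opening tag names and non-empty text nodes."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def summarize(html):
    """Return (hash of the tag sequence, visible text) for a page."""
    collector = TagAndTextCollector()
    collector.feed(html)
    collector.close()
    structure = hashlib.sha256(" ".join(collector.tags).encode()).hexdigest()
    return structure, " ".join(collector.text)


# Limitation 1: the same page under a fresh subdomain and query string.
known = "http://secure-login.example.com/verify"
variant = "http://a1b2.secure-login.example.com/verify?session=93f1"
print(is_blacklisted(known), is_blacklisted(variant))

# Limitation 3: same visible text, restructured markup.
page_a = "<div><h1>Sign in</h1><p>Enter your password</p></div>"
page_b = "<section><span><b>Sign in</b></span><div>Enter your password</div></section>"
sig_a, text_a = summarize(page_a)
sig_b, text_b = summarize(page_b)
print(text_a == text_b, sig_a == sig_b)
```

In this toy setting the altered URL is not found in the blacklist, and the two pages share their text while their structural signatures differ.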
IV. PROPOSED SYSTEM
Our proposed system for URL phishing detection with a Google Chrome extension uses Gradient Booster, LSTM, and CNN models, combined through multi-model fusion techniques, to improve detection accuracy.
A. Input
The first step is to collect a dataset of URLs from various sites, each labelled as legitimate or phishing.
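As a rough illustration of this input step, the sketch below loads labelled URLs from CSV text and splits them into training and test sets. The column names `url` and `label`, the sample rows, the split ratio, and the helper names are assumptions made for the example; the actual dataset format is not specified here.

```python
import csv
import io
import random

# Hypothetical sample in an assumed "url,label" layout.
SAMPLE_CSV = """url,label
https://www.example.com/,legitimate
http://paypa1-login.example.net/verify,phishing
https://docs.example.org/help,legitimate
http://192.0.2.7/bank/update.php,phishing
https://shop.example.com/cart,legitimate
http://secure-update.example.info/@login,phishing
"""


def load_dataset(handle):
    """Read (url, label) pairs, encoding phishing as 1 and legitimate as 0."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append((row["url"], 1 if row["label"] == "phishing" else 0))
    return rows


def train_test_split(rows, test_fraction=0.33, seed=42):
    """Shuffle with a fixed seed and cut the list into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


rows = load_dataset(io.StringIO(SAMPLE_CSV))
train, test = train_test_split(rows)
print(len(rows), len(train), len(test))
```

A fixed seed keeps the split reproducible, so that metrics computed at each level can be compared across runs.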
B. Processing
We propose a three-fold method for the detection of phishing websites. The three levels of classification and detection are as follows:
1) Binary classification using a Gradient Booster classifier based on lexical and domain features extracted from the URL.
2) Binary classification on the negatives from level 1 using a Long Short-Term Memory classifier based on hyperlink features extracted by crawling the website.
3) Classification using Convolutional Neural Networks based on the extracted web content of the page linked with the URL.
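To make the first level more concrete, the following sketch extracts a few lexical features from a URL using only the Python standard library. The particular features chosen (lengths, dot and hyphen counts, an IP-address host check, and so on) are illustrative assumptions; the exact feature set fed to the Gradient Booster classifier is not listed in this paper.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return a small dictionary of lexical features for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        # A raw IPv4 address in place of a domain name.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
    }


features = lexical_features("http://192.168.10.5/login-update/index.php")
for name, value in features.items():
    print(name, value)
```

Each dictionary can be flattened into a numeric vector and passed to whichever classifier is used at level 1.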
At each level, the evaluation metrics of the algorithm are calculated using labelled data. The detection model is to be implemented as a Google Chrome extension that passes every URL the user tries to access through the model. If the URL is detected to be malicious, the user receives a warning.
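The level-by-level flow can be sketched as a simple cascade in which a URL is passed to the next level only if the current level does not flag it. This is a sketch under stated assumptions: the three rule-based functions below are hypothetical stand-ins for the trained Gradient Booster, LSTM, and CNN models, and forwarding the negatives of level 2 to level 3 is assumed by analogy with the level 1 to level 2 step.

```python
from typing import Callable, Sequence, Tuple

# A classifier takes a URL and returns True when it looks like phishing.
Classifier = Callable[[str], bool]


def cascade(url: str, levels: Sequence[Classifier]) -> Tuple[bool, int]:
    """Return (is_phishing, level): level is the 1-based stage that flagged
    the URL, or 0 when every stage let it through."""
    for index, classify in enumerate(levels, start=1):
        if classify(url):
            return True, index
    return False, 0


# Hypothetical stand-ins for the three trained models.
def level1_lexical(url: str) -> bool:
    return "@" in url


def level2_hyperlink(url: str) -> bool:
    return url.endswith("/redirect")


def level3_content(url: str) -> bool:
    return "verify-account" in url


levels = [level1_lexical, level2_hyperlink, level3_content]
print(cascade("http://user@evil.example/", levels))
print(cascade("http://example.com/verify-account", levels))
print(cascade("https://example.com/", levels))
```

Returning the level that raised the flag makes it possible to compute the per-level metrics mentioned above and to tell the user which check triggered the warning.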
C. Output
The Google Chrome extension detects phishing sites and alerts the user. The performance of the model is evaluated using standard metrics such as accuracy, precision, recall, and F1 score.
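For reference, these four metrics can be computed from true and predicted labels as in the sketch below, using the standard confusion-matrix definitions with phishing as the positive class (label 1). The label lists are made-up illustrative values, not results from our experiments.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a phishing detector, recall reflects how many phishing sites are caught, while precision reflects how often a warning shown to the user is justified.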
VII. ACKNOWLEDGEMENT
We are deeply indebted to Dr. V. Govindhasamy, Head of the Department, Department of Information Technology, Puducherry Technological University, Puducherry, India.
Conclusion
In conclusion, URL phishing detection using deep learning has shown promising results in recent studies. Models such as Gradient Booster, Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks have been used to extract features from URLs and classify them as either legitimate or phishing. These models achieve high accuracy compared to existing methods.
1) After training and testing, the Gradient Booster model obtained an accuracy of 0.981.
2) After training and testing, the LSTM model obtained an accuracy of 0.969.
3) After training and testing, the CNN model obtained an accuracy of 0.973.
References
[1] Erzhou Zhu, Yugang Chen, "An Effective Phishing Websites Detection Model Based on Optimal Feature Selection and Neural Network," IEEE Access, 7, 2019.
[2] Elmahgiubi, M., Ennajar, M., Drawil, N., & Elbuni, M. S., "Sign language translator and gesture recognition," 2015 Global Summit on Computer & Information Technology (GSCIT), 2015. doi:10.1109/gscit.2015.7353332
[3] Estrada Jiménez, L. A., Benalcázar, M. E., & Sotomayor, N., "Gesture Recognition and Machine Learning Applied to Sign Language Translation," IFMBE Proceedings, 233–236, 2017. doi:10.1007/978-981-10-4086-3_59
[4] Lean Karlo S. Tolentino, Ronnie O. Serfa Juan, August C. Thio-ac, Maria Abigail B. Pamahoy, Joni Rose R. Forteza, "Sign Language Recognition System," International Journal of Machine Learning and Computing, Vol. 9, No. 6, December 2019.
[5] Mohammad Elham Walizad, Mehreen Hurroo, "Sign Language Recognition System using Convolutional Neural Network and Computer Vision," International Journal of Engineering Research & Technology (IJERT), 2020.
[6] R.S. Sabeenian, S. Sai Bharathwaj, M. Mohamed Aadhil, "Sign Language Recognition Using Deep Learning and Computer Vision," Journal of Advanced Research in Dynamical and Control System (Elsevier), 2020.
[7] Kshitij Bantupalli, Ying Xie, "American Sign Language Recognition Using Machine Learning and Computer Vision," 2019.
[8] Kshitij Bantupalli, Ying Xie, "American Sign Language Recognition using Deep Learning and Computer Vision," IEEE International Conference on Big Data (Big Data), 2020.