IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Garikapati Charan Sai, Akula Rishika, Garlapati Dheeraj Reddy, Reddyvari Venkateswara Reddy, Punyaban Patel
DOI Link: https://doi.org/10.22214/ijraset.2024.59828
Phishing detection is the process of identifying phishing attempts before they cause harm, informing administrators and users about them, and, most importantly, reducing the risk they pose. Phishing is a form of cybercrime in which attackers try to trick victims into revealing confidential information, such as passwords or financial details, by posing as a trustworthy source through email and, most commonly, text messages. Because phishing is largely a semantics-based attack that targets human vulnerabilities, phishing websites can be difficult to identify. The authors propose this project as a solution to this problem. By identifying potential phishing and malware threats and quickly warning users about them, our project aims to improve online security. This comprehensive method includes Greek alphabet analysis, port forwarding detection, database comparison, grammar analysis using NLP libraries, and several other checks. A modified phishing detection technique is proposed that integrates Greek alphabet analysis, port forwarding detection, and homograph attack detection.
I. INTRODUCTION
In the rapidly developing information technology age, the prevalence of online communication and transactions has made phishing attacks a growing concern for both individuals and organizations. Phishing is a deceptive technique that cybercriminals use to manipulate users into disclosing sensitive information, and it puts the confidentiality and integrity of personal and organizational data at serious risk. Because malicious actors use increasingly sophisticated approaches, traditional security solutions frequently fall short of protecting against these evolving threats. Sophisticated, adaptable systems that can quickly recognize and stop phishing attacks are therefore needed. This work presents a comprehensive phishing detection method that combines behavioral analysis, domain-specific heuristics, and state-of-the-art machine learning techniques. By combining these components, our proposed system seeks to improve the precision and effectiveness of detecting phishing attempts on websites, social media platforms, and emails.
Our proposed solution is intended to improve users' internet security by identifying potentially harmful programs and threats and promptly informing users about them. This comprehensive method includes Greek alphabet analysis, database comparison, port forwarding detection using Python tools, grammar analysis using NLP libraries, and several other checks.
II. LITERATURE REVIEW
In [8], Rishikesh Mahajan and Irfan Siddavatam compared three classification techniques: Decision Tree, Random Forest, and Support Vector Machine. Their sample included 19,653 phishing URLs and 17,058 benign URLs, each described by 16 attributes, collected from PhishTank and Alexa respectively. The data sets were split into training and testing sets at ratios of 50:50, 70:30, and 90:10. Performance was assessed using accuracy, false positive rate, and false negative rate. The Random Forest algorithm achieved 97.14% accuracy with a low false negative rate, and the study found that increasing the amount of training data improves accuracy. Jitendra Kumar et al. [9] trained a variety of classifiers, including K-Nearest Neighbour, Random Forest, Naive Bayes, Logistic Regression, and Decision Trees, on features extracted from the lexical structure of the URL. The URL dataset was constructed with volatility, overfitting, biased training, and unbalanced data in mind: it contained equal numbers of labelled phishing and genuine URLs and was split into training and testing sets in a 7:3 ratio. Although the AUC values of all the classifiers were almost equal, the Naive Bayes classifier proved the most suitable because its AUC was the highest.
Naive Bayes attained a maximum accuracy of 98%, with precision = 1, recall = 0.95, and F1-score = 0.97. In [10], Mehmet Korkmaz et al. developed a machine-learning-based phishing detection system by applying eight different algorithms to three different datasets.
The techniques used included XGBoost, Random Forest (RF), Naive Bayes (NB), K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Logistic Regression (LR), and Artificial Neural Network (ANN). The models using LR, SVM, and NB were found to have low accuracy rates.
The 32-feature Kaggle dataset underwent principal component analysis (PCA) and several feature selection procedures. Feature selection benefits the dataset because it eliminates unnecessary or redundant data. The proposed model applied the RFE, Relief-F, IG (information gain), and GR (gain ratio) algorithms for feature selection before applying PCA. Random Forest achieved an accuracy of 97%; it handled over-fitting better and was less erratic.
Using the UCI dataset, Abdulhamit Subasi et al. [12] proposed an intelligent phishing detection system. Several machine learning classifiers were used to recognize phishing websites, including Artificial Neural Networks (ANN), K-Nearest Neighbours (K-NN), Support Vector Machines (SVM), C4.5 Decision Trees, Random Forests (RF), and Rotation Forests (RoF).
III. METHODOLOGY
The proposed system is a website that enables visitors to examine the legitimacy of a URL thoroughly using multiple layers of analysis. The procedures include Greek alphabet analysis, database comparison, grammar analysis, and port forwarding detection, together with the URL and page features listed below.
1. Grammar Analysis
Grammar analysis looks for irregularities or inconsistencies in the linguistic structure of textual content. By detecting deceptive language patterns suggestive of fraudulent conduct, it improves the accuracy of phishing detection.
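The paper performs grammar analysis with NLP libraries that it does not name. As a dependency-free stand-in, the minimal sketch below flags a few surface irregularities (doubled words, lowercase sentence starts, runs of exclamation marks) and urgency wording. The phrase list and the function name `grammar_red_flags` are hypothetical, chosen only for illustration; a real system would use a proper grammar or language model.

```python
import re

# Hypothetical phrase list chosen for illustration; not taken from the paper.
URGENCY_PHRASES = (
    "verify your account",
    "account suspended",
    "act now",
    "urgent action required",
)


def grammar_red_flags(text: str) -> list:
    """Return simple linguistic irregularities found in `text`."""
    flags = []
    lowered = text.lower()
    # A doubled word such as "the the" suggests careless writing.
    if re.search(r"\b(\w+)\s+\1\b", lowered):
        flags.append("repeated word")
    # A sentence that starts with a lowercase letter after ., ! or ?
    if re.search(r"[.!?]\s+[a-z]", text):
        flags.append("lowercase sentence start")
    # Three or more exclamation marks anywhere in the text.
    if text.count("!") >= 3:
        flags.append("excessive exclamation marks")
    # Pressure wording typical of phishing lures.
    for phrase in URGENCY_PHRASES:
        if phrase in lowered:
            flags.append("urgency phrase: " + phrase)
    return flags
```

An empty list means no irregularity was found; the number of flags could serve as one input to the final decision.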
2. Database Comparison
Database comparison matches the features and patterns of suspicious data against existing phishing databases so that fraudulent content is found more rapidly. As a result, phishing detection becomes more effective overall.
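A minimal sketch of the simplest form of database comparison, an exact host lookup, is shown below. It assumes the database is a local set of known phishing host names; the set contents and function names are hypothetical, and a deployed system would refresh the set from a maintained feed such as the PhishTank data mentioned in Section II.

```python
from urllib.parse import urlsplit

# Hypothetical local snapshot of known phishing hosts. A real deployment
# would load and refresh this set from a maintained phishing feed.
KNOWN_PHISHING_HOSTS = {"secure-login-paypa1.example", "verify-bank.example"}


def normalize_host(url: str) -> str:
    """Lowercased host name without a leading 'www.' ('' if none)."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    return host[4:] if host.startswith("www.") else host


def in_phishing_database(url: str, database=KNOWN_PHISHING_HOSTS) -> bool:
    """True if the URL's normalized host appears in the phishing database."""
    return normalize_host(url) in database
```

Normalizing case and the `www.` prefix keeps trivial spelling variants from slipping past the lookup. The URL must include a scheme (for example `http://`), otherwise `urlsplit` finds no host.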
3. Port Forwarding Detection
Port forwarding detection refers to monitoring network traffic for instances of port forwarding. It is an essential component of phishing detection because it can reveal attempts to reroute users to malicious servers, strengthening defenses against phishing attacks.
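The paper does not specify how traffic is monitored. The minimal, offline sketch below checks one observable symptom, assuming the redirect chain (the sequence of URLs a request passed through, for example as recorded from HTTP `Location` headers) has already been collected. It flags hops that switch to another host, or that move to a different, non-default port. The function names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str):
    """Explicit port if present, otherwise the scheme's default (None if unknown)."""
    parts = urlsplit(url)
    return parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)


def suspicious_hops(chain: list) -> list:
    """Return (source, target) pairs in a redirect chain that switch to another
    host, or move to a different, non-default port."""
    flagged = []
    for src, dst in zip(chain, chain[1:]):
        host_changed = urlsplit(src).hostname != urlsplit(dst).hostname
        dst_port = effective_port(dst)
        port_moved = dst_port != effective_port(src) and dst_port not in DEFAULT_PORTS.values()
        if host_changed or port_moved:
            flagged.append((src, dst))
    return flagged
```

An upgrade from `http` (port 80) to `https` (port 443) on the same host is deliberately not flagged, because both are default ports.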
4. Greek Alphabet Analysis
Greek alphabet analysis searches the text and URL for Greek characters, since attackers commonly use them to circumvent standard filters: a Greek letter such as ο (omicron) looks almost identical to the Latin o, so a spoofed domain can pass a casual visual check. Locating these anomalies enhances the system's ability to identify and neutralize phishing threats.
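A minimal sketch of Greek character detection using the standard `unicodedata` module follows. It treats any character whose Unicode name starts with `GREEK` as Greek, and reports a likely homograph when Greek and Latin letters are mixed in one string. The function names are hypothetical, and the mixed-script rule is a simplification of fuller confusable-character checks.

```python
import unicodedata


def greek_characters(text: str) -> list:
    """Greek letters in `text`, in order of appearance."""
    return [ch for ch in text if unicodedata.name(ch, "").startswith("GREEK")]


def looks_like_greek_spoof(text: str) -> bool:
    """True when Latin and Greek letters are mixed, the usual homograph pattern."""
    has_latin = any(unicodedata.name(ch, "").startswith("LATIN") for ch in text)
    return has_latin and bool(greek_characters(text))
```

For example, `"pαypal.com"` (with Greek alpha) is flagged, while `"paypal.com"` and an all-Greek word are not.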
5. Using IP address
Verifying whether the URL contains an IP address instead of a domain name. Phishing websites frequently use direct IP addresses to conceal their genuine identities.
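A minimal sketch of this check, using the standard `ipaddress` module to test whether the URL's host parses as an IPv4 or IPv6 address, is shown below. The function name is hypothetical. Obfuscated forms such as hexadecimal IPs are not handled.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_ip_address(url: str) -> bool:
    """True if the URL's host is a literal IPv4 or IPv6 address."""
    host = urlsplit(url).hostname  # brackets around IPv6 hosts are stripped
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False
```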
6. Long URL
Recognizing unusually long URLs, which are frequently used in phishing attempts to trick people.
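The minimal sketch below grades URL length into three bands. The 54- and 75-character limits appear in commonly used phishing-feature definitions but are assumptions here, exposed as parameters so they can be tuned. The function name is hypothetical.

```python
def url_length_label(url: str, short_limit: int = 54, long_limit: int = 75) -> str:
    """Grade a URL by length; the default limits are tunable assumptions."""
    n = len(url)
    if n < short_limit:
        return "legitimate"
    if n <= long_limit:
        return "suspicious"
    return "phishing"
```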
7. Short URL
Examining abbreviated URLs, which are frequently used to hide their destination and can direct users to phishing websites.
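A minimal sketch of shortened-URL detection is a lookup of the host against known shortening services. The short host list below is a hypothetical sample for illustration only; a real system would maintain a much longer list or follow the redirect to reveal the destination.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately small sample of URL-shortening hosts.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co", "ow.ly", "is.gd"}


def is_shortened(url: str) -> bool:
    """True if the URL's host (ignoring a leading 'www.') is a known shortener."""
    host = (urlsplit(url).hostname or "").removeprefix("www.")
    return host in SHORTENER_HOSTS
```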
8. Symbol@
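Searching the URL for the "@" symbol. Browsers ignore everything before "@" in the address and connect to the host that follows it, so a URL such as http://paypal.com@evil.example leads to evil.example; phishers use this to make a link appear to point to a trusted site.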
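In a URL's authority part, text before "@" is treated as user information, so the real destination is the host after it. The minimal sketch below reports both whether "@" appears there and which host a browser would actually contact. The function name is hypothetical, and an "@" in the path or query is not considered.

```python
from urllib.parse import urlsplit


def at_symbol_check(url: str):
    """Return (has_at_in_authority, real_host) for `url`."""
    parts = urlsplit(url)
    return "@" in parts.netloc, parts.hostname
```

For `"http://paypal.com@evil.example/login"` this returns `(True, "evil.example")`.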
9. Redirecting//
Recognizing redirects embedded in the URL, such as a "//" that appears after the protocol prefix. This is a tactic that phishing websites frequently use to trick visitors.
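The minimal sketch below assumes the common interpretation of this feature. In `http://` the "//" starts at index 5 and in `https://` at index 6 (0-based), so a later "//" suggests an embedded redirect target. The function name is hypothetical.

```python
def has_embedded_double_slash(url: str) -> bool:
    """True if '//' occurs after the scheme's own '//' (index 5 or 6, 0-based)."""
    return url.rfind("//") > 6
```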
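10. Prefix/Suffix
Looking closely at prefixes and suffixes added to the domain name, often separated by a hyphen (for example, paypal-secure.example), as phishers use these to craft phony site addresses that resemble legitimate ones.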
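A common way to operationalize this feature is to check for a hyphen in the host name. The minimal sketch below does only that. The function name is hypothetical, and because many legitimate domains also contain hyphens, the result is best treated as one weak signal among many.

```python
from urllib.parse import urlsplit


def has_prefix_suffix(url: str) -> bool:
    """True if the host name contains a hyphen, e.g. 'paypal-secure.example'."""
    host = urlsplit(url).hostname or ""
    return "-" in host
```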
11. Subdomains
Counting the subdomains is important, since an unusually high number could indicate a phishing attempt.
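A minimal sketch of subdomain counting follows. It assumes the last two labels of the host form the registered domain (as in `example.com`) and ignores a leading `www.`. This assumption miscounts multi-part suffixes such as `co.uk`, which would need a public-suffix list. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def subdomain_count(url: str) -> int:
    """Number of labels before the assumed two-label registered domain."""
    host = (urlsplit(url).hostname or "").removeprefix("www.")
    labels = [label for label in host.split(".") if label]
    return max(len(labels) - 2, 0)
```

For example, `login.secure.example.com` has two subdomains under this rule.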
12. HTTPS
Checking whether HTTPS is used, since many phishing sites do not use secure connections while genuine websites usually do.
13. Domain Registration Length
Examining the period for which a domain has been registered, since short registration periods may be a sign of a fraudulent website.
14. Favicon
Verifying the existence and authenticity of the favicon connected to the website, since phishers can trick users with phony favicons.
15. Non- Standard Ports
Identifying non-standard ports in URLs, since their use might be a sign of phishing activity.
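A minimal sketch of the non-standard port check is shown below. It assumes "standard" means 80 for `http` and 443 for `https`. A URL without an explicit port uses the scheme default and is therefore not flagged. The function name is hypothetical.

```python
from urllib.parse import urlsplit

STANDARD_PORTS = {"http": 80, "https": 443}


def uses_non_standard_port(url: str) -> bool:
    """True if the URL names a port other than its scheme's standard one."""
    parts = urlsplit(url)
    if parts.port is None:  # no explicit port: the scheme default is used
        return False
    return parts.port != STANDARD_PORTS.get(parts.scheme)
```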
16. HTTPS Domain URL
Verifying that, when HTTPS is used, the domain and the URL are consistent with each other, as differences could indicate phishing attempts.
17. Requested URL
Examining the requested URL's validity because phishing websites may tamper with it to deceive users.
18. Anchor URL
Analyzing the anchor links' destination because they may lead to fraudulent websites.
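The minimal sketch below computes the fraction of `<a href>` links that are empty placeholders (`""`, `#`, `javascript:`) or resolve to a different host than the page, using the standard `html.parser` module. It assumes the page HTML has already been downloaded. The class and function names are hypothetical, and any threshold on the ratio would be a further tuning choice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href.strip())


def unsafe_anchor_ratio(page_url: str, html: str) -> float:
    """Fraction of anchors that are placeholders or leave the page's host."""
    collector = AnchorCollector()
    collector.feed(html)
    if not collector.hrefs:
        return 0.0
    page_host = urlsplit(page_url).hostname
    unsafe = 0
    for href in collector.hrefs:
        if href in ("", "#") or href.lower().startswith("javascript:"):
            unsafe += 1
        elif urlsplit(urljoin(page_url, href)).hostname != page_host:
            unsafe += 1  # relative links resolve to the page host and stay safe
    return unsafe / len(collector.hrefs)
```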
19. Links in Script Tags
Finding links contained within script tags, a tactic that attackers frequently use to conceal harmful URLs.
20. Server Form Handler
Examining the server-side form handler for anomalies in form processing or other possible indications of phishing.
21. Information of Email
Recognizing suspicious email addresses connected to the website, since phishing websites may use fake contact details.
22. Abnormal URL
Recognizing URLs that do not follow standard patterns, which may indicate phishing attempts.
23. Website forwarding
Recognizing whether a website is sending users to another, since this may be a phishing technique.
24. Status Bar Customization
Examining changes made to the status bar that could be used to mislead users about where a link leads.
25. Disabled Right Click
Tracking attempts to disable right-clicking on web pages. This is a common tactic to keep users from using browser functions that would reveal phishing attempts.
26. Using Popup Window
Analyzing the application of popup windows, which phishing websites may employ to divert attention or deceive visitors.
27. Iframe Redirection
Determining whether iframe redirection—a method of sending viewers to phishing pages—is present.
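A minimal sketch of iframe inspection with the standard `html.parser` module is shown below. It assumes that an iframe counts as suspicious when it is sized to zero or styled invisible, so the user cannot see the content it loads. The class and function names are hypothetical, and visible embeds (such as videos) are intentionally not flagged.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collects the attributes of every <iframe> tag in a page."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            # Attribute names arrive lowercased; valueless attributes have value None.
            self.iframes.append({name: (value or "") for name, value in attrs})


def hidden_iframes(html: str) -> list:
    """Return attribute dicts of iframes that are sized to zero or styled invisible."""
    finder = IframeFinder()
    finder.feed(html)
    hidden = []
    for attrs in finder.iframes:
        style = attrs.get("style", "").replace(" ", "").lower()
        zero_size = attrs.get("width") == "0" or attrs.get("height") == "0"
        invisible = "display:none" in style or "visibility:hidden" in style
        if zero_size or invisible:
            hidden.append(attrs)
    return hidden
```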
28. Age of Domain
Examining the domain's age is important because younger domains might be more connected to phishing.
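A minimal sketch of the age check follows. It assumes the domain's creation date has already been obtained (for example from a WHOIS lookup, which is not shown). The 180-day (about six months) cutoff is an assumed threshold, not a standard, and the reference date is passed in explicitly so the result is reproducible. The function names are hypothetical.

```python
from datetime import date


def domain_age_days(created: date, today: date) -> int:
    """Whole days between the domain's creation date and `today`."""
    return (today - created).days


def is_young_domain(created: date, today: date, min_days: int = 180) -> bool:
    """True if the domain is younger than `min_days` (an assumed threshold)."""
    return domain_age_days(created, today) < min_days
```

For example, a domain created on 2024-03-01 is 34 days old on 2024-04-04 and is treated as young.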
29. DNS Record
Checking for irregularities in DNS records, since phishers can alter DNS records to trick users.
30. Website Traffic Analysis
Analyzing the website's traffic, since the traffic features of phishing websites frequently diverge from those of trustworthy websites.
31. PageRank
Determining a website's page rank; phishing sites may have artificially inflated or decreased ratings.
32. Google Index
Verifying whether Google has indexed the website since reputable websites have a higher chance of being listed.
33. Links Pointing To Page
Analyzing how many external links lead to a particular website to determine its legitimacy or potential for phishing.
34. Stats Report
Analyzing the website's statistical reports and searching for any anomalies that might point to phishing activities.
35. Class
The class label, assigned after all of the preceding features have been examined, indicates whether or not the URL is considered a phishing attempt.
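The paper does not state how the individual features are combined into the class label. The minimal sketch below is a stand-in for a trained classifier such as the Random Forest discussed in Section II. It assumes each feature is coded -1 (phishing), 0 (suspicious), or 1 (legitimate), similar to the coding used in the UCI phishing-websites data, and averages the values. The function name and the zero threshold are hypothetical.

```python
def classify(feature_values: dict, threshold: float = 0.0) -> int:
    """Return -1 (phishing) or 1 (legitimate) from features coded -1/0/1.

    A plain average stands in for a trained model; the threshold is an assumption.
    """
    if not feature_values:
        raise ValueError("no features supplied")
    score = sum(feature_values.values()) / len(feature_values)
    return 1 if score > threshold else -1
```

With this rule, a score exactly at the threshold is labelled phishing, which errs toward warning the user.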
IV. USER INTERFACE
- User Interaction: After the analysis procedures, the user clicks the submit button to obtain the result.
- Output: The user sees a complete result summarizing the information gathered from every level of the investigation, including the URL's grammatical accuracy, database presence, port forwarding detection, and Greek character usage.
By integrating linguistic analysis, historical database comparison, and distinctive threat indicators, this methodology offers a multidimensional approach to assessing URL legitimacy and improves the identification of phishing attempts.
V. RESULT & DISCUSSIONS
VI. BENEFITS
The benefits of using a multi-layered approach for phishing detection are:
A. More Accurate Detection
B. Detecting Unknown Attacks
C. Alerting Users
D. Multi-Layered Approach
VII. FUTURE SCOPE
Phishing attacks continue to pose serious cybersecurity challenges, necessitating continuous improvements in detection technologies and the incorporation of new functionality. This work outlines future directions for improving the security of the phishing detection software, with an emphasis on developing and implementing new features: an extension for e-mail and messaging applications, user education initiatives, and an authentication page. These enhancements aim to strengthen existing detection techniques while also addressing new attack pathways and user vulnerabilities. Our proposed paradigm stresses the convergence of technological advances and user-centric techniques to strengthen businesses' resilience to phishing threats.
In conclusion, our project is a noteworthy step forward in online security, introducing a multi-layered approach to counter phishing and malware threats. By incorporating advanced techniques to detect semantic anomalies and staying vigilant against evolving cybercriminal strategies, our system provides a robust defense mechanism. Prompt alerts add an extra layer of protection, empowering individuals to make well-informed decisions in real time. With the overarching goal of substantially reducing the success rates of both phishing and malware attacks, our contribution aims to establish a safer digital environment. This strategy addresses current threats and also positions our system to adapt and evolve alongside the dynamic landscape of cyber threats, contributing to ongoing efforts to enhance online security.
[1] Anti-Phishing Working Group (APWG), “Phishing Activity Trends Report, 4th Quarter 2020,” https://docs.apwg.org/reports/apwg_trends_report_q4_2020.pdf
[2] FBI, “Internet Crime Report 2020,” https://www.ic3.gov/Media/PDF/AnnualReport/2020_IC3Report.pdf
[3] Verizon, “2020 Data Breach Investigations Report,” https://enterprise.verizon.com/resources/reports/2020-databreachinvestigations-report.pdf
[4] World Health Organization, “Communicating for Health, Cyber Security,” https://www.who.int/about/communications/cyber-security
[5] N. Abdelhamid, A. Ayesh, and F. Thabtah, “Phishing detection-based associative classification data mining,” Expert Systems with Applications, vol. 41, no. 13, pp. 5948–5959, 2014.
[6] K. L. Chiew, E. H. Chang, W. K. Tiong et al., “Utilization of website logos for phishing detection,” Computers & Security, vol. 54, pp. 16–26, 2015.
[7] K. M. Kumar and K. Alekhya, “Identifying fraudulent websites through fuzzy logic,” International Journal of Advanced Research in Computer Engineering Technology, 2016.
[8] Rishikesh Mahajan and Irfan Siddavatam, “Phishing website detection using machine learning algorithms,” International Journal of Computer Applications, vol. 181, no. 23, 2018.
[9] Jitendra Kumar, A. Santhanavijayan, B. Janet, Balaji Rajendran, and Bindhumadhava BS, “Phishing website classification and detection using machine learning,” International Conference on Computer Communication and Informatics, 2020.
[10] Mehmet Korkmaz, Ozgur Koray Sahingoz, and Banu Diri, “Detection of phishing websites by using machine learning-based URL analysis,” 11th International Conference on Computing, Communication, and Networking Technologies, 2020.
[11] Mohammad Nazmul Alam, Dhiman Sarma et al., “Phishing attacks detection using a ML approach,” 3rd International Conference on Smart Systems and Inventive Technology, 2020.
[12] Abdulhamit Subasi, Esraa Molah, Fatin Almkallawi, and Touseef J. Chaudhery, “Intelligent phishing website detection using a Random Forest classifier,” International Conference on Electrical and Computing Technologies and Applications (ICECTA), 2017.
[13] Structure of a URL (image), https://towardsdatascience.com/phishingdomain-detection-with-ml5be9c99293e5
[14] Rami M. Mohammad, Fadi and Lee McCluskey, “Phishing Website Features,”
Copyright © 2024 Garikapati Charan Sai, Akula Rishika, Garlapati Dheeraj Reddy, Reddyvari Venkateswara Reddy, Punyaban Patel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET59828
Publish Date : 2024-04-04
ISSN : 2321-9653
Publisher Name : IJRASET