IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Bhaskar P, Shashikala K, Susma Swaraj V, Gayathri A, Madhavi G, Venkata Juhitha C
DOI Link: https://doi.org/10.22214/ijraset.2024.60932
The back-end database is pivotal to storing the massive volume of big data Internet exchanges that originate from cloud-hosted web applications and Internet of Things (IoT) smart devices. Structured Query Language (SQL) Injection Attack (SQLIA) remains an intruder's exploit of choice for pilfering confidential data from the databases of vulnerable web applications, with potentially damaging consequences. Existing solutions, mostly signature-based, predate the recent challenges of big data mining and as such lack the functionality to cope with new signatures concealed in web requests. As an alternative, Machine Learning (ML) predictive analytics provides functional and scalable mining of big data for the detection and prevention of SQLIA. Unfortunately, the lack of a ready-made, robust corpus or data set with patterns and historical data items to train a classifier is a well-known issue in SQLIA research. In this project, we explore the generation of a data set extracted from known attack patterns, including the SQL tokens and symbols present at injection points. As a test case, we also build a web application that expects dictionary word lists as vector variables, to demonstrate massive quantities of learning data. The data set is pre-processed, labelled, and feature-hashed for supervised learning. The trained classifier is deployed as a web service that is consumed in a custom .NET application implementing a web proxy Application Programming Interface (API), which intercepts web requests and accurately predicts SQLIA in them, thereby preventing malicious web requests from reaching the protected back-end database. This project demonstrates a full proof-of-concept implementation of ML predictive analytics and the deployment of the resulting web service that accurately predicts and prevents SQLIA, with empirical evaluations presented in a Confusion Matrix (CM) and a Receiver Operating Characteristic (ROC) curve.
I. INTRODUCTION
SQL injection is an attack technique that exploits a security vulnerability in the database layer of an application. Attackers use injections to obtain unauthorized access to the underlying data, structure, and DBMS. Through SQL injection, an attacker can embed malicious code in the input of a poorly designed application, which then passes it to the back-end database. The malicious data then produces database query results or actions that should never have been executed. Given the right circumstances, an attacker can use an SQL injection vulnerability to bypass a web application's authentication and authorization mechanisms and retrieve the contents of an entire database. SQL injection can also be used to add, modify, and delete records in a database, affecting data integrity. In this way, SQL injection can give an attacker unauthorized access to sensitive data. It is a code injection technique used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution (e.g. to dump the database contents to the attacker). To succeed, SQL injection must exploit a security vulnerability in an application's software.
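To make the mechanism concrete, the following is a minimal sketch, assuming a hypothetical `users` table in an in-memory SQLite database (the table, function names, and payload are illustrative and not taken from the project). It contrasts a query built by string concatenation with a parameterized query given the same input.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a tiny, hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "s3cret"), ("bob", "hunter2")],
    )
    return conn

def login_vulnerable(conn, name, password):
    # UNSAFE: user input is concatenated into the SQL text, so quotes in the
    # input can change the structure of the query.
    query = (
        "SELECT name FROM users WHERE name = '" + name
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()

def login_parameterized(conn, name, password):
    # SAFER: input is bound as data and is never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()

conn = make_db()
payload = "' OR '1'='1"  # classic tautology-style injection string

leaked = login_vulnerable(conn, "alice", payload)      # WHERE clause is always true
blocked = login_parameterized(conn, "alice", payload)  # payload treated as a literal
print(len(leaked), len(blocked))
```

In the concatenated version the payload turns the `WHERE` clause into a tautology, so every row is returned without a valid password; the parameterized version compares the payload as an ordinary string and matches nothing.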
II. LITERATURE SURVEY
A. Enhancing SVM-Based SQLIA Prediction with Comprehensive Data Sets and Pre-Processing Techniques
A critical aspect of improving SVM-based SQLIA prediction lies in the quality of data sets and the effectiveness of text pre-processing techniques. This section delves into strategies for generating robust training data, incorporating diverse patterns, and optimizing text pre-processing to enhance the accuracy of SVM classifiers in predicting SQLIA. Data augmentation techniques, such as oversampling minority classes and synthetic data generation with SMOTE (Synthetic Minority Over-sampling Technique), can further enrich the training data for SVM-based SQLIA prediction.
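As a rough illustration of class balancing, the sketch below performs plain random oversampling of the minority class. This is the simpler relative of SMOTE, not SMOTE itself (which synthesizes new points by interpolating between neighbours). The function name, sample strings, and label encoding (0 for benign, 1 for SQLIA) are assumptions made for the example.

```python
import random
from collections import Counter

def oversample_minority(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical imbalanced data: four benign inputs, one attack string.
samples = ["apple", "river", "table", "window", "' OR 1=1 --"]
labels = [0, 0, 0, 0, 1]

balanced_samples, balanced_labels = oversample_minority(samples, labels)
print(Counter(balanced_labels))
```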
B. Challenges in Data Engineering for SVM-Based SQLIA Mitigation
One of the primary hurdles in using Support Vector Machine (SVM) learning for SQL Injection Attack (SQLIA) mitigation is data engineering.
While SVM holds promise for bolstering security measures, its effectiveness is heavily contingent upon the quality of data preprocessing and feature extraction. Existing SVM-based approaches frequently encounter shortcomings in adequately processing textual data, a critical aspect in detecting SQLIA. The lack of comprehensive text preprocessing techniques often leads to the inability to accurately capture the nuanced patterns indicative of SQL injection attempts. Consequently, these inadequacies undermine the efficacy of SVM classifiers in combatting SQLIA, perpetuating the vulnerability of web applications to such attacks. Addressing these challenges necessitates a concerted effort to refine data engineering practices, with a focus on robust text preprocessing methodologies tailored specifically to the intricacies of SQL injection detection. By enhancing the quality of input data and optimizing text preprocessing techniques, SVM-based approaches can be better equipped to identify and mitigate SQLIA, thereby fortifying the security posture of web applications against malicious exploitation.
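A minimal sketch of what such text pre-processing might look like is given below, assuming that request values arrive URL-encoded and that splitting into words, numbers, and individual symbols is an acceptable tokenization. The regular expression and the function name `preprocess` are illustrative choices, not the project's actual rules.

```python
import re
from urllib.parse import unquote_plus

# Words, numbers, the SQL comment marker "--", or any other single symbol.
TOKEN_RE = re.compile(r"[a-z_]+|\d+|--|[^\sa-z_\d]")

def preprocess(raw_value):
    """Decode a URL-encoded request value, lowercase it, and split it into
    tokens so that quotes and operators become separate features."""
    decoded = unquote_plus(raw_value)  # "%27" -> "'", "+" -> " "
    lowered = decoded.lower()          # SQL keywords are case-insensitive
    return TOKEN_RE.findall(lowered)

tokens = preprocess("admin%27+OR+%271%27%3D%271%27+--")
print(tokens)
```

Keeping symbols such as the single quote and `--` as separate tokens preserves exactly the characters that tend to mark an injection point, which a word-only tokenizer would discard.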
C. Gap in ML Application for Predicting SQLIA in Big Data Contexts
To date, the application of machine learning (ML) to predicting SQLIA in a big data context has received little discussion. In particular, the focus on patterns and text pre-processing within Microsoft Azure Machine Learning (MAML) remains unexplored territory, despite the potential for significant improvements in prediction accuracy.
III. EXISTING SYSTEM
SQL syntax closely resembles plain English, and SQLIA keywords are also in plain text. Therefore, the SQLIA problem in a big data context is a plausible candidate for predictive analytics using a supervised learning model trained on both known historical attack signatures and safe web request patterns. The attack signatures at injection points contain patterns of SQL tokens and symbols and are labelled SQLIA positive, while valid web requests take the form of the data expected by the application. In this project, we build a predictive analytics web application with large quantities of learning data to train a classifier. The learning data are a labelled vector matrix, that is, features drawn from both dictionary word list patterns (SQLIA negative) and SQL token patterns (SQLIA positive).
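The labelling scheme described above can be sketched as follows, assuming a small hypothetical word list and a handful of well-known injection fragments. The real data set is far larger; the names `DICTIONARY_WORDS`, `SQL_FRAGMENTS`, and `build_dataset` are invented for illustration.

```python
import itertools

# Hypothetical seed lists; a real run would load a full dictionary word list
# and a catalogue of known attack patterns.
DICTIONARY_WORDS = ["apple", "river", "table", "window"]
SQL_FRAGMENTS = [
    "' OR '1'='1",
    "'; DROP TABLE users --",
    "' UNION SELECT null --",
]

def build_dataset(words, fragments):
    """Return (text, label) rows: plain words are SQLIA negative (0), and
    words with an injected SQL fragment are SQLIA positive (1)."""
    rows = [(word, 0) for word in words]
    for word, fragment in itertools.product(words, fragments):
        rows.append((word + fragment, 1))
    return rows

dataset = build_dataset(DICTIONARY_WORDS, SQL_FRAGMENTS)
print(len(dataset), dataset[0], dataset[4])
```

Combining every word with every fragment is one simple way to generate a large number of positive examples from a modest list of attack patterns.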
IV. PROPOSED SYSTEM
In the proposed system, we build a predictive analytics web application with large quantities of learning data to train a classifier. The learning data are a labelled vector matrix, that is, features drawn from both dictionary word list patterns (SQLIA negative) and SQL token patterns (SQLIA positive). The contributions of this project include a representative data set that undergoes feature hashing to train a supervised learning model implementing the Support Vector Machine (SVM) algorithm, which accurately predicts SQLIA and thereby prevents malicious web requests from reaching the target back-end database. It also places SQLIA detection in the context of the big data Internet. In addition, this project presents a proof of concept in the form of a working prototype that uses the Two-Class Support Vector Machine (TCSVM) algorithm implemented on Microsoft Azure Machine Learning (MAML) to predict SQLIA. This methodology also forms the subject of the empirical evaluation using the Receiver Operating Characteristic (ROC) curve.
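Feature hashing maps an arbitrary set of tokens to a fixed-length numeric vector by hashing each token to a bucket index. The sketch below shows the general idea only; the hash function (MD5 from `hashlib`), the bucket count of 16, and the function name `hash_features` are arbitrary stand-ins and are not claimed to match what MAML uses internally.

```python
import hashlib

def hash_features(tokens, n_buckets=16):
    """Map a list of tokens to a fixed-length count vector by hashing each
    token to one of n_buckets positions (the hashing trick)."""
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vector[index] += 1  # distinct tokens may collide in one bucket
    return vector

# A benign input versus the same input with injected SQL tokens.
benign = hash_features(["apple"])
attack = hash_features(["apple", "'", "or", "'", "1", "'", "=", "'", "1"])
print(benign)
print(attack)
```

Because the vector length is fixed in advance, new tokens seen at prediction time need no vocabulary update, at the cost of occasional collisions between unrelated tokens. The resulting vectors are the kind of input a two-class SVM would be trained on.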
V. SYSTEM ARCHITECTURE
The trained model is exposed as a web service. The web service is called from a custom-built .NET application, named NETSQLIA for this research, for ongoing SQLIA detection and prevention. Critical to deployment in every new domain, the administrator or system expert needs to feed the data engineering (text pre-processing) module with a new rule that matches the patterns present in the new data set, which triggers retraining of the classifier so that it adapts to the new environment.
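The interception flow can be sketched as below. NETSQLIA itself is a .NET application; this Python version only illustrates the control flow. The `stub_classifier` is a placeholder standing in for the call to the trained web service, and `proxy_request`, `fake_backend`, and the response shape are hypothetical.

```python
def stub_classifier(value):
    """Placeholder for the deployed web service: returns 1 for a suspected
    SQLIA and 0 otherwise. A real deployment would send the value to the
    trained classifier instead of checking a few characters."""
    return 1 if ("'" in value or "--" in value or ";" in value) else 0

def proxy_request(params, classify, forward):
    """Check every request parameter before the request reaches the back end.
    Block on the first parameter predicted as SQLIA; otherwise forward."""
    for name, value in params.items():
        if classify(value) == 1:
            return {"status": 403, "blocked_param": name}
    return {"status": 200, "body": forward(params)}

def fake_backend(params):
    # Stand-in for the protected application and its database.
    return "results for " + params["q"]

ok = proxy_request({"q": "apple"}, stub_classifier, fake_backend)
bad = proxy_request({"q": "apple' OR '1'='1"}, stub_classifier, fake_backend)
print(ok)
print(bad)
```

Passing the classifier in as a function keeps the proxy logic unchanged when the model is retrained for a new domain; only the service behind `classify` needs to be swapped.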
VI. STEPS TO IMPLEMENT PROPOSED MODEL
VIII. ACKNOWLEDGEMENT
We are deeply thankful to Mr. P. Bhaskar for his support and encouragement. He provided us with the necessary resources and environment to pursue academic excellence, and we were able to complete this research successfully under his mentorship.
In this project, we applied predictive analytics to SQLIA detection and prevention in a big data context and obtained an excellent result, which is empirically evaluated using the confusion matrix and the ROC curve. Compared with existing work on SQLIA, the methodology proposed here is functional in a big data context, which to our knowledge has been lacking until now. Future work involves employing a multi-class classifier to identify and group the different SQLIA types as they are predicted.
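For readers unfamiliar with the two evaluation tools, the following sketch computes a confusion matrix and the area under the ROC curve from scratch. The labels and scores are made-up illustrative values, not results from this project, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives (1 = SQLIA, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }

def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct score used as threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Made-up example: six requests with classifier scores in [0, 1].
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
curve = roc_points(y_true, scores)
area = auc(curve)
print(cm, round(area, 4))
```

The confusion matrix summarizes performance at one decision threshold (0.5 here), while the ROC curve traces the trade-off between true positive rate and false positive rate across all thresholds.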
Copyright © 2024 Bhaskar P, Shashikala K, Susma Swaraj V, Gayathri A, Madhavi G, Venkata Juhitha C. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET60932
Publish Date : 2024-04-24
ISSN : 2321-9653
Publisher Name : IJRASET