Credit cards are widely used in day-to-day life, and the main aim of this project is to detect credit card fraud accurately. With fraud rates increasing, researchers have started using machine learning methods to detect and analyze fraud in online transactions. 'Fraud' in credit card transactions is the unauthorized and unwanted use of an account by someone other than the owner of that account. Fraud detection involves monitoring the activities of whole populations of users in order to estimate, perceive or avoid objectionable behavior, which consists of fraud, intrusion, and defaulting. The problem is challenging from a data science perspective because valid transactions far outnumber fraudulent ones, and because transaction patterns often change their statistical properties over time. In practice, the massive stream of payment requests is quickly scanned by automatic tools that determine which transactions to authorize, and messages are generated to confirm transactions with the card owner. Machine learning algorithms are then employed to analyze all the authorized transactions and report the suspicious ones. These reports are investigated by professionals, who contact the cardholders to confirm whether a transaction was genuine or fraudulent. This project also designs and develops a novel fraud detection method for streaming transaction data, with the objective of analyzing customers' past transaction details and extracting behavioral patterns. The techniques we implement include the Isolation Forest algorithm, the Random Forest algorithm, Logistic Regression, and the Sliding-Window method, with a confusion matrix used for evaluation.
I. INTRODUCTION
A credit card is a card issued to a customer (cardholder), usually allowing them to purchase goods and services within a credit limit or to withdraw cash in advance. Fraud in credit card transactions is the unauthorized use of someone's account without the owner being aware of it. Credit cards are easy targets for fraud. Fraudsters try to make every fraudulent transaction look legitimate, which makes fraud detection a very challenging task.
Many machine learning techniques have been applied to this problem. Our project aims to prevent such fraudulent practices by analyzing and studying past fraudulent transactions so that similar ones can be caught in upcoming transactions.
II. LITERATURE SURVEY
The first paper we referred to is Credit Card Fraud Detection using Machine Learning Algorithms [1] by Vaishnavi Nath Dornadulaa and Geetha S, published by IJCSMC in 2019. That work observed that the Matthews Correlation Coefficient (MCC) was the better metric for dealing with an imbalanced dataset. Balancing the dataset with SMOTE made the classifiers perform better than before. Another way of handling an imbalanced dataset is to use one-class classifiers such as the one-class SVM. The final observation was that logistic regression, decision tree, and random forest were the algorithms that gave the better results.
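The point about the Matthews Correlation Coefficient can be illustrated with a small sketch. The counts below are hypothetical (they are not taken from [1] or from our dataset), and the function names `mcc` and `accuracy` are ours. The sketch compares a classifier that labels every transaction as legitimate with one that actually catches most frauds.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def accuracy(tp, fp, fn, tn):
    """Fraction of all transactions that were labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical counts: 10,000 transactions, 20 of them fraudulent.
# A classifier that labels everything "legitimate":
always_legit = dict(tp=0, fp=0, fn=20, tn=9980)
# A classifier that catches 15 of the 20 frauds with 5 false alarms:
useful = dict(tp=15, fp=5, fn=5, tn=9975)

for name, counts in [("always legitimate", always_legit), ("useful", useful)]:
    print(f"{name}: accuracy={accuracy(**counts):.4f}  MCC={mcc(**counts):.4f}")
```

Both classifiers score above 99% accuracy on these counts, but only the second has an MCC well above zero, which is why accuracy alone is a weak guide on imbalanced data.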
The second paper we referred to is Machine Learning For Credit Card Fraud Detection System [2] by Lakshmi S V S S and Selvani Deepthi Kavila, published by Research India Publications in 2018. Sensitivity, specificity, accuracy, and error rate are used to evaluate the performance of the proposed system. The accuracies of the logistic regression, decision tree, and random forest classifiers are 90.0%, 94.3%, and 95.5% respectively. Comparing the three methods, the random forest classifier performs better than logistic regression and the decision tree.
The third paper we referred to is Credit Card Fraud Detection using Machine Learning and Data Science [3] by S P Maniraj, Aditya Saini, Swarna Deep Sarkar, and Shadab Ahmed, published by IJERT in 2019. In this paper, although the algorithm reaches over 99.6% accuracy, its precision remains at only 28% when a tenth of the dataset is taken into consideration. When the entire dataset is fed into the algorithm, the precision rises to 33%. The approach allows multiple algorithms to be integrated as modules, and their results can be combined to increase the accuracy of the final result. The model can be improved further by adding more algorithms to it.
The fourth paper is Credit card fraud detection using machine learning: A survey [4] by Dr. Yvan Lucas and Dr. Johannes Jurgovsky, arXiv, 2020. The survey highlights different approaches for tackling each of the challenges of credit card fraud detection, and for each approach at least one research work is described in full detail. Its goal is to provide the reader with useful information on all the research subjects it introduces.
III. METHODOLOGY
We gathered the data from the Kaggle website. Models are trained and tested on the dataset using the following techniques: logistic regression, random forests with decision trees, XGBoost, and isolation forest, with a confusion matrix used to evaluate the results. If our algorithm is applied in a bank's credit card fraud detection system, the probability that a transaction is fraudulent can be predicted soon after the transaction occurs. A series of anti-fraud strategies can then be adopted to protect banks from large losses and to reduce risk.
A. System Architecture
This is a block diagram of our system. It represents the algorithm as we implemented it.
The first step is to read the dataset, which is then sent for sampling. The dataset is then split for training and testing.
After feature selection, the data is sent to the algorithm, which is a combination of logistic regression, decision tree, and random forest, together with the Matthews Correlation Coefficient (MCC). The resulting data is stored as test sample data. The outcome is predicted based on the test sample data and the result of the combined algorithm.
Finally, the performance and accuracy results are plotted. The system also validates the results: if a transaction is legitimate it is marked true, otherwise it is marked false. In the case of a false transaction, the bank is made aware of it.
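The sampling and train/test steps described above can be sketched as follows. This is only an illustration under our own assumptions: the text does not specify which sampling scheme is used, so the sketch assumes a stratified split followed by random under-sampling of the majority class on the training part only. The helper names `stratified_split` and `undersample` and the toy labels are hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def undersample(indices, labels, seed=0):
    """Randomly drop legitimate rows until both classes are the same size."""
    rng = random.Random(seed)
    fraud = [i for i in indices if labels[i] == 1]
    legit = [i for i in indices if labels[i] == 0]
    legit = rng.sample(legit, min(len(legit), len(fraud)))
    return sorted(fraud + legit)

# Toy labels (assumed): 1,000 transactions, 10 of them fraudulent (label 1).
labels = [1] * 10 + [0] * 990
train_idx, test_idx = stratified_split(labels)
balanced_train = undersample(train_idx, labels)

print(len(train_idx), len(test_idx))       # sizes of the training and test parts
print(sum(labels[i] for i in test_idx))    # number of frauds held out for testing
print(len(balanced_train))                 # size of the balanced training subset
```

Only the training indices are resampled here, so the test part keeps the original class ratio and the reported scores reflect realistic conditions.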
B. Requirements
The implementation requirements are given in this section.
1. Software
a. Operating system: Windows 8/10.
b. IDE tools: Visual Studio and Jupyter Notebook
c. Coding language: Python 3.6 and up
d. Libraries: NumPy, pandas, Matplotlib, imblearn
e. Framework: Streamlit
2. Hardware
a. Processor: Intel Core i3 or higher.
b. RAM : 4 GB or higher.
c. Hard Disk Drive : 20 GB (free).
3. Algorithms
a. Logistic Regression
Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable.
Accuracy on Training data : 0.9440914866581956
Accuracy score on Test Data : 0.9543147208121827
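To make the role of the logistic function concrete, the following is a minimal, dependency-free sketch that fits a logistic regression by batch gradient descent. It runs on a toy one-feature dataset that we made up for illustration, not on the dataset behind the accuracies reported above, and the names `fit_logistic` and `predict_proba` are ours. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow for large negative z
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that row x belongs to class 1 (fraud)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data (assumed): one scaled feature per transaction, label 1 = fraud.
X = [[0.1], [0.3], [0.5], [0.8], [2.0], [2.4], [2.9], [3.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
preds = [int(predict_proba(w, b, x) >= 0.5) for x in X]
train_accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print("training accuracy on the toy data:", train_accuracy)
```

The 0.5 threshold is a default choice; on imbalanced fraud data a different threshold may give a better trade-off between missed frauds and false alarms.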
b. Local Outlier Factor
In anomaly detection, the local outlier factor is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbors.
Figure: Result of applying the local outlier factor to our dataset.
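The idea of measuring local deviation with respect to neighbors can be sketched in a few lines. The code below is a simplified illustration on toy 2-D points of our own choosing and does not reproduce the result on our dataset. It uses exactly k neighbors per point (the original definition also includes ties at the k-th distance) and assumes no duplicate points. The function names are hypothetical; scikit-learn's `LocalOutlierFactor` is a common library implementation.

```python
import math

def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_outlier_factor(points, k=3):
    """Return the LOF score of every point; values well above 1 suggest an outlier."""
    n = len(points)
    neighbors = []  # indices of the k nearest neighbours of each point
    k_dist = []     # distance from each point to its k-th nearest neighbour
    for i in range(n):
        d = sorted((euclid(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append([j for _, j in d[:k]])
        k_dist.append(d[k - 1][0])

    def reach_dist(i, j):
        # Reachability distance of point i from point j.
        return max(k_dist[j], euclid(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Toy points (assumed): a tight cluster plus one distant point at index 6.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (0.8, 0.9), (1.0, 0.8), (5.0, 5.0)]
scores = local_outlier_factor(points)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Points inside the cluster have neighbors of similar density, so their scores stay close to 1, while the distant point sits in a much sparser region than its neighbors and receives a far larger score.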
c. Isolation Forest Algorithm
Isolation forest is an anomaly detection algorithm. It detects anomalies using isolation, rather than modeling the normal points.
Figure: Result of applying the isolation forest algorithm to our dataset.
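Detection by isolation can be illustrated with a compact sketch. Each tree splits the data on a random feature at a random threshold, and points that end up alone after only a few splits receive a high anomaly score. The code below is a simplified, dependency-free version run on synthetic points that we generate ourselves, not on our dataset. The function names and parameter defaults are assumptions; scikit-learn's `IsolationForest` is a common library implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    f = rng.randrange(len(rows[0]))
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    t = rng.uniform(lo, hi)
    return {"feature": f, "threshold": t,
            "left": build_tree([r for r in rows if r[f] < t], depth + 1, max_depth, rng),
            "right": build_tree([r for r in rows if r[f] >= t], depth + 1, max_depth, rng)}

def path_length(row, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["threshold"] else node["right"]
    return path_length(row, child, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Score in (0, 1]; values closer to 1 mean the row is easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for row in rows:
        mean_depth = sum(path_length(row, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c_factor(psi)))
    return scores

# Synthetic data (assumed): 200 ordinary points plus two distant ones at indices 200 and 201.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outliers = [(8.0, 8.0), (-9.0, 7.5)]
rows = normal + outliers

scores = anomaly_scores(rows)
top2 = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:2]
print("indices with the highest anomaly scores:", sorted(top2))
```

Because the method needs no fraud labels, it can be applied to transactions for which no confirmed outcome is available yet.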
d. Confusion Matrix
In the field of machine learning, and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.
Figure: Our confusion matrix.
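For readers unfamiliar with the layout, the sketch below builds a binary confusion matrix and derives the measures mentioned in the literature survey (sensitivity, specificity, precision, accuracy, and error rate). The labels and predictions are made up for illustration and are not our results, and the helper name `confusion_counts` is ours.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 means fraud."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

# Made-up example: 10 transactions, 3 of them fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)

sensitivity = tp / (tp + fn)   # share of frauds that were caught (recall)
specificity = tn / (tn + fp)   # share of legitimate transactions left alone
precision = tp / (tp + fp)     # share of fraud alerts that were real frauds
accuracy = (tp + tn) / len(y_true)
error_rate = 1.0 - accuracy

print("                 predicted legit  predicted fraud")
print(f"actual legit     {tn:15d}  {fp:15d}")
print(f"actual fraud     {fn:15d}  {tp:15d}")
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"precision={precision:.3f} accuracy={accuracy:.3f} error rate={error_rate:.3f}")
```

Rows are the actual classes and columns the predicted ones, so the off-diagonal cells are the two kinds of error: false alarms and missed frauds.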
e. XGBoost
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework.
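The gradient boosting idea behind XGBoost can be shown with a small sketch. The code below is plain gradient boosting with one-split trees (stumps) and the log-loss, not XGBoost itself: XGBoost additionally uses second-order gradient information, regularization, and many engineering optimizations. The toy transactions (amount, hour of day), the function names, and the parameter values are all assumptions made for illustration, and the stump search assumes at least one feature takes more than one value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, probs):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(p if t == 1 else 1.0 - p) for t, p in zip(y, probs)) / len(y)

def fit_stump(X, residuals):
    """Find the single-feature threshold split with the lowest squared error on residuals."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[f] < t]
            right = [r for row, r in zip(X, residuals) if row[f] >= t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    return best[1:]  # (feature, threshold, left value, right value)

def stump_value(stump, row):
    f, t, lm, rm = stump
    return lm if row[f] < t else rm

def fit_boosting(X, y, n_rounds=20, lr=0.5):
    """Each round fits a stump to the residuals y - p and adds it to the raw scores."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_value(stump, row) for s, row in zip(scores, X)]
    return {"lr": lr, "stumps": stumps}

def predict(model, row):
    """Estimated fraud probability for one row."""
    return sigmoid(sum(model["lr"] * stump_value(s, row) for s in model["stumps"]))

# Toy transactions (assumed): (amount, hour of day); label 1 = fraud.
X = [(20, 14), (35, 10), (900, 13), (25, 3), (950, 2),
     (1200, 4), (40, 16), (1100, 3), (15, 2), (1000, 15)]
y = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]

model = fit_boosting(X, y)
baseline_loss = log_loss(y, [0.5] * len(y))
final_loss = log_loss(y, [predict(model, row) for row in X])
print(f"log-loss before boosting: {baseline_loss:.4f}, after: {final_loss:.4f}")
```

Each stump is weak on its own; the ensemble improves because every new stump is fitted to the errors that the current ensemble still makes.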
IV. CONCLUSION
Credit card fraud is an act of criminal dishonesty. This paper has examined various machine learning algorithms, all of which were tested based on accuracy and precision. We selected the supervised learning technique Random Forest to classify an alert as fraudulent or authorized. This classifier will be trained using feedback and delayed supervised samples, and will then aggregate the resulting probabilities to detect alerts. We further propose a learning-to-rank approach in which alerts are ranked by priority. The suggested method is intended to address the class imbalance and concept drift problems. Future work will include applying semi-supervised learning methods to the classification of alerts in fraud detection systems.
References
[1] Vaishnavi Nath Dornadulaa, Geetha S. Credit Card Fraud Detection using Machine Learning Algorithms. IJCSMC, 2019, https://www.sciencedirect.com/science/article/pii/S187705092030065X.
[2] Lakshmi S V S S, Selvani Deepthi Kavila. Machine Learning For Credit Card Fraud Detection System. Research India Publications, 2018.
[3] S P Maniraj, Aditya Saini, Swarna Deep Sarkar, Shadab Ahmed. Credit Card Fraud Detection using Machine Learning and Data Science. IJERT, 2019.
[4] Yvan Lucas, Johannes Jurgovsky. Credit card fraud detection using machine learning: A survey. arXiv, 2020.
[5] Roy, Abhimanyu, et al. “Deep Learning Detecting Fraud in Credit Card Transactions.” 2018 Systems and Information Engineering Design Symposium (SIEDS), 2018, doi:10.1109/sieds.2018.8374722.
[6] Xuan, Shiyang, et al. “Random Forest for Credit Card Fraud Detection.” 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), 2018, doi:10.1109/icnsc.2018.8361343.
[7] Awoyemi, John O., et al. “Credit Card Fraud Detection Using Machine Learning Techniques: A Comparative Analysis.” 2017 International Conference on Computing Networking and Informatics (ICCNI), 2017, doi:10.1109/iccni.2017.8123782.
[8] Melo-Acosta, German E., et al. “Fraud Detection in Big Data Using Supervised and Semi-Supervised Learning Techniques.” 2017 IEEE Colombian Conference on Communications and Computing (COLCOM), 2017, doi:10.1109/colcomcon.2017.8088206.
[9] http://www.rbi.org.in/Circular/CreditCard
[10] https://www.ftc.gov/news-events/press-releases/2019/02/imposter-scams-top-complaints-made-ftc-2018
[11] https://www.kaggle.com/mlg-ulb/creditcardfraud
[12] https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
[13] https://www.kaggle.com/ntnu-testimon/paysim1/home