Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Saurabh Kumar Sen, Anuradha Deolase
DOI Link: https://doi.org/10.22214/ijraset.2022.40663
Certificate: View Certificate
Ransomware cyberattacks have grown in severity, effectiveness to cause damage, and ease of execution during the last decade. Advanced ransomware detection technologies must be included with traditional anti-malware procedures. The results of a study and analysis of ransomware attack risk are presented in this work, with the goal of identifying the characteristics that separate ransomware from other malware and benign executable files with the help of detected logs. The ransomware\'s normal behaviour and structure are determined by statically and dynamically analysing the executable binaries. Ransomware-specific features are extracted from executable files using dynamic and static analysis techniques. This study shows that graph representation of attacks with a collection of datasets for malware detection improves when using machine learning techniques.
I. INTRODUCTION
Ransomware is a type of software that encrypts files on a computer and prevents the user or organisation from accessing them. This malware encrypts files and demands a ransom payment for the decryption key, putting businesses in a position where paying the ransom is the simplest and cheapest method to recover access to their data. Some ransomware variations have introduced extra capabilities, such as data theft, to entice ransomware victims to pay the ransom.
Ransomware has quickly risen to prominence as the most visible and well-known sort of malware. Recent ransomware attacks have harmed hospitals' capacity to offer critical services, paralysed city government systems, and wreaked havoc on a variety of enterprises.
The modern ransomware craze began with the WannaCry outbreak in 2017[1]. This large-scale, highly configurable attack has proven that ransomware attacks are both possible and profitable. Since then, dozens of ransomware variants have been developed and used in various attacks. The COVID19 pandemic has also contributed to the recent rise in ransomware. As organizations rapidly transition to working remotely, holes have been created in their cyber defences. Cybercriminals exploited these vulnerabilities to launch ransomware, leading to a wave of ransomware attacks. In Q3 2020, ransomware attacks increased by 50% compared to the first half of that year.
II. WORKING STEPS OF RANSOMWARE
In order to be successful, ransomware needs to gain access to a target system, encrypt the files there, and demand a ransom from the victim [2]. While the specifics of implementation differ from one ransomware variation to the next, all three stages remain the same.
Ransomware, like any other type of malware, can acquire access to a company's systems in a variety of methods. Ransomware authors, on the other hand, tend to favour a few unique infection vectors.
Phishing emails are one of them. A malicious email may include a link to a website that hosts a malicious download or an attachment with built-in downloader functionality. The ransomware is downloaded and executed on the PC if the email recipient falls for the scam. Another common ransomware infection vector makes use of services like the Remote Desktop Protocol (RDP) (RDP). An attacker can utilise RDP to authenticate to and remotely access a machine on the company network if they have stolen or guessed an employee's login credentials [4]. With this access, the attacker can download malware directly and run it on the machine they control. Others may try to actively infect computers, as WannaCry did with the Eternal Blue vulnerability [1]. The majority of ransomware versions have a variety of infection routes.
2. Step 2. Data Encryption
After gaining access to a machine, ransomware might begin encrypting its files. Because an operating system includes encryption, all that is required is accessing files, encrypting them with an attacker-controlled key, and replacing the originals with the encrypted copies. To maintain system stability, most ransomware strains pick carefully which files to encrypt. To make recovery without the decryption key more difficult, certain variants would erase backup and shadow copies of files [4].
3. Step 3. Ransom Demand
Once the files have been encrypted, the ransomware is ready to demand a payment [3]. Different ransomware variations implement this in a variety of ways, but it's fairly uncommon for the display background to be changed to a ransom note or for text files to be inserted in each encrypted directory with the ransom note. In exchange for access to the victim's files, these messages usually demand a specific sum of cryptocurrency. The ransomware operator will either supply a copy of the private key used to safeguard the symmetric encryption key or a copy of the symmetric encryption key itself if the ransom is paid. This information can be entered into a decryptor tool (also provided by the cybercriminal) to reverse the encryption and give the user access to their data again [2].
While all ransomware variations have these three essential phases, individual ransomware can have various implementations or additional steps. Ransomware variants like as Maze, for example, perform file scanning, registry information theft, and data theft before encrypting data, and WannaCry hunts for more vulnerable machines to infect and encrypt [1].
How to Protect Against Ransomware
A. Utilize Best Practices
The cost and impact of a ransomware assault can be drastically reduced with proper planning. Using the following best practises, an organization's susceptibility to ransomware can be reduced and its effects minimised:
B. Reduce Attack Surface
Because of the huge potential cost of a ransomware attack, the best ransomware mitigation technique is prevention. This can be achieved by reducing the attack surface by addressing [3]:
C. Deploy Anti-ransomware Solution
The need to encrypt all of a user’s files means that ransomware has a unique fingerprint when running on a system. Anti-ransomware solutions are built to identify those fingerprints. Common characteristics of a good anti-ransomware solution include [3]: Wide variant detection, Fast detection, Automatic restoration, Restoration mechanism not based on common built-in tools (like ‘Shadow Copy’, which is targeted by some ransomware variants).
III. METHODS TO REMOVE RANSOMWARE
A ransom note on a computer is not something anyone wants to see since it indicates that a ransomware infection has been effective. At this point, several responses to an active ransomware outbreak can be taken, and a company must decide whether or not to pay the ransom.
A. How to Mitigate an Active Ransomware Infection
After data encryption is complete and a ransom letter is shown on the infected computer's screen, many successful ransomware operations are only identified. The encrypted files are likely unrecoverable at this stage; however, some procedures should be performed right away:
IV. MACHINE LEARNING ALGORITHM: LOGISTIC ALGORITHM
A. Let's take a Closer Look at this Diagram
First and foremost, as previously stated, Logistic Regression models are classification models; specifically, binary classification models (they can only be used to distinguish between two categories, such as whether a person is obese or not based on their weight, or whether a house is large or small based on its size). This suggests that our data contains two types of observations (Category 1 and Category 2), as shown in the diagram.
Second, the Y-axis goes from 0 to 1 as can be seen. This is because the sigmoid function always uses these two values as maximum and minimum, which is ideal for our purpose of categorising data into two groups. We acquire a probability (between 0 and 1 obviously) of an observation belonging to one of the two categories by computing the sigmoid function of X (that is, a weighted sum of the input features, much like in Linear Regression).
The sigmoid function has the following formula
V. IMPLEMENTION OF LOGISTIC REGRESSION ALGORITHM
During the implementation, require to import dataset or libraries. The dataset was taken manually and describes information about a Malware being detected through a malware detection tools such as Symantec, Trend micro.
We will be predicting the value of detected ransomware attack and consider a single feature, threat to predict the values of Malware. You can have multiple features as well.
We would import the following modules:
make_classification: available in sklearn.datasets and used to generate dataset
matplotlib.pyplot: for plotting
LogisticRegression: this is imported from sklearn.linear_model. Used for performing logistic regression
train_test_split: imported from sklearn.model_selection and used to split dataset into training and test datasets
First, Import the libraries and dataset to analyse cyberattack risk in the organization.
# Import required modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import dataset
df = pd.read_csv('Ransom.csv')
# Show plot
sns.countplot('Security Threat', data=df)
plt.show()
Import the train test split, Logistic regression and classification report.
1. Step 1: Import Packages, Functions, and Classes
from sklearn.model selection import train_test_split
from sklearn.linear_model import Logistic Regression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix, accuracy_score
2. Step 2: Generate the dataset
Now you need to generate the dataset using the make classification () function. You need to specify the number of samples, the number of features, number of classes and other parameters.
The code for the make_classification is given below:
# Generate and dataset for Logistic Regression
x, y = make_classification(
n_samples=100,
n_features=1,
n_classes=2,
n_clusters_per_class=1,
flip_y=0.03,
n_informative=1,
n_redundant=0,
n_repeated=0)
3. Step 3: Visualize the Data
# Create a scatter plot
plt.scatter(x, y, c=y, cmap='rainbow')
plt.title('Scatter Plot of Logistic Regression')
plt.show()
4. Step 4: Split the Dataset
Now we would split the dataset into training dataset and test dataset. The training dataset is used to train the model while the test dataset is used to test the model’s performance on new data.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=1)
logmodel= LogisticRegression()
5. Step 5: Perform Logistic Regression
Here we would create a LogistiRegression object and fit it without dataset.
#Step 3: Create a Model and Train It
model = LogisticRegression(solver='liblinear', random_state=0)
model.fit(x, y)
The logistic regression output is given below:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100,multi_class='auto', n_jobs=None, penalty='l2', random_state=0, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
You can view the logistic regression coefficient and intercept using the code below:
6. Step 6: Make prediction using the model
We now use the model to predict the outputs given the test dataset.
score ()
takes the input and output as arguments and returns the ratio of the number of correct predictions to the number of observations.
7. Step 7: Display the Confusion Matrix
The accuracy of the model with a confusion matrix. In the case of binary classification, the confusion matrix shows the numbers of the following:
a. True negatives in the upper-left position
b. False negatives in the lower-left position
c. False positives in the upper-right position
d. True positives in the lower-right position
To create the confusion matrix, you can use confusion matrix() and provide the actual and predicted outputs as the arguments:
#Show the Confusion Matrix
>>> confusion_matrix(y, model.predict(x))
The output is:
array ([[3, 1],
[0, 6]], dtype=int64)
We can deduce from the confusion matrix that:
# True positive: 3 (upper-left) – Number of positives we predicted correctly
# True negative: 6 (lower-right) – Number of negatives we predicted correctly
# False positive: 1 (top-right) – Number of positives we predicted wrongly
# False negative: 0 (lower-left) – Number of negatives we predicted wrongly
A. Classification Report
B. Advantages
Because of its simple and efficient nature, it does not require a lot of computing power, is simple to execute and analyse, and is extensively utilised by data analysts and scientists. It also does not necessitate feature scaling. For each observation, logistic regression generates a probability score.
C. Disadvantages
Logistic regression can't handle a large number of categorical features/variables. It's susceptible to being overfitted. Also, logistic regression cannot address a non-linear problem, which is why non-linear characteristics must be transformed. In logistic regression, independent variables that are not associated with the target variable but are very similar or correlated to each other will not perform well.
D. Observation
We analyse a lot of details about logistic regression where we build respective models, to visualize results and some of the theoretical background information. Also covered the concepts such as sigmoid function, maximum likelihood, confusion matrix, F1-Score, accuracy. It also supports multiple features. Because the input values must be in a specified format, they were reshaped before being trained with the fit approach. The accuracy using this is 90%, which is very close to the accuracy of our model.
We analyse the malware risk at organization with the help of occurred ransomware dataset where we found that there is lots of ransomware signature to infect the system and encrypt the data of any organization. After the analysis, we got 90% chances of cyber-attack or infect the network. Therefore, required to implement the major security infrastructure in the organization to secure the organisation with the employee awareness program. Also, aware against the cyber-attack and mitigation procedure.
[1] A Study of Ransomware Detection and Prevention at Organizations-IRJET, Saurabh Kumar Sen, Nidhi Chourey, VITM Indore (M.P) [2] A Brief Study of Wannacry Threat: Ransomware Attack 2017- IJARCS Savita Mohrule,Manisha patil, MITACSC, Alandi, Pune, India. [3] Fundamental of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies by John Kelleher, Brian Mac Namee, and Aoife D’Arcy. [4] Ransomware: A Research and a personal case study of dealing with this nasty malware-Azad Ali, Indiana University of Pennsylvania. [5] Detection of ransomware on windows operating systems-Jaan Priisalu, TALLINN UNIVERSITY OF TECHNOLOGY.
Copyright © 2022 Saurabh Kumar Sen, Anuradha Deolase. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET40663
Publish Date : 2022-03-07
ISSN : 2321-9653
Publisher Name : IJRASET
DOI Link : Click Here