Malware detection is among the better-known automated detection technologies; it covers methods for detecting and protecting against viruses, worms, Trojan horses, spyware, and other kinds of malicious code. Failing to detect malware at its inception leaves a window in which it poses a significant threat to online security, not only for individuals and organizations but also for the community and the nation. Antivirus software may also fail to detect viruses if its anti-virus engine is not kept up to date. The greatest difficulty is detecting a new virus: when antivirus software encounters new malware behavior, it applies a series of checks based on established rules, and if those rules judge the new behavior safe, the virus remains undetected.
To detect more malware with greater precision across a growing volume and variety of malicious programs, this project uses machine learning techniques together with convolutional neural networks, a deep learning technique. To detect malware in portable executable (PE) files, we first use ML techniques to determine whether a file is malicious; if it is, we pass it to our CNN to extract local features.
Our binary classifier shows about 98% accuracy on PE files. It uses data extracted from the PE file. Our CNN model shows a 94% accuracy rate in identifying malicious code. The results also show that a CNN is very effective on both source code and binary code and can detect malicious software hidden in benign code, leaving malware no place to hide. This project not only gives network administrators a simple solution for detecting malicious software but also provides an easy-to-use GUI so that an average person with little technical knowledge can take security measures in a timely manner.
I. INTRODUCTION
Malware attacks have been on the rise with the progress and development of the internet. According to the AV-TEST report, the amount of malware has grown from 99.71 million to 1205.25 million in the last 10 years [1]. The complexity of malware, its file types, and its structure have also changed dramatically. Today a malware package may contain source code files, binary files, Perl scripts, makefiles, shell scripts, and readme files, spread across several directories and subdirectories. When hackers or criminals break into a system, they typically use such a malware package, in which malicious files are hidden among benign ones.
To make malware detection effective, we use deep learning methods. In recent years, deep learning methods such as Convolutional Neural Networks and Recurrent Neural Networks have been applied in information security and have produced better results than traditional methods. The CNN in particular has proven to be a very powerful tool. If different types of malware can be converted to images and used as input data for a CNN, the network can learn to recognize them, which will help system administrators develop an effective way to detect malicious software. We also provide an easy-to-use GUI with which any person with little technical knowledge can take timely security measures and prepare for potential internet attacks.
This project has the following objectives:
1) Check whether a file is benign or malicious.
2) If it is malicious, extract local features from the image generated from the file using a CNN and classify it into a malware family.
3) Provide a user-friendly GUI so that anyone can check whether a file is malicious and obtain steps to secure and recover their machine.
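The objectives above describe a two-stage flow, which can be sketched as follows. This is only an illustration under stated assumptions, not the project's implementation: `ScanResult`, `scan_file`, `toy_detector`, and `toy_family_classifier` are hypothetical names, and the two toy functions merely stand in for the binary classifier and the CNN family classifier.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScanResult:
    malicious: bool
    family: Optional[str] = None  # only set when the file is flagged as malicious


def scan_file(
    data: bytes,
    is_malicious: Callable[[bytes], bool],
    classify_family: Callable[[bytes], str],
) -> ScanResult:
    """Stage 1 decides benign vs. malicious; stage 2 runs only on malicious files."""
    if not is_malicious(data):
        return ScanResult(malicious=False)
    return ScanResult(malicious=True, family=classify_family(data))


# Hypothetical toy stand-ins so the sketch runs; real models would replace them.
def toy_detector(data: bytes) -> bool:
    return b"EVIL" in data


def toy_family_classifier(data: bytes) -> str:
    return "family-A" if data.startswith(b"MZ") else "family-B"


benign = scan_file(b"MZ hello", toy_detector, toy_family_classifier)
flagged = scan_file(b"MZ EVIL payload", toy_detector, toy_family_classifier)
print(benign)
print(flagged)
```

Passing the two stages in as callables keeps the control flow independent of whichever models are eventually plugged in.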
II. LITERATURE SURVEY
A. Malware Detection
While researching malware detection methods, we found that there are two approaches to obtaining malware features: static analysis and dynamic analysis [2]. Static analysis infers facts about a program without executing it; it was originally used by compilers to ensure correct and efficient translation from the source language to the target language [3]. Dynamic analysis refers to a broad class of techniques that draw conclusions about code by observing its behavior at runtime [4].
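As a small illustration of static analysis applied to a PE file, the sketch below reads a few header fields without ever executing the file. The choice of fields and the names `parse_pe_header` and `build_minimal_header` are assumptions made for this example; the source does not state which PE attributes the project actually uses. The synthetic header exists only so that the sketch runs without a real executable.

```python
import struct


def parse_pe_header(data: bytes) -> dict:
    """Read a few COFF header fields from a PE image without executing it."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF header")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
    }


def build_minimal_header(machine: int = 0x14C, n_sections: int = 3) -> bytes:
    """Build a synthetic DOS stub plus COFF header, for demonstration only."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature starts right after the stub
    coff = struct.pack("<HHIIIHH", machine, n_sections, 0, 0, 0, 0xE0, 0x0102)
    return bytes(dos) + b"PE\x00\x00" + coff


features = parse_pe_header(build_minimal_header())
print(features)
```

Numeric fields like these can be placed directly into a feature vector for a conventional classifier.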
B. Implementation Of Random Forest
Decision trees are the building blocks of random forests and are intuitive models. We can think of a decision tree as a series of yes/no questions about our data that ultimately lead to a predicted class (or a continuous value in the case of regression). It is an interpretable model because it makes classifications much as we do: we ask a sequence of queries about the available data until we arrive at a decision [18]. It is important to note that a tree grown this way makes no errors on the training data. This is expected, because we gave the tree the answers and placed no limit on its maximum depth (number of levels). The goal of a learning model, however, is to generalize well to new data that it has never seen before.
The random forest is a model made up of many decision trees. Rather than simply averaging the predictions of the trees (which we could call a “forest”), this model uses two key concepts that give it the name random:
1) Random sampling of training data points when building trees
2) Random subsets of features considered when splitting nodes
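The two sources of randomness listed above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration rather than the implementation used in this project: each tree is limited to a single split (a depth-1 "stump") to keep the code short, and the two toy features (file size and byte entropy) and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, n_features, rng):
    """Fit a depth-1 tree, searching only a random subset of the features."""
    best = None
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) training points with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], n_features, rng))
    return forest


def predict(forest, row):
    """Majority vote over the trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Toy data: [file size, byte entropy]; label 0 = benign, 1 = malicious.
X = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0], [2.5, 3.5],
     [7.0, 7.0], [7.5, 7.2], [8.0, 7.5], [8.5, 7.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict(forest, [0.5, 1.0]), predict(forest, [9.0, 8.0]))
```

Because each tree sees a different bootstrap sample and a different feature subset, the trees disagree slightly, and the vote across them is what smooths out the overfitting of any single tree.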
C. Convolutional Neural Network Of Deep Learning
Since the early years of artificial intelligence in the 1950s, computer scientists have worked to build computers that can interpret visual input. Yann LeCun, then a postdoctoral computer science researcher, first introduced convolutional neural networks, or ConvNets, in the 1980s. Over the past ten years, the convolutional neural network has shown promise in a number of pattern-recognition fields, including word recognition and image processing [5]. More broadly, deep learning has in recent years shown promise in a wide range of domains, including speech recognition, natural language processing, and visual perception. Among deep neural networks, convolutional neural networks are the most extensively researched, helped by notable improvements in image processing and a rise in the quantity of annotated data.
D. Malware Detection Of Deep Learning
Machine learning is used in the classification of images, speech, and text. Nataraj converted malware binaries into grayscale images and classified 9,458 malware samples into families [7]. Although Nataraj classified the malware without a neural network, the image-based approach has attracted the attention of researchers. Using deep learning saves processing time on feature extraction and avoids the risks of sandbox analysis.
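The conversion of a binary into a grayscale image can be sketched as follows: each byte becomes one 8-bit pixel, and the pixels are laid out row by row. This is a minimal illustration; the fixed row width, the zero padding of the last row, and the names `bytes_to_grayscale` and `to_pgm` are assumptions made for the example, not details taken from the cited work.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay out each byte as one 8-bit pixel, row by row, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def to_pgm(image: List[List[int]]) -> bytes:
    """Serialize the pixel rows as a binary PGM (P5) image for viewing."""
    if not image:
        raise ValueError("image has no rows")
    header = f"P5 {len(image[0])} {len(image)} 255\n".encode("ascii")
    return header + bytes(pixel for row in image for pixel in row)


image = bytes_to_grayscale(b"MZ\x90\x00\x03", width=4)
print(image)
print(to_pgm(image)[:11])
```

The resulting two-dimensional array is the kind of input a CNN can consume directly, so no hand-crafted feature extraction step is needed.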
III. OVERVIEW OF APPROACH
Malware is malicious software that disguises itself as a legitimate computer program. It is installed in a variety of ways, the most common being e-mail spam, fake installers, infected attachments, and phishing links. Hackers make a malicious program look convincing so that users install it, and users often do not realize that the program is malware because it looks legitimate. This is essentially how malware gets installed on a computer. Once installed, malware hides in various folders on the computer. If it is an advanced type of malware, it can directly access the operating system. It then starts encrypting files and recording personal information.
Malware detection is important given the prevalence of malware online, because it serves as an early warning system against malware and cyber attacks. It keeps cybercriminals out of your computer and prevents information from being compromised.
In practice, an unzipped malware package contains many file types: source code files, readme files, dictionary files, scripts, binary files, executable files, and DLL files. The package may also contain large or nested directories. Using Nataraj's method alone is therefore not enough, so we propose an improved way to detect malware.
VI. ACKNOWLEDGMENT
The authors would like to thank Prof. Rupesh Mishra for his support in making available all the equipment and a suitable workplace to discuss and implement the idea. The authors also thank him for his guidance on the chosen topic.
Conclusion
This project proposes a feasible solution that lets network administrators efficiently identify malware at its very inception in today's hostile network environment. It also provides an easy-to-use interface so that the general public can scan their files and take steps to stop the spread of malware on their systems.
This study achieves the following objectives:
1) Check whether a file is benign or malicious.
2) Find malicious programs hidden behind benign files.
3) If a file is malicious, extract local features from the image generated from it using a CNN and classify it into a malware family.
4) Provide a user-friendly GUI so that anyone can check whether a file is malicious and obtain steps to secure and recover their machine.
Moreover, the results and evaluation show that the proposed method has a high true positive rate and a low false positive rate, with accuracy of about 98% for the binary classifier and 94% for the CNN model.
If we can continuously collect malicious apps and malware from the Internet of Things (IoT) domain, we will have a much more powerful prediction model. With that model, system administrators will be able to examine unknown samples quickly and efficiently in the future.
References
[1] https://www.avtest.org/en/statistics/malware/
[2] Stefan Katzenbeisser, Johannes Kinder, and Helmut Veith, "Malware Detection," Encyclopedia of Cryptography and Security, 2011 Edition, Springer, Boston, MA. ISBN 978-1-4419-5905-8
[3] David Brumley, "Static Analysis," Encyclopedia of Cryptography and Security, 2011 Edition, Springer, Boston, MA. ISBN 978-1-4419-5905-8
[4] Mihai Christodorescu and Vinod Ganapathy, "Dynamic Analysis," Encyclopedia of Cryptography and Security, 2011 Edition, Springer, Boston, MA. ISBN 978-1-4419-5905-8
[5] Saad Albawi, Tareq Abed Mohammed, and Saad Al-Zawi, "Understanding of a Convolutional Neural Network," 2017 International Conference on Engineering and Technology (ICET), 2017. ISBN 978-1-5386-1949-0
[6] https://towardsdatascience.com/malware-classification-using-convolutional-neural-networks-step-by-step-tutorial-a3e8d97122f
[7] Chia-Mei Chen, Shi-Hao Wang, and Dan-Wei Wen, "Applying Convolutional Neural Network for Malware Detection," in 2019 IEEE 8th Joint International Information Technology and Deep Learning Conference (ITADL 2019)
[8] Young-Seob Jeong, Jiyoung Woo, and Ah Reum Kang, "Malware Detection on Byte Streams of PDF Files Using Convolutional Neural Networks," SCH Media Labs, Soonchunhyang University, Asan 31538, Republic of Korea (3 April 2019)
[9] Jinpei Yan, Yong Qi, and Qifan Rao, "Detecting Malware with an Ensemble Method Based on Deep Neural Network," Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi, China (12 March 2018)
[10] https://www.kaggle.com/c/malware-classification
[11] https://www.hindawi.com/journals/scn/2018/7247095/
[12] https://towardsdatascience.com/autodeploy-fastapi-app-to-heroku-via-git-in-these-5-easy-steps-8c7958ef5d41
[13] https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/
[14] https://fastapi.tiangolo.com/advanced/custom-response/
[15] https://fastapi.tiangolo.com/async/
[16] https://fastapi.tiangolo.com/
[17] https://www.smashingmagazine.com/2018/01/understanding-using-rest-api/
[18] https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76
[19] https://github.com/tiangolo/fastapi/issues/426
[20] https://ianrufus.com/blog/2020/12/fastapi-file-upload/
[21] https://stackoverflow.com/questions/63580229/how-to-save-uploadfile-in-fastapi