Flood Prediction Using ML Classification Methods on Rainfall Data

Authors: Thegeshwar Sivamoorthy, Asif Mohammed Ansari, Dr. B. Sivakumar, V. Nallarasan

DOI Link: https://doi.org/10.22214/ijraset.2022.41297

Abstract

Floods are among the most destructive natural disasters and among the most complex to model. Research on flood forecasting models has contributed to risk reduction, policy proposals, reduced loss of human life, and mitigation of flood-related property damage. Over the past two decades, neural network approaches that model the complex statistical behavior of natural flood processes have contributed significantly to predictive systems that offer better performance and cost-effective solutions. To address this problem, this work predicts whether or not a flood will occur from a rainfall dataset by investigating neural network-based strategies. The dataset is analyzed with a Multi-Layer Perceptron (MLP) classifier, and steps such as variable identification, missing-value treatment, data validation, and data cleaning/preparation are carried out across the entire dataset. Flood predictions are evaluated by computing accuracy from the classification report and the confusion matrix, and the results show the effectiveness of the Python Flask-based application on the given attributes.

I. INTRODUCTION

The flood problem is as old as time. Although natural flooding of large areas did not create especially dangerous conditions in the prehistoric world, the growth of human activity and of cities has made the prevention of flood damage necessary. Since the end of the eighteenth century, with the advent of the industrial era, there have been two phases of human intervention. The first consisted of hydraulic works such as land reclamation, which often upset the natural balance of river flow, especially in mountainous areas, and led to flooding.

II. LITERATURE SURVEY

Timely classification of ongoing alarm floods in industrial monitoring systems is essential for safe and efficient operation. It can provide online support so that plant operators can take action in time, without having to wait for the alarm flood to end. The surveyed article proposes a data-driven approach to the problem of early classification with unlabeled historical data. To give priority to earlier alarms and take full advantage of alarm timing information, it uses a vector representation called the exponentially attenuated component (EAC). In that article, a semi-supervised approach based on a Gaussian mixture model (GMM) was designed to address early classification of ongoing alarm floods when the historical data are unlabeled. The EAC representation converts alarm sequences into vectors, which reduces the computational complexity encountered in online sequence-mining methods. However, that work does not analyze rainfall data, and it provides no results analysis in the form of a confusion matrix, so it cannot be relied on for more accurate flood prediction.

III. PREDICTION TECHNIQUES USED

Logistic Regression: This predicts the outcome of a categorical dependent variable, so the result is a categorical or discrete value such as yes or no, 0 or 1, or true or false. Instead of returning exactly 0 or 1, however, it provides probability values between 0 and 1.
Decision Tree: A decision tree is a supervised learning method that can be used for both classification and regression problems. It is a tree-structured classifier in which the internal nodes represent the features of the dataset, the branches represent the decision rules, and each leaf node represents an outcome.
Random Forest: This is a machine learning method used to solve classification and regression problems. It uses ensemble learning, an approach that combines multiple models to solve a complex problem. The algorithm consists of many decision trees.
SVM: The Support Vector Machine is one of the most popular supervised learning algorithms for classification and regression problems. In machine learning it is mainly used for classification.
KNN: k-nearest neighbors (K-NN) is a non-parametric supervised learning method used for classification and regression. In both cases the input consists of the k closest training instances in the dataset, and the form of the output depends on whether K-NN is used for classification or regression. In K-NN classification, the output is a class membership.
Naïve Bayes: This is a family of "probabilistic classifiers" based on applying Bayes' theorem with strong independence assumptions between the features. They are among the simplest Bayesian network models, but when combined with kernel density estimation they can achieve a high level of accuracy.
Standard Scaler (Enhancement Algorithm): This is a technique used to standardize the range of the independent variables, or features, of the data. Because the range of values in raw data varies widely, the objective functions of some machine learning algorithms do not work correctly without normalization.
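
To make two of the techniques above concrete, the following is a minimal sketch in plain Python of standard scaling followed by K-NN classification. It is not the authors' implementation: the function names, the two-feature layout, and the toy values are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def fit_standard_scaler(rows):
    """Return a (mean, std) pair for each feature column of the training rows."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)  # population variance
        std = math.sqrt(var) or 1.0  # avoid division by zero on constant columns
        stats.append((mean, std))
    return stats


def transform(rows, stats):
    """Rescale every feature to zero mean and unit variance."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumed, not from the paper): [seasonal rainfall, rainy days],
# label 1 = flood, 0 = no flood.
train_x = [[100.0, 20.0], [120.0, 25.0], [110.0, 22.0],
           [900.0, 80.0], [950.0, 85.0], [1000.0, 90.0]]
train_y = [0, 0, 0, 1, 1, 1]

stats = fit_standard_scaler(train_x)
scaled = transform(train_x, stats)

high_query = transform([[920.0, 82.0]], stats)[0]
low_query = transform([[105.0, 21.0]], stats)[0]
pred_high = knn_predict(scaled, train_y, high_query, k=3)
pred_low = knn_predict(scaled, train_y, low_query, k=3)
print(pred_high, pred_low)  # prints "1 0"
```

Scaling matters here because K-NN relies on distances: without it, the feature with the largest numeric range would dominate the vote. The scaler statistics are fitted on the training rows only and then reused for the queries.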

IV. DATA DESCRIPTION

In this project we took rainfall data from Kaggle, a well-known dataset repository. The dataset (.CSV) has a size of 597 KB and contains monthly rainfall details for fewer than 36 of India's meteorological regions. The data consists of 641 rows and 21 columns giving the amount of rainfall received in each region of India during 1951-2000. The columns hold parameters such as the name of the region, the month of data collection, the total annual rainfall, the occurrence of floods, and the totals for specific groups of months. We selected this dataset to analyze and predict flood events. A small section of the dataset is displayed in Table (i).
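
As a rough sketch of how such a CSV could be loaded and its missing values treated, the example below uses only the Python standard library. The column names (`REGION`, `JAN`, `FEB`, `ANNUAL`, `FLOOD`), the `YES`/`NO` encoding, and all values are assumptions made for illustration; they are not taken from the actual Kaggle file.

```python
import csv
import io

# Toy stand-in for the dataset; column names and values are illustrative only.
SAMPLE_CSV = """REGION,JAN,FEB,ANNUAL,FLOOD
Region A,12.5,8.0,2900.0,YES
Region B,,15.5,1100.0,NO
Region C,30.0,22.5,3100.0,YES
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_float_or_none(cell):
    """Convert a cell to float, treating an empty cell as missing."""
    cell = cell.strip()
    return float(cell) if cell else None


def impute_column_mean(rows, column):
    """Return the column as floats, replacing missing cells with the column mean."""
    values = [to_float_or_none(r[column]) for r in rows]
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


rows = load_rows(SAMPLE_CSV)
jan = impute_column_mean(rows, "JAN")
labels = [1 if r["FLOOD"] == "YES" else 0 for r in rows]

print(len(rows), len(rows[0]))  # number of rows and columns in the toy sample
print(jan)
print(labels)
```

Mean imputation is only one possible treatment for missing cells; dropping incomplete rows or using a median are equally plausible choices, and the source does not say which one was used.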

V. LIST OF MODULES

A. Data Pre-processing

B. Data Analysis and Visualization

C. Implementation of Logistic Regression

D. Implementation of Random Forest

E. SVM Implementation

F. Decision Tree Implementation

G. KNN Implementation

H. Naïve Bayes Implementation

I. Standard Scaler Random Forest (Enhancement) Implementation

J. Deployment Using Flask

VI. RESULT AND DISCUSSION

We considered historical rainfall data, using the rainfall for January to February, March to May, June to September, and October to December as the features for predicting future flood emergencies. We applied four flood prediction methods (Logistic Regression, Decision Tree, Random Forest, and SVM) and recorded the accuracy and effectiveness of each. We then compared the algorithms and used the one with the highest accuracy and best performance in the final machine learning model.
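
The feature construction and model comparison described above can be sketched as follows. This is an illustrative outline, not the authors' code: the month keys, the model names `model_a` and `model_b`, and every number are placeholders, and the predictions are invented rather than results from the paper.

```python
# Grouping of months into the four seasonal features named in the text.
SEASONS = {
    "JAN_FEB": ["JAN", "FEB"],
    "MAR_MAY": ["MAR", "APR", "MAY"],
    "JUN_SEP": ["JUN", "JUL", "AUG", "SEP"],
    "OCT_DEC": ["OCT", "NOV", "DEC"],
}


def seasonal_features(monthly):
    """Sum monthly rainfall into the four seasonal totals."""
    return [sum(monthly[m] for m in months) for months in SEASONS.values()]


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (1 = flood)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)


# Placeholder monthly rainfall for one region (not real data).
monthly = {"JAN": 10.0, "FEB": 5.0, "MAR": 20.0, "APR": 30.0, "MAY": 50.0,
           "JUN": 300.0, "JUL": 400.0, "AUG": 350.0, "SEP": 200.0,
           "OCT": 100.0, "NOV": 40.0, "DEC": 10.0}
features = seasonal_features(monthly)

# Placeholder test labels and predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 0, 0, 0],
}
scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)

print(features)  # [15.0, 100.0, 1250.0, 150.0]
print(scores)    # {'model_a': 0.75, 'model_b': 0.875}
print(best)      # model_b
```

Accuracy alone can hide an imbalance between missed floods and false alarms, which is why the confusion matrix is worth inspecting alongside it when comparing models.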

VII. FUTURE WORK

Future work includes predicting floods using data from other natural phenomena, such as:

Water level data
Seismic data

Conclusion

The systematic process began with data cleaning and the handling of missing values, continued with exploratory analysis, and ended with building the models for evaluation. Performance and accuracy on the test data were taken into consideration, and the model with the highest performance and accuracy was selected. This application can help predict future floods due to rainfall. The Random Forest algorithm was implemented in the project website using Flask because it had the best accuracy and performance of the four algorithms tested.

References

[1] Pengzhan Cui, Yeqing Guan, Ying Zhu. "Flood Loss Prediction of Coastal City Based on GM-ANN." International Conference on Grey Systems and Intelligent Services (GSIS), 2017.
[2] Swapnil Bande, Virendra V. Shete. "Smart flood disaster prediction system using IOT & Neural Networks." International Conference On Smart Technologies For Smart Nation (SmartTechCon), 2017.
[3] Febus Reidj G. Cruz, Matthew G. Binag, Marlou Ryan G, Francis Aldrine A. "Flood Prediction Using Multi-Layer Artificial Neural Network in Monitoring System with Rain Gauge, Water Level, Soil Moisture Sensors." TENCON 2018 - 2018 IEEE Region 10 Conference, 2018.
[4] Indrastanti R. Widiasari, Lukito Edi Nugoho, Widyawan, Rissal Efendi. "Context-based Hydrology Time Series Data for A Flood Prediction Model Using LSTM." 5th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), 2018.
[5] Haniyeh Seyed Alinezhad, Jun Shang, Tongwen Chen. "Early Classification of Industrial Alarm Floods Based on Semi-Supervised Learning." IEEE, 2021.
[6] T Gurleen Kaur, Anju Bala. "An Efficient Automated Hybrid Algorithm to Predict Floods in Cloud Environment." IEEE Xplore, 2019.
[7] Manomy K V, Meghna C S, Renuka Jayan, Raghi R Menon. "Flood Prediction and Tracking Trapped." IJERT, Vol 9, Issue 6, 2020.
[8] J. E. Reynolds, S. Halldin, J. Seibert, C.Y. Xu, T. Grabs. "Flood prediction using parameters calibrated on limited discharge data and uncertain rainfall scenarios." Hydrological Sciences Journal, 65:9, 2020.
[9] Xiaodong Ming, Qiuhua Liang, Xilin Xia, Dingmin Li, Hayley J. Fowler. "Real-Time Flood Forecasting Based on a High-Performance 2-D Hydrodynamic Model and Numerical Weather Predictions." AGU Water Resources Research, Vol 56, Issue 7, 2019.
[10] Sharad Kumar Jain, Pankaj Mani, Sanjay K. Jain, Pavithra Prakash, Vijay P. Singh, Desiree Tullos, Sanjay Kumar, S. P. Agarwal, A. P. Dimri. "A Brief review of flood forecasting techniques and their applications." DOI:10.1080/15715124.2017.1411920, 2017.

Copyright

Copyright © 2022 Thegeshwar Sivamoorthy, Asif Mohammed Ansari, Dr. B. Sivakumar, V. Nallarasan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET41297

Publish Date : 2022-04-07

ISSN : 2321-9653

Publisher Name : IJRASET
