Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Sumit Sarkar, Ayush Srivastava, Er. Avneet Kaur
DOI Link: https://doi.org/10.22214/ijraset.2023.49852
Weather forecasting is one of the most widely used applications of artificial intelligence. Precipitation forecasting is among the most actively researched topics because heavy rainfall causes extensive property damage and numerous fatalities. Large-scale flooding affects many social and practical spheres, including agriculture and disaster preparedness. Even with advanced mathematical techniques, older, widely used precipitation prediction models were unable to achieve high classification rates. This article introduces a technique for forecasting monthly precipitation based on linear regression analysis, which uses quantitative data about the state of the atmosphere to forecast when it will rain. Some machine learning systems can learn a complex mapping between inputs and outputs from a small number of samples. Because the atmosphere can change quickly, it is difficult to predict precipitation with absolute confidence. The variation in conditions from the previous year should be used to forecast the likelihood of precipitation. We propose applying linear regression to several factors such as temperature, humidity, and wind. Because the proposed model estimates precipitation from historical data for a specific geographic area, its forecasts should be more accurate. Compared with well-known methods for precipitation prediction, the model performs more accurately.
I. INTRODUCTION
Precipitation is one of the most significant characteristics of the natural environment and has an impact on a variety of areas, including agriculture, water supply, and climate change. Decision-making in a range of industries depends on accurate precipitation forecasts. Regression analysis is a statistical method for modelling the relationship between variables, and it can be used to forecast precipitation.
In a regression study, the goal is to find the best-fit line or curve that describes the relationship between two or more variables. The variables that matter for predicting rainfall include the amount of rain, time, and other meteorological factors such as temperature, humidity, and wind speed. The first stage in using regression analysis to forecast rainfall is obtaining pertinent data: past rainfall patterns, their timing, and the accompanying meteorological factors. A regression model is then fitted to this historical data and used to forecast future precipitation patterns, for instance probable weekly or monthly precipitation. This information is useful for many industries, including agriculture.[2][6] Farmers can use it to plan planting and harvesting operations. Although regression analysis can anticipate precipitation patterns accurately, it is not a perfect method. Forecast accuracy is affected by a number of factors, including changing climatic patterns and the limitations of the data used to build the model.
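As a minimal sketch of the best-fit line described above, the following fits a straight line by ordinary least squares with a single predictor. The humidity and rainfall values are hypothetical illustration data, not taken from the study, and the function name `fit_line` is an assumption made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: relative humidity (%) and monthly rainfall (mm).
humidity = [40.0, 50.0, 60.0, 70.0, 80.0]
rainfall = [10.0, 30.0, 50.0, 70.0, 90.0]

intercept, slope = fit_line(humidity, rainfall)
# Forecast rainfall for an unseen humidity value.
predicted = intercept + slope * 65.0
```

With real observations the points would not lie exactly on a line, and the fitted slope and intercept would minimise the squared residuals rather than reproduce the data exactly.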
Linear regression was applied to categorize the input data and forecast when it would rain. The proposed model may be used to forecast precipitation, lessen various social effects, and proactively plan for disaster relief. A linear regression methodology is employed both in the categorization of images and in the forecasting of precipitation.[4][8] The remainder of this paper is organized as follows: Section II contains the literature review, Section III the related work, Section IV the proposed approach, and Section V the experimental results. The conclusion outlines significant future research that might be incorporated into or added to the proposed study.
II. LITERATURE REVIEW
Several researchers have worked to increase the precision of the machine learning algorithms used in weather forecasting during the past 20 years. Here, a few pertinent research articles are mentioned. The researcher's ANN-based technique for forecasting atmospheric conditions was presented in [18]. Several meteorological variables, including humidity, temperature, and wind speed, were included in the dataset used for forecasting.
Hu (1964) was the first to apply an ANN, a crucial soft computing method, to weather forecasting.[1][9] During the past two decades, significant improvements in the variety of ANNs have led to the development of novel techniques for forecasting environmental phenomena (Gardner and Dorling, 1998; Hsieh and Tang, 1998). Michaelides et al. (1995) compared the efficacy of ANNs with multiple linear regression in estimating missing precipitation data for Cyprus. Kalogirou et al. (1997) employed an ANN to reconstruct the precipitation time series in Cyprus. Lee et al. (1998) split the available data into homogeneous subpopulations for the purpose of forecasting rainfall. Wong et al. (1999) developed a fuzzy rule base using backpropagation neural networks and SOM.[14] The rule base was then used to develop a spatially interpolated precipitation forecast model for Switzerland. Toth et al. (2000) compared models for forecasting short-term precipitation in order to predict floods in real time. Several structures of autoregressive moving average (ARMA) models, ANNs, and nearest-neighbor methods with lead times of 1 to 6 hours were used to forecast storms over the Sieve river basin in Italy between 1992 and 1996. The ANN approach was found to be accurate for lead times greater than 3 hours but insufficient to reproduce low rainfall. [17] Using information from weather stations, radars, satellites, and the Asian Spectral Model of the Japan Meteorological Agency (JMA), Koizumi (1999) created an ANN model, which was trained on data from the previous year. The ANN was found to outperform the persistence forecast (after 3 hours), the linear regression forecasts, and the numerical model's precipitation prediction. Because the ANN model was trained with only 1 year of data, the results were limited. The author expected neural network performance to improve as more training data became available. It remains unclear how much each predictor contributed to the forecast and how much recent data affected it.
Abraham et al. (2001) used an ANN with the scaled conjugate gradient algorithm (ANN-SCGA) and an evolving fuzzy neural network (EfuNN) to predict precipitation time series. In this work, the training model's input data set consisted of monthly precipitation. The authors analyzed 87 years of precipitation data from Kerala, a state in the southern part of the Indian peninsula. According to the empirical results, neuro-fuzzy systems outperform pure neural network techniques in terms of run time and error rate (5). Precipitation, however, is one of the most complicated and challenging components of the hydrological cycle to comprehend and predict because of its extreme variability over a wide range of spatial and temporal scales (French et al., 1992).
S. Sivasankari and M. Punithavallis (2019), "Research on Precipitation Prediction in Chennai Using Multiple Regression Analysis": In this work, precipitation in Chennai, India, was predicted using multiple regression analysis. The authors' three-variable multiple regression model, which takes into account the monsoon, the southwest monsoon, and the northeast monsoon, produced the best results when used with data from the Indian Meteorological Department.
J. C. Olaniyan and O.A. Ajayis (2020), "Precipitation Forecasting Using Multiple Linear Regression and Artificial Neural Networks": This study tested the efficacy of multivariate linear regression and artificial neural networks in forecasting precipitation in Nigeria. In terms of prediction, the authors found that the artificial neural network model performed better than the multiple linear regression model.
D.K. Singh and A. Kumar (2019), "Prediction of Rainfall," a study using artificial neural networks and linear regression analysis [11]: In this study, rainfall in the Indian state of Uttar Pradesh was predicted using artificial neural networks and linear regression analysis. According to the authors, both models had excellent predictive performance, and the regression model did not significantly outperform the artificial neural network model.
O.A. Adeoye and A.A. Olawale (2018), "Rainfall Prediction Using Regression Analysis: A Case Study of Nigeria": The rainfall of Nigeria was described in this paper using regression analysis. According to the authors, when the individual performances of the linear, quadratic, and cubic models were compared, the cubic model fared best.
III. RELATED WORK
Meteorology and hydrology have both employed regression analysis to forecast precipitation. In order to develop an accurate and trustworthy precipitation forecast model, several investigations have been carried out.
The following are some relevant efforts in this area:
Various scientists have proposed research using different machine learning approaches. Deepak Ranjan Nayak [15] used artificial neural networks to forecast rain. Akash D. Dubey [16] forecast rain at Pondicherry. In a study by Rui Lu et al. [15], the quantity of tumor cells in the liver was determined by identifying tumor cell borders in CT images. Although this technique requires a lot of computing time, it is believed to be quite effective at segmenting tumor cells and determining their volume. Kostas Haris presented a hybrid image segmentation method in [16] that blends watershed morphological procedures with edge-based and region-based methodologies. The method was successful since it decreased the number of erroneous edge detections, although noise indirectly increased computation time. For supervised image co-segmentation, Fanman Meng [19] created an effective and reliable active contour model and color reward technique. J. Zhao et al. (2019), in "Predicting daily rainfall in the Yangtze River basin," used a hybrid ensemble strategy to estimate daily rainfall in the Yangtze River basin in China using multiple regression, a support vector machine, and artificial neural network models.
Amiri et al. (2016) stated the following in research titled "Prediction of Rainfall Using Multiple Linear Regression and Artificial Neural Network Models": This study examines the effectiveness of multivariate linear regression and artificial neural network models for predicting rainfall. The findings indicate that the accuracy of the multiple linear regression model is below that of the artificial neural network model.
Rathod et al., "Rainfall forecast using hybrid artificial intelligence methods."[10] This paper suggests using genetic algorithms with artificial neural networks as hybrid artificial intelligence methodologies. The findings demonstrate that the suggested models can predict rainfall with accuracy.
Gradient boosting and random forest ensemble learning algorithms are suggested by Jena and Mahapatra in their 2020 work, "Rainfall prediction using ensemble learning approaches," for predicting rainfall. The findings demonstrate that the suggested models outperform conventional regression models and are very accurate in forecasting rainfall.
Regression analysis has been extensively studied for the purpose of predicting rainfall, with much of this research emphasizing the development of accurate and reliable models. [13] In one such study, Khan et al. (2020) proposed a hybrid model based on linear regression and support vector regression techniques for predicting yearly rainfall. The study found that the hybrid model performed better than traditional regression models.
In a different study, Sujatha and Shobha (2019) developed a model for the seasonal rainfall forecast in India using multiple linear regression. Two of the five predictor variables that the authors utilized were the air pressure and the sea surface temperature, and they reported a 92% forecast accuracy.
In a recent work, Tiwari et al. (2021) presented a machine learning-based method to forecast rainfall in the Indian Himalayan region. The authors' predictions, which used the gradient boosting regression method, were 95.17% accurate. The study emphasizes how approaches based on machine learning might improve rainfall forecast accuracy.
In summary, a number of regression-based models have been developed to predict rainfall accurately. Machine learning techniques, on the other hand, have gained popularity recently and give better accuracy and reliability when predicting rainfall.
IV. PROPOSED APPROACH
The proposed system uses the linear regression method. We used datasets from a range of cities for our investigation. The toolbox receives the dataset file as input and reads it. The dataset is then converted into a distinct text data format, and a graph of the first year's rainfall is created. Different parameters are supplied to initialize the model when linear regression is applied to the dataset. The predicted values are computed using the toolbox and displayed on graphs. Two criteria were applied to choose the data. The first is a diversity of climates, from severely dry to highly wet.[5] The second is that the cities were chosen for their distinct geographic settings. Thus, despite being far apart, the selected places may have similar climates. Choosing the data according to these two criteria means that our trials do not favour any one type of environment or location over another. It also enables us to look for patterns and evaluate whether certain climates are easier to predict than others.
The rainfall dataset is used as input during the pre-processing stage of the linear regression model.
The feature is extracted using a linear regression model. [20] The main goal of this research is to evaluate the several methodologies presented by the authors in order to create a real-time rainfall forecast system that corrects the flaws in earlier approaches and provides the most accurate solution.
The smaller and larger forecasts are shown in Figure 4.
A. Data Exploration and Analysis
Data analysis is performed so that predictions stay close to the observed values, making forecasts more reliable and accurate. Assurance that the data was collected properly can only be obtained once the raw data has been examined and reviewed for anomalies. Identifying the variables that carry irrelevant properties also benefits the prediction model.
B. Data Pre-processing
Pre-processing is a data mining approach that transforms inaccurate, unstructured input into a format that the model can use and interpret. Raw data has many flaws, is unreliable, and contains several inaccuracies. Through data exploration and analysis, we found that a large fraction of the null values in the raw data for our model needed to be replaced with the mean of the corresponding variable.[21] Missing values can also be handled by removing the affected columns or rows. Categorical data must be converted into numeric form because the model is based on mathematical calculations and equations. In the feature selection step of pre-processing, only the features relevant to our rainfall prediction model are chosen. As a result, the model's accuracy increases and training time decreases. Feature scaling, the final stage of pre-processing, puts the independent variables into a pre-specified range so that no one variable dominates the others.
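The pre-processing steps described above (mean imputation, categorical encoding, and feature scaling) can be sketched in plain Python as follows. This is an illustrative sketch only: the column names, the sample values, and the choice of min-max scaling to the range [0, 1] are assumptions, since the paper does not state which scaling range or encoding was used.

```python
def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def encode_categories(column):
    """Map each distinct label to an integer code (sorted for determinism)."""
    codes = {label: i for i, label in enumerate(sorted(set(column)))}
    return [codes[label] for label in column], codes


def min_max_scale(column):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical raw columns: temperature with one missing reading,
# and wind direction as a categorical variable.
temperature = [30.0, None, 20.0, 25.0]
wind_direction = ["N", "SW", "N", "E"]

temperature_filled = impute_mean(temperature)
temperature_scaled = min_max_scale(temperature_filled)
wind_codes, wind_mapping = encode_categories(wind_direction)
```

Integer codes impose an artificial order on the categories; one-hot encoding is a common alternative when that order would mislead a regression model.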
C. Modelling
The proposed approach organizes the retrieved meteorological data after cleaning and pre-processing. The criteria of the Indian Meteorological Department are used to classify rainfall data into a number of classes. In this paper, we describe a rain prediction system that uses machine learning classification techniques. 70% of the pre-processed data is used for training, while 30% is used for testing. The outputs of each of the four machine learning algorithms on the partitioned data are examined before the final result is reported. The next part describes how each classifier functions.
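A 70/30 split like the one described above could be sketched as follows. The shuffling, the fixed seed, and the function name `train_test_split` are assumptions for illustration; the paper does not say whether the records were shuffled before being partitioned.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows reproducibly and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 pre-processed records (represented by their indices).
records = list(range(100))
train, test = train_test_split(records)
```

For data with strong temporal structure, a chronological split (earlier records for training, later ones for testing) may be more appropriate than a random shuffle.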
Step 1: Select random samples from the given dataset.
Step 2: Each data sample's decision tree is created, and then each decision tree is used to make a prediction.
Step 3: Voting will then be held on each predicted result.
Step 4: Choose the forecasted result that will earn the most votes.
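The four steps above correspond to bagging with majority voting, the scheme used by random forests. The following is a simplified sketch under stated assumptions: each "tree" is a depth-1 decision tree (a stump) on a single feature, and the humidity values and labels are hypothetical. It is not the implementation used in the study.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a depth-1 decision tree on (feature, label) pairs.

    Every observed feature value is tried as a threshold; the threshold
    with the fewest training errors is kept.
    """
    best = None
    for threshold, _ in samples:
        left = [label for x, label in samples if x <= threshold]
        right = [label for x, label in samples if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def train_forest(samples, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: draw a random (bootstrap) sample from the dataset.
        bootstrap = [rng.choice(samples) for _ in samples]
        # Step 2: build one tree per sample.
        forest.append(train_stump(bootstrap))
    return forest


def predict(forest, x):
    # Step 3: collect one vote per tree.
    votes = Counter(tree(x) for tree in forest)
    # Step 4: return the prediction with the most votes.
    return votes.most_common(1)[0][0]


# Hypothetical data: relative humidity (%) and an observed label.
data = [(30, "dry"), (35, "dry"), (40, "dry"), (45, "dry"),
        (70, "rain"), (75, "rain"), (80, "rain"), (85, "rain")]
forest = train_forest(data)
```

A full random forest would also grow deeper trees and sample a random subset of features at each split; those parts are omitted here to keep the voting logic visible.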
4. Decision Tree: The decision tree algorithm is a classification method that works on both categorical and numerical data. It builds a tree-like structure and analyzes the information in a graph that resembles a tree. The method splits the data into two or more related groups based on the most important indicators. To decide how to divide the data, we first calculate the entropy of each feature and then split on the predictor with the largest information gain, that is, the lowest remaining entropy. The results are easy to read and comprehend. This method outperforms others in terms of accuracy since it examines the dataset in a tree-like graph. The decision tree method is applied to both classification and regression in machine learning. In a decision tree, each branch node stands for a decision, whereas leaf nodes signify an outcome. Decision trees handle both categorical and continuous variables effectively, although our target variable (rainfall) in the current study is a binary categorical variable. Decision trees are known to be built using the methods C5.0, Chi-squared Automatic Interaction Detection (CHAID), ID3, QUEST, Classification and Regression Trees (CART), and C4.5.[12] C5.0 was chosen for the current study and used with the three training-to-testing ratios. The C5.0 algorithm is a more advanced version of the ID3 and C4.5 algorithms.
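To make the entropy and information gain criterion concrete, here is a small sketch that scores two candidate features for a binary rain label. The feature names and values are hypothetical, and the calculation shown is the standard Shannon-entropy form used by ID3-style algorithms; C5.0 itself applies further refinements that are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Hypothetical observations.
rain = ["yes", "yes", "no", "no"]
humidity = ["high", "high", "low", "low"]   # separates the classes perfectly
windy = ["yes", "no", "yes", "no"]          # carries no information about rain

gain_humidity = information_gain(humidity, rain)
gain_windy = information_gain(windy, rain)
```

In this toy case the tree would split on humidity first, because that split removes all uncertainty about the label while the windy split removes none.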
5. Multiple Linear Regression: This is the process of creating a regression model that forecasts the dependent variable (rainfall) using a number of independent variables. The model assumes a linear relationship between the independent variables and the dependent variable. The model's coefficients show the magnitude and direction of the relationship between each independent variable and rainfall. Statistics such as R-squared and adjusted R-squared can be used to evaluate the model's accuracy.
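A minimal sketch of multiple linear regression follows, solving the normal equations directly and reporting R-squared and adjusted (modified) R-squared. The two predictors (temperature and humidity) and all values are hypothetical, and the target is constructed to be exactly linear so that the recovered coefficients are easy to check. The helper names are assumptions for this example.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(features, target):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, f):
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], f))


def r_squared(coefs, features, target):
    mean_y = sum(target) / len(target)
    ss_res = sum((y - predict(coefs, f)) ** 2 for f, y in zip(features, target))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalise R-squared for the number of predictors k (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data: (temperature in deg C, humidity in %) -> rainfall in mm,
# generated from rainfall = 5 - 0.5 * temperature + 1.0 * humidity.
features = [(20.0, 60.0), (25.0, 70.0), (30.0, 50.0), (35.0, 80.0), (28.0, 65.0)]
target = [5.0 - 0.5 * t + 1.0 * h for t, h in features]

coefs = fit_mlr(features, target)
r2 = r_squared(coefs, features, target)
adj_r2 = adjusted_r_squared(r2, n=len(target), k=2)
```

The sign of each coefficient gives the direction of the relationship (here negative for temperature, positive for humidity) and its absolute value gives the magnitude per unit of the predictor.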
6. Polynomial Regression: With this technique, a polynomial equation is fitted to the rainfall data by adding polynomial terms to the linear regression model. The higher-order terms can capture more complex relationships between the independent variables and rainfall. When too many polynomial terms are included, overfitting occurs: the model may perform well on training data but poorly on test data.
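The idea of adding polynomial terms to a linear model can be sketched as below: powers of the predictor become extra columns, and the same least-squares machinery is reused. The data are hypothetical and generated from an exact quadratic, so the degree-2 fit should recover the generating coefficients while the straight-line fit cannot.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] of c0 + c1*x + c2*x^2 + ..."""
    # Polynomial feature expansion: one column per power of x.
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def evaluate(coefs, x):
    return sum(c * x ** p for p, c in enumerate(coefs))


def mse(coefs, xs, ys):
    return sum((y - evaluate(coefs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated from y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]

linear = fit_polynomial(xs, ys, degree=1)
quadratic = fit_polynomial(xs, ys, degree=2)
```

To detect overfitting in practice, one would compute `mse` separately on training and held-out data for each degree and stop increasing the degree once the held-out error starts to rise.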
7. Time Series Analysis: This method looks at previous trends in rainfall data over time to predict future precipitation. From the data, time-series models can identify seasonality, trends, and cyclical patterns. These models may be evaluated using metrics such as mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE).
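The three error metrics named above can be computed as in the following sketch. The observed and forecast rainfall values are hypothetical.

```python
import math


def mean_absolute_error(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_squared_error(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def root_mean_squared_error(actual, forecast):
    return math.sqrt(mean_squared_error(actual, forecast))


# Hypothetical monthly rainfall (mm): observed values and model forecasts.
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(actual, forecast)
mse = mean_squared_error(actual, forecast)
rmse = root_mean_squared_error(actual, forecast)
```

MAE and RMSE are in the same units as the rainfall itself, while MSE is in squared units and penalises large errors more heavily.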
D. Evaluation
The efficacy of the algorithms may be examined using a wide range of evaluation criteria. [18] The current study focuses on the confusion matrix, which serves as the foundation for accuracy, precision, recall, and F-measure. These measures are characterized as follows:
E. Confusion Matrix
The confusion matrix summarizes the effectiveness of the model, as shown in Table 1.
The mathematical formulas for evaluation measures such as recall, precision, accuracy, and the F1 score follow from equations 7, 8, 9, and 10.
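Table 1 and equations 7 to 10 are not reproduced in this text, so the sketch below uses the standard textbook definitions for a binary confusion matrix: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). Whether these match the paper's equations exactly is an assumption, and the labels are hypothetical (1 = rain, 0 = no rain).

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn


def classification_metrics(actual, predicted):
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(actual, predicted)
```

The sketch does not guard against division by zero, which would occur if the model never predicts the positive class or the test set contains no positive examples.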
V. EXPERIMENTAL RESULTS
The experimental findings for the provided model were acquired using Jupyter Notebook. Using historical rainfall data for the years FY 2018-2022, the model was trained. It also included a number of other weather-related details from various years.
The model-building procedure is divided into four steps, which take place in that order:
5. Making a comparison between the desired outcome and the anticipated result
The suggested linear regression model in the research classified the types of rainfall into various categories. It should be noted that there are different dry and wet zones. We demonstrate the suggested model's correctness. The graph shows that, during the period FY 2018-2022, there is comparatively minimal misinterpretation of rainfall projections. Comparing the suggested paradigm to other systems, it is incredibly effective.
The model used linear regression to categorise and forecast the rainfall. The entire model was built in Jupyter Notebook using the linear regression function. After the model was built using the software's features, the training dataset was loaded and the model was trained on the supplied features. The findings are given in the next figure, which indicates how successfully the model operated. The model was successful in predicting the amount of rainfall for a specific time frame.
These experimental findings highlight the adaptability of regression analysis and the potential of various forecasting strategies for precipitation. Note that the kind and quantity of the data used, the regression technique, and the quality and causal relevance of the data itself all affect forecast accuracy.
The actual and projected data are compared, and the conclusions are presented as figures. Accuracy scores for the algorithms are computed and entered into confusion matrices to identify the sources of error.
The test data makes up just over 20% of the entire data set, yielding 197 predictions. For example, in Figure 5 (v), the logistic regression accuracy score shows that 159, 2, and 30 predictions are correct, whereas 8 are erroneous.
The accuracy score is about 95.6% for logistic regression, compared with 96.6% for linear regression. The accuracy score is the number of correct predictions divided by the total number of predictions.
Overall, regression analysis has been shown to be an effective method for predicting rainfall, and researchers continue to explore different models and techniques to improve the accuracy of these predictions.
Regression analysis is a useful statistical technique for forecasting precipitation, and it has been used in several studies to determine the relationship between precipitation and other meteorological factors including temperature, humidity, wind speed, and air pressure. These studies' findings have demonstrated that regression analysis can accurately predict rainfall and can be used to further our knowledge of weather patterns and how they affect the environment and civilization. However, the accuracy of these forecasts depends on the quality and volume of the data used in the study, as well as the specific models and methodology applied. Further research is therefore required to refine and improve these models in order to provide more accurate and reliable predictions of rainfall. Regression analysis is a statistical approach that aims to discover the link between a dependent variable (in this case, rainfall) and one or more independent variables (such as temperature, humidity, wind speed, etc.). The link may be used to forecast future values of the dependent variable based on the values of the independent variables. Regression analysis may be used to find the meteorological factors that have the strongest relationships to rainfall and then to develop a regression model that can anticipate future precipitation based on those variables. The quantity and quality of the data gathered, the suitability of the independent variables, and the regression technique selected all have an impact on how reliable the forecast is. Many regression methods, such as logistic regression, multiple regression, and linear regression, can be used to predict rainfall. Each approach has its own benefits and drawbacks, and the selection of a technique depends on the specifics of the research question and the qualities of the data.
In addition to regression analysis, various machine learning approaches such as Artificial Neural Networks (ANN), Decision Trees, and Random Forest may also be used to forecast rainfall. These algorithms have been widely applied in meteorology as they have been demonstrated to be good in forecasting rainfall in diverse regions. Overall, the ability to forecast rainfall through the use of regression analysis or other machine learning algorithms is a crucial field of study with implications in agriculture, water resource management, and disaster planning.
[1] M.J.C. Hu, "Application of ADALINE system to weather forecasting," Technical Report, Stanford Electron, 1964. PP- 2
[2] Michaelides, S. C., Neocleous, C. C. & Schizas, C. N., "Artificial neural networks and multiple linear regression in estimating missing rainfall data." In: Proceedings of the DSP95 International Conference on Digital Signal Processing, Limassol, Cyprus. 1995. PP- 1
[3] Kalogirou, S. A., Neocleous, C., Constantinos, C. N., Michaelides, S. C. & Schizas, C. N., "A time series construction of precipitation records using artificial neural networks." In: Proceedings of EUFIT '97 Conference, 8-11 September, Aachen, Germany. 1997. PP- 5
[4] Lee, S., Cho, S. & Wong, P. M., "Rainfall prediction using artificial neural network," J. Geog. Inf. Decision Anal. 1998. PP- 2
[5] Wong, K. W., Wong, P. M., Gedeon, T. D. & Fung, C. C., "Rainfall Prediction Using Neural Fuzzy Technique." 1999. PP- 3
[6] Koizumi, K., "An objective method to modify numerical model forecasts with newly given weather data using an artificial neural network," Weather Forecast., 1999. PP- 1
[7] Ben Krose and Patrick van der Smagt, "An introduction to neural networks," Eighth edition, November 1996. PP- 5
[8] Ajith Abraham, Dan Steinberg and Ninan Sajeeth Philip, "Rainfall Forecasting Using Soft Computing Models and Multivariate Adaptive Regression Splines," 2001. PP- 2
[9] Paras, Sanjay Mathur, Avinash Kumar, and Mahesh Chandra, "A feature based on weather prediction using ANN," World Academy of Science, Engineering and Technology, 2007. PP- 2
[10] E. Toth, A. Brath, A. Montanari, "Comparison of short-term rainfall prediction models for real-time flood forecasting," Journal of Hydrology 239 (2000). PP- 3
[11] L. L. Lai, H. Braun, Q. P. Zhang, Q. Wu, Y. N. Ma, W. C. Sun, and L. Yang, "Intelligent weather forecast," in Proc. IEEE 2004 International Conference on Machine Learning and Cybernetics, 2004. PP- 2
[12] N. Hasan, M. T. Uddin, and N. K. Chowdhury, "Automated weather event analysis with machine learning," in Proc. IEEE 2016 International Conference on Innovations in Science, Engineering and Technology (ICISET), 2016. PP- 2
[13] Rahman, M. M., Bhattacharya, P., & Desai, B. C. (2007). A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback. IEEE Transactions on Information Technology in Biomedicine, 11(1). PP- 6
[14] Delhi Weather Data. [Online]. Available: https://www.kaggle.com/mahirkukreja/delh i weatherdata/home. PP- 3
[15] Morales, M., Tapia, L., Pearce, R., Rodriguez, S., & Amato, N. M. (2004). A machine learning approach for feature-sensitive motion planning. In Algorithmic Foundations of Robotics VI. Springer, Berlin, Heidelberg. PP- 4
[16] Ireland, G., Volpi, M., & Petropoulos, G. P. (2015). Examining the capability of supervised machine learning classifiers in extracting flooded areas from Landsat TM imagery: a case study from a Mediterranean flood. Remote Sensing, 7(3). PP- 5
[17] N. Q. Hung, M. S. Babel, S. Weesakul, and N. K. Tripathi, "An Artificial Neural Network Model for Rainfall Forecasting in Bangkok, Thailand," Hydrol. Earth Syst. Sci., 2009. PP- 6
[18] Kyaw Kyaw Htike and Othman O. Khalifa, "Research paper on ANN model using focused time delay learning," International Conference on Computer and Communication Engineering (ICCCE 2010), 11-13 May 2010, Kuala Lumpur. PP- 3
[19] Dr S. Santosh Baboo and I. Khadar Shareef, "An efficient Weather Forecasting Model using Artificial Neural Network," International Journal of Environmental Science and Development, Vol. 1, No. 4, October 2010. PP- 2
[20] Enireddy Vamsidhar et al., "Prediction of rainfall Using Backpropagation Neural Network Model," International Journal on Computer Science and Engineering, Vol. 02, No. 04, 2010. PP- 3
[21] A. G. Salman, B. Kanigoro, and Y. Heryadi, "Weather forecasting using deep learning techniques," in Proc. IEEE 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2015. PP- 3
Copyright © 2023 Sumit Sarkar, Ayush Srivastava, Er. Avneet Kaur. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET49852
Publish Date : 2023-03-27
ISSN : 2321-9653
Publisher Name : IJRASET