A Survey on Time Series Forecasting Approaches and Applications

Authors: Apoorva Thakur, Sandeep Monga

DOI Link: https://doi.org/10.22214/ijraset.2022.41879

Abstract

Time series forecasting (TSF) assists in making better strategic decisions under uncertain circumstances, so that financial crises can be avoided, wise investments can be made, under- or over-contracting of utilities can be avoided, staff can be scheduled appropriately, service providers can provide better service, and people can prepare for natural disasters, among many other uses. However, forecasting accuracy plays a vital role, and achieving it is a challenging task owing to the vagueness and nonlinearity associated with most real-world time series. Therefore, improving forecasting accuracy has become a keen area of interest among forecasters from different domains of science and engineering. This work presents a survey of time series forecasting approaches across various applications.

I. INTRODUCTION

A forecasting model is a mathematical equation that represents the relationship between the final output and the various elements that interact with it. Forecasting models in agriculture are formulated under non-universal assumptions to obtain a result from existing data that are inspected visually, arranged statistically, or processed with advanced techniques such as data mining and neural network models. A longitudinal measure in which the process generating the observations is identical over time is termed a stationary time series: the statistical behavior of the series does not depend on time. Stationarity implies that the statistical properties of the series are constant and remain unchanged. A stationary time series is characterized by a constant mean, variance, and autocorrelation over time. Most stochastic forecasting methods are based on the postulate that a time series can be rendered approximately stationary with the help of a mathematical transformation. A stationary series is convenient to predict, because its statistical properties will be the same in the future as they were in the past. The values obtained for the stationarized series can be “reversed” by inverting the mathematical transformation that was previously applied, in order to retrieve the values of the original series; most modern statistical software handles this step. Thereby, finding the sequence of transformations that stationarizes the time series provides vital clues in the search for an appropriate forecasting model. The properties of a stationary time series do not change with respect to time (Makridakis et al., 1998). For a stationary stochastic process, the variance does not depend on time and the autocovariance depends only on the lag (Harvey, 1993). ARIMA (Auto-Regressive Integrated Moving Average) is the most general class of models for forecasting a time series that can be made stationary by differencing.
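The reversibility of a stationarizing transformation can be illustrated with first-order differencing. The following is a minimal Python sketch and is not taken from the surveyed works; the function names `difference` and `undifference` and the sample values are hypothetical.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(diffs, initial_values, lag=1):
    """Invert `difference`, given the first `lag` original observations."""
    restored = list(initial_values)
    for d in diffs:
        # Each original value is the stored difference plus the value `lag` steps back.
        restored.append(d + restored[-lag])
    return restored


# Hypothetical trending series (illustrative values only).
series = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]

first_diff = difference(series)        # removes the level
second_diff = difference(first_diff)   # removes the remaining linear trend
recovered = undifference(first_diff, series[:1])

print(first_diff)
print(second_diff)
print(recovered == series)
```

In this toy case the second difference is constant, which is the kind of outcome that suggests a differencing order when an ARIMA model is being identified.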
A time series is a sequence of data points recorded at definite times. Time series analysis deals with data collected through time, called “historical data”; it has a vast range of applications across academic fields, especially in science and engineering, with special reference to statistics and signal processing. Time series data possess a natural temporal ordering, which distinguishes them from other data problems with no natural ordering of observations; they also differ from spatial data, where observations are generally related by geographical location. Time series analysis originated in 1880, when T. N. Thiele formulated and analyzed a time series model consisting of the sum of a regression component, Brownian motion, and white noise. The prediction procedure developed in that work, which in time came to be known as “Kalman” filtering, was a ground-breaking development in the forecasting techniques known today as time series analysis. A strictly stationary process with finite second moments is always covariance stationary, while a covariance-stationary process is strictly stationary if it is normally distributed. In practice, it is enough to consider covariance-stationary processes, and the term stationary refers to covariance stationarity (Sariaslan, 2010). The most prevalent and frequently used stochastic time series models are the Autoregressive Integrated Moving Average (ARIMA) models. They assume that the time series is linear and follows a particular statistical distribution. The ARIMA model has subclasses, such as the Autoregressive (AR), Moving Average (MA), and Autoregressive Moving Average (ARMA) models. For seasonal time series forecasting, Box and Jenkins proposed a quite successful variation of the ARIMA model, the Seasonal ARIMA (SARIMA). ARIMA owes its popularity to its flexibility and its capability to represent a variety of time series. The severe limitation of these models, however, is the pre-assumed linear form of the associated time series, which is inadequate in many practical situations.
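For reference, the model classes named above are commonly written as follows in the standard Box-Jenkins backshift notation. The symbols are assumptions of this sketch rather than notation from the surveyed works: B is the backshift operator (B y_t = y_{t-1}), \varepsilon_t is white noise, c is a constant, and s is the seasonal period.

```latex
% ARMA(p, q): linear combination of past values and past shocks
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}

% ARIMA(p, d, q): an ARMA model for the d-times differenced series
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad \phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\quad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}

% SARIMA(p, d, q)(P, D, Q)_s: adds seasonal polynomials in B^s
\Phi(B^{s})\,\phi(B)\,(1 - B)^{d}\,(1 - B^{s})^{D}\, y_t = c + \Theta(B^{s})\,\theta(B)\,\varepsilon_t
```

Setting d = 0 recovers ARMA, and additionally setting q = 0 or p = 0 recovers the pure AR and MA subclasses; every form is linear in past values and shocks, which is the limitation noted above.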

To overcome this drawback, various non-linear stochastic models have been proposed in the literature; however, from the implementation point of view, these are not as straightforward and simple as the ARIMA models. The development of forecasting models in this space received major contributions from Yule (1927), who spearheaded the treatment of stochasticity in time series by postulating that every such series can be considered a realization of a stochastic process. This idea influenced several related works, prominent among them those of Slutsky, Walker, and Yaglom, who along with Yule were responsible for formulating the AR (Auto-Regressive) and MA (Moving Average) models. Linear forecasting model development was led by Kolmogorov (1941), based on Wold’s decomposition theorem. These developments led to a considerable body of literature dealing with parameter estimation, model identification and checking, and subsequent forecasting. Time series models and their application in forecasting have had an impressive impact on practical domains, which has attracted a great deal of research. Time series analysis finds application in a wide variety of forecasting models for temporally ordered data, mainly in econometrics and agricultural crop modeling.
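As an illustration of the parameter estimation problem mentioned above, the sketch below estimates the coefficients of an AR(2) model from sample autocorrelations using the Yule-Walker equations. It is a minimal example under stated assumptions: the simulated data, the seed, and all function names are hypothetical, and production work would normally rely on a statistical package.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


def yule_walker_ar2(series):
    """Solve the 2x2 Yule-Walker system for the AR(2) coefficients."""
    r1 = autocorrelation(series, 1)
    r2 = autocorrelation(series, 2)
    det = 1.0 - r1 ** 2
    phi1 = r1 * (1.0 - r2) / det
    phi2 = (r2 - r1 ** 2) / det
    return phi1, phi2


def simulate_ar2(phi1, phi2, n, seed=0):
    """Simulate y[t] = phi1*y[t-1] + phi2*y[t-2] + Gaussian noise."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n):
        y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[2:]


# Assumed "true" coefficients, chosen so that the process is stationary.
series = simulate_ar2(0.5, 0.3, 20000, seed=42)
phi1_hat, phi2_hat = yule_walker_ar2(series)
print(round(phi1_hat, 2), round(phi2_hat, 2))
```

The estimates should land close to the assumed coefficients; the gap shrinks as the simulated sample grows.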

II. LITERATURE REVIEW

A. Linear and Nonlinear Regression Growth Models

Growth rate analyses are widely employed to describe long-term trends in variables over time for various agricultural crops [1]. The relationships among variables in agricultural sciences are largely “non-linear” in nature. Regression growth models are sets of equations that represent how a system behaves. With agricultural parameters introduced as variables, these models help determine the patterns in the data, which in turn are used to predict the future values of those parameters. This characteristic is widely applied in agriculture-related activities to forecast crop production and to help understand market dynamics. Such regression models can be linear or nonlinear, depending on whether the model is linear in its parameters. Linear regression mainly deals with finding the best-fitting line through the data points; this line is termed the regression line. A nonlinear regression model is one in which one or more parameters enter nonlinearly. The most commonly used nonlinear growth models are the Gompertz, Monomolecular, Logistic, Weibull, Richards, MMF, Exponential, and Von Bertalanffy models. Kumar et al. (2012) [2] applied linear and multiple regression models to paddy, sugarcane, and wheat yield forecasting for two districts of Gujarat (Navsari and Bharuch). Statistical data on weather and crop yields covering 31 years for Navsari (1980-2010) and 27 years for Bharuch (1984-2010) were used. Yield data and weather variables for 27 years (Navsari) and 23 years (Bharuch) were used to construct the model for each district. The major weather variables were selected on the basis of the highest R² and significant P-values. A trial-and-error framework was used to carry out the multiple regression analysis. The models for the two districts were evaluated using 4 years of independent data (2007-2011).
Throughout the verification period, the discrepancies of the paddy, sugarcane, and wheat models for the Navsari district ranged from -7.30 to 3.41 percent, 1.68 to 2.05 percent, and -8.27 to 11.51 percent, respectively. Likewise, the paddy, sugarcane, and wheat discrepancies for the Bharuch models ranged from 5.35 to 11.76 percent, -12.65 to 7.18 percent, and -12.07 to 6.86 percent, respectively. Sanchez et al. (2011) [3], in their analysis titled “Estimation of vineyard leaf area by linear regression,” treated vineyard leaf area as a variable for assessing the productive capacity of the vineyard and for characterizing the thermal and light microenvironment of the grapevine canopy. The researchers applied a linear regression model to verify the methodology of Lopes and Pinto for determining the leaf area of vineyards in central Spain, using the varieties grown in the area. The results obtained were compared with those given by a conventional and reliable, but much more laborious, non-destructive direct approach. Regression analysis of the data collected for the Lopes and Pinto approach indicated that, for the examined shoots of each variety, only three field-measured variables need to be recorded: the area of the largest leaf, the area of the smallest leaf, and the number of leaves. Perez et al. (2013) [4], in their study titled “Prediction of the cetane number of biodiesel using artificial neural networks and multiple linear regression,” used a linear regression model and an artificial neural network model to estimate the cetane number of biodiesel from its fatty acid methyl ester composition. To build the models, experimental data from literature reports covering 48 biodiesels for the modeling-training step and 15 for the validation step were used. A model for estimating cetane numbers was generated using an artificial neural network with an accuracy higher than 92 percent, excluding one outlier.
A backpropagation network (11:5:1) trained with the Levenberg–Marquardt algorithm in the second phase of network testing, which exhibited R = 0.9544 on the validation data, was the best neural network for forecasting the cetane number. Chandler et al. (2016) [5], in their research titled “Predicting hyperketonemia prevalence in Jersey herds from milk composition and cow test-day information using multiple linear regression,” noted that linear regression had been used to detect and predict hyperketonemia in Holstein cows, but that separate models were required for Jersey cows because of differences in hyperketonemia and in milk composition.

The study used multiple linear regression models to predict blood beta-hydroxybutyrate (BHBA) from milk composition in Jersey cows, to serve as a diagnostic tool for monitoring ketosis in herds. Milk and blood samples were collected from about 468 Jersey cows on six selected dairy farms. A colorimetric assay was used to measure serum BHBA concentrations. Milk components such as BHBA, acetone, and fatty acids were analyzed by Fourier transform infrared (FTIR) spectrometry on a MilkoScan FT+, along with the regular milk analysis variables. Cow test-day variables were exported from DairyComp305. The models were built using the REG procedure of SAS 9.4 with stepwise selection, omitting variables with P-value > 0.15. Models were also built on the basis of selection criteria, viz. SSS and AIC. Statistical criteria such as R² and root mean square error were analyzed to determine model performance. Hyperketonemia (serum BHBA greater than or equal to 1.2 mM) was found in 20% of the sample set. The structure of the data led to separate models being developed for the primiparous and multiparous groups. On evaluation, model accuracy was 91% for multiparous cows 5 to 11 DIM (R² = 0.85), 86% for multiparous cows 12 to 20 DIM (R² = 0.64), 90% for primiparous cows 5 to 11 DIM (R² = 0.64), and 90% for primiparous cows 12 to 20 DIM (R² = 0.83). Collectively, the models identified animals with hyperketonemia at the 1.2 mM threshold with 86% accuracy. The results indicated that modeling blood BHBA from milk composition data and cow test-day information provides a practical tool for monitoring hyperketonemia prevalence in Jersey herds. Bustamante et al. (2014) [6], in research titled “Attribute Selection Impact on Linear and Nonlinear Regression Models for Crop Yield Prediction,” examined the need for an effective yield estimation model.
The study covered best practices of contemporary data-driven techniques for building models, comparing them, and selecting the best models for forecasting. It noted that attributes are usually selected on the basis of expert assessment or dimensionality reduction algorithms, and departed from that practice. The research evaluated data-driven modeling techniques for crop yield prediction using a comprehensive method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5 regression trees, and artificial neural networks (ANN) were ranked based on real data for eight different crops in Mexico. Model validation used three forecasting metrics: the correlation factor, the root relative squared error (RRSE), and the relative mean absolute error. The results revealed that ANNs were the most consistent with the best attribute subsets across the learning and training stages, with the lowest average RRSE (86.04%) and the highest average correlation factor (0.63). Shen et al. (2010) [7], in their research titled “Large-area rice yield forecasting using satellite imageries,” applied linear regression models to canopy reflectance band ratios (NIR/RED, NIR/GRN) of paddy rice. Data for the period 1999-2005 were combined with correlation analysis to develop regression-based yield forecasting models for the first and second crops. The models were then validated with field data recorded in 2007 and 2008 at eight different sites encompassing different soil properties, weather parameters, and nitrogen application rates, using surface reflectance accumulated from atmospherically corrected SPOT imagery. The test results showed that the root mean square error of the forecast yield per unit area was less than 0.7 t ha-1 for both seasons. Kumar et al. (2012) [2] analyzed non-linear statistical growth models for forecasting coffee production in India. The study included six non-linear statistical growth models, viz. Monomolecular, Gompertz, Logistic, Richards, Weibull, and MMF, which were applied to historical data on coffee production (lakh tonnes). The most appropriate model was selected on the basis of goodness-of-fit criteria, viz. MSE, MAPE, AIC, and BIC. The results showed that the Logistic and MMF models were the most appropriate for describing coffee production patterns in India. Furthermore, both models performed well in projecting production for the years 2015 and 2020. Basak et al. (2017) [8] carried out a combined analysis of six statistical growth models, viz. Gompertz, Logistic, Linear, Quadratic, Cubic, and other non-linear models, to evaluate the growth pattern of an insect population in West Bengal for the year 2015. Goodness of fit was assessed using MAE, MAPE, and BIC values. The cubic model was found to be the best-fitting model for forecasting the insect population.
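Several of the studies above select among candidate growth curves using criteria such as MSE, MAPE, AIC, and BIC. The sketch below shows one common way to compute these criteria for curves whose parameters are assumed to have been estimated already. The data and parameter values are hypothetical, and the AIC and BIC use the least-squares (Gaussian likelihood) form up to an additive constant, which may differ from the exact form used in the cited works.

```python
import math


def logistic(t, k, a, b):
    """Logistic growth curve: K / (1 + a * exp(-b * t))."""
    return k / (1.0 + a * math.exp(-b * t))


def gompertz(t, k, a, b):
    """Gompertz growth curve: K * exp(-a * exp(-b * t))."""
    return k * math.exp(-a * math.exp(-b * t))


def fit_criteria(actual, fitted, n_params):
    """Goodness-of-fit criteria for a fitted curve with `n_params` parameters."""
    n = len(actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    return {
        "mse": rss / n,
        "mape": 100.0 / n * sum(abs((y - f) / y) for y, f in zip(actual, fitted)),
        "aic": n * math.log(rss / n) + 2 * n_params,
        "bic": n * math.log(rss / n) + n_params * math.log(n),
    }


# Hypothetical observations: a logistic trend with a small alternating disturbance.
times = list(range(12))
actual = [logistic(t, 100.0, 20.0, 0.5) + (1.0 if t % 2 == 0 else -1.0) for t in times]

# Candidate curves with assumed (already estimated) parameters.
logistic_fit = [logistic(t, 100.0, 20.0, 0.5) for t in times]
gompertz_fit = [gompertz(t, 100.0, 20.0, 0.5) for t in times]

logistic_scores = fit_criteria(actual, logistic_fit, n_params=3)
gompertz_scores = fit_criteria(actual, gompertz_fit, n_params=3)
best = min(("logistic", logistic_scores), ("gompertz", gompertz_scores),
           key=lambda item: item[1]["aic"])
print(best[0])
```

Lower values are better for all four criteria; AIC and BIC additionally penalize the number of parameters, which matters when comparing, for example, a three-parameter logistic curve with a four-parameter Richards curve.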

B. ANN Based Forecasting Methods

Over the past century, time series forecasting (TSF) has been one of the key fields of research in statistics and management studies, and now in computer science. Efficient TSF has significant practical value and widespread use in many fields of endeavor. Traditionally, TSF has been performed predominantly using statistical methods. However, over the past few decades, owing to unprecedented growth in the computational power of machines and the availability of a variety of time series, machine learning (ML) techniques such as ANNs have become some of the foremost models in TSF. Compared to traditional statistical techniques, ML techniques such as ANNs possess several distinctive characteristics: they are nonparametric, data-driven, nonlinear, and flexible models, and thus have a greater capability to capture the complex underlying relationships in time series.

Therefore, a variety of TSF methods based on machine learning techniques have been developed and applied in diverse application areas (as summarized in the review papers [9]–[12]). ANNs are one of the most popular classes of ML models and have gained overwhelming attention in TSF [9], [10]; this part of the survey also restricts itself to such models. Accordingly, this section reviews the issues existing in the literature and the further research needs pertaining to the use of ANNs in TSF and fuzzy TSF.

Over the last few decades, various classes of ANN models, viz. feed-forward neural networks [13], recurrent neural networks [14], single-layered neural networks [15], multilayer neural networks [16], real-valued neural networks [17], and complex-valued neural networks, comprising a variety of ANN models, viz. multilayer perceptron (MLP) [16], deep belief network [16], adaline, stochastic neural network, extreme learning machine, radial basis function networks (RBFN) [18], beta basis function neural network (BBFN) [19], generalized regression neural network (GRNN), functional link artificial neural network, etc., have been used in TSF.

To improve the forecasting accuracy of ANN-based forecasting methods, several factors have been taken into consideration in different studies. In 1998, Zhang [43] presented the issues affecting the performance of ANN-based forecasting methods.

The factors include: (a) activation/transfer function, (b) design of the ANN architecture, (c) training algorithm, and (d) pre-processing techniques. Pertaining to the first factor, de Groot and Würtz [20] claimed that activation functions play an important role in the convergence of training algorithms and have a significant effect on the performance of ANN-based forecasting methods. The binary sigmoid activation function at the hidden layer and a linear activation function at the output layer have been widely used in ANN-based TSF methods [21].
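The architecture described above, with a binary sigmoid hidden layer and a linear output layer, can be sketched as a single forward pass over lagged inputs. This is an illustrative example only: the weights are placeholders rather than trained values, training is omitted, and the names `forward` and `sliding_windows` are hypothetical.

```python
import math


def sigmoid(z):
    """Binary (logistic) sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: sigmoid hidden layer followed by a linear output unit."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def sliding_windows(series, n_lags):
    """Turn a series into (lagged inputs, next value) training pairs."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]


# Placeholder weights for a 2-input, 2-hidden-unit network (not trained).
hidden_weights = [[0.0, 0.0], [0.0, 0.0]]
hidden_biases = [0.0, 0.0]
output_weights = [1.0, 1.0]
output_bias = 0.0

pairs = sliding_windows([1.0, 2.0, 3.0, 4.0], n_lags=2)
prediction = forward(pairs[0][0], hidden_weights, hidden_biases,
                     output_weights, output_bias)
print(pairs, prediction)
```

The linear output unit leaves the forecast unbounded, which is why it is usually preferred over a sigmoid output when the target series is not scaled to the unit interval.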

However, to identify the most appropriate activation function for ANNs in financial TSF, Gomes et al. [22] conducted a comparative study of 12 activation functions and suggested using the complementary log-log and log-log family of activation functions (cloglog, cloglogm, and loglog) in financial time series applications with smaller network structures.
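For orientation, two of the activation functions named above can be written using the standard inverse complementary log-log and inverse log-log link functions from generalized linear models. This is an assumption of the sketch: the exact forms used in [22], including the modified variant cloglogm, may differ and are not reproduced here.

```python
import math


def cloglog(x):
    """Inverse complementary log-log: 1 - exp(-exp(x)), with values in (0, 1)."""
    return 1.0 - math.exp(-math.exp(x))


def loglog(x):
    """Inverse log-log: exp(-exp(-x)), with values in (0, 1)."""
    return math.exp(-math.exp(-x))


# Both functions are asymmetric around zero, unlike the logistic sigmoid.
# Inputs are kept moderate here because math.exp overflows for very large arguments.
for x in (-2.0, 0.0, 2.0):
    print(x, round(cloglog(x), 4), round(loglog(x), 4))
```

The two functions are mirror images of each other, since cloglog(x) + loglog(-x) = 1 for every x.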

C. Taxi Demand Forecasting using Regression Models

With the increasing travel demand, various approaches to predict the transportation demand have been proposed. Conventional forecasting approaches focus mainly on the temporal feature of the taxi demand. As these approaches depend on the time series characteristics of taxi demand, it can be considered a standard time series problem that can be solved using traditional statistical and machine learning algorithms.

For instance, Yang, C. et al. [23] aggregated the raw data by census tract and hours of the day to extract valuable insights from it and thereby employed count regression models (Poisson model, Quasi Poisson model, and Negative Binomial model) to identify spatio-temporal differences between the demand and availability of taxi services. Faghih et al. [24], analyzed taxi demand of New York City using the demand of other modes of transportation as well as weather conditions and presented a model to predict taxi demand by combining a linear regression model with a time series model, which was termed as linear regression with ARMA errors.
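A linear regression with ARMA errors is usually written as a regression equation whose error term follows an ARMA process. The notation below is the generic textbook form and is an assumption of this sketch, not the specification used in [24]: x_{k,t} are the K explanatory variables (for example, demand for other modes and weather conditions), B is the backshift operator, and \varepsilon_t is white noise.

```latex
% Regression part: demand explained by K exogenous variables
y_t = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{k,t} + \eta_t

% Error part: the regression error follows an ARMA(p, q) process
\phi(B)\,\eta_t = \theta(B)\,\varepsilon_t,
\qquad \phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\quad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
```

Because the serial correlation is absorbed by \eta_t, fewer explanatory variables are needed to account for the temporal structure of demand.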

The combined model helped to reduce the number of variables, lowering the cost of the linear regression model, and achieved better R² values. Liu, Z. et al. [25] identified hotspots and then predicted taxi demand in these hotspots using GPS data and environmental data based on three models: Random Forest, Ridge Regression, and a Combination Forecasting Model. Markou, I. et al. [26] pooled time-series data of New York City taxi records with textual data (comprising event information for the city) extracted from the web by screen scraping and APIs, and predicted taxi demand from the combined data using linear regression and a Gaussian model.

Antoniades, C. et al. [27], employed linear regression with model selection, Lasso, and Random Forest to predict taxi fare and duration using New York City taxi trip data. Safikhani, et al [28], presented a generalized version of the STARMA model (which reduces the number of parameters compared to the conventional time series model) and utilized the autoregressive part of the Vector Autoregressive (VAR) model to introduce a generalized STAR model for forecasting the spatio-temporal variation of taxi demand in New York City.

They also introduced a penalty function which penalizes the prediction parameters that are distant temporally and spatially. The proposed model with penalty function outperformed the conventional models such as STAR and VAR. However, these methods considered only the temporal features of the demand, not focusing on the other potential factors such as dependence on neighborhood demand. In addition, these approaches fail to capture the non-linear interdependence between spatial and temporal features.

To address the aforementioned drawbacks, various neural network based approaches have gained significant attention in recent years, as they consider both spatial and temporal features along with the non-linear behavior of the demand. For instance, Xu, J. et al. [29] divided New York City into small areas and predicted the demand in each area by using LSTM and RNN with a Mixture Density Network (MDN) layer on top. Liu, T. et al. [30] proposed a Convolutional Recurrent Network model for fine-grained taxi demand prediction, in which a CNN was combined with a Gated Recurrent Unit (GRU) to handle complex non-linear spatio-temporal correlations. Shu, P. et al. [31] proposed a hybrid model integrating CNN and LSTM to predict short-term taxi demand across different areas. Luo, T. et al. [32] proposed a Multi-Task Deep Learning (MTDL) model using LSTM as the neural unit to predict the need for taxis at the multi-site level.
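Most of the spatio-temporal models above start from the same preprocessing step: the city is divided into small areas and pickups are counted per area and time slot. The sketch below illustrates that step on hypothetical pickup records; the grid origin, cell size, slot length, and function names are assumptions and do not come from any of the cited papers.

```python
from collections import Counter


def cell_index(lat, lon, lat_min, lon_min, cell_size_deg):
    """Map a coordinate to a (row, col) cell of a regular latitude/longitude grid."""
    row = int((lat - lat_min) // cell_size_deg)
    col = int((lon - lon_min) // cell_size_deg)
    return row, col


def aggregate_demand(pickups, lat_min, lon_min, cell_size_deg, slot_seconds=3600):
    """Count pickups per (cell, time slot); `pickups` holds (timestamp, lat, lon)."""
    counts = Counter()
    for timestamp, lat, lon in pickups:
        slot = int(timestamp // slot_seconds)
        cell = cell_index(lat, lon, lat_min, lon_min, cell_size_deg)
        counts[(cell, slot)] += 1
    return counts


# Hypothetical pickups: (seconds since start, latitude, longitude).
pickups = [
    (100, 40.755, -73.985),
    (200, 40.757, -73.983),
    (3700, 40.712, -73.951),
]
counts = aggregate_demand(pickups, lat_min=40.70, lon_min=-74.00, cell_size_deg=0.01)
print(dict(counts))
```

The resulting counts form one demand series per cell. Stacked by row and column, they give the image-like input that CNN-based models consume, while each individual series can feed an LSTM or GRU.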

The main focus of this paper was improving the performance of the proposed model through multiple hyperparameter optimization methods, including random search, grid search, and Bayesian optimization. Ye, J. et al. [33] proposed the CoST-Net model, which correlates spatial and temporal demand using a CNN and a heterogeneous LSTM and incorporates environmental features to predict multiple demands simultaneously.
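Grid search and random search, two of the hyperparameter optimization methods named above, can be sketched as follows. The objective function here is a toy stand-in for a validation error and the hyperparameter names are hypothetical; Bayesian optimization is omitted because it requires a surrogate model.

```python
import itertools
import random


def grid_search(objective, grid):
    """Evaluate every combination in `grid` and return (best_params, best_score)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


def random_search(objective, ranges, n_trials, seed=0):
    """Sample each parameter uniformly from its (low, high) range `n_trials` times."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        score = objective(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


def toy_validation_error(params):
    """Toy objective (an assumption): minimal at learning_rate=0.1, dropout=0.3."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


best_grid = grid_search(
    toy_validation_error,
    {"learning_rate": [0.01, 0.1, 1.0], "dropout": [0.1, 0.3, 0.5]},
)
best_random = random_search(
    toy_validation_error,
    {"learning_rate": (0.0, 1.0), "dropout": (0.0, 0.5)},
    n_trials=50,
)
print(best_grid, best_random)
```

Grid search is exhaustive but grows exponentially with the number of hyperparameters, whereas random search keeps a fixed evaluation budget.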

Vanichrujee, U. et al. [34], proposed an ensemble model based on the characteristics of LSTM, GRU, and XGBOOST for the prediction of taxi demand in Bangkok City.

They implemented the model in 7 different functional areas, and the prediction results were confirmed by mapping the POI against the predicted demand in these areas. Liu, Z. et al. [35] proposed several models combining a Backpropagation Neural Network with Extreme Gradient Boosting to investigate the correlation between online taxi-hailing demand and taxi demand. Next, they introduced a data-driven forecasting approach for real-time prediction of online taxi-hailing demand. Chen, Z. et al. [36] predicted taxi demand at a finer spatial level, that is, the road section level. To achieve this, a prediction network was devised that considered the local and global relationships between road sections.

These two spatial relations were established using a graph CNN, whereas temporal characteristics were mapped using LSTM network. Faial, D. et al [37], augmented LSTM with demand knowledge from the neighboring taxi stands along with historical taxi demand count to forecast the pickup demand of a given taxi stand.

Guo, X. [38] proposed a hybrid model combining CNN with Bidirectional LSTM and the attention mechanism to predict taxi demand, termed the CNN-BiLSTM-Attention model. Some approaches used a two-level machine learning framework for forecasting taxi demand. Kim, T. et al. [44] combined multivariate linear regression with LSTM, enabling them to assess a quota system aimed at balancing the demand volumes of regular taxis and for-hire vehicles in New York City. Rodrigues et al. [45] presented an analysis of spatio-temporal variation in the short-term taxi demand of Lisbon and studied how it is affected by weather conditions and POI.

They selected a linear statistical model (ARIMA) and a machine learning model (ANN) for forecasting the taxi demand. Liu, X. et al. [46] utilized POI and GPS trajectory information to model the spatial variation of taxi demand in Qingdao using the Geographically Weighted Regression (GWR) model. They also studied how taxi demand is influenced by socio-economic, traffic, and land-use factors.

Zhou, Y. et al. [47] proposed a method called ST-Vec, with which they predicted taxi demand for vital destinations from a given region of New York City. ST-Vec maps regions to dense low-dimensional vectors such that the vectors of more likely destination regions lie nearer to each other; the spatio-temporal relationship between zones can therefore be expressed as the similarity between these vectors. Hu, B. et al. [48] initially studied the spatio-temporal distribution of job-housing-travel and the travel characteristics of inhabitants and, on this basis, introduced a metric system to evaluate jobs-housing-taxi demand and a regional development level index.

Next, a Coupling Coordination Degree Model (CCDM) was built, which makes use of the entropy weight method to examine the coupling relationship between regional taxi demand and socio-economic development. Faial, D. et al. [49] presented a data stream mining framework for predicting taxi demand, adopting an approach that handles continuous data using batch and stream machine learning algorithms.
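The entropy weight method mentioned above assigns larger weights to indicators whose values vary more across regions. The sketch below shows the textbook form of the method on hypothetical data; it assumes positive, already comparable indicator values and skips the normalization step that usually precedes it, so it should not be read as the exact procedure of [48].

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns of `matrix` (rows are regions, columns indicators)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [value / total for value in column]
        # Normalized Shannon entropy in [0, 1]; 1 means all regions are identical.
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(n_rows)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    return [d / total_divergence for d in divergences]


# Hypothetical indicators for three regions: [taxi demand index, income index].
indicators = [
    [4.0, 1.0],
    [5.0, 2.0],
    [6.0, 12.0],
]
weights = entropy_weights(indicators)
print([round(w, 3) for w in weights])
```

In this toy case the second indicator is far more dispersed across regions and therefore receives most of the weight.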

Moreover, Davis, N. et al. [50], approached the prediction of taxi demand as a clustering problem and proposed a multi-level clustering method to model taxi demand density at various locations in Bengaluru city. Each location was addressed by six alphanumeric characters called geohash, enclosing an area of 0.72 km squared. The clustering was first achieved by analyzing the correlation between the nearest geohashes, and demand for every geohash was expressed as a fraction of its total cluster demand forming a percentage time-series data. The prediction on percentage time series data was then multiplied with the prediction of whole cluster demand giving the final prediction.
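The final step described above, multiplying a predicted demand share by the predicted cluster total, can be sketched as follows. The numbers are hypothetical, and a naive last-value forecast stands in for whatever forecasting models the original study used.

```python
def demand_shares(cell_demand, cluster_demand):
    """Express a cell's demand as a fraction of its cluster's total demand per period."""
    return [
        cell / total if total else 0.0
        for cell, total in zip(cell_demand, cluster_demand)
    ]


def naive_forecast(series):
    """Placeholder forecaster (an assumption): repeat the last observed value."""
    return series[-1]


# Hypothetical demand per period for one geohash cell and for its whole cluster.
cell_demand = [10, 20, 30]
cluster_demand = [100, 100, 120]

shares = demand_shares(cell_demand, cluster_demand)
share_forecast = naive_forecast(shares)
cluster_forecast = naive_forecast(cluster_demand)

# Final prediction: predicted share multiplied by predicted cluster demand.
final_prediction = share_forecast * cluster_forecast
print(shares, final_prediction)
```

Forecasting the share and the cluster total separately lets the noisy per-cell series borrow stability from the smoother aggregate.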

TABLE 1

Comparison of previous approaches proposed in the field of time series forecasting

S.No.	Title [Reference]	Year	Approach	Results
1.	A scalable approach based on deep learning for big data time series forecasting [39]	2018	The solution consists in splitting the problem into h forecasting sub-problems, h being the number of samples to be predicted simultaneously. Thus, the best prediction model for each sub-problem can be obtained, which eases parallelization and adaptation to the big data context.	Obtained a mean relative error of less than 2%
2.	A Novel Hybrid Algorithm to Forecast Functional Time Series Based on Pattern Sequence Similarity with Application to Electricity Demand [40]	2018	Integrates a well-known functional data clustering algorithm into a forecasting strategy based on pattern sequence similarity. The approach assumes that some patterns are repeated over time, and it attempts to discover them and evaluate their immediate future.	Low RMSE compared to the other approaches mentioned in the paper
3.	Big Data Mining of Energy Time Series for Behavioral Analytics and Energy Consumption Forecasting [41]	2018	Unsupervised data clustering and frequent pattern mining analysis on energy time series, and Bayesian network prediction for energy usage forecasting	In identifying appliance usage patterns, the proposed model outperformed Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) at each stage, attaining combined accuracies of 81.82%, 85.90%, and 89.58% for 25%, 50%, and 75% of the training data size, respectively
4.	Big data time series forecasting based on nearest neighbours distributed computing with Spark [43]	2018	Proposed a new approach based on the kWNN for big data time series forecasting. Owing to the high computational cost of finding the k nearest neighbours, the algorithm was developed using efficient distributed computing.	Compared with five state-of-the-art big data forecasting approaches (deep learning, decision tree, gradient-boosted tree, random forest, and linear regression), the proposed algorithm improved prediction accuracy by 39.68% on average
5.	Big data solar power forecasting based on deep learning and multiple data sources [42]	2019	A deep learning (DL) approach based on feed-forward neural networks for predicting generated PV solar power. DL decomposes the multi-step-ahead forecasting problem into sub-problems and uses distributed computing to reduce the computational cost of training a deep neural network and to process big data time series.	DL achieves its best MAE (109.52 kW) when using PV + WF and its best RMSE (128.66 kW) when using PV only

Conclusion

Time series forecasting has remained a keen area of interest among researchers from different domains of engineering and science. Several forecasting methods have been developed to deal with TSF. However, because of the improved computing power of machines, the availability of large amounts of data, and the universal approximation capability of ML models and their ability to handle nonlinear patterns efficiently, the past few decades have seen abundant use of ML techniques such as ANNs in various time series forecasting applications. Despite nearly three decades of research on ML-based forecasting methods, and despite empirical forecasting competitions such as M3 and NN3 indicating the promising capability of these methods, they have not yet been established as a reliable tool for predicting a variety of time series. This is because of several factors, such as: (a) the current heuristic and ad hoc modeling procedures, (b) the training algorithms used for determining the model parameters, (c) the pre-processing techniques, and (d) the inability to efficiently capture the linear patterns existing in time series. This survey has thus reviewed applications of machine learning algorithms to various time series problems and discussed their usability.

Copyright

Copyright © 2022 Apoorva Thakur, Sandeep Monga. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET41879

Publish Date : 2022-04-26

ISSN : 2321-9653

Publisher Name : IJRASET
