“Stock Price Prediction using Machine Learning”

Authors: Drashi Dave, Sanjana Prajapati, Udity Solanki

DOI Link: https://doi.org/10.22214/ijraset.2022.46610

Abstract

Stock prices follow a curve with many unknowns. The stock market is intricate and turbulent, which makes it difficult to predict. The primary goal of this work is to forecast the future behaviour of stock prices, a problem that many researchers have studied. Because stock prices are made up of constantly shifting data, data is the key input, and the quality of a forecast depends on it. In recent stock market prediction technologies, machine learning has been adopted to train models on historical data and to generate predictions. Machine learning uses a variety of predictive models and algorithms to forecast and automate tasks. This paper focuses on the application of regression and LSTM to forecast stock prices.

I. INTRODUCTION

A. Stock Price Prediction

Stock market prediction is the act of attempting to anticipate the future value of a company's stock or of another financial instrument traded on an exchange. A successful forecast of a stock's future price could result in a large profit. As the volume of trading and investment increased, people looked for strategies and tools that would improve their profits while reducing risk. India has two main stock exchanges: the National Stock Exchange (NSE) and the older Bombay Stock Exchange (BSE). The two most important Indian stock indices are the Sensex (BSE) and the Nifty (NSE), which serve as benchmarks for measuring the overall performance of the stock market. Stock market prediction is difficult because stock market values are highly dynamic. Several forecasting models for this problem have been created over the last few years, and they have been used to forecast the market for some time.

There are two traditional approaches for stock price prediction which are stated below:

Fundamental Analysis: Fundamental analysis assesses the true worth of a sector or company by calculating how much one share of that company should cost. The assumption is that, given enough time, the market price will move toward this estimated value. If a sector or firm is undervalued, its market price should rise, and if it is overvalued, its market price should decline. The analysis takes into account a variety of elements, including yearly financial summaries and reports, balance sheets, future prospects, and the work environment of the organization. The Price to Earnings ratio (P/E) and the Price to Book ratio (P/B) are the two most widely used fundamental indicators for predicting long-term, year-over-year price movements. The P/E ratio is used as a predictor: stocks with a lower P/E ratio are generally expected to yield higher returns than those with a high P/E ratio. Financial analysts also use it to back up their stock recommendations. The P/B ratio compares the market value of a firm to its book value. If the ratio is too high, the company may be overpriced, and its value may decline over time. If the ratio is low, however, the company may be undervalued, and the price may climb over time. By examining such financial figures, fundamental analysis can help distinguish good equities from bad ones. It is a powerful tool, but it has several flaws.
Technical Analysis: Technical analysis is the study of stock prices in order to make a profit or make better investment decisions. It forecasts the direction of future price movements from historical data, and it analyzes financial time series using technical indicators. The price is assumed to move in trends and to have momentum. Technical analysis investigates these trends and employs price charts and formulae to anticipate future stock values; it is primarily used by short-term investors. The price considered may be the high, low, open, or closing price of the stock, sampled at daily, weekly, monthly, or yearly intervals. The essential premises of technical analysis are that the market price discounts everything, that prices move in trends, and that historical trends tend to repeat themselves. The Moving Average (MA), Moving Average Convergence/Divergence (MACD), the Aroon indicator, and the money flow index are just a few examples of technical indicators. A shortcoming of technical analysis is that its rules are set by expert opinion and tend to be rigid and resistant to change. Several factors that influence stock prices are also disregarded.
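
As an illustration of the simplest indicator named above, the Moving Average, the following is a minimal sketch in plain Python. The function name `simple_moving_average` and the sample closing prices are hypothetical and are not taken from the paper.

```python
def simple_moving_average(prices, window):
    """Return the simple moving average of `prices` over `window` periods."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    # Sum of the first full window, then slide it one price at a time.
    running_sum = sum(prices[:window])
    averages = [running_sum / window]
    for i in range(window, len(prices)):
        running_sum += prices[i] - prices[i - window]
        averages.append(running_sum / window)
    return averages


# Hypothetical closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma3 = simple_moving_average(closes, 3)
print(sma3)
```

Each output value averages the most recent `window` prices, so the result is shorter than the input by `window - 1` entries.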

B. Objectives

Predicting stock returns and a company's financial situation in advance will provide greater benefits for investors in the current competitive market, allowing them to invest with confidence. Stock forecasting can be done with the help of current and historical market data.

The proposed research aims to investigate and develop supervised learning systems for stock price prediction. It must calculate the stock's projected price using past data. It should also be able to visualize the market index in real time.

The goal of stock market prediction is to predict the future movement of a financial exchange's stock value. Investors will be able to make more money if they can accurately predict share price movement.

II. LITERATURE REVIEW

A. Survey Of Related Work

This survey examined studies that used a generic stock market prediction (SMP) framework, focusing mostly on work from the previous decade (2011–2021). The studies were compared on the types of data used as input, the data pre-processing techniques, and the machine learning algorithms used to make predictions. The survey also examined the assessment criteria that the different studies used to measure performance. An exhaustive comparison led to the conclusion that SVM is the most widely used technique for SMP. Techniques such as ANN and DNN, on the other hand, are commonly used because they produce more accurate and faster predictions. In addition, using both market data and textual data from web sources improves prediction accuracy [1].

This project was implemented with the LSTM and GRU models, which are modern variants of recurrent neural networks. The models were trained on historical data, learned its patterns, and forecast future stock prices that were close to the actual values [2].

This paper studies the use of LSTM-based machine learning to forecast stock prices. The factors considered are open, close, low, high, and volume. The research was an attempt to determine the future prices of a company's stock with improved accuracy and reliability using machine learning techniques. The LSTM algorithm produced a positive outcome, predicting stock prices with improved accuracy [3].

This paper predicts share prices using Long Short Term Memory (LSTM) and Recurrent Neural Networks (RNN). Visualization showed that the predictions closely matched the actual stock prices. The performance of the proposed LSTM-based stock prediction system was compared with a simple Artificial Neural Network (ANN) model on five different stocks with varying amounts of data. The comparison shows that LSTM has better prediction accuracy than ANN [4].

Various machine learning algorithms, such as Multiple Linear Regression and Polynomial Regression, are used in this work, and a detailed comparison of all models is presented. The financial data contains the fields Date, Volume, Open, High, Low, Close, and Adj Close. The models are evaluated using two standard indicators, RMSE and the R2 score. A lower RMSE and a higher R2 score indicate a better-fitting model. The work also states that adding more data helps the algorithm learn better [5].

This paper summarizes research on machine learning approaches and algorithms used to increase stock price forecast accuracy. It examined and contrasted the state-of-the-art in machine learning algorithms and methodologies used in finance, particularly in stock price prediction. A variety of machine learning algorithms and methodologies have been examined in terms of input kinds, purposes, benefits, and drawbacks. Some machine learning algorithms and methodologies have been frequently chosen for stock price prediction due to their attributes, accuracy, and error acquired [6].

The goal of this work was to conduct a literature review on financial time series forecasting using Deep Learning and technical analysis. Following a defined research protocol, 34 publications were selected for the review. The study and its discussion were organized around predictor methodologies, trading strategies, profitability indicators, and risk management. Because of its memory capacity and its ability to tackle the vanishing gradient problem, the recurrent neural network LSTM has been widely used. Some hybrid models combined LSTM with additional techniques to handle news, yielding more robust results and suggesting directions for future research [7].

The Artificial Neural Network and Random Forest approaches were used in this study to forecast the next-day closing price for five companies from various industries. New variables derived from the stock's Open, High, Low, and Close values are used as inputs to the models. The models are assessed with the standard metrics RMSE and MAPE. Both metrics have low values, indicating that the models are good at predicting stock closing prices [8].

Title	Technique/Method	Accuracy
Stock Market Prediction Using Machine Learning Techniques: A Decade Survey on Methodologies, Recent Developments, and Future Directions	Support Vector Machine (SVM)	80%
Stock Market Prediction and Analysis using Machine Learning	Long Short Term Memory (LSTM)	80%
LSTM based Stock Price Prediction	Long Short Term Memory (LSTM)	85%
Stock Price Prediction Using Long Short Term Memory	Long Short Term Memory (LSTM)	75%
A Survey on Machine Learning for Stock Price Prediction: Algorithms and Techniques	Long Short Term Memory (LSTM)	N/A
Stock Market Forecasting Using Deep Learning and Technical Analysis: A Systematic Review	Long Short Term Memory (LSTM)	N/A
Stock Closing Price Prediction using Machine Learning Techniques	Artificial Neural Network (ANN), Random Forest (RF)	N/A

B. Existing Techniques And Their Drawbacks

As mentioned in the introduction, the traditional approaches to stock market prediction and analysis include fundamental and technical analysis.

Fundamental analysis delves into a stock's past performance, the company's credibility, and other factors such as news and the economy. It uses two fundamental indicators, the P/E ratio and the P/B ratio, to assess long-term price fluctuations. Technical analysis, in contrast, investigates trends and employs price charts and formulae to anticipate future stock values.

Fundamental analysis can become tedious and time consuming. Its drawbacks include extrapolation, time delay, the need for long-term investment, and the lack of reliable trade signals. It can also bring you on board a good stock at the wrong time, requiring you to hold on to the stock for a prolonged period.

Technical analysis has limitations like any other method. It's possible to misread the graph. It's possible that the formation is based on low volume. The moving average periods employed may be too long or too short for the type of transaction you're attempting to make. For the same stock, one technical analyst's view may differ from that of another. The technical approaches employed by analysts to examine equities can differ from one to the next.

C. Machine Learning Approach

Our requirement is to overcome the drawbacks of fundamental and technical analysis. The evident progression in modeling techniques has prompted a number of researchers to investigate new stock price prediction approaches.

In stock price prediction, machine learning is used to find patterns in data. Stock markets create a huge volume of structured and unstructured heterogeneous data in general. It is possible to swiftly evaluate more complicated heterogeneous data and provide more accurate findings using machine learning algorithms. For Stock Price Prediction, a variety of machine learning algorithms have been used.

Supervised and unsupervised techniques are the two primary categories in machine learning. In the supervised learning approach, the learning algorithm is given labeled input data together with the desired output. In the unsupervised learning approach, the learning algorithm is given unlabeled input data, and the programme recognises patterns and generates output accordingly.

Furthermore, different algorithmic approaches have been used, such as the Artificial Neural Network (ANN), Recurrent Neural Network (RNN) as stock prediction approaches.

1. Simple ANN: Artificial Neural Networks are feedforward neural networks. Here the input data travels in one direction only. It moves forward from the input nodes through the hidden layers and finally to the output layer.

2. Recurrent Neural Network: RNN in comparison to ANN is a bit more complex. Here the data travels in cycles through different layers. To put it another way, data travels through a Recurrent Neural Network in the form of directed cycles of routes between nodes. When compared to ordinary Neural Networks, this provides RNN a distinct advantage. The capacity to learn dynamically and store what has been learned to anticipate is critical when dealing with sequential input.

Simple Neural Networks also learn and retain what they've learned, which is how they predict classes or values for fresh datasets. However, unlike conventional Neural Networks, RNNs rely on information from prior output to predict forthcoming data/input. When dealing with sequential data, this capability comes in handy. LSTM is the most commonly used RNN model.

Therefore, for stock prediction RNN is used over ANN.

D. Proposed System

In order to predict the future of the stock market, a precise approach must be followed, as shown in this diagram.

The first phase will be to gather historical stock data for any company, which will be used to forecast stock prices.
The data is then preprocessed using techniques such as data scaling and data discretization.
Only the features that will be supplied to the neural network are chosen in this step: date, open, high, low, close, and volume.
In a 70:30 ratio, historical stock data is separated into training data and testing data.
The data is placed into a recurrent neural network, which is then trained to make predictions. A sequential input layer is followed by LSTM layers, a dense layer, and finally a dense output layer with linear activation function in our LSTM model.
The target value is compared to the output value generated by the RNN's output layer. The backpropagation through time algorithm, which modifies the weights and biases of the network, minimizes the error between the target and the acquired output value.
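
The 70:30 split in the steps above can be sketched as follows. This is a minimal sketch that assumes a chronological split, with the earliest 70% of the observations used for training, which is the usual choice for time series; the function name `chronological_split` and the sample prices are hypothetical.

```python
def chronological_split(series, train_percent=70):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    split_index = len(series) * train_percent // 100
    return series[:split_index], series[split_index:]


# Ten hypothetical daily prices, oldest first.
prices = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
train, test = chronological_split(prices)
print(len(train), len(test))
```

Keeping the order intact means the test set contains only observations that come after every training observation, so the model is never evaluated on data older than what it was trained on.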

III. LONG SHORT TERM MEMORY

The LSTM (Long Short Term Memory) network is a form of recurrent neural network that is particularly useful in time series forecasting.
Hochreiter and Schmidhuber first proposed long short-term memory in 1997 to overcome the difficulty that recurrent networks have in learning long-term dependencies.
By incorporating memory cells and gate units into the neural network design, long short-term memory addresses the difficulty of learning to recall information over a long time interval.
In a traditional neural network, final outputs are rarely used as an input for the following step, but in real-world problems the final output often depends not just on external inputs but also on earlier outputs.
When humans read a book, for example, each sentence's comprehension is based not only on the current words but also on the comprehension of the prior sentence or the context provided by previous sentences. Humans do not have to start again every time they think.
LSTM allows an RNN to remember inputs over a long period. Similar to computer memory, it stores information in a memory cell.
It can read, write, and delete data from this memory. The memory can be thought of as a gated cell, in which the gates decide whether to save or remove data.

A. LSTM Architecture

1. Forget Gate

A forget gate is responsible for removing information from the cell state. [11]
Information that is no longer necessary for the LSTM to understand things, or that is of lesser importance, is removed.
The forget gate thereby governs how much room is made in the cell state before newer information is introduced.
It produces values near 1 for the parts of the cell state that should be kept and near 0 for the parts that should be discarded.
This gate takes in two inputs: h_t-1 and x_t.
h_t-1 is the hidden state (the output) of the previous cell, and x_t is the input at the current time step.

2. Input Gate

The input gate is responsible for the addition of information to the cell state. [11]

This addition can be broken down into three parts:

Regulating which values need to be added to the cell state by means of a sigmoid function. This is very similar to the forget gate and acts as a filter for all the information from h_t-1 and x_t.

Creating a vector containing all the candidate values that could be added to the cell state (as perceived from h_t-1 and x_t). This is done using the tanh function, which outputs values from -1 to +1.

Multiplying the regulatory filter (the sigmoid gate) by the created vector (the tanh output) and then adding this useful information to the cell state via an addition operation.

3. Output Gate

The output gate is responsible for selecting useful information from the current cell state and showing it out as an output. [11]

An output gate's operation can be broken down into three parts once more:

Creating a vector after applying tanh function to the cell state, thereby scaling the values to the range -1 to +1.

Making a filter using the values of h_t-1 and x_t, such that it can regulate the values that need to be output from the vector created above. This filter again employs a sigmoid function.

Multiplying this regulatory filter by the vector created in the first part, and sending the result out as the output and also to the hidden state of the next cell.
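
The three gates described above can be combined into a single time step. The following NumPy sketch is an illustration under stated assumptions, not the paper's implementation: the function name `lstm_cell_step`, the stacked weight matrix `W`, the bias `b`, and the row order (forget, input, candidate, output) are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + features) and b has shape (4 * hidden,).
    Assumed row layout: forget gate, input gate, candidate vector, output gate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:hidden])                # forget gate: what to keep
    i_t = sigmoid(z[hidden:2 * hidden])       # input gate: what to add
    g_t = np.tanh(z[2 * hidden:3 * hidden])   # candidate values in [-1, 1]
    o_t = sigmoid(z[3 * hidden:4 * hidden])   # output gate: what to expose
    c_t = f_t * c_prev + i_t * g_t            # updated cell state
    h_t = o_t * np.tanh(c_t)                  # new hidden state / output
    return h_t, c_t


# Small random example: 3 hidden units, 1 input feature.
rng = np.random.default_rng(0)
hidden, features = 3, 1
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + features))
b = np.zeros(4 * hidden)
h_t, c_t = lstm_cell_step(np.array([0.5]), np.zeros(hidden), np.zeros(hidden), W, b)
print(h_t.shape, c_t.shape)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every component of `h_t` stays strictly between -1 and +1.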

B. Advantages Of LSTM

The key advantage of LSTM is its ability to capture intermediate context.
Each unit remembers information for a long or short period of time without explicitly applying an activation function within the recurrent components.
The only factor by which a cell state is multiplied from one step to the next is the output of the forget gate, which ranges between 0 and 1.
To put it another way, the forget gate of the LSTM cell acts as both the weight and the activation function of the cell state.
As a result, instead of being explicitly increased or decreased at each step or layer, the information from the preceding cell can flow through the cell unmodified, and the weights can converge to suitable values in a finite time.
Because the value stored in the memory cell is not repeatedly transformed in a recurrent fashion, the gradient does not vanish when the network is trained with backpropagation, which allows LSTM to overcome the vanishing gradient problem.
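
The argument above can be made concrete with the standard cell-state update. In the following sketch the symbols $f_t$ (forget gate output), $i_t$ (input gate output), $\tilde{c}_t$ (candidate vector), and $c_t$ (cell state) are notation assumed for this illustration, and the derivative is taken along the direct cell-state path with the gate values held fixed.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\qquad\Longrightarrow\qquad
\frac{\partial c_t}{\partial c_{t-1}} = \operatorname{diag}(f_t),
\qquad
\frac{\partial c_t}{\partial c_{t-k}} = \prod_{j=0}^{k-1} \operatorname{diag}(f_{t-j})
```

When the forget gate stays close to 1, this product stays close to the identity, so the gradient along the cell state neither shrinks nor grows rapidly over many time steps.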

IV. METHODOLOGY

A. Importing Libraries

Let's begin by importing the libraries we need: numpy, pandas, pandas_datareader, and MinMaxScaler from sklearn.preprocessing.

Next, we'll import Sequential from keras.models, and Dense and LSTM from keras.layers. Finally, we'll import matplotlib.

B. Plot Close Price Movement

We're going to plot a figure here. We'll pay more attention to the close price because that's the price we'll predict.

F. Data Normalization

Here we normalize the training data. Normalization is the process of rescaling data from its original range so that all values fall between 0 and 1.

We normalized our data with the help of MinMaxScaler.

MinMaxScaler is imported from the sklearn.preprocessing module, which provides scaling, centering, normalization, and binarization methods.
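
The rescaling that a min-max scaler performs with its default range of 0 to 1 can be written out directly. The following NumPy sketch is an illustration of the formula, not a replacement for MinMaxScaler; the function name `min_max_scale` and the sample prices are hypothetical.

```python
import numpy as np


def min_max_scale(values):
    """Rescale values to [0, 1]; also return the min and max that were used."""
    values = np.asarray(values, dtype=float)
    data_min, data_max = values.min(), values.max()
    if data_max == data_min:
        raise ValueError("cannot scale a constant series")
    scaled = (values - data_min) / (data_max - data_min)
    return scaled, data_min, data_max


# Hypothetical closing prices, for illustration only.
closes = np.array([100.0, 150.0, 200.0])
scaled, data_min, data_max = min_max_scale(closes)
print(scaled)
```

The minimum and maximum are returned because they are needed later to map predictions back to the original price scale.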

G. Scaling Train Data

We'll make a training data set that contains the scaled values from index zero up to 70% of the data size.

Next, we'll divide the training data into two groups: x_train and y_train.

x_train holds the independent training variables, while y_train holds the dependent, or target, variable.

Each sample needs the 60 preceding observations, so we loop over i in the range from 60 to the total size of the training data. In each iteration we append the values at positions i-60 to i-1 to x_train and the value at position i to y_train. As a result, the first x_train sample holds the 60 values at positions 0 to 59, while the first y_train entry holds the 61st value, at position 60. This is the value we want to predict with our model.
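
The windowing loop described above can be sketched in plain Python. The function name `make_windows` and the integer series are hypothetical; the window length of 60 follows the text.

```python
def make_windows(series, window=60):
    """Build (x, y) pairs: each x is `window` consecutive values, y is the next one."""
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])  # positions i-window .. i-1
        y.append(series[i])             # position i, the value to predict
    return x, y


# A hypothetical series of 100 scaled values, represented here by their indices.
series = list(range(100))
x_train, y_train = make_windows(series, window=60)
print(len(x_train), x_train[0][0], x_train[0][-1], y_train[0])
```

A series of length n yields n - 60 samples, because the first 60 values have no complete window before them.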

H. Train Data Reshaping

First of all, x_train and y_train were converted into numpy arrays, which were used to train the LSTM.

Because the LSTM layer is a recurrent layer, it anticipates a three-dimensional input.

For that purpose we reshaped the data in 3D as (1670, 60, 1).

1670 = Number of Samples

60 = Number of steps

1 = Number of Features
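
The reshaping step can be sketched with NumPy as follows. The array of zeros is a placeholder standing in for the real windowed training data; only its shape, taken from the figures above, matters here.

```python
import numpy as np

# Placeholder for the windowed training data: 1670 samples of 60 time steps each.
x_train = np.zeros((1670, 60))

# Add a trailing axis so that each time step carries one feature.
x_train_3d = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print(x_train_3d.shape)
```

The three axes are ordered as (samples, time steps, features), the layout a recurrent layer expects.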

I. Building LSTM Model

Building the LSTM model starts with calling Sequential(), which is imported from keras.models. We then added one LSTM layer with 50 neurons and two Dense layers with 25 and 1 neurons respectively. The input_shape specifies the number of time steps and the number of features.

For compiling this model we used an 'adam' optimizer.

An optimizer adjusts the neural network's properties like weights and learning rate. As a result, it aids in the reduction of total loss and the improvement of accuracy.

LSTM models are trained by calling the fit() function.

We used the training data to fit the LSTM model for 1 epoch with a batch size of 1.

The number of epochs is the number of complete passes through the training dataset.

Batch size refers to the number of training examples used in one iteration.
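
To give a sense of the size of this model and of the training run, the following sketch counts the trainable parameters and the number of weight updates per epoch. It assumes a standard LSTM layer (four weight blocks, each with input weights, recurrent weights, and a bias) and Dense layers with biases; the helper names are hypothetical.

```python
import math


def lstm_params(units, features):
    # Four blocks (forget, input, candidate, output), each with
    # input weights, recurrent weights, and one bias per unit.
    return 4 * (units * (features + units) + units)


def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units


total_params = lstm_params(50, 1) + dense_params(50, 25) + dense_params(25, 1)

# With 1670 training samples and a batch size of 1, each epoch
# performs one weight update per sample.
samples, batch_size, epochs = 1670, 1, 1
updates = math.ceil(samples / batch_size) * epochs
print(total_params, updates)
```

Under these assumptions the LSTM layer accounts for most of the parameters, and a batch size of 1 makes every sample trigger its own update.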

J. Testing Data

The test data is used to evaluate the model. The remaining 30% of the data is the test data, and an array containing this 30% of the normalized data is created. Windows of 60 observations are then built from the test data, which are used for prediction.

y_test holds the actual values, and x_test holds the input windows from which the predicted values are generated.

So, actual data and predicted data will be compared later on.

K. Converting Test Data

We then converted the test data to an array and reshaped it to three dimensions so that it could be passed to the model. Next, we used the model to generate predictions from the test data and applied the scaler's inverse transform to convert the predictions back to the original price scale.
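
The inverse transform undoes the min-max scaling. The following NumPy sketch writes the formula out directly; the function name `inverse_min_max` and the minimum and maximum prices are hypothetical values chosen for illustration.

```python
import numpy as np


def inverse_min_max(scaled, data_min, data_max):
    """Map values from the [0, 1] scale back to the original price range."""
    scaled = np.asarray(scaled, dtype=float)
    return scaled * (data_max - data_min) + data_min


# Hypothetical scaled predictions and the price range used for scaling.
restored = inverse_min_max([0.0, 0.5, 1.0], data_min=100.0, data_max=200.0)
print(restored)
```

The minimum and maximum must be the same values that were used when the data was scaled, otherwise the restored prices will be shifted or stretched.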

V. RESULTS

A. Root Mean Square Error

To evaluate the model, the root mean square error (RMSE) is calculated, which measures how far the predictions are from the actual values. The RMSE obtained is 0.8. A lower RMSE means that the predictions lie closer to the actual values, so this figure suggests that the model's forecasts track the actual closing prices closely.
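
The RMSE calculation can be sketched in plain Python as follows. The function name `rmse` and the sample values are hypothetical and do not reproduce the paper's data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted closing prices.
error = rmse([100.0, 102.0, 104.0], [101.0, 101.0, 105.0])
print(error)
```

RMSE is expressed in the same units as the prices, so it should be read relative to the price level of the stock being predicted.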

C. Values

We can see that the close price and prediction are really near, with the exception of a few dates where they deviate somewhat, but in general they are both very close.

Conclusion

This project lays the groundwork for making machine learning technologies more accessible to retail investors. We used LSTM for stock price prediction, which provided a fair prediction of the closing price, and the actual and predicted prices follow a similar trend. However, prediction with LSTM deals only with numbers, and there are limitations to number crunching and analysis alone. Stock market analysis does not rely only on the closing price, because the stock market reacts to many other factors. National factors such as government policies, interest rates and inflation, politics, natural disasters, economic indicators, gold prices, and bonds have a significant effect on stock prices. International macroeconomic factors that affect the Indian stock market include exchange rates (Dollar Index), crude oil, and international politics. Therefore, predicting only the closing price of a stock will not suffice.

We need a hybrid, intelligent prediction model that can consider all influential factors. We would like to extend our work by building a model that can assess the impact of news on stock prices. This could involve analysis of news feeds from social media platforms such as Twitter and from other sources, in which sentiment is evaluated from the articles. This sentiment analysis can be combined with the LSTM to improve training and accuracy.

References

[1] Rouf, N., Malik, M. B., Arif, T., Sharma, S., Singh, S., Aich, S., & Kim, H. C. (2021). Stock Market Prediction Using Machine Learning Techniques: A Decade Survey on Methodologies, Recent Developments, and Future Directions. Electronics, 10(21), 2717.
[2] Kelaskar, R., Sahu, M., Kamble, R., & Kapse, S. (2018). Stock Market Prediction And Analysis Using Machine Learning. International Journal Of Advance Research And Innovative Ideas In Education, 4(2), 3806-3810.
[3] Ahir, P., Lad, H., Parekh, S., & Kabrawala, S. (2021). LSTM based Stock Price Prediction. International Journal for Creative Research Thoughts, 9(2), 5118-5122.
[4] Nandakumar, R., Uttamraj, K. R., Vishal, R., & Lokeswari, Y. V. (2018). Stock price prediction using long short term memory. International Research Journal of Engineering and Technology, 5(03).
[5] Pawaskar, S. Stock Price Prediction using Machine Learning Algorithms. International Journal For Research in Applied Science and Engineering Technology, 10(1), 667-673.
[6] Obthong, M., Tantisantiwong, N., Jeamwatthanachai, W., & Wills, G. B. (2020, May). A Survey on Machine Learning for Stock Price Prediction: Algorithms and Techniques. In FEMIB (pp. 63-71).
[7] Li, A. W., & Bastos, G. S. (2020). Stock market forecasting using deep learning and technical analysis: a systematic review. IEEE Access, 8, 185232-185242.
[8] Vijh, M., Chandola, D., Tikkiwal, V. A., & Kumar, A. (2020). Stock closing price prediction using machine learning techniques. Procedia Computer Science, 167, 599-606.
[9] https://www.researchgate.net/publication/321985773/figure/fig3/AS:574184990023686@1513907777315/Artificial-Neural-Networks-ANN-architecture.png
[10] https://www.researchgate.net/figure/Basic-Architecture-of-Recurrent-Neural-Network-22-23_fig2_325668211
[11] Srivastava, P. (2017). Essentials of Deep Learning: Introduction to Long Short Term Memory. Analytics Vidhya.

Copyright

Copyright © 2022 Drashi Dave, Sanjana Prajapati, Udity Solanki. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET46610

Publish Date : 2022-09-04

ISSN : 2321-9653

Publisher Name : IJRASET
