Bitcoin is one of the most popular and valuable cryptocurrencies in the current financial market. It attracts traders as an investment and, in turn, opens new opportunities for researchers. Many studies have addressed Bitcoin price prediction with different machine learning algorithms. In such studies, features that correlate strongly with the Bitcoin price are typically taken from the dataset, and random data chunks are then selected to train and test the model.
Randomly selected training data, however, may produce ill-fitting results and thus reduce the accuracy of the price prediction.
This paper examines a suitable method for training a prediction model. The proposed methodology is applied to train a simple Long Short-Term Memory (LSTM) model to predict the Bitcoin price for the upcoming 30 days. When the LSTM model is trained with a suitable data chunk identified in this way, the prediction results are consistent. The paper concludes with directions for future improvement.
I. INTRODUCTION
Generating profit with the help of algorithms, rather than through direct human decisions, is common practice in the stock market. Many case studies have concluded that mathematical models can deliver better results than human traders. Bitcoin is a notable development at the intersection of cryptography, economics, and computer science, because such currencies derive their distinctive character from combining currency units with cryptographic technology. Because cryptocurrency has a very short history compared with the stock market, much of this territory is still unexplored. Structurally, stock market data and cryptocurrency price data are both time series, but the latter routinely shows high volatility, with prices fluctuating heavily.
The cryptocurrency market differs from a traditional stock market in that it has many new features, so prediction techniques suited to the cryptocurrency market are required. Fewer studies have been conducted on cryptocurrency price prediction than on stock market prediction. In this paper, we predict the Bitcoin price trend using a Long Short-Term Memory (LSTM) model.
Our model aims to predict the price of Bitcoin over the next thirty days. The goal is to develop a model that predicts the price of the chosen cryptocurrency (in this case, Bitcoin) with a low error rate and high accuracy. The model will not tell the future, but it may forecast the general trend and the direction in which prices can be expected to move. To use the model, the dataset for the chosen cryptocurrency is first uploaded.
This dataset usually contains the features on which the prediction model depends, for example average block size, total number of Bitcoins mined, daily high and low prices, number of transactions, and trade volume. Second, the dataset is fed to the regression model to obtain the predicted price. In the proposed workflow, data on Bitcoin price fluctuations over the past few years is first gathered from the internet. After data acquisition, the data is organised.
It is divided into spreadsheet files, which are then loaded into the data processing software. The necessary computations, such as classification and regression, are then carried out. Finally, the results are evaluated in terms of accuracy and error rate.
II. LITERATURE SURVEY
A literature survey was carried out to review papers published in international journals on Bitcoin price prediction algorithms and to identify the most suitable algorithm for this task.
III. PROPOSED METHODOLOGY
First, we collect the dataset from an online source, Yahoo Finance. The dataset gives the Bitcoin price in United States dollars (USD) and covers the period from 23 October 2014 to 5 January 2022.
The second step is filtering and cleaning the dataset. This involves removing all rows with incomplete data and filtering out unnecessary features present in the collected data. For our model, we use only the columns labelled Date, Price, Open, High, and Low, as shown in the table below:
Sr. No. | Variable Name | Variable Description            | Data Type
1       | Time          | Date and time of observation    | Date
2       | Volume        | Sum total of trades taking place | Number
3       | Open          | Opening price on the given day  | Number
4       | High          | Highest price on the given day  | Number
5       | Low           | Lowest price on the given day   | Number
6       | Close         | Closing price on a given day    | Number
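The filtering and cleaning step described above can be sketched as follows. This is a minimal illustration, not the code used in this work: the CSV excerpt contains made-up values, the column names (Date, Open, High, Low, Close, Volume) are an assumption about how the downloaded file is labelled, and clean_prices is a hypothetical helper.

```python
import io

import pandas as pd

# Made-up rows for illustration only; these are not real Bitcoin prices.
# The column names are an assumption about the exported file.
RAW_CSV = """Date,Open,High,Low,Close,Volume
2022-01-03,102.0,108.0,101.0,107.0,1200
2022-01-01,100.0,105.0,99.0,104.0,1000
2022-01-02,104.0,,103.0,102.0,1500
2022-01-04,107.0,109.0,,108.0,900
"""

# Columns kept for modelling (assumed names); every other column is dropped.
KEEP = ["Date", "Open", "High", "Low", "Close"]


def clean_prices(df, keep=KEEP):
    """Keep the selected columns, drop incomplete rows, and sort by date."""
    out = df.loc[:, keep].copy()
    out["Date"] = pd.to_datetime(out["Date"])
    out = out.dropna()  # remove rows with any missing value
    return out.sort_values("Date").reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = clean_prices(raw)
print(clean)
```

Sorting by date after cleaning keeps the rows in chronological order, which matters for any later step that treats the data as a time series.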
The next step is training, followed by testing. We train the model on the selected features so that it can predict the future price of the cryptocurrency. We then run the model on the test data to measure the accuracy of its Bitcoin price predictions.
Finally, after training and testing, we evaluate the model. We compare the predicted price for a given period with the actual Bitcoin price in that period to assess the accuracy and efficiency of the model.
IV. IMPLEMENTATION
1. Lag Plots: After the dataset has been filtered and cleaned, we generate a lag plot of the time series. The lag between two points in a time series is the number of time steps by which one point trails the other. Lag plots are used to check whether the time series follows any pattern.
2. Train-Test Split: The next step is the train-test split. For our model, we use sixty data samples for testing and the rest of the re-sampled data for training.
3. Scaling: We scale both the training set and the test set. Scaling must be performed after the train-test split, because scaling before the split would leak information from the test set into the training set.
4. Data Generator: We frame the problem using a “lookback” period, taking a window of the last five days of data to predict the value for the current day. We define a function that splits the input sequence into windows suitable for fitting an LSTM model.
5. Restructuring Input into the Shape of a 3D Tensor: For the LSTM, we reshape the input data into a three-dimensional tensor of samples, timesteps, and features. Samples is the number of data points. Timesteps is the number of time steps over which the recurrent network is run, which corresponds to the length of the lookback window. Features is the number of features at each time step.
6. Generating the Epochs: From the callbacks module of the Keras library, we import the ModelCheckpoint and EarlyStopping callbacks. Following common practice, ModelCheckpoint saves the model at checkpoints, for example after each epoch, and EarlyStopping halts training once the monitored metric stops improving.
7. LSTM Prediction Using testX and Plotting a Line Graph Against the Actual testY: Because the data was scaled earlier with MinMaxScaler, the predictions are on the scaled range between zero and one. We therefore apply the inverse transformation to map them back to the original price scale.
8. Root Mean Square Error: Finally, we compute the root mean square error (RMSE) for both the training and the test data. RMSE measures how far the predicted values lie, on average, from the observed values. The RMSE for the training data is much lower than the RMSE for the test data, because the model was fitted on the training set.
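The train-test split, scaling, windowing, reshaping, inverse transformation, and RMSE steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the price series is synthetic, make_windows and rmse are hypothetical helper names, and a naive last-value forecast stands in for the predictions of the trained LSTM.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

LOOKBACK = 5    # window length in days (Data Generator step)
TEST_SIZE = 60  # number of held-out samples (Train-Test Split step)


def make_windows(values, lookback):
    """Split a 1-D sequence into (window, next value) pairs."""
    X, y = [], []
    for i in range(len(values) - lookback):
        X.append(values[i:i + lookback])
        y.append(values[i + lookback])
    return np.array(X), np.array(y)


def rmse(actual, predicted):
    """Root mean square error between two equally sized arrays."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic random-walk series standing in for the price column (assumption).
rng = np.random.default_rng(0)
prices = 30000 + np.cumsum(rng.normal(0, 500, size=400))

# Train-test split: chronological, the last TEST_SIZE points form the test set.
train, test = prices[:-TEST_SIZE], prices[-TEST_SIZE:]

# Scaling: fit on the training data only, then apply the same transform to both.
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.reshape(-1, 1)).ravel()
test_scaled = scaler.transform(test.reshape(-1, 1)).ravel()

# Data generator: build lookback windows and their targets.
train_X, train_y = make_windows(train_scaled, LOOKBACK)
test_X, test_y = make_windows(test_scaled, LOOKBACK)

# Reshape to (samples, timesteps, features), the layout an LSTM layer expects.
train_X = train_X.reshape(train_X.shape[0], LOOKBACK, 1)
test_X = test_X.reshape(test_X.shape[0], LOOKBACK, 1)

# Stand-in for model.predict(test_X): repeat the last value of each window.
pred_scaled = test_X[:, -1, 0]

# Inverse transformation: map scaled values back to the original price scale.
pred_usd = scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()
actual_usd = scaler.inverse_transform(test_y.reshape(-1, 1)).ravel()

# RMSE on the test windows, in the original price units.
test_rmse = rmse(actual_usd, pred_usd)
print(f"test RMSE: {test_rmse:.2f}")
```

Because the scaler is fitted on the training data alone, scaled test values may fall slightly outside the zero-to-one range. That is expected and is the price of keeping test information out of the training step.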
V. CONCLUSION
LSTM-based recurrent neural networks are among the most powerful approaches for learning from sequential data, of which time series are a special case. The potential of LSTM-based models is most apparent when learning from massive datasets in which complex patterns can be detected. The LSTM model implemented here takes into account features that affect the Bitcoin price, and it predicts future prices accurately. However, to increase the efficiency of the model, more Bitcoin price features need to be taken into consideration. We recommend Yahoo Finance as a source of datasets, since the information on this website is highly reliable. In future work, we intend to study LSTMs, and deep learning at large, in greater depth. Such findings would benefit future efforts to forecast cryptocurrency prices with LSTMs.