Time Series Forecasting with LSTM
What is a time series and what is special about it?
A time series is a set of samples taken at regular intervals of time. It is interesting to analyze its behavior in the medium and long term, trying to detect patterns and make predictions about its future behavior. Two things make a time series special, as opposed to a regular regression problem:
1- It is time dependent. This breaks the linear regression assumption that observations are independent of each other.
2- It tends to have some kind of seasonality, or a growing or decreasing trend. Think of how much more product an ice cream shop sells in only 4 months of the year than in the rest of the seasons.
Examples of time series are:
- Temperature, humidity, and pressure of an area captured at 15-minute intervals.
- Value of a company’s shares in the stock market minute by minute.
- Daily (or monthly) sales of a company.
- Production in kg of a harvest every six months.
In this article, we will cover the concept of time series while injecting some machine learning into it. By the end of this reading, you will feel more attracted to machine learning and to how you can apply time series forecasting with a neural network model. And who knows? Maybe you will be the next Wolf of Wall Street by predicting prices of stocks, ETFs, currency exchanges, and more.
BTC
Bitcoin is surrounded by mystery about who created the digital currency; it is also a highly volatile asset, and some governments still reject its use because of its decentralized commerce model. Nevertheless, the hype, and the very fact of being a decentralized currency, has kept it alive; banks are now investing in it and have started offering bitcoin-related services.
In late 2017 and early 2018, the Bitcoin transaction price peaked, and many people started looking for ways to understand the currency's trend so they could invest and make a large profit. We are going to use a dataset from coinbase.com with minute-by-minute historical data of BTC transactions in US dollars from 2017 to 2020. Based on that, we will use an ANN to predict the next hour's BTC close price from the last 24 hours of data.
Preprocessing
The bitstamp and coinbase datasets contain one-minute records of the open price, close price, high, low, volume in BTC, volume in USD, weighted price, and transaction time. During the EDA (Exploratory Data Analysis), a lot of NaN values were found and removed. At this point we checked that the bitstamp dataset contains more data points than the coinbase dataset. This project only deals with hourly predictions, so we subsampled the data from 1-minute intervals to 1 hour. This also shrinks the dataset, but the reduced number of records does not noticeably affect the performance of the model.
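A minimal sketch of this cleaning step with pandas (the file name and the Unix Timestamp column are assumptions based on the dataset layout described above):
import pandas as pd

# File and column names are assumptions based on the dataset description.
df = pd.read_csv('coinbase.csv')
df = df.dropna()                                # drop the NaN rows found in the EDA
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')
df = df.set_index('Timestamp')
df = df.resample('1H').last().dropna()          # subsample 1-minute records to hourly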
Data splitting & normalization
Once the initial exploration and cleaning of the data is done, it is time to decide what the final data fed into the neural network model will be. We will focus on the Close column, which contains the closing price; the reason can be appreciated in the next plot.
Open, High, Low, and Weighted Price are virtually identical to the feature we want most: Close.
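A quick way to verify this (assuming the dataset's column names, including Weighted_Price) is to plot the price columns on the same axes:
import matplotlib.pyplot as plt

# Column names are assumptions based on the dataset description above.
df[['Open', 'High', 'Low', 'Close', 'Weighted_Price']].plot(figsize=(12, 4))
plt.ylabel('Price (USD)')
plt.show()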
The data was split by taking 70% of the total for training, 20% for validation, and 10% for testing. This proportion is a standard one, but you could also work with an 80%/20% ratio. The split is computed as follows:
column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
train_df = df[0:int(n*0.7)]         # first 70% for training
val_df = df[int(n*0.7):int(n*0.9)]  # next 20% for validation
test_df = df[int(n*0.9):]           # final 10% for testing
It is also important to scale the data (a normalization step); this can be done by subtracting the mean and dividing by the standard deviation. Note that both statistics are computed on the training set only, so no information from the validation and test sets leaks into the model:
train_mean = train_df.mean()
train_std = train_df.std()
train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
The next graphic shows the normalized dataset.
Forecasting
Once the dataset is split and normalized, the next step is to define the variable to be evaluated in a single-step model. We implement the time series forecasting with the TensorFlow library, and a process called data windowing needs to be done first.
Data windowing consists of creating a window (an array) that contains a number of time steps (the width) of the features. If we are going to predict the close price of BTC each hour from the last 24 hours of records, the input width is going to be 24. The following graphical example gives an intuition of this.
A window that makes a prediction 1h into the future, given 6h of history. Image credit: TensorFlow.org
w1 = WindowGenerator(input_width=24, label_width=1, shift=1, label_columns=['Close'])
w1
Total window size: 25
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [24]
Label column name(s): ['Close']
This output shows the result of creating the data window, taking the main feature (the Close price) as the label column.
Having identified the window to construct, the next step is to split it into pairs of features and labels. The most efficient way is to create a Python class named WindowGenerator and then generate batches of these windows from the training, validation, and test data, using tf.data.Dataset.
Two methods were also created, split_window and make_dataset. Both handle the label_columns defined in the WindowGenerator, so the windows can serve a single-step training method with the ANN; a condensed sketch of the class is shown below.
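The post does not reproduce the full class, but a condensed sketch, closely following the TensorFlow time-series tutorial, looks like this (train_df, val_df, and test_df are the frames produced in the splitting step above):
import numpy as np
import tensorflow as tf

class WindowGenerator:
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        self.train_df, self.val_df, self.test_df = train_df, val_df, test_df
        self.label_columns = label_columns
        self.column_indices = {name: i for i, name in enumerate(train_df.columns)}
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift
        self.total_window_size = input_width + shift
        self.input_slice = slice(0, input_width)
        self.labels_slice = slice(self.total_window_size - label_width, None)

    def __repr__(self):
        # Prints the window summary shown above.
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {np.arange(self.total_window_size)[self.input_slice]}',
            f'Label indices: {np.arange(self.total_window_size)[self.labels_slice]}',
            f'Label column name(s): {self.label_columns}'])

    def split_window(self, features):
        # Slice a batch of consecutive windows into (inputs, labels) pairs.
        inputs = features[:, self.input_slice, :]
        labels = features[:, self.labels_slice, :]
        if self.label_columns is not None:
            labels = tf.stack(
                [labels[:, :, self.column_indices[name]]
                 for name in self.label_columns], axis=-1)
        inputs.set_shape([None, self.input_width, None])
        labels.set_shape([None, self.label_width, None])
        return inputs, labels

    def make_dataset(self, data):
        # Turn a dataframe into a tf.data.Dataset of shuffled window batches.
        data = np.array(data, dtype=np.float32)
        ds = tf.keras.utils.timeseries_dataset_from_array(
            data=data, targets=None,
            sequence_length=self.total_window_size,
            sequence_stride=1, shuffle=True, batch_size=32)
        return ds.map(self.split_window)

    @property
    def train(self):
        return self.make_dataset(self.train_df)

    @property
    def val(self):
        return self.make_dataset(self.val_df)

    @property
    def test(self):
        return self.make_dataset(self.test_df)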
Finally, the model is going to compute three different scenarios in which the prediction will take place.
RNN-LSTM
Recently, for business decisions, logistics monitoring, and financial prediction, among other uses, the complexity of forecasting problems has demanded clever solutions. With the continuous improvement of ANN models, and by applying them properly, a robust alternative is now available to predict accurately and to extract unseen features and relationships in this kind of time series problem.
The chosen ANN for this example was an RNN using an LSTM network. LSTM (Long Short-Term Memory) is a modified version of an RNN (recurrent neural network) that can remember past data more easily, which makes it popular for predicting time series with time lags of unknown duration. The training process uses back-propagation. Below is a diagram of an RNN-LSTM.
For this implementation, the model consists of a single LSTM layer with 16 units, optimized with Adam. The generated window (24 hourly steps) defines the input dimensions fed into the cells of the network. The model also has a dense layer that outputs the prediction.
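A minimal sketch of such a model in Keras (the final Reshape layer is an addition here so the output matches the (batch, 1, 1) shape of the window's labels):
lstm_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16),           # reads the 24-step input window
    tf.keras.layers.Dense(units=1),     # one value: the next hour's Close
    tf.keras.layers.Reshape([1, 1]),    # match the (batch, 1, 1) label shape
])
lstm_model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=[tf.keras.metrics.MeanAbsoluteError()])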
Finally, a callback was used to track the MSE (mean squared error) loss and to keep the weights that achieve the smallest error on the validation set over the complete training run.
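One way to implement this callback (the file name and epoch count are illustrative, not taken from the post):
# Keep only the weights with the lowest MSE on the validation set.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_lstm.h5', monitor='val_loss', save_best_only=True)

history = lstm_model.fit(w1.train, epochs=20,
                         validation_data=w1.val,
                         callbacks=[checkpoint])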
Baseline result
The baseline result is the time series forecast on the dataset without any ANN model. The absolute error of this approach was 0.0085.
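The post does not show the baseline's implementation; a common choice, and the one used in the TensorFlow tutorial this code follows, is to predict that the next hour's Close simply repeats the last observed value. A sketch, evaluated on a width-1 window:
class Baseline(tf.keras.Model):
    def __init__(self, label_index=None):
        super().__init__()
        self.label_index = label_index

    def call(self, inputs):
        # Return the last observed value of the label column as the prediction.
        if self.label_index is None:
            return inputs
        result = inputs[:, :, self.label_index]
        return result[:, :, tf.newaxis]

baseline = Baseline(label_index=column_indices['Close'])
baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                 metrics=[tf.keras.metrics.MeanAbsoluteError()])

# Predict the next hour from the current one.
single_step = WindowGenerator(input_width=1, label_width=1, shift=1,
                              label_columns=['Close'])
baseline.evaluate(single_step.val)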
To understand these diagrams, look at the pattern between the label points and the predictions. When a label point and its prediction overlap, the model anticipated the close price at that point, within the computed MSE.
However, the pattern differs across the three scenarios. A way to improve this is to merge the three predictions into a single line plot to see which point is most likely to match; that point should be the investment moment (the prediction) of the model.
RNN-LSTM result
The ANN model used for this forecast obtained an error of 0.0103.
There is no considerable difference compared to the baseline model, except in the second line plot. The first and third plots can be considered a good fit for cases where the data does not show high variance.
Performance
We can see that the LSTM can be considered the third best choice in terms of performance at predicting the BTC close value, which is not a bad result at all for a first model iteration.
More focused data preprocessing can be tried, along with hyperparameter variations aimed at better results. A lot of work is still waiting…
Conclusion
Time series forecasting is a very powerful tool, but it also has limitations. Of course, this model can be improved, and the methodology followed here is just an intro-level overview.
BTC is out there and maybe this can be your lucky day doing ML on it.
Check the code of this blog post here:
https://colab.research.google.com/drive/1AbmRGGRi89TN2dLEQ6u2KAAGKj2JS1Si?usp=sharing
Sources: