When Your Client Says ‘Can’t You Code a Program That’ll Predict the Open Price?!’
Well, yes, you can. It may not be accurate, but you can use neural networks to take an educated guess at the open price based on historical data.
In this blog post, we are going to use a type of neural network (a deep learning model) known as Long Short-Term Memory (LSTM) to predict the open price of the ES (E-mini S&P 500 futures). LSTMs are particularly good at processing time-series data like futures prices (or stock and forex prices) because they can learn long-term dependencies, which are patterns in past data that can influence future data. We’re going to use the LSTM layer from the Keras library in Python.
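For the curious, here is what "learning long-term dependencies" looks like mechanically. These are the standard LSTM cell equations at time step t, which is the formulation Keras’s LSTM layer follows with its defaults (tanh activation, sigmoid gates). Here x_t is the input (one day’s scaled open), h_t the hidden state, c_t the cell state, σ the sigmoid function, and ⊙ element-wise multiplication. The forget gate f_t decides how much of the old cell state to keep, which is what lets information survive across many steps.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```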
First, Let’s Grab Some (FREE) Data!
Before we can start with the LSTM model, we first need the historical data for the ES. There are various online sources where you can get this data. Two of the most popular ones are Yahoo Finance and Nasdaq.
- Yahoo Finance: Go to the Yahoo Finance website (https://finance.yahoo.com/), search for the ES (listed there under the symbol ES=F) or the specific stock or index you are interested in, go to the “Historical Data” tab, specify the time period you want the data for, and then click “Download”. This downloads a CSV file with the open, high, low, and close prices and the volume for the specified period.
- Nasdaq: Go to the Nasdaq website (https://www.nasdaq.com/), search for the ES (or the specific stock or index you are interested in), go to the “Historical Quotes” tab, specify the time period you want the data for, and then click “Download Data”. This downloads a CSV file with the open, high, low, and close prices and the volume for the specified period.
After you have downloaded the data, you can load it into your Python environment using the pandas library’s read_csv function. At this stage, let’s import all the libraries we’re going to need and load the CSV file:
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.callbacks import EarlyStopping

# Load your data
data = pd.read_csv('path_to_your_file.csv')
```
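Downloaded price files are not always clean: rows can come through with blank or "null" prices, and they are not guaranteed to be in date order. A small cleaning step before modeling can help. The sketch below is one way to do it; `clean_open_prices` is a hypothetical helper name, and it assumes the file has `Date` and `Open` columns, as Yahoo Finance’s historical-data CSV does.

```python
import pandas as pd


def clean_open_prices(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a Date/Open frame with parsed dates, numeric opens, no gaps, in date order.

    Assumes `raw` has 'Date' and 'Open' columns (hypothetical helper, not a library API).
    """
    df = raw[["Date", "Open"]].copy()
    df["Date"] = pd.to_datetime(df["Date"])
    # Anything that is not a number (e.g. the string "null") becomes NaN...
    df["Open"] = pd.to_numeric(df["Open"], errors="coerce")
    # ...and is then dropped, so the model only ever sees real prices.
    df = df.dropna().sort_values("Date")
    return df.reset_index(drop=True)


# Usage, with the `data` frame loaded by read_csv above:
# df = clean_open_prices(data)
```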
Data Preprocessing Mastery!
Let’s start by keeping just the date and open columns, sorting them by date, and scaling the prices to the range between 0 and 1. Scaling matters because the LSTM layer’s default activations are bounded (tanh outputs values between -1 and 1, and its sigmoid gates output values between 0 and 1), so keeping the inputs on a small, consistent scale makes the training process more stable.
```python
# Keep only the columns we need from the `data` frame loaded above
df = data[["Date", "Open"]].copy()
df["Date"] = pd.to_datetime(df["Date"])
df = df.dropna()  # drop any rows with missing prices
df.sort_values("Date", inplace=True)
df.reset_index(drop=True, inplace=True)

# Now we can scale the data we grabbed!
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(df["Open"].values.reshape(-1, 1))
```
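One caveat worth knowing: MinMaxScaler computes x_scaled = (x - min) / (max - min) from whatever data it is fitted on, so fitting it on the full history lets the test period’s highs and lows leak into the training inputs. A leakage-free alternative, sketched below assuming the same 80/20 chronological split, fits the scaler on the training slice only and merely transforms the test slice. `scale_train_test` is a hypothetical helper name.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_train_test(prices, train_frac=0.8):
    """Fit a MinMaxScaler on the earliest `train_frac` of prices and apply it to the rest.

    Returns (scaler, train_scaled, test_scaled, split_index); both scaled arrays are (n, 1).
    """
    prices = np.asarray(prices, dtype=float).reshape(-1, 1)
    split = int(len(prices) * train_frac)
    scaler = MinMaxScaler(feature_range=(0, 1))
    train_scaled = scaler.fit_transform(prices[:split])  # min/max learned from training data only
    test_scaled = scaler.transform(prices[split:])       # test data reuses those min/max values
    return scaler, train_scaled, test_scaled, split


# Example: prices 10..50, so the scaler is fitted on 10..40 only.
# scaler, train_scaled, test_scaled, split = scale_train_test([10, 20, 30, 40, 50])
# test_scaled -> [[1.333...]] because 50 is above the training max of 40
```

Test values can land outside [0, 1], as in the example; that is expected, since the model genuinely has not seen those prices. If you adopt this, build the test windows from the last `look_back` training values plus the test values, so the first test target still has a full window behind it.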
Preparing Your Data for LSTM (let the fun begin!)
Let’s prepare the data for the LSTM model. We use a window size of 5, meaning the model uses the open prices of the previous 5 days to predict the next day’s open (by open, I of course mean the RTH, or regular trading hours, open). We also split the data chronologically into training and testing sets: the first 80% for training and the remaining 20% for testing. Remember, with all ML models, we need to train the model, like you would train a pooch to fetch. It takes time to get this part just right, but eventually the magic happens!
```python
def create_dataset(dataset, look_back=1):
    """Turn a scaled (n, 1) array into input windows and next-step targets."""
    dataX, dataY = [], []
    # range(len(dataset) - look_back) keeps every window that has a next value
    for i in range(len(dataset) - look_back):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

look_back = 5
X, y = create_dataset(scaled_data, look_back)

# Split the data into training and testing sets (80/20), keeping time order
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Now this part is a bit more complicated, so I will be verbose with my comments!
# Reshape the input data into the 3D format required by the LSTM:
# [samples, time steps, features].
# - "samples" is the number of windows (one per row of X).
# - "time steps" is the number of observations in each window: look_back = 5
#   consecutive opens, which the LSTM reads one step at a time.
# - "features" is the number of values per time step. In our case, it's simply
#   the open price for each day, so 1.
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
```
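The windowing loop can also be written without a Python loop. This sketch uses a hypothetical helper, `make_windows`, built on NumPy’s `sliding_window_view` (available in NumPy 1.20 and later). For n prices it produces n - look_back input/target pairs, one for every window that has a following value, already in the [samples, time steps, features] layout with one feature per time step.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, look_back=5):
    """Return (X, y) with X of shape (n - look_back, look_back, 1) and y of shape (n - look_back,).

    X[i] holds series[i : i + look_back] and y[i] is series[i + look_back],
    the value that immediately follows that window.
    """
    series = np.asarray(series, dtype=float).ravel()
    if len(series) <= look_back:
        raise ValueError("series must be longer than look_back")
    # All windows of length look_back; the last one has no next value, so drop it.
    windows = sliding_window_view(series, look_back)[:-1]
    targets = series[look_back:]
    # Add the trailing "features" axis and copy (sliding_window_view returns a read-only view).
    return windows[..., np.newaxis].copy(), targets


# Example: 7 prices with look_back=5 give 2 windows.
# X, y = make_windows([10, 11, 12, 13, 14, 15, 16], look_back=5)
# X.shape -> (2, 5, 1), y -> [15., 16.]
```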
Let’s Build & Train – Shall We?
Next, we build and train the LSTM model: a simple network with one LSTM layer of 50 units followed by a dense layer for the output. We use mean squared error as the loss function and Adam as the optimizer, and we train for up to 50 epochs with a batch size of 1. We also add early stopping, which halts training once the validation loss has not improved for 10 consecutive epochs. This part is even more complicated, but to keep this post from turning into a dissertation, you can copy and paste my code into your Python session and shoot me an email to discuss further if need be. :)
```python
# Define the model
model = Sequential()
model.add(LSTM(50, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# Fit the model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=1,
    validation_data=(X_test, y_test),
    # Stop after 10 epochs without improvement and roll back to the best weights seen
    callbacks=[EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)],
    verbose=2,
    shuffle=False
)
```
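A caveat on the training code: `validation_data=(X_test, y_test)` means early stopping chooses its stopping point by looking at the test set, so the test score is no longer a fully out-of-sample check. One common fix, sketched here with a hypothetical `chrono_split` helper and assumed 70/15/15 proportions, is to carve a validation slice out of the period between training and testing, without shuffling.

```python
import numpy as np


def chrono_split(X, y, train_frac=0.7, val_frac=0.15):
    """Split windowed arrays into train/validation/test blocks in time order.

    Validation follows training and test follows validation, so early stopping
    never sees the final test period.
    """
    if train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    n = len(X)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return (
        (X[:train_end], y[:train_end]),
        (X[train_end:val_end], y[train_end:val_end]),
        (X[val_end:], y[val_end:]),
    )


# Usage with the windowed, reshaped arrays:
# (X_train, y_train), (X_val, y_val), (X_test, y_test) = chrono_split(X, y)
```

You would then pass `validation_data=(X_val, y_val)` to `model.fit` and score on `(X_test, y_test)` only once, at the very end.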
At Last, Making Our Predictions To Impress (hopefully not upset) The Client!
Finally, we use the trained model to make predictions on both the training and test sets, then scale the predictions back to the original price range.
```python
# Make the predictions
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# Invert the predictions (and the targets) back to price units.
# reshape(-1, 1) gives the (n, 1) column shape the scaler was fitted on.
train_predict = scaler.inverse_transform(train_predict)
y_train = scaler.inverse_transform(y_train.reshape(-1, 1))
test_predict = scaler.inverse_transform(test_predict)
y_test = scaler.inverse_transform(y_test.reshape(-1, 1))
```
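To judge whether the predictions are any good, it helps to compare them with the simplest possible forecast: tomorrow’s open equals today’s open (a persistence baseline). If the LSTM cannot beat that, it is not adding much. The sketch below, with hypothetical helpers `rmse` and `compare_to_persistence`, assumes you pass it the inverse-transformed `y_test` and `test_predict`, so both errors come out in index points. It treats consecutive targets as consecutive trading days, which holds because the windows slide forward one day at a time.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def compare_to_persistence(actual, predicted):
    """Return (model_rmse, persistence_rmse) measured over the same days.

    The persistence forecast for day t is the actual open on day t - 1, so the
    first day is skipped for both, keeping the comparison like-for-like.
    """
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    model_rmse = rmse(actual[1:], predicted[1:])
    persistence_rmse = rmse(actual[1:], actual[:-1])
    return model_rmse, persistence_rmse


# Usage after inverting the predictions:
# model_rmse, naive_rmse = compare_to_persistence(y_test, test_predict)
```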
Ta-da! That’s it. Feed the trained model the most recent five opens, scaled with the same scaler, and it will predict the next open for you, assuming the data you grabbed was current and accurate. Will this powerful tool always make the right prediction? Absolutely not! It will most assuredly be wrong, and it may be wrong often, but it is a principled way of guessing at what price may do in the near future. It’s one of the best tools we’ve got (and yes, we can certainly add more data points to make it better). We can never predict the future with certainty, but we always know the past, so we can build models and make educated guesses about the future, and as long as we stay humble, once in a while we may even be right!
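To get the actual number for the next session, you take the last five opens, scale them with the already-fitted scaler (transform, never fit again), run them through the model, and invert the result. Here is a minimal sketch with a hypothetical `forecast_next_open` helper; it assumes the model was trained on inputs shaped (samples, look_back, 1), so adjust the reshape if you kept a different layout.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def forecast_next_open(predict_fn, scaler: MinMaxScaler, opens, look_back=5):
    """Predict the next open (in price units) from the most recent `look_back` opens.

    predict_fn: any callable, such as a trained Keras model's `predict`, that maps an
    array of shape (1, look_back, 1) to a scaled prediction of shape (1, 1).
    """
    recent = np.asarray(opens, dtype=float)[-look_back:].reshape(-1, 1)
    if len(recent) < look_back:
        raise ValueError("need at least look_back opens")
    scaled = scaler.transform(recent)        # same fitted scaler as in training
    window = scaled.reshape(1, look_back, 1)  # (samples, time steps, features)
    scaled_pred = np.asarray(predict_fn(window)).reshape(-1, 1)
    return float(scaler.inverse_transform(scaled_pred)[0, 0])


# Usage with the objects built earlier in this post:
# next_open = forecast_next_open(model.predict, scaler, df["Open"].values)
```

Passing `model.predict` as a plain callable keeps the helper independent of Keras, which also makes it easy to sanity-check with a stand-in function before wiring in the real model.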
To learn more about this or to make general inquiries, please contact https://PinnacleQuant.com/contact.
About
This blog was authored by Raffi Sosikian. Raffi is a highly proficient software engineer and enterprise architect. He has an MBA, holds a Series 3 license, and is the principal at Pinnacle Quant, LLC, CTA, a boutique commodity trading advisory firm that specializes in building both custom trading systems (including private label systems for small funds to brand as their own software) and in-house, pre-built quant and price-action-based automated trading systems. To learn more about Raffi, or if you would like to build your own private label custom trading system, please visit https://PinnacleQuant.com/consulting.
Happy coding, happy predicting, and happy trading!