Photo by Igor Shabalin

Stock Price Analysis with Deep Learning

Yuli Vasiliev

--

In recent years, much of machine learning has shifted toward neural networks. While a neural network (a deep learning model) typically requires more computational power than a classical machine learning model, it often yields more accurate results. A typical neural network needs a lot of training, stepping through multiple epochs, before it learns useful patterns. In this context, an epoch is a single pass of the full set of training samples through the network. During each epoch, the model’s weights are updated, with the ultimate goal of improving accuracy.
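To make the notion of an epoch concrete, here is a minimal sketch in plain Python. It only shows how training samples are visited in mini-batches during each epoch; the function names and the sample counts are hypothetical and are not taken from any library.

```python
import math

def batches_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of weight-update steps needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

def iterate_epochs(n_samples: int, batch_size: int, epochs: int):
    """Yield (epoch, start, stop) index ranges, one per mini-batch."""
    for epoch in range(1, epochs + 1):
        for start in range(0, n_samples, batch_size):
            yield epoch, start, min(start + batch_size, n_samples)

# Hypothetical numbers: 1,000 samples, batches of 256, 2 epochs
schedule = list(iterate_epochs(1000, 256, 2))
print(batches_per_epoch(1000, 256))  # 4
print(schedule[3])                   # (1, 768, 1000)
```

With these numbers, each epoch consists of four steps, and the last batch of every epoch is smaller than the others.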

In my previous post Stock Price Analysis with Machine Learning, I provided an example of a simplified ML model designed for stock price analysis. In this article, you’ll see how to create and train a DL model on this same dataset.

Preparing Your Working Environment

The dataset used in this article includes historical price data for a stock, as well as the figures for the S&P 500 stock market index. To obtain this data, you’ll need to install the following libraries:

pip install yfinance
pip install pandas-datareader

The yfinance library is a Python wrapper for the Yahoo Finance API that lets you obtain data for a given stock over a specified time period. The pandas-datareader library can be used to obtain the S&P 500 index figures.

You will also need the scikit-learn library (imported as sklearn) to split your sample data into random train and test subsets. The library can be installed with pip:

pip install scikit-learn

The neural network is then created and trained with Keras, a deep learning API for Python. Keras requires TensorFlow, so you’ll need to install both:

pip install tensorflow
pip install keras

If you cannot install these libraries on your machine for some reason, you can take advantage of a Google Colab notebook, a great way to run your Python experiments from within a browser. If you decide to use Google Colab, you’ll need to start with installing only the yfinance library in the first cell:

!pip install yfinance

All the other libraries mentioned above are available in Google Colab by default.
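Whichever environment you choose, a quick check like the following can confirm that the libraries are importable before you run the rest of the code. This is only a sketch: the helper name is hypothetical, and the list contains import names, which differ from the pip package names in some cases (for example, scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names (not pip package names) of the libraries used in this article
REQUIRED = ["yfinance", "pandas_datareader", "sklearn", "tensorflow", "keras"]

def missing_modules(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required libraries are available.")
```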

Obtaining and Preparing Data

The steps of obtaining and preparing data for training the model are the same as in my recent article Stock Price Analysis with Machine Learning. Below you’ll find that code bundled in a single listing:

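import numpy as np
import yfinance as yf
import pandas_datareader.data as pdr
from datetime import date, timedelta

# Five years of daily prices and volumes for the stock
tkr = yf.Ticker('AAPL')
hist = tkr.history(period="5y")
# Recent yfinance versions return a timezone-aware index; drop the timezone
# so that the index can be joined with the timezone-naive index data
hist.index = hist.index.tz_localize(None)

# S&P 500 index figures for the same period
end = date.today()
start = end - timedelta(days=5*365+1)
index_data = pdr.get_data_stooq('^SPX', start, end)

# Combine stock and index data, keeping only closing prices and volumes
df = hist.join(index_data, rsuffix='_idx')
df = df[['Close', 'Volume', 'Close_idx', 'Volume_idx']].copy()

# Logarithmic change of each indicator relative to the previous day
df['priceRise'] = np.log(df['Close'] / df['Close'].shift(1))
df['volumeRise'] = np.log(df['Volume'] / df['Volume'].shift(1))
df['priceRise_idx'] = np.log(df['Close_idx'] / df['Close_idx'].shift(1))
df['volumeRise_idx'] = np.log(df['Volume_idx'] / df['Volume_idx'].shift(1))
df = df.dropna()
df = df[['priceRise', 'volumeRise', 'priceRise_idx', 'volumeRise_idx']].copy()

# Target: 1 if the next day's price rises by more than 0.01,
# -1 if it falls by more than 0.01, and 0 otherwise
conditions = [
    (df['priceRise'].shift(-1) > 0.01),
    (df['priceRise'].shift(-1) < -0.01)
]
choices = [1, -1]
df['Pred'] = np.select(conditions, choices, default=0)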

As you might guess, the model to be trained uses stock price, stock volume, as well as index price and index volume as the input data. Since each indicator in the input data has a different scale, you calculate the logarithmic change of each indicator relative to the previous day, which for small daily moves is close to the percentage change. As a result, you have the priceRise, volumeRise, priceRise_idx, and volumeRise_idx columns in the df DataFrame to be used as the features in the model to be trained. The Pred column holds the target: 1 if the next day’s priceRise is above 0.01, -1 if it is below -0.01, and 0 otherwise.
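The following toy example may help to see what these calculations produce. It applies the same np.log and np.select logic to a handful of hypothetical closing prices (the numbers are made up for illustration) and adds the simple percentage change for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, chosen only to illustrate the calculation
toy = pd.DataFrame({"Close": [100.0, 102.0, 101.5, 99.0, 99.5]})

# Logarithmic change relative to the previous day
toy["priceRise"] = np.log(toy["Close"] / toy["Close"].shift(1))
# Simple percentage change, for comparison
toy["pctChange"] = toy["Close"].pct_change()

# Label each day by the NEXT day's move: 1 = up, -1 = down, 0 = flat
next_rise = toy["priceRise"].shift(-1)
toy["Pred"] = np.select([next_rise > 0.01, next_rise < -0.01], [1, -1], default=0)

print(toy.round(4))
```

Note that the last row has no next day to look at, so its comparison is made against a missing value and it falls into the default class 0.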

Configuring and Training the Neural Network

In the following snippet, you create the array containing the target variable. To evaluate the effectiveness of the model, you might want to run a test that uses random values drawn from -1, 0, and 1 for the target variable. The accuracy of a model trained on such randomly generated targets should be approximately 33%:

# Uncomment the next line to perform a random test
#target=np.random.randint(-1,2, size=len(df))
# Comment out the next line when performing a random test
target = df['Pred'].to_numpy()

Then, you create the array of the features to be used for training and evaluating the model:

features = df[['priceRise', 'volumeRise', 'priceRise_idx', 'volumeRise_idx']].to_numpy()
features = np.around(features, decimals=2)

Next, you split the features and target arrays into random train and test subsets.

from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)
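Conceptually, a random split shuffles the row indices and then slices them into two parts. The NumPy sketch below illustrates the idea on hypothetical data; it is not scikit-learn’s implementation, and the shuffle_split name is made up for this example.

```python
import numpy as np

def shuffle_split(features, target, test_size=0.2, seed=None):
    """Shuffle row indices, then slice them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(np.ceil(len(features) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], target[train_idx], target[test_idx]

# Hypothetical data: 10 rows with 4 features each
X = np.arange(40).reshape(10, 4)
y = np.arange(10)
X_tr, X_te, y_tr, y_te = shuffle_split(X, y, test_size=0.2, seed=0)
print(X_tr.shape, X_te.shape)  # (8, 4) (2, 4)
```

Because the split is random, the accuracy figures vary from run to run. With train_test_split itself, passing a fixed random_state value makes the split reproducible.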

You also need to configure the neural network before you can train it:

from keras import models
from keras import layers
from tensorflow.keras.utils import to_categorical

# Three dense layers: 4 input features in, 3 class probabilities out
network = models.Sequential()
network.add(layers.Dense(4, activation='relu', input_shape=(4,)))
network.add(layers.Dense(512, activation='relu'))
network.add(layers.Dense(3, activation='softmax'))
network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# One-hot encode the labels; to_categorical expects class indices starting at 0,
# so shift the -1/0/1 labels to 0/1/2 first
train_labels = to_categorical(y_train + 1, num_classes=3)
test_labels = to_categorical(y_test + 1, num_classes=3)
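The label encoding step can be spelled out in plain NumPy. Keras’s to_categorical is documented to take integer class indices from 0 to num_classes - 1, so one option, shown here as an assumption rather than as a required step, is to shift the -1/0/1 labels to 0/1/2 before encoding and to shift the predicted class back afterwards. The helper names below are hypothetical.

```python
import numpy as np

LABEL_OFFSET = 1  # maps labels -1, 0, 1 to class indices 0, 1, 2

def one_hot(labels, num_classes=3):
    """Turn labels in {-1, 0, 1} into one-hot rows."""
    indices = np.asarray(labels) + LABEL_OFFSET
    encoded = np.zeros((len(indices), num_classes))
    encoded[np.arange(len(indices)), indices] = 1.0
    return encoded

def decode(probabilities):
    """Pick the most probable class and map it back to -1, 0, or 1."""
    return np.argmax(probabilities, axis=1) - LABEL_OFFSET

encoded = one_hot([-1, 0, 1, 0])
print(encoded)
print(decode(np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])))  # [-1  1]
```

The decode function shows how the softmax output of the network, one probability per class, could be turned back into a prediction of -1, 0, or 1.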

Finally, you can train the network:

network.fit(X_train, train_labels, epochs=5, batch_size=256) 
test_loss, test_acc = network.evaluate(X_test, test_labels)

In my Colab notebook, the progress of the training process, along with the loss and accuracy figures for each epoch, was shown as follows:

Epoch 1/5
4/4 [==============================] - 1s 5ms/step - loss: 1.0811 - accuracy: 0.4782
Epoch 2/5
4/4 [==============================] - 0s 5ms/step - loss: 1.0344 - accuracy: 0.5611
Epoch 3/5
4/4 [==============================] - 0s 4ms/step - loss: 1.0062 - accuracy: 0.5629
Epoch 4/5
4/4 [==============================] - 0s 6ms/step - loss: 0.9980 - accuracy: 0.5585
Epoch 5/5
4/4 [==============================] - 0s 5ms/step - loss: 0.9958 - accuracy: 0.5541
8/8 [==============================] - 0s 3ms/step - loss: 0.9962 - accuracy: 0.5595

You might want to look at the final accuracy figure:

print(test_acc)

0.5595238208770752

What can this figure tell you? Stock price history on its own does not contain much predictive information. However, when it is used along with volume history and the index indicators, you can get a result that is better than what you would get with random target values. To verify this, you can perform a random test. For that, find and uncomment this line in the script:

target=np.random.randint(-1,2, size=len(df))

Then, comment out the next line:

#target = df['Pred'].to_numpy()

After that, re-run the script. In my experiment, I got the following results:

Epoch 1/5
4/4 [==============================] - 1s 5ms/step - loss: 1.0987 - accuracy: 0.3261
Epoch 2/5
4/4 [==============================] - 0s 4ms/step - loss: 1.0981 - accuracy: 0.3371
Epoch 3/5
4/4 [==============================] - 0s 5ms/step - loss: 1.0973 - accuracy: 0.3400
Epoch 4/5
4/4 [==============================] - 0s 4ms/step - loss: 1.0973 - accuracy: 0.3466
Epoch 5/5
4/4 [==============================] - 0s 4ms/step - loss: 1.0988 - accuracy: 0.3252
8/8 [==============================] - 0s 2ms/step - loss: 1.0996 - accuracy: 0.3373
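The loss values in this random test stay close to 1.0986, which is ln 3: the categorical cross-entropy of a model that assigns a probability of 1/3 to each of the three classes. The NumPy sketch below illustrates this with the textbook formula; the function is a hypothetical helper, not Keras’s internal code.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot targets for three samples, one per class
y_true = np.eye(3)
# A model that has learned nothing predicts 1/3 for every class
uniform = np.full((3, 3), 1.0 / 3.0)

print(round(categorical_crossentropy(y_true, uniform), 4))  # 1.0986
print(round(float(np.log(3)), 4))                           # 1.0986
```

In other words, with random targets the network ends up no better than a uniform guess, which is what you would expect.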

Print the final accuracy:

print(test_acc)

In my experiment, the result was the following:

0.3373015820980072

This is quite understandable: when one of three classes is chosen at random, the probability of a correct choice is about 33.3%.
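A complementary check, which is not part of the experiment above, is to compare the test accuracy against the share of the most frequent class, since a model that always predicts that class already reaches that accuracy. The sketch below uses hypothetical labels and a hypothetical helper name; with the real data you would pass y_test instead.

```python
import numpy as np

def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    values, counts = np.unique(labels, return_counts=True)
    best = np.argmax(counts)
    return int(values[best]), float(counts[best] / len(labels))

# Hypothetical labels, for illustration only
labels = np.array([0, 0, 1, -1, 0, 1, 0, -1, 0, 0])
label, accuracy = majority_baseline(labels)
print(label, accuracy)  # 0 0.6
```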

IMPORTANT NOTE: Real-world models used in trading are much more complicated. The model discussed in this example is research level, NOT FOR PRODUCTION. The model is provided just to illustrate the concept of how you can derive and use different features for training a neural network.
