原文 · 未翻译
Overview of Existing Tutorials
The Goal
Data Preparation Train / Test Split Normalization
Train / Test Split
Normalization
Model Construction Definitions Define Graph Start Training Session Use TensorBoard
Definitions
Define Graph
Start Training Session
Use TensorBoard
Results
This is a tutorial for how to build a recurrent neural network using Tensorflow to predict stock market prices. The full working code is available in github.com/lilianweng/stock-rnn. If you don’t know what is recurrent neural network or LSTM cell, feel free to check my previous post.
One thing I would like to emphasize that because my motivation for writing this post is more on demonstrating how to build and train an RNN model in Tensorflow and less on solve the stock prediction problem, I didn’t try hard on improving the prediction outcomes. You are more than welcome to take my code as a reference point and add more stock prediction related ideas to improve it. Enjoy!
One thing I would like to emphasize that because my motivation for writing this post is more on demonstrating how to build and train an RNN model in Tensorflow and less on solve the stock prediction problem, I didn’t try hard on improving the prediction outcomes. You are more than welcome to take my code as a reference point and add more stock prediction related ideas to improve it. Enjoy!
Overview of Existing Tutorials#
There are many tutorials on the Internet, like:
A noob’s guide to implementing RNN-LSTM using Tensorflow
TensorFlow RNN Tutorial
LSTM by Example using Tensorflow
How to build a Recurrent Neural Network in TensorFlow
RNNs in Tensorflow, a Practical Guide and Undocumented Features
Sequence prediction using recurrent neural networks(LSTM) with TensorFlow
Anyone Can Learn To Code an LSTM-RNN in Python
How to do time series prediction using RNNs, TensorFlow and Cloud ML Engine
Despite all these existing tutorials, I still want to write a new one mainly for three reasons:
Early tutorials cannot cope with the new version any more, as Tensorflow is still under development and changes on API interfaces are being made fast.
Many tutorials use synthetic data in the examples. Well, I would like to play with the real world data.
Some tutorials assume that you have known something about Tensorflow API beforehand, which makes the reading a bit difficult.
After reading a bunch of examples, I would like to suggest taking the official example on Penn Tree Bank (PTB) dataset as your starting point. The PTB example showcases a RNN model in a pretty and modular design pattern, but it might prevent you from easily understanding the model structure. Hence, here I will build up the graph in a very straightforward manner.
The Goal#
I will explain how to build an RNN model with LSTM cells to predict the prices of S&P500 index. The dataset can be downloaded from Yahoo! Finance ^GSPC. In the following example, I used S&P 500 data from Jan 3, 1950 (the maximum date that Yahoo! Finance is able to trace back to) to Jun 23, 2017. The dataset provides several price points per day. For simplicity, we will only use the daily close prices for prediction. Meanwhile, I will demonstrate how to use TensorBoard for easily debugging and model tracking.
As a quick recap: the recurrent neural network (RNN) is a type of artificial neural network with self-loop in its hidden layer(s), which enables RNN to use the previous state of the hidden neuron(s) to learn the current state given the new input. RNN is good at processing sequential data. Long short-term memory (LSTM) cell is a specially designed working unit that helps RNN better memorize the long-term context.
For more information in depth, please read my previous post or this awesome post.
Data Preparation#
The stock prices is a time series of length $N$, defined as $p_0, p_1, \dots, p_{N-1}$ in which $p_i$ is the close price on day $i$, $0 \le i 1 else _create_one_cell()
cell = tf.contrib.rnn.MultiRNNCell( [_create_one_cell() for _ in range(config.num_layers)], state_is_tuple=True ) if config.num_layers > 1 else _create_one_cell()
(6) tf.nn.dynamic_rnn constructs a recurrent neural network specified by cell (RNNCell). It returns a pair of (model outpus, state), where the outputs val is of size (batch_size, num_steps, lstm_size) by default. The state refers to the current state of the LSTM cell, not consumed here.
tf.nn.dynamic_rnn
cell
val
batch_size
num_steps
lstm_size
val, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
val, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
(7) tf.transpose converts the outputs from the dimension (batch_size, num_steps, lstm_size) to (num_steps, batch_size, lstm_size). Then the last output is picked.
tf.transpose
batch_size
num_steps
lstm_size
num_steps
batch_size
lstm_size
# Before transpose, val.get_shape() = (batch_size, num_steps, lstm_size) # After transpose, val.get_shape() = (num_steps, batch_size, lstm_size) val = tf.transpose(val, [1, 0, 2]) # last.get_shape() = (batch_size, lstm_size) last = tf.gather(val, int(val.get_shape()[0]) - 1, name="last_lstm_output")
# Before transpose, val.get_shape() = (batch_size, num_steps, lstm_size) # After transpose, val.get_shape() = (num_steps, batch_size, lstm_size) val = tf.transpose(val, [1, 0, 2]) # last.get_shape() = (batch_size, lstm_size) last = tf.gather(val, int(val.get_shape()[0]) - 1, name="last_lstm_output")
(8) Define weights and biases between the hidden and output layers.
weight = tf.Variable(tf.truncated_normal([config.lstm_size, config.input_size])) bias = tf.Variable(tf.constant(0.1, shape=[config.input_size])) prediction = tf.matmul(last, weight) + bias
weight = tf.Variable(tf.truncated_normal([config.lstm_size, config.input_size])) bias = tf.Variable(tf.constant(0.1, shape=[config.input_size])) prediction = tf.matmul(last, weight) + bias
(9) We use mean square error as the loss metric and the RMSPropOptimizer algorithm for gradient descent optimization.
loss = tf.reduce_mean(tf.square(prediction - targets)) optimizer = tf.train.RMSPropOptimizer(learning_rate) minimize = optimizer.minimize(loss)
loss = tf.reduce_mean(tf.square(prediction - targets)) optimizer = tf.train.RMSPropOptimizer(learning_rate) minimize = optimizer.minimize(loss)
Start Training Session#
(1) To start training the graph with real data, we need to start a tf.session first.
tf.session
with tf.Session(graph=lstm_graph) as sess:
with tf.Session(graph=lstm_graph) as sess:
(2) Initialize the variables as defined.
tf.global_variables_initializer().run()
tf.global_variables_initializer().run()
(0) The learning rates for training epochs should have been precomputed beforehand. The index refers to the epoch index.
learning_rates_to_use = [ config.init_learning_rate * ( config.learning_rate_decay ** max(float(i + 1 - config.init_epoch), 0.0) ) for i in range(config.max_epoch)]
learning_rates_to_use = [ config.init_learning_rate * ( config.learning_rate_decay ** max(float(i + 1 - config.init_epoch), 0.0) ) for i in range(config.max_epoch)]
(3) Each loop below completes one epoch training.
for epoch_step in range(config.max_epoch): current_lr = learning_rates_to_use[epoch_step] # Check https://github.com/lilianweng/stock-rnn/blob/master/data_wrapper.py # if you are curious to know what is StockDataSet and how generate_one_epoch() # is implemented. for batch_X, batch_y in stock_dataset.generate_one_epoch(config.batch_size): train_data_feed = { inputs: batch_X, targets: batch_y, learning_rate: current_lr } train_loss, _ = sess.run([loss, minimize], train_data_feed)
for epoch_step in range(config.max_epoch): current_lr = learning_rates_to_use[epoch_step] # Check https://github.com/lilianweng/stock-rnn/blob/master/data_wrapper.py # if you are curious to know what is StockDataSet and how generate_one_epoch() # is implemented. for batch_X, batch_y in stock_dataset.generate_one_epoch(config.batch_size): train_data_feed = { inputs: batch_X, targets: batch_y, learning_rate: current_lr } train_loss, _ = sess.run([loss, minimize], train_data_feed)
(4) Don’t forget to save your trained model at the end.
saver = tf.train.Saver() saver.save(sess, "your_awesome_model_path_and_name", global_step=max_epoch_step)
saver = tf.train.Saver() saver.save(sess, "your_awesome_model_path_and_name", global_step=max_epoch_step)
The complete code is available here.
Use TensorBoard#
Building the graph without visualization is like drawing in the dark, very obscure and error-prone. Tensorboard provides easy visualization of the graph structure and the learning process. Check out this hand-on tutorial, only 20 min, but it is very practical and showcases several live demos.
Brief Summary
Use with [tf.name_scope](https://www.tensorflow.org/api_docs/python/tf/name_scope)("your_awesome_module_name"): to wrap elements working on the similar goal together.
with [tf.name_scope](https://www.tensorflow.org/api_docs/python/tf/name_scope)("your_awesome_module_name"):
Many tf.* methods accepts name= argument. Assigning a customized name can make your life much easier when reading the graph.
tf.*
name=
Methods like tf.summary.scalar and tf.summary.histogram help track the values of variables in the graph during iterations.
tf.summary.scalar
tf.summary.histogram
In the training session, define a log file using tf.summary.FileWriter.
tf.summary.FileWriter
with tf.Session(graph=lstm_graph) as sess: merged_summary = tf.summary.merge_all() writer = tf.summary.FileWriter("location_for_keeping_your_log_files", sess.graph) writer.add_graph(sess.graph)
with tf.Session(graph=lstm_graph) as sess: merged_summary = tf.summary.merge_all() writer = tf.summary.FileWriter("location_for_keeping_your_log_files", sess.graph) writer.add_graph(sess.graph)
Later, write the training progress and summary results into the file.
_summary = sess.run([merged_summary], test_data_feed) writer.add_summary(_summary, global_step=epoch_step) # epoch_step in range(config.max_epoch)
_summary = sess.run([merged_summary], test_data_feed) writer.add_summary(_summary, global_step=epoch_step) # epoch_step in range(config.max_epoch)
The full working code is available in github.com/lilianweng/stock-rnn.
Results#
I used the following configuration in the experiment.
num_layers=1 keep_prob=0.8 batch_size = 64 init_learning_rate = 0.001 learning_rate_decay = 0.99 init_epoch = 5 max_epoch = 100 num_steps=30
num_layers=1 keep_prob=0.8 batch_size = 64 init_learning_rate = 0.001 learning_rate_decay = 0.99 init_epoch = 5 max_epoch = 100 num_steps=30
(Thanks to Yury for cathcing a bug that I had in the price normalization. Instead of using the last price of the previous time window, I ended up with using the last price in the same window. The following plots have been corrected.)
Overall predicting the stock prices is not an easy task. Especially after normalization, the price trends look very noisy.
The example code in this tutorial is available in github.com/lilianweng/stock-rnn:scripts.