Using TensorFlow for Crypto

Gianfranco Mileo
8 min read · May 20, 2019

So let’s start with Bitcoin: what is it?

According to its website, Bitcoin is the main cryptocurrency; it uses peer-to-peer technology to operate with no central authority or banks, and managing transactions and issuing bitcoins is carried out collectively by the network. Bitcoin is open-source; its design is public, nobody owns or controls Bitcoin, and everyone can take part. Through many of its unique properties, Bitcoin allows exciting uses that could not be covered by any previous payment system.

In 2017 alone, Bitcoin grew by around 850%, held a market cap of around 300 billion dollars, and sparked worldwide interest in cryptocurrencies. Can you imagine if we could predict how the market would react?

Cryptocurrency Structure

There’s a lot of data related to Bitcoin: I found 30+ different characteristics of it on bitcoin.com (like blockchain size, price, market cap, etc.). This data has been collected since July of 2010, so there ended up being around 60 thousand different data points to process. With such a large amount of available information, this was a great opportunity to see if I could predict the prices using machine learning.

Machine learning. By utilizing neural networks that act as an artificial brain, machines are able to find patterns in a big dataset with minimal human involvement (which is awesome when there are 60,000 data points!). Machine learning has recently seen a huge surge because of a rise in both available data and computational power. Researchers have also been working to make ever more complex neural networks with more and more layers (deep learning), which allows them to solve even harder problems. Machine learning has applications in almost every field imaginable; recent advances include self-driving cars, language translation, and facial recognition. This kind of power was my best bet for predicting cryptocurrency prices! I will be using a Long Short-Term Memory (LSTM) network.

Simple neural network

But how does it work? An LSTM layer basically works by using special gates that combine information from previous time steps with the current input. The data runs through multiple gates (i.e. a forget gate, an input gate, etc.) and various activation functions as it is passed through the LSTM cells. The main advantage of this is that each LSTM cell can retain patterns for a certain amount of time: the cells can essentially “remember” important information and “forget” irrelevant information.

Basic LSTM cell
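
The gate computations sketched in the figure above can be made concrete with a rough NumPy sketch of a single LSTM time step. The weight matrices W and biases b here are hypothetical placeholders, not anything trained:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step over hypothetical weights W and biases b."""
    z = np.concatenate([h_prev, x_t])      # previous hidden state + current input
    f = sigmoid(W["f"] @ z + b["f"])       # forget gate: how much old cell state to keep
    i = sigmoid(W["i"] @ z + b["i"])       # input gate: how much new information to store
    c_hat = np.tanh(W["c"] @ z + b["c"])   # candidate values for the cell state
    o = sigmoid(W["o"] @ z + b["o"])       # output gate: how much cell state to expose
    c_t = f * c_prev + i * c_hat           # "forget" irrelevant info, "remember" important info
    h_t = o * np.tanh(c_t)                 # new hidden state, passed to the next time step
    return h_t, c_t
```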

So with this in mind, it was time to put my hands to work. I went on bitcoin.com and downloaded all of the data they had available, ending up with 37 different Excel files. I could have fed each file separately into the neural network, but it was way easier to manually combine the files into one big file with 37 columns (one column per file). So I did this and got a huge file of 37 columns by around 2,667 rows (each row is a day, each column is a feature of Bitcoin for that day). Unfortunately, machine learning isn’t that easy: I also had to do some data preprocessing to make sure my data was fed into the neural network in the best way.
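
For anyone who wants to reproduce that merge step, a minimal pandas sketch could look something like this (the folder name and file layout are assumptions, not the exact files from bitcoin.com):

```python
import glob
import pandas as pd

# Hypothetical layout: one CSV per feature, each with a "Date" column and one value column.
frames = []
for path in glob.glob("bitcoin_data/*.csv"):
    frames.append(pd.read_csv(path, parse_dates=["Date"], index_col="Date"))

# Align all 37 feature columns on the shared date index: one row per day.
combined = pd.concat(frames, axis=1).sort_index()
combined.to_csv("bitcoin_combined.csv")
```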

Okay, data preprocessing has some pretty technical steps to it. The first thing I did was slide an imaginary window over the big Excel file to turn it into arrays of 100 days by 37 features; imagine changing a 2D rectangle into a 3D rectangular prism. Next, I normalized the data. Since the range of values varied so much from feature to feature, it was in my best interest to normalize the numbers so that each data point would contribute about equally to the overall training of the neural network. I still had to split the data into training, validation, and testing sets. This step is pretty easy: I took the most recent 10 percent of the data as a test set and the other 90 percent as training data (5 percent of that 90 percent was split off into a validation set).

The data is read in as a .csv file. This method transforms the data from an array of shape (n × m), where n is the number of days and m is the number of features relating to Bitcoin, into a tensor of shape ((n − w) × w × m), where w is the window size, i.e. the number of days to look at in each sample of data. This is accomplished with a time-series transform that turns the original array into a set of windowed data (window_size = 50).
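
A minimal sketch of that time-series transform in NumPy (the function name is mine):

```python
import numpy as np

def make_windows(data, window_size=50):
    """Slide a window over an (n x m) array to build an ((n - w) x w x m) tensor."""
    windows = [data[start:start + window_size] for start in range(len(data) - window_size)]
    return np.array(windows)
```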

The data will then be normalized by looking at each window, dividing each value in the window by the first value of the window, and then subtracting one. For example, this normalization changes the set of data [4, 3, 2] into [0, -0.25, -0.5]. These values are obtained by dividing every value in the window by the first value, in this case 4, and then subtracting 1 from each result (i.e. 3 becomes (3/4) − 1, or -0.25).
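
Here is what that normalization could look like in code, reproducing the [4, 3, 2] example from the text (the helper name is my own):

```python
import numpy as np

def normalize_windows(windows):
    """Divide every value in a window by the window's first value, then subtract one."""
    bases = windows[:, 0, :].copy()             # keep the unnormalized bases for later
    return windows / windows[:, 0:1, :] - 1.0, bases

# The worked example from the text: [4, 3, 2] becomes [0, -0.25, -0.5].
example = np.array([[[4.0], [3.0], [2.0]]])     # one window, three days, one feature
normalized, bases = normalize_windows(example)
print(normalized.ravel())                       # [ 0.   -0.25 -0.5 ]
```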

The unnormalized bases are kept in order to get the original values back for the testing data. This is necessary to compare the model’s predictions of prices with the true prices.
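
Recovering real prices is just that algebra run backwards; a tiny sketch, where base is the first (unnormalized) value of each test window:

```python
def denormalize(normalized, base):
    """Invert the window normalization: value = (normalized + 1) * base."""
    return (normalized + 1.0) * base
```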

After normalization, the first 90% of the data is used in training the model, and the last 10% will be used to test the model. These data will be stored in X_train, Y_train, X_test, and Y_test. The training data will be shuffled such that the order of days in each window remains consistent, but the order of the windows will be random.
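
A sketch of that chronological split and window-level shuffle, assuming X holds the windowed features and Y the corresponding next-day targets:

```python
import numpy as np

def split_data(X, Y, test_fraction=0.10):
    """Chronological split: the most recent 10% of windows becomes the test set."""
    split = int(len(X) * (1 - test_fraction))
    X_train, Y_train = X[:split], Y[:split]
    X_test, Y_test = X[split:], Y[split:]

    # Shuffle whole windows only; the days inside each window keep their order.
    perm = np.random.permutation(len(X_train))
    return X_train[perm], Y_train[perm], X_test, Y_test
```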

Finally, a list of the prices on the day before each day that Y_test is drawn from is compiled in order to generate statistics about the model’s predictions, including precision, recall, and F1 score. Additionally, these prices can be used to identify whether the model predicted an increase or decrease in price.

Simple sliding window transformation

The model to be used in this project is a three-layer recurrent neural network (RNN) that incorporates 20% dropout at each layer in order to reduce overfitting to the training data. This model will have a total of 515,579 trainable parameters throughout all of its layers.

The model uses AdamOptimizer as its optimization function. Adam is an algorithm for first-order gradient-based optimization, based on adaptive estimates of lower-order moments. It is known for being straightforward to implement, computationally efficient, and light on memory requirements, and it is well suited to models with lots of parameters or training data.

The loss function used in this model is mean squared error. Mean squared error measures the average of the squared differences between each true y value and the corresponding predicted y value. This measurement is useful for assessing how close a predicted line is to the true data points. The model trains by attempting to minimize the mean squared error.
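
As a NumPy illustration of the formula (not the author’s code):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```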

A linear activation function is used in this model to determine the output of each neuron in the model. The linear activation function is simply defined as f(x) = x.

The model will use Keras’s Sequential model with Bidirectional LSTM layers.
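
A sketch of what that architecture might look like in Keras. The layer widths here are my guesses, so the parameter count won’t necessarily match the 515,579 quoted above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Bidirectional, Dense, Dropout, LSTM

window_size, num_features = 50, 37

model = Sequential([
    # Three bidirectional LSTM layers, each followed by 20% dropout.
    Bidirectional(LSTM(50, return_sequences=True),
                  input_shape=(window_size, num_features)),
    Dropout(0.2),
    Bidirectional(LSTM(100, return_sequences=True)),
    Dropout(0.2),
    Bidirectional(LSTM(50, return_sequences=False)),
    Dropout(0.2),
    Dense(1),              # single output: the normalized next-day price
    Activation("linear"),  # f(x) = x
])

model.compile(optimizer="adam", loss="mse")
model.summary()
```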

The model will be fitted to the training data (which was X_train and Y_train), with a batch_size of 1024. Additionally, 100 epochs will be performed to give the model time to adjust its weights and biases to fit the training data.

The model will be given the x values of the testing data and will make predictions of the normalized prices. This will be stored as y_predict.
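
Put together, the training and prediction calls would look roughly like this; validation_split is my stand-in for the 5% validation slice mentioned earlier:

```python
# Train on the shuffled windows: 100 epochs with large batches of 1024 windows.
model.fit(X_train, Y_train, batch_size=1024, epochs=100, validation_split=0.05)

# Predict the normalized next-day prices for the held-out test windows.
y_predict = model.predict(X_test)
```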

The percent change in price will be processed such that an increase in price is represented by a 1, and a decrease/no change is represented by a 0. These binary values will be stored in arrays delta_predict_1_0 and delta_real_1_0.

This will be done by looping through the values of the real and predicted percent change arrays. If a value is greater than 0, a 1 is stored in a new array. Otherwise, a 0 is stored in the new array. This process is very useful to understand how well the model did and can be used to gather statistics about the model’s performance.
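
A small sketch of that conversion, assuming delta_real and delta_predict hold the real and predicted percent changes:

```python
import numpy as np

def to_binary(deltas):
    """Map each percent change to 1 (increase) or 0 (decrease or no change)."""
    return np.array([1 if d > 0 else 0 for d in deltas])

delta_predict_1_0 = to_binary(delta_predict)  # predicted percent changes
delta_real_1_0 = to_binary(delta_real)        # real percent changes
```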

The binary categories computed in the previous step are now used to compare predicted and real data. They will be used to find the number of:

  • True positives
  • False positives
  • True negatives
  • False negatives

These can then be used to further calculate statistics of the model’s performance.

This will be done by looping through both binary arrays at once and getting the corresponding values. If the real value is a 1 and the predicted value is a 1, that index will be counted as a true positive. If the real value is a 1 and the predicted value is a 0, that index will be counted as a false negative. If the real value is a 0 and the predicted value is a 0, that index will be counted as a true negative. If the real value is a 0 and the predicted value is a 1, that index will be counted as a false positive.
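
In code, that loop (plus the precision, recall, and F1 score the counts make possible) might look like:

```python
true_pos = false_pos = true_neg = false_neg = 0
for real, pred in zip(delta_real_1_0, delta_predict_1_0):
    if real == 1 and pred == 1:
        true_pos += 1
    elif real == 1 and pred == 0:
        false_neg += 1
    elif real == 0 and pred == 0:
        true_neg += 1
    else:                                     # real == 0 and pred == 1
        false_pos += 1

# The four counts feed directly into the summary statistics.
precision = true_pos / (true_pos + false_pos)
recall = true_pos / (true_pos + false_neg)
f1_score = 2 * precision * recall / (precision + recall)
```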

After I trained the model to convergence, I tested it on my test set, getting an F1 score of .62 when I used a binary classifier (either the next-day price goes up or it goes down), along with a mean squared error of 0.043075. I conducted a statistical significance test and found that my results were significant at the 99% level, with a p-value of .0012. You can think of it this way: there is only a 0.12% chance that my model’s results came about by pure chance.

This shows how powerful machine learning is and how wide its range of applications can be. Using a model like this on cryptocurrency could let people profit by buying and selling at the predicted times.

2018-2019 Bitcoin market chart
