Manaswi Patil
Bitcoin, the most successful cryptocurrency to date, is a decentralised digital currency that enables users to make peer-to-peer transactions securely. The cryptocurrency market is now worth more than 2 trillion USD, and cryptocurrencies are accepted by many large companies as a legitimate source of funds. However, cryptocurrencies such as Bitcoin have often been criticised for their volatility and uncertainty, which make them unsuitable for wider adoption as a currency.
This research project aims to investigate the application of Sentiment Analysis for forecasting Bitcoin. It seeks to analyse the extent to which sentiment of social media data (Twitter) and artificial intelligence (AI) techniques can be utilised to solve the uncertainties and value fluctuations that cryptocurrencies face.
To perform the analysis, a software application was developed to collect Twitter data and examine the sentiment of investors towards Bitcoin using unsupervised Machine Learning techniques. Machine Learning is then used to predict long-term values of Bitcoin, and the accuracy of those predictions is analysed.
Results from this research are expected to show the predicted prices and identify the most effective machine learning algorithms for accurate price prediction. Further, the results can help identify improvements to the algorithms used and their potential applications in other financial markets such as stock and forex trading.
The aim of this research is to use Sentiment Analysis and Machine Learning on Twitter data to predict the prices of Bitcoin.
What is Sentiment Analysis?
A Natural Language Processing technique to determine whether data is positive, negative or neutral. Sentiment Analysis can also focus on feelings and emotions (angry, happy, sad, etc), urgency (urgent, not urgent) and even intentions (interested or not interested).
What is Cryptocurrency?
At its core, cryptocurrency came from the application of encryption to secure networks. In general terms, cryptocurrency is a form of digital currency that is not backed by banks or any government, and is instead managed by peer-to-peer networks of computers.
In 2009, Bitcoin was introduced as a new blockchain technology by an individual or group of programmers under the pseudonym “Satoshi Nakamoto”. Being the most successful cryptocurrency to date, it has attracted many sectors, from technology to hospitality, which allow customers to make transactions using the currency. The entire cryptocurrency market is now worth more than 2 trillion USD.
Cryptocurrencies have often been criticized because of their volatility. For example, in May 2021, in a single day Bitcoin’s value increased by 30% and then decreased to 12%! There are various reasons behind this, such as news related to geopolitical events, government statements, decisions made by multinational companies, and highly publicized security threats.
There have also been multiple scientific studies done to show that the sentiment of an investor factors into the price fluctuation of cryptocurrencies. Like most commodities, assets, investments, or other products, Bitcoin's price depends heavily on supply and demand.
Therefore, this project investigates the application of Sentiment Analysis and Machine Learning for forecasting Bitcoin. The study collects results from a purpose-built software solution and analyses them, with the goal of providing a way to tackle the volatility that cryptocurrencies face.
Methodology
Tweets about bitcoin were retrieved from Kaggle. After pre-processing them, Sentiment Analysis was performed using VADER - a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.
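To illustrate what "lexicon and rule-based" means, here is a minimal sketch in plain Python. It is not VADER itself: the mini-lexicon, the negation and booster word lists, and the function names `score_tweet` and `label` are all invented for illustration. The squashing step and the ±0.05 cut-offs are modelled on the conventions usually described for VADER's compound score, and should be read as assumptions here. In practice the real analyser would be used instead (for example through the `vaderSentiment` package's `SentimentIntensityAnalyzer.polarity_scores`).

```python
import math
import re

# Hypothetical mini-lexicon (word -> valence). VADER's real lexicon is far larger.
LEXICON = {"good": 1.9, "great": 3.1, "bullish": 2.0,
           "bad": -2.5, "crash": -2.8, "scam": -3.0}
NEGATIONS = {"not", "never", "no"}          # rule: flip the sign of the next sentiment word
BOOSTERS = {"very": 0.3, "extremely": 0.5}  # rule: strengthen the next sentiment word


def score_tweet(text):
    """Return a sentiment score in (-1, 1) for one tweet."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        valence = LEXICON[token]
        prev = tokens[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            valence *= 1 + BOOSTERS[prev]
            # Look one word further back for a negation ("not very good").
            prev = tokens[i - 2] if i > 1 else ""
        if prev in NEGATIONS:
            valence = -valence
        total += valence
    # Squash the raw sum into (-1, 1); the constant 15 is an assumption
    # modelled on VADER's normalisation.
    return total / math.sqrt(total * total + 15)


def label(score, threshold=0.05):
    """Map a score to positive / negative / neutral (assumed cut-offs)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Bitcoin is great", "This is not good", "I bought bitcoin today"]:
        print(tweet, "->", label(score_tweet(tweet)))
```

Each tweet ends up with a single number, which is what the Machine Learning step below consumes as a feature.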
A Machine Learning model works with features and labels; in this case, the features are the sentiment values and the labels are the Bitcoin prices. My dataset is divided into two parts. The first part is the training set, which is used to find the inferred function (the machine learning model) that maps features to labels. This function is then applied to the second part of the dataset, the testing set: the features of the testing set are passed through the function to predict the labels, i.e. future Bitcoin prices.
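The train/test idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sentiment and price values are made up, the 75/25 chronological split is an arbitrary choice, and a least-squares straight line (the hypothetical helper `fit_line`) stands in for the inferred function. The project itself uses other algorithms, such as Support Vector Machine.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (stand-in for the model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values, one entry per day, for illustration only.
sentiment = [0.10, 0.25, 0.05, 0.40, 0.30, 0.15, 0.35, 0.20]      # features
price_usd = [41800, 45300, 40900, 48200, 45700, 43100, 47000, 44200]  # labels

# Chronological split: earlier days train the model, later days test it.
split = int(len(sentiment) * 0.75)
train_x, test_x = sentiment[:split], sentiment[split:]
train_y, test_y = price_usd[:split], price_usd[split:]

# Training: find the inferred function from features and labels.
slope, intercept = fit_line(train_x, train_y)

# Testing: apply the function to unseen features to predict their labels.
predictions = [slope * x + intercept for x in test_x]

if __name__ == "__main__":
    for predicted, actual in zip(predictions, test_y):
        print(f"predicted {predicted:.0f} USD, actual {actual} USD")
```

Splitting by time rather than at random keeps the testing set strictly later than the training set, which matches the goal of predicting future prices.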
Preliminary Results
I will be predicting Bitcoin prices using 4 Machine Learning algorithms: Naive Bayes, Support Vector Machine, Logistic Regression and Maximum Entropy. So far, I have been able to get results from Support Vector Machine.
** Feel free to reach out to me later to learn about the outcomes of the other algorithms!
My dataset combined the highest Bitcoin price of each day from 5th February, 2021 to 31st March, 2021 with the sentiment calculated from the tweets on that day. Using Support Vector Machine, I predicted the highest Bitcoin price for 10th April, 2021.
And it showed an accuracy of 82.7%!!
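The dataset described above has one row per day: a sentiment value and that day's highest price. A minimal sketch of how such rows might be assembled is shown below. How the per-tweet scores were combined into one daily value is not stated, so the arithmetic mean used here is an assumption, and the function names `daily_mean_sentiment` and `build_rows` as well as the example numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_mean_sentiment(scored_tweets):
    """Average per-tweet scores by day.

    scored_tweets: iterable of (date, score) pairs.
    Using the mean as the daily value is an assumption.
    """
    buckets = defaultdict(list)
    for day, score in scored_tweets:
        buckets[day].append(score)
    return {day: sum(scores) / len(scores)
            for day, scores in sorted(buckets.items())}


def build_rows(daily_sentiment, daily_high):
    """Join sentiment and highest price on the days present in both."""
    return [(day, daily_sentiment[day], daily_high[day])
            for day in sorted(daily_sentiment) if day in daily_high]


if __name__ == "__main__":
    # Made-up scores and prices, for illustration only.
    tweets = [(date(2021, 2, 5), 0.5), (date(2021, 2, 5), -0.1),
              (date(2021, 2, 6), 0.2)]
    highs = {date(2021, 2, 5): 38000.0, date(2021, 2, 6): 40000.0}
    for row in build_rows(daily_mean_sentiment(tweets), highs):
        print(row)
```

Each resulting row supplies one feature (the daily sentiment) and one label (the daily highest price) for the model.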
Investors and other stakeholders involved in trading can use Support Vector Machine (SVM) to get a predicted value of Bitcoin that can help them make judgements about whether to sell or buy Bitcoin. The positive outcome of SVM shows a potential solution to the ongoing issue of Bitcoin's volatility. In fact, there is potential for this to improve, not only through larger datasets pre-processed by powerful machines, but also by using other Artificial Intelligence frameworks such as Deep Learning.
Limitations: It is important to acknowledge that the accuracy calculated in my research may differ with a different dataset. Therefore, it is essential that this program is rigorously tested before it is used in industry. Additionally, my result relies on the hypothesis that tweets with positive sentiment are followed by an increase in Bitcoin prices and vice versa, which may not always be true in reality.
Acknowledgements
I would like to thank my supervisor, Phillip Benachour, who has guided and supported me throughout this project. I would also like to thank the Natural Language Processing (NLP) Research Group of Lancaster University, who have provided me with feedback on my work.