Sentiment Analysis & Predictive Analytics for Trading: Avoid This Systematic Mistake
The financial market is the ultimate testbed for predictive theories. With this post we want to highlight a common mistake observed in the world of predictive analytics when computer scientists venture into the field of financial trading and quantitative finance.
By Lars Hamberg.
Many common mistakes can be avoided when testing sentiment data for predictive properties. Here is one:
The term “prediction” has no formal definition. When assessing the predictive qualities of sentiment data, there are no rules for what counts as a signal to be tested for predictive properties with regard to financial assets.
However, the method you choose ultimately defines what you mean by the term “prediction”.
To illustrate the point: Using a more prudent definition of the term, the accuracy in the world’s most famous prediction study could have been as low as 47% (7 out of 15) instead of 87% (13 out of 15).
An accuracy rate of 47% would not have produced worldwide media attention and more than 1600 academic citations, in my view.
The financial market is the ultimate testbed for predictive theories. A successful prediction tool for the financial market is a tantalizing idea, and mind-boggling in terms of implications.
A few years ago, a study called “Twitter mood predicts the stock market” (“the Bollen study”), by Johan Bollen, Huina Mao and Xiaojun Zeng (“Bollen”), received a lot of media coverage.
With more than 1600 academic citations, it remains the most cited paper in the field of investigating the use of sentiment data in prediction models for financial risk assets.
With this post I want to highlight a common mistake observed in the Bollen study, and elsewhere in the world of predictive analytics, when computer scientists venture into the field of financial trading and quantitative finance:
Assume that your sentiment data produces a new signal every day. For some reason, your sentiment signal suggests that the market will move up the following trading day. This repeats itself, every day for 20 consecutive days, and the market keeps going up, every following trading day.
Your textbook on financial markets says that daily market directions are random. As a consequence, you believe that your model has 20/20 prediction accuracy for something that is known to be random!
This calls for champagne, since your model correctly called the coin-flip 20 times in a row! Or, did it really?
Have you stumbled across the greatest discovery of all time? No, of course you haven’t.
Does your sentiment signal have some kind of predictive properties? Probably not… Sorry!
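The trap is easy to reproduce in a few lines of code. In this purely hypothetical sketch, the market trends upward for 20 straight days while the sentiment signal never changes; daily counting then reports a perfect hit rate even though the signal carries no information whatsoever about turning points:

```python
# Purely hypothetical data: a steady 20-day uptrend.
returns = [0.003] * 20          # the market closes higher every day
signals = ["UP"] * 20           # the sentiment signal never changes

# Daily counting: score a "hit" whenever signal and market move agree.
hits = sum((s == "UP") == (r > 0) for s, r in zip(signals, returns))
print(f"daily hit rate: {hits}/20")   # 20/20 -- by construction
```

A constant signal in a trending stretch scores 20/20 by construction; it has “called the coin flip” exactly zero times.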
Cancel the champagne and start consulting your critical faculties:
In academic papers on market prediction based on social media, there are often references to the fact that the Efficient Market Hypothesis has long been disputed and that price formation may not be random.
Typically, people behind these papers hypothesize that price formation and market direction may be predicted through analysis of social media, or similar. That’s the basic idea subject to investigation.
As a consequence, it runs contrary to reason to test whether market direction is systematic under the testing condition that market direction is random, wouldn’t you agree?
Still, this is exactly what some tend to do, which may surprise many practitioners in quantitative finance. Yet the Bollen study remains the world’s most cited study in this field of research.
The results in the Bollen study refer to a cumulative binomial probability (0.35%) as a mark of robustness for a prediction accuracy (86.7%) based on 15 observations, as if the model had correctly called 13 out of 15 coin flips.
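For reference, that cumulative binomial probability can be checked directly. Under a fair-coin null, the chance of at least 13 hits in 15 trials works out to roughly 0.37%, close to the 0.35% quoted; the exact figure depends on how the tail is computed:

```python
from math import comb

# P(X >= 13) for X ~ Binomial(n=15, p=0.5): the implied probability of
# calling at least 13 of 15 fair coin flips correctly by chance.
n, k = 15, 13
p_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"P(X >= {k}) = {p_tail:.4%}")   # about 0.37%
```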
Beyond the obvious problem with very small samples, Bollen is testing whether market direction is systematic under the testing condition that market direction is random. This general approach is problematic for a number of reasons:
Since causality is typically unclear, and since the possibility of both feedback loops and clustering effects must be considered for both time series, any proper test for predictive properties must use methods that account for the fact that each time series may be random or systematic; random in some time intervals and systematic in others; independent or dependent; independent in some time intervals and dependent in others; and leading, lagging, or both leading and lagging, in different time intervals.
One such method, with regard to market direction, is to only count changes in direction forecast – i.e. to only count when sentiment data changes from LONG to SHORT, or vice versa – as proper sentiment signals to be assessed against the actual aggregate market direction for the entire time period between changes in sentiment signals. This, for instance, is the method used in the ongoing multi-year prospective prediction study (“WIM”), available online at whatismonitor.com.
Hence, in order to use the method described, the sentiment data must produce a signal expressed as LONG or SHORT or BUY or SELL, or similar, and the accuracy of prediction must be based only on those instances when the sentiment signal changes, i.e. from LONG to SHORT, or vice versa.
To illustrate the point: if your sentiment data changes from SHORT to LONG on day 1 and keeps telling you to be LONG for the following 5 days in a row, after which it changes back to SHORT, and the market moves up during the first 4 of those 5 days, it does not mean that your model has been 80 per cent correct. Nor does it mean that your model has been right even one single time, i.e. regarding the four-day upward move.
In order to count the predictive signal as a “success” or a “failure” in the study, it’s reasonable to look at the aggregate price move over the entire five-day period, i.e. up to the point when the predictive signal has “closed the trade” by means of a change in signal: from LONG to SHORT, or vice versa.
If, and only if, the market – or the price – moved up over the entire five days, i.e. the time period between the changes in signal (from SELL to BUY and back from BUY to SELL), can it be counted as one successful prediction.
Conversely, if the price had moved down, the observation would have been logged as one failed prediction. Either way, with this definition of a “prediction”, it remains one single point of observation.
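The counting rule described above can be sketched as follows. This is my own illustrative implementation, not code from the WIM study; the function name and the data are hypothetical:

```python
def score_signal_changes(signals, returns):
    """Log a prediction only when the signal flips (LONG -> SHORT or
    vice versa), judged on the aggregate move over the whole run."""
    successes, failures, start = 0, 0, 0
    for i in range(1, len(signals)):
        if signals[i] != signals[start]:        # the "trade" is closed
            move = sum(returns[start:i])        # aggregate move of the run
            correct = move > 0 if signals[start] == "LONG" else move < 0
            successes += correct
            failures += not correct
            start = i
    return successes, failures

# The worked example from the text: LONG for five days, the market up
# on four of them, then the signal flips back to SHORT.
signals = ["LONG"] * 5 + ["SHORT"]
returns = [0.01, 0.01, 0.01, -0.02, 0.01, 0.0]
print(score_signal_changes(signals, returns))   # (1, 0): one success
```

Four up-days out of five is not “80 per cent accuracy” here; the aggregate move over the run was positive, so the flip is logged as exactly one successful prediction.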
It appears that Bollen used a different method. The impressive headline number of 87% (86.7%) accuracy in predicting daily moves of the DJIA was achieved over 15 trading days. For these 15 days, Bollen implies a hit ratio of 13 out of 15.
However, it is worth noting that the market didn’t change direction more than 8 times during the 15-day period, i.e. the first 15 trading days of December 2008. [table]. On two occasions during these 15 trading days the market direction remained the same for two days in a row, and on one occasion it remained the same for three days in a row.
What does it tell us?
In short, it tells us that – with the prudent definition of the term “prediction” – the 87% (13/15) accuracy in the Bollen study could never have been achieved.
With a prudent definition of “prediction”, the accuracy of Bollen’s signals could actually have been as low as 47% (7/15) and still produce an accuracy of 87% (13/15) under Bollen’s definition of “prediction”.
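To make the divergence concrete, here is a hypothetical six-day series scored both ways. The data is invented for illustration and has nothing to do with Bollen’s December 2008 sample; it only shows that daily counting and change-based counting can report very different accuracies for the same signal:

```python
# Invented data: LONG for four days (market up 3, down 1), then SHORT
# for two days (market up on aggregate), then a flip that closes the run.
signals = ["LONG"] * 4 + ["SHORT"] * 2 + ["LONG"]
returns = [0.01, 0.01, 0.01, -0.01, -0.01, 0.02, 0.0]

# Daily counting: every calendar day is scored as a "prediction".
daily = sum((s == "LONG") == (r > 0)
            for s, r in zip(signals[:6], returns[:6]))
print(f"daily accuracy:        {daily}/6")     # 4/6, about 67%

# Change-based counting: only the two completed runs count, each judged
# on its aggregate move.
runs = [(0, 4, "LONG"), (4, 6, "SHORT")]
correct = sum((sum(returns[a:b]) > 0) == (sig == "LONG")
              for a, b, sig in runs)
print(f"change-based accuracy: {correct}/2")   # 1/2, i.e. 50%
```

The same signal scores roughly 67% by the day and 50% by the flip; streaks in market direction hand daily counting multiple “hits” for what is, prudently defined, a single prediction.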
An accuracy rate of 47% would not have produced worldwide media attention and more than 1600 academic citations, in my view.