How Data Science is Fighting Disease

Many organisations are starting to use Data Science as a method of tracking, diagnosing and curing some of the world’s most widespread diseases. We look at 3 common diseases, and how Data Science is used to save lives.

Healthcare in the 21st century is at an impasse. Like so many things, it is a case of limited supply and unlimited demand, and the gap between the two is growing ever wider. In the UK, there are fears that the National Health Service (NHS) is to be privatised; divided up piece by piece and flogged to the highest bidder. In the US, President Obama is trying to make affordable healthcare a reality for all Americans, but it is proving an uphill struggle. In addition to this, disease is still an ever-present concern worldwide, and the number of people losing their lives to these conditions still vastly outnumbers the number of survivors. However, many organisations are starting to use Data Science as a method of tracking, diagnosing and curing some of the world’s most widespread diseases. We have compiled 3 of the most common diseases suffered worldwide, and the ways in which Data Science is being used to save lives.

As a leading cause of death worldwide, cancer is one of the foremost concerns of medical practitioners everywhere. With an estimated 14 million new cases and 8 million deaths annually, 60% of which are in Africa, Asia and Central and South America, cancer is a concern for everyone, regardless of background or age. According to the World Health Organisation, cancer cases worldwide could be reduced by up to 30% if certain lifestyle risk factors, such as alcohol and tobacco consumption, were addressed. Beyond prevention, Data Science is now actively being used to limit the deadly effects of cancer. With enough medical data, data scientists and doctors will be able to work together to analyse hundreds of thousands of patient records and medical logs to better understand how cancers start and how they might be slowed. San Francisco start-up Enlitic are training deep learning algorithms to detect signs of cancer sooner than a doctor would be able to diagnose them, so that treatment can be offered more quickly. Meanwhile, the Oregon Health and Science University are trialling gene-mapping as a possible solution. They want to map the gene sequences of thousands of cancer patients in order to understand how the cancers form, and therefore offer diagnoses much more quickly. In fact, the institution has said that it would like to offer same-day cancer diagnosis by 2020, which is a very bold ambition, but one that clearly shows the faith they have in using Data Science to this end.

A condition that has an inextricable hold over 5 million people worldwide, with over 1 million of those living in the US, is Parkinson’s Disease. This neurological condition is epitomised by the tremors which many of its sufferers must deal with, but by the time these motor symptoms appear, much of the damage is already done. As many as 80% of the dopamine-producing cells in the brain will have been destroyed before visible symptoms appear and, more frighteningly for those living with the condition, there is no known cure. In the meantime, companies are researching Data Science as a means not only of detecting Parkinson’s symptoms in those who might have a predisposition, but also of making life easier for those who are already afflicted. Several firms are aiming to use wearable technology and the Internet of Things to help doctors get a better understanding of the symptoms of Parkinson’s, such as the aforementioned tremors, walking gait and sleep quality. To this end, Intel have partnered with the Michael J Fox Foundation to develop a series of wearable technologies which will not only alert the wearer more quickly that they might have a predisposition to the condition, but will also upload their data to a central hub. From there, data scientists and doctors will be able to analyse the data to identify common threads among patients, in the hope of one day finding a cure for Parkinson’s.

A third condition, which has burst into the public consciousness in the last five years, is Ebola. Between March 2014 and October 2015, there were over 10,000 reported cases in West Africa alone, which resulted in some 5,000 deaths. The situation is exacerbated by the fact that the disease’s focal point is Guinea, Liberia and Sierra Leone, three of the poorest countries in the world, and the constant movement of infected people between countries means that the disease is spreading at an alarming rate. The key role for Data Science in the fight against Ebola is tracking the infection across Africa, so that aid workers and doctors know where aid is needed most urgently. The Centers for Disease Control and Prevention (CDC) is using real-time mapping software and telecommunications masts to track the disease across West Africa. When there is a flare-up in one area, and the threat of infection rises significantly, the CDC can send resources there to try to limit the spread. In addition, pharmaceutical firm Pfizer have been looking at how to limit the spread of infection, using Big Data to investigate how the spread of HIV was limited significantly in Scandinavia by antibodies that some people produced, which made them more resistant to the disease. Pfizer were then able to replicate the genetic makeup of this antibody and use it in their treatments of HIV. Tim Gamble, who headed up the Pfizer team when they made this discovery, has not ruled out a similar path for Data Science in the fight against the spread of Ebola. It cannot be said for certain that Big Data will be able to stop the spread of Ebola, but it should greatly improve the distribution of resources to the areas where they are needed most.

Data Science could have a huge impact on the way diseases are diagnosed, tracked and treated worldwide, but it does come with its own set of difficulties. The majority of medical data is considered confidential, and therefore cannot currently be used without the consent of the individual concerned. This has presented huge difficulties for companies using Machine Learning, which requires a large pool of data from which it can learn to detect conditions as reliably as a human. Although these companies are fighting hard to be allowed access to the necessary data, it is the decision of the patients themselves whether they wish for their medical information to be used. Furthermore, while these diseases can be tracked and diagnosed with the help of Big Data, which goes some way to helping patients, will Data Science ever be able to find a cure for any of these conditions? One might also question whether, in the future, data scientists and doctors will be of equal importance to the diagnosis and eventual recovery of a patient, despite many data scientists having little to no medical experience or training.

Data Science offers an enormous opportunity for healthcare around the world to tackle some of the deadliest conditions head on. It does, however, require a compromise from patients and medical professionals. If we followed TED speaker John Wilbanks, who suggested in his 2012 talk that we should pool our medical records in order to give doctors and data scientists a larger body of data with which to research, we might give ourselves the best chance of diagnosing, treating or even stopping these conditions before they can take any more lives.