Big Data and Data Science for Security and Fraud Detection

We review big data analytics tools and technologies that combine text mining, machine learning and network analysis for security threat prediction, detection and prevention at an early stage.

By Khushbu Shah, DeZyre.

Terrorism, fraud, cybercrime and other surreptitious online activities have been making headlines recently. Following the recent terrorist attacks in Paris and Beirut, security agencies face an urgent need to eradicate terrorism altogether. Terrorism has become a business for groups like ISIS, Al-Qaeda and Boko Haram, which have become a global threat. Their objective is to kill and destroy, and they pose the risk of radicalizing the young generation online. ISIS recruits teens on social media platforms like Facebook and Twitter by running successful propaganda campaigns that broadcast its gruesome successes. ISIS has a firm grip on technology and Western media tools and is waging a war of ideas. The good guys are all set to take on terrorist groups like ISIS, cyber criminals and fraudsters with technologically powerful weapons: “Big Data and Data Science”.


Big data analytics tools and technologies have become the first line of defence, combining text mining, machine learning and ontology modelling to support security threat prediction, detection and prevention at an early stage. Big data and data science technologies now ease intelligence-led investigations through improved collaboration and data analysis, so that agencies can detect national security threats more easily. With organizations moving from conventional firewall and endpoint vendors to big data and cloud solutions in the enterprise, FBR Capital Markets reports a 20% increase in “next-generation cyber-security spending” in 2015.

Terrorists are highly trained, well equipped and financially strong. To win an encounter with cyber terror, security agencies should therefore use big data to leverage predictive analytics. Huge amounts of data are gathered on potential terrorist behaviour from various sources, including involvement in extreme online conversations, unusual purchases, travel in conflict regions and connections with other extremists. Security and intelligence agencies are leveraging real-time analytics to identify patterns across disparate systems by linking these different and unusual behaviours.

Security firms are using innovative data visualization and data mining technologies to identify patterns in big data and flush out cyber spies, terrorists and hackers. These firms use big data and data science technologies to detect fraud and other criminal activity by identifying suspicious behaviour patterns that point to likely threats.

“If you can get your arms around a big enough set of data, you’ll always find something in there. It’s not unreasonable to think that the more data you can get access to that you might discover something of predictive value,” said Fred Cate, director of the Centre for Applied Cyber Security Research.

Can big data analytics help fight ISIS?

Social media channels are a rich source of information about the various terrorist groups, which government authorities can use to identify events that point to key threats on a global scale. Big data can knock ISIS down, but in an indirect fashion. There is no direct information available about the ISIS group, but there is data that can help security agencies identify who is financing and feeding the group, who is supporting it, who is supplying it with weapons, and similar data points. All this data can be mined and processed using various big data and data science techniques.

Following the recent Paris attacks, Taha Mokfi, Teaching Assistant at University of Florida, mined Twitter data to learn what people across the globe were thinking about the Paris terror attacks. 200,000 English tweets were extracted from Twitter on Nov 15, 2015. Hashtags including #Iraq, #Muslims, #ISIS, #Syria and #SaudiArabia were taken into consideration for producing hashtag clouds and sentiment scores. The R programming language was used to draw charts showing the relationships between tweets with the #parisattacks hashtag and other related hashtags.
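The hashtag relationship analysis described above can be illustrated with a small co-occurrence count. This is only a minimal sketch in Python, not the R code used in the original study: the function name `cooccurring_hashtags` and the sample tweets are made up for illustration, and the real analysis also included sentiment scoring, which is not shown here.

```python
import re
from collections import Counter

# Matches hashtags such as #ParisAttacks or #ISIS.
HASHTAG = re.compile(r"#\w+")


def cooccurring_hashtags(tweets, anchor="#parisattacks"):
    """Count hashtags that appear in the same tweet as `anchor`."""
    counts = Counter()
    for text in tweets:
        # Lower-case the tags so #ParisAttacks and #parisattacks match.
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if anchor in tags:
            counts.update(tags - {anchor})
    return counts


# Invented sample tweets, for illustration only.
sample_tweets = [
    "Thoughts with Paris tonight #ParisAttacks #ISIS",
    "World leaders respond #parisattacks #Syria #ISIS",
    "Unrelated football chatter #soccer",
]

hashtag_counts = cooccurring_hashtags(sample_tweets)
print(hashtag_counts.most_common())
```

The resulting counts are the kind of input a hashtag cloud or a relationship chart would be drawn from.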

Applications of Big Data and Data Science-led Techniques for Security and Fraud Detection

  1. Big Data System in Abu Dhabi to prevent Terrorism

In Abu Dhabi, top security experts have presented a novel security concept, the development of a big data system, to Abu Dhabi Autonomous Systems Investments, a Tawazun company. The big data system would screen all the data that flows into the databases of government authorities, which can then be used to prevent cybercrime or terrorist activities. These big data systems apply a statistical data model and filter the data accordingly. Australia, the US and the UK are already using this kind of system. Such systems help the government assess how the population feels about issues raised on social media. Several opposition groups use social media to organize protests and terror attacks, which could be prevented by introducing this kind of big data system in the UAE.

  2. Use of Tableau Software to identify Terrorism

The Tableau data visualization tool is used by the Institute for the Study of Violent Groups (ISVG) to scan 10 years’ worth of data on individuals and groups engaged in extremism, trans-national crime and terrorism. ISVG generates various reports every week using Tableau and sends them to government defence officials worldwide to help them detect suspicious and unusual data patterns.

“We can slice and dice the data instantly and answer questions that we never thought to ask before. Knowing patterns and characteristics of the major terrorist’s camps has helped defence officials make decisions that have saved lives,” said John Hitzeman, the institute’s IT and analysis coordinator.

  3. European Government develops POLE Data Model to Store and Record Incidents

The case of the three girls from London who travelled to Syria to join ISIS might have been prevented had this model been developed earlier. One of the three girls was in contact on Twitter with another girl who was known to the authorities for joining ISIS. A big data solution has been developed that works on the POLE (Person, Object, Location and Event) data model for storing and recording suspicious entities and incidents. The people (entities) recorded in the system can be linked to other events or people any number of times to build a network of associations and keep track of suspicious people. This data can be retrieved and updated quickly in real time.
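To make the idea of a network of associations concrete, here is a minimal in-memory sketch of a POLE-style store in Python. It is an illustration under assumptions, not the system described above: the class `PoleStore`, its methods and the entity identifiers are all hypothetical, and a real deployment would sit on a database rather than a dictionary.

```python
from collections import defaultdict, deque

# The four entity kinds of the POLE model.
KINDS = {"person", "object", "location", "event"}


class PoleStore:
    """Hypothetical in-memory store of POLE entities and their links."""

    def __init__(self):
        self.kind = {}                 # entity id -> entity kind
        self.links = defaultdict(set)  # entity id -> linked entity ids

    def add(self, entity_id, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown POLE kind: {kind}")
        self.kind[entity_id] = kind

    def link(self, a, b):
        """Record an undirected association between two known entities."""
        if a not in self.kind or b not in self.kind:
            raise KeyError("both entities must be added before linking")
        self.links[a].add(b)
        self.links[b].add(a)

    def associates(self, start, max_hops=2):
        """Breadth-first search: entities within max_hops of start."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbour in self.links[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        del distance[start]
        return distance


# Invented entities: two people linked through an online contact event.
store = PoleStore()
store.add("person_a", "person")
store.add("person_b", "person")
store.add("contact_1", "event")
store.add("city_x", "location")
store.link("person_a", "contact_1")
store.link("contact_1", "person_b")
store.link("person_b", "city_x")

nearby = store.associates("person_a", max_hops=2)
print(nearby)
```

Because each link is stored once per endpoint, new associations can be added and queried quickly, which is the property the paragraph above describes.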

  4. Use of Machine Learning and Analytics to predict Online Fraud

RSA, the cyber security arm of the US big data company EMC, uses machine learning and advanced big data analytics to prevent online fraud. It has detected approximately 500,000 attacks in 8 years, half of which were identified in 2012 alone. RSA’s Israeli operation moved away from a rule-based fraud detection system in favour of a self-improving method that uses data science-led methodologies reinforced by Bayesian inference.

Every time an RSA client makes a transaction through online banking, 20 factors are stored in the Anti-Fraud Command Centre (AFCC) database. These 20 factors are then pooled into 150 fraud risk features, where each risk feature is a combination of 2 or more of the recorded factors. For instance, a combination of MAC address and IP address can predict fraud better than the IP address alone. The risk features are then grouped and fed to Bayesian predictors according to the patterns in which they indicate fraudulent activity.
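The way combined factors can feed a Bayesian predictor can be sketched as follows. This is a simplified naive Bayes illustration in Python, not RSA's actual method: the class `NaiveBayesFraudScorer`, the pairwise-only risk features, the add-one smoothing and the sample transactions are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations


def risk_features(factors):
    """Build risk features as pairs of recorded (factor, value) items."""
    return list(combinations(sorted(factors.items()), 2))


class NaiveBayesFraudScorer:
    """Hypothetical scorer that combines risk features with naive Bayes."""

    def __init__(self):
        self.counts = {True: defaultdict(int), False: defaultdict(int)}
        self.totals = {True: 0, False: 0}

    def train(self, factors, is_fraud):
        self.totals[is_fraud] += 1
        for feature in risk_features(factors):
            self.counts[is_fraud][feature] += 1

    def fraud_probability(self, factors):
        # Start from the smoothed prior odds of fraud.
        log_odds = math.log((self.totals[True] + 1) / (self.totals[False] + 1))
        for feature in risk_features(factors):
            # Add-one smoothing keeps unseen features from giving zero.
            p_fraud = (self.counts[True][feature] + 1) / (self.totals[True] + 2)
            p_legit = (self.counts[False][feature] + 1) / (self.totals[False] + 2)
            log_odds += math.log(p_fraud / p_legit)
        return 1 / (1 + math.exp(-log_odds))


# Invented history: same IP address throughout, but two different devices.
scorer = NaiveBayesFraudScorer()
for _ in range(3):
    scorer.train({"ip": "10.0.0.1", "mac": "aa:aa"}, is_fraud=False)
for _ in range(2):
    scorer.train({"ip": "10.0.0.1", "mac": "ff:ff"}, is_fraud=True)

familiar = scorer.fraud_probability({"ip": "10.0.0.1", "mac": "aa:aa"})
suspicious = scorer.fraud_probability({"ip": "10.0.0.1", "mac": "ff:ff"})
print(round(familiar, 2), round(suspicious, 2))
```

In this toy history the IP address alone cannot separate the two cases, since it is identical in every transaction, while the IP and MAC pair can. The scorer also improves as more labelled transactions are passed to `train`, which is the self-improving property that a fixed rule set lacks.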

Detica, the data intelligence arm of BAE Systems in the UK, also implements similar technology, using various data science techniques to identify advanced persistent threats that had previously gone unnoticed.

  5. University of Maryland develops algorithm to predict Lashkar-e-Taiba attacks

Lashkar-e-Taiba (LeT) is the terrorist group that operates between India and Pakistan and was responsible for the 2008 Mumbai attacks. The University of Maryland applied analytic technologies similar to the data mining algorithms Amazon uses to predict customer behaviour. Its computational analysis of Lashkar-e-Taiba mined data on 770 variables covering 20 years of the group’s activities.

Using the monthly data on these 770 variables, security agencies could identify the types of terror strikes that occur in various geopolitical situations, the factors responsible for the frequent occurrence of LeT attacks, how the group chooses its attack locations, and so on. This proprietary project, developed by the Laboratory for Computational Cultural Dynamics (LCCD) at the University of Maryland, along with another project, the Temporal Probabilistic Rule system, received funding of $600,000 from the defence department.
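As a rough illustration of how monthly variables can be turned into a probabilistic rule, the Python sketch below estimates how often an outcome follows a condition one month later. It is not the LCCD algorithm or the Temporal Probabilistic Rule system: the function `rule_probability` and both monthly series are invented for the example.

```python
def rule_probability(condition, outcome, lag=1):
    """Estimate P(outcome in month t + lag | condition holds in month t).

    Both arguments are equal-length lists of booleans, one per month.
    Returns None when the condition never holds in the usable months.
    """
    # Pair each month's condition with the outcome `lag` months later.
    followed = [o for c, o in zip(condition, outcome[lag:]) if c]
    if not followed:
        return None
    return sum(followed) / len(followed)


# Invented monthly indicators for one hypothetical variable and outcome.
condition = [True, False, True, True, False, True, False, False]
outcome = [False, True, False, True, False, False, True, False]

conditional = rule_probability(condition, outcome, lag=1)
baseline = sum(outcome) / len(outcome)
print(conditional, baseline)
```

A conditional rate well above the baseline rate suggests that the variable carries predictive signal for the following month, although with so few months any such estimate would be very uncertain.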

  6. Microsoft uses powerful Data Mining Systems to identify Security Threats

Researchers at Microsoft have developed a custom-built data mining system that sifts through approximately one million malicious files, 320 million early warning reports and 250 million threat reports sent by various organisations running Windows networks. The analysts at Microsoft categorize and prioritize the most prevalent threats. This information is then shared with antivirus partners such as McAfee and Symantec, which helps Microsoft analyse and combat cybercrime.

The major areas to focus on in countering terrorism are adopting advanced analytics and data science technologies for real-time analysis, sharing data responsibly, and using analytics to draw actionable insights from the huge amounts of data produced. Following these steps can help security agencies and other intelligence firms track fraud, cybercrime and terrorist activities both online and offline.

If you have come across any other interesting applications of big data and data science-led techniques in identifying cybercrime, fraudulent activities and terrorist attacks, please feel free to share these use cases with us in the comments below.