Say you have a thousand columns and a million rows in your data set. Whichever way you look at it (small, medium or big data), you won’t be able to actually look at it: you can’t zoom in or out, and you can’t fit it on one screen. Blame human nature, but most of us understand a subject better when we get to see the bigger picture. Is there a way to put your data in one image and navigate it almost like you would a map?
Deep Learning combined with Topological Data Analysis can do exactly that and more. Here are six of the craziest things this technology can do with your data:
1. It creates an image of your data within minutes where every dot is an item or a group of similar items
Based on the items’ correlation and learned patterns, the system places groups of similar items together. This results in a unique representation of your data that gives you better insight into it. Each node in the visualisation consists of one or many data points, while each link represents a high level of similarity between items.
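To make the nodes-and-links idea concrete, here is a minimal sketch in Python. It is not the algorithm behind the tool: it simply links items whose cosine similarity exceeds a threshold and merges linked items into groups. The toy data, the similarity measure and the 0.95 threshold are all assumptions chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_similarity_graph(items, threshold=0.95):
    """Return links (i, j) between items whose similarity is at least `threshold`."""
    links = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if cosine_similarity(items[i], items[j]) >= threshold:
                links.append((i, j))
    return links

def group_into_nodes(n_items, links):
    """Merge linked items into groups (connected components) using union-find."""
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical toy data: each row is an item, each column a parameter.
items = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]
links = build_similarity_graph(items)
nodes = group_into_nodes(len(items), links)
print(links)  # pairs of similar items
print(nodes)  # groups of item indices, one group per node
```

On real data the pairwise loop would be far too slow for a million rows; it is only meant to show what a node and a link stand for.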
2. It spots patterns in the data that would have been impossible to identify using traditional business intelligence
This is an example of how the algorithm identifies two distinct groups just by analysing users’ activities. A surprising characteristic distinguishes yellow and blue dots: females and males.
If we analyse by type of activity, one of the groups mostly sends messages (males), while the other mostly receives them (females).
3. It identifies the segments in your data on many levels
Segmentation is performed on many levels – from high-level categories to groups with similar data items.
In the example of a Netflix dataset, each data item is a movie. The highest-level groups are music, kids, foreign and adult movies. The middle level contains different segments, from Indian and Hong Kong movies to thrillers and horror. At the lowest level we’ve got a group of TV series such as “Jeeves and Wooster”, “The Office”, “Doctor Who” and others.
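One simple way to picture segmentation on several levels is hierarchical clustering: keep merging the closest groups and stop at different group counts to get coarser or finer segments. The sketch below uses plain single-linkage agglomerative clustering as a stand-in, which is an assumption and not the method the tool uses. The points are made-up feature vectors, not real movie data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering down to `n_clusters` groups."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical 2-D feature vectors for six items.
points = [
    [0.0, 0.0], [0.0, 1.0],
    [3.0, 0.0], [3.0, 1.0],
    [20.0, 0.0], [20.0, 1.0],
]
high_level = agglomerate(points, 2)  # coarse segments
mid_level = agglomerate(points, 3)   # finer segments inside them
print(high_level)
print(mid_level)
```

Every mid-level group sits inside exactly one high-level group, which is what lets you drill down from broad categories to small groups of similar items.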
4. It analyses any data: texts, images, sensors’ data and even sound
Any data can be segmented and understood if it can be presented as a matrix of numbers, where every row is a data item and every column is a parameter. These are the most common use cases:
5. It learns more complex dependencies if you guide it
Select a few items, group them, and the algorithm will find all related or similar items. Repeat this process a few times and the neural network will learn the difference between, for example, texts about Mac hardware, PC hardware and general electronics.
Initial analysis of 20,000 articles on 20 different topics resulted in a dense cloud of points (left image). After applying Deep Learning a few times, the algorithm grouped them with an error rate of just 1.2% (right image).
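The "group a few items and let the algorithm find the rest" loop can be sketched without a neural network at all. The example below swaps in a much simpler technique, a nearest-centroid classifier, purely to show the idea of spreading a handful of hand-made labels to every other item. The word-count features and the `propagate_labels` helper are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def propagate_labels(items, seed_labels):
    """Give every item the label of the nearest seed-group centroid.

    `seed_labels` maps item index -> label for the few items grouped by hand.
    """
    by_label = {}
    for index, label in seed_labels.items():
        by_label.setdefault(label, []).append(items[index])
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    return [
        min(centroids, key=lambda label: distance(item, centroids[label]))
        for item in items
    ]

# Hypothetical features per article: [count of "mac", count of "pc"].
items = [[5, 0], [4, 1], [0, 6], [1, 5], [6, 1], [0, 4]]
seed_labels = {0: "mac", 2: "pc"}  # two articles labelled by hand
labels = propagate_labels(items, seed_labels)
print(labels)
```

In an interactive setting you would correct a few of the predicted labels, add them to `seed_labels`, and run the step again, which is the "repeat this process a few times" part.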
6. It learns even without supervision
Deep Learning and Autoencoders are loosely inspired by how the human brain works and can automatically identify high-level patterns in a dataset. For example, in the Google Brain project, Autoencoders taught themselves to recognise human and cat faces from 10 million digital images taken from YouTube videos.
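For a feel of what an autoencoder does, here is a toy version in plain Python. It bears no resemblance in scale to the Google Brain network: it is a linear autoencoder with a single hidden unit, trained by full-batch gradient descent to squeeze 2-D points into one number and reconstruct them. The data, the starting weights, the learning rate and the step count are assumptions picked so the example stays small.

```python
def encode(w, x):
    """Compress a 2-D input into a single hidden value."""
    return sum(wj * xj for wj, xj in zip(w, x))

def decode(v, h):
    """Reconstruct a 2-D output from the hidden value."""
    return [vk * h for vk in v]

def mean_loss(w, v, data):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        r = decode(v, encode(w, x))
        total += sum((rk - xk) ** 2 for rk, xk in zip(r, x))
    return total / len(data)

def train(data, steps=500, lr=0.05):
    """Full-batch gradient descent on the reconstruction error. No labels are used."""
    w = [0.1, 0.1]  # encoder weights (fixed starting point for reproducibility)
    v = [0.1, 0.1]  # decoder weights
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            h = encode(w, x)
            err = [rk - xk for rk, xk in zip(decode(v, h), x)]
            grad_h = sum(2 * ek * vk for ek, vk in zip(err, v))
            for k in range(2):
                grad_v[k] += 2 * err[k] * h / n
                grad_w[k] += grad_h * x[k] / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        v = [vk - lr * g for vk, g in zip(v, grad_v)]
    return w, v

# Hypothetical data: points on the line y = 2x, so one number describes each point.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_loss = mean_loss([0.1, 0.1], [0.1, 0.1], data)
w, v = train(data)
final_loss = mean_loss(w, v, data)
print(initial_loss, final_loss)
```

The network is never told that the points lie on a line. It discovers that pattern only because it is forced to reconstruct each point from a single hidden value.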
I’ve been playing around with topological data analysis and deep learning lately and developed a tool that brings these technologies into one user-friendly interface to help people see their data and the new possibilities it offers. Have a look at the website and let me know if you’d like to create a map of your data.