KDnuggets : News : 2002 : n24 : item3

Features

From: Gregory Piatetsky-Shapiro
Date: 10 Dec 2002
Subject: Can Total Information Awareness work or how can you separate bad coins from good ones?

I have seen many discussions of Total Information Awareness (TIA) arguing that it cannot possibly work: there are very many Americans and very few terrorists, so any data mining will generate too many false positives.

We can have a serious debate about how much privacy we are willing to give up in exchange for reducing the threat of terror. However, the debate should also allow for the possibility that TIA can work and identify at least some bad guys before they succeed.

The first key point is that, with enough history and repetition, suspicious patterns will stand out despite noise and errors.

The second key point is that the system does not need to be perfect to be useful. Law enforcement officers are usually glad to get even one good lead out of a hundred tips. For example, in the recent Washington sniper case, there were over 20,000 tips and only 7 good leads.

Here is a very simple example that shows the power of having enough data.

Imagine that you have a thousand coins and one of them is crooked (i.e., its probability of heads is not 1/2 but 1/4). How can you tell the crooked one apart by observing how the coins behave?

If you flip each coin once, you cannot determine which one is crooked.

If you flip each coin thirty times and separate the group with 11 or fewer heads, you are likely to get about 100 coins, which will usually include the crooked coin along with about 99 "false positives" (honest coins).

If you flip each coin three hundred times and separate the group with 110 or fewer heads, then with probability over 0.99 this group will contain the crooked coin and no other coins.
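The numbers for the thousand-coin example can be checked with a short calculation. The sketch below computes exact binomial tail probabilities with integer arithmetic, assuming 999 honest coins with P(heads) = 1/2 and one crooked coin with P(heads) = 1/4, as in the example. The function names `binom_cdf` and `screen` are illustrative, not taken from any library.

```python
from fractions import Fraction
from math import comb

HONEST = Fraction(1, 2)   # P(heads) for an honest coin
CROOKED = Fraction(1, 4)  # P(heads) for the crooked coin in the example


def binom_cdf(k: int, n: int, p: Fraction) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p), using integer arithmetic."""
    num, den = p.numerator, p.denominator
    # P(X = i) = C(n, i) * num^i * (den - num)^(n - i) / den^n
    total = sum(comb(n, i) * num**i * (den - num) ** (n - i) for i in range(k + 1))
    return total / den**n


def screen(n_flips: int, max_heads: int, n_honest: int = 999) -> tuple[float, float, float]:
    """Flag every coin showing at most `max_heads` heads in `n_flips` flips.

    Returns (P(crooked coin is flagged),
             expected number of honest coins flagged,
             P(no honest coin is flagged)).
    """
    p_hit = binom_cdf(max_heads, n_flips, CROOKED)
    p_false = binom_cdf(max_heads, n_flips, HONEST)
    return p_hit, n_honest * p_false, (1 - p_false) ** n_honest


for n_flips, max_heads in [(30, 11), (300, 110)]:
    hit, expected_fp, clean = screen(n_flips, max_heads)
    print(f"{n_flips} flips, <= {max_heads} heads: "
          f"P(crooked flagged) = {hit:.3f}, "
          f"expected honest flagged = {expected_fp:.1f}, "
          f"P(no honest flagged) = {clean:.3f}")
```

With 30 flips the expected number of flagged honest coins comes out at about 100, and the crooked coin itself lands in the flagged group only about 95% of the time. With 300 flips the chance that any honest coin is flagged is well under 1%.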

Likewise, suppose you have 300,000,000 honest coins and 19 bad coins.

If you toss each coin 30 times, you cannot identify the bad coins.

If you toss each coin 300 times, you can separate a group of about 2,000 coins that will include all the bad coins with high probability.

With 600 tosses, you will identify a group that is likely to include only the bad coins.
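The same calculation scales to the larger population. This is a minimal sketch, assuming the "at most t heads" rule from the thousand-coin example and bad coins with P(heads) = 1/4. For each number of tosses it picks the largest cut-off t whose expected number of flagged honest coins stays within a chosen budget, then reports the probability that all 19 bad coins fall inside the flagged group. The budgets (2,000 coins for 30 and 300 tosses, 0.01 coins for 600 tosses) and the names `cdf_table` and `screen_population` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

HONEST = Fraction(1, 2)    # P(heads) for an honest coin
CROOKED = Fraction(1, 4)   # P(heads) for a bad coin (assumed, as in the 1000-coin case)
N_HONEST = 300_000_000
N_CROOKED = 19


def cdf_table(n: int, p: Fraction) -> list[float]:
    """cdf[k] = P(X <= k) for X ~ Binomial(n, p), accumulated with exact integers."""
    num, den = p.numerator, p.denominator
    scale = den**n
    cdf, running = [], 0
    for i in range(n + 1):
        running += comb(n, i) * num**i * (den - num) ** (n - i)
        cdf.append(running / scale)
    return cdf


def screen_population(n_flips: int, fp_budget: float) -> tuple[int, float, float]:
    """Choose the largest cut-off t ("flag coins with at most t heads") whose
    expected number of flagged honest coins is at most fp_budget.

    Returns (t, expected honest coins flagged, P(all bad coins are flagged)).
    t == -1 means even flagging only zero-heads coins exceeds the budget.
    """
    honest = cdf_table(n_flips, HONEST)
    crooked = cdf_table(n_flips, CROOKED)
    t = -1
    while t < n_flips and N_HONEST * honest[t + 1] <= fp_budget:
        t += 1
    if t < 0:
        return t, 0.0, 0.0
    return t, N_HONEST * honest[t], crooked[t] ** N_CROOKED


for n_flips, budget in [(30, 2000), (300, 2000), (600, 0.01)]:
    t, fp, all_caught = screen_population(n_flips, budget)
    print(f"{n_flips} tosses: flag <= {t} heads, "
          f"~{fp:.3g} honest coins flagged, "
          f"P(all {N_CROOKED} bad coins flagged) = {all_caught:.4f}")
```

With 30 tosses the budget forces a cut-off so low (at most 3 heads) that each bad coin has under a 4% chance of falling below it. With 300 tosses all 19 bad coins are flagged with probability above 0.999. With 600 tosses the expected number of honest coins in the group is below 0.01, so the group very likely contains only bad coins.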

How many times will you need to flip each coin to find at least one crooked coin? One can derive a formula that depends on the level of certainty you want, the number of crooked coins, and how crooked each coin is. If a bad coin always lands on its edge, you will identify it after a single toss.
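As one possible formula (not necessarily the one the author had in mind), Hoeffding's inequality gives a conservative answer. Flag a coin when its fraction of heads is at most the midpoint (p_honest + p_crooked) / 2, and let delta = (p_honest - p_crooked) / 2. Each coin is then misclassified with probability at most exp(-2 n delta^2), so a union bound over all N coins shows that n >= ln(N / eps) / (2 delta^2) tosses suffice to separate every crooked coin from every honest one with probability at least 1 - eps. The function name `tosses_needed` is hypothetical.

```python
import math


def tosses_needed(n_honest: int, n_crooked: int,
                  p_honest: float, p_crooked: float, eps: float) -> int:
    """Tosses per coin that suffice (by Hoeffding + union bound) for the midpoint
    rule to flag every crooked coin and no honest coin with probability >= 1 - eps.

    Assumes crooked coins come up heads less often than honest ones.
    """
    if not 0 <= p_crooked < p_honest <= 1:
        raise ValueError("expected 0 <= p_crooked < p_honest <= 1")
    if not 0 < eps < 1:
        raise ValueError("eps must be in (0, 1)")
    delta = (p_honest - p_crooked) / 2  # distance from each mean to the cut-off
    n_coins = n_honest + n_crooked
    # Per coin: P(misclassified) <= exp(-2 * n * delta**2)  (Hoeffding).
    # Union bound: n_coins * exp(-2 * n * delta**2) <= eps, solved for n.
    return math.ceil(math.log(n_coins / eps) / (2 * delta**2))


print(tosses_needed(999, 1, 0.5, 0.25, 0.01))            # prints 369
print(tosses_needed(300_000_000, 19, 0.5, 0.25, 0.01))   # prints 772
```

Both results exceed the 300 and 600 tosses quoted above, because Hoeffding's bound is loose compared with exact binomial tails. Its value is in showing the shape of the dependence: logarithmic in the number of coins and in 1/eps, but inverse-square in how crooked the coins are, so halving the bias gap roughly quadruples the required tosses.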

Applying this to terrorists: if there is only one terrorist who does one slightly suspicious thing, no system can find that person.

However, if there are several terrorists who do a number of suspicious things, data analysis may be able to find them. Is it feasible? That is what TIA aims to test.

Of course, the real system will use much more complex reasoning than what I presented above.



Copyright © 2002 KDnuggets.