KDnuggets Home » News » 2011 » Sep » Software » Hilary Mason Wants To Get You Started With Big Data  ( < Prev | 11:n23 | Next > )

Hilary Mason Wants To Get You Started With Big Data

First, set up a proper environment: Linux, Python, and other tools you can find on her GitHub page.


ReadWriteHack, By David Strom / September 20, 2011

I spent part of this week with Hilary Mason, one of the smartest people I know in Big Data. She is Chief Scientist at Bit.ly and has a wealth of skills at her fingertips that bridge computer science and mathematics. Plus, she is used to facing largely male audiences and being the smartest person in the room. She was speaking at the Strange Loop conference in St. Louis this week, which should definitely be on your radar for next year if you are interested in this topic or want to broaden your programming skills.

In a series of workshops, Mason outlined the tools you need to get started manipulating Big Data and understanding the basics of machine learning, something she does every day as she sifts through the shortened URLs we all furiously create.

The first step is setting up a proper environment. For Mason, that is a Linux machine with a variety of tools, which you can find on her GitHub page. She is a Python programmer, and her toolkit reflects that: Python along with the JSONView Chrome extension, NLTK, NumPy, Pycluster, hcluster, and matplotlib. You can use most of these tools on other operating systems too.
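Mason's actual setup lives on her GitHub page and is not reproduced here. As a minimal sketch, the script below only reports which of the Python libraries named above can be imported in the current interpreter. The import names (`nltk`, `numpy`, `Pycluster`, `hcluster`, `matplotlib`) are assumptions about how each package is installed; Pycluster and hcluster are older packages that may not install on a modern Python. JSONView is a browser extension, so it is left out.

```python
import importlib.util

# Import names for the Python tools mentioned in the article (an assumption;
# adjust to match your own installation).
TOOLS = ["nltk", "numpy", "Pycluster", "hcluster", "matplotlib"]


def check_environment(modules=TOOLS):
    """Return a dict mapping each top-level module name to True if importable."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```

Checking with `find_spec` rather than importing each package keeps the check fast and avoids side effects from heavy imports such as plotting backends.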

Second, you need to obtain a few test data sets that you can start to manipulate. Even if you aren't drinking out of the Bit.ly data fire hose, there are ways to get access to lots of great data around the Internet. Mason mentioned a few places, including:

Third, you need to start thinking about how to make your data sets smaller. ...
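The excerpt does not say how Mason shrinks her data. One common technique, shown here purely as an illustration and not as her method, is reservoir sampling (Algorithm R): it draws a uniform random sample of `k` records from a stream of unknown length while holding only `k` records in memory.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of any length.

    Only k items are kept in memory, so the stream can be larger than RAM.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a million without materializing the whole range.
    print(reservoir_sample(range(1_000_000), 5, seed=42))
```

Because any iterable works as `stream`, an open file object can be passed directly to sample lines from a large log; if the stream has fewer than `k` items, all of them are returned.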

Then comes the fun part: exploring your data. ...
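The excerpt stops before describing Mason's exploration techniques. A reasonable first pass, sketched here as an assumption rather than her workflow, is to summarize one field at a time: how many values there are, how many are distinct, which occur most often, and the mean and median when the field is numeric. The record layout (a list of dicts) and the field names in the example are hypothetical, and values must be hashable for the counting to work.

```python
from collections import Counter
import statistics


def summarize(records, field, top=3):
    """Quick first look at one field across a list of dict records.

    Records missing the field are skipped. Values must be hashable.
    """
    values = [r[field] for r in records if field in r]
    summary = {
        "count": len(values),
        "distinct": len(set(values)),
        "most_common": Counter(values).most_common(top),
    }
    # Add numeric statistics only for int/float values (bools excluded).
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        summary["mean"] = statistics.mean(numbers)
        summary["median"] = statistics.median(numbers)
    return summary


if __name__ == "__main__":
    # Hypothetical click records; the field names are illustrative only.
    clicks = [
        {"domain": "example.com", "clicks": 12},
        {"domain": "example.org", "clicks": 3},
        {"domain": "example.com", "clicks": 7},
    ]
    print(summarize(clicks, "domain"))
    print(summarize(clicks, "clicks"))
```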


