Boston AnalyticsWeek Panel Highlights: Next Big Thing in Big Data
Tags: AnalyticsWeek, Automating, Big Data, Boston-MA, Gregory Piatetsky, Next Big Thing, Panel, Paul Sonderegger, Privacy
Boston AnalyticsWeek opened with a vigorous panel discussion debating the next "Big Thing" in #BigData, whether data scientists can be replaced by an algorithm, and whether privacy is a big obstacle to Big Data.
On March 24, I moderated a panel at Boston's first AnalyticsWeek: Big Data And Analytics Unconference.
The week-long event, ably organized by Cognizeus CEO Vishal Kumar, @VishalTx, was held over five evenings, each focusing on a different topic.
The first day started with a keynote address on Big Data Analytics by Oracle's Paul Sonderegger, @PaulSonderegger.
Watch the keynote at http://pxl.me/awuak
After a delicious pizza and networking break, there was an expert panel with:
- Gregory Piatetsky (KDnuggets), moderator
- David Jegen (Devonshire Investors)
- Anand S. Rao (PWC)
- Chip Hazard (Flybridge Capital Partners)
- Paul Markowitz (Bain & Company)
After the panelists introduced their experience with Big Data and Analytics, I asked them three questions. Here are selected answers, captured thanks to the excellent Twitter stream of @AnalyticsWeek.
Q1. Has Big Data reached its Hype Peak, as projected by Gartner? And what is the next "Big Thing" in Big Data?
- PaulM: We have not reached a peak yet. The next step is to tease out signal from noise.
- ChipH: The next thing is using today's tools and techniques to do real-time analytics, so that we can act quickly.
- ChipH: The next thing is how to be quicker at taking action on data.
- AnandR: The term 'big data' keeps getting bigger and bigger; e.g., it went from 3 Vs (volume, velocity, variety) to 4 Vs (adding veracity).
- AnandR: The next thing: what decisions can you make, and how can you make them better using data? (Data size is immaterial.)
- AnandR: A change in mindset: data informs the model, and the model informs the data.
- DavidJ: We have been in the infrastructure and tools phase. Tableau and others still have low penetration.
Q2. We have all seen the McKinsey numbers about the projected shortage of data scientists. Some companies want to alleviate this shortage by creating automated analytics solutions intended for non-technical people, "a data scientist in a box". Will data scientists be replaced by automated algorithms?
- DavidJ: Humans do what humans do really well. Machines do what they do really well. The two will coexist.
- DavidJ: PayPal had problems and put machines on them, realized that humans were still needed, and now the two coexist.
- DavidJ: Machines can curate data. The question is when to bring in a human to curate and to train the machine. Humans are artists.
- AnandR: In AI, we have proven that it is really hard to replace a human brain.
- ChipH: The data preprocessing part will be handled by machines. The key is to ask the right question...
- ChipH: ... the right question is really hard to find and that is where we need a human.
- PaulM: Data scientists will be doing less grunt work (machines handle that); instead, they will be doing the fun part.
- @KDnuggets: Kaggle is looking to build vertical solutions, starting with energy.
Q3. Big Data and Privacy are fundamentally opposing forces. In today's increasingly digital world, it is difficult to remain anonymous. Facebook recently reported a face recognition algorithm with 97% accuracy. Anonymized data from the Netflix Prize was matched by clever researchers to IMDB data, and the resulting de-anonymization of a few people was enough to cancel Netflix Prize 2. Is losing privacy a big obstacle to Big Data, or will people get used to having no privacy?
- DavidJ: People are willing to give up privacy: credit cards, Facebook, etc. The key is to prevent illegal use of data.
- DavidJ: That is tough to enforce. Security and privacy are similar: security is trusting someone; privacy is trusting no one.
- AnandR: Privacy and security are closely linked. Also, who owns the data, including your data about yourself?
- AnandR: It will probably be a few years before we figure out legally who owns the data.
- AnandR: As we humans get more comfortable with sharing, we will share more online.
- ChipH: Privacy at SXSW for consumers is basically non-existent.
- ChipH: How many people actually use Tor? Probably not many; privacy is more of a red herring.
- PaulM: Brands are not so interested in learning about you. Instead, they want to learn about people like you.
- PaulM: We give our data to companies, and the companies give better products back to us.
- ChipH: Nest has opened up its API, which is very cool. We like that.
- AnandR: We benefit from giving up some privacy, e.g., Amazon and Netflix recommendations.
There were also questions from the audience, especially on data science education.
- AnandR: Data science education needs to move toward 'domain knowledge', e.g., healthcare or financial services.
- AnandR: We need better data visualization education.
- DavidJ: Data input, preprocessing, analytics, and storytelling (visualization) are different roles.
- AnandR: Analytics is moving toward getting bite-sized chunks when you need them, instead of all the data at one time.
- AnandR: Domain knowledge is chunked into bite-sized pieces; this is a really good thought.
- ChipH: Nate Silver is not the best data scientist, but he is probably the best communicator among data scientists.
Also check the many links to data science courses, certificates, and education at www.kdnuggets.com/education/
Question from the audience: What is the impact of open source on analytics?
- ChipH: Hadoop, R, etc. are having a huge impact on big data. Open source is hugely economical.
What is the division of roles between the data scientist and the business analyst?
- AnandR: A perfect data scientist is impossible to find. You can perhaps find 2 or 3 roles in one person.
- AnandR: Within a small group, you can cover all data science roles, but one person cannot be an expert in all of them.
You can watch the panel discussion here.
Perhaps 150-200 people came to the first evening, giving AnalyticsWeek an excellent start.