Real scientists make their own data –
what about data scientists? Twitter conversation

Twitter conversation sparked by my tweet - Real scientists make their own data - but does it apply to data scientists?

This tweet

Real scientists make their own data - but does it apply to data scientists?

about Sean J. Taylor's post "Real scientists make their own data"

sparked a lively Twitter conversation involving quite a few data scientists. I think most felt that data scientists both use existing data and create their own, so does that make data scientists half-real scientists?

26 Jan missphenom @missphenom
@kdnuggets That's the difference between experimental vs. observational. Pure science vs. social science.

26 Jan Peter Skomoroch @peteskomoroch
@kdnuggets If you work for a software or internet co. you can build new product features that collect data & measure behavior - so yes.

26 Jan Hilary Mason @hmason
@peteskomoroch @kdnuggets Agreed! But data scientists at such companies generally already *have* data.

26 Jan Peter Skomoroch @peteskomoroch
@hmason @kdnuggets I think ideally it is a combination, re-mining the same existing data often doesn't offer the same "alpha"

26 Jan Peter Skomoroch @peteskomoroch
@hmason @kdnuggets I think building something that allows you to make new measurements is part of the job

26 Jan Hilary Mason @hmason
@peteskomoroch @kdnuggets Yup. I should have said that if you are at such a company, you are already doing this to gather data.

26 Jan Steven H. Noble @snoble
@peteskomoroch @hmason @kdnuggets proper tracking, data collection, and enforcement of experiments is a big part of my job

26 Jan Sean J. Taylor @seanjtaylor
@peteskomoroch @kdnuggets Pete is exactly right. I think we often settle for what's available instead of adding measurement where it matters

26 Jan Ferenc Huszar @fhuszar
@hmason @peteskomoroch @kdnuggets using better models on smaller but cleverly collected data can often beat big data + naive models

26 Jan Peter Skomoroch @peteskomoroch
.@seanjtaylor @kdnuggets measurement goes beyond instrumentation - asking users direct questions, tagging, shares all generate novel data

26 Jan Peter Skomoroch @peteskomoroch
.@fhuszar @hmason @kdnuggets Months can be wasted poking at insufficent datasets instead of asking 'what new measurement would help?'

26 Jan Max Shron @mshron
@peteskomoroch @fhuszar @hmason @kdnuggets for sure. But if clever proxies get you 80%... and who spends months on one problem anyway?

Curtis Pokrant @curtispokrant
@peteskomoroch: .@fhuszar @hmason @kdnuggets Closing the loop btwn transactions and analysis important. Requires flexible IT architecture.

Beto Borbolla @xcoatl
@peteskomoroch @hmason @kdnuggets create new products based on the data u need to enrich your existing products: "data driven product design

eloy sasot @eloysasot
Science = build knowledge. Shouldn't questions + available data dictate the need of more data or not? @hmason @peteskomoroch @kdnuggets