H2O World 2015 – Day 3 Highlights
Highlights from talks delivered by machine learning experts from Fast Forward Labs, H2O.ai, Kaiser and Macy's at H2O World, held in Mountain View.
He listed common problems in clinical prediction:
- Too many possible useful parameters
- Easy to overfit the data
- Relationships are often not linear
- Missing data is almost never missing at random
- "AI" is often "augmented", not "artificial", intelligence
He demonstrated that all of the above characteristics are present in the Titanic dataset, which is used to predict which passengers lived and which died.
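The point about missing data can be checked with a quick comparison of outcome rates between rows where a field is present and rows where it is missing. The following is a minimal sketch, not the speaker's method: the `records` list is invented toy data (it is not drawn from the Titanic dataset or any clinical source), and the function name `outcome_rate_by_missingness` is hypothetical.

```python
# Hypothetical toy records of (measurement, outcome); None marks a missing value.
# These rows are invented for illustration and come from no real dataset.
records = [
    (34.0, 1), (None, 0), (52.0, 1), (None, 0),
    (41.0, 0), (None, 1), (29.0, 1), (None, 0),
]


def outcome_rate_by_missingness(rows):
    """Return (outcome rate when the value is present, rate when it is missing)."""
    present = [y for x, y in rows if x is not None]
    missing = [y for x, y in rows if x is None]

    def rate(ys):
        # Guard against an empty group so the function never divides by zero.
        return sum(ys) / len(ys) if ys else float("nan")

    return rate(present), rate(missing)


present_rate, missing_rate = outcome_rate_by_missingness(records)
print(f"outcome rate, value present: {present_rate:.2f}")
print(f"outcome rate, value missing: {missing_rate:.2f}")
```

A large gap between the two rates suggests that missingness itself carries information, so a missing-value indicator may be worth keeping as a feature rather than imputing the field silently.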
Daqing Zhao, Director of Advanced Analytics at Macy's, talked about advanced analytics at the company. He stated that big data is not about data; it is about big analysis and solutions. Division of expertise is inevitable. Modeling in the big data era faces the following challenges:
- Modeling needs to scale
- Timeliness of models
- Integration takes time: make sure the right data is collected, and think about model metrics
- Testing and experimentation: theory may or may not be right; let the experiment decide
To address these challenges, he proposed the following solutions:
- Big data warehouse solutions
- Separation of concerns: solution complexity, data complexity, variability of requirements, standard data mining algorithms, etc.
- Scalable modeling tools: out-of-sample testing, cross-validation, automated model optimization tools
- Best practices in modeling: understand how data is collected, and what data can and cannot be collected; good ideas are not necessarily complicated
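Out-of-sample testing with cross-validation, mentioned in the list above, can be sketched in a few lines. This is an illustrative example in plain Python, not the tooling used at Macy's: the helper names (`k_fold_indices`, `fit_line`, `cross_val_mse`), the one-feature linear model, and the synthetic data are all assumptions made for the sketch.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def cross_val_mse(xs, ys, k=5, seed=0):
    """Return the held-out mean squared error for each of the k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training rows, then score on rows the model never saw.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a * xs[i] + b)) ** 2 for i in held_out) / len(held_out)
        scores.append(mse)
    return scores


# Synthetic data for illustration: y = 2x + 1 plus a little Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

scores = cross_val_mse(xs, ys, k=5)
print("per-fold MSE:", [round(s, 4) for s in scores])
print("mean MSE:", round(sum(scores) / len(scores), 4))
```

Because every row is scored only by a model that was fit without it, the per-fold errors estimate how the model behaves on unseen data, which is the property that automated model optimization tools rely on when comparing candidates.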
After detailing some of the projects at Macy's, he concluded that data science is not about data but about domain solutions, and that data is not clean until it has been thoroughly analyzed.