- Managing Machine Learning Cycles: Five Learnings from comparing Data Science Experimentation/ Collaboration Tools - Jan 29, 2020.
Machine learning projects require handling different versions of data, source code, hyperparameters, and environment configuration. Numerous tools are on the market for managing this variety, and this review features important lessons learned from an ongoing evaluation of the current landscape.
- Reproducibility, Replicability, and Data Science - Nov 19, 2019.
As cornerstones of scientific processes, reproducibility and replicability ensure results can be verified and trusted. These two concepts are also crucial in data science, and as a data scientist, you must follow the same rigor and standards in your projects.
- Datmo: the Open Source tool for tracking and reproducible Machine Learning experiments - Sep 26, 2018.
For data scientists, managing environments and experiments is always hard, and the troubleshooting and lost work waste time and effort. With datmo, you can track your experiments using a common standard and not worry about reproducing previous work.
- Data Version Control: iterative machine learning - May 11, 2017.
ML modeling is an iterative process, and it is extremely important to keep track of all the steps and dependencies between code and data. A new open-source tool helps you do that.
- Analytically Speaking Featuring Melisa Buie – On Demand - Apr 6, 2017.
Learn how to keep your audience from struggling to understand your work, why others should review your experimentation process, how to build your experimental muscle, and more.
- What is Academic Torrents and Where is Data Sharing Going? - Oct 26, 2016.
Learn more about Academic Torrents, a platform for researchers to share data. It consists of a site where users can search for datasets and a BitTorrent backbone that makes sharing data scalable and fast.
- Ten Simple Rules for Effective Statistical Practice: An Overview - Jun 23, 2016.
An overview of 10 simple rules to follow to ensure proper and effective statistical data analysis.
- We need a statistically rigorous and scientifically meaningful definition of replication - Oct 29, 2015.
Replication and confirmation are indispensable concepts that help define scientific facts. It seems that before continuing the debate over replication, we need a statistically meaningful definition of replication.
- Data Mining Process/Workflow Reproducibility and KNIME - May 1, 2015.
What happens to analytics and data mining workflows when different components change? KNIME's approach of keeping the old versions as part of the platform guarantees reproducibility.
- Top KDnuggets tweets, Mar 19-22: Tensor methods for Machine Learning; Tibco survey: Big Data top use cases - Mar 23, 2015.
Tensor methods for #MachineLearning: fast, accurate, scalable, need open-source libs; #DataScience and Reproducibility: Explaining when the experiment does not work; Google #DeepLearning FaceNet is the best ever for recognizing faces; Tibco survey #BigData top use cases: Customer & Experience Analytics, Risk/Threat.
- The Elements of Data Analytic Style – checklist - Mar 4, 2015.
Jeff Leek's book "Elements of Data Analytic Style" had a rocket launch, thanks to the author's course on Coursera. The book includes a useful checklist that can guide beginning data analysts or serve as a basis for evaluating data analyses.