- The Prefect Way to Automate & Orchestrate Data Pipelines - Sep 13, 2021.
I am migrating all my ETL work from Airflow to this super-cool framework.
- How to build a DAG Factory on Airflow - Mar 19, 2021.
A guide to building efficient DAGs with half of the code.
- 6 Web Scraping Tools That Make Collecting Data A Breeze - Feb 25, 2021.
The first step of any data science project is data collection. While it can be the most tedious and time-consuming step during your workflow, there will be no project without that data. If you are scraping information from the web, then several great tools exist that can save you a lot of time, money, and effort.
- A Layman’s Guide to Data Science. Part 3: Data Science Workflow - Jul 6, 2020.
Learn and appreciate the typical workflow for a data science project, including data preparation (extraction, cleaning, and understanding), analysis (modeling), reflection (finding new paths), and communication of the results to others.
- Managing Machine Learning Cycles: Five Learnings from comparing Data Science Experimentation/ Collaboration Tools - Jan 29, 2020.
Machine learning projects require handling different versions of data, source code, hyperparameters, and environment configuration. Numerous tools are on the market for managing this variety, and this review features important lessons learned from an ongoing evaluation of the current landscape.
- Data Pipelines, Luigi, Airflow: Everything you need to know - Mar 27, 2019.
This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.
- How A Data Scientist Can Improve Productivity - May 25, 2017.
Data science projects involve iterative processes and may need changes in data at every iteration. Data versioning, data pipelines, and data workflows make a data scientist’s life easier. Let’s see how.
- Dataiku: The Complete Data Sheet - Apr 20, 2017.
Whether your everyday tool is Scala, Python, R, or Excel, you can now use one tool - Dataiku - to transform raw data into predictions without the hassle. Discover the platform!
- Grunion, Query Optimization Tool for Data Science and Big Data - Mar 14, 2017.
Grunion is a patent-pending query optimization, translation, and federation framework built to help bridge the gap between data science and data engineering teams. Read more to request access.
- Analyzing and Visualizing Flows in Rivers and Lakes with MATLAB - Jul 20, 2015.
ADCPs and VMT have increased the pace of studies that rely on flow data. Find out how these toolkits from MathWorks are revolutionizing the analysis and visualization processes.
- Data Workflows for Machine Learning - Apr 20, 2014.
Paco Nathan compares several open source frameworks for Machine Learning workflows, including KNIME, IPython Notebook and related libraries, Cascading, Cascalog, and Spark/MLbase, and proposes 9 criteria to evaluate the best alternatives.