- Development & Testing of ETL Pipelines for AWS Locally - Aug 2, 2021.
Typically, ETL pipelines are developed and tested on real environments or clusters, which are time-consuming to set up and require maintenance. This article focuses on developing and testing ETL pipelines locally with the help of Docker and LocalStack. The solution gives the flexibility to test in a local environment without setting up any services in the cloud.
- dbt for Data Transformation – Hands-on Tutorial - Jul 28, 2021.
The data build tool (dbt) is gaining in popularity and use, and this hands-on tutorial covers creating complex models, using variables and functions, running tests, generating docs, and many more features.
- How to pitch to VCs, explained: The Deck We Used to Raise Capital For Our Open-Source ELT Platform - May 21, 2021.
Winning seed funding from venture capitalists is a daunting task, and the pitch is key. Learn how one effective slide deck resulted in a successful early funding round for an open-source start-up, Airbyte.
- KDnuggets™ News 21:n15, Apr 21: The Most In-Demand Skills for Data Scientists in 2021; How to organize your data science project - Apr 21, 2021.
The Most In-Demand Skills for Data Scientists in 2021; How to organize your data science project; You may have heard about Simpson's paradox, but do you know the other 2? Read Top 3 Statistical Paradoxes in Data Science; ETL in the Cloud; Data Profession Job Satisfaction: Beware Of The Drop; and more.
- ETL in the Cloud: Transforming Big Data Analytics with Data Warehouse Automation - Apr 15, 2021.
Today, organizations are increasingly implementing cloud ETL tools to handle large data sets. With data sets becoming larger by the day, unified ETL tools have become crucial for the data integration needs of enterprises.
- What’s ETL? - Apr 2, 2021.
Discover what ETL is, and see in what ways it’s critical for data science.
- Introducing dbt, the ETL and ELT Disrupter - Mar 17, 2021.
Moving and processing data is happening 24/7/365 world-wide at massive scales that only get larger by the hour. Tools exist to introduce efficiencies in how data can be extracted from sources, transformed through calculations, and loaded into target data repositories. However, on their own, these tools can introduce some restrictions in the processing, especially for the needs of data analytics and data science.
- The Best Tool for Data Blending is KNIME - Jan 13, 2021.
These are the lessons and best practices I learned in many years of experience in data blending, and the software that became my most important tool in my day-to-day work.
- KDnuggets™ News 20:n46, Dec 9: Why the Future of ETL Is Not ELT, But EL(T); Introduction to Data Engineering - Dec 9, 2020.
Learn why the future of ETL is not ELT, but EL(T), and what that means; Read a great intro to Data Engineering; Get expert opinions on the main developments in 2020 and key trends in 2021 in AI, Data Science, Machine Learning; NoSQL for Beginners; and more.
- Why the Future of ETL Is Not ELT, But EL(T) - Dec 4, 2020.
The well-established technologies and tools around ETL (Extract, Transform, Load) are undergoing a potential paradigm shift with new approaches to data storage and expanding cloud-based compute. Decoupling the EL from T could reconcile analytics and operational data management use cases, in a new landscape where data warehouses and data lakes are merging.
- Find Your Perfect Fit: A Quick Guide for Job Roles in the Data World - Apr 23, 2020.
Data-related positions have been considered the hottest in the job market over the last couple of years. While everyone wants to join the party and enter this fascinating field, it is essential to first understand what the different roles involve. In this quick guide, I’ll do my best to dispel the confusion by crystallizing the essence of the different positions.
- Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data? - Aug 19, 2019.
What’s the best way to execute your data integration tasks: writing manual code or using an ETL tool? Find out the approach that best fits your organization’s needs and the factors that influence it.
- The Role of the Data Engineer is Changing - Jan 10, 2019.
The role of the data engineer in a startup data team is changing rapidly. Are you thinking about it the right way?
- UnitedHealth Group: Senior ETL Developer (Horsham, PA) - Aug 17, 2018.
Seeking a Senior ETL Developer with an advanced ETL architecture/development background to be a primary contributor in developing, testing, and deploying key data warehouses and data marts, working with cutting-edge technology.
- From Insights to Value in 90 Minutes – with Snowflake, July 12 Webinar - Jul 2, 2018.
Learn How to Accelerate Data Warehouse Modernization at a Low Cost.
- ETL vs ELT: Considering the Advancement of Data Warehouses - May 22, 2018.
The traditional concept of ETL is shifting towards ELT, where transformations run right in the data warehouse. Let’s see why it’s happening, what it means to have ETL vs ELT, and what we can expect in the future.
- Loading Terabytes of Data from Postgres into BigQuery - Apr 9, 2018.
Although an ETL task is pretty challenging when it comes to loading Big Data, there’s still a scenario in which you can load terabytes of data from Postgres into BigQuery relatively easily and very efficiently.
- A Beginner’s Guide to Data Engineering – Part II - Mar 15, 2018.
In this post, I share more technical details on how to build good data pipelines and highlight ETL best practices. Primarily, I will use Python, Airflow, and SQL for our discussion.
- A Beginner’s Guide to Data Engineering – Part I - Jan 25, 2018.
Data Engineering: The Close Cousin of Data Science.
- Are Data Lakes Fake News? - Sep 6, 2017.
The quick answer is yes, and the biggest problem is that the term “Data Lakes” has been overloaded by vendors and analysts with different meanings, resulting in an ill-defined and blurry concept.
- How to Choose a Data Format - Nov 3, 2016.
In any data analytics project, after the business understanding phase, data understanding and the selection of the right data format and ETL tools are very important tasks. This article explains a useful and practical set of guidelines covering data format selection and the ETL phases of the project lifecycle.
- Automating Data Ingestion: 3 Important Parts - Sep 9, 2016.
In this day and age of “Big Data”, data ingestion has to be automated on some level. How best to automate it?
- Choosing Tools for Data ETLs - Aug 9, 2016.
Which tool should I use for my data pipelines? Get some advice from a data scientist who recently went through this pipeline tool selection process.
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department - Mar 28, 2016.
An exploration of data science team building, with insight into why engineers should not write ETL, and other not-so-subtle pieces of advice.
- Data Lake Plumbers: Operationalizing the Data Lake - Feb 18, 2016.
Gain insight into data lakes, their benefits, when they are appropriate, and how to operationalize them. How do they compare to the data warehouse?
- 3 Reasons Big Data Projects Fail - Aug 24, 2015.
Download the Lavastorm whitepaper: How to Overcome 3 Key Big Data Challenges - how to operationalize the results, how to enable ETL to handle the complexities of Big Data, and more.
- Interview: Joseph Babcock, Netflix on Genie, Lipstick, and Other In-house Developed Tools - Jun 16, 2015.
We discuss the role of analytics in content acquisition, data architecture at Netflix, organizational structure, and open-source tools from Netflix.