7 Resources for Becoming a Data Engineer
An estimated 8,650% growth in the volume of data, to 175 zettabytes, between 2010 and 2025 has created an enormous need for Data Engineers who can build an organization's big data platform to be fast, efficient and scalable.
Data Engineering is one of the fastest-growing and most in-demand occupations among Data Science practitioners. The ability to collect, store, query, clean and manipulate data quickly, efficiently and effectively becomes more important as the data we generate grows each day with our increasing use of technological services.
According to Statista, the big data market by volume is expected to grow from 26 zettabytes in 2017 to 175 zettabytes by 2025. This represents a 573% increase from 2017 to 2025. Prior to 2017, the big data market by volume grew 800% from 2010 to 2016.
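The growth figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the Statista data: the helper name `percent_growth` is made up for illustration, and the 2010 and 2016 volumes are not stated in the text, so they are backed out from the quoted percentages as an assumption.

```python
def percent_growth(start: float, end: float) -> float:
    """Return the percentage increase going from start to end."""
    return (end - start) / start * 100.0


# Volumes in zettabytes, as quoted in the text.
VOLUME_2017 = 26.0
VOLUME_2025 = 175.0

# Growth from 2017 to 2025, computed directly from the quoted volumes.
growth_2017_2025 = percent_growth(VOLUME_2017, VOLUME_2025)

# Assumption: the 2010 baseline is not given, so derive it from the
# quoted 8,650% growth between 2010 and 2025.
implied_2010 = VOLUME_2025 / (1 + 8650 / 100)

# Assumption: the 2016 volume is derived from the quoted 800% growth
# between 2010 and 2016.
implied_2016 = implied_2010 * (1 + 800 / 100)

print(round(growth_2017_2025))  # 573
print(implied_2010)             # 2.0
print(implied_2016)             # 18.0
```

Under these assumptions the quoted percentages are consistent with one another: roughly 2 zettabytes in 2010, 18 in 2016, 26 in 2017 and 175 in 2025.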
For beginners, Dataquest may be a good starting point before moving on to the other qualifications, especially the cloud certifications.
Azure Data Engineers design and implement the management, monitoring, security, and privacy of data using the full stack of Azure data services to satisfy business needs. This certification is the final stage after a number of training modules have been successfully completed. Each module trains the user in Azure's suite of products, building the skills needed to work as a data engineer on the platform. Each learning module takes less than a day and should need no more than 10 hours, depending on how much time each person commits.
Level: Beginner (with pre-requisites)
The Udacity Data Engineering course is a brand new course created to help bridge the skills gap and cater to the growing demand from companies that require more advanced knowledge of databases along with efficient and scalable data manipulation. The course is slated to begin on January 15th, 2020 and has an estimated completion time of 5 months given a commitment of 5 hours per week.
The course goes on to cover SQL, Spark, Data Warehousing on AWS, Apache Airflow and more. There are numerous options in today's market for creating your database, whether on-premises or in the Cloud.
Before taking the above certification exam, you might want to take their recommended training course with Qwiklabs: Data Engineering on Google Cloud Platform. This training course is also best suited for someone with familiarity in the cloud computing space. Both the certification and training are short stints and go on to teach you about using Hadoop, Google BigQuery, and building scalable machine learning applications on GCP.
The course begins with an introduction to Python and moves on to SQL, which develops further into learning how to use PostgreSQL, followed by Data Structures and Algorithms. It seems to take a broader view of the topics and centers around using Python and SQL. This is a good course for someone beginning their journey into the data engineering landscape, but because of the course structure it seems useful to have at least some basic Python knowledge.
The University of California San Diego's course on Coursera centers around using the Hadoop framework and Spark, and ends by applying these big data handling techniques in a machine learning setting. There is no programming experience required according to the course description. The course has been made in partnership with Splunk.
There are specific hardware and software requirements for this course.
AWS, being the largest cloud provider by number of services and revenue, will also be an important player in the data engineering space.
A new version of the AWS Certified Big Data – Specialty exam will be available in April 2020 with a new name, AWS Certified Data Analytics – Specialty.
Because this certification is for advanced users, it requires a few years' experience using AWS as well as other certifications such as AWS Certified Cloud Practitioner.
Level: Intermediate to Advanced
Andreas Kretz created this book to share his knowledge of data engineering, loosely based on his data science workflow. He may be better known for his podcast Plumbers of Data Science, where he discusses and teaches data engineering topics live.
He is very active on LinkedIn and is quickly becoming a prominent public figure for anyone wanting to become a data engineer or expand their knowledge of the topic.
- The thin line between data science and data engineering
- The Last SQL Guide for Data Analysis You’ll Ever Need
- Four questions to help accurately scope analytics engineering projects