- Apache Spark Cluster on Docker - Jul 22, 2020.
Build your own Apache Spark cluster in standalone mode on Docker with a JupyterLab interface.
- Skills to Build for Data Engineering - Jun 4, 2020.
This article reviews the latest skill-set observations in the data engineering job market, which could boost your existing career or help you start your data engineering journey.
- Why and How to Use Dask with Big Data - Apr 15, 2020.
The Pandas library for Python is a game-changer for data preparation. But when the data gets big, really big, your computer needs more help to efficiently handle all that data. Learn more about how to use Dask and follow a demo to scale up your Pandas to work with Big Data.
- Five Interesting Data Engineering Projects - Mar 17, 2020.
As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools that you should pay attention to for your data pipeline work are reviewed here (along with a few bonus tools).
- 7 Data Trends for 2020 (and one non-trend) - Feb 24, 2020.
This article discusses trends that will (and won't) take shape in 2020.
- In Loving Memory of Strictly-Typed Schemas - Feb 20, 2020.
This article addresses one very peculiar manifestation of marketing propaganda in the big data industry that has crippled data engineers across the board — a resolute and methodical undermining of the sanctity of strictly-typed schemas.
- Observability for Data Engineering - Feb 10, 2020.
Going beyond traditional monitoring techniques and goals, understanding if a system is working as intended requires a new concept in DevOps, called Observability. Learn more about this essential approach to bring more context to your system metrics.
- 7 Resources to Becoming a Data Engineer - Jan 7, 2020.
An estimated 8,650% growth of the volume of data to 175 zettabytes from 2010 to 2025 has created an enormous need for Data Engineers to build an organization's big data platform to be fast, efficient and scalable.
- Four questions to help accurately scope analytics engineering project - Oct 9, 2019.
Being really good at scoping analytics projects is crucial for team productivity and profitability. You can consistently deliver on time if you work out the issue first, and these four questions can help you prepare.
- The thin line between data science and data engineering - Sep 25, 2019.
Today, as companies have finally come to understand the value that data science can bring, more and more emphasis is being placed on the implementation of data science in production systems. And as these implementations have required models that can perform on larger and larger datasets in real-time, an awful lot of data science problems have become engineering problems.
- Mongo DB Basics - Jun 5, 2019.
MongoDB is a document-oriented NoSQL database, unlike HBase, which is a wide-column store. The advantage of the document-oriented model over the relational model is that the columns can be changed as and when required for each case, as opposed to having the same column names for all the rows.
- 7 “Gotchas” for Data Engineers New to Google BigQuery - Mar 28, 2019.
Here are some things that might take some getting used to when new to Google BigQuery, along with mitigation strategies where I’ve found them.
- KDnuggets™ News 19:n10, Mar 6: What no one will tell you about data science job applications; The rise of ML Engineering - Mar 6, 2019.
Also most impactful AI trends of 2018: The rise of ML Engineering; How to do Everything in Computer Vision; GANs Need Some Attention, Too; OpenAI GPT-2.
- On Building Effective Data Science Teams - Mar 4, 2019.
We take a look at the qualities that make a successful data team in order to help business leaders and executives create better AI strategies.
- UnitedHealth Group: Sr Manager, Data Engineering [Minnetonka, MN] - Nov 19, 2018.
UnitedHealth Group is seeking a Sr Manager, Data Engineering in Minnetonka, MN. The position will work with our report developers and analysts to help set the vision and deliver data assets that drive insights and opportunities for the digital product teams.
- Things you should know when traveling via the Big Data Engineering hype-train - Oct 8, 2018.
Maybe you want to join the Big Data world? Or maybe you are already there and want to validate your knowledge? Or maybe you just want to know what Big Data Engineers do and what skills they use? If so, you may find the following article quite useful.
- Crunch Data Engineering and Analytics Conference, 29-31 October, Budapest - Sep 21, 2018.
The biggest (and anecdotally best) data engineering and analytics conference in the CEE region is back! Practical Data Engineering and Data Analytics talks will take over Budapest, 29-31 October. Best part: discounted 3-in-1 tickets for Crunch, Amuse and Impact.
- A Winning Game Plan For Building Your Data Science Team - Sep 18, 2018.
We need to understand the responsibilities, capabilities, expectations and competencies of the Data Engineer, Data Scientist and Business Stakeholder.
- Scientific debt – what does it mean for Data Science? - May 23, 2018.
This article analyses scientific debt - what it is and what it means for data science.
- DSTI: Applied MSc in Data Engineering, Advanced MSc in AI – Learn in France - May 14, 2018.
DSTI launches 2 new programmes for October 2018 entry: Applied MSc in Data Engineering and Advanced MSc in AI - Paris, Nice, and online.
- KDnuggets™ News 18:n12, Mar 21: Will GDPR Make Machine Learning Illegal?; 5 Things You Need to Know about Big Data - Mar 21, 2018.
Also: A Beginner's Guide to Data Engineering - Part II; Introduction to Optimization with Genetic Algorithm; Introduction to Markov Chains; Your free 70-page guide to a career in data science
- A Beginner’s Guide to Data Engineering – Part II - Mar 15, 2018.
In this post, I share more technical details on how to build good data pipelines and highlight ETL best practices. Primarily, I will use Python, Airflow, and SQL for our discussion.
- KDnuggets™ News 18:n05, Jan 31: Feynman Technique to become a Data Scientist; 4 Big Data Trends for 2018; Data Scientist – best job in America - Jan 31, 2018.
Also: How To Grow As A Data Scientist; A Beginner Guide to Data Engineering; Exclusive Interview: Doug Laney on Big Data and Infonomics
- A Beginner’s Guide to Data Engineering – Part I - Jan 25, 2018.
Data Engineering: The Close Cousin of Data Science.
- Strata Data Conference – 3 reasons to attend, Sep 25-28, NYC - Sep 7, 2017.
Data is driving business transformation. Come to Strata Data Conference and learn how to turn algorithms into business advantage, build modern data strategies, and spend quality time with experts. Use code KDNU to save.
- What data has to teach us about deep learning? - Sep 4, 2017.
Budapest is calling Data Scientists and Data engineers to CRUNCH Conference, Oct 18-20. CRUNCH will feature talks from Google, Airbnb, Tesla, LinkedIn, Netflix, Uber, and more. Use code KDnuggetsAtCrunch to save.
- 37 Reasons why your Neural Network is not working - Aug 22, 2017.
Over the course of many debugging sessions, I’ve compiled my experience along with the best ideas around in this handy list. I hope they would be useful to you.
- Strata Data Conference, the reunion of data brain trust – KDnuggets Offer - Aug 8, 2017.
Strata Data Conference, the annual reunion of data brain trust, is Sept 25-28 in New York. Early price ends Aug 11 - save more with code KDNU.
- Jimdo: Team Lead Data - Jul 13, 2017.
Shape a technological vision for our data department and passionately manage a great and diverse team of very skilled engineers, scientists and analysts.
- 5 Career Paths in Big Data and Data Science, Explained - Feb 6, 2017.
Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.
- Why the Data Scientist and Data Engineer Need to Understand Virtualization in the Cloud - Jan 25, 2017.
This article covers the value of understanding the virtualization constructs for the data scientist and data engineer as they deploy their analysis onto all kinds of cloud platforms. Virtualization is a key enabling layer of software for these data workers to be aware of and to achieve optimal results from.
- How to Choose a Data Format - Nov 3, 2016.
In any data analytics project, after the business understanding phase, data understanding and selection of the right data format and ETL tools are very important tasks. This article explains a useful and practical set of guidelines covering the data format selection and ETL phases of the project lifecycle.
- Behind the Dream of Data Work as it Could Be - Sep 13, 2016.
This post is an insider's overview of data.world, and their attempt to build the most meaningful, collaborative, and abundant data resource in the world.
- Data Science, Data Engineering Bootcamp, Seattle, Oct 10-14 - Aug 8, 2016.
Data Science Dojo will be teaching a comprehensive five-day Data Science & Data Engineering Bootcamp in Seattle on October 10 - 14. Register today!
- Connecting Data Systems and DevOps - Jun 17, 2016.
This post will explain why anyone transforming their company into a data-driven organization should care about software development best practices, even if they don’t consider themselves a software company.
- Building Data Systems: What Do You Need? - Jun 3, 2016.
This post shares some insight gained through years of building data-powered products, and discusses the capabilities you need to have in place in order to successfully build and maintain data systems and data infrastructure.
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department - Mar 28, 2016.
An exploration of data science team building, with insight into why engineers should not write ETL, and other not-so-subtle pieces of advice.
- ZocDoc: Engineering Manager, Data Engineering - Feb 3, 2015.
Run the Data Engineering team to create a hardcore data and analytics infrastructure for our business, working with our teams of data scientists and business analysts to transform data into information enabling the next generation of ZocDoc insight and products.
- Civis Analytics: Data Scientist – Engineering (Senior and Junior roles) - Oct 7, 2014.
Founded by a team from Obama 2012, we are helping companies, non-profits, and campaigns leverage their data. Integrate, scale, and optimize our team's data science methods, techniques, and best practices to run on very large datasets at high speeds.
- Health Integrated: Manager of Data Engineering - Sep 18, 2014.
Responsible for all aspects of data exchange processes and software in collaboration with developers, business analysts, product managers, and program managers.