Big Data Architecture: A Complete and Detailed Overview

In a previous article on data science as a field and the role of data scientists, I discussed what I call the four "pillars" of data science expertise: business domain knowledge, statistics and probability, computer science and software programming (aka hacking skills), and written and verbal communication.

Given these pillars, data scientists are expected to have a strong computer science foundation and solid programming skills. That said, they may not be as educated or experienced in computer science, programming concepts, DevOps, site reliability engineering, non-functional requirements, software solution infrastructure, or general software architecture as well-trained or experienced software architects and engineers.

This is a summary (with links) of a three-part article series intended as an in-depth guide to help fill any knowledge gaps the reader may have in the concepts and fields listed above.

The first article of the series is focused on different software solution application types such as enterprise, SaaS, IoT, big data, and so on. It also includes a thorough discussion of the so-called cloud, common cloud-based architectural components, functional and non-functional requirements, and the concept of separation of concerns (SOC).

The second article covers architectural patterns and designs, the concepts and protocols behind network communication and information transfer, and ends with a discussion of application programming interfaces (APIs) and software development kits (SDKs).

The third and final article brings together all of the concepts and techniques discussed in the first two articles, and extends them to include big data and analytics-specific application architectures and patterns. This includes a detailed discussion of typical components and stages in the so-called data pipeline, including data storage and modeling; data acquisition, ingestion, and integration; data availability, performance, and scalability; data processing and movement; and finally data access, querying, analytics, and business intelligence (BI).
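To make those pipeline stages concrete before diving into the series, here is a minimal, highly simplified Python sketch. All of the names in it (the events.csv file, the events schema, and the ingest/store/query functions) are hypothetical, chosen only to mirror the stages above; a real pipeline would use dedicated storage, ingestion, and processing systems rather than an in-memory database.

```python
# A minimal sketch of the data pipeline stages discussed above:
# acquisition/ingestion -> storage/modeling -> access/querying.
# File name, schema, and function names are illustrative only.

import csv
import sqlite3
from pathlib import Path


def ingest(source: Path) -> list[dict]:
    """Data acquisition/ingestion: read raw records from a source file."""
    with source.open(newline="") as f:
        return list(csv.DictReader(f))


def store(records: list[dict], db: sqlite3.Connection) -> None:
    """Data storage and modeling: persist records in a simple schema."""
    db.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, amount REAL)")
    db.executemany("INSERT INTO events VALUES (:user_id, :amount)", records)


def query(db: sqlite3.Connection) -> list[tuple]:
    """Data access/analytics: an aggregate query a BI tool might run."""
    return db.execute(
        "SELECT user_id, SUM(amount) FROM events GROUP BY user_id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real data store
    store(ingest(Path("events.csv")), conn)
    print(query(conn))
```

In production, each of these functions would typically be a separate system or service (for example, a message queue feeding a data warehouse, with a BI tool on top), which is exactly the kind of architecture the third article covers.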

After reading the three posts in the series, you should be thoroughly familiar with the key concepts and characteristics of designing and building scalable software and big data architectures.

Cheers and enjoy!

 
Alex Castrounis is the founder and CEO of Why of AI and the author of AI for People and Business. He is also an adjunct for Northwestern University’s Kellogg / McCormick MBAi program.
