Description
Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they’re also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of these architectures to help data professionals understand the pros and cons of each.
James Serra, big data and data warehousing solution architect at Microsoft, examines common data architecture concepts, including how data warehouses have had to evolve to work with data lake features. You’ll learn what data lakehouses can help you achieve, as well as how to distinguish data mesh hype from reality. Best of all, you’ll be able to determine the most appropriate data architecture for your needs.
With this book, you’ll:
- Gain a working understanding of several data architectures
- Learn the strengths and weaknesses of each approach
- Distinguish data architecture theory from reality
- Pick the best architecture for your use case
- Understand the differences between data warehouses and data lakes
- Learn common data architecture concepts to help you build better solutions
- Explore the historical evolution and characteristics of data architectures
- Learn essentials of running an architecture design session, team organization, and project success factors
Free from product discussions, this book will remain a useful resource for years to come.
From the Preface
I’ve been in information technology (IT) for nearly 40 years. I’ve worked at companies of all different sizes, I’ve worked as a consultant, and I’ve owned my own company. For the last 9 years, I have been at Microsoft as a data architect, and for the last 15 years, I have been involved with data warehousing. I’ve spoken about data thousands of times, to customers and groups.
During my career, I have seen many data architectures come and go. I’ve seen too many companies argue over the best approach and end up building the wrong data architecture—a mistake that can cost them millions of dollars and months of time, putting them well behind their competitors.
What’s more, data architectures are complex. I’ve seen firsthand that most people are unclear on the concepts involved, if they’re aware of them at all. Everyone seems to be throwing around terms like data mesh, data warehouse, and data lakehouse—but if you ask 10 people what a data mesh is, you will get 11 different answers.
Where do you even start? Are these just buzzwords with a lot of hype but little substance, or are they viable approaches? They may sound great in theory, but how practical are they? What are the pros and cons of each architecture?
None of the architectures discussed in this book is “wrong.” They all have a place, but only in certain use cases. No one architecture applies to every situation, so this book is not about convincing you to choose one architecture over the others. Instead, you will get honest opinions on the pros and cons of each architecture. Everything has trade-offs, and it’s important to understand what those are and not just go with an architecture that is hyped more than the others. And there is much to learn from each architecture, even if you don’t use it. For example, understanding how a data mesh works will get you thinking about data ownership, a concept that can apply to any architecture.
This book provides a basic grounding in common data architecture concepts. There are so many concepts out there, and figuring out which to choose and how to implement them can be intimidating. I’m here to help you to understand all these concepts and architectures at a high level so you get a sense of the options and can see which one is the most appropriate for your situation. The goal of the book is to allow you to talk intelligently about data concepts and architectures, then dig deeper into any that are relevant to the solution you are building.
There are no standard definitions of data concepts and architectures. If there were, this book would not be needed. My hope is to provide standard definitions that help everyone get onto the same page, to make discussions easier. I’m under no illusion that my definitions will be universally accepted, but I’d like to give us all a starting point for conversations about how to adjust those definitions.
I have written this book for anyone with an interest in getting value out of data, whether you’re a database developer or administrator, a data architect, a CTO or CIO, or even someone in a role outside of IT. You could be early in your career or a seasoned veteran. The only skills you need are a little familiarity with data from your work and a sense of curiosity.
For readers with less experience with these topics, I provide an overview of big data (Chapter 1) and data architectures (Chapter 2), as well as basic data concepts (Part II). If you’ve been in the data game for a while but need to understand new architectures, you might find a lot of value in Part III, which dives into the details of particular data architectures, as well as in reviewing some of the basics. For you, this will be a quick cover-to-cover read; feel free to skip over the sections with material that you already know well. Also note that although the focus is on big data, the concepts and architectures apply even if you have “small” data.
This is a vendor-neutral book. You should be able to apply the architectures and concepts you learn here with any cloud provider. I’ll also note here that I am employed by Microsoft. However, the opinions expressed here are mine alone and do not reflect the views of my employer.
I wrote this book because I have an innate curiosity that drives me to comprehend and then share things in a way that everyone can understand. This book is the culmination of my life’s work. I hope you find it valuable.
Book details
- Author: James Serra
- Publisher: O’Reilly Media
- Publication date: March 12, 2024
- Edition: 1st
- Print length: 275 pages
- Language: English
- Format: Paperback







