I was honored when Mona Khalil invited me to write the foreword for her excellent new book, Effective Data Analysis, which I recommend everyone buy in triplicate.
Early in my career, I was conducting a complex analysis as part of a consulting project [1]. I had spent weeks collecting and cleaning data, and putting together a model that pointed to a clear recommendation.
As I walked my manager through it, however, his feedback was somewhat unexpected: he thought I did a great job on my analysis, but wanted me to change my methodology, explaining that another angle would “better fit the story the client wants to hear.”
What?! I saw myself as a seeker of objective truth, out to understand the world through exacting measurement. Changing my approach because it fit a story someone wanted to hear felt… weird.
But surprisingly, looking back, I’m not sure it was wrong. The alternative approach was completely valid – it’s not like we were faking data. And just because it was what the client wanted to hear didn’t make it inaccurate; it was just another way to look at the world.
What I didn’t know then was that these kinds of situations are incredibly common in the world of data analysis. There’s rarely a straightforward “right” answer to the complex questions that arise in organizations. One often finds situations with incomplete data, varying assumptions, and unclear conclusions. Fusing objective measurement with subjective interpretation and thoughtful communication is a much larger part of the role of a data practitioner than most give it credit for, and it took me years to fully understand that.
But that understanding would have come much faster if I'd had access to Mona Khalil's new book, Effective Data Analysis. That’s not because it’s full of helpful technical pointers, methodologies, and exercises (which it certainly is) but rather because it does such an effective job breaking down how to think about the craft of data analysis, in all its nuance.
Mona takes the time to unpack topics like defining metrics and the theory of measurement. She breaks down how to think about results that don’t support a hypothesis, and how to present findings to stakeholders (including how to debug the kind of situation I found myself in so many years ago). And she shares the one lesson that every aspiring data scientist needs to learn: “simpler is often better.”
This toolkit is especially important as we enter the exciting and exotic world of AI. It turns out that LLMs can write code pretty well; I certainly wouldn’t recommend that aspiring analysts spend a lot of time memorizing arcane syntax, but I would encourage them to study how to properly interpret, contextualize, and communicate results – the kinds of things that a model may never come to do as well as a human.
Reading through this book, I found myself reflecting back on lessons from a career in data, and finding a new appreciation for the science – and art! – of effective analysis. I believe practitioners at any level of experience and technical skill can benefit from this book, and avoid common pitfalls so they can make new, unique mistakes of their own. If you missed the links above, you can get Mona's book here. 📕
[1] If you were confused by the seemingly random pricing of Wi-Fi on a major US airline in the early 2010s, I was partly to blame.