In what I hope will be a regular feature, I’m going to use this space to highlight a relevant book I’ve recently read along with some commentary. I don’t pretend to elevate the below post to the height of a true book review. More than anything, I’m documenting important points I’d like to remember – my own personal bookmarks and post-it notes, if you will. In fact, if the book is no good, I’m not going to spend time writing about it.
First up: Weapons of Math Destruction by Cathy O’Neil
In the current big data world with its “make the world a better place” hype of machine learning, predictive algorithms, and artificial intelligence, Weapons of Math Destruction paints a very necessary gray cloud above all the cheery platitudes. Those of us in relevant fields would be wise to pump the brakes a bit. Cathy O’Neil presents some very telling anecdotes and valid warnings about an over-reliance on computer models.
Given the title, it’s necessary to define WMDs – and by contrast define good models – which O’Neil does by way of three rules (good model / WMD):
- They are transparent / they are a black box
- They are continuously updated / they ignore outcomes
- They are limited in scope / they are widespread
I find these three rules succinct and incredibly useful, and not just in the obvious way. I can think of many specific work scenarios where models and their output have been discussed. Those discussions would have been improved if stakeholders had been regularly reminded of the limitations of the model in question, or if any potential for negative feedback loops had been consistently highlighted and addressed.
Some highlights…
On bad models, from the introduction:
They define their own reality and use it to justify their results. This type of model is self-perpetuating, highly destructive — and very common.
And later:
… many poisonous assumptions are camouflaged by math and go largely untested and unquestioned.
A refreshing alternative definition to the term “model” from Chapter 1:
Models are opinions embedded in mathematics.
Vigilance is required to monitor the impact of any model, but especially one built using proxies when direct data isn’t available. From Chapter 3:
However, when you create a model from proxies, it is far simpler for people to game it. This is because proxies are easier to manipulate than the complicated reality they represent.
Particularly painful to read shortly after the 2016 presidential election, and chillingly prescient, from Chapter 10, The Targeted Citizen:
… can Facebook game the political system?
O’Neil describes a newsfeed experiment Facebook conducted leading up to the 2012 presidential election, in which a researcher altered the newsfeed for roughly 2 million people:
These people got a higher proportion of hard news, as opposed to the usual cat videos…
I find that particular experiment summary especially troubling given Facebook’s own contention after the election that they don’t believe their newsfeed had a role in influencing the election. You can’t claim in an experiment to be able to distinguish what constitutes “hard news” and then later wash your hands of it. I haven’t been on Facebook since reading that chapter.
Telling quote from Judd Antin, a former research manager at Facebook, in the CNNMoney link shown above:
When data-driven becomes data-myopic, we all suffer… I worry that Facebook’s decision-making has lost its humanity. And that’s frightening given the central role it plays in our world.
Sounds like a WMD to me.
And finally, while reading Weapons of Math Destruction, I stumbled on a series of posts about the book from members of the NYC-based research institute Data & Society. They read as mini book reports, each focusing on themes relevant to that member’s interests and research areas. Some highlights below:
From Mark Van Hollebeke’s Shining a Light on the Darkness:
This book should be required reading for all data scientists and for any organizational decision-maker convinced that a mathematical model can replace human judgment. Not because we should reject data science and algorithmically-driven analytics, but because of how important carefully designed and principled data science is.
And, to add some gravitas to his position…
Weapons of Math Destruction is The Jungle of our age.
From Angèle Christin’s Models in Practice, paraphrasing a point from the original work:
… most algorithms function as self-fulfilling prophecies: they create the reality that they purport to describe…
Again, from Christin’s response, and something that I find to be a really interesting nuance:
As organizational sociologists and ethnographers know well, there is often a discrepancy between what workers and organizations say they do and what they actually do.
The implication is that the existence of a potential WMD, and the alarm bells it should set off, does not necessarily mean that it functions as a true WMD. The potential for harm may be there, but the model may not be used in practice the way one might envision in the worst-case scenario, or possibly at all. This has its own implications for how organizations behave, separate from the existence of these models.
From Mark Ackerman’s Safety Checklists for Sociotechnical Design:
We have often seen that new forms of industrialization bring new forms of abuse, and these must be exposed.
From Anne L. Washington’s Sausage, politics, and data predictions? (Big fan of the Oxford comma, incidentally):
Models, O’Neil reminds us, are dynamic instruments that require tinkering to maintain excellence.