About this Issue

Algorithms rule more and more of the world around us. They screen school and job applications. They determine who qualifies for loans and insurance. They trigger audits and investigations. But what’s going on under the hood? Are algorithms impersonal, and thus impartial and fair? Or can they be programmed, intentionally or otherwise, to replicate human biases? If so, then using algorithms leaves us worse off. A veneer of fairness now covers our systemic biases, making them harder to argue against or even discover.

That’s precisely the charge that Cathy O’Neil levels in her lead essay this month. The author of Weapons of Math Destruction takes us on a brief tour of how algorithms can mislead in teacher evaluations, debt collection, and several other important areas of life. She invites us to greater skepticism about artificial intelligence and recommends policy solutions that would curb the dangers of the algorithm-driven life.

Responding to her this month we have Caleb Watney of the R Street Institute, freelance journalist and former WIRED senior editor Laura Hudson, and Cato Institute Senior Fellow Julian Sanchez. Each will respond to O’Neil with an essay, and conversation among the four will continue through the month. Comments are open as well, and we invite readers to contribute to the discussion.

Lead Essay

Why We Need Accountable Algorithms

AI and machine learning algorithms are marketed as unbiased, objective tools. They are not. They are opaque mechanisms of bureaucracy and decisionmaking in which old-fashioned racist, sexist, and classist biases are hidden behind sophisticated technology, usually without a system of appeal. As their influence increases in society, we face a choice. Do we ignore their pernicious effects, or do we understand, regulate, and control the biases they exert? If we want them to represent transparent fairness, freedom, and consistency in an efficient, cost-saving manner, we must hold them accountable somehow.

What is an algorithm? For my purposes I simply mean a system trained on historical data and optimized to some definition of success. We even use informal algorithms, defined this way, in our own heads. The dinners I make for my family on a daily basis require the data of the ingredients in my kitchen and the amount of time I have to cook. The way I assess whether a meal is “successful” is to see, afterwards, if my kids ate their vegetables. Note that I curate the data: I don’t actually include certain foods, like ramen noodles or sprinkles, in my ingredients list. I also have a different definition of success than my kids would have. Over time, the succession of meals optimized to my definition of success diverges wildly from the one my kids would have produced. Those are two obvious ways that I have inserted my agenda into my algorithm. Indeed, any algorithm builder does this – they curate their data, and they define success and likewise measure the cost of failure.

In general, people are intimidated by algorithms and don’t question them the way they should. Thousands of teachers have been told “it’s math, you wouldn’t understand it,” regarding administrators’ statistical value-added models, even though their tenure or job status depends on the results. Criminal defendants likewise have no way to understand or protest the recidivism risk scores courts use to decide whether a defendant’s profile matches that of someone who can be expected to return to prison after release, even though a higher score can mean a longer prison term. The people targeted by these algorithms – usually in the form of scoring systems – have very little power, and typically no recourse to understand or interrogate their scores.

Algorithms don’t make things fair. They embed historical practices and patterns. When the medical school at St. George’s Hospital in London automated its application screening, the results came out both sexist and xenophobic. That surprised the administrators, who had expected that a computer wouldn’t be discriminatory. But it happened, of course, because the historical data they fed to the algorithm to train it was, itself, sexist and xenophobic. The algorithm picked up on this pattern and propagated it.

In general, there is unintentional, implicit bias in all kinds of ways in our culture and our processes. We can expect those biases to be automated when we feed historical data into training these processes – even when that historical data is very recent. Until we consistently rid our society and ourselves of implicit bias, we cannot trust algorithms to be clear of it. Said another way: all algorithms are likely racist, sexist, and xenophobic unless they’ve been treated not to be. Why assume that a complicated mathematical model happens to have some desirable property, after all, especially when we know such models tend not to? That’s like assuming a given person’s IQ is 100, even though you’re in a place overrun by known geniuses.

Of course, the above argument was made under the assumption that bias and discrimination are wrong. That’s a moral choice, which brings us to our next point.

There is no such thing as a morally neutral algorithm. We embed ethics into our objective functions. A great example of how we do this – even without thinking about it – comes from an algorithm that predicts child abuse. I was talking to a group in California that is attempting to build a predictive algorithm that will help them decide whether a given call from a teacher, doctor, neighbor, or family member about a child in danger is sufficiently ominous to send out a caseworker. They don’t have enough resources to send out a caseworker for every call, so they have to make judgment calls. How should they train a predictive algorithm to improve their service?

The first answer, as it is with most data problems, is to make sure they have clean and relevant data. An important aspect of this is to know, when someone calls, whether they are referring to a child that is already in the system. Spelling errors or vague information could hamper this investigation, as could a family that moves frequently or is homeless. A second question is what kind of information one can and should use about the family, assuming a child is positively identified. After all, a call like this usually amounts to suspicion, not a conviction for child abuse, so there are privacy concerns. Moreover, the information is not equally distributed: there’s likely to be far more information in the system if the family is poor or belongs to a minority group, if it contains people who’ve been homeless or on welfare, or who have mental health problems or criminal records.

The second answer is to think about the objective function: what are you optimizing to? No algorithm is perfect, so although you’re always trying to build an algorithm that is as accurate as possible, you must also consider the errors. And in this case, that means you must balance false positives against false negatives.

Let’s work out the scenarios there: a false positive is when you suspect a family of abuse when there is none, or when it doesn’t rise to the level of severe abuse. Depending on the outcome, this could end in tragedy, if the child is taken away from their family, say. A false negative, on the other hand, is when you decide the child is probably safe and you don’t investigate, but the child actually is abused. It’s also a tragedy.

What’s the trade-off? How many false positives would you accept to avoid one false negative? It’s a tough question to answer, but answering it is absolutely required to train this algorithm.
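
To make that concrete, here is a minimal sketch of how a stated exchange rate between the two kinds of error becomes a decision threshold. Everything in it (the scores, the outcomes, the costs) is invented for illustration; it is not the California group’s data or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical data: a risk score per call from some model, plus the
# eventual outcome. Nothing here reflects any real agency's data.
scores = rng.uniform(0, 1, size=10_000)            # predicted risk for each call
abused = rng.uniform(0, 1, size=10_000) < scores   # True if abuse was later substantiated

COST_FP = 1    # cost unit: one investigation of a family where nothing is found
COST_FN = 20   # a hypothetical trade-off: one missed case "costs" 20 needless investigations

def expected_cost(threshold):
    flagged = scores >= threshold
    false_positives = np.sum(flagged & ~abused)
    false_negatives = np.sum(~flagged & abused)
    return COST_FP * false_positives + COST_FN * false_negatives

# The chosen ratio, applied to historical calls, picks the operating threshold.
thresholds = np.linspace(0, 1, 101)
best = min(thresholds, key=expected_cost)
print(f"send a caseworker when predicted risk >= {best:.2f}")
```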

Every algorithm has, at its heart, an ethical dilemma. Some of them are much less difficult to resolve, but they exist nonetheless. One of the most important goals of algorithmic accountability is that the data scientist should not make those ethical decisions on society’s behalf, but rather should be a translator of ethical decisions into code. In other words, a data scientist should make transparent what the trade-offs are and even build tests, or ongoing monitors of their algorithms, to make sure the decisions are consistently upheld by the algorithm. Specifically, such a monitor should keep track of errors, and the distribution of those errors over the population.
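
In its simplest form, and with hypothetical group labels standing in for whatever populations matter in a given deployment, such a monitor might look something like this sketch:

```python
from collections import defaultdict

class ErrorMonitor:
    """A hypothetical ongoing monitor: tally errors by group so the agreed-upon
    trade-off can be checked against who actually bears the errors."""

    def __init__(self):
        self.stats = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})

    def record(self, group, predicted_positive, actually_positive):
        s = self.stats[group]
        if actually_positive:
            s["pos"] += 1
            if not predicted_positive:
                s["fn"] += 1          # a missed case
        else:
            s["neg"] += 1
            if predicted_positive:
                s["fp"] += 1          # a needless investigation

    def report(self):
        for group, s in sorted(self.stats.items()):
            fpr = s["fp"] / max(s["neg"], 1)   # false positive rate within the group
            fnr = s["fn"] / max(s["pos"], 1)   # false negative rate within the group
            print(f"{group}: false positive rate {fpr:.1%}, false negative rate {fnr:.1%}")

# Hypothetical usage as decisions and outcomes come in:
monitor = ErrorMonitor()
monitor.record("neighborhood_A", predicted_positive=True, actually_positive=False)
monitor.record("neighborhood_B", predicted_positive=False, actually_positive=True)
monitor.report()
```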

Here’s an example to illustrate how badly this can go wrong. Last year the Australian Department of Human Services introduced an automated “debt recovery system” that used a crude fraud detection algorithm to calculate the extent to which welfare recipients had been overpaid. The program was flawed because it assumed a steady income throughout the year for all recipients, while individuals’ actual monthly incomes often varied considerably. The number of debt compliance letters sent out jumped from 20,000 per year to 20,000 per month under the new system, and the department was flooded with complaints. This is a system that, I would argue, was not carefully vetted for its errors, especially its false positive rate. Rather, it was optimized for cost savings with little regard to the human toll of errors.

I don’t wish to overstate the case against algorithms, or rather the case for accountable algorithms. Algorithms aren’t the only or the biggest problem we face as a society. They are simply the most recent set of tools for power. Power is old, after all. Powerful institutions and individuals have had tools before and they’ll have them again in the future, even if algorithms are made accountable and appealable by then. But, as a mathematician, it appalls me that these particular tools are being wielded as mathematical weapons.

Neither am I the arbiter of truth. I’m not saying what the embedded ethics of a given algorithm should be. But neither are you, and neither should a lone Facebook engineer, unschooled in ethics, decide how the difference between information and propaganda is understood, and how data is therefore disseminated to the rest of us. Personally, I don’t know what a “qualified” candidate is in a given situation, but I know that the historical process of hiring people should probably be given a second look. And that means by a group of people who take the job seriously.

Here’s an important first step: let’s separate conversations around rules and morals from conversations around the technology of algorithms or AI. The first type of conversation is one that can include everyone, and the second is a technical discussion that should be informed by the first. If we left it to Silicon Valley technologists to decide on the new rules, we could end up with a machine deciding who gets organs based on the expected value of a human being to society, which in turn could be determined by who makes the most money on the stock market.

Next, let’s hold algorithms accountable. That means, if we decide an algorithm has the potential to act unethically or illegally, we monitor it from the get-go and resolve the problem if necessary. The tools to do this exist, although they’re not refined.

The biggest pushback I expect to get from this idea is that it’s expensive. That’s true. It’s expensive in that it simply costs money to add layers to an already complex process, and ongoing monitoring and fiddling with algorithms amounts to one or maybe multiple such layers. Moreover, and more importantly, it’s expensive because on any reasonable definition it will generally cut down on profit to be nondiscriminatory. If you don’t know what I mean, check out this recent paper written by Moritz Hardt from Google and others, which examines the “cost of fairness” when you optimize to profit with a fairness constraint, under multiple definitions of fairness. They even have a section, starting on page 16, that works out the case of a FICO-type credit score optimized with various choices of fairness constraints, and they measure how much each one eats into profit as a function of the error tradeoff we mentioned earlier.
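
The mechanism is easy to see in miniature. The sketch below is not the paper’s analysis; the numbers and distributions are invented. But it has the same general shape: a lender picks score thresholds to maximize profit, first without constraint and then under one possible fairness rule (demographic parity, meaning equal approval rates across groups), and the constrained optimum earns less.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_group(n, mean_score):
    # Hypothetical applicants: a credit-style score, plus whether they would repay.
    scores = np.clip(rng.normal(mean_score, 0.15, n), 0, 1)
    repays = rng.uniform(0, 1, n) < scores       # calibrated: score ~ repayment probability
    return scores, repays

# Two groups whose historical score distributions differ (made-up numbers).
groups = {"A": simulate_group(5_000, 0.65), "B": simulate_group(5_000, 0.55)}
GAIN, LOSS = 1.0, 4.0                            # profit on repayment vs. loss on default
thresholds = np.linspace(0, 1, 101)

def curves(scores, repays):
    # Profit and approval rate at every candidate threshold.
    profit, approval = [], []
    for t in thresholds:
        ok = scores >= t
        profit.append(GAIN * np.sum(ok & repays) - LOSS * np.sum(ok & ~repays))
        approval.append(ok.mean())
    return np.array(profit), np.array(approval)

profit_a, rate_a = curves(*groups["A"])
profit_b, rate_b = curves(*groups["B"])

# Unconstrained: each group gets its own profit-maximizing threshold.
max_profit = profit_a.max() + profit_b.max()

# One fairness constraint (demographic parity): both groups approved at roughly the
# same rate. For each threshold in A, pair it with the closest-rate threshold in B.
fair_profit = max(
    profit_a[i] + profit_b[np.argmin(np.abs(rate_b - rate_a[i]))]
    for i in range(len(thresholds))
)

print(f"profit given up to satisfy this fairness constraint: {max_profit - fair_profit:.0f}")
```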

No one company will likely take on this challenge unless they need to, because none of them want to incur the real expense that ethical constraints must bring. A first mover in this space will be at a competitive disadvantage. That’s why we need rules, or laws, around algorithmic fairness, especially when they directly threaten profit margins. If everyone has to abide by the same standards, then the industry as a whole will be fair, and the profit will be distributed based on other competitive advantages like customer service or ease of use, as we want.

I don’t see a way forward in this space without a place for standards that amount to anti-discrimination and fairness laws. In fact, I encourage tech companies to come up with reasonable laws before government comes up with unreasonable ones.

Response Essays

Fairy Dust, Pandora’s Box… or a Hammer

 Cathy O’Neil wants to puncture a particular perception of algorithms that views artificial intelligence and other mathematical decisionmaking models as a type of “fairy dust” we can sprinkle over all our problems to make them disappear. This is an important project, and O’Neil does vital work in her essay and her book pointing out where and how unquestioning faith in these processes can lead us astray. Libertarians should take heed of her warning: algorithms can certainly be skewed with historically biased data, or targeted toward unethical ends. Yet this should not lead us to storm Silicon Valley with torches and pitchforks. Algorithms are powerful tools that also have tremendous potential for good.

In her effort to push back on the fairy dust viewpoint, O’Neil goes too far in the opposite direction. In her lead essay, she claims there “is no such thing as a morally neutral algorithm” and that “Every algorithm has, at its heart, an ethical dilemma.” While this certainly sounds like a stern and prudent warning, it’s not immediately obvious that this is true. Does a simple AI trained to play checkers house an intractable moral debate? Does matrix multiplication need to be preemptively regulated for fear of opening Pandora’s Box?

If this seems silly it’s because algorithms, in the abstract, are morally neutral; any moral weight they have comes from their real world applications. And the decisions we make don’t gain or lose any ethical significance because we asked an algorithm to help. An individual wrongly imprisoned is just as much a tragedy whether human bias or “algorithmic bias” is responsible.

Put this way it becomes clear that an algorithm is like any other tool – neither inherently good nor evil and only raising questions of ethics when we ask it ethical questions. Like a hammer, it requires careful and thoughtful application to hit the nail, rather than our thumb. Granted, it’s much easier to understand how a hammer works and so perhaps easier to avoid causing unintended harm. But it doesn’t make sense to render an ethical evaluation of the hammer outside of its applications.

This matters because it indicates that we may want to govern algorithms differently depending on the specific use case, rather than as an impossibly broad blanket category. We probably have more to fear from an algorithm that can recommend a longer prison sentence than from one that gives us Tinder dates. And it’s worth pointing out that the vast majority of applications fall closer to the Tinder side of the spectrum. As such, we should desire the strongest legal protections and accountability measures when algorithms are being used by governments, as they are the only entities legally entitled to curtail our civil liberties.


Government Incentives Are Different

Libertarians will argue that private companies have strong reputational and competitive reasons to make sure they are using algorithms in a productive manner, and they will be skeptical of excessive government regulation as a result. And rightly so! After all, many companies are responsibly using algorithms to create amazing innovations, and we don’t want to put the brakes on those developments. Governments already have established legal frameworks for governing discriminatory outcomes in sensitive areas, and while the courts may need to interpret and tweak their application for the digital age, there is no reason to think we need a sweeping new regulatory apparatus. For the most part, we should let the private sector experiment with this developing technology and avoid buying into technopanic. New regulations, if necessary, should be narrowly tailored to address specific bad outcomes rather than theoretical ones.

The most compelling concerns about the improper use of AI and algorithms stem primarily from government use of these technologies. Indeed, all the tangible examples of harm O’Neil cites in her essay are the result of poor incentives and structures designed by government: the admissions screening at a public teaching hospital’s medical school, teacher value-added models, recidivism risk models, and Centrelink’s automated debt recovery system. The poor results of these kinds of interactions, in which governments purchase algorithms from private developers, could be viewed primarily as a failure of the government procurement process. Government contracting creates opportunities for rent-seeking, and the process doesn’t benefit from the same kinds of feedback loops that are ubiquitous in private markets. So it should be no surprise that governments end up with inferior technology.

Libertarians, then, should be especially supportive of strong oversight and accountability for the use of algorithms and artificial intelligence when the government is exerting its power over individuals in areas like criminal justice. Take the Wisconsin case of Eric Loomis, for example. He was deemed “high risk” by proprietary risk-assessment software and sentenced to six years in prison, partly because of that designation. He appealed, claiming that he should be able to view the algorithm and make arguments about its validity as part of his defense, but his request was denied by the state Supreme Court. Regardless of the specific merits of the Loomis case, the larger idea of unviewable algorithms aiding in sending people to prison is extremely problematic; a fundamental aspect of due process is understanding why you were sentenced, along with a public explanation of the process. As the use of artificial intelligence in criminal justice continues to grow, this will only become more of an issue.

But the answer here isn’t to abandon the project or collapse into what I might call bias fatalism – the belief that bias is inevitable, so why bother. We need to push forward and advocate for strong institutional accountability over the use of AI – especially when people are being sent to prison. The safest path forward would be open sourcing the algorithms used throughout the criminal justice system. Requiring that all algorithms used to strip individuals of their civil rights be made available as open source software would mean government and civil society groups could regularly audit everything from the underlying data to the variable weights to help identify and root out problems. This may, on the margin, decrease the incentives for private developers to innovate and develop new solutions, but constraining the coercive power of the government requires a strong weighting toward transparency.

Open source is a fantastic tool to use in particular situations, like the justice system, but it seems unlikely to be the silver bullet for all possible government applications, from tax-fraud detection to child protection services. Sometimes we need to keep the exact weighting of the variables opaque to prevent gaming of the system, and sometimes transparency isn’t nearly as important as getting accurate predictions: although it is disputed, there is reason to believe there is some fundamental trade-off between accuracy and transparency in “black box” machine learning algorithms. But this brings us back to the earlier point that our weighing of the trade-offs between different governance systems should change based on the specific use case of the algorithm in question, rather than on the fact that it is “an algorithm.” There are lots of potential options, and a robust discussion around this topic certainly needs to continue.

 

A Positive Vision

My biggest fear is that someone reading O’Neil’s work would go on to become an activist against the use of algorithms, rather than for their use in a responsible manner. O’Neil herself recognizes that the pre-algorithm world often isn’t a better alternative, but doesn’t spend much time laying out a positive vision for the inclusion of these tools in the first place. One could be forgiven for thinking that the takeaway is despair: if including mathematical decisionmaking models opens you up to criticisms of entrenching racism with no benefits, then why bother?

But places where human bias is most prevalent offer some of the most exciting opportunities for the application of algorithms. Humans appear to be really, really bad at administering justice by ourselves. We judge people based on how traditionally African American their facial features look, we penalize overweight defendants, we let unrelated factors like football games affect our decision making, and more fundamentally, we can’t systematically update our priors in light of new evidence. All this means that we can gain a lot by partnering with AI, which can offset some of our flaws.

Algorithms can also be used as a catalyst for other, more fundamental, reforms of our system. New Jersey, for example, recently reformed its bail system, replacing cash bail with a risk assessment tool. The result has been a 20 percent decline in the jail population in the first six months alone. But without the algorithms to augment the new system, it seems unlikely this reform would have happened.

We cannot afford to collapse into bias fatalism; our own human failings are too great to leave unaddressed. Greater integration of algorithms into our society poses risks, and O’Neil certainly brings up many important questions. But with the proper safeguards we can slowly find and remove many forms of machine bias while beginning to constrain our own.


“Neutrality” Isn’t Neutral

In Cathy O’Neil’s essay about the hidden bias of machine learning algorithms, she offers a simple and obvious truth about the nature of the algorithmic tools that increasingly influence many of the most important decisions in our society: “AI and machine learning algorithms are marketed as unbiased, objective tools. They are not.”

Her words indict not just the failings of a specific technology, but also the larger fiction of neutrality that invisibly coats so much of the world as we understand it. In linguistics, there is the notion of deixis: the idea that language is coded, inevitably, with countless markers of positional context. The words “I” or “here” are not monoliths of meaning that stand self-sufficient and independent of each other, but indicators of position that are inescapably enmeshed in notions of “you” and “there.” You cannot exist, or express that existence, without coming from a particular place and pointing at another.

As David Foster Wallace observed in his speech “This is Water,” the formative conditions of our lives and our society are often as invisible to us as they are real, and difficult to see precisely because we are so suffused in them. A fish does not contemplate the nature of water, much as we do not always contemplate the foundational impact of our families, our governments, our societal perceptions of value and ability. We simply move through them, if we are able to do so without friction, and call them the world.

Remaining empathetic as well as critical of the darker and unquestioned forces that move beneath the surface of our culture is often a challenging task, but also an essential one if we hope to escape the solipsism of walking through life from behind only one set of eyes, and one set of experiences.

“The really important kind of freedom,” said Wallace, “involves attention and awareness and discipline,” the ability to question the ground we walk upon with every step we take, rather than striding forth with the incurious blinders of inherited “truth” wrapped firmly around our eyes. We do this only by cultivating our own awareness about “what is so real and essential, so hidden in plain sight all around us, all the time, that we have to keep reminding ourselves over and over.”

Too often, we consider the world—and the data, tools, and systems that make it run—from a perspective that takes the biases of its creators for granted, and institutionalizes their blind spots in ever more troubling ways. In the case of algorithmic decisionmaking tools, when we ignore the bias of the data that informs them and the notions of success and accuracy that their code expresses, we create tools that can become substantially worse than human decisionmaking. A human, even a biased one, can always decide simply to take a chance on someone, or to make an exception based on a “gut feeling.” A machine that is trained to reiterate and even amplify existing biases cannot. Our technological children have no empathy or situational awareness to temper their worst impulses, and ours; they merely run the scripts we have given them, without context or contemplation.

When we talk about technology being neutral, we are engaging in a more specialized version of a much larger argument: the idea that the world itself is neutral, and that the particular configuration of reality that we engage with from day to day is somehow natural and fair, rather than a product of countless historical, economic, environmental, and cultural forces that were as incidental as they were formative.

Algorithms do not emerge fully formed into existence like digital Athenas springing from the head of a perfectly objective Zeus. They are crafted by the hands of flawed human beings and operate on data that is generated by the same. Much like our words, they are always saying something about where we are coming from and where we are going. We should pay attention to what we are saying, especially in the moments when we think we are not speaking at all. To ignore this is to perpetuate all of the biases that course unexamined through the veins of the world.

“Neutrality” has always been the province of the powerful. Much as whiteness is often perceived as not having a race and maleness as not having a gender, existing within the dominant social groups of a society has historically meant not having to examine the forces that empower them or produce them. The ability to bathe in the frictionless ease of being the default—of never having to consider these questions of bias at all—is the height and definition of unearned advantage.

So let us question them, and let us stop handing over the most important decisions that our institutions can make—who gets jobs, loans, prison sentences, and opportunities—to systems whose methodologies are opaque and whose impacts on the most vulnerable members of society may be substantially worse than the mistakes we might make with our own human hands. If algorithms can truly do better than us and create fairer outcomes, then let them prove it first, before we hand them the keys to the kingdom and absolve ourselves of what happens next simply because the outcomes are generated by machines.

We will always be coming from somewhere, a place that is as complicated, conflicted, and complicit with the forces of power that shaped us as the things we shape in return. There is no neutral way to approach algorithms, no way to reduce the complexity of the world to the sleek simplicity of ones and zeros, no matter how seductive the idea may seem. To be responsible and ethical is to demand acknowledgement and transparency about the choices that we make—especially the ones that might not initially seem like “choices” to us at all—and the values that they code into the technological foundation of the world.

“It is unimaginably hard to do this, to stay conscious and alive in the adult world day in and day out,” concluded Wallace, “which means yet another grand cliché turns out to be true: your education really is the job of a lifetime. And it commences: now. I wish you way more than luck.”

What to Expect When Everyone’s Collecting

Cathy O’Neil’s essay, like her engaging book Weapons of Math Destruction, provides a valuable counterweight to our tendency to be overawed by imposing mathematics, granting the products of machine learning and big data analysis an unexamined aura of objectivity. Since my own focus is on privacy and surveillance policy, I’ll even add a few items to her bill of indictment.

One of O’Neil’s central concerns about machine learning is its potential to generate vicious feedback loops: Subsets of the population are algorithmically branded “high risk”—whether for default on loans or criminal recidivism—and that judgment, echoed across multiple institutions employing similar algorithms, ultimately contributes to the fulfillment of its own prophecy, seemingly validating the model that generated it. But there’s also a potential feedback loop on the input side, as the prospect of reaping gains—commercial or otherwise—from sophisticated algorithmic analysis generates demand for more data to train and feed ever more complex models.

Two related technological developments are jointly responsible for much of that algorithm food: The precipitous decline in the cost of data storage—to the point where even seemingly useless data can be stored indefinitely at trivial expense—and the explosive growth of networked computing technologies that generate structured data records of the human actions and interactions they mediate as a side effect of their ordinary operation. As big data analytics provide a means of monetizing information, data that was once, in effect, a digital waste product—granular records of how a reader’s mouse moves around a Web page, say—is increasingly retained, either for internal use or for sale (perhaps in theoretically anonymized form) to data brokers. If storage is cheap enough, it may be worth simply defaulting to retaining nearly everything, on the theory that even if it’s not useful now, some analytic use may later be found for it.

Nor, increasingly, is that approach limited to the digital realm. Cell phones are networked, sensor-enabled devices that can be used to amass a dizzying array of useful types of data that would until recently have been infeasible to gather at scale—allowing driving apps like Waze to make better predictions about traffic patterns and delays based on historical user data. In spaces like Walt Disney World, foot traffic is monitored just as meticulously via electronic bracelets that serve as both tickets and trackers. Retail goods now routinely arrive on the shelf bearing RFID tags to help give brick-and-mortar shops the same analytic insights into consumer behavior that their online counterparts take for granted.

Most of this is benign enough in itself. It’s precisely because the data collected has value that we’ve grown accustomed to getting valuable online services for free, and few Disney visitors mind having their stroll through the park tracked if it helps Big Rodent provide a better experience. But in the aggregate, the imperative to feed all those hungry algorithms generates both massive pools of data and, perhaps more importantly, pervasive architectures of surveillance that can subsequently be repurposed—whether by their owners, malicious attackers, or law enforcement and intelligence agencies. Security experts routinely recommend minimizing the retention of unnecessary data, both to reduce the attractiveness of databases to attackers and mitigate the harms of a breach if it does occur. Exploiting big data, more or less by definition, means doing the reverse.

Thanks to a quirk of American jurisprudence, personal information effectively loses its presumption of Fourth Amendment protection when shared by third party businesses. That’s why, to pick a notorious example, the NSA’s bulk collection of Americans’ telephone records, publicized by Edward Snowden, did not require the kind of particularized search warrant that would be required to enter a home and rifle through private papers. That constitutional asymmetry means that as ever more useful data is collected by private firms, intelligence and law enforcement naturally gravitate toward investigative methods that exploit those resources when possible, sometimes to establish the probable cause needed to seek judicial authorization for a physical search or electronic communications surveillance—sometimes obviating the need to do so altogether. That temptation is particularly strong for intelligence agencies, whose mandate is not to punish particular crimes after the fact, but to anticipate and preempt terrorism or espionage before they occur. 

At NSA, that led to the self-conscious adoption of a “collect it all” approach—presumably on the theory that if the attacks of 9/11 represented a failure to “connect the dots,” the solution was to collect more dots. But, as Jim Harper and Jeff Jonas argued in a Cato policy paper more than a decade ago, terrorism is rare enough and its manifestations mutable enough that data mining approaches are sure to yield vastly more false positives than true hits. In the years after 9/11, exasperated FBI agents were known to complain about time and resources wasted following up “Pizza Hut leads” generated by the intelligence community—because, say, a phone number cropping up in the call records of a seemingly suspicious number of terror suspects turned out to be the local pizza parlor. 

One solution to this problem, of course, is to gather yet more data, in hopes of refining one’s algorithms and adding additional variables that help exclude false positives. But the data sets necessary to do this are often enormous. Consider, for instance, an NSA program known as CO-TRAVELLER, which seeks to map the social networks of foreign targets, not by looking at electronic communications links, but by using cell phone location data to identify people who are meeting up in person. The trouble, of course, is that you can’t do this by looking at the records of your target, which won’t have an entry for “people nearby.” Rather, you need to analyze everyone’s location records and find statistically anomalous pairings in the sea of human motion. But since even that is bound to generate a substantial number of purely coincidental matches, you likely need to consult still more data sets to figure out which of the leads thus generated are promising and which are innocuous.
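
The program’s actual methods are not public, so the following is only a rough sketch of the general technique the paragraph describes, run on entirely invented records: bucket location data by tower and time window, then count how often two phones land in the same bucket.

```python
# Entirely hypothetical: a crude illustration of co-traveler analysis, not any
# agency's actual method. Each record is (phone_id, cell_tower, hour bucket).
# Pairs of phones that share many buckets get flagged, along with, as the essay
# notes, plenty of purely coincidental pairs.

from collections import defaultdict
from itertools import combinations

records = [
    ("phone_1", "tower_17", "2017-06-01T09"),
    ("phone_2", "tower_17", "2017-06-01T09"),
    ("phone_1", "tower_42", "2017-06-01T18"),
    ("phone_2", "tower_42", "2017-06-01T18"),
    ("phone_3", "tower_42", "2017-06-01T18"),   # a one-off, likely coincidental
]

# Group phones by the (tower, hour) bucket they were seen in.
buckets = defaultdict(set)
for phone, tower, hour in records:
    buckets[(tower, hour)].add(phone)

# Count co-occurrences for every pair of phones seen in the same bucket.
pair_counts = defaultdict(int)
for phones in buckets.values():
    for a, b in combinations(sorted(phones), 2):
        pair_counts[(a, b)] += 1

MIN_MEETINGS = 2   # arbitrary cutoff; a real analysis would model expected chance overlap
for (a, b), count in pair_counts.items():
    if count >= MIN_MEETINGS:
        print(f"{a} and {b} co-located {count} times")
```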

Intelligence analysts sometimes refer to this as “drinking from a firehose”—the quantity of data is so enormous that sifting through it becomes overwhelming. But this complexity poses challenges for oversight as well. If we think back to the political abuses of intelligence in the 1960s and 70s, we find instances where misconduct could be identified by observing that a wiretap or other method of data collection had been carried out on a politically sensitive target without any clear legitimate purpose, or had continued even after a legitimate investigation had been wrapped up. When data is collected at a population scale, abuse can far more easily hide in the crowd.

That’s especially true when that same size and complexity make “innocent” violations of the rules more common. In documents disclosed in 2013, the government explained to the Foreign Intelligence Surveillance Court that rules for querying the NSA’s vast telephone database had been routinely violated, not as a result of malfeasance, but because the system was so vast, complex, and compartmentalized that nobody within the agency understood how the pieces fit together.

Having made this case, I should note that while the dynamic I’ve described here is exacerbated by the rise of machine learning and data mining, that is hardly the sole factor, and it would be somewhat odd to focus on the algorithms if our aim is to remedy privacy problems. And I think something similar can be said of many of the problems O’Neil discusses in her essay and book.

One section of Weapons of Math Destruction, for instance, deals with the ways predatory businesses, and even outright fraudsters, can employ large data sets to target their ads at vulnerable populations. But this is hardly a problem with targeted marketing as such—or with the telephone and e-mail networks those businesses use to reach their targets. It’s just another instance of technologies with general utility regrettably facilitating the conduct of bad actors as well as good ones.

A more plausible case of a harm originating with algorithms comes in a section where O’Neil considers how adjusting car insurance premiums based on risk—does the policy holder live, work, or commute through areas with a heightened risk of theft or vandalism?—compounds the burdens on the poor. But O’Neil doesn’t really make a case that these assessments are systematically wrong, so much as that they seem unfair: The poor may have no option but to live in higher risk neighborhoods, and being faced with higher premiums as a result is just one more burden. And that may be true, but if we think there’s a social obligation to aid those who lose out as a result of such risk analysis, it’s not clear why the sensible remedy is to try to shift real risk to insurance companies, obscuring market assessments of that risk in the process.

In other cases, algorithms don’t so much introduce problems as make them harder to ignore. If racial and gender bias influence hiring and promotion, we may be able to detect the problem by either experiment or analysis of labor force statistics, but it’s notoriously difficult to remedy, especially if the bias is often subconscious, since the problem is the upshot of millions of people at thousands of businesses making discrete decisions that may well seem individually defensible. If those hiring practices are used to train an algorithm for screening employment applications, we at last have a single target to train our disapproval on. And often, of course, we should—provided we recognize that this is likely an improvement over the scenario where the same biases are hidden in the black-box algorithm that is human judgment. Precisely because algorithmic biases are easier to monitor and tweak than their human counterparts, we should be wary of approaches that make it less costly to fall back on human judgment as a means of concealing rather than remedying those biases.

Instances where algorithmic analysis of big data simply produces wrong results seem like the easiest case, and the one where the case for some external policy remedy is weakest. The loss that results from an algorithm that tells a bank to pass on a loan that would be repaid, or an employer to pass on a candidate who’d perform well, may not be large enough to spur a reweighting in the individual instance, but over time—and especially for larger firms—any truly systematic analytic failure, affecting significant numbers of people, is likely to impose enough of a cumulative cost that old fashioned greed in a competitive market provides adequate motivation for gradual improvement.

The toughest cases are apt to be of the sort we considered at the start: Where algorithms are widely enough used that they generate feedback loops that seem to validate their own predictions. Even here, though, the problem is often somewhere other than the algorithm. If a model for predicting recidivism gives longer sentences to some convicts, and the longer sentence itself increases the likelihood of recidivism upon release—because the prisoner has now spent years in a criminal social milieu and their marketable skills have atrophied—that’s a serious problem, but it’s fundamentally a problem with the carceral state, and one that would seem to recommend less punitive sentences across the board, not a tweaking of the process by which they’re allocated.

The feedback loop cases also seem like the ones least amenable to remedy by regulatory intervention in the market context, because barring a monopoly, the problem will be less a function of any one process than of how many of them interact over time. Problems of this type may be quite serious, but they’re also the least likely to be spotted in advance.

That’s why, despite my sympathy with much of the argument O’Neil advances, I’m not terribly sanguine about the idea of giving some regulatory body responsibility for ex ante review of the algorithms deployed by private firms. The biggest downside to such an approach, I suspect, would not be the monetary expense so much as the friction imposed when problems that were unforeseeable based on scrutiny of code in isolation become evident in the wild. It would be counterproductive to make denying a problem for as long as possible less expensive than fixing it, yet that seems unavoidable if each iteration of the trial-and-error process of tweaking an algorithm must pass through a fresh layer of regulatory review.

More promising, in my view, would be an intermediate approach: Encourage firms to let outside researchers review their models with an eye toward identifying potential harms that may not reduce short-term profits, and thus are more likely to fly beneath the radar of in-house coders. If there are systematic problems that seem to arise in particular sectors, that should prompt a discussion about sector-specific solutions—armed with the essential general insights O’Neil has provided.