In the world of humans, Brian Russell is a regular blue-collar guy. Stocky with a shaved head, black-rimmed glasses and a tightly trimmed Van Dyke, he pulls down steady hours at his job installing security systems. Every night, he drives his old green Jeep home to a freshly planted subdivision of modest ranch houses outside the squeaky-clean West Michigan town of Zeeland. Trucks moan past on the freeway out back and the dewy-sweet smell of cut grass follows him to the door. His dog, Mischief, his fiancée and their two boys greet him. All seems right with the world.
But this world – the one we can see and touch and smell – is no longer the only one that matters. Another domain, built by humans but ruled by computers, has taken shape in the past few decades: that of algorithmic decision-making.
This new world is often invisible but never idle. It likely determines whether you’ll get a mortgage and how much you’ll pay for it, whether you’re considered for job opportunities, how much you pay for car insurance, how likely you are to commit a crime or mistreat your children, how often the police patrol your neighborhood. It even influences the level of prestige conferred by a U-M degree, thanks to the now-ubiquitous, algorithm-based U.S. News & World Report college rankings.
Generally, these algorithms keep a low profile. But occasionally, they collide spectacularly with humans. That’s what happened to Russell.
In 2014, a computer system called MiDAS plucked his file out of the Michigan Unemployment Insurance Agency database and calculated, without any human review, that he had defrauded the unemployment system and owed the state of Michigan approximately $22,000 in restitution, penalties and interest – the result of a supposed $4,300 overpayment, plus Michigan’s customary 400 percent penalty and 12 percent interest. Then, still untouched by humans, MiDAS began to collect. It seized more than $10,000 from Russell by electronically intercepting his tax refunds in 2015 and 2016. He knew nothing about the fraud determination until his 2015 tax refund disappeared.
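The arithmetic behind a bill like Russell's is simple but punishing. A minimal sketch, using the figures reported above and a hypothetical interest period, shows how a $4,300 discrepancy becomes a five-figure debt:

```python
# Rough arithmetic behind an automated fraud bill, using the figures
# reported in this story. The interest period is an assumption made
# for illustration only.
overpayment = 4_300                    # alleged overpayment, in dollars
penalty = 4.0 * overpayment            # Michigan's 400 percent fraud penalty
principal = overpayment + penalty      # $21,500

annual_interest_rate = 0.12            # 12 percent per year
years_outstanding = 0.25               # assumed; varies by case

interest = principal * annual_interest_rate * years_outstanding
total_owed = principal + interest

print(f"${total_owed:,.0f}")           # about $22,000
```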
Russell simply couldn’t afford the five-figure hit to his income. For the next two years, he made ends meet the best he knew how – he cancelled family trips, cut back on medical care for his diabetes, worked odd jobs. For a time, he lived in a friend’s basement.
While Russell struggled in the aftermath of the fraud determination, MiDAS kept rolling. An algorithm-based administration and fraud collection system implemented by the state of Michigan, it ran without human intervention for nearly two years between 2013 and 2015. During that time, it accused about 50,000 Michiganders of unemployment fraud. A 2017 review by the state found that more than 90 percent of those accusations were false.
Russell still doesn’t know why MiDAS accused him of fraud. He collected unemployment on and off a few years back when he was working as a journeyman electrician. Like generations of electricians before him, his union filed for unemployment on his behalf when he was between jobs. He can’t see the system, can’t touch it, can’t talk to it, can’t ask it why it has taken his money. The Michigan Unemployment Insurance Agency hasn’t shared any information with him.
“How do you beat something you can’t see?” Russell said. “It’s like swinging in the dark. What are the laws that apply to a computer system? And what about us humans?”
That’s a question that, increasingly, is troubling the architects of the algorithmic world. They’ve dedicated their careers to data, certain that it would make life more fair, equitable and efficient. In some cases, it has. But as algorithmic decision-making has become more and more powerful, some researchers have become increasingly concerned that it’s not living up to their vision.
A growing number of people, like Russell, have been harmed by an algorithm gone off the rails. Algorithm-based financial systems were found to have helped spark the 2008 housing crisis. Residents of neighborhoods targeted by predictive policing systems feel besieged by an unending wave of police scrutiny. The U.S. News & World Report college ranking system has been criticized as distorting academic priorities and raising the cost of an education.
In many cases, the researchers who helped create the algorithmic world have turned their attention to rebuilding it in a form that’s fairer, safer and more sophisticated. At times, their work takes on an ironic David-and-Goliath quality as they work to hem in the massive entity they helped create.
H.V. Jagadish, the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science at U-M, has been tapped to lead the Center for Responsible Information Technology, a U-M think tank that’s now in development. It will aim to help technologists use algorithmic systems and other IT advances in a socially responsible way.
“In the early days of any new technology, you want raw creativity and bold ideas; you want to make technical progress as quickly as possible. As the technology matures and has a greater reach, you have to take into account its impacts on society,” Jagadish said. “In the early days of the industrial revolution, for example, there was tremendous pollution, but over time the need to control pollution became well recognized. Thereafter, there has been a lot of technical work to maximize the benefits and minimize the harms of industrialization.
“I expect that with algorithms and data science and artificial intelligence it’s going to be the same,” he said. “A lot of smart people are starting to think about this and over the next several years, we’ll adopt best practices that have as little harm as possible for the good they can do.”
Changing the world – any world – is a series of small steps. And researchers are just beginning what could be a decades-long journey.
NEW WORLD, NEW PROBLEMS
Some engineers are working to make sure that the data that goes into decision-making systems is better parsed and more thoroughly understood. Others are working to help non-engineers gain a better understanding of the algorithmic tools that shape their lives. And still others have assumed the role of cyber-vigilante, testing algorithms in the wild to ensure that they’re helping humans, not harming them.
One of those engineers is Danai Koutra, a U-M computer science and engineering assistant professor who is working to build a deeper understanding of the data that goes into algorithmic decision-making tools. From an engineering standpoint, she says, it’s too easy to find a simple correlation and use it, without looking deeper to see why that correlation exists or how the data behind it might be flawed.
“People are flawed. We make biased decisions,” Koutra said. “And now we’re asking algorithms to be better than we are, based on data that we produce. That’s an interesting ask.”
MiDAS, for example, relied on a single, simple correlation: discrepancies between the data reported by claimants like Russell and data reported by their former employers. If an employee and employer reported different amounts of income, for example, or different reasons why an employee left a job, it was flagged as possible fraud. By default, it was assumed that the employee was the one who reported incorrectly with the intention of defrauding the system. An automated questionnaire was sent to the employee – often at a years-old address – and if the agency didn’t receive it back within 10 days, MiDAS made an automatic determination that the employee had committed fraud.
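MiDAS's internal code isn't public, so the logic described above can only be sketched hypothetically. The short reconstruction below is based solely on this account; the field names, the 10-day window and the default-to-fraud behavior mirror the description, and everything else is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical reconstruction of the decision logic described above.
# Nothing here comes from the real MiDAS code; it only illustrates how a
# single unexplained discrepancy plus an unanswered questionnaire could
# become an automatic fraud finding.

RESPONSE_WINDOW = timedelta(days=10)

def flag_possible_fraud(claimant: dict, employer: dict) -> bool:
    """Flag a claim when claimant- and employer-reported data disagree."""
    return (claimant["reported_income"] != employer["reported_income"]
            or claimant["separation_reason"] != employer["separation_reason"])

def determine_fraud(questionnaire_sent: date, response_received: Optional[date]) -> bool:
    """Default to a fraud finding if no response arrives within 10 days."""
    if response_received is None:        # letter lost, sent to an old address, never seen
        return True
    return response_received - questionnaire_sent > RESPONSE_WINDOW

def assess_bill(overpayment: float) -> float:
    """Restitution plus the 400 percent penalty; interest accrues on top."""
    return overpayment + 4.0 * overpayment
```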
Koutra is working to develop systems that handle data in a more nuanced way, partnering with Jagadish on GeoAlign, a project that aims to clean up the algorithmic world’s geographic data. Present in some 80 percent of algorithmic datasets, geographic data is a key component in the measurement of everything from credit scores to crime rates. Comparing how two different variables – say, home ownership and crime rate – shake out over a given geographic area is endlessly useful.
But geographic data is notoriously messy, largely because different agencies measure it in different ways. How do you compare home ownership and crime rate, for example, if home ownership is measured by ZIP code and crime is measured by county?
Mathematically, it might be tempting to assume that crime is evenly distributed across the county and simply slice the county map into ZIP codes, attributing a given percentage of the county’s crime to each ZIP code based on its geographic size. But in reality, crime tends to cluster in certain areas. So that approach would almost certainly lead to a faulty algorithm – and potentially life-changing consequences for the people affected by it.
GeoAlign takes a more sophisticated approach. It uses what’s called a “crosswalk algorithm” to find other variables in the dataset that correlate with the ones being studied and are available on a finer geographic level. It then uses those additional data points to infer the geographic distribution of the data that’s being studied.
It might find, for example, that crime is closely correlated with the number of tax-delinquent properties in a given area – and that the latter piece of information is available on an address-by-address level. The system crunches the variables to determine exactly how closely they hew to one another, coming up with a weighted formula that can more accurately port geographic data from one unit of measurement to another.
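The crosswalk idea can be sketched in a few lines. The example below is a simplified, hypothetical illustration of reallocating a county-level count to ZIP codes in proportion to a single correlated proxy; GeoAlign itself weighs multiple correlated variables rather than just one, so this is the intuition, not the method.

```python
# A minimal crosswalk sketch: distribute a county-level count across its
# ZIP codes in proportion to a proxy variable that is known at the ZIP
# level and correlates with the quantity being reallocated.
# All numbers here are made up for illustration.

county_crime_count = 1_000   # known only at the county level

# Proxy known per ZIP code (e.g., tax-delinquent properties).
delinquent_properties = {
    "49423": 120,
    "49464": 30,
    "49424": 50,
}

total_proxy = sum(delinquent_properties.values())

# Allocate the county total in proportion to the proxy, rather than by
# land area, which would wrongly assume crime is spread evenly.
estimated_crime_by_zip = {
    zip_code: county_crime_count * count / total_proxy
    for zip_code, count in delinquent_properties.items()
}

print(estimated_crime_by_zip)
# {'49423': 600.0, '49464': 150.0, '49424': 250.0}
```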
GeoAlign shows that honing the data behind automated decision-making systems isn't simple. But as algorithms work their way into more and more areas of our lives, there's less and less room for error. When an algorithm recommends the wrong movie, it's laughable. When it declares you ineligible for medical coverage or saddles you with a massive debt? Not so much.
“It goes back to understanding what your data is representing,” Koutra said. “Data is never perfect, but if you understand how it’s skewed, then you can build the math to account for that. As powerful as data is, it’s up to us to interpret it.”
CRACKING THE CODE
But getting the math right is only one piece of the puzzle. If humans and algorithms are to play well together, it’s also critical for everyone who works with algorithms – not just the engineers who build them – to have some understanding of what goes into their decisions and how they affect people. People like Russell, for example, need to understand how a system like MiDAS uses their data, and how they might be affected by its decisions.
U-M’s H.V. Jagadish and researchers at Drexel University, the University of Washington and the University of Massachusetts Amherst are working on a National Science Foundation-funded solution that he calls “a nutritional label for rankings.” He believes it could make algorithms like MiDAS more transparent to both the agencies that run them and the end users who are affected by them. Much like the nutrition labels on grocery store food packaging, his system breaks down an algorithm into objective, easily understandable attributes to rate its fitness for use.
“When I’m looking at a box of cereal in the grocery store, I want to know what’s in it, but I don’t need to know the process that was used to manufacture it,” he said. “It’s similar for algorithms. The average person doesn’t need to know all the details of every algorithm, but they do want to know how it uses their data and whether it might harm them.”
While it’s a very new idea, Jagadish envisions such labels being applied to algorithms at every step of their use. In the MiDAS example, such labels might have helped the state of Michigan better understand how the system worked before it was unleashed on the population.
“Often, the person running the algorithm isn’t the person who wrote it, and entities like the state of Michigan basically have to take the word of the software provider,” Jagadish said. “A nutritional label could give decision-makers enough information to know what they’re signing off on, without requiring them to become experts in the software.”
Once an algorithm is rolled out, the labels could be applied to publicly visible parts of the system, for example the parts of the MiDAS system that claimants like Russell interact with. This would let the people affected by an algorithm get at least a basic idea of how it makes its decisions. It could have let Russell, for example, know that even a rounding error made by his employer could result in a fraud determination. It could also alert potential investigators like journalists, university researchers and private-sector experts when flawed algorithms are being used in sensitive areas.
RIGGED SYSTEMS?
There’s one big weakness in solutions like nutrition labels – they assume that once a software engineer or corporate IT manager has the facts, they’ll do the right thing. And in the algorithmic world as in the human, that isn’t always the case.
Algorithm designers have been tilting the field in their favor for decades – one of the earliest examples is Sabre, the electronic flight booking system that American Airlines rolled out in 1960. Sabre enabled American Airlines agents to book flights electronically for the first time, using a system of dedicated computer terminals.
A major improvement over the system of push pins and bulletin boards it replaced, Sabre made it possible to book a ticket in seconds instead of hours. In the 1970s, it was expanded beyond American Airlines and made available to travel agents. By the 1980s, travel agents began to notice that the top search result was often an American Airlines flight that was more expensive and longer than those below it.
Suspicious that American Airlines had rigged its system to bump its own flights to the top of the list, the U.S. Civil Aeronautics Board and Department of Justice launched an antitrust investigation. American Airlines CEO Robert Crandall was brought before Congress. He was surprisingly matter-of-fact about the whole affair, owning up to the manipulation right away. Why, he argued, would he go to all the trouble of building an algorithm if he couldn’t rig it?
In 1984, Congress decreed that Sabre’s code must be made transparent to the public, and Sabre and American Airlines finally parted ways in 2000. Most algorithms, though, are still black boxes – and these days, they control far more than airlines. Naturally, this has compelled computer scientists and others to find ways to peek inside. One of those scientists is Christo Wilson, an associate professor of computer and information science at Northeastern University.
Wilson is an expert in a field called “adversarial testing,” which finds clever ways to determine how algorithms work, how well they’re built and whether they’re doing a good job of serving humans.
Occasionally, adversarial testers are able to get their hands on an algorithm’s source code, in which case they can simply dissect it and identify areas where improvements could be made. Wilson recently co-chaired a conference featuring a paper that used this method to scrutinize the algorithm that underlies PredPol, a predictive policing tool used in many major cities.
PredPol uses past crime data to predict which of a city’s neighborhoods are most likely to see crime in the future, enabling police departments to focus resources on those areas. It has been a hit with police departments, but the study found that it creates a feedback loop that funnels officers to the same neighborhoods over and over regardless of their actual crime rate.
According to the paper, officers find more crime where they patrol more and less crime in the areas where they patrol less. And as each day’s new crime data is fed into the system, PredPol points officers back to the same citizens – often poor and minority – over and over, creating an atmosphere of distrust and funneling people into the criminal justice system for the most minor offenses, while those in neighborhoods outside PredPol’s feedback loop face no such scrutiny.
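The feedback loop the paper describes is easy to reproduce in miniature. The toy simulation below is an illustration of the general mechanism, not of PredPol's actual model: patrols go wherever the most crime has been recorded, and only crime that a patrol is present to observe gets recorded. Two neighborhoods with identical underlying crime quickly diverge, with the one that started a single recorded incident ahead absorbing all of the attention.

```python
import random

# Toy simulation of a predictive-policing feedback loop. This is a
# generic illustration, not PredPol's actual model.
random.seed(0)

true_rate = {"A": 10, "B": 10}   # identical underlying daily crime
recorded = {"A": 1, "B": 0}      # A starts one recorded incident ahead

for day in range(365):
    # Send the day's patrol to the neighborhood with the most recorded crime.
    target = max(recorded, key=recorded.get)
    # Only crime in the patrolled neighborhood is observed and recorded
    # (each incident has a 50 percent chance of being noticed).
    recorded[target] += sum(random.random() < 0.5 for _ in range(true_rate[target]))

print(recorded)   # neighborhood A ends up with all the recorded crime; B stays at 0
```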
“There’s something tantalizing about this idea that data is truth, that people generate it and it’s an accurate prediction of reality,” Wilson said. “But if you talk to a statistician they immediately tell you no. In policing data and elsewhere, human error is often compounded. But we treat it as if it’s just true.”
The opportunity to dig into a juicy algorithm like PredPol is a rare treat for researchers like Wilson. Usually, they must use more elaborate methods to kick the code’s tires. In many cases they use what’s called a “scraping audit,” writing a script that makes repeated requests to a system and scraping data from the results that come back. In other cases, they use a “sock puppet audit,” which sets up a series of fake user accounts to interact with the system and analyzes the results.
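In practice, a scraping audit can be as simple as a loop that replays the same query under different conditions and logs what comes back for comparison. The sketch below is generic and hypothetical (the endpoint, parameters and response format are placeholders, not a real service), and, as the next paragraph explains, running something like it against a production site may violate its terms of service.

```python
import csv
import time
import requests

# Generic sketch of a scraping audit: repeat the same query under
# different conditions and record the results for later comparison.
# The endpoint, parameters and response format are placeholders.

ENDPOINT = "https://example.com/search"
QUERY = "laptop"
CONDITIONS = [
    {"label": "no_location", "params": {"q": QUERY}},
    {"label": "zip_49423", "params": {"q": QUERY, "zip": "49423"}},
    {"label": "zip_48104", "params": {"q": QUERY, "zip": "48104"}},
]

with open("audit_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["condition", "rank", "result"])
    for condition in CONDITIONS:
        response = requests.get(ENDPOINT, params=condition["params"], timeout=10)
        response.raise_for_status()
        # Assumes the placeholder endpoint returns a JSON list of results.
        for rank, item in enumerate(response.json().get("results", []), start=1):
            writer.writerow([condition["label"], rank, item.get("title", "")])
        time.sleep(2)   # be polite: rate-limit repeated requests
```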
Adversarial testing has amassed a solid track record of uncovering algorithmic tomfoolery. It has found that Google stacks search results in its own favor and uncovered the fact that Uber’s ride-hailing algorithm gives preferential treatment to certain neighborhoods.
But, perhaps not surprisingly, algorithm makers and users don’t always appreciate the attention of people like Wilson. Fake users and automated scripts violate most websites’ terms of service agreements, and courts have generally given those agreements the backing of federal law. In most cases, they’ve ruled that a terms-of-service violation runs afoul of the Computer Fraud and Abuse Act of 1986.
This is a problem for Wilson and his cadre of online vigilantes – in fact, he is one of five plaintiffs who are currently suing the United States Justice Department with the help of the American Civil Liberties Union. They hope to, among other things, decouple terms-of-service agreements from the Computer Fraud and Abuse Act.
Wilson argues that giving software developers and their lawyers the power to effectively write federal law creates an imbalance between algorithm designers and users, and that the imbalance will only become more dangerous as algorithms push their way into more and more aspects of everyday life.
“I suspect things will get worse before they get better because we keep moving toward more complex machine learning that’s harder and harder to interpret, and it’s becoming commoditized so that everyone has access to it,” he said. “There’s a lot of snake oil out there, and you run the risk of something awful getting entrenched.”
As for Brian Russell, he’s still in limbo. Some Michigan unemployment claimants have gotten their money back, but the state has yet to review Russell’s case, and it hasn’t provided him with any information about why he was accused of fraud and fined in the first place. With a steady job and marriage on the horizon, things are looking up for him, but he believes he’ll have to declare bankruptcy to put the tax seizures behind him once and for all. He isn’t counting on ever seeing the money the state took from him. In the meantime, the Michigan Law Unemployment Insurance Clinic has helped him get a cease-and-desist order that prevents MiDAS from seizing any more of his money.
The MiDAS system is still running today, though the state has made modifications and implemented human oversight that it says has solved the problem. Meanwhile, Michigan Law’s Unemployment Insurance Clinic continues to review automated fraud determinations.
“If I get my money back, that would be great. I meet other people around town who have gotten settlements,” he said. “But I try not to get my hopes up.”
A NUTRITIONAL LABEL FOR ALGORITHMS
U-M professor H.V. Jagadish’s labelling system for algorithmic systems evaluates their safety and effectiveness. It breaks down algorithms according to five key attributes:
RECIPE:
Lists the factors that the algorithm considers, along with the weight given to each factor by its designer. This provides an easily understandable window into the designer’s intentions.
INGREDIENTS:
Lists the same factors, but instead of showing the weight given to each factor by the designer, it shows the actual correlation of each factor to the algorithm’s final output. Put another way, the Recipe widget shows intentions while the Ingredients widget shows results.
STABILITY:
Determines whether slight changes in the data going into an algorithm can lead to wild and undesirable swings in the results coming out. The stability score is determined by feeding a ranking algorithm, for example, a range of hypothetical items and plotting the results on a line graph. A steeply sloping line indicates clear differentiation between ranks and a stable ranking. A flat line indicates an unstable ranking: the ranks are too close together and could be reshuffled by noise in the data or very small changes in the algorithm.
FAIRNESS:
Evaluates whether the results of an algorithm show statistical parity across a given attribute. In the MiDAS example, it could determine whether female claimants are flagged for fraud disproportionately to male claimants. A low fairness ranking could reveal a flaw in the algorithm that causes it to treat one group differently than another (see the sketch after this list).
DIVERSITY:
Related to fairness, shows a graphical distribution of both the diversity of the overall population in a range of categories, and the diversity of the algorithm’s results, providing a broader view of how different demographic groups are handled by the algorithm.
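A statistical-parity check of the kind the Fairness attribute describes can be computed directly from an algorithm's outputs. The sketch below uses made-up numbers in the spirit of the MiDAS example: it compares the fraud-flag rate for two groups and reports the ratio between them, one common (though not the only) way to quantify disparity.

```python
# Minimal statistical-parity check, with made-up numbers in the spirit of
# the MiDAS example: compare how often two groups are flagged for fraud.

def flag_rate(flags: list[bool]) -> float:
    """Fraction of claimants in a group who were flagged."""
    return sum(flags) / len(flags)

# Hypothetical outcomes: True means the algorithm flagged the claimant.
flags_by_group = {
    "female": [True] * 180 + [False] * 820,   # 18% flagged
    "male":   [True] * 120 + [False] * 880,   # 12% flagged
}

rates = {group: flag_rate(flags) for group, flags in flags_by_group.items()}
parity_ratio = min(rates.values()) / max(rates.values())

print(rates)          # {'female': 0.18, 'male': 0.12}
print(parity_ratio)   # 0.67, well below the 0.8 often used as a rule of thumb
```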