Power doesn’t lie in gathering data but in analysing it

The revelation that the US’s National Security Agency (NSA) is gathering phone call metadata and online interactions such as e-mails and discussion forums has the world abuzz.

Legal issues aside, there is a broader debate to be had about Americans’ belief that technology alone can solve the US’s gravest problems.

Data mining technology is only as good as the inherently human effort to determine which data are relevant. This is art as much as science. Unless we begin to value this critical human effort, data mining will not yield results that make us safer.

Americans have always excelled at technological innovation and have admired the alluring rationality of science. The country grew rich on the readiness of Henry Ford and others to ask humans to imitate the regularity and efficiency of machines.

Today, the defence establishment seeks ways to eliminate human error from the all-too-human practice of war. Hence the development of “smart” weapons able to hit targets with superhuman precision.

The ability of a supercomputer to crunch incredibly large data sets has allowed some to argue that we can bypass human analysis altogether.

The underlying belief in the “power of big data” inverts reality. It isn’t the data that are powerful. It’s the people with an insightful grasp of the context of a particular phenomenon who are powerful. And it is their ability to build algorithms that capture the expression of those contexts in massive data sets that we should be focusing on.

Right after the NSA story broke, former CIA director Leon Panetta’s chief of staff, Jeremy Bash, remarked: “If you’re looking for a needle in the haystack, you need a haystack.”

Not so. Actually, what you need is an accurate narrative, or theory, about the needle and how to characterise it.

Moreover, if you are planning to search in a digital haystack, the needle’s characteristics must leave digital indicators.

There must be enough examples of needles in the world for researchers to be certain that they can distinguish a needle from a stalk of dried grass. Without a precise sense of how to recognise a needle, all you will get is a lot of false positives.

Any process for selecting particular data from a larger set represents a story about the world outside the data.

It has been almost a quarter of a century since Princeton professor Orley Ashenfelter used statistics on rainfall and temperature to predict the quality of Bordeaux wines.

The reason that Ashenfelter could compute the value of a wine using statistics is that he had developed a strong theory about how rainfall and temperature combine to produce good wine. In other words, he imposed a pre-existing story onto data and correctly collected the particular data that served the story.

Imagine what would have happened if he had also collected statistics about the rise and fall of the population in France, the number of agricultural strikes per year and the number of cars travelling on national highways each month.

These are all eminently collectable data points, but he might not have had such success and might simply have drowned in an oversupply of information.

In the past decade, the US has by all appearances put enormous energy and resources into developing the capability to collect and process digital and digitisable data, and to make the data available to analysts in a format that makes intuitive sense. The technological challenges are not insignificant and the advances that have been made are impressive.

But these technological challenges pale before the much more complex task of determining the factors and circumstances that lead people to political violence.

And they pose a mere fraction of the challenge analysts have in transcending their own and their institutions’ biases and assumptions as they develop their theories about the meaning of the data.

These assumptions, as we know, are often unconscious. Some of them lie in the human predisposition to take mental shortcuts in order to make sense of the complexities around us. These include the tendency, described by Nate Silver in his recent book The Signal and the Noise, to elevate the importance of those data that are easy to access and dismiss data that are difficult to collect.

The content of online searches may not be the best data for analysing political violence. But it is easier to collect than it is to develop a nuanced, on-the-ground understanding of behind-the-scenes conspiracy building in, for example, Peshawar.

In other words, if you are looking for a needle, and collection technology makes it easy to build a haystack in which to look, it would be an entirely understandable human tendency to elevate the importance of the haystack.

The task facing those seeking to use data mining to support counterterrorism is not fundamentally different from the detective work that has always faced the investigator.

Intelligence is the job of selecting evidence and putting it together into a plausible narrative. But it also requires having a nuanced sense of which evidence to look for to fill in the developing story. Poor stories will lead to poor data extraction. Collecting more data will not solve the problem.

We will not make the necessary advances if we focus on technological capability without an equally strong focus on our human capabilities.

We need critical thinking that helps us defend against our own biases, knowledge of the societies and histories with which we are engaged, and an imaginative and nuanced understanding of how statistical data do (and do not) express social patterns.

In order to understand problems that are fundamentally social and political, such as international terrorism, analysts need encouragement from their leadership to relentlessly interrogate their own narratives. Is the story we are imposing on the data the right one? Are we exploring the right data? Are we using the data because they are the right source for insight or simply because they are available?

This encouragement can be reflected in the allocation of resources to projects that develop the human side of cyber security.

And it needs to be reflected in the education and training of national security professionals, in the hiring process and in a culture that appreciates the degree to which cyber security is a human endeavour.

Above all, Americans should recognise their technological bias and their tendency to tell themselves the story that technology has self-generating power.

Perhaps that means developing a greater faith in their ability to stay critically engaged in a complex world using the power of knowledge and imagination. That would be an excellent starting point to learn the right lesson from the NSA story.

Amy Zalman is the Department of Defence information integration chairwoman at the National War College in Washington, DC. She is also an adjunct instructor at the Joint Special Operations University at US Special Operations Command in Tampa, Florida. The views expressed in this article are solely those of the author and do not necessarily reflect the policies or positions of the National War College or the US government. This article is reprinted with permission from theGlobalist.com.