Aggregation is biased towards anonymity

Did the EU Court of Justice’s compromise on the right to be forgotten get its inspiration from a US law’s attempt at solving a logistical problem?

I’ve written about the bias of aggregation towards anonymity in Anti-Viral, published by SecurityCurrent. In that piece, I show how the EU’s decision reinforces the idea that aggregation, in this case the results of an online search, is somehow inherently biased towards anonymity. Only when you already know the subject’s name and use it in the search query might the results be limited by the subject’s right to be forgotten. The search engine is required to suppress “forgotten” data points only under those circumstances. If the query is more generic, then displaying those data points is OK.

The Court found:

36      Moreover, it is undisputed that that activity of search engines plays a decisive role in the overall dissemination of those data [about an individual] in that it renders the latter accessible to any internet user making a search on the basis of the data subject’s name, including to internet users who otherwise would not have found the web page on which those data are published.

(EU decision, emphasis mine)

In other words, if you already know a person’s name and are looking for things about them, then they should have some control over that.

BUT, if you’re searching for something generic and can therefore be assumed not to be looking for things just about that person, then their data can be included along with everyone else’s, because searches that are not subject-based are NOT focused on that person (i.e., they are anonymous).
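To make that rule concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Page` type, the `delisted_for` field, the naive term matching, and the example URLs are all hypothetical, and this is not how any real search engine implements delisting. It only shows the shape of the distinction: the same page is hidden when the query names the data subject and shown when the query is generic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    text: str
    # Hypothetical field: names of data subjects whose request to be
    # "forgotten" has been granted for this particular page.
    delisted_for: frozenset = frozenset()


def search(pages, query):
    """Return URLs of pages matching every query term, suppressing a page
    only when the query itself contains a delisted subject's name."""
    q = query.lower()
    terms = q.split()
    results = []
    for page in pages:
        text = page.text.lower()
        # Naive matching (an assumption): every query term appears in the page.
        if not all(term in text for term in terms):
            continue
        # Subject-based query: the searcher typed the person's name,
        # so the "forgotten" page is left out of the results.
        if any(name.lower() in q for name in page.delisted_for):
            continue
        # Generic query: the page is aggregated along with everyone else's.
        results.append(page.url)
    return results


PAGES = [
    Page("https://example.com/a", "Jane Doe property auctioned in 1998",
         frozenset({"Jane Doe"})),
    Page("https://example.com/b", "Jane Doe wins chess tournament"),
]

by_name = search(PAGES, "Jane Doe")            # subject-based query
generic = search(PAGES, "property auctioned")  # generic query
```

With this made-up data, the name query returns only the chess page, while the generic query still returns the auction page.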

How did they come up with that?

Perhaps they took their lead from the Federal Data Mining Reporting Act of 2007.  It defines “data mining” as an activity…

…involving pattern-based queries, searches, or other analyses of 1 or more electronic databases, where—

 (A) a department or agency of the Federal Government, or a non-Federal entity acting on behalf of the Federal Government, is conducting the queries, searches, or other analyses to discover or locate a predictive pattern or anomaly indicative of terrorist or criminal activity on the part of any individual or individuals;

(B) the queries, searches, or other analyses are not subject-based and do not use personal identifiers of a specific individual, or inputs associated with a specific individual or group of individuals, to retrieve information from the database or databases;

(Data Mining Reporting Act of 2007, Section 804(b)(1), emphasis mine)

The short Act is an attempt to require the Federal Government to be more transparent in its data mining activities. But I believe the people who wrote it must have recognized the logistical problem of reporting to Congress on every query run by every analyst in the course of every investigation. Indeed, the 2014 report to Congress that was recently issued in compliance with this Act is only 9 pages long.

To be clear, assuming all search queries are logged, reporting every query to Congress would be like asking the Legislature to read thousands of pages of system logs; no meaningful reading would be possible. In other words, the US definition had nothing to do with any “right to be forgotten” or any recognition that searching by subject is special. It was just about legibility and logistics.

In any event, these two fit together. The Federal Data Mining Reporting Act defines data mining as “not subject-based”, and the EU Court of Justice explicitly protects that very activity, i.e., queries that do not “search on the basis of the data subject’s name”, in its decision.
