When it is presidential primary news. When the “family feud” is more newsworthy than the data. When there are no fewer than four parties involved, each of which can be identified as a data custodian of one kind or another.
In a single sentence, the incident can be described as follows (the four data custodians are numbered in parentheses):
Through a temporary bug in the software maintained for the (1) Democratic National Committee (DNC) by a (2) third party service provider, individuals working for the (3) Bernie Sanders Campaign for the Democratic Presidential nomination were able to access some information on voters that should only have been accessible to individuals working for the (4) Hillary Clinton Campaign.
The Security professional’s description of the incident would go something like this:
A Role Based Access Control defect was introduced by a third party service provider when they performed a routine promotion of code into production. This resulted in unauthorized access of one entity’s data by another, valid entity. The access was by authenticated users and was logged by the system. The defect was corrected through the emergency change management procedure and data are no longer accessible to unauthorized users. Future incidents can be prevented by more robust testing of security controls prior to releasing code into production. The third party service provider has indicated they will be implementing such steps going forward.
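To make “more robust testing of security controls” a little more concrete, here is a minimal sketch of the kind of pre-release check that could flag this class of defect. Everything in it is hypothetical: the `User` and `Score` types, the campaign names and the `check_tenant_isolation` helper are my own illustration and say nothing about how the vendor's system actually works.

```python
from dataclasses import dataclass

# Hypothetical model: each score is owned by exactly one campaign (tenant).
@dataclass(frozen=True)
class User:
    name: str
    campaign: str  # the tenant this user belongs to

@dataclass(frozen=True)
class Score:
    voter_id: int
    owner: str     # the campaign that generated this score
    value: float

def visible_scores(user, scores):
    """Intended behaviour: return only scores owned by the user's campaign."""
    return [s for s in scores if s.owner == user.campaign]

def broken_visible_scores(user, scores):
    """Simulated defect: the ownership filter was lost in a release."""
    return list(scores)

def check_tenant_isolation(access_fn, users, scores):
    """Return (user name, voter_id, owner) for every cross-campaign leak."""
    leaks = []
    for user in users:
        for s in access_fn(user, scores):
            if s.owner != user.campaign:
                leaks.append((user.name, s.voter_id, s.owner))
    return leaks

SCORES = [
    Score(1, "campaign_a", 0.9),
    Score(1, "campaign_b", 0.2),
    Score(2, "campaign_a", 0.4),
]
USERS = [User("alice", "campaign_a"), User("bob", "campaign_b")]

# A release gate would refuse to promote code when the leak list is not empty.
good_leaks = check_tenant_isolation(visible_scores, USERS, SCORES)
bad_leaks = check_tenant_isolation(broken_visible_scores, USERS, SCORES)
```

The point of the sketch is that the check exercises the access path as every kind of user, not just the one the developer happened to be logged in as.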
The Privacy professional’s description of the incident would be a little different:
A software defect allowed authorized users from one entity to view confidential information that belonged to another entity. According to the third party vendor, the only records the users could see were those they had access to anyway; only certain campaign-specific attributes were viewable in error. According to the complaint filed by the Sanders campaign, the information available in the file was personally identifiable information (PII) and it included but was not limited to: “demographic and geographic data for registered voters (such as name, address and jurisdiction); email addresses; voter registration status; telephone numbers; vote history; commercially acquired consumer data; ethnicity information; political party preference or affiliation, if any; candidate preference data, if any; and other key analytic metrics selected by the DNC.” (https://berniesanders.com/wp-content/uploads/2015/12/Bernie2016vDNCComplaint.pdf ). An audit of access logs indicated that users appear to have inappropriately accessed data no more than 24 times. The third party vendor has provided assurance that no data were downloaded or printed and that their software prohibits both. The defect has been corrected. The Sanders campaign has terminated the employment of one individual who inappropriately accessed the data.
‘Cause that’s how we talk. Security and Privacy professionals focus on enforcement, control effectiveness and data classification. On detection and response.
There are three things I want to point out about this:
- First, there’s the fired individual. As I have pointed out elsewhere: “The organization’s need to generate value from the data it collects and the analyst’s need for the raw material of their work are powerful motivators for creating large datasets.” Data are the oxygen that the data analyst breathes. Without them, the analyst suffocates. If the fired analyst did run those searches intentionally and knew it was wrong to do so (and since they were terminated, we must assume that the Sanders campaign believes that is what happened), then we need to consider that they were seduced by how pure and rare that oxygen was. The vulnerability on display here is not just that data access can be flawed when multiple data owners share a service provider, but that data analysts can be tempted to breach far more than single records. We are used to cases of health care workers snooping on the individual record of a celebrity, but this is different.
- The actual voters did not have their privacy breached. All the confidential information about them that the DNC has, and even the personally identifiable information about them that may be available through public records (like address), was obtained and stored legally. Assuming the service provider’s description of the access defect is correct, the Sanders campaign was only able to access records on individuals they were entitled to access anyway. The only things the Sanders campaign accessed that were “wrong” were attributes that had been stored or generated by or on behalf of the Clinton campaign. So, unlike in many consumer data breaches, the individuals whose information was breached will not be getting letters or offers of credit protection.
- Having determined that individuals’ information is not exactly what was disclosed, you might ask what was breached. That is not actually specified anywhere I can find, but we can assume that it included either specific donation information (which I believe individuals giving $100 or more are required to make public) or cohort identifiers created by the Clinton campaign to target various fundraising and voter outreach efforts (the third party report of the incident mentions something called “client scores”). What’s clear is that what mattered about the breach was the information that allows voter data to be sorted and grouped into cohorts. As I have also pointed out elsewhere (here and here), this is the new frontier of identity. That the Sanders campaign fired the individual who executed the searches that appeared to be deliberate (some seemed incidental), that the DNC reacted to the breach by suspending all access for Sanders campaign workers, and that the Sanders campaign subsequently sued the DNC to have its access restored shows how serious access to these identities has become.
Indeed, the data science professional would describe the incident something like this:
An analyst with access to a dataset of primitives was inadvertently given access to derived attributes that had been generated at some expense by analysts working for another organization. These attributes were proprietary and the access was inappropriate. The access has been removed. None of the queries that were run could be saved, downloaded or printed, so the search results were of limited value.
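The distinction between shared primitives and proprietary derived attributes can be sketched in a few lines. This is only an illustration under my own assumptions: the field names, the campaign labels and the `project_record` helper are invented for the example and are not taken from the DNC's or the vendor's data model.

```python
# Hypothetical field names, for illustration only.
PRIMITIVE_FIELDS = {"voter_id", "name", "jurisdiction"}

def project_record(record, derived, requester):
    """Return the shared primitives plus only the requester's own derived attributes."""
    view = {k: v for k, v in record.items() if k in PRIMITIVE_FIELDS}
    # Derived attributes are keyed by the campaign that produced them.
    view.update(derived.get(requester, {}))
    return view

record = {"voter_id": 42, "name": "Pat Example", "jurisdiction": "IA-01"}
derived = {
    "campaign_a": {"support_score": 0.83},
    "campaign_b": {"turnout_cohort": "likely"},
}

# campaign_b sees the same voter as campaign_a, but not campaign_a's score.
view_b = project_record(record, derived, "campaign_b")
```

Both campaigns are entitled to the same underlying voter record; what differs, and what has value, is the layer of derived attributes each has built on top of it.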
When is a breach notification not a breach notification? When the breach is of intellectual property stored as a dataset of individual records.