
I am not a number, I am a data point

Identity has changed. The post-World War II generation was concerned about being identified as a number. The sight of emaciated humans with identifying numbers tattooed on their inner forearms made this very real and very scary. By the late 1960s the TV show The Prisoner portrayed its hero objecting on a regular basis: “I am not a number, I am a free man.” Now, people are used to being referred to not by just one number but by a collection of numbers, each used at different times and in different contexts. They even protect the numbers they are assigned and expect those who issue, collect and transmit those numbers to keep them confidential and away from hackers.

I was at the Privacy & Security Forum held at George Washington University last week, where Peter Swire and Bruce Schneier were discussing privacy and surveillance. Describing the value of big data, Schneier mentioned that as an individual, you feel an ownership of the data about you, but there are very large benefits to society if data can be aggregated. The example he gave was having a database containing the medical history of everyone in the world.

Granted, there are some wonderful benefits to big data. But I think what was missed in the conversation is that one concept is taking on new importance because of big data: a concept situated between “the individual” and “the world.” That’s “the cohort.”

The reality is that we are all members of cohorts used for analytic enterprises, and once we and our data are identified, we hardly ever have a say in which cohorts we are placed into. Unless you lie about your demographic characteristics (what professors Helen Nissenbaum and Finn Brunton classify as a type of obfuscation), your identity is merged with others. And even in some cases where you misdirect the data collection systems, some of the most sophisticated data science can account for that. In traditional credit scoring models, “reject inference” is used to infer the behavior of the people who reject the offer of a particular credit card and/or whose applications are rejected, that is, people for whom the lender has no observed repayment record.
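To make the idea concrete, here is a minimal sketch of one simple flavor of reject inference, written in Python. It is an illustration under stated assumptions, not how any particular lender or scoring vendor does it: the function names, the cohort labels, and the `penalty` multiplier are all hypothetical. The sketch measures the default rate per cohort among accepted applicants, then assigns each rejected applicant an inferred default probability based on nothing but the cohort they were placed in.

```python
from collections import defaultdict


def cohort_bad_rates(accepted):
    """Observed default rate per cohort among accepted applicants.

    `accepted` is an iterable of (cohort_label, defaulted) pairs.
    """
    totals = defaultdict(int)
    bads = defaultdict(int)
    for cohort, defaulted in accepted:
        totals[cohort] += 1
        bads[cohort] += int(defaulted)
    return {cohort: bads[cohort] / totals[cohort] for cohort in totals}


def infer_rejects(rejected_cohorts, rates, penalty=1.5):
    """Assign each rejected applicant an inferred default probability.

    Assumption (illustrative only): rejected applicants are riskier than
    accepted applicants in the same cohort by a fixed `penalty` factor.
    Cohorts never seen among accepted applicants fall back to the
    average of the observed cohort rates.
    """
    fallback = sum(rates.values()) / len(rates)
    return [
        (cohort, min(1.0, rates.get(cohort, fallback) * penalty))
        for cohort in rejected_cohorts
    ]


if __name__ == "__main__":
    # Hypothetical data: (cohort, defaulted) for people who got the card.
    accepted = [
        ("A", False), ("A", False), ("A", True), ("A", False),
        ("B", True), ("B", False),
    ]
    rates = cohort_bad_rates(accepted)
    # People with no repayment history at all still receive a number.
    for cohort, prob in infer_rejects(["A", "B", "C"], rates):
        print(cohort, prob)
```

Note what the sketch does to the applicant in cohort "C": nobody like them was ever observed, yet they are still given a score, borrowed from everyone else.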

I write about this in more detail in the ebook Big Data and us little people. Chapter 4, “For Whom the Bell Curve Tolls,” examines how being in a cohort is a new state of identity. The chapter discusses how being in a cohort is something of a loophole in the world of privacy, because once you are in that cohort, that group, that result set, it is assumed you are somehow anonymous. Sometimes that is because the data records are stripped of identifiers, but not always.
