I generally like the General Data Protection Regulation (GDPR). It acknowledges privacy rights exist in the digital world. It defines the following three key areas of a comprehensive data protection law:
- Data Security
- Data Privacy
- Data Consent
Security and privacy were already regulated, and every company that controls personal data should already have been taking measures to secure that data internally and externally. The sad fact is that even big businesses, Equifax among them, have failed at the most basic data security protections. Self-regulation has failed, and like an old car that costs more to maintain than to replace, the cost of self-regulation failures is too high not to replace it with well-thought-out legislation. The Digital Revolution has matured, and like many disruptive industries in transition from new to normal, it needs the right legislation to continue to grow. We can talk about personalization, automation, and AI all we want, but until we solve the privacy issue, new business models and innovations will be stunted because they depend entirely on using data.
Americanizing the GDPR
Since security and privacy have enjoyed a long and healthy discussion, I will focus more on consent-related issues.
For all its glory, there is something in the EU’s GDPR that rubs some Americans the wrong way. I see two structural differences between the U.S. and the EU that may elicit this reaction:
- What it means to Anonymize and Aggregate data
- The difference between Primary Data and Secondary Data
In general, the American view differs from the European approach on these two topics. How we define the first difference determines where we categorize data in the second.
Anonymization or Pseudonymisation
When Americans think about anonymization, they think about removing the identity from data so a specific individual cannot be identified. This may also include aggregation to make identifying an individual even harder. When the EU speaks of anonymizing data, it not only means an individual can’t be identified, but it also means that no two pieces of data can be correlated back to the same individual. This extreme obfuscation prevents anything from being known about anyone. While some may see this as a highly desirable state for data privacy, American pragmatism doesn’t consider it a useful state, as it renders the data useless for analytics. We don’t talk about it in these terms because there is no point in even storing such data. Whereas the European usage is technically correct, the American usage is pragmatically correct. Think Standard vs. Metric or Fahrenheit vs. Celsius. They each have their merits.
The GDPR labels useful anonymized data as pseudonymous data. This means the data retains enough information to correlate data element A created by unknown person #1 with data element B created by the same unknown person #1. This correlation yields useful information that allows data processors to analyze it and gain insights of value while shielding the identity of the person. So, the popular American idea of anonymization is equal to the European pseudonymisation. This is not a big deal as long as we understand the equivalency of the two terms. It becomes relevant in understanding the U.S. Federal Communications Commission’s (FCC) existing regulation (Title 47, Section 222) that requires telecommunications providers to “aggregate customer information”, which in practice defines the process of anonymizing (removing identity) and aggregating (grouping unidentified individuals) data before using it for secondary purposes.
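To make the equivalence concrete, here is a minimal Python sketch of what Americans usually mean by anonymizing and aggregating. It is an illustration under stated assumptions, not a method prescribed by the GDPR or the FCC: the record fields, the secret key, and the function names `pseudonymize` and `aggregate_by_cell` are all hypothetical. A keyed hash replaces the phone number with a stable token, so two records from the same unknown person still correlate, and the aggregation step then counts distinct tokens per cell tower.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held by the data controller, kept apart from the data.
SECRET_KEY = b"example-key-rotate-me"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable keyed-hash token."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_by_cell(records):
    """Count distinct pseudonymous subscribers seen at each cell tower."""
    seen = defaultdict(set)
    for record in records:
        seen[record["cell_id"]].add(pseudonymize(record["phone_number"]))
    return {cell: len(tokens) for cell, tokens in seen.items()}


# Invented call detail records, for illustration only.
cdrs = [
    {"phone_number": "+15550100", "cell_id": "A"},
    {"phone_number": "+15550100", "cell_id": "B"},
    {"phone_number": "+15550101", "cell_id": "A"},
]

counts = aggregate_by_cell(cdrs)
```

The same phone number always maps to the same token, which is the correlation the GDPR calls pseudonymous. Keeping only the per-tower counts and discarding the tokens moves the result closer to the European sense of anonymous.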
The anonymization discussion is really a case of overlapping terms with different definitions, and it can be resolved functionally by having each side state what it means by anonymization. The catch, however, is that the difference in definition also affects whether Americans classify data as Primary or Secondary Data, which in turn merits a more explicit level of regulation than the GDPR provides. I want to explore these distinctions now and arrive at an understanding that will serve as a starting point for determining how data should be regulated.
Let’s define Primary Data as data that is consumer provided or consumer generated in the course of using a discrete service. For example, I want to consume a postpaid mobile phone service from a communications provider. To use the service, I have to supply personal information for a credit check and a billing address. I will need a device that is addressable by a phone number, a device number, and several other telco IDs, all of which will be associated with me as a user of the service. In using the service, I will generate Call Detail Records (CDRs) that the provider will use to calculate my bill, monitor network service, process internal accounting, and understand internal business metrics.
All of these processing events use the data collected from the CDRs, which were generated by me, the data subject. The data is personal, and it contains Sensitive Personal Information (SPI), Customer Proprietary Network Information (CPNI), and Personally Identifiable Information (PII). It includes the history of where I have been (geo-location), who I have talked to, and what I have browsed on my smartphone.
The key point is that all of this data is required to provide me with mobile phone service. As long as the data collector (the communications provider) uses this data (as the data processor) for the fulfilment of the primary service, it is Primary Data. Primary Data does not need additional data subject consent beyond an informed and comprehensible End User License Agreement (EULA) or Terms of Service (TOS). Since the data is required and must be processed to provide this primary service, consent is implied upon accepting the EULA or TOS. If you don’t accept those terms, you can’t use the service.
Secondary Data is data that was generated for a contracted service (via EULA or TOS) and is then used for a purpose other than providing that primary service. For example, a communications company uses my Primary Data to build a profile of my interests based on my browsing history and matches my interests with ads, which are sent to my phone as I walk by a place of business catering to those interests. The same data that was generated as Primary Data became Secondary Data when it was processed (to profile my interests) and used for a purpose (generating additional money from advertising) outside of the primary service that it was collected for.
Primary or Secondary Data
What about services like Netflix? They have Primary Data on what we watch. At a minimum, they profile our interests at the household level. Optionally, each member can register a separate identity and be profiled at the individual level. They process Primary Data and track my interests just as the communications provider does when it wants to send me relevant advertising. Should this use of Primary Data be classified as Secondary, since processing the data produces the same result, a profile of my interests? No. Netflix is processing that Primary Data as part of the primary service, to recommend video content I might be interested in. These recommendations aid the primary service, as I do not have 45 minutes to waste searching for content I haven’t seen every time I want to use the service. If Netflix were to turn around and use that profile of my interests to start showing me advertisements, then it would be using the data for a secondary purpose, one other than what I am paying for.
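The distinction drawn here depends on the purpose of the processing, not on the data itself, and it can be sketched as a simple lookup. This is a rough illustration only: the service names, the purpose labels, and the `PRIMARY_PURPOSES` table are assumptions made up for this example, and a real classification would come from the service's EULA or TOS.

```python
from enum import Enum


class DataClass(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


# Hypothetical purposes each service needs in order to deliver what the
# customer signed up for. Any other purpose counts as secondary.
PRIMARY_PURPOSES = {
    "mobile_phone": {
        "billing",
        "network_monitoring",
        "accounting",
        "business_metrics",
    },
    "video_streaming": {"billing", "content_recommendation"},
}


def classify_use(service: str, purpose: str) -> DataClass:
    """Classify a use of customer data by its purpose, not by the data itself."""
    if purpose in PRIMARY_PURPOSES.get(service, set()):
        return DataClass.PRIMARY
    return DataClass.SECONDARY
```

Under these assumptions, `classify_use("video_streaming", "content_recommendation")` is Primary while `classify_use("video_streaming", "advertising")` is Secondary, even though both build on the same interest profile.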
Regulation that doesn’t kill Business
The GDPR doesn’t explicitly distinguish between Primary Data usage and Secondary Data usage. Without that distinction, data collectors/processors and their lawyers have been left with some confusion about what exactly they need explicit consent (opt-in) for. If you read the entire regulation, I think you will see that most collection and usage of what I have described as Primary Data has been exempted from requiring explicit consent. I believe many EU companies are wasting a lot of money securing consent for Primary Data that should be handled as implied consent in the EULA/TOS. U.S. legislation can avoid the unnecessary expense and burden of consent gathering and enforcement on Primary Data by allowing implied consent to be granted for Primary Data usage in the EULA/TOS. All Secondary Data usage should be optional and require consent management to be implemented.
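The rule proposed here, implied consent for Primary Data and opt-in for Secondary Data, can be expressed as a small consent check. This is a minimal sketch of my proposal, not of anything the GDPR or existing U.S. law prescribes; the `Subscriber` record and the `may_process` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """Hypothetical consent record for one customer."""
    accepted_tos: bool
    # Secondary purposes the customer has explicitly opted into.
    opt_ins: set = field(default_factory=set)


def may_process(subscriber: Subscriber, purpose: str, is_primary: bool) -> bool:
    """Decide whether data may be processed for the given purpose."""
    if not subscriber.accepted_tos:
        # Without an accepted EULA/TOS there is no service and no processing.
        return False
    if is_primary:
        # Primary Data usage: consent is implied by accepting the EULA/TOS.
        return True
    # Secondary Data usage: requires an explicit opt-in for this purpose.
    return purpose in subscriber.opt_ins


alice = Subscriber(accepted_tos=True, opt_ins={"advertising"})
bob = Subscriber(accepted_tos=True)
```

With these records, billing is allowed for both customers, while advertising is allowed for `alice` only, because `bob` never opted in.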