Data use has become intrinsic to the operations, strategy and business models of many companies across a wide range of industries. The volume of data has, in fact, grown exponentially in recent years, reaching 44 zettabytes overall in 2020: roughly 40 times more bytes than there are stars in the observable universe. IDC estimates that by 2025 there will be 175 zettabytes of data in the global datasphere.
At the same time, data is helping to improve and optimize many sectors, such as healthcare, manufacturing and financial services, making them more efficient. Electronic health records (EHR), for instance, not only give patients ownership of their data but also allow for better care in hospitals and better monitoring by insurers, ultimately contributing to better health research and reduced healthcare costs. Similarly, in retail, internal data-driven platforms are helping firms cut costs and improve efficiency. Starbucks’ Deep Brew, for example, uses data collected from members of the Starbucks rewards program to predict orders and display them to baristas before customers order their drinks.
In manufacturing, big data solutions have also been cutting costs. Intel’s factory equipment, for example, generates live data which is analyzed to recognize patterns and detect faults, alerting engineers to any areas that require immediate attention to prevent breakdowns on the shop floor. This is estimated to have cut reaction time from four hours to 30 seconds, saving Intel some $100 million per year.
Yet, policies and regulations governing the use of data are slowly beginning to emerge, albeit in patchwork fashion, along with growing scrutiny from many journalists, activists and the general public. This presents investors with a pressing need to focus attention on current and evolving data use practices and the intended and unintended consequences pertinent to the companies in which they invest.
Objective & Methodology
This report highlights key known causes and effects related to data use practices within organizations. It is not a comprehensive study of all use cases and potential issues resulting from data use practices; rather, its academically rigorous review lays out critical emerging issues that investors ought to be aware of and suggests possible strategies that can guide companies in mitigating their potential risks.
The aim of the report is to support investors in developing knowledge and frameworks to positively influence organizations in using data correctly, by following technically rigorous and ethical processes.
Data use generally falls into five key steps that outline its course and practices within organizations:
- Data Collection: obtaining informed consent, limiting collection of personally identifiable information (PII), collection bias and downstream bias mitigation
- Data Storage: data security, right to be forgotten and data retention practices
- Analysis: missing stakeholder perspectives, dataset cleansing and bias, honest representation, privacy in analysis and auditability
- Modeling: selecting fair variables and metrics, using statistical best practices, testing for fairness, and communicating model limitations and biases to users
- Deployment: anticipating and preventing abuse and unintended consequences, redress, roll back procedures, and identifying unintended uses
Together, these steps highlight the areas where data can become skewed, leading to negative outcomes downstream. We link to and discuss those downstream effects in the next section.
Five Deep Dives
The Deep Dives delve into the principal areas of focus for much of the literature reviewed, and where risks have become increasingly apparent. They lay out applicable regulation and company-level examples, as well as potential mitigation practices for each risk, with use case examples, as follows:
- Deep Dive 1: Ownership of data and monetization
- Deep Dive 2: Identity-based data profiling
- Deep Dive 3: Discriminatory pricing
- Deep Dive 4: Surveillance
- Deep Dive 5: Technology addiction
Tools and Instruments
The final element of the report lays out the findings across 50 organizations that have developed best practices, standards and principles touching on data use. The findings here also point to a lack of focus on this critical area, with most tools only tangentially touching on data use.
While data use has skyrocketed across industries and use cases, policy-makers, law-makers and ethically focused not-for-profit organizations have started to scrutinize data practices. Under the general headline of ‘tech-lash’, EU legislators in particular have cracked down on certain issues around privacy, content moderation and transparency. Overall, however, regulation of data use technologies is only slowly catching up and is often disparate across geographies and regions, as evidenced case by case. As more and more risks and harms are identified, it is imperative for investors to influence companies to act now, take data use issues seriously and, where possible, mitigate the risks.