Towards trusted data sharing: guidance and case studies

Case study 1

Databox: allowing individuals to control how they share data with other parties

Databox aims to increase consumer trust in the use of personal data by organisations by enabling transparency and control. Individuals generate many different types of personal data from mobile phones, smart meters and media streaming services, for example. In return for letting organisations access this data, individuals are given insights into what they have been doing or receive useful advice. Data may also be processed with other people’s data, with additional benefits to the individual and other parties. Databox enables individuals to limit the ways in which their data is used and understand the implications of any data release.

Summary: the eight dimensions of data sharing

The opportunity:

Databox enables individuals to control access to their personal data by service providers. It allows service providers to pre-process data to ensure shared data is minimised and possibly desensitised.

The data and its use:

Consumer data such as Internet of Things (IoT) data from home services or smart meters can be put to multiple uses.

The business model and value creation:

Consumers will be able to obtain insights from their own data. Commercial organisations will have access to a greater range of data sources. Databox has no fixed plans for commercialisation at the time of this research.

The model for data sharing and the partnership:

Technology facilitates direct interaction between consumers and third-party service providers. It has been developed by a university consortium with industry involvement, and has future commercialisation potential.

People with the right skills and expertise:

Databox has used academic expertise in containerisation and virtual machines, data analytics, human-computer interaction and accountability in the IoT ecosystem, as well as domain expertise from industry partners.

Constraints on how data is shared and used:

GDPR and acceptability from the consumers’ perspective constrain how data is shared and used. Databox enables adherence to Privacy by Design and data minimisation requirements.

The data architectures and technologies:

Databox uses ‘containerisation’ technology to mediate access to data held ‘at the edge’ rather than in the cloud. Its interface helps users understand what an app can analyse based on the data that they are willing to give.

Governance / oversight / enabling trust:

The consumer has control over which data is shared and understands how data will be used. Only the results of the processing are provided to the individual and app developer, after which the data and app are ‘killed’.

Introduction

Databox is a multi-partner research project funded by a £1.2 million Engineering and Physical Sciences Research Council (EPSRC) grant that runs for three years until October 2019. [22] The project came about as a result of a number of converging factors, including the emerging data protection regulatory regime, the development of ‘containerisation’, an understanding of the value of personal data, and the need to provide individuals with control of their own data. [23] The project also built on precedents such as Dropbox, a platform that enables people to share data and give others access to it. Databox allows individuals to manage, log and audit access to their personal data by other parties, countering the prevailing practices of aggressive data harvesting by certain companies. In return for granting access, individuals receive useful services.

The consortium of universities carrying out the project comprises Imperial College London, the University of Cambridge and the University of Nottingham. These partners bring technical expertise in containerisation and virtual machines, data analytics, human-computer interaction and accountability in the IoT ecosystem. Industry involvement includes the BBC, BT, Internet Society and Microsoft, as well as Telefonica and Open MHealth.

Data is processed locally ‘at the edge’, rather than in the cloud, with resulting computational and social advantages. [24] The former arise from reducing the need to transport large volumes of data over communications networks for processing at remote data centres. These datasets may originate from multiple connected devices and other sources. A social advantage is maintaining privacy and safety by reducing the risk of data breaches when data is being distributed via a network.

Both consumers and commercial organisations will benefit. Consumers will be able to obtain insights from their own data, while commercial organisations will have access to a greater range of data sources of appropriate type or granularity, enabling richer and more accurate analytics.

The Databox technology mediates access to the source of data, rather than holding data. Value is extracted from the data by apps that will be developed by third parties. Individuals can specify which bits of data the apps can access. Once the data has been analysed, the results are sent to the individual and the app developer. The data itself is not kept, enabling rich analytics while limiting access to the data. Databox uses ‘containerisation’ technology provided by Docker.
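The mediation pattern described above can be illustrated with a minimal Python sketch. It is not the Databox implementation or its API: the class MediatedStore, the method names, and the sample source names and readings are all hypothetical, chosen only to show an app receiving access to granted sources and returning nothing but a result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class MediatedStore:
    """Hypothetical stand-in for a mediator; not the Databox API."""
    sources: Dict[str, List[float]]            # raw data stays with the mediator
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, app: str, source: str) -> None:
        # The individual chooses which sources an app may read.
        self.grants.setdefault(app, set()).add(source)

    def run(self, app: str, needed: List[str],
            analyse: Callable[[Dict[str, List[float]]], float]) -> float:
        allowed = self.grants.get(app, set())
        denied = [s for s in needed if s not in allowed]
        if denied:
            raise PermissionError(f"{app} has no grant for: {denied}")
        # The app sees copies of the granted sources only, inside this call.
        view = {s: list(self.sources[s]) for s in needed}
        # Only the result is returned; the temporary view is discarded.
        return analyse(view)


def mean_power(view: Dict[str, List[float]]) -> float:
    """Toy analysis: average of the smart meter readings."""
    readings = view["smart_meter"]
    return sum(readings) / len(readings)


# Hypothetical sample data for illustration.
store = MediatedStore(sources={
    "smart_meter": [1.0, 2.0, 3.0],
    "heart_rate": [60.0, 62.0],
})
store.grant("energy_app", "smart_meter")
result = store.run("energy_app", ["smart_meter"], mean_power)
```

In this sketch a request for "heart_rate" by "energy_app" raises PermissionError, because the individual has granted that app access to the smart meter source only.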

Other projects with similar aims to Databox include the Hub-of-all-Things and CitizenMe. [25, 26]

Business model

The project is at an early stage and has no fixed plans for future commercialisation, although there is interest from venture capital firms. Monetisation of personal data is not being explicitly considered as part of the project.

Large-scale analytics, such as providing market researchers with data from multiple people without the need for data brokers, could create value in the future. Utility companies would benefit if, for example, they could access data sources other than smart meter data. As more sources of data become open, the value that can be obtained from Databox will increase.

For example, access to financial information would be greatly improved if banks gave their customers the ability to share transaction data with third parties. [27] An increase in the number of smart devices would also add richness to the analytics that are possible.

Apps would be paid for by the consumer, or by a commercial organisation such as a market research company. A public-sector organisation such as the NHS could create an app to enable data collection for the public good, for example, for health research.

Technical and data curation arrangements

The platform under development is open source, although the apps developed by third parties that process or analyse data will not be. Data may be from local or remote sources, such as online social networks or IoT sensors. The project is developing libraries that provide functionality for other apps to access the data in the form that they need, so that the app itself does not need drivers or other ways of dealing with the data.

‘Data negotiability’ is a central concept: the project team is developing an interface that allows the user to understand what the app can and cannot analyse based on the granularity or type of data that the user is willing to give access to. For example, the sampling rate may influence the ability of the app to infer certain information sufficiently accurately for appropriate decisions to be made. This is an area of research in the field of information theory.
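The effect of sampling rate on what an app can infer can be shown with a toy Python example. The readings, the threshold and the function names are assumptions made for illustration; the project's own interface and inference methods are not described in this case study.

```python
from typing import List


def downsample(readings: List[float], factor: int) -> List[float]:
    """Average consecutive groups of `factor` readings (coarser granularity)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    groups = [readings[i:i + factor] for i in range(0, len(readings), factor)]
    return [sum(group) / len(group) for group in groups]


def spike_detected(readings: List[float], threshold: float) -> bool:
    """Toy inference: is any reading above the threshold?"""
    return any(r > threshold for r in readings)


# Hypothetical one-minute power readings in watts, with one short 2000 W spike.
per_minute = [100.0] * 10 + [2000.0] + [100.0] * 19   # 30 readings

# At one-minute granularity the spike is visible to the app.
fine = spike_detected(per_minute, threshold=1500.0)

# At 30-minute granularity the spike is averaged away.
coarse = spike_detected(downsample(per_minute, 30), threshold=1500.0)
```

Here the same inference succeeds on the fine-grained data and fails on the coarse data, which is the kind of trade-off a user would need to understand when deciding what granularity to release.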

Another important part of the project is ensuring transparency for the user about who has access to which data. Capabilities for intensive logging and auditing of access, operations, apps and data sources are being developed. One challenge is how to communicate potentially detailed and complex information to the user. User studies are planned to investigate how best to present such information. The project should enable organisations to comply with new data protection laws.
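As a rough illustration of access logging and of condensing detailed records into something a user could read, the following Python sketch keeps an append-only list of access events and derives a per-app summary. The AuditLog class, its methods and the event fields are hypothetical and are not taken from the Databox code base.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass(frozen=True)
class AccessEvent:
    when: datetime
    app: str
    source: str
    operation: str


class AuditLog:
    """Hypothetical append-only record of which app touched which source."""

    def __init__(self) -> None:
        self._events: List[AccessEvent] = []

    def record(self, app: str, source: str, operation: str) -> None:
        event = AccessEvent(datetime.now(timezone.utc), app, source, operation)
        self._events.append(event)

    def who_accessed(self, source: str) -> Set[str]:
        """Answer the user's question: which apps have used this source?"""
        return {e.app for e in self._events if e.source == source}

    def summary(self) -> Dict[str, Dict[str, int]]:
        """Condense the detailed log into access counts per app and source."""
        counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        for e in self._events:
            counts[e.app][e.source] += 1
        return {app: dict(per_source) for app, per_source in counts.items()}


log = AuditLog()
log.record("energy_app", "smart_meter", "read")
log.record("energy_app", "smart_meter", "read")
log.record("sleep_app", "heart_rate", "read")
```

The summary is one possible way of reducing a detailed log to a form that is easier to present; how best to present such information is what the planned user studies are intended to investigate.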

One area for discussion is whether Databox carries out aggregation and archiving of data itself, or whether it allows an app to carry out these functions. Databox could have an archiving facility for data that is not being stored elsewhere, with a certain level of aggregation capability. However, the intention is that Databox does not become a ‘honey pot’ that attracts potential hackers. Data will be stored on multiple servers.

It is intended that security will be included as a feature in Databox. The project is exploring the concept of Databox as a kind of ‘firewall’, so that it is possible to control the exposure of IoT devices to the internet and, for example, to ‘whitelist’ destinations for data. [28] Apps are contained and isolated from the network. They write their inferences to an export module, another secure container, so that the output of the apps can be monitored. Assuring apps is an intensive process, and there are plans to vet every app launched.
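The idea of ‘whitelisting’ destinations can be sketched in Python as a check that an export step might apply before sending results anywhere. The host name, the ALLOWED_HOSTS set and the may_send function are assumptions for illustration only and do not describe how the Databox export module is built.

```python
from typing import AbstractSet
from urllib.parse import urlsplit

# Hypothetical set of destinations approved for exported results.
ALLOWED_HOSTS = frozenset({"export.example.org"})


def may_send(url: str, allowed_hosts: AbstractSet[str] = ALLOWED_HOSTS) -> bool:
    """Return True only if the URL's host is on the approved list."""
    host = urlsplit(url).hostname   # None when the text has no network location
    return host is not None and host in allowed_hosts


approved = may_send("https://export.example.org/results")
blocked = may_send("https://tracker.example.com/collect")
```

Anything not explicitly listed is refused by default, which matches the intent of controlling the exposure of devices and apps to the internet.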

Legal and commercial arrangements

It is envisaged that there will be some kind of service level agreement between the consumer and the app developer that underpins what data the app is allowed to access, the data rate, where it will be stored and the service that will be provided.
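The terms listed above could, in principle, be captured in a machine-readable record that is checked whenever an app requests data. The Python sketch below is one possible shape for such a record; the Agreement class, its field names and the sample values are hypothetical, since the case study does not specify a format.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Agreement:
    """Hypothetical record of terms agreed between a consumer and an app developer."""
    app: str
    sources: FrozenSet[str]        # what data the app is allowed to access
    max_samples_per_hour: int      # the agreed data rate
    storage_location: str          # where the data will be stored
    service: str                   # the service provided in return

    def permits(self, source: str, samples_per_hour: int) -> bool:
        """Check a request against the agreed sources and data rate."""
        return (source in self.sources
                and samples_per_hour <= self.max_samples_per_hour)


# Sample values are assumptions for illustration.
agreement = Agreement(
    app="energy_app",
    sources=frozenset({"smart_meter"}),
    max_samples_per_hour=60,
    storage_location="on-device",
    service="monthly energy-saving advice",
)
```

A request for smart meter data at 60 samples per hour would be permitted under this example, while a higher rate or a different source would not.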

The research project is not investigating in detail legal issues such as liability, although the legal grounds will need to be addressed before the project becomes operational. Liability will be minimised by ensuring Databox is not created as a ‘honey pot’.

Outcomes and lessons learned

  • The value to consumers and organisations will come from the ability to mix data types and sources that cannot currently be brought together. For example, the ability to correlate an individual’s physical activity with their sleep quality will be of value to that individual in enhancing their health and wellbeing.


  • A key challenge is ensuring that users understand the value of their data, why privacy is important, the nature of data accesses and their consequences. People are not very risk-aware: for example, they may be unaware of, or indifferent to, sharing personal data with companies via connected home devices. [29]


  • A further challenge is enabling meaningful engagement by individuals in the management and sharing of their personal data.