Towards trusted data sharing: guidance and case studies
Overview
Creating value from data requires organisations and individuals to have access to data and use it effectively and appropriately. This project sets out to investigate one aspect of this: trusted data sharing. Scaling up data sharing activity, where it meets business and privacy needs, will help to release value but it requires sources of friction to be reduced and it must be done in ways that maintain, and do not erode, trust.
A series of ten case studies illustrates emerging examples of data sharing and some of the key enablers and constraints, including governance, business models, technologies and regulation. A practical checklist for organisations has been created, drawing on lessons from the case studies. Further work is needed to raise awareness of the opportunities, to progress approaches to data sharing and guidance, and to share best practice.
Through this project and other initiatives, the Academy aims to play a role in supporting better use of data in engineering sectors and in the broader economy. Best engineering practice, with its focus on the interface between technical systems, people and organisations, is a vital part of realising the opportunities. Data must be assembled, structured and managed over its lifecycle so that it meets business or other requirements, for which a robust engineering approach is needed.
Data sharing may be just a small, but potentially significant, part of the requirements and its success relies on the engineering being done well. Recognising the inter-disciplinary nature of the challenge, the Academy has drawn on the expertise of a wide range of stakeholders, through the working group, individuals interviewed as part of the case study research and reviewers.
This publication is for organisations that have identified the opportunity to create value through sharing data and wish to collaborate with others to develop solutions. Policymakers in government and other stakeholders who play a role in promoting the many opportunities and developing enablers for data sharing may also draw valuable lessons from this publication.
Data sharing opportunities
The opportunities for organisations to use data to improve products and processes and to innovate are widely recognised, with ensuing benefits for the economy and society. Static, regularly updated or real-time data may be collected, stored and processed by a single organisation to improve its own business processes or create new products and services. [1]
The opportunities to create value increase when data is shared or exchanged across organisational boundaries. For example, one organisation may share weather data with another organisation that needs accurate weather forecasts for its business. Alternatively, one organisation might hold valuable data while another has the expertise to create products or services from it, as is the case with companies that use authorised NHS patient data to develop artificial intelligence-based diagnosis tools for use by NHS clinicians. Additional value might be realised if data can be shared securely across sectoral and international boundaries.
The ability to reliably link datasets also creates new value. Organisations may share data that can be linked using a common identifier. For example, child-level data in England is linked across different government departments in order to increase understanding of how decisions in the family court impact on children’s educational outcomes.[2] Value may originate from many organisations pooling similar data, since analysis on the larger, pooled dataset has greater value than that on a single dataset. For example, oil and gas operators share well component failure data to improve decision-making around operations and maintenance.[3]
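As an illustration of linkage on a common identifier, the following is a minimal sketch in Python. The datasets, the field names and the identifier `record_id` are all hypothetical and are not drawn from any of the case studies; real linkage would also need to handle missing, duplicated or inconsistently formatted identifiers.

```python
# Hypothetical datasets held by two organisations. All field names and
# values are invented for illustration only.
dataset_a = [
    {"record_id": "R001", "region": "North"},
    {"record_id": "R002", "region": "South"},
    {"record_id": "R004", "region": "East"},
]
dataset_b = [
    {"record_id": "R001", "outcome_score": 48},
    {"record_id": "R002", "outcome_score": 55},
    {"record_id": "R003", "outcome_score": 61},
]


def link_on_identifier(left, right, key):
    """Return merged rows for identifiers present in both datasets."""
    # Index the right-hand dataset by the shared identifier for fast lookup.
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the fields from both sources into one linked record.
            linked.append({**row, **match})
    return linked


linked_records = link_on_identifier(dataset_a, dataset_b, "record_id")
```

Only records whose identifier appears in both sources are linked, which is one reason the quality and consistency of the shared identifier matters so much in practice.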
In the case of physical infrastructure, data about different assets and their geographical location, along with operational data, may exist as separate datasets. The ability to share and link them would provide the real-time status of an infrastructure system, such as the power flow in the electricity transmission grid. In cities, passengers are provided with journey options via apps on the basis of linked real-time data from multiple sources.
Enabling trusted data sharing
Data may be commercially sensitive, or it may relate to individuals, with associated privacy requirements as set out in the General Data Protection Regulation. Concerns about the sensitive nature of data can restrict data sharing. When one organisation’s data is accessed and used by others, appropriate frameworks are vital in order to ensure that data sharing meets commercial, regulatory, legal or ethical requirements and promotes trust. Trust might be enabled through the following activities:
- Ensuring that costs and benefits of collecting, storing and using data are fairly distributed.
- Defining and agreeing how data is used, with enforcement mechanisms if an agreement is breached.
- Ensuring people with the necessary skills are managing the data.
- Providing assurances about data quality, provenance and timeliness.
- Using anonymisation or other privacy enhancing techniques to enable access to data while preserving privacy or respecting commercial sensitivity.
- Ensuring that storage and transmission of data is secure.
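One of the privacy enhancing techniques referred to above can be sketched in Python as follows. This is a minimal illustration of pseudonymisation using a keyed hash (HMAC-SHA256), not a complete anonymisation method: pseudonymised data may still be re-identifiable and can remain personal data. The function name, field names and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(records, id_field, secret_key, drop_fields=()):
    """Replace a direct identifier with a keyed hash and drop named fields."""
    output = []
    for row in records:
        # A keyed hash gives the same token for the same identifier, so
        # records can still be linked, but the token cannot be recomputed
        # without the secret key.
        token = hmac.new(
            secret_key, row[id_field].encode("utf-8"), hashlib.sha256
        ).hexdigest()
        cleaned = {
            field: value
            for field, value in row.items()
            if field != id_field and field not in drop_fields
        }
        cleaned[id_field] = token
        output.append(cleaned)
    return output


# Hypothetical input; the key would be held securely by the data owner.
example_key = b"illustrative-key-only"
raw_records = [
    {"customer_id": "A123", "name": "Example Person", "usage_kwh": 310},
    {"customer_id": "A123", "name": "Example Person", "usage_kwh": 295},
]
shared_records = pseudonymise(
    raw_records, "customer_id", example_key, drop_fields=("name",)
)
```

The choice of a keyed hash rather than a plain hash is an assumption made here so that a recipient cannot rebuild the mapping simply by hashing candidate identifiers.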
Ways of facilitating the discovery of and access to data, where it is collected by more than one organisation, are also needed. For example, data platforms may be used to host and manage shared data from multiple organisations, and data may be accessed using application programming interfaces (APIs). Standards help enable interoperability between datasets, but also between data platforms so that it is possible to search for and access data wherever it is hosted.
Ideally, the data format and quality are appropriate to the purpose but in practice, data may need to be cleaned or converted to the right format to make it usable. Approaches to guaranteeing the provenance of data and to data linkage are also required. Good data management is at the heart of successful data sharing.
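The kind of cleaning and format conversion described above might look like the following minimal Python sketch. The record layout, the field names `asset_id` and `inspected_on`, and the list of accepted date formats are assumptions chosen for illustration.

```python
from datetime import datetime

# Date formats assumed to occur in the incoming data (hypothetical).
ACCEPTED_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_date(text):
    """Convert a date string in an accepted format to ISO 8601, else None."""
    cleaned = text.strip()
    for fmt in ACCEPTED_DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    # Unrecognised values are flagged as None rather than guessed.
    return None


def clean_record(record):
    """Standardise the identifier and date fields of one record."""
    return {
        "asset_id": record["asset_id"].strip().upper(),
        "inspected_on": normalise_date(record["inspected_on"]),
    }


cleaned = clean_record({"asset_id": " pump-7 ", "inspected_on": "03/02/2019"})
```

Returning `None` for values that cannot be parsed keeps data quality problems visible to whoever receives the data, instead of silently hiding them.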
Data sharing may underpin new business models. New sources of value originate from the novel uses to which shared data is put, or alternatively from the activities that enable data sharing. New roles are emerging such as data enhancer or data broker. Intermediaries with their own business models may catalyse the data sharing opportunity or develop the technologies that enable it.
Above all, strong oversight ensures the opportunity is realised well and that it meets commercial, regulatory, legal and ethical requirements. Where many organisations come together to share data, ideally all would be represented in the oversight body. The participation of consumer organisations or patient associations may be appropriate where personal data is used. Oversight mechanisms are sustainable if they can be maintained even when one participating company is purchased by another. Transparency and accountability are vital so that an independent body can scrutinise the data sharing arrangements and the enforcement of any rules.
As data sharing becomes more pervasive and new ways of sharing data emerge, broader questions need to be addressed, such as whether access to data is equitable and promotes or stifles competition, and what the implications are for society and the economy.
Outstanding challenges
This work illustrates the multiple technical, economic and governance issues involved in setting up a data sharing solution. Each of the case studies in this publication has tackled some or all of these issues. They provide insight into how these issues are being addressed in the real world and into possible approaches to finding solutions.
The challenges have not been completely solved, however. For example, there is still uncertainty about defining data rights and ensuring that data acquired by one user is not copied and distributed further. Some business models remain unproven.
The effort required to share data often reflects the imperfect context in which data sharing occurs: inadequate data quality requires data cleaning effort, a lack of standard data formats makes data linkage tricky, and an inadequate legal framework makes the creation of robust data sharing agreements challenging. As these aspects and others are increasingly addressed, the friction involved in data sharing will decrease.