Towards trusted data sharing: guidance and case studies
Practical challenges
This section sets out some of the key practical challenges in data sharing. In some cases, addressing these requires further development effort, rather than more research, to create a working system of data sharing or better data management.
Good data management
Good data management is a critical part of effective data sharing. It must respond to an organisation’s business requirements and consider data’s full lifecycle. Data sharing may be only a small part of lifecycle management, but it can be a significant one. Considerations such as whether the data is being updated, or how it might be securely destroyed, are relevant here.
The quality of data and metadata is a key consideration, as it in turn affects the quality of the data analytics and the confidence to make robust decisions based on the outputs. In the case of personal data, the General Data Protection Regulation includes requirements around data minimisation, the right to be forgotten and the right to an explanation, which all need to be addressed as part of data management.
If data is shared, it will be vital to ensure that data is being managed in line with data-sharing agreements between parties exchanging and using the data. Commitments to manage and use data appropriately may extend over many years. This is even more important where tensions exist, for example, between the need for an organisation to meet its own aims and the need to maintain privacy or meet other regulatory or legal requirements.
Data integration and data linkage
Shared data can be considered integrated if it can be treated as if it came from one system. Ideally, it should be accessible, and it should be possible to query the data meaningfully in the same way wherever it originated. Integration data modelling techniques may be useful here. [6]
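As an illustration of querying data in the same way wherever it originated, the following minimal Python sketch maps records from two source systems onto one common model. The `Asset` model, the field names (`len_m`, `length_km` and so on) and both source layouts are hypothetical, chosen only for this example; real integration data modelling would be considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Common (integrated) view of an asset, independent of source system."""
    asset_id: str
    name: str
    length_m: float


def from_source_a(row: dict) -> Asset:
    # Assumption: source A records length in metres under "len_m".
    return Asset(
        asset_id=str(row["id"]),
        name=row["name"].strip(),
        length_m=float(row["len_m"]),
    )


def from_source_b(row: dict) -> Asset:
    # Assumption: source B records length in kilometres under "length_km".
    return Asset(
        asset_id=str(row["ref"]),
        name=row["title"].strip(),
        length_m=float(row["length_km"]) * 1000.0,
    )


def integrate(rows_a: list, rows_b: list) -> list:
    """Convert both sources to the common model so they can be queried together."""
    return [from_source_a(r) for r in rows_a] + [from_source_b(r) for r in rows_b]


def longer_than(assets: list, threshold_m: float) -> list:
    """One query, expressed once, that works regardless of where a record came from."""
    return [a for a in assets if a.length_m > threshold_m]
```

Once both sources are converted, a call such as `longer_than(integrate(rows_a, rows_b), 100)` needs no knowledge of the original formats or units.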
In practice, standard data formats may not exist, and therefore approaches that facilitate data sharing and linkage will be required. These can be considered in two stages.
Facilitating data sharing and linkage
Stage 1:
Creation and storage of data in a way that makes it easier to share. For example, raw data in spreadsheet form is easier for others to use than data in PDF format. A vital part of this is clear, readable and accessible metadata describing the data.
Stage 2:
Where datasets from different sources are being linked, the development of strategies, methods and tools for combining datasets.
The latter stage requires specialist skills, and indeed data linkage is becoming a skill set in its own right.
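To make the second stage more concrete, the sketch below shows one simple form of data linkage: deterministic matching on a normalised key. It is an assumption-laden illustration, not a recommended method. The `name` and `postcode` fields and the normalisation rules are hypothetical, and real linkage work often relies on probabilistic or privacy-preserving techniques instead.

```python
import re


def link_key(record: dict) -> tuple:
    """Build a normalised key so that trivially different spellings still match.

    Assumption: records carry hypothetical "name" and "postcode" fields.
    """
    name = re.sub(r"[^a-z0-9]", "", record["name"].lower())
    postcode = re.sub(r"\s+", "", record["postcode"]).upper()
    return (name, postcode)


def link(left: list, right: list) -> tuple:
    """Return (linked pairs, unmatched left records) using exact key matches."""
    # Index the right-hand dataset by key for constant-time lookup.
    index = {}
    for record in right:
        index.setdefault(link_key(record), []).append(record)

    linked, unmatched = [], []
    for record in left:
        matches = index.get(link_key(record), [])
        if matches:
            linked.extend((record, match) for match in matches)
        else:
            unmatched.append(record)
    return linked, unmatched
```

Returning the unmatched records separately is a deliberate choice in this sketch: the proportion of records that fail to link is itself a useful indicator of data quality.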
Engineering the enabling functions of data sharing
Internal components need to be engineered to enable functions such as data exploration, interoperability, identity management, quality control and monetisation. These enabling functions must include metadata management, the creation of data catalogues, the management of access to data and the handling of contractual obligations to destroy data. Governance arrangements and technologies should ideally work within the necessary legal structure while delivering the intended benefits to participants. The case studies illustrate solutions that involve multiple participants, enabling functions and types of transaction, underpinned by enabling technologies.
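Some of these enabling functions can be sketched in a few lines of Python. The example below is a minimal, hypothetical data catalogue that records descriptive metadata, checks access and flags datasets whose contractual destruction date has passed. The class names, fields and the simple allow-list access model are assumptions made for illustration, not a description of any of the case-study systems.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CatalogueEntry:
    """Metadata held about one shared dataset (all fields are illustrative)."""
    dataset_id: str
    description: str
    owner: str
    allowed_users: set = field(default_factory=set)
    destroy_by: Optional[date] = None  # contractual destruction deadline, if any


class Catalogue:
    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: CatalogueEntry) -> None:
        self._entries[entry.dataset_id] = entry

    def can_access(self, user: str, dataset_id: str) -> bool:
        """Allow access only to users listed for a dataset that is registered."""
        entry = self._entries.get(dataset_id)
        return entry is not None and user in entry.allowed_users

    def due_for_destruction(self, today: date) -> list:
        """List datasets whose agreed destruction date has been reached."""
        return [
            e.dataset_id
            for e in self._entries.values()
            if e.destroy_by is not None and e.destroy_by <= today
        ]
```

Keeping the destruction deadline alongside the descriptive metadata means that obligations in a data-sharing agreement can be checked routinely rather than relying on individuals to remember them.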
Figure 3 (below) attempts to conceptualise the challenge with a diagram of a data-sharing system, illustrating the range of participant types, enabling functions, transactions (either financial or data) and enabling technologies. Enabling functions are grouped into related activities and colour-coded accordingly. In practice, a data-sharing arrangement will include some combination of these elements, depending on the application and the nature of the data being shared. Figures 4 and 5 (below) illustrate this for two of the case studies, the Data and Analytics Facility for National Infrastructure (DAFNI) and Smart meters.
Figure 3: Data-sharing system
Figure 4: DAFNI
Figure 5: Smart meters
Data sharing and digital twin technology
Data sharing is a key element of the plans to create a national digital twin for infrastructure – a federation of digital twins that will enable better decision-making in the delivery, operation, maintenance and use of infrastructure.[7] A digital twin is a digital representation of entities such as assets, processes or systems. Data from these entities is fed back into the digital twin, which in turn supports improved decision-making about that entity.
Digital twins may exist for a variety of purposes and operate at a range of scales. The national digital twin depends on the ability to share data between many digital twins in a secure, resilient and interoperable way, which in turn requires the development of an information management framework.