Towards trusted data sharing: guidance and case studies
Case study 5
DAFNI: Data and Analytics Facility for National Infrastructure
DAFNI is a secure facility for assembling, hosting and creating datasets on infrastructure assets and networks, and the human and natural environments in which they are located. It will provide a shared data resource with professionally managed security and access arrangements, to ensure the appropriate level of assurance around commercial confidentiality and the protection of sensitive personal data. It aims to reduce the technical challenges and security risks associated with consuming data used in infrastructure research, simplify the mechanisms for accessing data, and increase the efficiency of data discovery, consumption and utilisation.
Summary: the eight dimensions of data sharing
The opportunity:
Creation of a national infrastructure database and repository of modelling tools to enhance the capacity of researchers and practitioners to analyse the performance and resilience of infrastructure systems.
The data and its use:
The project will gather infrastructure asset and network data from multiple sources across infrastructure sectors, along with socio-economic data, geospatial data and consumer behaviour characterisation data, for use in infrastructure modelling, simulation and visualisation.
The business model and value creation:
The project has received capital funding from the UK Collaboratorium for Research on Infrastructure and Cities (UKCRIC). Operational funding is expected to come from research projects, UK Research and Innovation operational funding of research infrastructure, government departments and agencies with infrastructure responsibilities, and businesses.
The model for data sharing and the partnership:
The technology provides a common platform for controlled access to data by infrastructure researchers, government and business.
People with the right skills and expertise:
The project is part of UKCRIC, with academic partnerships led by the University of Oxford and industry involvement. The facility was developed by the Science and Technology Facilities Council.
Constraints on how data is shared and used:
Commercial sensitivity, national security and the General Data Protection Regulation (GDPR) all constrain how data can be shared and used.
The data architectures and technologies:
DAFNI is a platform that enables the management, validation and quality assurance of data. It provides a multi-model database architecture that allows different data types, including geospatial, columnar and network data, to be stored in their respective optimal formats.
Governance / oversight / enabling trust:
It is a shared data resource with professionally managed security and access arrangements. These ensure an appropriate level of assurance around commercial confidentiality and the protection of sensitive personal data.
Introduction
DAFNI is a major UK national facility to advance infrastructure systems research. It is currently a four-year capital project, funded by the Engineering and Physical Sciences Research Council, delivered by the Science and Technology Facilities Council (STFC), and overseen by a governance board of 11 universities, chaired by the University of Oxford. It is both a hardware and a software project, providing the platform for a national infrastructure database as well as the capability to carry out infrastructure modelling, simulation and visualisation. [39]
Infrastructure systems modelling is used for a variety of purposes. For example, it may be used to examine the big picture: how infrastructure needs might change in the future, where investment is needed, and what benefits that investment would bring. Other types of data, such as sensor data, might inform real-time modelling that is used to forecast faults and where failures might occur, and to inform maintenance regimes. Through network infrastructure risk analysis, modelling can also be used to examine the nature of cascading failures, whether caused by cyber-attacks or by natural hazards.
DAFNI will not itself develop methods of analysis, but will instead provide the platform to make analysis more convenient, robust and accessible. The aim is to encourage others, such as government or academia, to supply their data and models to the facility. While utility companies are unlikely to put their own operational and control models on the platform, it provides a facility for researching and developing operational applications.
The attraction is that the models sit above a properly curated national platform, with high-performance computing and cloud computing resources available. The challenge is to build momentum and achieve positive spillovers that result from data and models residing in a common place.
The software platform is being developed in tandem with a series of pilot projects that are bringing in academic models, building up the capabilities of the database, and demonstrating its use to others. These include projects on optimal maintenance and on real-time operation and control. The database will be built up pragmatically, focusing first on the datasets required for the pilot projects, while also brokering arrangements with data providers.
DAFNI builds on work carried out around NISMOD, a National Infrastructure Systems MODel developed over the last seven years, which made use of the NISMOD-DB++ database developed by Newcastle University. The database contains several types of data including: asset and network data; usage and demand data – including the factors driving demand; building data; geospatial socioeconomic data; and new sources of big data such as consumer behaviour characterisations.
Business model
Once the capital project has been completed, the facility will need a sustainable business model to ensure that it has the resources needed for operation, maintenance and upgrades. In future, funding sources might include continued research council funding and service-level funding; funding from government departments that have migrated models onto DAFNI; and businesses such as utility companies.
Technical and data curation arrangements
DAFNI provides the processes necessary to ensure that the security, provenance and traceability of data are addressed for both researchers and data providers. The database will allow the creation, management and publication of data assets, along with services that support data validation and quality assurance. One challenge is that the data required to understand current infrastructure, and how infrastructure needs might change in the future, is highly diverse. Past work developing the NISMOD-DB++ database has addressed how to deal with this diversity, which matters because data formats must be sufficiently consistent to allow interoperability and usability. The work needed to clean and reformat data so that it is usable for modelling is highly labour-intensive.
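To illustrate the kind of validation and quality-assurance step described above, the following Python sketch checks that incoming asset records share a consistent set of fields and plausible coordinates before they are accepted, keeping a reason for every rejection so the result is traceable. The field names (`asset_id`, `asset_type`, `lat`, `lon`), the `validate_assets` function and its rules are hypothetical assumptions for illustration only; they are not DAFNI's actual schema or services.

```python
from dataclasses import dataclass

# Hypothetical minimal schema for an infrastructure asset record (an assumption, not DAFNI's).
REQUIRED_FIELDS = {"asset_id": str, "asset_type": str, "lat": float, "lon": float}


@dataclass
class ValidationReport:
    accepted: list
    rejected: list  # (record, reason) pairs, so every rejection can be traced


def validate_assets(records):
    """Split raw records into accepted and rejected, recording a reason for each rejection."""
    accepted, rejected = [], []
    for record in records:
        reason = None
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                reason = f"missing field '{field}'"
                break
            value = record[field]
            # Treat integers (but not booleans) as acceptable where floats are expected.
            if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
                value = float(value)
            if not isinstance(value, expected_type):
                reason = f"field '{field}' should be {expected_type.__name__}"
                break
        # Only check coordinate ranges once all fields are present and correctly typed.
        if reason is None and not (-90 <= record["lat"] <= 90 and -180 <= record["lon"] <= 180):
            reason = "coordinates out of range"
        if reason is None:
            accepted.append(record)
        else:
            rejected.append((record, reason))
    return ValidationReport(accepted, rejected)


# Usage with three hypothetical records: one valid, one with an impossible latitude, one incomplete.
report = validate_assets([
    {"asset_id": "PS-001", "asset_type": "power_plant", "lat": 51.5, "lon": -1.3},
    {"asset_id": "WW-17", "asset_type": "wastewater", "lat": 95.0, "lon": 0.1},
    {"asset_id": "RD-4", "asset_type": "road"},
])
for _, reason in report.rejected:
    print(reason)
```

Rejecting with an explicit reason, rather than silently dropping records, is one way to keep the cleaning step auditable for both researchers and data providers.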
Asset and network data includes data on power plants, road and rail networks, telecoms, and wastewater treatment facilities. Certain types of data are easier to access than others. For example, there are national databases containing information about power plants, their capacity and technologies, and National Grid makes data about electricity supply and transmission available online. Other types of data are harder to access, for example because they exist only in PDF format or are commercially sensitive, as with telecoms data. In other cases, different companies within a sector may hold useful information, but in varying formats and with varying degrees of accessibility. In the water sector, this is a result of digitisation occurring after privatisation, and of the absence of any initiative to develop a national picture of water assets. In some cases, the regulators hold useful information: for example, Ofcom is able to extract a large amount of information. Most geospatial data comes from the Ordnance Survey and records the locations of infrastructure assets, so additional resource is needed to add attributes describing those assets.
Census data provides a good resource for obtaining demographic data that informs changes in demand. Geospatial socioeconomic data that forms the basis of future scenarios is also useful, including population projections and scenarios around regional economic growth. New sources of big data that allow consumer behaviour to be characterised, such as those held by the Consumer Data Research Centre, include sales data, market research data and reward scheme data. Data on buildings is also important for determining the future demand for infrastructure.
Not all data will reside on DAFNI; instead, it will provide an index and a means of accessing the data. High-volume datasets of secondary significance will reside elsewhere; for example, climatic data will remain on the existing JASMIN facility at STFC Harwell. Increasingly, it does not matter where data is physically stored unless there are strict security requirements. Ideally, users should be able to build applications based on data without needing to be aware of its location or format.
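The idea that DAFNI provides an index and a means of accessing data, rather than holding every dataset itself, can be sketched as a small catalogue that maps a dataset name to where it is stored and to a loader for that kind of store. In this minimal Python sketch, the `DataCatalogue` class, the `local://` and `remote://` location schemes, the dataset names and the loader functions are all hypothetical placeholders; they do not represent DAFNI's or JASMIN's real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CatalogueEntry:
    location: str  # hypothetical URI such as "local://..." or "remote://..."
    fmt: str       # storage format, kept as metadata; load() dispatches on the URI scheme only


class DataCatalogue:
    """Resolves dataset names to their storage location and loads them via a registered loader."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogueEntry] = {}
        self._loaders: Dict[str, Callable[[str], object]] = {}

    def register_loader(self, scheme: str, loader: Callable[[str], object]) -> None:
        self._loaders[scheme] = loader

    def register_dataset(self, name: str, entry: CatalogueEntry) -> None:
        self._entries[name] = entry

    def load(self, name: str) -> object:
        entry = self._entries[name]  # raises KeyError if the dataset is not indexed
        scheme = entry.location.split("://", 1)[0]
        return self._loaders[scheme](entry.location)


# Usage: stand-in loaders for a local database and a remote archive (both hypothetical).
catalogue = DataCatalogue()
catalogue.register_loader("local", lambda uri: f"rows from {uri}")
catalogue.register_loader("remote", lambda uri: f"fetched {uri}")
catalogue.register_dataset("power_plants", CatalogueEntry("local://assets/power_plants", "columnar"))
catalogue.register_dataset("climate_projections", CatalogueEntry("remote://climate/projections", "netcdf"))
print(catalogue.load("climate_projections"))  # fetched remote://climate/projections
```

An application written against `catalogue.load(...)` would not need to change if a dataset moved; only its catalogue entry would. This is one possible way to meet the aim that users need not know where data lives or how it is stored.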
Timeliness is a challenge, and there is a spectrum of capability among data suppliers. Some organisations have automatic updates and timed releases. Others have static datasets that were collected at a past point in time and are updated in an ad-hoc, undocumented way. In such cases, it is necessary to record when an update has occurred.
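One way to record undocumented updates to static datasets is to fingerprint each snapshot as it is received and log an event whenever the fingerprint changes. The Python sketch below uses SHA-256 from the standard `hashlib` module; the `UpdateTracker` class, its log structure and the dataset name are hypothetical illustrations rather than a description of how DAFNI handles updates.

```python
import hashlib
from datetime import datetime, timezone


class UpdateTracker:
    """Logs an update event whenever a dataset's content fingerprint changes between snapshots."""

    def __init__(self) -> None:
        self._last_hash = {}  # dataset name -> SHA-256 hex digest of the last snapshot seen
        self.log = []         # (dataset name, UTC timestamp, new digest) for each detected update

    def check(self, dataset: str, content: bytes) -> bool:
        """Return True, and log the event, if content differs from the previous snapshot."""
        digest = hashlib.sha256(content).hexdigest()
        previous = self._last_hash.get(dataset)
        self._last_hash[dataset] = digest
        if previous is not None and previous != digest:
            self.log.append((dataset, datetime.now(timezone.utc).isoformat(), digest))
            return True
        return False  # first snapshot (baseline) or unchanged content


# Usage with a hypothetical water-asset extract received three times.
tracker = UpdateTracker()
tracker.check("water_assets", b"id,type\n1,pump\n")           # baseline: returns False
tracker.check("water_assets", b"id,type\n1,pump\n")           # unchanged: returns False
tracker.check("water_assets", b"id,type\n1,pump\n2,valve\n")  # silently updated: returns True
print(len(tracker.log))  # 1
```

A fingerprint reveals only that something changed, not what changed; comparing the two snapshots would still be needed to document the update itself.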
Legal and commercial arrangements
There are several ways of commercialising the platform, which includes high-performance computing hardware as well as database facilities. For example, technically advanced users could pay to use the platform if they see benefits in carrying out model development or analysis there.
The platform could also sell data products or pay-per-use analyses, although this would require another tier of applications to be built on top of the underlying data and analytical capability. Alternatively, a user could pay to develop a service on DAFNI and then take it elsewhere. The intention is that the platform will attract interest from third parties.
Outcomes and lessons learned
- The database will be most successful if it holds a critical mass of data. One of the challenges is encouraging organisations to open up their data in the public interest or for academic purposes. Even where national databases are nominally accessible, restrictive licensing and pricing arrangements may preclude access to the data.
- A further challenge for DAFNI is raising awareness and getting people on board. Rapid prototyping and agile software development, along with early pilots, are useful because they allow the benefits to be demonstrated early in the process. It is important to help people understand what the database is, as their mental model of it may be very different from the reality.
- There is potential for the database to link to the work on developing digital twin pilot projects. In this context, a digital twin is a digital model of national infrastructure that will be able both to monitor infrastructure in real time and to simulate the impacts of possible events. [40]