Towards trusted data sharing: guidance and case studies
Case study 9
Grampian Data Safe Haven (DaSH):
a facility for sharing sensitive health data
The Grampian DaSH allows the secure processing and linking of health data for the Scottish population when it is not practicable to obtain consent from individual patients. A virtual network allows researchers to remotely access but not download the data, which is held on different servers. Strict controls are placed on where data is stored, who can access it, the type of analysis applied to the data and the results that are extracted. The CHI number, a unique identifier that is present on certain datasets in Scotland, enables linkages between different datasets. Users of the data are primarily academic researchers, although industrial partners are increasingly interested in accessing the data.
Summary: the eight dimensions of data sharing
Secure processing and linking of unconsented healthcare data for research purposes.
The data and its use:
Healthcare data, social data and other types of sensitive data will be used in mainly healthcare research, plus some non-healthcare applications.
The business model and value creation:
The project was initially funded by the University of Aberdeen and NHS Grampian, with additional investment from the Chief Scientist Office (CSO). It is not for profit. There is a standard access charge and hosting fee, as well as fees based on staff time to support a particular project.
The model for data sharing and the partnership:
DaSH technology provides a means for remote access to datasets in a highly controlled and secure way. Partners of the project include NHS Grampian, the University of Aberdeen, academic institutions and industry.
People with the right skills and expertise:
The project has a clinical lead, DaSH technical lead, research coordinators, quality assurance specialist, analysts and programmers.
Constraints on how data is shared and used:
Constraints include regulations on healthcare data, GDPR and ethical approvals.
The data architectures and technologies:
A virtual network allows researchers remote access to data, with strict controls on access and on exporting data and results.
Governance / oversight / enabling trust:
Oversight is provided by the DaSH steering committee with a lay member, the Caldicott guardian and the privacy advisory committee. The usual ethical approvals are required by researchers.
DaSH is a joint facility between NHS Grampian and the University of Aberdeen, set up in response to national guidance on improving the safe handling of linked datasets for research. In 2011, as part of the Scottish Health Informatics Programme, a blueprint was established for enabling researchers to access unconsented healthcare data. One safe haven was set up for each of the four major health research boards across Scotland to create a federated network that underpins a system of trust for sharing data. There are five safe havens across Scotland – one national safe haven and four regional ones.
There had always been an interaction between NHS and university researchers, so creating the safe haven environment was a natural development. It provides access to important healthcare datasets, such as the Scottish mortality records and hospital admissions data. The CHI number is a unique identifier that exists to ensure that patients can be correctly identified and is allocated on first registration with the system. It enables linkages between different datasets, since the numbers in different datasets can be matched. In addition, the University of Aberdeen and other universities have large cohort datasets. For example, the Aberdeen Children of the 1950s cohort study provides data about school test scores and social background, as well as current data about the participants of the original studies. This makes it possible to carry out studies on the factors in early life that affect health later in life.
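As an illustration of how a shared identifier enables linkage, the following minimal sketch joins two small record sets on a common key. It is not DaSH's linkage software: the function name `link_on_chi`, the field names and the records are all hypothetical, and the identifiers shown are placeholders rather than real CHI numbers.

```python
from collections import defaultdict

def link_on_chi(left, right, key="chi"):
    """Join two lists of records on a shared identifier field.

    Only identifiers present in both datasets produce a linked record.
    """
    # Index the right-hand dataset by identifier for fast lookup.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    linked = []
    for row in left:
        for match in index.get(row[key], []):
            # Combine the fields of both records into one linked record.
            linked.append({**match, **row})
    return linked

# Entirely made-up records for illustration.
admissions = [
    {"chi": "ID-001", "admitted": "2015-03-02"},
    {"chi": "ID-002", "admitted": "2016-07-19"},
]
cohort = [
    {"chi": "ID-001", "school_test_score": 41},
    {"chi": "ID-003", "school_test_score": 37},
]

linked = link_on_chi(admissions, cohort)
print(linked)
```

Only the identifier that appears in both datasets yields a linked record; unmatched records are simply left out, which is one reason data quality in the identifier matters.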
DaSH is a virtual network that allows researchers to access the data held on remote servers. Once logged in, researchers do not have a connection to the internet and cannot download or print the data or the results of any analyses. They have access to pre-agreed datasets and to statistical packages for running their analyses, including any bespoke code they have written themselves that can be run within a statistical package. The data itself is pseudo-anonymised. Once researchers are ready to publish their findings, they request the data that they want to extract from the safe haven. Patient-level data cannot leave the safe haven; instead, aggregated results may be released, provided they are based on a minimum of 10 records. If researchers are interested in smaller groups, they need permission from the custodian of the dataset.
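The minimum-count rule for released results can be sketched as a simple check on aggregated output. This is an illustrative assumption about how such a rule might be applied, not a description of DaSH's actual release procedure; the function `releasable_counts`, the field name `age_band` and the records are hypothetical. Only the threshold of 10 comes from the text.

```python
from collections import Counter

MIN_RECORDS = 10  # threshold stated in the case study

def releasable_counts(records, group_field, min_records=MIN_RECORDS):
    """Split group counts into those that may be released and those withheld.

    Groups with fewer than `min_records` records are withheld; under the
    rule described above they would need the data custodian's permission.
    """
    counts = Counter(record[group_field] for record in records)
    released = {group: n for group, n in counts.items() if n >= min_records}
    withheld = sorted(group for group, n in counts.items() if n < min_records)
    return released, withheld

# Made-up patient-level records that would stay inside the safe haven.
records = [{"age_band": "60-69"}] * 12 + [{"age_band": "90+"}] * 3

released, withheld = releasable_counts(records, "age_band")
print("released:", released)
print("withheld:", withheld)
```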
To date, the majority of researchers accessing the safe haven have been academic. However, a number of current bids involve industrial partners such as artificial intelligence companies that would like access to healthcare data to train algorithms that are used in artificial intelligence solutions for hospitals. In the future, there is potential to link to databases containing genetic information. Linkages have already been made to the University of Edinburgh’s Generation Scotland database that contains consented genetic data from human biological samples. DaSH is investigating how to best enable access to very large datasets to train algorithms. The model is also of value in non-healthcare applications. For example, DaSH has been used by the DVLA with MOT data, where there was concern that the location of MOT centres could potentially cause an individual’s address to be identified. DaSH is working in partnership with a similar facility in India with mutually trusted levels of security and governance, so that cohort data can be shared between the two facilities.
DaSH was mainly funded by the University of Aberdeen and NHS Grampian, with some additional funding from the Chief Scientist Office. It is a cost recovery service, so costs are included in research grant applications wherever possible. There is a standard access charge and a hosting fee. Researchers are also charged a fee that depends on the size of the datasets and on the time DaSH staff spend guiding researchers, preparing data and carrying out linkages.
The facility helps to attract research funding to the university. If commercial companies are involved, the facility is required to follow the relevant university guidelines.
Legal and commercial arrangements
Terminal services are used to gain access into remote desktops. The networks of the various organisations involved are separated to ensure the data is secure. There are strict controls on where and how data is stored, and who can access it. The storage of the data is split between the NHS network and the University of Aberdeen network, with the sensitive identifiable information stored on the NHS network, and ‘payload data’ stored on university servers. The researcher accesses the university server. Data is pseudo-anonymised to make it possible to relink it. However, for each project, data is pseudo-anonymised in a different way, which means that an individual patient’s data cannot be linked between multiple different projects.
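One common way to obtain project-specific pseudonyms is a keyed hash with a separate secret key per project. The sketch below uses HMAC-SHA256 from the Python standard library; it is an assumption for illustration only, since the case study does not say which technique DaSH uses. The function names and the identifier `ID-001` are hypothetical.

```python
import hashlib
import hmac
import secrets

def new_project_key():
    """Generate a random secret key, to be held only by approved analysts."""
    return secrets.token_bytes(32)

def pseudonymise(identifier, project_key):
    """Derive a project-specific pseudonym from an identifier.

    The same identifier and key always give the same pseudonym, so the key
    holder can relink data within a project. Different keys give unrelated
    pseudonyms, so records cannot be matched across projects.
    """
    digest = hmac.new(project_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()

# Two projects, each with its own key (placeholder identifier, not a real CHI).
key_a = new_project_key()
key_b = new_project_key()

pid_a = pseudonymise("ID-001", key_a)
pid_b = pseudonymise("ID-001", key_b)

print(pid_a == pseudonymise("ID-001", key_a))  # stable within a project
print(pid_a == pid_b)                          # differs between projects
```

In this sketch the project key plays the role of the sensitive material: whoever holds it can recompute a pseudonym from an identifier, so it would sit alongside the identifiable information rather than with the payload data.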
Only trained ‘approved’ analysts can access identifiable information. For highly sensitive data, one analyst sees only the identifiers and another sees only clinical and demographic data. In addition, linked datasets are stored on separate servers and access to linked data is restricted. No patient-level data leaves DaSH. Each project has its own Data Linkage Plan and Data Management Plan.
While the facility is primarily for accessing unconsented linked data, consented or anonymised data can be accessed in a similar way. The facility has also been used in a situation where a data custodian has a large, anonymised dataset but prefers to retain some level of control and not give it directly to the researcher.
There are some data inconsistencies, such as wrong CHI numbers or incorrect names, which make fully accurate linking of data challenging. There are ways of checking whether the linkage is correct. Some clinical checking of the raw data is carried out, the extent of which depends on the dataset. Clinical staff will also check the analyses to ensure that the results are as expected.
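One possible linkage check, given here purely as an assumed example, is to compare secondary fields on records that were matched by identifier and flag any pair that disagrees for manual review. The case study does not describe the checks DaSH actually performs; the function `flag_suspect_links`, the field names and the records below are hypothetical.

```python
def flag_suspect_links(linked_pairs, check_fields=("date_of_birth", "sex")):
    """Return identifiers of linked pairs whose secondary fields disagree.

    `linked_pairs` is a list of (left_record, right_record) tuples that were
    matched on the same identifier. A disagreement on another field suggests
    that one of the identifiers may have been recorded incorrectly.
    """
    suspects = []
    for left, right in linked_pairs:
        disagreeing = [
            field for field in check_fields
            if field in left and field in right and left[field] != right[field]
        ]
        if disagreeing:
            suspects.append((left["chi"], disagreeing))
    return suspects

# Made-up linked pairs; identifiers are placeholders, not real CHI numbers.
pairs = [
    ({"chi": "ID-001", "date_of_birth": "1950-04-12", "sex": "F"},
     {"chi": "ID-001", "date_of_birth": "1950-04-12", "sex": "F"}),
    ({"chi": "ID-002", "date_of_birth": "1952-11-30", "sex": "M"},
     {"chi": "ID-002", "date_of_birth": "1925-11-30", "sex": "M"}),
]

suspects = flag_suspect_links(pairs)
print(suspects)
```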
Technical and data curation arrangements
The staff of DaSH act as facilitators, guiding researchers through the process of identifying the appropriate data to answer their research questions, putting them in touch with the custodians of the datasets they will use and advising on how to obtain the relevant approvals.
DaSH has a Caldicott guardian, an individual who provides oversight of the arrangements for the use and sharing of clinical information, and a privacy advisory committee. Individual researchers need to obtain ethics approvals for their particular projects, but any review will be proportionate and take into account the governance arrangements for DaSH.
Approvals may also be needed from the data custodian, academic institution and NHS R&D. In addition, DaSH has a steering committee that includes a lay member. If there are major changes to how DaSH is used, such as the involvement of industrial partners, the committee is consulted.
Researchers have to fulfil certain requirements in order to access data: each must be an approved researcher with the appropriate research experience and hold a valid Information Governance Training Certificate, which gives access for a limited length of time. Industrial organisations would need a researcher on their project team. DaSH staff monitor who has accessed the facility and keep audit logs.
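The access requirements above can be expressed as a simple rule with an audit trail. The sketch below is a hypothetical illustration, not DaSH's access control system: the `Researcher` class, the `certificate_valid_until` field, the `check_access` function and the dates are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Researcher:
    name: str
    approved: bool                 # approved researcher status
    certificate_valid_until: date  # assumed expiry of the training certificate

def check_access(researcher, today, audit_log):
    """Grant access only to approved researchers with an unexpired certificate.

    Every decision, granted or refused, is appended to the audit log.
    """
    allowed = researcher.approved and today <= researcher.certificate_valid_until
    outcome = "granted" if allowed else "refused"
    audit_log.append((today.isoformat(), researcher.name, outcome))
    return allowed

# Made-up researchers and dates for illustration.
audit_log = []
alice = Researcher("A. Researcher", True, date(2030, 1, 31))
bob = Researcher("B. Researcher", True, date(2020, 1, 31))

alice_ok = check_access(alice, date(2025, 6, 1), audit_log)
bob_ok = check_access(bob, date(2025, 6, 1), audit_log)
print(alice_ok, bob_ok)
print(audit_log)
```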
Technical drawing of Grampian Data Safe Haven
Outcomes and lessons learned
- The close relationship between DaSH and the health board, partly a function of the small size of each, has contributed to the success of the facility. It was initially challenging to bring researchers on board with the new system of governance when they had become used to a less rigorous one. However, the facility has given many students and other researchers access to data, and has allowed them geographical flexibility; for example, if a researcher relocates to a different country as part of their research programme, they are still able to access data.
- An expansion of the service is the hosting of a safe haven within a safe haven: it has been possible to bring in a physical machine that gives researchers access to Administrative Data Research Network data, which they would otherwise have had to travel to Edinburgh to access.
- With hindsight, it may have been more useful to have a uniform approach to setting up the five safe havens; instead they have all been set up in slightly different ways. It may also have been preferable to aim for ISO 27001 information security management accreditation from the start.