Google DeepMind may have received the personally identifying medical records of 1.6 million NHS patients on a legally inappropriate basis

Sky News, 15th May 2017

Taking into account what you have now clarified, it is my view and that of my panel that the purpose for the transfer of 1.6 million identifiable patient records to Google DeepMind was for the testing of the Streams application, and not for the provision of direct care to patients. Given that Streams was going through testing and therefore could not be relied upon for patient care, any role the application might have played in supporting the provision of direct care would have been limited and secondary to the purpose of the data transfer. My considered opinion therefore remains that it would not have been within the reasonable expectation of patients that their records would have been shared for this purpose.

Dame Fiona Caldicott, National Data Guardian, February 2017

Trust and openness

Fundamentally, this is about trust and openness. It is about what is reasonable and expected.

However, while that’s quite easy to say, who decides what is reasonable and expected? Do you know how your medical information is stored and with whom it is shared? For example, would you expect an IT supplier to have potential access to your information?

Here is an extract from a contract between a UK health organisation and an IT system supplier (and it is not the Google DeepMind contract):

Personal data and medical records. The Supplier shall at all times comply with the Data Protection Legislate. In particular the Supplier acknowledges that in the course of providing the Works and/or Services it may access Personal Data. As needed the Supplier undertakes: a) to maintain technical and organisational security measures sufficient to comply at least with the obligations imposed on the Trust by the Seventh Data Protection Principle; b) only to process Personal Data for and on behalf of the Trust, in accordance with the instructions of the Trust and for the purpose of performance of the; c) to allow the Trust to audit the Supplier’s compliance with the requirements of this Clause on reasonable notice and/or to provide the Trust with evidence of its compliance with the obligations set out in this Clause.

Indeed, the Royal Free have published a useful video on how patient information may be used within an organisation and shared with suppliers who are acting for and on behalf of them.

It is therefore clear that a large health organisation will use patient information for direct clinical care, but will also share those data, when appropriate, with other organisations or suppliers acting “for and on behalf” of it to support the care of patients.

I’m surprised by the level of criticism on Twitter this evening and impressed by the balanced view given in the Sky News item. Dame Caldicott has written to the Information Commissioner’s Office (ICO) to raise her concerns, and the ICO will investigate and decide whether the data sharing was appropriate.

What is “testing”?

The issue at hand appears to be the use of patient data to test a software application. Dame Caldicott’s view is that, because the data were transferred for testing, the transfer was unreasonable.

To make a judgement on this matter, one needs to understand the law relating to confidential information, information governance principles and software development. The Data Protection Act requires that personal data be processed fairly and lawfully, and that they be adequate, relevant and not excessive. The first Caldicott Committee report was published in 1997 and set out six key principles for information governance. The second report refined these core principles and emphasised the difference between implied consent for direct patient care and explicit consent for research. The third report, commissioned after the debacle, was published last year and we await an official Government response.

Is the Streams application for direct care?

The first question, therefore, is whether the Streams application is for the direct care of patients.

The Streams application runs on mobile devices and receives notifications about patients who have developed an acute kidney injury. Such an application needs considerable underlying infrastructure to make it work. Much like an online shopping application on your phone, the actual work occurs on a server in a data centre. When a clinician gets an alert, they can view the patient’s record, including information about admissions, discharges, diagnoses and previous blood results, in order to prioritise the care of that patient. It is not simply a notification but a window into the patient record. It is my opinion that this application satisfies the requirements of direct care for patients. Watch the Sky News report and note how the consultant nurse describes its use. I am not aware of any serious argument to suggest that it is not for direct care.

We’re not buying a new car

The next consideration is how one builds software that works safely and consistently. In general, the best way to demonstrate that software works correctly is to test it. There are multiple levels of testing, in which progressively larger components of the overall solution are tested iteratively. If you want to read more, I cover testing in my Domain-driven design for clinical information systems document.

Unit testing checks that a single unit of functionality works as expected: the module is tested in isolation to check that its outputs are what one would expect given the specified inputs. These tests can be run every time the code is changed, and they can use dummy (“fake”) data designed to test how a module performs given both usual and unusual inputs.
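To make unit testing with fake data concrete, here is a minimal Python sketch. The function `creatinine_ratio_stage` and its thresholds (1.5, 2 and 3 times baseline) are hypothetical and chosen only for illustration; this is not the Streams algorithm or any certified acute kidney injury algorithm. The tests exercise both usual and unusual inputs using invented values, so no patient data are involved.

```python
import unittest


def creatinine_ratio_stage(current_umol_l: float, baseline_umol_l: float) -> int:
    """Return a hypothetical stage (0-3) from the ratio of current to baseline creatinine.

    Illustration only: the thresholds are assumptions for this example,
    not a clinically validated or certified algorithm.
    """
    if current_umol_l <= 0 or baseline_umol_l <= 0:
        raise ValueError("creatinine values must be positive")
    ratio = current_umol_l / baseline_umol_l
    if ratio >= 3.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5:
        return 1
    return 0


class CreatinineRatioStageTest(unittest.TestCase):
    # Every value below is invented test data, not a patient record.

    def test_usual_inputs(self):
        self.assertEqual(creatinine_ratio_stage(80, 80), 0)
        self.assertEqual(creatinine_ratio_stage(120, 80), 1)
        self.assertEqual(creatinine_ratio_stage(160, 80), 2)
        self.assertEqual(creatinine_ratio_stage(240, 80), 3)

    def test_unusual_inputs(self):
        # A zero or negative value should be rejected rather than silently staged.
        with self.assertRaises(ValueError):
            creatinine_ratio_stage(100, 0)
        with self.assertRaises(ValueError):
            creatinine_ratio_stage(-5, 80)
```

Saved as, say, `test_aki.py`, the tests can be run on every change with `python -m unittest test_aki`.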

Integration and system testing check how modules work together and how they interact with other components within a wider ecosystem. For a simple application with few interactions, integration tests can be straightforward, but such testing can become complex and involved when there are many different interactions, particularly when those interactions are with systems built by other vendors.

Such tests can be run in a dedicated test environment containing fake ‘mock’ services. For example, one might have a fake patient administration system or a fake laboratory information system running with dummy information in order to test interactions between those systems. However, at some point during development, the application must be deployed.
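An integration test of this kind might wire a component to fake services in place of the real systems. The following Python sketch is hypothetical throughout: `FakePatientAdministrationSystem`, `FakeLaboratorySystem` and `build_alert` are illustrative names, and treating the first result as a “baseline” is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    name: str
    ward: str


class FakePatientAdministrationSystem:
    """Stand-in for a real patient administration system, holding dummy demographics."""

    def __init__(self, patients):
        self._patients = {p.patient_id: p for p in patients}

    def get_patient(self, patient_id):
        return self._patients.get(patient_id)


class FakeLaboratorySystem:
    """Stand-in for a real laboratory information system with dummy results."""

    def __init__(self, results):
        # Mapping of patient_id -> creatinine values in umol/L, oldest first.
        self._results = results

    def creatinine_results(self, patient_id):
        return list(self._results.get(patient_id, []))


def build_alert(patient_id, pas, lab, threshold_ratio=1.5):
    """Hypothetical component under test: combine PAS and lab data into an alert.

    Uses the first result as a crude baseline (an assumption for illustration).
    """
    patient = pas.get_patient(patient_id)
    results = lab.creatinine_results(patient_id)
    if patient is None or len(results) < 2:
        return None
    baseline, latest = results[0], results[-1]
    if latest / baseline >= threshold_ratio:
        return f"{patient.name} ({patient.ward}): creatinine {latest} vs baseline {baseline}"
    return None


# Integration test: the component talks to both fake services together.
pas = FakePatientAdministrationSystem([Patient("TEST-001", "Test Patient A", "Ward 1")])
lab = FakeLaboratorySystem({"TEST-001": [80, 95, 130]})
assert build_alert("TEST-001", pas, lab) == "Test Patient A (Ward 1): creatinine 130 vs baseline 80"
assert build_alert("UNKNOWN", pas, lab) is None
```

Because the fakes expose the same methods a real adapter might, the same test could later be pointed at real systems in a test environment; that is an assumed design choice, not a description of any particular product.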

Building software is not like buying a car. When I go and buy a car, I know that it will work on the road and it will fit onto my drive and I’ll take care that it’s got the right number of seats for my family. The car design has been tested and quality tests would have been performed before it was ready for purchase. I can buy it and drive it home.

Instead, software development is much more like teaching your child to ride a bicycle. We can use dummy data like we use stabilisers, but at some point we have to take off those stabilisers! Similarly, at some point, we need to expose our new application to real data.

Now if you are anything like me, when you taught a child to ride a bicycle you took off the stabilisers and then ran very quickly alongside them as they wobbled precariously from side to side! In the same way, any responsible organisation writing software applications cannot and should not hand over its application without further monitoring and testing to ensure that what it thinks will work does actually work in a live environment. Once the software is live, it would be unreasonable for a software company not to perform additional testing to be sure that it works as expected. Similarly, we run next to our child, checking that they are safe and reminding them to keep on pedalling!
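One common way to “run alongside” software once it meets real data is shadow mode: the new component processes incoming inputs, but its outputs are only compared with the existing process and never acted on. The Python sketch below is a hypothetical illustration of that idea; `shadow_run` and the example values are invented, and this is not a description of how Streams was deployed.

```python
from typing import Callable, Dict, List


def shadow_run(
    incoming: Dict[str, float],
    existing_decisions: Dict[str, bool],
    new_component: Callable[[float], bool],
) -> List[str]:
    """Run a new component in shadow mode and return case IDs where it disagrees.

    The new component's output is only compared and reported; clinicians keep
    relying on the existing process while any discrepancies are reviewed.
    """
    disagreements = []
    for case_id, value in incoming.items():
        # Cases missing from the existing process are treated as "no alert".
        if new_component(value) != existing_decisions.get(case_id, False):
            disagreements.append(case_id)
    return disagreements


# Invented example values; the threshold rule is a hypothetical stand-in.
incoming = {"case-1": 1.2, "case-2": 1.7, "case-3": 1.6}
existing = {"case-1": False, "case-2": True, "case-3": False}
print(shadow_run(incoming, existing, lambda ratio: ratio >= 1.5))  # ['case-3']
```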


It is important to have robust scrutiny in relation to the use of patient-identifiable information by third parties. However, I am surprised that Dame Caldicott has suggested that it might be inappropriate to test software systems designed for direct care with real patient information. I would argue that such testing is necessary in the final stages of development to ensure that new technology is safely deployed in live clinical environments.

Update 4th June 2017:

It has been pointed out to me that I have not explicitly explained the difference between two distinct workstreams in this post. In order to build software, it is usually appropriate to break up a large problem into distinct and manageable chunks of work. If I were developing this application, I would break it up like this:

  1. Integrate data from two different laboratory and patient administrative systems across sites within the Trust and provide a unified view of results from those systems to clinicians.
  2. Develop an acute kidney injury algorithm in code.
  3. Deployment.
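Workstream (1) is essentially a data integration problem. A minimal Python sketch of a unified, chronological view might look like the following. It assumes, for simplicity, that both systems already share a common patient identifier, which in practice is often a substantial part of the work; `LabResult` and `unified_view` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LabResult:
    patient_id: str   # assumed to be shared across systems
    test: str
    value: float
    taken_at: datetime
    source: str       # which laboratory system supplied the result


def unified_view(*feeds: Iterable[LabResult]) -> Dict[str, List[LabResult]]:
    """Merge results from several systems into one chronological list per patient."""
    merged: Dict[str, List[LabResult]] = {}
    for feed in feeds:
        for result in feed:
            merged.setdefault(result.patient_id, []).append(result)
    for results in merged.values():
        results.sort(key=lambda r: r.taken_at)
    return merged


# Invented example data from two hypothetical sites.
site_a = [
    LabResult("P1", "creatinine", 80, datetime(2017, 1, 1, 9, 0), "site A"),
    LabResult("P1", "creatinine", 140, datetime(2017, 1, 3, 9, 0), "site A"),
]
site_b = [LabResult("P1", "creatinine", 95, datetime(2017, 1, 2, 9, 0), "site B")]
view = unified_view(site_a, site_b)
print([(r.value, r.source) for r in view["P1"]])
# [(80, 'site A'), (95, 'site B'), (140, 'site A')]
```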

Now, all aspects of that work need “testing”, but only (2) needs certification as a medical device. It seems entirely reasonable to me that a transfer of data would be needed in order to develop and test (1), and that could be done before or after the certification of (2). Indeed, testing and certifying (2) doesn’t actually need real patient data: the MHRA simply needs to know that a medical device “does what it says on the tin”.