Using the PDS API against the INT environment, we have found that the returned results don’t always match the search parameters.
Searching for a patient with these parameters:
{
birthdate: '2011-05-09',
gender: 'female',
family: 'Shipperbottom'
}
returns a Bundle containing a single matched Patient resource with these attributes:
{
birthdate: '1926-09-24',
gender: 'male',
family: 'Shipperbottom'
}
We expected that the matched Patient resource would be consistent with the search parameters.
Why does the service return this result?
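The expectation above can be written as a small check. This is only a sketch: `findMismatches`, `SearchParams` and `PatientLike` are hypothetical names, not part of the PDS API. It reads the `birthDate`, `gender` and `name[].family` fields of a FHIR R4 Patient resource. It also assumes an exact search (no wildcards or fuzzy matching) and that family names compare case-insensitively.

```typescript
// Search parameters in the shape used in this thread.
interface SearchParams {
  birthdate: string;
  gender: string;
  family: string;
}

// Only the parts of a FHIR R4 Patient resource that this check reads.
interface PatientLike {
  birthDate?: string;
  gender?: string;
  name?: { family?: string }[];
}

// Hypothetical helper: lists the search parameters the returned patient does not satisfy.
function findMismatches(params: SearchParams, patient: PatientLike): string[] {
  const mismatches: string[] = [];
  if (patient.birthDate !== params.birthdate) mismatches.push("birthdate");
  if (patient.gender !== params.gender) mismatches.push("gender");
  // Assumption: family names are compared case-insensitively.
  const wanted = params.family.toLowerCase();
  const families = (patient.name ?? []).map((n) => (n.family ?? "").toLowerCase());
  if (!families.some((f) => f === wanted)) mismatches.push("family");
  return mismatches;
}

// The search parameters and the returned attributes reported above.
const mismatches = findMismatches(
  { birthdate: "2011-05-09", gender: "female", family: "Shipperbottom" },
  { birthDate: "1926-09-24", gender: "male", name: [{ family: "Shipperbottom" }] },
);
console.log(mismatches); // [ 'birthdate', 'gender' ]
```

Running a check like this over every entry in a search Bundle would flag inconsistent matches automatically during integration testing.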
Having done some further investigation, we can see that this problem occurs for around of the first patients listed in the PDS test data spreadsheet so we can confirm that this is not an isolated problem with a single patient.
The following combinations of parameters exhibit the problem:
{
birthdate: '1943-05-19',
gender: PatientGenderCodes.FEMALE,
family: 'AINDOW'
},
{
birthdate: '2011-05-09',
gender: PatientGenderCodes.FEMALE,
family: 'SHIPPERBOTTOM'
},
{
birthdate: '2009-02-15',
gender: PatientGenderCodes.MALE,
family: 'ANDERTON'
},
{
birthdate: '2008-05-21',
gender: PatientGenderCodes.MALE,
family: 'FIELDING'
},
{
birthdate: '2001-01-26',
gender: PatientGenderCodes.FEMALE,
family: 'SEO'
}
Some background:
For “simple” searches, PDS uses inverted indexes, which are usually updated when a pdsRecord is saved.
It looks like, in the case of these particular records, the way they’ve been loaded has not updated those indexes and we’ve been left with some stale entries.
I’m investigating which tool was used and will recommend we add some improvements to prevent this happening in future. I’ll also look to clear out the stale index entries for these records.
This will not happen in live as we do not use these tools to load test data.
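To illustrate how a stale index entry can produce this symptom, here is a toy model. It is not the PDS implementation: `ToyStore`, `save`, `loadWithoutIndexing` and the record shape are all hypothetical, and the values are borrowed from this thread purely for illustration. The sketch keeps a record store plus an inverted index from birthdate to record IDs, and shows what happens when a record is written without the index being updated.

```typescript
// Hypothetical record shape for the toy model.
interface ToyRecord {
  id: string;
  family: string;
  birthdate: string;
}

class ToyStore {
  private records = new Map<string, ToyRecord>();
  // Inverted index: birthdate -> IDs of records with that birthdate.
  private byBirthdate = new Map<string, Set<string>>();

  // Normal save path: writes the record and keeps the index in step.
  save(record: ToyRecord): void {
    const previous = this.records.get(record.id);
    if (previous) this.byBirthdate.get(previous.birthdate)?.delete(record.id);
    this.records.set(record.id, record);
    if (!this.byBirthdate.has(record.birthdate)) {
      this.byBirthdate.set(record.birthdate, new Set<string>());
    }
    this.byBirthdate.get(record.birthdate)!.add(record.id);
  }

  // Loader that bypasses indexing: the record changes, the index does not.
  loadWithoutIndexing(record: ToyRecord): void {
    this.records.set(record.id, record);
  }

  // Search consults the index, then reads the current record for each ID.
  searchByBirthdate(birthdate: string): ToyRecord[] {
    const ids = this.byBirthdate.get(birthdate) ?? new Set<string>();
    return Array.from(ids).map((id) => this.records.get(id)!);
  }
}

const store = new ToyStore();
store.save({ id: "record-1", family: "Shipperbottom", birthdate: "2011-05-09" });
store.loadWithoutIndexing({ id: "record-1", family: "Shipperbottom", birthdate: "1926-09-24" });

// The stale index entry still points at record-1, whose birthdate no longer matches.
const stale = store.searchByBirthdate("2011-05-09");
// The new birthdate was never indexed, so this search finds nothing.
const missing = store.searchByBirthdate("1926-09-24");
console.log(stale, missing);
```

In this model the search result is inconsistent with the query in the same way as the example above, and calling `save` again for the affected record repairs the index.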
Thanks for this update - it’s always useful to understand the system architecture.
For our hazard log entry, the same problem could occur in production but the risk is mitigated through operational controls (i.e. not using test tooling to load data) and the probability of occurrence will be very low. For our use case I don’t think we’ll then need to add any further mitigation.
We’ve retested this following the fix and find that searching for a patient with these parameters taken from the All PDS Data dataset no longer returns any matches:
{
birthdate: '2011-05-09',
gender: 'female',
family: 'Shipperbottom'
}
We can retrieve a patient by searching using these parameters (i.e. the values previously returned as an erroneous response):
{
birthdate: '1926-09-24',
gender: 'male',
family: 'Shipperbottom'
}
The returned patient has NHS Number 9449306621, but its demographics differ materially from those listed for this patient in the test dataset.
Is it expected that the INT system contains different patients to the All PDS data test cases?
NHS API Management team let me know that the INT system should be consistent with the test pack. They haven’t provided any information about if/when this will be resolved.
Tested again and the problem is still present.
Hi. Is anyone from the NHS team able to provide an update to this, please?
We’ve just found another instance where the GET /Patient endpoint response in the INT environment is different from the query and from all-pds-data2.xlsx in PDS FHIR API test data - NHS Digital. The request specifies BRITTAIN family name, 1928-12-12 date of birth, and KT19 9SA postcode (per the spreadsheet), but the endpoint returns a patient with 2021-03-07 date of birth.
Thanks!
Whilst the test data page says that the data are read-only, in reality they can be updated by users of the system and therefore can differ from the details in the spreadsheet. This is compounded by the problem that @chris.clarke described where the data underpinning read and search interactions become unsynchronised.
You can request a dataset specifically for your own use that is less likely to be modified.
Thank you, @dunmail! This is really unfortunate. Hopefully the NHS team can get this fixed soon, as it results in wasted time investigating the issue every time someone stumbles into it.
Must admit I gave up some time ago on expecting the INT data to be representative of the spreadsheets. Most of my ‘favourite’ test patients seem to have different demographics.