Do we need documentation for NHS content (?) API

Is anyone else using the NHS content API, i.e. the one that gets web content for health conditions etc… that you can use to pull into your own site/product to display health related content rather than write it yourself?

The hardest thing we find in using it is the documentation. The JSON format returned is a generic standard used for lots of different content and is quite vast. Because it’s such a generic format, writing a mechanism that handles all of it is pretty much impossible. Without attempting to import every single piece of content, there is no way to know whether what we have written will handle everything that might come back from the NHS content API.

What we end up with is an integration whereby it seems to work until a customer raises a support ticket when they try to import something that is not handled correctly.
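One way to find out about unhandled content before a customer does is to walk each response and report any key the importer does not know about. The following is only a rough sketch: the `HANDLED_KEYS` allow-list and the function names are made up for illustration, and the keys in it are simply ones mentioned in this thread, not a complete description of what the API returns.

```python
from __future__ import annotations

from typing import Any, Iterator

# Hypothetical allow-list: the keys our own importer knows how to handle.
HANDLED_KEYS = {"name", "headline", "text", "hasPart", "mainEntityOfPage", "relatedLink"}


def iter_keys(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, key) for every key found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            yield child_path, key
            yield from iter_keys(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_keys(item, f"{path}[{index}]")


def unhandled_keys(payload: Any, handled: set[str] = HANDLED_KEYS) -> list[str]:
    """Return the sorted paths of all keys that are not in the allow-list."""
    return sorted({path for path, key in iter_keys(payload) if key not in handled})
```

Running something like this over each imported page and logging a non-empty result would at least turn a silent gap into a visible warning.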

I’d like to see the documentation not just be a link to the generic JSON format, but describe how it is used by the NHS API so that we can handle things fully.

What is everyone else’s experience? Do you agree some NHS specific documentation would be helpful?

Hi Peter - just wanted to chime in to say, yes, my organisation is doing that (‘collecting’ or ingesting the Health Conditions, Mental Health, Live Well, Medicines APIs into our own internal system for consuming in apps/products as part of a wider set of data resources)

I suspect we have been in the same or similar boat (other than, unlike you, my technique does involve importing the actual data into our own backend - arguably even more difficult!). I too felt overwhelmed by the way the ‘content’ API seems to effectively represent the backend of the NHS website’s CMS.

Things that are especially difficult are the way ‘sections’ or paragraphs of any single ‘page’ are broken into separate JSON sections (the hasPart, mainEntityOfPage stuff).

The other problem is that the text content is full of HTML elements, sometimes with attributes that relate specifically to the NHS website’s CSS classes etc, which have no relevance to any other app.

In my case I have to try to ‘flatten’ out the relationships to get the data stored in a conventional relational DB. It’s been very difficult, and the biggest risk is missing some attribute simply due to the vastness of it. For example, I initially tried to extract all the ‘text’ fields from the ‘hasPart’ sections and concatenate them together into one big ‘text’ field, only to realise that this meant I was missing the ‘headline’ attribute, which is sometimes used to set headings on the page and is important context for the information being provided.
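To illustrate the flattening problem, here is a minimal sketch that walks the nested sections and keeps each ‘text’ together with the nearest ‘headline’ above it. It assumes that sections nest through ‘hasPart’ and ‘mainEntityOfPage’ and that a child without its own ‘headline’ belongs under its parent’s; the real payloads may not always follow that shape, and the function name is invented for this example.

```python
from __future__ import annotations

from typing import Any


def flatten_sections(page: dict[str, Any]) -> list[dict[str, str]]:
    """Return one row per 'text' value, paired with the closest 'headline'."""
    rows: list[dict[str, str]] = []

    def walk(node: dict[str, Any], inherited_headline: str) -> None:
        # Assumption: a section without its own headline inherits its parent's.
        headline = node.get("headline") or inherited_headline
        text = node.get("text")
        if text:
            rows.append({"headline": headline, "text": text})
        for key in ("mainEntityOfPage", "hasPart"):
            children = node.get(key) or []
            if isinstance(children, dict):
                children = [children]  # tolerate a single object instead of a list
            for child in children:
                if isinstance(child, dict):
                    walk(child, headline)

    walk(page, "")
    return rows
```

Each row can then go into a relational table with a headline column and a text column, rather than losing the headings in one concatenated blob.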

With the Medicines API, I ran into even more fun: some medicines have a dedicated ‘relatedLink’ section of the JSON for Related Conditions and Useful Resources. See the bottom of the NHS page for Benzydamine (a non-steroidal anti-inflammatory (NSAID) used for teething, sore throats and mouth ulcers), for example. But other medicines store Related Conditions and Useful Resources not in the ‘relatedLink’ section but as another ‘mainEntityOfPage’ hasPart thing (for example, note the difference in the appearance of those sections on the NHS page for Alogliptin, a medicine to treat type 2 diabetes, compared with the Benzydamine one). In other words, inconsistent data entry techniques are also at play here, making it very hard for consumers of the API.

Finally, another thing we realised is that the JSON is actually non-compliant! It uses single quotes to encapsulate its keys and values, instead of double quotes (probably in order to support the double quotes in the HTML content). This breaks various programming languages, which refuse to decode the JSON. Even MySQL, which supports a JSON field type, refuses to insert the data because of the non-compliance.
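For anyone hitting the same single-quote problem, a possible workaround is to try a strict parse first and only then fall back to a more lenient one before re-serialising. This is a sketch under a big assumption: that the single-quoted payload happens to be valid Python literal syntax. The fallback will fail if the payload contains bare `true`, `false` or `null`, and the function name is made up.

```python
import ast
import json


def to_strict_json(raw: str) -> str:
    """Return a standards-compliant JSON string for a possibly single-quoted payload."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: the payload is a Python-style literal (single-quoted strings).
        # ast.literal_eval only evaluates literals, so it does not execute code.
        value = ast.literal_eval(raw)
    # json.dumps always emits double-quoted keys and strings.
    return json.dumps(value)
```

The output of `to_strict_json` should then be acceptable to strict decoders and to a database JSON column.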

It would be (have been) great to know if there were any tricks with the API such as:

  1. Getting all the text content in one big hit rather than in ‘paragraph parts’
  2. Getting the content without HTML elements

… documentation ideas just to start with.
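In the absence of an API option for the second point, the HTML can be stripped on the consumer side. Below is a minimal sketch using only Python’s standard `html.parser`; the set of tags treated as line breaks is my own guess at what is useful, not something taken from the API documentation.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and inserts line breaks around block-level tags."""

    # Assumption: these are the tags worth treating as line breaks.
    BLOCK_TAGS = {"p", "li", "br", "div", "ul", "ol", "h1", "h2", "h3", "h4"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(fragment: str) -> str:
    """Drop all tags and attributes (including CSS classes), keeping the text."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```

Joining the cleaned text of every section then gives the ‘one big hit’ from the first point, at the cost of losing links and emphasis.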

All of which is to say - I share your pain, but I’m glad it’s not just me!

I know that the NHS Content APIs are going to be ‘moving’ or merging into the other APIs and the endpoints will change, from May onwards. It will be interesting to see if this has any effect on the documentation or format.

Hey mig5,

I’m glad we are not the only ones having difficulties building a reliable content API integration.
We came across the same issues with things like headers :unamused:. At least on the JSON side it hasn’t been a problem for us, although we are using the MS stack (.NET/SQL Server). We are not storing the JSON in the DB, though; we extract the bits we want and store them in our own information page format, which allows HTML.

I recently received an email from NHS Digital saying the content API is changing again: all the keys will become invalid and all the endpoints are changing. It linked to a page to register/configure. I had a quick look at the web pages and, although they admittedly do have a warning notice, the new dev site pages are wildly inconsistent between different logins and in appearance. Worst of all, it says the call rate is going to be something like 10 calls a minute and 1000 calls a month (or something similar), so this is going to be a real pig to test. In the past they’ve said you can contact them to increase the limits, but the time to get a reply is long as well. I don’t understand why they can’t have a test system with the same call limits as the live system. If anything, the test system needs higher limits, as getting the integration right takes a lot of trial and error. On top of that, we can’t try it until May.
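If the limits do turn out to be that tight, throttling on the client side is one way to avoid burning calls on rejected requests while testing. Here is a rough sliding-window sketch; the class name is invented, and the 10-per-minute figure is only the number I remember from the email, so treat it as a placeholder.

```python
import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Blocks so that at most max_calls happen within any window of period seconds."""

    def __init__(
        self,
        max_calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def _prune(self, now: float) -> None:
        # Forget calls that have fallen out of the current window.
        while self._stamps and now - self._stamps[0] >= self._period:
            self._stamps.popleft()

    def wait(self) -> None:
        """Call before each API request; sleeps if the window is already full."""
        now = self._clock()
        self._prune(now)
        if len(self._stamps) >= self._max_calls:
            # Sleep until the oldest call in the window expires.
            self._sleep(self._period - (now - self._stamps[0]))
            now = self._clock()
            self._prune(now)
        self._stamps.append(now)


# Placeholder figure: roughly 10 calls per minute, as recalled from the email.
limiter = RateLimiter(max_calls=10, period=60.0)
```

Calling `limiter.wait()` before every request keeps a test run under the per-minute limit, although it does nothing about the monthly cap, so caching responses locally during development is probably still needed.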