Hi Peter - just wanted to chime in to say, yes, my organisation is doing that (‘collecting’ or ingesting the Health Conditions, Mental Health, Live Well and Medicines APIs into our own internal system, for consumption in apps/products as part of a wider set of data resources).
I suspect we have been in the same or a similar boat (except that, unlike yours, my technique does involve importing the actual data into our own backend - arguably even more difficult!). I too felt overwhelmed by the way the ‘content’ API seems to effectively represent the backend of the NHS website’s CMS.
Things that are especially difficult are the way ‘sections’ or paragraphs of any single ‘page’ are broken into separate JSON sections (the hasPart, mainEntityOfPage stuff).
The other problem is that the text content is full of HTML elements, sometimes with elements that relate specifically to the NHS website’s CSS classes etc, which have no relevance to any other app.
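For what it's worth, here is a minimal sketch of how the HTML can be stripped on the consumer side, using only Python's standard-library `html.parser`. The names `strip_html` and `BLOCK_TAGS` are my own, and the list of block-level tags is an assumption about what appears in the content, not something taken from the API documentation.

```python
from html.parser import HTMLParser

# Assumption: closing one of these tags should act as a word/paragraph break.
BLOCK_TAGS = {"p", "li", "ul", "ol", "div", "h2", "h3", "h4", "tr", "td"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags and their attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(fragment: str) -> str:
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())
```

This throws away all structure (lists, links, emphasis), so it only suits cases where plain text is genuinely all the app needs.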
In my case I have to try to ‘flatten’ out the relationships to get the data stored in a conventional relational DB. It’s been very difficult, and the biggest risk is missing some attribute simply due to the vastness of it. For example, I initially tried to extract all the ‘text’ fields from the ‘hasPart’ sections and concatenate them together into one big ‘text’ field, only to realise that this meant I was missing the ‘headline’ attribute, which is sometimes used to set headings on the page and is important context for the information being provided.
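To illustrate the flattening idea, here is a rough sketch that walks the nested sections and keeps both ‘headline’ and ‘text’ values in page order. It assumes each section is a dict that may carry ‘headline’, ‘text’, and further nested sections under ‘hasPart’ or ‘mainEntityOfPage’; the function name `flatten_parts` is hypothetical, and the real payload may well have attributes this does not cover.

```python
def flatten_parts(node, out=None):
    """Depth-first walk that collects (kind, value) pairs in document order."""
    if out is None:
        out = []
    if isinstance(node, dict):
        # Record the heading before the body text of the same section.
        if node.get("headline"):
            out.append(("headline", node["headline"]))
        if node.get("text"):
            out.append(("text", node["text"]))
        # Assumption: child sections live under one of these two keys.
        for key in ("hasPart", "mainEntityOfPage"):
            flatten_parts(node.get(key), out)
    elif isinstance(node, list):
        for item in node:
            flatten_parts(item, out)
    # Anything else (None, plain strings, numbers) is ignored.
    return out
```

Each pair can then become one row in a relational table (page id, position, kind, value), which keeps headings attached to the text they introduce rather than losing them in one concatenated field.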
With the Medicines API, I ran into even more fun: some medicines have a dedicated ‘relatedLink’ section of the JSON for Related Conditions and Useful Resources. See the bottom of the NHS page for Benzydamine (a non-steroidal anti-inflammatory (NSAID) used for teething, sore throats and mouth ulcers), for example. But other medicines store Related Conditions and Useful Resources not in the ‘relatedLink’ section but as another ‘mainBodyofPage’ hasPart thing (for example, note the difference in the appearance of those sections on the NHS page for Alogliptin, a medicine to treat type 2 diabetes, compared with the Benzydamine one). In other words, inconsistent data entry techniques are also at play here, making it very hard for consumers of the API.
Finally, another thing we realised is that the JSON is actually non-compliant! It uses single quotes for the encapsulation of its keys and values, instead of double quotes (probably in order to support the double quotes in the HTML content). This breaks various programming languages, which refuse to decode the JSON, and even things like MySQL, which supports a JSON field type but refuses to insert the data because of the non-compliance.
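One possible workaround, sketched in Python: try a strict parse first, and fall back to `ast.literal_eval`, which accepts single-quoted strings. This only works if the payload happens to be a valid Python literal, so it would fail on bare `true`, `false` or `null`; that is an assumption about the data, not something I can promise. The function names `load_lenient` and `to_strict_json` are made up for this example.

```python
import ast
import json


def load_lenient(raw: str):
    """Parse strict JSON, falling back to a Python-literal parse."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # literal_eval evaluates literals only (no arbitrary code), and
        # raises ValueError or SyntaxError if the input is not one.
        return ast.literal_eval(raw)


def to_strict_json(raw: str) -> str:
    """Re-serialise a lenient payload as standards-compliant JSON."""
    return json.dumps(load_lenient(raw))
```

The output of `to_strict_json` uses double quotes throughout, with any double quotes inside the HTML content escaped, so it should be acceptable to a strict parser or a JSON column.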
It would be (or would have been) great to know if there are any tricks with the API, such as:
- Getting all the text content in one big hit rather than in ‘paragraph parts’
- Getting the content without HTML elements
… documentation ideas just to start with.
All of which is to say - I share your pain, but I’m glad it’s not just me!
I know that the NHS Content APIs are going to be ‘moving’ or merging into the other APIs and the endpoints will change, from May onwards. It will be interesting to see if this has any effect on the documentation or format.