Clinical information system design: 3. Data standards and terminology
A terminology is a body of terms used within a particular subject of study or profession. A classification is a systematic organisation of things into classes. A classification system such as ICD-10 cannot be used as a terminology and many terminologies cannot easily be used as a classification system. This is an important distinction and relates to how those data are primarily used.
SNOMED CT
SNOMED CT is a very large and comprehensive terminology. Importantly, terms can be linked by multiple relationships to other terms.
This makes it possible for software to determine that “multiple sclerosis” is a disease defined by “demyelination” of the “central nervous system”.
When implemented properly, SNOMED CT enables software to make intelligent decisions about what to show, what data to request and what forms to present, based on the diagnoses entered. For example, the database would know that a patient had epilepsy if they were given a diagnosis of juvenile myoclonic epilepsy or frontal lobe epilepsy or any of the hundreds of other terms that are equivalent to a diagnosis of epilepsy.
Thus a command to ‘send an alert when a patient, belonging to a particular consultant, with motor neurone disease loses 5% of their body mass compared to their baseline at diagnosis’, can be implemented easily. SNOMED CT allows the underlying logic to simply ask whether the patient has a type of “motor neurone disease” and this would automatically include all patients with related diagnoses such as “primary lateral sclerosis” and “pseudobulbar palsy”.
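A minimal sketch of the subsumption test behind such an alert, assuming a tiny hand-written IS-A table rather than a real SNOMED CT release. The text labels stand in for concept identifiers, and the names `PARENTS`, `is_a` and `should_alert` are hypothetical. Filtering by consultant is left out for brevity.

```python
# Toy IS-A table: child -> set of direct parents. The labels are placeholders
# for SNOMED CT concept identifiers; a real system would load a release.
PARENTS = {
    "primary lateral sclerosis": {"motor neurone disease"},
    "pseudobulbar palsy": {"motor neurone disease"},
    "motor neurone disease": {"disorder of nervous system"},
    "multiple sclerosis": {"disorder of nervous system"},
}


def is_a(concept, ancestor):
    """Return True if concept equals ancestor or descends from it via IS-A."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return False


def should_alert(diagnoses, baseline_kg, current_kg, threshold=0.05):
    """Alert when a patient with any type of MND has lost at least
    `threshold` of their baseline body mass."""
    has_mnd = any(is_a(d, "motor neurone disease") for d in diagnoses)
    lost = (baseline_kg - current_kg) / baseline_kg
    return has_mnd and lost >= threshold
```

The calling code only ever names the parent concept; the hierarchy decides which specific diagnoses qualify.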
SNOMED CT is not confined to diagnostic and procedural information. There are hierarchies covering a wide range of medical terminology including anatomical structures, pathology, occupations and ethnic origins. With local extensions such as the NHS dictionary of medicines and devices (dm+d), these codes can be used in any field that needs structured coded information.
Another advantage is support for synonyms. A distinct clinical concept can have, and usually does have, multiple synonyms. For example, “Granulomatosis with polyangiitis” was previously known as “Wegener's granulomatosis”. With synonym support, a user entering an outdated or synonymous term would find the synonym and see it mapped to the current preferred description of the concept.
Within SNOMED CT, clinical terms are “concepts”, synonyms are “descriptions” and the links between concepts are recorded as “relationships”. While seemingly simple, the relationship types are themselves defined by concepts (such as ‘IS-A’, as in “Motor neurone disease IS-A Disorder of nervous system”), which means that the relationship tree can be extended indefinitely over time.
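To make those three components concrete, here is a minimal sketch in which a concept is just an identifier, with descriptions and relationships held as simple records. All identifiers are placeholders rather than real SNOMED CT codes, and `Description`, `Relationship` and `search` are hypothetical names, not part of any SNOMED CT tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    concept_id: int
    term: str
    preferred: bool


@dataclass(frozen=True)
class Relationship:
    source_id: int
    type_id: int    # the relationship type is itself a concept
    target_id: int


# Placeholder identifiers, not real SNOMED CT codes.
GPA, VASCULITIS, IS_A = 1001, 1002, 9001

DESCRIPTIONS = [
    Description(GPA, "Granulomatosis with polyangiitis", preferred=True),
    Description(GPA, "Wegener's granulomatosis", preferred=False),
]
RELATIONSHIPS = [Relationship(GPA, IS_A, VASCULITIS)]


def search(text):
    """Case-insensitive substring search over descriptions.

    Returns (matched term, preferred term) pairs so that a user typing an
    outdated synonym is shown the current preferred description.
    """
    preferred = {d.concept_id: d.term for d in DESCRIPTIONS if d.preferred}
    needle = text.lower()
    return [(d.term, preferred[d.concept_id])
            for d in DESCRIPTIONS if needle in d.term.lower()]
```

For example, `search("wegener")` matches the older synonym and pairs it with the preferred term of the same concept.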
SNOMED CT is an international terminology owned by SNOMED International. The UK version is managed by the UK Terminology Centre (UKTC) of the Health and Social Care Information Centre (HSCIC), now called NHS Digital. There are online training resources as well as a simple online SNOMED CT browser.
I have developed an open-source terminology server that provides fast free-text search and navigation around the SNOMED CT hierarchy, as well as semantic understanding of any concept. This allows client software to answer questions like “does this patient have a type of granulomatous disease?”, “was this patient born in Europe?” or even “is this drug a type of beta-blocker?”. It would answer the first question simply by examining the diagnoses recorded for the patient, answering “yes” if a patient had sarcoidosis and “no” if they had multiple sclerosis. Similarly, it would respond with “yes” if a patient was recorded as being born in France but “no” if they were born in Afghanistan. SNOMED CT provides the logical relationships needed to drive such computerised decision-making.
Information models
Each SNOMED CT concept, description and relationship has a unique and persistent identifier that can be stored in a data store. Most clinical applications persist information in a relational database and so a simple implementation may simply store the identifier as a foreign key to the row that represents that entity. However, whilst these terms have meaning when used in isolation (e.g. storing the identifier representing “myocardial infarction”), it is only when they are combined in a logical way as part of a larger data model that true meaning can be understood. This is analogous to a dictionary, which defines individual words, whereas true expression results only when those words are combined into sentences and paragraphs. As such, most concepts are only useful when considered within the information model in which they are recorded.
The information model in which SNOMED CT concepts are stored is therefore critical in determining what can be inferred from the recording of a concept. This is particularly important for a terminology such as SNOMED CT, in which terms may be recorded together to form a compositional (post-coordinated) term such as “Family history of…” (281666001) and “Obesity” (414916001) or, for convenience, represented as a single pre-coordinated SNOMED CT term, “Family history: Obesity” (160311006).
If SNOMED CT had defined not only a terminology but also a wider information model, then such compound pre-coordinated terms would be unnecessary and the recording of “obesity” within a model defining a family history would be sufficient. However, while compound pre-coordinated terms risk an explosion of terms to cater for multiple combinations, they do make it easier for end-users to find concepts that represent what they are trying to record, and they support workflows in which clinical terms are recorded in a relatively unstructured information model, such as that used historically in primary care with Read codes. Indeed, SNOMED CT was developed as an amalgamation of SNOMED from the College of American Pathologists and the Read codes (Clinical Terms Version 3), with the latter recorded prospectively and longitudinally in UK primary care systems in a relatively unstructured format.
There are advantages in SNOMED CT being independent of the surrounding data model within which it is transmitted or stored, particularly as SNOMED CT terms generally reflect core concepts relating to health, disease and the processes of care. As a result, a range of models can be used with SNOMED CT and, similarly, different terminologies can be bound to a model. One example is LOINC, an alternative terminology focused on tests, measurements and observations (see the LOINC section).
I generally recommend a highly structured approach to the storage of SNOMED CT codes, in which the information model ensures no uncertainty in interpretation and, to simplify subsequent analysis and retrieval, compound pre-coordinated terms are decomposed into their components and stored appropriately. As such, users may enter information via a highly structured workflow in which the context is evident as part of the user interface, such as recording family history or allergies, or via a less structured workflow in which terms are entered, decomposed and then stored in a more structured information model. Alternatively, an implementation may instead store the codes as entered and take responsibility for ensuring that a diagnostic term recorded in a family history model is treated as equivalent to the compound term.
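One way the decomposition might look, as a sketch only. The three identifiers are those quoted above; the `DECOMPOSITION` lookup, the `Entry` record and the section names are hypothetical, and a real implementation would more likely derive the decomposition from the defining relationships in SNOMED CT than from a hand-written table.

```python
from dataclasses import dataclass

# Identifiers as quoted in the text above.
FAMILY_HISTORY_OF = 281666001
OBESITY = 414916001
FAMILY_HISTORY_OBESITY = 160311006

# Hypothetical lookup: pre-coordinated concept -> (context, focus concept).
DECOMPOSITION = {
    FAMILY_HISTORY_OBESITY: (FAMILY_HISTORY_OF, OBESITY),
}


@dataclass(frozen=True)
class Entry:
    section: str      # the part of the record in which the concept is stored
    concept_id: int


def normalise(concept_id, section="diagnosis"):
    """Store a compound term as its focus concept within the matching section."""
    if concept_id in DECOMPOSITION:
        context, focus = DECOMPOSITION[concept_id]
        if context == FAMILY_HISTORY_OF:
            return Entry("family_history", focus)
    return Entry(section, concept_id)
```

With this approach, “Obesity” entered through a family history form and the compound term entered as free-standing text end up as the same stored entry, so later queries need only look in one place.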
To provide evidence to support a highly structured approach, I suggest considering how one might use SNOMED CT to record examination findings. While there is a code for “supranuclear gaze palsy” (420675003), there is no compound code to record the absence of this examination finding. While one could request an addition to SNOMED CT, it is much more appropriate to consider examination findings in two categories: those found and those not found. Recording the absence of a clinical sign is important in clinical practice as well as for medico-legal reasons. As such, the information model in which this clinical finding is recorded is critical in providing understanding. SNOMED CT does allow post-coordinated terms in which multiple terms are combined to give meaning. One possible way of representing the ‘lack of’ a clinical finding is to post-coordinate with a negation concept to form a compositional term, but I advocate using a robust information model as a simpler method of expressing clinical knowledge, particularly when one considers how users are expected to record such findings.
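A minimal sketch of such a model, in which the absence of a finding is expressed by where the concept is stored rather than by a special negated code. The `Examination` class and its method names are hypothetical; the identifier is the one quoted above.

```python
from dataclasses import dataclass, field

SUPRANUCLEAR_GAZE_PALSY = 420675003  # identifier as quoted in the text


@dataclass
class Examination:
    """Findings are recorded in one of two categories: found or not found."""
    present: set = field(default_factory=set)
    absent: set = field(default_factory=set)

    def record(self, concept_id, found):
        # A finding cannot be both present and absent in one examination,
        # so remove it from the opposite category before adding it.
        (self.absent if found else self.present).discard(concept_id)
        (self.present if found else self.absent).add(concept_id)

    def status(self, concept_id):
        if concept_id in self.present:
            return "present"
        if concept_id in self.absent:
            return "absent"
        return "not examined"
```

The three-way result matters: a sign that was looked for and not found is different from one that was never assessed.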
Example information models which can record SNOMED CT terms in context are HL7 and openEHR. openEHR uses the term ‘archetype’ as a synonym for ‘information model’. The use of validated and published data structures supports subsequent interoperability between disparate systems which can process that model. However, adopting a specific information model does not force the use of that model as the format in which data are stored; it may be used only as a representation of data for interoperability with other systems. Indeed, a focus on a model of any form improves the potential for interoperability because that model is likely to be an abstraction of real-life concepts and thus it becomes possible to map from one information model to another.
However, even if two information models represent the same real-life concept and they look superficially similar, the process of mapping can introduce ambiguity and potentially even errors, particularly if a data element is present in one model but not in another. In addition, different terminologies may be used with an information model in a process called ‘terminology binding’ and so simply mandating a particular kind of information model does not guarantee interoperability.
HL7
Health Level Seven (HL7) is an international standards development organisation that publishes standards for healthcare interoperability. For more information, see the HL7 website.
HL7 publishes a range of interoperability standards including HL7 V2, HL7 V3 (which includes CDA) and HL7 FHIR.
HL7 V2
HL7 V2 is currently HL7’s most widely used standard. It was first released in 1989 and is deployed internationally.
It is fundamentally a messaging standard and early versions focused on ‘ADT’ messages, which relate to the admission, discharge and transfer of patients. Such messages are sent in response to ‘trigger’ events and therefore adopt a ‘push’ model of health interoperability. HL7 V2 has grown organically and iteratively over many years, with an increasing number of message types including, for example, those that record clinical observations and laboratory results. HL7 v2.6 was approved as an ANSI standard in 2007 and later 2.x versions have been published since.
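To give a feel for the format, the following sketch reads a made-up admission message. HL7 V2 messages are plain text, with segments separated by carriage returns, fields by `|` and components by `^`. The message content and the `parse` and `summarise` helpers are invented for illustration; the sketch ignores repeated segments, repeating fields and escape sequences, so it is not a substitute for a proper HL7 library.

```python
# Made-up ADT^A01 (admission) message; segments are separated by carriage returns.
MESSAGE = "\r".join([
    r"MSH|^~\&|SENDER|HOSPITAL|RECEIVER|CLINIC|20070101120000||ADT^A01|MSG0001|P|2.6",
    "PID|1||123456^^^HOSPITAL^MR||DOE^JANE||19700101|F",
    "PV1|1|I|WARD1^BED2",
])


def parse(message):
    """Split a message into {segment name: list of fields}."""
    segments = {}
    for line in message.split("\r"):
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments


def summarise(message):
    """Extract the message type, trigger event and patient name."""
    seg = parse(message)
    # In MSH the field separator itself counts as MSH-1, so list index 8 is MSH-9.
    msg_type, trigger = seg["MSH"][8].split("^")[:2]
    # PID-5 holds the patient name as family^given.
    family, given = seg["PID"][5].split("^")[:2]
    return {"type": msg_type, "trigger": trigger, "family": family, "given": given}
```

A receiving system would typically inspect the message type and trigger event first and then dispatch to a handler for that event.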
HL7 V3 and the RIM
HL7 V3 was developed from 1992 to define a Reference Information Model (RIM) describing healthcare-related messages and trigger events relating to those messages. The RIM defines an object-orientated model in which types are sub-specialisms of a more generic type. For example, in the same way as bicycles and cars are types of vehicles, a ‘Person’ is a type of a ‘Living Subject’ but veterinary patients such as dogs and cats are represented as a ‘Non-person living subject’. As in object-orientated programming languages, specialist models inherit attributes and behaviours from their more generic parent types.
The three core classes in HL7 RIM are ‘Act’, ‘Entity’ and ‘Role’:
- An act is a record of something that has happened or will happen. This will usually include what has been done, to whom, by whom, when, where and how, and possibly why.
- An entity is a living or non-living thing such as a person, animal or organisation.
- A role represents a skill or competency of an Entity, such as patient, employee, place or organisation.
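The inheritance described above can be sketched with a few classes. This is a deliberately simplified illustration and not the RIM itself: the attribute names are invented, and the real model connects acts to roles through further linking classes that are omitted here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Entity:
    """A living or non-living thing."""
    name: str


@dataclass
class LivingSubject(Entity):
    birth_time: Optional[datetime] = None


@dataclass
class Person(LivingSubject):
    pass


@dataclass
class NonPersonLivingSubject(LivingSubject):
    species: str = ""


@dataclass
class Role:
    """A competency of the entity that plays it."""
    player: Entity
    kind: str  # e.g. "patient" or "employee"


@dataclass
class Act:
    """A record of something that has happened or will happen."""
    description: str
    subject: Role
    performer: Role
    when: datetime
```

Because `Person` and `NonPersonLivingSubject` both inherit from `LivingSubject`, code written against the generic type works for human and veterinary patients alike.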
The HL7 V3 standard is large and complex, and many healthcare organisations with an existing infrastructure built using HL7 V2 have been reluctant to adopt the newer standard. As a result, HL7 RIM is not used as much as HL7 V2 internationally.
HL7 V3 CDA
The HL7 V3 Clinical Document Architecture (CDA) defines a document-based information model in which components of the HL7 V3 RIM are used as a header, together with a document body consisting of a mixture of unstructured and structured data. As I discuss in my post on clinical modelling, a document-based architecture is an appropriate abstraction of real-life processes and explicitly ensures that clinical information is recorded together with its context; such a document can stand in isolation and be understood by the reader. When contextual clues are removed, interpretation of structured and unstructured information is potentially hazardous. A document can be immutable once created, and document lifecycle and process management, including creation, editing and repudiation, can be modelled in a straightforward manner.
There are three CDA levels:
- CDA level one: has a header and human-readable body usually in an unstructured format such as free text or file types such as images or documents (e.g. Adobe PDF).
- CDA level two: extends level one by including more structured data within the body of the document.
- CDA level three: allows highly-structured data to be encoded at a high level of granularity.
With the document paradigm limiting mismatch between model and real-life, adoption of the CDA standard has been widespread internationally. It is the most adopted HL7 V3 standard. In addition, the different CDA levels permit flexibility in the recording of unstructured and structured data, easing adoption of the standard. Such an approach permits implementers to use level one of the HL7 CDA to store relatively unstructured documents initially but evolve over time to permit newer applications to store more structured information and yet remain interoperable.
FHIR
FHIR, Fast Healthcare Interoperability Resources, is a new framework created by HL7 which uses modern web services over HTTP to create, edit and share modular resources. Importantly, the HL7 FHIR standards are free to use without restriction and focus on data standards and the implementation of those standards within clinical systems.
The use of FHIR does not mandate that existing and new clinical systems use FHIR standards to define their internal architecture or internal storage formats, but the use of FHIR can provide an open and standard interface to permit interoperability between different systems created by different vendors. As such, an application or service can provide access to data in an open and extensible format by providing a FHIR server and consume different data as a FHIR client.
There is a range of FHIR frameworks including:
- Search: within a resource, such as searching for a patient.
- Operations: on a resource such as fetching an encounter record or processing a message. These are in addition to the standard CREATE, READ, UPDATE and DELETE REST-based interactions.
- Documents: representing a composition of other resources with a fixed presentation linked to context.
- Messages: to allow messages to be sent in response to a specific event.
- Services: to provide a representation of a service-orientated architecture within healthcare. See the FHIR service documentation for more information.
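As a rough illustration of the REST style, the sketch below builds the URLs for a read and a search interaction and a minimal Patient resource as JSON. The server address is hypothetical and no request is actually sent. The resource layout follows later FHIR releases, so details may differ from the draft release current at the time of writing.

```python
import json
from urllib.parse import urlencode

BASE = "https://fhir.example.org"  # hypothetical server base URL


def read_url(resource_type, resource_id):
    """URL for a READ interaction: GET [base]/[type]/[id]."""
    return f"{BASE}/{resource_type}/{resource_id}"


def search_url(resource_type, **params):
    """URL for a search: GET [base]/[type]?name=value&..."""
    return f"{BASE}/{resource_type}?{urlencode(params)}"


# A minimal Patient resource. A CREATE interaction would send this JSON
# as the body of a POST to [base]/Patient.
patient = {
    "resourceType": "Patient",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "gender": "female",
    "birthDate": "1970-01-01",
}
body = json.dumps(patient)
```

For example, `search_url("Patient", family="Doe")` gives the URL a client would GET to find patients by family name.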
There is backwards-compatibility so that FHIR documents can contain HL7 V3 CDA documents. FHIR potentially improves the integration and interoperability between disparate systems. The standard uses open web standards over HTTP and so development can be straightforward. However, FHIR is still under active development at the time of writing and the current release is designated as a “draft standard for trial use”. As such, future versions of FHIR may introduce incompatible changes.
openEHR
openEHR is an open standard for the recording of health-related data, maintained by the openEHR Foundation. It defines a multi-level information model in which the core model (the ‘reference model’) does not include clinical information but focuses on core generic abstractions and process. Archetypes represent the clinical information model, with models created, edited and curated as part of a ‘clinical knowledge manager’ (CKM). This multi-level modelling approach means that health professionals can undertake clinical modelling independently of the underlying implementation. In addition, small and re-usable domain models can be aggregated into a ‘template’. For example, domain models for ‘blood pressure’ and ‘heart rate’ can be incorporated within a larger ‘template’ representing an emergency unit admission.
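The aggregation of small models into a template can be pictured with a short sketch. The `Archetype` and `Template` classes here are hypothetical stand-ins that bear no relation to the actual openEHR archetype formats or tooling; they only illustrate the idea of composing re-usable models.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Archetype:
    """Stand-in for a small, re-usable clinical model."""
    name: str
    elements: tuple


@dataclass
class Template:
    """Aggregates archetypes for a particular use, such as a form or document."""
    name: str
    archetypes: list = field(default_factory=list)

    def element_paths(self):
        """List every data element, qualified by the archetype it comes from."""
        return [f"{a.name}/{e}" for a in self.archetypes for e in a.elements]


blood_pressure = Archetype("blood_pressure", ("systolic", "diastolic"))
heart_rate = Archetype("heart_rate", ("rate",))
admission = Template("emergency_unit_admission", [blood_pressure, heart_rate])
```

The same `blood_pressure` model could be re-used unchanged in a template for, say, an outpatient clinic visit.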
In addition, there is a new openEHR REST-based application programming interface (API) which proposes to define standards to which an openEHR implementation should adhere. However, these standards are still in development and have not yet been finalised.
LOINC
LOINC is a freely available international standard for tests, measurements and observations. The official website is https://loinc.org.
Although limited in scope, it has been adopted by HL7 as the standard code system for laboratory results. In addition, there is a project to map LOINC codes to SNOMED CT and vice versa.