Semantic value sets
One of the difficulties with building semantic interoperability between clinical systems is how we define value sets.
In essence, a value set is a set of values that can be used for a data field. It is a vocabulary.
Many health interoperability technologies leave value sets as something to be profiled. This means that you can’t actually exchange meaningful information unless both parties agree on what to use for a particular field. In many cases that means you can talk, but you’re both speaking different languages.
Often value sets model real-world operations very closely. Sometimes, they don’t because they reflect central reporting requirements and act as a classification system rather than a terminology. You can learn a lot by asking yourself who is the primary user of that value set.
The best way to think of this, in a non-technical manner, is that for central reporting we might want to categorise a clinical encounter by main specialty (e.g. neurology), but in fact the patient was really seen by a number of different specialties or clinicians in subspecialty practice (e.g. epilepsy surgery pre-assessment clinic). The reporting is important, but it is a prism through which we view the record in order to simplify our analytics. It isn’t the “truth”.
It should be obvious that a flat classification system is great for reporting, because you categorise patients into groups, but an ontology does much more. An ontology not only provides value lists but also describes how each value relates to the others (for example, paediatric neurology is-a paediatric specialty, and paediatric neurology is-a neurology specialty). That makes ontologies much better for operational clinical systems.
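As a minimal sketch of that difference, Clojure’s built-in hierarchy functions (make-hierarchy, derive, isa?) can model is-a relationships. The keywords below are illustrative stand-ins that I’ve made up, not codes from any published terminology.

```clojure
;; Illustrative only: keywords stand in for real terminology concepts.
(def specialties
  (-> (make-hierarchy)
      (derive :paediatric-neurology :paediatric-specialty)
      (derive :paediatric-neurology :neurology-specialty)
      (derive :epilepsy-surgery :neurology-specialty)))

;; A flat value list can only test for equality...
(= :paediatric-neurology :neurology-specialty)
;; => false

;; ...whereas a hierarchy can answer "is this a kind of neurology?"
(isa? specialties :paediatric-neurology :neurology-specialty)
;; => true

;; and can list everything that falls under a broader concept.
(descendants specialties :neurology-specialty)
;; => #{:paediatric-neurology :epilepsy-surgery}
```

A real ontology such as SNOMED CT carries far richer relationships than this, but even a simple is-a graph lets an operational system ask questions that a flat list cannot answer.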
It’s seductive to use an information standard for operational systems (“we’ve used a standard”), and sometimes that’s the right approach. More often, it’s much better to define the value set in terms of an ontology.
The logical consequence is that we need systems that can transparently flip between different classification and ontology systems. When we can do that without losing information, it is called ‘round-tripping’, but in many cases it’s not possible to map without losing information because a classification simply doesn’t have the necessary level of granularity. The logical consequence of that is that we should do most of our work in a finely-grained terminology that has an ontological basis and map to less granular, lossy classification schemes for reporting purposes.
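To make the loss of information concrete, here is a small sketch. The clinic and specialty keywords are made up for illustration; the only point is that a many-to-one map cannot be inverted to a single answer.

```clojure
;; Hypothetical map from granular clinic types to a reporting category.
(def clinic->main-specialty
  {:epilepsy-surgery-pre-assessment :neurology
   :paediatric-neurology            :neurology
   :heart-failure-clinic            :cardiology})

(defn round-trippable?
  "True when no two source values share a target, so the map can be
  inverted without losing information."
  [m]
  (= (count m) (count (set (vals m)))))

(defn invert
  "Invert a map, collecting every source value that shares a target."
  [m]
  (reduce-kv (fn [acc k v] (update acc v (fnil conj #{}) k)) {} m))

(round-trippable? clinic->main-specialty)
;; => false

(invert clinic->main-specialty)
;; => {:neurology  #{:epilepsy-surgery-pre-assessment :paediatric-neurology}
;;     :cardiology #{:heart-failure-clinic}}
```

Going from the granular value to the category is always safe; going back from :neurology gives two candidates, which is exactly the information the classification threw away.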
Any national data project in health and care needs to give careful consideration to how to process data. In general, real-life health and care data is granular, hierarchical, graph-like and complex, while many central reporting requirements prefer a simpler tabular, categorised view.
This example uses real code, in Clojure, to demonstrate how we can build stable, reliable clinical systems that can seamlessly switch between ontological value sets and classification value sets, and sometimes back again. This example uses a value set that is useful both for reporting and operationally: triage category.
I’ve deliberately chosen this as we can round-trip without losing information.
This is my real-time write up of coding some simple logic to handle mapping between different value sets. I wrote it in the space of an afternoon, so it’s only a demonstration. The code is trivial - 32 lines of actual code. The declarative data definitions and maps are the important thing here.
It’s written in Clojure, a Lisp that runs on the Java virtual machine (JVM). It makes it very easy to do coding in an exploratory way. This is my first live-coding blog in Clojure. It’s an experiment to show how easy it is to map between identifiers if we have our classifications published in a machine-readable format with appropriate namespacing.
(ns com.eldrix.janus.standards)
A worked example: triage category
Here’s our NHS Wales information standards data from the emergency department dataset. It should be a goal for us to publish datasets like this in a format that can be read by machines but that’s not available, so I’ve created a machine-readable version here.
It defines a value set for triage categories, representing the patient’s priority in an emergency system at time of triage.
Note that I’ve added a namespace. This means that the combination of namespace and value is globally unique.
One job for data standards is to define publicly accessible namespaces and their values.
This means that https://data.standards.cymru/Id/datasets/edss/triage|02 will be “very urgent”. It’s impossible to process the code “02” by itself - you have to have insider knowledge of a table structure and what it means. Namespacing codes means that they can stand alone and be interpreted appropriately. I’ve made up that namespace - but these need to be published, well-known namespaces.
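Here is a rough sketch of handling the pipe-delimited form shown above. The function names are my own, purely for illustration, and I’m assuming the value is whatever follows the last pipe character.

```clojure
(require '[clojure.string :as str])

(defn format-identifier
  "Render a {:system :value} identifier as \"system|value\"."
  [{:keys [system value]}]
  (str system "|" value))

(defn parse-identifier
  "Split a \"system|value\" string at the last pipe.
  Returns nil if there is no pipe, because a bare code has no namespace."
  [s]
  (when-let [idx (str/last-index-of s "|")]
    {:system (subs s 0 idx)
     :value  (subs s (inc idx))}))

(parse-identifier "https://data.standards.cymru/Id/datasets/edss/triage|02")
;; => {:system "https://data.standards.cymru/Id/datasets/edss/triage", :value "02"}

(parse-identifier "02")
;; => nil
```

A bare “02” deliberately parses to nothing here: without its namespace there is no way to know which table it came from.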
(def datasets
  "These are the NHS Wales data dictionary datasets.
  See http://www.datadictionary.wales.nhs.uk
  Ideally these should be imported from a machine-readable source,
  but that doesn't yet exist"
  {:emergency-department-dataset
   {:name      "Emergency department dataset"
    :url       "http://www.wales.nhs.uk/sitesplus/documents/299/20090401_DSCN_022009%28W%29.pdf"
    :dscn      "DSCN (2009) 02 (W)"
    :namespace "https://data.standards.cymru/Id/datasets/edss"
    :items
    {:triage-categories
     {:name      "Triage categories"
      :namespace "https://data.standards.cymru/Id/datasets/edss/triage"
      :values
      [{:id "01"
        :active true
        :description "Priority One - Immediate"
        :info.snomed/sct 1064891000000107}
       {:id "02"
        :active true
        :description "Priority Two - Very urgent"
        :info.snomed/sct 1064911000000105}
       {:id "03"
        :active true
        :description "Priority Three - Urgent"
        :info.snomed/sct 1064901000000108}
       {:id "04"
        :active true
        :description "Priority Four - Standard"
        :info.snomed/sct 1077241000000103}
       {:id "05"
        :active true
        :description "Priority Five - Non urgent"
        :info.snomed/sct 1077251000000100}
       {:id "06"
        :active false
        :description "See and Treat"}]}}}})
This is a fragment of the larger dataset, and only includes the triage categories. The code below works for other defined categories in other datasets within the NHS Wales information standards catalogue.
You can see that I’ve included a map to SNOMED CT for each category. There isn’t an equivalent for “See and treat” so we need to raise this with SNOMED International. (It is also where NHS Wales diverges in its standard from NHS England).
We need some helper functions to process this information. You don’t need to understand how these work. Skip if you can’t read Lisp! This is a toy implementation just to show the logic.
Clojure allows arbitrary structures to be used as keys and values in associative arrays (also called dictionaries or hash-maps). This means I can use system/value pairs as keys here to permit toy lookup functionality. This doesn’t provide real ontological inference, only equivalence.
(defn categories->ns
  "Convert a list of categories into namespaced identifiers"
  [cats]
  (let [prefix (:namespace cats)]
    (->> (:values cats)
         ;; add the namespace as :system and the local code as :value
         (map #(assoc % :system prefix :value (:id %))))))

(defn all-identifiers
  "Return a simple list of all identifiers from the datasets"
  [ds]
  (->> (vals ds)
       (map :items)
       (mapcat vals)
       (mapcat categories->ns)))

(defonce registry (atom {}))

(defn reg-equiv-asymm
  "Registers equivalence from one identifier to another"
  [from to]
  ;; registry shape: {from-identifier {target-system to-identifier}}
  (swap! registry assoc-in [from (:system to)] to))

(defn reg-equiv
  "Register that the specified identifiers, expressed as {:system :value}
  are equivalent, symmetrically"
  [id1 id2]
  (reg-equiv-asymm id1 id2)
  (reg-equiv-asymm id2 id1))
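The registry above relies on Clojure maps comparing by value, so an identifier can be used directly as a lookup key. Here is a small standalone sketch of that idea, using made-up urn:example namespaces and a plain map in the same shape as the registry.

```clojure
;; Made-up identifiers, in the same {:system :value} shape used above.
(def example-registry
  {{:system "urn:example:a" :value "1"}
   {"urn:example:b" {:system "urn:example:b" :value "one"}}})

;; Look up the equivalent of a|1 in system b.
(get-in example-registry
        [{:system "urn:example:a" :value "1"} "urn:example:b"])
;; => {:system "urn:example:b", :value "one"}

;; Maps are compared by content, so key order in the literal is irrelevant.
(contains? example-registry {:value "1" :system "urn:example:a"})
;; => true
```

This is why no special identifier type is needed for a toy: the data structure is the key.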
So now we can process our machine-readable dataset, and simply register that one identifier from one classification or terminology is equivalent to another. Let’s do that now for our triage category. In a real production system, we’d use a service that provided an abstract ontological service and could provide an inference engine off-the-shelf.
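(defn reg-datasets-snomed
  "Register SNOMED maps for the value sets (categories) from the items
  in the datasets. Values without a SNOMED CT map (such as \"See and Treat\")
  are skipped, so that we never register an equivalence to a nil concept."
  [datasets]
  (doseq [{:keys [system value] sct :info.snomed/sct} (all-identifiers datasets)
          :when sct]
    (reg-equiv {:system system :value value}
               {:system "https://snomed.info/sct" :value sct})))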
We’ve registered a symmetric map between SNOMED CT and the dataset definitions from the NHS Wales information standards. Real ontologies provide many more options to define relationships between identifiers.
Let’s double-check that this works by round-tripping between our triage identifier and SNOMED CT and back:
(reg-datasets-snomed datasets)

(get-in @registry [{:system "https://data.standards.cymru/Id/datasets/edss/triage" :value "05"}
                   "https://snomed.info/sct"])
;; => {:system "https://snomed.info/sct", :value 1077251000000100}

(get-in @registry [{:system "https://snomed.info/sct" :value 1077251000000100}
                   "https://data.standards.cymru/Id/datasets/edss/triage"])
;; => {:system "https://data.standards.cymru/Id/datasets/edss/triage", :value "05"}
That means we can build an ‘alias’ system, taking arbitrary collections of identifiers and expanding/denormalising into all of the registered codesystems we know. In a real implementation, we’d recursively expand identifiers so we could alias all equivalent identifiers.
(defn expand-identifier
  "Determine the equivalent identifiers for the specified identifier tuple"
  [[sys v]]
  ;; start with the identifier itself, then add each registered equivalent
  (into {sys v}
        (map (juxt :system :value))
        (vals (get @registry {:system sys :value v}))))
So let’s check it works. Here we expand a single identifier tuple:
(expand-identifier ["https://data.standards.cymru/Id/datasets/edss/triage" "05"])
;; => {"https://snomed.info/sct" 1077251000000100, "https://data.standards.cymru/Id/datasets/edss/triage" "05"}
And we can send in arbitrary identifiers, which are passed through unchanged if there are no expansions:
(apply merge (->> {:name "This won't be mapped"
                   "https://data.standards.cymru/Id/datasets/edss/triage" "05"
                   "https://data.standards.cymru/Id/datasets/edss/other" "N/A"}
                  (map expand-identifier)))
;; =>
;; {"https://snomed.info/sct" 1077251000000100,
;;  :name "This won't be mapped",
;;  "https://data.standards.cymru/Id/datasets/edss/triage" "05",
;;  "https://data.standards.cymru/Id/datasets/edss/other" "N/A"}
So we can use these functions to build maps between arbitrary value sets in both classification and terminology systems - we could add an arbitrary map to another terminology or classification.
What about HL7 FHIR and openEHR?
But look! We have triage models defined by both HL7 FHIR and openEHR.
One approach is to try to centralise all of the definitions in one place. That’s a seductive approach, but brittle and difficult to change. It makes things harder in the future because you need a central authority to manage and curate the definitions. It’s much better to build a dynamic registry to which multiple code systems can register independently. It’s important to decouple at a technical level. That registry could be a small microservice that provides identifier mapping, semantics and inference, with registrations made by multiple teams who don’t necessarily need to coordinate.
An openEHR triage archetype is available at https://ckm.openehr.org/ckm/archetypes/1013.1.304 but the data item is free-text! So much for semantic interoperability! This archetype is no good for our purposes.
Fortunately, openEHR could support defining a value set either internally, or using a codeable concept. But this illustrates an important problem of openEHR; you still need to agree on a set of archetypes that will be used across your organisations and software systems if you are going to have semantic interoperability. It isn’t a magic bullet.
The HL7 FHIR categories are listed at https://hl7.org/fhir/STU3/v3/ActPriority/vs.html
The nice thing about both openEHR and HL7 FHIR is that we can get definitions in machine-readable formats. We can therefore write tools that import and register those value-sets.
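As a rough sketch of what such an import tool might do, here is a hand-written fragment shaped like a parsed FHIR CodeSystem resource (a url plus a list of concepts), turned into the same {:system :value} identifiers used earlier. The fragment is illustrative and incomplete, the function name is my own, and a real tool would read the published definition rather than a literal and would need to handle nested concepts.

```clojure
;; Hand-written, partial fragment in the shape of a parsed FHIR CodeSystem.
;; Assumption: only three of the ActPriority codes are shown here.
(def act-priority-fragment
  {:resourceType "CodeSystem"
   :url          "http://hl7.org/fhir/v3/ActPriority"
   :concept      [{:code "EM" :display "emergency"}
                  {:code "UR" :display "urgent"}
                  {:code "R"  :display "routine"}]})

(defn code-system->identifiers
  "Turn each top-level concept into a namespaced identifier,
  keeping the display text as a description."
  [{:keys [url concept]}]
  (map (fn [{:keys [code display]}]
         {:system url :value code :description display})
       concept))

(code-system->identifiers act-priority-fragment)
;; => ({:system "http://hl7.org/fhir/v3/ActPriority", :value "EM", :description "emergency"}
;;     {:system "http://hl7.org/fhir/v3/ActPriority", :value "UR", :description "urgent"}
;;     {:system "http://hl7.org/fhir/v3/ActPriority", :value "R", :description "routine"})
```

The output has the same shape as the NHS Wales identifiers, which is what makes it possible to register both in one place.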
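The HL7 FHIR value set is defined under the namespace http://hl7.org/fhir/ValueSet/v3-ActPriority, while the codes themselves belong to the code system http://hl7.org/fhir/v3/ActPriority, which is the namespace used in the code below.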
In this simple proof-of-concept, let’s register the equivalence of the semantics manually. In a real-life application, we’d download the HL7 FHIR definitions in machine-readable formats and process accordingly in our inference engine.
(reg-equiv
  {:system "http://hl7.org/fhir/v3/ActPriority" :value "UR"}
  {:system "https://data.standards.cymru/Id/datasets/edss/triage" :value "03"})
Now that means we can take FHIR identifiers and turn them into what we need. Notice we turn a FHIR ‘UR’ code into an NHS Wales emergency triage “03” code.
(expand-identifier ["http://hl7.org/fhir/v3/ActPriority" "UR"])
;; => {"https://data.standards.cymru/Id/datasets/edss/triage" "03",
;; "http://hl7.org/fhir/v3/ActPriority" "UR"}
We can do it in reverse as part of a collection of different identifiers:
(apply merge (->> {:name "This won't be mapped"
                   "https://data.standards.cymru/Id/datasets/edss/triage" "03"
                   "https://data.standards.cymru/Id/datasets/edss/other" "N/A"}
                  (map expand-identifier)))
;; =>
;; {"https://snomed.info/sct" 1064901000000108,
;;  :name "This won't be mapped",
;;  "https://data.standards.cymru/Id/datasets/edss/triage" "03",
;;  "http://hl7.org/fhir/v3/ActPriority" "UR",
;;  "https://data.standards.cymru/Id/datasets/edss/other" "N/A"}
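Notice that expanding the FHIR ‘UR’ code on its own only reached the NHS Wales code, not SNOMED CT, because the toy registry follows a single hop. A recursive expansion, as mentioned earlier, might look like the sketch below. It is standalone: it takes a plain map in the same shape as the registry rather than the atom, and the names are my own for illustration.

```clojure
(def fhir-ur       {:system "http://hl7.org/fhir/v3/ActPriority" :value "UR"})
(def wales-03      {:system "https://data.standards.cymru/Id/datasets/edss/triage" :value "03"})
(def snomed-urgent {:system "https://snomed.info/sct" :value 1064901000000108})

;; Same shape as the registry: {identifier {target-system identifier}}.
;; Note there is no direct link between FHIR and SNOMED CT.
(def example-equivalences
  {fhir-ur       {(:system wales-03) wales-03}
   wales-03      {(:system fhir-ur) fhir-ur
                  (:system snomed-urgent) snomed-urgent}
   snomed-urgent {(:system wales-03) wales-03}})

(defn expand-all
  "Return the set of every identifier reachable from `id` by following
  registered equivalences, including `id` itself."
  [equivalences id]
  (loop [seen #{}
         todo [id]]
    (if-let [x (peek todo)]
      (if (seen x)
        (recur seen (pop todo))
        (recur (conj seen x)
               (into (pop todo) (vals (get equivalences x)))))
      seen)))

(expand-all example-equivalences fhir-ur)
;; => #{fhir-ur wales-03 snomed-urgent}
```

The set of already-seen identifiers is what stops the symmetric registrations from sending the expansion round in circles.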
And that means we can now write logic that asks “was this an urgent category?”, and it doesn’t matter whether the information is encoded in SNOMED CT, the NHS Wales information standard or HL7 FHIR.
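A minimal sketch of such a check follows. It is standalone and deliberately naive: it simply holds the three encodings of ‘urgent’ that appear in this post, where a production system would ask the terminology service instead. The names are hypothetical.

```clojure
(def urgent-identifiers
  "The encodings of 'urgent' used in this post, as [system value] pairs."
  #{["https://data.standards.cymru/Id/datasets/edss/triage" "03"]
    ["https://snomed.info/sct" 1064901000000108]
    ["http://hl7.org/fhir/v3/ActPriority" "UR"]})

(defn urgent?
  "True if any system/value entry in the record is a known 'urgent' code.
  Entries from unknown systems are simply ignored."
  [record]
  (boolean (some (fn [[sys v]] (urgent-identifiers [sys v])) record)))

(urgent? {:name "Encoded using FHIR"
          "http://hl7.org/fhir/v3/ActPriority" "UR"})
;; => true

(urgent? {"https://data.standards.cymru/Id/datasets/edss/triage" "05"})
;; => false
```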
It should be no surprise that we can do the same for cross-border interoperability. The NHS England data dictionary has the same triage categories, except code “06” (See and Treat). We can either work to share definitions and therefore share the same namespace, or instead provide a declarative map from which the logic of mapping can be encoded and used by machines. If the namespace is shared, then coordination will be necessary so that “06” isn’t accidentally re-used for a semantically different triage category. If different namespaces are used, no coordination would be necessary except in tracking a declarative map from one to another.
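To illustrate the second option, here is a sketch of a declarative map between two separately managed namespaces. The NHS England namespace is entirely made up, and I am assuming for the sake of the example that codes 01 to 05 correspond one-to-one; the real values would need to come from the published standards.

```clojure
(def wales-triage "https://data.standards.cymru/Id/datasets/edss/triage")

;; Hypothetical namespace: NHS England would need to publish the real one.
(def england-triage "https://example.org/id/nhs-england/triage")

(def wales->england
  "Assumed one-to-one for 01-05. \"06\" (See and Treat) is deliberately
  absent because it has no counterpart."
  {"01" "01", "02" "02", "03" "03", "04" "04", "05" "05"})

(defn wales->england-identifier
  "Translate an NHS Wales triage identifier to the other namespace,
  or return nil when no map exists."
  [{:keys [system value]}]
  (when (= system wales-triage)
    (when-let [v (wales->england value)]
      {:system england-triage :value v})))

(wales->england-identifier {:system wales-triage :value "03"})
;; => {:system "https://example.org/id/nhs-england/triage", :value "03"}

(wales->england-identifier {:system wales-triage :value "06"})
;; => nil
```

Because the map is just data, either side can change its own codes without coordination, as long as the map is kept up to date.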
I’ve built mapping between SNOMED CT, ICD-10, Read codes and other codesystems in my own terminology server, written in golang. My real version of the Janus tool uses that to drive inferential logic using SNOMED as the lingua franca.
My next step is to explore the use of off-the-shelf ontology tools to see whether they have the right performance and scaling characteristics to handle such a large ontology.
For NHS Wales, we need to publish our information standards in machine-readable formats, adopt namespaces for our value sets and publish maps to those classifications from operational terminology systems.
Learning points:
- We need to namespace identifiers used for central reporting to give them meaning
- We need to publish datasets and category definitions in machine-readable formats so that they can be processed
- We need to publish a declarative mapping from one identifier namespace to another if we wish to foster semantic interoperability; an ontology of ontologies.
- Clojure (and other lisps) are really nice for doing exploratory programming using data.
Mark