This blog post was originally presented at the FHIR North 2023 conference. You can rewatch the original presentation here.
One of the biggest challenges that remains in healthcare is integration and interoperability. In fact, the majority of hospitals in Canada still use HL7v2 to integrate business-critical systems such as ADT (Admission, Discharge, Transfer) feeds and appointment scheduling feeds.
A brief introduction to HL7v2
HL7v2 is a messaging standard, first created in 1989, for exchanging data between healthcare systems. As a quick recap, here is a sample HL7v2 scheduling message:

As you can see above, HL7v2 messages are text-based. Each line (called a segment) begins with a three-letter segment ID. Fields are separated by pipes (|), and components within a field are separated by carets (^). Each segment carries a specific kind of information. For example, the PID segment holds patient identification and demographic information, whereas the PV1 segment describes a patient visit.
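To make the delimiter structure concrete, here is a minimal Python sketch that splits a message into segments, fields, and components. The sample message is hypothetical. It was invented for illustration and is not the one from the presentation. The sketch also ignores repetitions (~), subcomponents (&), and escape sequences, all of which a full HL7v2 parser would handle.

```python
# Minimal HL7v2 splitter: segments -> fields -> components.
# Assumes the default delimiters (| for fields, ^ for components) and
# ignores repetitions (~), subcomponents (&) and escape sequences.

# Hypothetical scheduling (SIU) message, invented for illustration only.
SAMPLE = "\r".join([
    "MSH|^~\\&|SCHED|HOSP_A|VERTO|VERTO|202301150830||SIU^S12|MSG0001|P|2.3",
    "PID|1||123456^^^HOSP_A^MR||DOE^JANE||19800101|F",
    "PV1|1|O|CLINIC1",
])


def parse_hl7v2(message: str) -> dict[str, list[list[list[str]]]]:
    """Return {segment_id: [segment, ...]}, each segment a list of fields,
    each field a list of components."""
    segments: dict[str, list[list[list[str]]]] = {}
    # Segments are terminated by carriage returns; tolerate \n and \r\n too.
    for line in message.replace("\n", "\r").split("\r"):
        if not line:
            continue
        fields = line.split("|")
        # Index 0 is the segment ID, so for most segments index n is field n.
        # (For MSH, index n is MSH-(n+1), because MSH-1 is the "|" itself.)
        segments.setdefault(fields[0], []).append([f.split("^") for f in fields])
    return segments


msg = parse_hl7v2(SAMPLE)
print(msg["PID"][0][5])  # PID-5, patient name -> ['DOE', 'JANE']
```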
One reason for this compact format is the programming constraints of the era. HL7v2 parsers were designed to be extremely memory-efficient and character-based, and they were typically written in C. The standard was built mainly for computers, not for humans. HL7 FHIR, on the other hand, is designed to be human-readable, and its resources are commonly represented as JSON.

In general, HL7v2 messages can be read by consulting the published specification. At this point, it may seem easy to translate HL7v2 messages into more structured HL7 FHIR resources.
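A naive, purely positional translation of the PID segment into a FHIR Patient resource might look like the following minimal Python sketch. It uses plain dictionaries rather than any particular FHIR library. The field positions follow the HL7v2 PID definition: PID-3 is the identifier list, PID-5 the name, PID-7 the birth date, and PID-8 the administrative sex. The function name pid_to_patient is hypothetical.

```python
import json

# HL7v2 administrative sex codes mapped to FHIR Patient.gender codes.
SEX_TO_GENDER = {"F": "female", "M": "male", "O": "other", "U": "unknown"}


def pid_to_patient(pid_line: str) -> dict:
    """Translate one PID segment into a FHIR Patient dict, purely by position."""
    f = pid_line.split("|")
    f += [""] * (9 - len(f))  # pad so PID-1..PID-8 are always addressable
    cx = f[3].split("^")      # PID-3: patient identifier (CX data type)
    xpn = f[5].split("^")     # PID-5: patient name (family^given^...)
    dob = f[7]                # PID-7: date of birth, YYYYMMDD[...]
    patient: dict = {"resourceType": "Patient"}
    if cx[0]:
        patient["identifier"] = [{"value": cx[0]}]
    if xpn[0]:
        name = {"family": xpn[0]}
        if len(xpn) > 1 and xpn[1]:
            name["given"] = [xpn[1]]
        patient["name"] = [name]
    if len(dob) >= 8:
        patient["birthDate"] = f"{dob[:4]}-{dob[4:6]}-{dob[6:8]}"
    if f[8] in SEX_TO_GENDER:  # PID-8: administrative sex
        patient["gender"] = SEX_TO_GENDER[f[8]]
    return patient


print(json.dumps(pid_to_patient("PID|1||123456^^^HOSP_A^MR||DOE^JANE||19800101|F"), indent=2))
```

Every value here is taken at face value from its position. That works when the sending system follows the specification to the letter, and the rest of this post is about what happens when it does not.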
Existing solutions to translate HL7v2 to HL7 FHIR
Several parsers built by large companies aim to perform this translation. Here are two that you may find useful:
- https://github.com/microsoft/FHIR-Converter
- https://github.com/LinuxForHealth/hl7v2-fhir-converter
However, for reasons that will become obvious below, these converters cannot deliver even simple integration capabilities on their own, because they only offer "syntactic interoperability." In short, each field is understood at face value, and nothing more.
Real Problem #1: Many implementations store additional details in custom segments

Here are three lines from the HL7v2 message shown above. NTE stands for Notes and Comments. If you pass these lines through a conventional converter, at best they might be translated into a FHIR DocumentReference resource. A human reading them, however, can see that they actually record the communication consent the patient has given to the clinic in question: the patient has consented to phone communication and declined email communication.
Real Problem #2: Workflows vary across implementations

Verto has worked with many HL7v2 implementations. We have found that even within the same hospital network, the same HL7v2 message type may not carry the same information, because it depends on how each organization originally adopted the standard. For example, the Visit Type field (such as "Diabetes Follow-up") is available in Hospital A's scheduling system. Hospital B's scheduling system does not include it and instead defers to the "Pre-admission" message in its ADT system.
Current Solution: Hire Business Analysts
The current solution is to hire business analysts to discover the high-level workflows each hospital has implemented, deep-dive into each message, and create mappings by hand (recall that existing converters cannot be used out of the box).
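To give a sense of what a manual mapping involves, here is a minimal Python sketch of a hand-written rule for the consent example above. The NTE text format ("CONSENT PHONE: Y") is entirely hypothetical, since the real lines are site-specific. That is exactly why a rule like this has to be rediscovered and rewritten for every implementation.

```python
import re

# Hypothetical site convention: NTE-3 holds "CONSENT <CHANNEL>: Y" or "... N".
# An analyst would discover this by reading messages and then hard-code it.
CONSENT_RE = re.compile(r"^CONSENT\s+(?P<channel>[A-Z]+):\s*(?P<flag>[YN])$")


def extract_consent(message: str) -> dict[str, bool]:
    """Return {channel: consented?} from NTE segments that match the convention."""
    consent: dict[str, bool] = {}
    for segment in message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] != "NTE" or len(fields) < 4:
            continue
        match = CONSENT_RE.match(fields[3].strip())  # NTE-3: comment text
        if match:
            consent[match["channel"].lower()] = match["flag"] == "Y"
    return consent


# Hypothetical NTE segments, invented for illustration.
notes = "NTE|1||CONSENT PHONE: Y\rNTE|2||CONSENT EMAIL: N\rNTE|3||Patient prefers mornings"
print(extract_consent(notes))  # {'phone': True, 'email': False}
```

The rule is brittle by construction. Another site, or even a slightly different wording such as "Phone consent: yes", needs a new rule.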
Needless to say, this is both labor-intensive and extremely time-consuming.
At Verto, we always think about how to make integrations faster, as reducing the time-to-value is one of our primary differentiators.
The Verto AI Solution
Verto has developed a solution to this problem, described in our patent: Method and system for consolidating heterogeneous electronic health data.
The first step is to passively listen to one or more HL7v2 feeds and analyze them using what we call the "4 Pillars": Positional Encoding, Data Type, Temporal, and Adjacency.
Positional Encoding
Positional Encoding performs syntactic analysis against the existing HL7v2 specifications. We are not reinventing the wheel here. This pillar can be as simple as running the message through an existing HL7v2 translator to obtain what each field is supposed to mean at face value. This works very well for some segments, such as the patient demographics (PID) segment.
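In its simplest form, positional encoding is a lookup from a field's position to the name the specification gives it. The Python sketch below hard-codes a few HL7v2 field names as an illustrative subset. A real system would draw on a complete HL7v2 library or the published definitions instead of the hypothetical SPEC table.

```python
# Illustrative subset of HL7v2 field names, keyed by (segment ID, field number).
SPEC = {
    ("PID", 3): "Patient Identifier List",
    ("PID", 5): "Patient Name",
    ("PID", 7): "Date/Time of Birth",
    ("PID", 8): "Administrative Sex",
    ("PV1", 2): "Patient Class",
    ("NTE", 1): "Set ID - NTE",
    ("NTE", 3): "Comment",
}


def positional_labels(segment: str) -> dict[str, tuple[str, str]]:
    """Label each non-empty field of a (non-MSH) segment with its face-value name."""
    fields = segment.split("|")
    seg_id = fields[0]
    labels = {}
    for number, value in enumerate(fields[1:], start=1):
        if value:
            labels[f"{seg_id}-{number}"] = (SPEC.get((seg_id, number), "unknown"), value)
    return labels


print(positional_labels("NTE|1||CONSENT PHONE: Y"))
# {'NTE-1': ('Set ID - NTE', '1'), 'NTE-3': ('Comment', 'CONSENT PHONE: Y')}
```

NTE-3 is labelled simply as a comment. At face value, that is all positional encoding can say about it.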
However, as the example above makes clear, positional encoding can only get us so far; we need to understand much more.
Data Type
This pillar independently verifies whether the positional encoding holds up. It does so by engineering features for each field independently, such as:
- Alphabetical
- Alphanumeric
- Custom regexes
- Dictionary matching
- Common NER (named entity recognition) categories
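The per-value features listed above can be sketched in Python as follows. The date regex and the tiny LAST_NAMES set are hypothetical examples. NER categories are left out because they require a trained model. In practice, these features would be aggregated over every value seen in a field before being fed to a classifier.

```python
import re

# Hypothetical dictionary; a real system would load far larger ones.
LAST_NAMES = {"smith", "doe", "tremblay", "nguyen"}
# Example custom regex: a compact calendar date (YYYYMMDD).
DATE_RE = re.compile(r"^(19|20)\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])$")


def value_features(value: str) -> dict[str, bool]:
    """Compute simple per-value features of the kinds listed above."""
    return {
        "alphabetical": value.isalpha(),
        # "Alphanumeric" here means a mix of letters and digits.
        "alphanumeric": value.isalnum() and not value.isalpha() and not value.isdigit(),
        "numeric": value.isdigit(),
        "matches_date_regex": bool(DATE_RE.match(value)),
        "in_last_name_dict": value.lower() in LAST_NAMES,
    }


print(value_features("DOE"))
# {'alphabetical': True, 'alphanumeric': False, 'numeric': False,
#  'matches_date_regex': False, 'in_last_name_dict': True}
```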
With these features, we can train a model to detect an Ontario health card number purely from its attributes, such as whether it is alphanumeric and whether it contains a valid check digit.
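Here is a minimal Python sketch of the check-digit attribute. It assumes the health number is 10 digits, with the last digit a Luhn (mod 10) check digit, optionally followed by letters for the card's version code. That is the commonly described format, but treat it as an assumption rather than an official specification. The function names are hypothetical, and the resulting booleans are inputs to a model, not a verdict on their own.

```python
def luhn_valid(digits: str) -> bool:
    """Luhn mod-10 check; the last digit is the check digit."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit, starting with the
    # digit immediately left of the check digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def health_number_features(value: str) -> dict[str, bool]:
    """Features for a candidate health number, assuming 10 digits plus optional letters."""
    number, version = value[:10], value[10:]
    return {
        "ten_digits": len(number) == 10 and number.isdigit(),
        "luhn_valid": luhn_valid(number),
        "has_version_letters": version.isalpha(),
    }


print(health_number_features("1234567897AB"))
# {'ten_digits': True, 'luhn_valid': True, 'has_version_letters': True}
```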

Another possibility is that the value matches a dictionary element that we support, such as an ICD-10 clinical concept (e.g., I10, Essential (primary) hypertension) or even a last name. Because we analyze hundreds of thousands of messages, we can reach a conclusion even if only 70% of the values match the "Last Name dictionary".
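Here is a minimal Python sketch of the dictionary test, with the 70% threshold mentioned above as a tunable parameter. The tiny LAST_NAMES set is a hypothetical stand-in for a real dictionary. Values are compared case-insensitively, and empty values are ignored.

```python
# Hypothetical stand-in for a real last-name dictionary (lowercase entries).
LAST_NAMES = {"doe", "smith", "nguyen", "tremblay"}


def dictionary_match_rate(values: list[str], dictionary: set[str]) -> float:
    """Fraction of non-empty values found in the dictionary (case-insensitive)."""
    non_empty = [v.strip().lower() for v in values if v.strip()]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if v in dictionary)
    return hits / len(non_empty)


def looks_like(values: list[str], dictionary: set[str], threshold: float = 0.7) -> bool:
    """True when enough of a field's values match the dictionary."""
    return dictionary_match_rate(values, dictionary) >= threshold


column = ["Doe", "Smith", "Nguyen", "Xyzzy", "", "Tremblay"]
print(dictionary_match_rate(column, LAST_NAMES))  # 0.8
```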
Temporal
This pillar independently infers a field's data type by comparing its values over time with other values in the same position. Again, since we are analyzing many messages, we have many values for each field to work with.
For example, we might detect that a field holds incrementing integers (i.e., values that count up by 1). Such values are very often identifiers, such as patient or appointment identifiers.
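Here is a minimal sketch of the incrementing-integer check. It measures, in arrival order, how often consecutive integer values in a field increase by exactly 1. Real feeds interleave and skip identifiers, so in practice one would compare this ratio to a threshold rather than expect 1.0. The function name and that framing are assumptions for illustration.

```python
def incremental_ratio(values: list[str]) -> float:
    """Fraction of consecutive integer values (in arrival order) that step up by exactly 1."""
    stripped = (v.strip() for v in values)
    ints = [int(s) for s in stripped if s.isascii() and s.isdigit()]
    if len(ints) < 2:
        return 0.0
    steps = [b - a for a, b in zip(ints, ints[1:])]
    return sum(1 for s in steps if s == 1) / len(steps)


print(incremental_ratio(["1001", "1002", "1003", "1005", "1006"]))  # 0.75
```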
Another example is boolean values. If a field takes only two distinct values, it likely represents True or False. In practice, it is even more likely to see three values, one of which is NULL.
Equally useful is an analysis of empty (null) values. The percentage of nulls in a field can indicate how valuable it would be to map.
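The boolean and null checks can be sketched together as a simple field profile. The set of tokens treated as null is an assumption. HL7v2 uses an empty field for "not present" and two double quotes ("") for an explicit null, and the "NULL"/"null" strings are included only as examples of what some systems emit.

```python
# Assumed null tokens: empty, HL7v2's explicit null (""), and common literals.
NULL_TOKENS = {"", '""', "NULL", "null"}


def profile_field(values: list[str]) -> dict:
    """Null rate and boolean-likeness of the values seen at one field position."""
    total = len(values)
    nulls = sum(1 for v in values if v.strip() in NULL_TOKENS)
    distinct = {v.strip() for v in values if v.strip() not in NULL_TOKENS}
    return {
        "null_rate": nulls / total if total else 0.0,
        "distinct_non_null": len(distinct),
        # Two distinct non-null values (plus any nulls) suggests True/False.
        "boolean_like": len(distinct) == 2,
    }


print(profile_field(["Y", "N", "", "Y", "N", "Y", '""', "N"]))
# {'null_rate': 0.25, 'distinct_non_null': 2, 'boolean_like': True}
```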
Adjacency
This pillar infers a field's data type from its adjacent elements. It is one of the most important pillars because of how HL7v2 messages are formatted. In this example, by observing the N-1 and N+1 words, we can conclude that multiple patient identifiers are involved (i.e., the unique patient identifier and the Medical Record Number).
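Here is a minimal sketch of the adjacency idea for a patient identifier field. It flattens the field into non-empty tokens, builds (N-1, N, N+1) windows, and labels each numeric token using its N+1 neighbour when that neighbour is an identifier-type code. The sample value is hypothetical and assumes the assigning-authority component is left empty. That way the type code sits right next to each identifier: MR for medical record number and PI for patient internal identifier, both from HL7 table 0203.

```python
from typing import Optional

# Identifier type codes from HL7 table 0203 (small subset).
ID_TYPE_CODES = {"MR": "Medical Record Number", "PI": "Patient Internal Identifier"}


def adjacency_windows(field: str) -> list[tuple[Optional[str], str, Optional[str]]]:
    """Flatten a field (repetitions ~, components ^) into non-empty tokens
    and return (N-1, N, N+1) windows, padding the ends with None."""
    tokens = [t for rep in field.split("~") for t in rep.split("^") if t]
    padded = [None, *tokens, None]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def label_identifiers(field: str) -> dict[str, str]:
    """Label each numeric token by its N+1 neighbour when that is a known type code."""
    labels = {}
    for _prev, tok, nxt in adjacency_windows(field):
        if tok.isdigit() and nxt in ID_TYPE_CODES:
            labels[tok] = ID_TYPE_CODES[nxt]
    return labels


print(label_identifiers("123456^^^^PI~7788990^^^^MR"))
# {'123456': 'Patient Internal Identifier', '7788990': 'Medical Record Number'}
```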

Our Consent Mappings
With these pillars in mind, let's return to our original problem and see how they performed:

The analysis recognizes that the "Notes and Comments" segment actually stores patient consent. With this automated mapping, we can perform orchestration with more confidence, sending emails and text messages that comply with the patient's consent!
Manual Review of Mappings
Any responsible use of AI keeps a human in the loop. As with any AI process, Verto lets humans review the generated mappings and confirm or reject them. For each mapped field, a justification is output for each pillar that contributed to the mapping, just as the consent segment was deconstructed above.
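One way to picture the review step is as a queue of proposed mappings. Each proposal carries a justification per pillar and a status that a reviewer can change. The MappingProposal class below is a hypothetical Python sketch of such a record, not Verto's actual data model, and the example justification text is invented.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class MappingProposal:
    """Hypothetical record of one proposed field mapping awaiting human review."""
    source: str  # e.g. "NTE-3"
    target: str  # e.g. a FHIR resource or element name
    justifications: dict[str, str] = field(default_factory=dict)  # pillar -> reason
    status: Status = Status.PENDING

    def confirm(self) -> None:
        self.status = Status.CONFIRMED

    def reject(self) -> None:
        self.status = Status.REJECTED


def pending(proposals: list[MappingProposal]) -> list[MappingProposal]:
    """Proposals a reviewer has not yet confirmed or rejected."""
    return [p for p in proposals if p.status is Status.PENDING]


proposal = MappingProposal("NTE-3", "Consent", {"Data Type": "values match consent keywords"})
proposal.confirm()
print(proposal.status)  # Status.CONFIRMED
```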
Conclusion
What we've done here has cut integration time roughly tenfold. Our results speak for themselves: we obtained OLIS (Ontario Laboratories Information System) certification in 4 weeks, whereas the normal integration timeline is 8 months.

The outcome is that HL7v2 messages are translated to HL7 FHIR without any information loss. The translated messages can now be used in our Digital Twin for comprehensive population exploration, to power automations such as patient engagement, and to deliver better, more focused alerts for clinicians.
If you find what we're doing here interesting, we are always looking for like-minded individuals to join our team! Feel free to take a look at open positions here and we'll talk!