9 Reasons Parsing Unstructured Medical Record Data is Difficult
One of Tenasol’s primary operations and areas of expertise is the extraction of data from a wide variety of healthcare data sources. Medical records make up the bulk of these and come in many different flavors (HL7 v2 ADT, HL7 v3 C-CDA, HL7 FHIR, image/PDF/TIFF, and more).
Digital representations are often hard on human eyes. JSON and XML files tend to look like a jumble of words and brackets, and while automated software can make some decisions from them, they are most often rendered as HTML or PDF so that a human can view them and make medical decisions. We cover the digital formats in previous blogs (ADT, CCDA, FHIR).
The majority of records exchanged outside the point of care are in unstructured PDF or multipage TIFF format, since that is what insurers typically receive when requesting information from hospitals. Only a minority of patient information reaches these entities in digital format. Because of this, Tenasol’s work often depends on these difficult-to-parse formats, and few other vendors are able to abstract information from them with the same precision.
This blog post is meant to describe some of the drawbacks and hurdles of rendered and paper medical records.
Importance of Unstructured Data
The vast majority of data in healthcare is unstructured.
A record contains massive amounts of data, whether it is structured or not. Codified data, forms, images, and free text are each just one component. All of this data may also be timestamped, which makes it possible to build a timeline of the patient's location and overall health.
Modern machine learning techniques further allow us to forecast patient risks and how imminent they are, and even to detect information that may have been accidentally left undocumented. Going further, the metadata of the file itself offers more information.
Tenasol is capable of extracting this data (such as diagnosis codes, lab codes, and drug codes) even when it is not explicitly stated, and of turning it into structured data for use cases such as generalized data extraction, risk adjustment, HEDIS quality measures, prior authorization, and more.
Unstructured Data Hurdles
1) Medical record variation is massive
Almost no assumptions can be made about a medical record. Far too often I have met development teams that tried to detect the sections of a medical record and then make assumptions about the data found in them, but sections are fluid in definition. There are no rules about structure. A medical record may contain no visits and simply describe patient information, or it may describe one or many encounters with practitioners.
2) There is no assurance that pages are in sequence
Furthermore, pages may easily be shuffled during handling, faxing, or scanning. This means that information that appears to continue from one page to the next may in fact be unrelated or unreliable.
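As a rough illustration of how shuffled pages might be flagged, the sketch below looks for a printed "Page X of Y" marker in each page's OCR text and reports pages whose number does not follow the previous one. The marker pattern and the function names are assumptions made for this example, not a description of Tenasol's method. Many real records carry no such marker at all, so this is only one weak signal.

```python
import re

# Hypothetical footer pattern; real records vary widely and often have no marker.
PAGE_MARKER = re.compile(r"page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def printed_page_number(page_text):
    """Return the printed page number found in the text, or None if absent."""
    match = PAGE_MARKER.search(page_text)
    return int(match.group(1)) if match else None


def out_of_sequence(pages):
    """Return indexes of pages whose printed number does not follow
    the printed number of the page directly before them."""
    flagged = []
    previous = None
    for index, text in enumerate(pages):
        number = printed_page_number(text)
        # Only compare when both neighbouring pages carry a marker.
        if number is not None and previous is not None and number != previous + 1:
            flagged.append(index)
        previous = number
    return flagged


shuffled = ["Intake form. Page 1 of 3", "Lab results. Page 3 of 3", "Visit note. Page 2 of 3"]
print(out_of_sequence(shuffled))
```

A flagged page is not necessarily misplaced. A marker that restarts at 1 may simply mean that a new document begins there, so flags are better treated as prompts for review than as corrections.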
3) A medical record may be about one or many patients
Depending on how it was rendered and transported, a rendered medical record may contain records for more than one patient. This may be the case if a hospital sends medical data on numerous patients to an insurer at the same time in response to a payer chase. It may then be up to the payer to split those records back apart for review.
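One simplified way to picture the splitting step is to group pages by a patient identifier found on each page. The sketch below assumes, purely for illustration, that pages carry "Patient:" and "DOB:" lines in a fixed layout. Those labels, the date format, and the function names are hypothetical, and real records are rarely this tidy. Pages with no identifier go into a separate bucket rather than being assigned to a neighbouring patient, because page order cannot be trusted.

```python
import re
from collections import defaultdict

# Hypothetical label formats, chosen only for this example.
NAME = re.compile(r"^Patient:\s*(.+)$", re.IGNORECASE | re.MULTILINE)
DOB = re.compile(r"^DOB:\s*(\d{2}/\d{2}/\d{4})\s*$", re.IGNORECASE | re.MULTILINE)


def patient_key(page_text):
    """Return a (normalized name, date of birth) key, or None if either is missing."""
    name = NAME.search(page_text)
    dob = DOB.search(page_text)
    if not name or not dob:
        return None
    normalized_name = " ".join(name.group(1).split()).lower()
    return (normalized_name, dob.group(1))


def split_by_patient(pages):
    """Map each patient key to the indexes of the pages that mention it.
    Pages without an identifier are collected under the key None."""
    groups = defaultdict(list)
    for index, text in enumerate(pages):
        groups[patient_key(text)].append(index)
    return dict(groups)


pages = [
    "Patient: Jane Doe\nDOB: 01/02/1960\nVitals recorded at intake",
    "Patient: John Roe\nDOB: 03/04/1955\nMedication list",
    "Patient: jane  doe\nDOB: 01/02/1960\nFollow-up visit",
    "Lab results continued",
]
print(split_by_patient(pages))
```

Keeping unidentified pages apart is a deliberate choice here: guessing an owner from adjacent pages could attach one patient's data to another.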
9) Data Duplication
Electronic: A deceptively common feature of all types of medical records is duplicated data. In electronic records, this occurs either because a software interface fails to identify a record as unique or because a practitioner re-attaches all past visits to a record that already has them.
Pages: In a paper medical record, pages are duplicated if a practitioner accidentally sends the same patient chart more than once in a single transmission.
Data: Within the record, duplication occurs when information such as vitals is represented in multiple sections, for example once in the encounter and once in a vitals summary table.
Duplicate Transmissions: Furthermore, it is possible for points of care to send the exact same medical record multiple times. Tenasol offers deduplication features on a sliding scale, so documents that are not 100% identical because of features such as fax headers are still treated as duplicates. During human review for verification, Tenasol has found clients for whom up to 40% of the medical records queued for human review were duplicates, a substantial resource cost that this technology makes unnecessary.
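To make the idea of a sliding scale concrete, the sketch below compares the extracted text of each pair of documents with Python's standard `difflib.SequenceMatcher` and treats pairs at or above an adjustable similarity threshold as duplicates. This is a generic illustration, not Tenasol's deduplication method. The function names, the sample text, and the default threshold of 0.9 are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two texts."""
    # autojunk=False avoids the heuristic that ignores frequent characters
    # in longer strings, which can distort the ratio.
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def find_near_duplicates(documents, threshold=0.9):
    """Return index pairs of documents whose similarity meets the threshold."""
    pairs = []
    for i in range(len(documents)):
        for j in range(i + 1, len(documents)):
            if similarity(documents[i], documents[j]) >= threshold:
                pairs.append((i, j))
    return pairs


body = (
    "Patient seen for follow-up of hypertension. Blood pressure 128/82. "
    "Continue lisinopril 10 mg daily. "
) * 4
first_fax = "FAX 01/02 10:15 PAGE 1 " + body
second_fax = "FAX 01/03 09:40 PAGE 1 " + body
other = "Radiology report: chest x-ray shows no acute cardiopulmonary process."

print(find_near_duplicates([first_fax, second_fax, other]))
```

Lowering the threshold catches looser matches at the cost of more false positives, and a threshold of 1.0 keeps only documents whose normalized text is identical. Comparing every pair grows quadratically with the number of documents, so a larger queue would need a cheaper first pass, such as hashing, before this kind of comparison.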
Let Us Know How Tenasol Can Help!
Tenasol excels in transforming unstructured medical records into usable, structured data through advanced AI and machine learning technologies. By addressing challenges such as variable formatting, poor legibility, page rotation, and data duplication, Tenasol ensures precise extraction and organization of vital patient information. These capabilities empower healthcare providers and insurers to efficiently manage vast amounts of data, reducing resource costs and improving decision-making. By leveraging its expertise in handling diverse formats and developing robust solutions for complex records, Tenasol continues to set a benchmark for innovation in healthcare data management, driving better outcomes for patients and stakeholders alike.