7 Roadblocks in Healthcare Generative AI
Generative AI’s media presence raises the question of its present applicability in healthcare. This is not limited to “Healthcare Large Language Model (LLM) / Transformer” systems used in text-to-text applications; it also covers image-to-text and text-to-image generative AI use cases. These systems have drawn heavy media coverage and moved the goalposts of what is possible to a seemingly unknown distance.
The 2023 book “The AI Revolution in Medicine: GPT-4 and Beyond”[1] by Peter Lee, Carey Goldberg, and Isaac Kohane, with a foreword by OpenAI CEO Sam Altman, was one of the first print works to delve fully into the implications of textual generative AI within the hospital. The bulk of its use cases analyze patient care and ‘AI in the loop’ scenarios that are still beyond what is currently permissible by ethical or legal standards. The nearest-term of the suggested services is record summarization. AI is still very useful, but generative AI will remain restricted in its utility within healthcare use cases.
This raises the question: if these systems have only a narrow presence in healthcare now, what barriers are preventing wider adoption?
1. Healthcare LLM & HIPAA [partially solved]
HIPAA[2] is a federal law, with accompanying regulations, designed to protect healthcare data. Among other things, it calls for healthcare data to be encrypted in transit and at rest, in line with best practices.
While generative AI systems exist that can process this information, few of the currently available services are hosted in environments that meet HIPAA standards. This includes the offerings of all three of the top cloud vendors. Furthermore, healthcare-specific sequence-to-sequence models that would hypothetically be trained on PHI (protected health information) do not meet HIPAA standards when deployed for public (or private) use. While this specific issue remains unsolved, generically trained LLMs may still be used.
Solutions:
A. Time: Over time, these top brands will offer HIPAA-compliant flavors of their generative AI products, just as cloud services have done, likely requiring a Business Associate Agreement (BAA) and possibly a price premium. There is no technological barrier preventing one of these firms from doing this now and delivering a result safely to a user.
B. Self-Hosting: Hugging Face[3], the open-source transformer library and leaderboard for generative AI, offers models that private companies can already host on their own HIPAA-compliant servers, removing the need for solution A. It is worth noting that these models trail state-of-the-art models by roughly 10% in performance [4].
2. Healthcare LLM & Human Resources [partially solved]
Healthcare is a competitive market, and already-operating healthcare organizations often have very tight margins to spend on research and development. These organizations normally have small development teams leading such efforts, often staffed with recent graduates, while more experienced talent takes higher-paying positions at development centers outside the health domain.
Solutions:
A. Time: As the social media market consolidates and frees up more developers, as computer science education grows, and as these services become easier to deploy, the human resource challenges will ease. This is already visible in a large shift toward consumer-facing healthcare technologies built by this new generation of talent.
3. Language Structure of Medical Records [partially solved]
The bulk of medical data in the United States, as described in a previous post, still rests in unstructured formats. Even where the content is text, it often includes handwriting, tables, and form features such as check boxes that are custom to individual practices. In addition, unstructured medical language is far from the standard language these models are trained on.
This presents several issues. A) Extracting text from images directly with deep learning costs significantly more than production OCR solutions, which employ a mix of machine learning and image hashing techniques. B) The text drawn from these systems may appear inconsistent or nonsensical depending on how it is extracted, which differs from the text the LLM was trained on. C) The language is medical in nature, which is far from the language the model was trained on. D) Training a sequence-to-sequence text model on medical (PHI) data carries an extremely high risk of HIPAA violations if the training sequences can be extracted during deployment.
Solutions:
A. Substantial improvement and cost-effectiveness of image-to-text OCR: A new form of OCR that uses image-to-text generative AI capabilities to fully extract tabulated or form-level information and condense it to identical text output, without errors or hallucinations.
B. Retirement of image-based medical record formats: If OCR is no longer needed, its errors become irrelevant to processing medical records. However, all current and past medical records stored as images would need to be converted or lose their utility, putting this at least a decade off.
C. LLMs that can be trained on smaller amounts of data: This is an extremely interesting space in which LLMs can be retuned. It would permit a smaller number of manually de-identified records to be used to adjust a general-language LLM.
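As a rough illustration of the de-identification step mentioned in solution C, the sketch below masks a few obvious identifier patterns with regular expressions before a record is handed to a human reviewer. The patterns, the placeholder tags, and the function name `scrub` are assumptions made for this example. It covers only a small fraction of the identifiers HIPAA is concerned with and is a pre-pass at best, not a substitute for manual review.

```python
import re

# Hypothetical patterns for illustration only; real de-identification
# must cover far more identifier types and be reviewed by a person.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
]

def scrub(text: str) -> str:
    """Replace each matched identifier pattern with a placeholder tag."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text

sample = "Seen 03/14/2023, SSN 123-45-6789, call 555-867-5309."
print(scrub(sample))
```

Names, addresses, and free-text identifiers are not touched by patterns like these, which is why the text above still assumes manual de-identification.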
4. Healthcare LLM Performance & Errors [unsolved]
Current performance of the very top generative AI systems still sits around 90% accuracy for the bulk of tasks, which is well below the demands of many healthcare use cases.
With the advent of AI comes the notion that no AI system will ever be correct 100% of the time for a sufficiently complex task. Presently, hallucinations are known to fill gaps where an answer cannot be sufficiently determined, which in healthcare can have massive consequences. Furthermore, an LLM is not transparent and cannot be debugged in the conventional sense. A minor error with a large negative impact can therefore invalidate the entire system, without an explanation as to why.
Furthermore, generative AI does not produce a confidence level for each output as traditional machine learning approaches do, which puts the output as a whole at risk rather than individual outputs.
Solutions:
A. Restrict to use cases with a low penalty for error: Using generative AI where errors can be tolerated, such as tasks not involved in the care of a patient, may presently be the best approach.
B. Evolution to generative AI that has confidence intervals: While this is only theoretical for now, it may allow output to be restricted to what a system can truly state as fact, excluding the knowledge gaps in between.
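To make the contrast with traditional machine learning concrete, here is a minimal sketch of how a conventional classifier's per-output confidence can be used to abstain instead of guessing. The labels, the raw scores, and the 0.9 threshold are all hypothetical values chosen for illustration; this shows the general thresholding idea, not a technique that current generative systems expose.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_abstain(labels, scores, threshold=0.9):
    """Return (label, confidence), or (None, confidence) when the model is unsure."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best], probs[best]
    return None, probs[best]  # defer to a human reviewer

labels = ["normal", "abnormal"]  # hypothetical classes
print(predict_or_abstain(labels, [4.0, 0.5]))  # clear margin: a label is returned
print(predict_or_abstain(labels, [1.0, 0.8]))  # scores too close: abstains
```

The point of the design is that a low-confidence case is routed to a person one output at a time, which is exactly what a system without per-output confidence cannot do.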
5. Healthcare LLM Compute Cost [mostly unsolved]
Generative AI is expensive [5].
Presently, competitive generative AI models (text-to-text specifically) are reported to sit on the order of 1.5 trillion parameters [6]. These must be held in memory, usually spread across multiple video cards, rather than on a computer’s hard drive. Furthermore, because most current video card systems rely on CUDA, a software layer proprietary to Nvidia, the hardware required remains expensive.
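A back-of-the-envelope calculation shows why a model of this size needs multiple video cards. The parameter count comes from the text; the 2 bytes per parameter (16-bit weights) and the 80 GB of memory per card are assumptions chosen for illustration, and the estimate ignores the additional memory needed while actually running the model.

```python
import math

PARAMS = 1.5e12          # parameter count quoted in the text
BYTES_PER_PARAM = 2      # assumption: 16-bit weights
GPU_MEMORY_GB = 80       # assumption: memory of one high-end video card

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_needed = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"Weights alone: {weights_gb:,.0f} GB")
print(f"Cards needed just to hold them: {gpus_needed}")
```

Under these assumptions the weights alone come to about 3 TB, or 38 cards, before a single query is served.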
In practice, this cost shows up in the user subscriptions charged for generative AI services, or is absorbed by the venture capitalists funding those services. In 2023 the cost was estimated at approximately 0.36 cents per query for OpenAI [7].
Solutions:
A. Restrict use of generative AI to scenarios that justify costs: Costs can be covered by subscription pricing, ad revenue, or per-unit processing fees that cover the cost of running the system.
B. More neural network hardware options: This would drive down the price of AI services by creating more competition in the AI processing world, or simply more efficient processing of those networks. Advances are being made in hardware such as Google’s TPUs [8] to form a more competitive market with Nvidia.
C. More efficient neural networks: Reducing the size of a neural network while producing the same output, computing output more efficiently, and pre-caching the answers to common questions are all active areas of interest that are gaining ground on the cost of compute.
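The pre-caching idea in solution C can be sketched in a few lines. Here `expensive_model_call` is a hypothetical stand-in for a real paid request, and the normalization rule (lowercasing and collapsing whitespace) is an assumption; a production system would need a more careful notion of which questions count as "the same".

```python
from functools import lru_cache

calls = {"model": 0}  # counts how often the costly path is taken

def expensive_model_call(prompt: str) -> str:
    """Stand-in for a real, costly generative AI request (hypothetical)."""
    calls["model"] += 1
    return f"answer to: {prompt}"

def normalize(prompt: str) -> str:
    """Collapse case and whitespace so trivially different prompts share a cache entry."""
    return " ".join(prompt.lower().split())

@lru_cache(maxsize=1024)
def _cached(normalized_prompt: str) -> str:
    return expensive_model_call(normalized_prompt)

def answer(prompt: str) -> str:
    """Answer from the cache when possible, otherwise pay for a model call."""
    return _cached(normalize(prompt))

answer("What is a BAA?")
answer("what is a  BAA?")   # served from the cache
print(calls["model"])        # number of paid model calls so far
```

Every cache hit is a query whose compute cost is avoided entirely, which is why common questions are the natural target.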
6. Healthcare LLM I/O Limits [partially solved]
While seemingly infinite, the amount of text a generative AI system may be given, and the amount it may create, is limited. On the higher end of public options this sits at around 32,000 words (more precisely tokens, which this post treats as roughly equivalent to words). Stated more accurately, the cap of 32,000 applies to the sum of the input and the output. So if you ask a question of 31,999 words, you will receive only a one-word answer if that is its limit.
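The shared budget can be written as a one-line calculation. The 32,000 figure is the one from the text, treating one word as one token; the function name `max_output_tokens` is just a label for this sketch.

```python
CONTEXT_LIMIT = 32_000  # shared budget for input plus output, from the text

def max_output_tokens(input_tokens: int, limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left for the answer once the prompt has been counted."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return max(limit - input_tokens, 0)

print(max_output_tokens(31_999))  # 1
print(max_output_tokens(40_000))  # 0: the prompt alone is too long
```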
The mean size of a paginated medical record is roughly 80 pages (the median is 30). At 300 words per page, an 80-page record comes to a token count of 24,000. Records have been observed to run as high as 50,000 pages in rare instances, which comes to 15 million tokens. This disregards tokens used to describe the positional data of words, which may be valuable in understanding the content of a record.
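The arithmetic above can be checked directly, along with how many context windows each record would occupy. The page counts and words per page come from the text; treating one word as one token and reusing the 32,000 limit are simplifying assumptions, and the window count covers input only, leaving no room for output.

```python
import math

WORDS_PER_PAGE = 300      # figure used in the text
CONTEXT_LIMIT = 32_000    # limit discussed in the text

def record_tokens(pages: int) -> int:
    """Approximate tokens in a record, treating one word as one token."""
    return pages * WORDS_PER_PAGE

def windows_needed(pages: int, limit: int = CONTEXT_LIMIT) -> int:
    """Minimum number of context windows to read the record once (input only)."""
    return math.ceil(record_tokens(pages) / limit)

for pages in (80, 50_000):
    print(pages, record_tokens(pages), windows_needed(pages))
```

An 80-page record fits in a single window with little room to spare, while the 50,000-page case needs several hundred.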
Solutions:
A. Part-wise processing: For a large document, processing the record piece by piece makes exceptionally large records tractable, with a final step that combines and possibly deduplicates the information.
B. Increasing I/O limits of neural networks: Increases are expected, and it is not technically infeasible to make this limit effectively unbounded, but this has yet to be done on a high-performing generative system.
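Part-wise processing, as described in solution A, can be sketched as a split, a per-piece extraction, and a combining step that deduplicates. In the sketch below `extract_findings` is a hypothetical stand-in for the model call (it simply keeps words tagged `dx:`), and the tiny chunk size, the overlap, and the sample record are made up so the behavior is easy to follow.

```python
def split_into_chunks(words, chunk_size, overlap=0):
    """Yield consecutive word windows; `overlap` repeats trailing words for context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield words[start:start + chunk_size]
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the record

def extract_findings(chunk):
    """Hypothetical stand-in for a model call: keep words tagged with 'dx:'."""
    return [w for w in chunk if w.startswith("dx:")]

def process_record(text, chunk_size=8, overlap=2):
    """Process each chunk separately, then combine and deduplicate in order."""
    seen, combined = set(), []
    for chunk in split_into_chunks(text.split(), chunk_size, overlap):
        for finding in extract_findings(chunk):
            if finding not in seen:
                seen.add(finding)
                combined.append(finding)
    return combined

record = ("visit one dx:asthma stable follow up in six months "
          "visit two dx:asthma dx:eczema new")
print(process_record(record))
```

The overlap is a design choice: it gives each piece some of the preceding context at the price of reprocessing a few words, which is also why the final deduplication step matters.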
7. Time-Boxing and Limit of Knowledge [partially solved]
Somewhat tied to the previous roadblock, the recency of a system’s knowledge is of high importance, especially in healthcare, where standards and systems are released and updated on a regular basis. A generative AI system that is unaware of recent releases, trends, and changes cannot adapt quickly without far-reaching access to public, and in some cases private, data.
As an explicit example, consider the SNOMED coding system, which is updated every 6 months. Systems such as ChatGPT 3.5 are time-boxed to public information prior to 2022, and only have access to data hosted on easily and publicly accessible sources.
Solutions:
A. Real-time internet access: To some extent, this has been accomplished with GPT-4. However, a code such as the SNOMED translation of a squirrel bite remains unknown, because it is not posted on any public website but rather in downloadable directories, which would also need to be accessible and processable by some means. This remains unsolved by open-source systems.
B. Custom engines given an updated and curated dataset for their purpose: When processing large volumes of unstructured data, giving these engines constant access to a system that is treated as the source of truth for their purpose could extend this further.
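One way to picture solution B is a lookup against a locally maintained terminology release whose results are placed in front of the question. Everything in the sketch below is hypothetical: the release keys, the placeholder codes (which are deliberately not real SNOMED identifiers), and the prompt layout.

```python
# Hypothetical terminology releases; the codes below are placeholders,
# not real SNOMED identifiers.
RELEASES = {
    "2023-03": {"PLACEHOLDER-001": "example concept A"},
    "2023-09": {"PLACEHOLDER-001": "example concept A",
                "PLACEHOLDER-002": "example concept B"},
}

def latest_release(releases):
    """Pick the most recent release; ISO-style keys sort chronologically."""
    return releases[max(releases)]

def build_prompt(question, codes):
    """Prepend the matching entries so the model answers from current data."""
    table = latest_release(RELEASES)
    lines = [f"{c}: {table.get(c, 'NOT FOUND in latest release')}" for c in codes]
    return "Reference data:\n" + "\n".join(lines) + "\n\nQuestion: " + question

print(build_prompt("What does PLACEHOLDER-002 describe?", ["PLACEHOLDER-002"]))
```

Because the lookup table is swapped out with each release, the model's training cutoff no longer decides which codes it can answer about.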
Conclusion
Generative AI holds transformative potential for healthcare, promising to enhance patient care and operational efficiency. However, its deployment is hampered by multifaceted challenges including regulatory compliance, human resource constraints, technological limitations, and ethical considerations. Solutions like HIPAA-compliant services, advancing OCR technologies, and improved neural network capabilities are emerging, yet obstacles such as AI errors and the high computational demands persist. As these barriers gradually diminish through technological progress and regulatory adaptation, the broader integration of generative AI into healthcare seems inevitable, promising significant benefits but also necessitating careful navigation of its complexities.
Sources
[1] Lee, Peter, Carey Goldberg, and Isaac Kohane. "The AI Revolution in Medicine: GPT-4 and Beyond." 2023.
[2] "Health Insurance Portability and Accountability Act (HIPAA)." hhs.gov. https://www.hhs.gov/hipaa/index.html
[3] "Open LLM Leaderboard." huggingface.co. https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
[4] Anil, Rohan, et al. "PaLM 2 Technical Report." arXiv preprint arXiv:2305.10403 (2023).
[5] "The AI Revolution Is Already Losing Steam." wsj.com. https://www.wsj.com/tech/ai/the-ai-revolution-is-already-losing-steam-a93478b1
[6] Bastian, M. "GPT-4 has more than a trillion parameters - report." The Decoder, 3 July 2023. https://the-decoder.com/gpt-4-has-a-trillion-parameters/
[7] Patel, Dylan, and Afzal Ahmad. "The Inference Cost Of Search Disruption – Large Language Model Cost Analysis." SemiAnalysis, 9 Feb. 2023. https://www.semianalysis.com/p/the-inference-cost-of-search-disruption
[8] "Tensor Processing Units (TPUs)." Google Cloud. https://cloud.google.com/tpu/?hl=en