2025 Healthcare LLM Tech
Summarization remains the highest-utility application of healthcare LLM AI. Here we cover expected healthcare LLM tech developments in 2025, ordered from most to least significant for the healthcare domain, excluding advances in hardware and policy.
2024 saw rapid growth in LLM use, most of which has yet to reach healthcare. The reasons are covered in previous blogs (Roadblocks in Healthcare AI, The Paradox of Healthcare AI).
A few definitions/points to start:
Large Language Models (LLMs) are a class of deep learning models in which tokens (words, punctuation) are numerically represented as a function of (A) the token itself and (B, usually) the context in which it sits. Generative Pretrained Transformers (GPTs) are a subclass of LLMs that generate text using the "transformer" neural network architecture. In deployment, given an input text and usually a 'context', the model produces an output. This is the basis of the famous ChatGPT and Claude models. These models are almost always batch trained rather than "learning in real time," largely due to legal risks.
Healthcare LLM quality is driven by many factors: architecture, parameter count, training strategy, access to outside data, hardware, hosting, and the data filters applied to the pipelines to clean and structure output, to name a few.
Compute cost reduction will precede further large AI breakthroughs. These models are not cheap to train and experiment with. Smaller models that can fit on individual machines permit more experimentation by the community and will quickly drive additional innovation.
HIPAA Compliant Healthcare LLM Systems
HIPAA compliance of tech always lags the consumer market. Currently, systems hosted by companies like Tenasol are custom built and run on private HIPAA-compliant servers to prevent issues associated with HIPAA and/or AI-ethics violations.
HIPAA-compliant healthcare LLM systems will proliferate in 2025. This requires tight governance and legal agreements, which many of the more consumer-focused GPT vendors will likely begin to offer in an attempt to widen their market share with new investment resources.
Healthcare LLM Compression
Compression (reducing the size) of Healthcare LLM systems is a must. It reduces hardware requirements and cost of deployment, but usually not the cost of training. It is heavily studied at present and is a very complex subject.
Current top-performing models are around 300 gigabytes in size, meaning specialized hardware is required just to host them in the cloud, as they must be held in VRAM (video-card random access memory) during operation. By comparison, your phone probably has 2-4 gigabytes of VRAM.
The cost of training these models is on the order of tens of thousands of dollars at a minimum. Deploying them, while significantly cheaper (and seemingly free to the public), is still expensive and encounters a myriad of other issues. Compression reduces deployment costs.
Compression may be done by:
Reducing the floating-point precision of parameters (fewer decimal places), commonly called quantization.
Strategically simplifying the architecture or pruning nodes from the final neural network without damaging performance.
…and others
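The precision-reduction approach above can be sketched in a few lines. This is a minimal illustration of symmetric 8-bit quantization on a made-up weight matrix, not how any particular production model is compressed:

```python
import numpy as np

# Hypothetical weight matrix stored at float32 precision.
weights_fp32 = np.random.randn(1024, 1024).astype(np.float32)

# Symmetric 8-bit quantization: map floats onto int8 via a single scale factor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize at inference time; values are approximate, but storage is 4x smaller.
weights_restored = weights_int8.astype(np.float32) * scale

print(weights_fp32.nbytes // weights_int8.nbytes)  # prints 4
```

The trade-off is the rounding error introduced per weight (at most half the scale factor), which is why quantization schemes are evaluated carefully against model accuracy before deployment.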
Healthcare LLM Chunking
LLM GPTs suffer from critical input and output size limits.
If ChatGPT currently limits input to 128,000 tokens, this is an issue: a medical record can easily reach 10,000 pages (300 words/page × 1.25 tokens per word × 10,000 pages = 3.75M tokens), which far exceeds the input limits of most GPT systems.
Commonly this is resolved by chunking, whereby the data is broken up, fed through the model piece by piece, and aggregated in a final step. This requires multiple runs and destroys the ability to use information from other chunks.
Increasing input size is also known to reduce performance, though there may be research breakthroughs here.
Output size limit is also a separate issue.
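The chunking pattern described above can be sketched simply. This toy version approximates tokens as whitespace-separated words and uses made-up limits; a real pipeline would use the model's own tokenizer and aggregate the per-chunk outputs in a final pass:

```python
def chunk_text(text: str, max_tokens: int = 128_000, overlap: int = 200) -> list[str]:
    """Split a long record into overlapping chunks that fit a model's input limit.

    Tokens are approximated as whitespace-separated words here; a real system
    would count tokens with the model's tokenizer.
    """
    words = text.split()
    step = max_tokens - overlap  # slide forward, keeping some shared context
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks
```

Each chunk is run through the model separately, and the partial outputs are merged in a final aggregation step, which is exactly why cross-chunk reasoning is lost.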
Healthcare LLM within Mixture of Experts
Current systems have limits on parameter counts due to hardware and cost constraints, which implies a practical limit on knowledge. By building models unique to specific domains, these parameters can be best utilized to make "subject-matter-expert LLM" systems, which are becoming more common. Healthcare is probably the most prominent example, as its vocabulary differs significantly from other domains.
Mixture of Experts (MoE): taking this a step further, larger platforms are increasingly interested in mixtures of experts, where several models are individual subject-matter experts and a parent (or routing) model decides which expert receives the input query.
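The routing idea can be sketched as follows. The "experts" here are placeholder functions and the keyword router is illustrative; production MoE systems use a learned gating network rather than keyword overlap:

```python
# Minimal mixture-of-experts routing sketch with stand-in experts.
EXPERTS = {
    "healthcare": lambda q: f"[clinical model] {q}",
    "legal": lambda q: f"[legal model] {q}",
    "general": lambda q: f"[general model] {q}",
}

# Illustrative vocabulary per domain; a real router is a trained model.
KEYWORDS = {
    "healthcare": {"diagnosis", "icd", "patient", "medication"},
    "legal": {"contract", "liability", "statute"},
}

def route(query: str) -> str:
    """Send the query to the expert whose vocabulary best matches it."""
    words = set(query.lower().split())
    scores = {name: len(words & kw) for name, kw in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    expert = best if scores[best] > 0 else "general"
    return EXPERTS[expert](query)
```

For example, `route("patient medication review")` lands on the clinical expert, while an unmatched query falls back to the general model.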
Healthcare LLM RAG Features
Mixture of experts does not solve the knowledge limit or the time-boxing problem: information outside the training set, or created after training, is not accessible.
Retrieval-Augmented Generation (RAG) is the solution to this problem and is in its early stages. A RAG system pulls external data when required and adds it to the input when generating output. It is one of the most important features in the proliferation of LLM technology.
In healthcare specifically, this could permit CMS or HHS updates to be incorporated when necessary, even if the model was trained before their release. The vastness of healthcare knowledge and the size of code databases also require RAG implementations to make healthcare LLM AI more practical.
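A minimal RAG sketch looks like the following. The knowledge-base snippets are invented placeholders, and word overlap stands in for the embedding-based retrieval a real system would use:

```python
# Illustrative external snippets; real systems query a vector store of documents.
KNOWLEDGE_BASE = [
    "CMS update: billing guidance for remote patient monitoring revised for 2025.",
    "HHS notice: telehealth flexibilities extended through 2025.",
]

def retrieve(query: str) -> str:
    """Return the snippet with the highest word overlap with the query."""
    q = set(query.lower().split())
    return max(KNOWLEDGE_BASE, key=lambda doc: len(q & set(doc.lower().split())))

def build_prompt(query: str) -> str:
    """Prepend retrieved context to the user query before generation."""
    return f"Context: {retrieve(query)}\n\nQuestion: {query}"
```

The key property is that the retrieved context can postdate training: the model reads it at inference time rather than needing to have memorized it.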
Harmful Bias Removal in Healthcare LLM Systems
In healthcare, there is no fear greater than the introduction of harmful bias by these models, which at its core is the reason they remain largely confined to summarization at the moment.
As AI models, they are built on bias: every output is shaped by its input. The concern is the harmful biases that can result from training data, such as biases against minorities or lower-income individuals. In healthcare LLM AI models specifically, these biases are often removed in the following ways:
Removal of instances of bias from training data.
Applying an agent to evaluate output for instances of bias. An agent is in effect an extra LLM that sits in the background and monitors output against an instruction set before it is delivered to the end user.
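The agent-screening step can be sketched as a gate between the model and the end user. The flagged phrases and withhold-for-review policy below are illustrative; a real agent is a second LLM running a detailed instruction set:

```python
# Illustrative stigmatizing phrases an output screen might flag.
FLAGGED_PATTERNS = ["noncompliant patient", "drug seeker"]

def screen_output(text: str) -> tuple[str, bool]:
    """Return (text, passed). Output failing the screen is withheld for review."""
    lowered = text.lower()
    for pattern in FLAGGED_PATTERNS:
        if pattern in lowered:
            return ("[withheld for human review]", False)
    return (text, True)
```

The design point is that the screen sits outside the generating model, so it can be updated or audited without retraining.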
Multimodal Processing in Healthcare LLM Systems
The ability to process all forms of data is a growing trend; here at Tenasol we process almost every healthcare data format. LLMs as a whole increasingly permit more file types to be handled but are currently largely constrained to images and text. Multimodality is handled in two ways:
Conversion: for example, a CSV or XLSX file is often treated as a text file instead. A PDF file may be interpreted as an image or run through an OCR engine by an app before being passed to a healthcare LLM.
Direct: one promising emerging solution is direct file interpretation, such as Donut, which comprehends document images without OCR. These offer simpler architectures with lower overall compute, at the cost of some performance depending on the task.
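The conversion path can be sketched for the CSV case. This is a simple flattening of rows into "header: value" text, one of many possible renderings, so a text-only LLM can read tabular data:

```python
import csv
import io

def csv_to_text(csv_bytes: bytes) -> str:
    """Render CSV rows as 'header: value' lines, one block per record."""
    reader = csv.DictReader(io.StringIO(csv_bytes.decode("utf-8")))
    blocks = []
    for row in reader:
        blocks.append("\n".join(f"{k}: {v}" for k, v in row.items()))
    return "\n\n".join(blocks)
```

For instance, `csv_to_text(b"name,age\nAlice,30")` yields `"name: Alice\nage: 30"`, which can be dropped straight into a prompt.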
Federated Healthcare LLM Training
Federated training is the training of these models using distributed computing: a large number of individuals pool their individually small compute resources to train a single, massive model. The users do not control how the training is performed, nor do they supply the training data.
Think of federated training as the crowd-sourcing, or bitcoin mining, of LLMs. In fact, developing an LLM via a mining-style operation is in the realm of possibility, where users are rewarded with digital assets for their hardware testing model changes. Folding@home, used for protein folding, is the most successful demonstration of such distributed systems and is still considered one of the most powerful computer systems in the world.
While the idea and its implementations are weak at present, a significant amount of research is going into understanding it and making it work. The core problem is that training usually affects a high proportion of a neural network's nodes at once, making it very difficult to train individual portions of the network in isolation.
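The pooling of updates can be illustrated with a federated-averaging sketch on a toy linear model. Each node takes a gradient step on its own shard, and only the resulting parameters (never the raw data) are averaged centrally; real LLM training is vastly more complex than this:

```python
import numpy as np

def local_step(weights, X, y, lr=0.01):
    """One gradient-descent step on a node's private shard (toy least squares)."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights, shards, lr=0.01):
    """Broadcast weights, let each node update locally, then average the results."""
    updates = [local_step(weights, X, y, lr) for X, y in shards]
    return np.mean(updates, axis=0)
```

Repeating `federated_round` converges toward the weights a centralized trainer would find on this toy problem, which is the appeal; the difficulty the paragraph above describes is that LLM updates touch nearly every parameter, making the per-node work hard to partition.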
Conclusion
The rapid growth of healthcare LLM systems is poised to revolutionize the industry, though challenges remain. In 2025, these advancements will signify a shift from summarization to actionable, real-world applications, fostering integration into EMRs, improving patient outcomes, and driving efficiency.
Current benchmarks for the Open-LLM Leaderboard are posted here.
Contact us to see how we can help you.