Introducing Healthcare-Specific Large Language Models from John Snow Labs

John Snow Labs recently released BioGPT-JSL, a new LLM, along with other capabilities tuned specifically to the medical domain. This article summarizes three things you should know about them.

By KDnuggets on April 28, 2023 in Partners

1. They cover a range of common healthcare use cases:

Inquire about medical topics: Use BioGPT-JSL, the first closed-book medical Q&A LLM based on BioGPT.
Interpret medical research: Provide a PubMed abstract for extracting key results.
Produce clinical text: Offer introductory phrases for completion.
Condense clinical encounters: Transform clinical notes (visit summaries, discharge notes, radiology reports, or pathology reports) into a single paragraph.
Simplify patient inquiries: Convert patient emails, calls, or posts into one-sentence questions.

a. Ask medical questions: Try putting a question to the new BioGPT-JSL (the first ever closed-book medical question answering LLM based on BioGPT):

Input: How to treat asthma ?

Output: Asthma is itself an allergic disease due to cold or dust or pollen or grass etc. irrespective of the triggering factor. You can go for pulmonary function tests if not done. Treatment is mainly symptomatic which might require inhalation steroids, beta agonists, anticholinergics as MDI or rota haler as a regular treatment. To decrease the inflammation of bronchi and bronchioles, you might be given oral antihistamines with mast cell stabilizers (montelukast) and steroids (prednisolone) with nebulization and frequently steam inhalation. To decrease the bronchoconstriction caused by allergens, you might be given oral antihistamines with mast cell stabilizers (montelukast) and steroids (prednisolone) with nebulization and frequently steam inhalation. The best way to cure any allergy is a complete avoidance of allergen or triggering factor. Consult your pulmonologist for further advice.

b. Understand medical research: Give it a PubMed abstract and ask about what the key results were:

paper_abstract: "The visual indexing theory proposed by Zenon Pylyshyn (Cognition, 32, 65–97, 1989) predicts that visual attention mechanisms are employed when mental images are projected onto a visual scene. Recent eye-tracking studies have supported this hypothesis by showing that people tend to look at empty places where requested information has been previously presented. However, it has remained unclear to what extent this behavior is related to memory performance. The aim of the present study was to explore whether the manipulation of spatial attention can facilitate memory retrieval. In two experiments, participants were asked first to memorize a set of four objects and then to determine whether a probe word referred to any of the objects. The results of both experiments indicate that memory accuracy is not affected by the current focus of attention and that all the effects of directing attention to specific locations on response times can be explained in terms of stimulus–stimulus and stimulus–response spatial compatibility."

Question: What is the effect of directing attention on memory?
Answer: The results of the present study suggest that the visual indexing theory does not fully explain the effects of spatial attention on memory performance.

Question: Does directing attention improve memory for items?
Answer: no

c. Generate clinical text: Provide an opening phrase for the model to complete.

Prompt: The patient is admitted to the clinic with a severe back pain
Text: The patient is admitted to the clinic with a severe back pain and we found that the patient had a large tumor in the lumbar area. The tumor was removed surgically. The patient was diagnosed to have a large tumor in a lumbar vertebra. The tumor was diagnosed as a chordoma based the pathological and immunohistochemical findings, but no evidence for metastasis or invasion to adjacent structures were observed during follow - up after operation.

d. Summarize clinical encounters: Ask to turn a visit summary, discharge note, radiology report, or pathology report into one paragraph.

Input: Medical Specialty: Allergy / Immunology, Sample Name: Allergic Rhinitis
Description: A 23-year-old white female presents with complaint of allergies. (Medical Transcription Sample Report)
SUBJECTIVE: This 23-year-old white female presents with complaint of allergies. She used to have allergies when she lived in Seattle but she thinks they are worse here. In the past, she has tried Claritin, and Zyrtec. Both worked for short time but then seemed to lose effectiveness. She has used Allegra also. She used that last summer and she began using it again two weeks ago. It does not appear to be working very well. She has used over-the-counter sprays but no prescription nasal sprays. She does have asthma but doest not require daily medication for this and does not think it is flaring up.
MEDICATIONS: Her only medication currently is Ortho Tri-Cyclen and the Allegra.
ALLERGIES: She has no known medicine allergies.
OBJECTIVE: Vitals: Weight was 130 pounds and blood pressure 124/78. HEENT: Her throat was mildly erythematous without exudate. Nasal mucosa was erythematous and swollen. Only clear drainage was seen. TMs were clear. Neck: Supple without adenopathy. Lungs: Clear.
ASSESSMENT: Allergic rhinitis.
PLAN:
1- She will try Zyrtec instead of Allegra again. Another option will be to use loratadine. She does not think she has prescription coverage so that might be cheaper.
2- Samples of Nasonex two sprays in each nostril given for three weeks. A prescription was written as well.

Summary: A 23-year-old female presents with complaints of allergies, including worsening in her home state. She has tried Claritin and Zyrtec, but they haven't worked well. She has asthma and has no known medicine allergies. Her vital signs are stable, and she has mildly erythematous and swollen nasal mucosa. She will try Zyrtec instead of Allegra and use loratadine for nasal sprays. Samples of Nasonex were given for three weeks.

e. Summarize questions from patients: With 5 models for 5 contexts, MedicalSummarizer can also turn an email or post from a patient into the one-sentence question that the person is asking:

Input: Hey all. I was diagnosed with pulmonary sarcoidosis in 2011 by biopsy. I'm now 54. Prednisone works so well on me it's scary, but so are the long list of issues with it. I live in Montana and forest fire smoke is already here from other states fires and the air is at moderate. Exposure kicks of my symptoms and then I spiral down. I wait until I have a histamine reaction from my immune system with itching in my ears etc to go in. I just want as much of a quality of life as I can get not revolving around docs, meds etc. I can't find a doc who will only give me prednisone when I'm in a flare where regular modalities aren't working. I'm paleo and gluten free. Almost completely decaf no dairy, don't drink smoke or use any drugs...soooo frustrating. I bought an infrared sauna to detox, it seems to help. I wonder if anyone who also has environmental triggers has used oxygen? It seems like having non-polluted air available would be wiser?? Thanks

Summary Question: What are the treatments for pulmonary sarcoidosis?

2. They’re more accurate than general-purpose LLMs

It should come as no surprise that models trained with domain-specific data & experts outperform general-purpose models. The Python notebooks with these benchmarks are public if you need to reproduce or customize the benchmarks. We also released our own annotations (labelled datasets) for full transparency and welcome feedback and suggestions on how to improve them.

Clinical note summarization is 30% more accurate, in terms of BLEU and ROUGE scores, than general state-of-the-art LLMs that are fine-tuned on summarization tasks (BART, Flan-T5, Pegasus).

Demo app: Medical Summarization
Colab notebook: Medical Summarization Notebook
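
For readers unfamiliar with ROUGE, here is a minimal sketch of how a ROUGE-1 score could be computed for a single summary. It is not the benchmark code from the notebooks: the whitespace tokenizer, the `rouge1` helper, and the two example strings are assumptions made up for illustration, and real evaluations typically rely on an established metrics library.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real benchmarks
    # usually apply a proper tokenizer and sometimes stemming.
    return text.lower().split()


def rouge1(reference, candidate):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Toy strings invented for this sketch, not taken from any benchmark.
reference = "patient has allergic rhinitis and will try zyrtec"
candidate = "patient with allergic rhinitis will try zyrtec and nasonex"

precision, recall, f1 = rouge1(reference, candidate)
print(f"ROUGE-1 precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A benchmark would average such scores over many reference and generated summary pairs before comparing models.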

On clinical entity recognition, these models make half as many errors as ChatGPT. Here is a comparison of F1 scores of 12 common entities, compared to manual annotation by a team of medical doctors on sentences from the MTSamples website:

Demo app: Detect clinical entities with Spark NLP
Colab Notebook: Clinical Named Entity Recognition (NER)
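
As a rough illustration of what a per-entity F1 score measures, the sketch below compares predicted entity spans against manually annotated ones. It assumes exact span matching and uses a hypothetical `f1_by_label` helper with invented annotations; the published benchmark may use different matching rules.

```python
from collections import defaultdict


def f1_by_label(gold, predicted):
    """Return {label: F1} using exact (start, end) span matches per label."""
    gold_by, pred_by = defaultdict(set), defaultdict(set)
    for start, end, label in gold:
        gold_by[label].add((start, end))
    for start, end, label in predicted:
        pred_by[label].add((start, end))

    scores = {}
    for label in sorted(set(gold_by) | set(pred_by)):
        tp = len(gold_by[label] & pred_by[label])  # correctly found spans
        fp = len(pred_by[label] - gold_by[label])  # spurious predictions
        fn = len(gold_by[label] - pred_by[label])  # missed gold spans
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical annotations as (start_char, end_char, label) tuples.
gold = [(0, 6, "PROBLEM"), (10, 18, "PROBLEM"), (20, 28, "DRUG"), (40, 47, "TEST")]
predicted = [(0, 6, "PROBLEM"), (20, 28, "DRUG"), (41, 47, "TEST")]

scores = f1_by_label(gold, predicted)
for label, score in scores.items():
    print(f"{label}: F1={score:.2f}")
```

In this toy data the TEST prediction is off by one character, so under exact matching it counts as both a false positive and a false negative.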

De-Identification out-of-the-box accuracy is 93% compared to ChatGPT’s 60% on detecting PHI in clinical notes. Here is a comparison on full & partial recognition of sensitive entities from a manually curated set of medical notes:

Demo app: Detecting PHI in clinical notes
Colab Notebook: Clinical Deidentification and Obfuscation
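
The difference between full and partial recognition of a sensitive entity can be sketched as follows. This is one plausible reading of those terms, not the benchmark's definition: "full" here means the predicted span matches the annotated span exactly, and "partial" means at least one predicted span overlaps it (so full matches also count as partial). The `phi_recall` helper and the spans are hypothetical.

```python
def overlaps(a, b):
    # Half-open character spans overlap when each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def phi_recall(gold_spans, predicted_spans):
    """Return (full_recall, partial_recall) over the annotated PHI spans."""
    if not gold_spans:
        return 0.0, 0.0
    predicted_set = set(predicted_spans)
    full = sum(1 for g in gold_spans if g in predicted_set)
    partial = sum(
        1 for g in gold_spans if any(overlaps(g, p) for p in predicted_spans)
    )
    return full / len(gold_spans), partial / len(gold_spans)


# Invented (start_char, end_char) spans for four PHI mentions in one note.
gold_spans = [(0, 10), (15, 25), (30, 42), (50, 60)]
predicted_spans = [(0, 10), (17, 25), (30, 42)]

full_recall, partial_recall = phi_recall(gold_spans, predicted_spans)
print(f"full={full_recall:.2f} partial={partial_recall:.2f}")
```

For de-identification, missed spans matter more than spurious ones, which is why this sketch reports recall rather than precision.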

Extracting ICD-10-CM codes is done with a 76% success rate versus 26% for GPT-3.5 and 36% for GPT-4. Here is a comparison of entity resolution to ICD-10-CM codes across the three large language models:

Demo app: Resolve Clinical Findings using the ICD-10-CM taxonomy
Colab Notebook: Clinical entity resolution
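
A success rate for code resolution can be thought of as the share of clinical mentions that are mapped to the expected code. The sketch below shows that calculation under stated assumptions: the `success_rate` and `normalize_code` helpers are hypothetical, only the top predicted code per mention is considered, and the example mappings are illustrative rather than benchmark data.

```python
def normalize_code(code):
    # ICD-10-CM codes appear with or without the dot (J30.9 vs J309),
    # so compare them in a dot-free, upper-case form.
    return code.replace(".", "").strip().upper()


def success_rate(gold, predicted):
    """Fraction of gold mentions whose predicted code matches the gold code."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for mention, code in gold.items()
        if mention in predicted
        and normalize_code(predicted[mention]) == normalize_code(code)
    )
    return hits / len(gold)


# Illustrative mention-to-code mappings, invented for this sketch.
gold = {
    "allergic rhinitis": "J30.9",
    "asthma": "J45.909",
    "pulmonary sarcoidosis": "D86.0",
}
predicted = {
    "allergic rhinitis": "j309",
    "asthma": "J45.20",
    "pulmonary sarcoidosis": "D86.0",
}

rate = success_rate(gold, predicted)
print(f"success rate: {rate:.0%}")
```

Running the same function over each model's predictions on a shared gold set would give directly comparable percentages.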

3. They’re production ready

Most current LLMs are just not ready to be used in healthcare: they do not have the right privacy & compliance guarantees, can return hallucinations or falsehoods with high confidence, and are not updated at the rate required for production applications. The models from John Snow Labs are designed from the ground up for high-compliance industries, and specifically they:

Run on your infrastructure, behind your firewall, under your security controls. No text is ever sent to any third party or cloud service.
No need to buy a shipload of GPUs. The LLMs are engineered to run on commodity hardware, which makes them both much faster and much cheaper to scale.
Regularly updated. LLMs are regularly tuned as new research papers, clinical trials, guidelines and terminologies are published. Never go to production with a stale model.

These language models are available as part of John Snow Labs’ Healthcare NLP, a suite of Python libraries, models, and notebooks that allows data scientists to easily put these capabilities to good use. The subscription includes access to regularly refreshed models and all new releases, which is important because models will be frequently rebuilt from scratch. Research is moving fast: we’ve gone through several model architectures in the past six months, and expect to keep rebuilding as research evolves. Only one thing is certain about today’s state-of-the-art LLMs: if you train one today, it will be outdated in 3-6 months.
