LifeGuard

A Gemma 4 based clinical decision-support prototype for community health workers, fine-tuned on WHO IMCI (Integrated Management of Childhood Illness) child-health protocols.
Overview
- Name: LifeGuard
- Type: Personal, open-source clinical decision-support prototype
- Live demo: mrutyunjaypatil.dev/lifeguard
- Purpose: Adapt a general-purpose small language model into a specialized assistant that helps community health workers triage and assess sick children using WHO IMCI methodology, including in low-resource and offline settings
- Important: LifeGuard is a decision-support tool, NOT a medical device. It does not diagnose and should not replace local clinical protocols or professional judgment. As the project states, it is "a hackathon prototype for decision support and workflow exploration. It is not a medical device, does not diagnose, and should not replace local clinical protocols or professional judgment."
- Approach: An iterative loop of data generation, Unsloth fine-tuning, GGUF conversion, and repeated evaluation against a held-out clinical scenario set
Key Features
- Fine-tuned Gemma 4 E4B on WHO IMCI-style child-health assessments using Unsloth with LoRA adapters, producing measurable gains over the base model
- WHO IMCI classification that returns Pink / Yellow / Green / Unknown triage colors along with key signs, recommended next actions, and missing information, without inventing diagnoses or treatment details
- Offline-capable inference via a quantized GGUF model designed to run locally with llama.cpp style runtimes, plus on-device inference paths in the mobile app (LiteRT-LM and Ollama backends)
- Hosted live demo with a React chat interface backed by a Modal FastAPI inference endpoint, fronted by a Cloudflare Worker proxy that adds per-IP rate limiting
- Mobile scaffold (React Native / Expo) for Android and iPhone, with both on-device and hosted inference options and a structured drug-dose lookup tool
- Open artifacts: Kaggle training and evaluation notebooks, raw kernel logs, evaluation JSON, and deployment guides committed to the repository
How It Works
Dataset and fine-tuning
- Dataset: 13,345 deduplicated examples (12,943 train / 402 validation) built from IMCI-style decision trees and reference material, spanning 12 task kinds (facts, scenarios, counseling, multilingual, procedures, danger signs, classification, follow-up, dosing, function calling, and refusals)
- Format: Chat-style
messageswith user / assistant turns; tool-call examples are converted into Gemma 4 compatible alternating turns before training - Reference data:
decision_trees.json(IMCI-style decision trees) anddrug_doses.json(structured medication and dose lookup used by the tool layer) - Training: Gemma 4 E4B fine-tuned with Unsloth and LoRA on Kaggle, followed by GGUF conversion and a Hugging Face push, all captured in versioned notebooks (v1, v2 with weak-bucket top-up, and an interactive-chat notebook)
Deployment
- Backend: A Modal app exposing a FastAPI service on an NVIDIA T4 GPU. It downloads the quantized model
mrutyunjay-patil/lifeguard-gemma4-e4b-gguf-v2(filegemma-4-e4b-it.Q4_K_M.gguf) from Hugging Face into a Modal Volume, validates it, and serves it with llama-cpp-python compiled for CUDA, using a 4096-token context with full GPU layer offload - Endpoints:
GET /healthreturns model metadata;POST /generateaccepts a prompt (1 to 4000 chars),max_tokens(default 512),temperature(default 0.1),top_p(default 0.85), and optional stop sequences - Frontend: A Vite + React chat UI with predefined clinical scenarios, talking to the backend through a Cloudflare Worker proxy that enforces per-IP rate limiting
- Offline path: The same GGUF model can run on-device; the mobile app ships LiteRT-LM and Ollama inference backends alongside a hosted option, so an assessment can be driven without a network connection
Results and Evaluation
Evaluation runs on a fixed set of 50 IMCI scenarios, scoring two things: the primary classification (exact match) and the triage color (Pink / Yellow / Green). The fine-tuned model shows a real jump over the base model:
| Metric | Base Gemma 4 E4B | LifeGuard v2 | Delta |
|---|---|---|---|
| Primary classification accuracy | 18% (9/50) | 32% (16/50) | +14 pp |
| Triage color accuracy | 72% (36/50) | 78% (39/50) | +6 pp |
The project is candid that the v2 GGUF "is not clinically perfect, but it demonstrates a real jump from the base model on the project evaluation set." Per-bucket breakdowns (cough, diarrhoea, fever, ear problem, malnutrition, young infant, danger signs, HIV, immunization, and more) and an LLM-judge variant are included in the artifacts/evaluations/ directory for both v1 and v2 runs.
Technical Stack
- ML: Gemma 4 E4B, Unsloth fine-tuning with LoRA, GGUF Q4_K_M quantization, Hugging Face model hosting
- Backend: Modal (serverless GPU), FastAPI, llama-cpp-python with CUDA
- Frontend: Vite, React, TypeScript, Cloudflare Workers (rate-limiting proxy)
- Mobile: React Native / Expo with on-device (LiteRT-LM, Ollama) and hosted inference backends
- Tooling and artifacts: Kaggle notebooks, JSON evaluation reports, kernel logs, and deployment guides