LifeGuard

A Gemma 4-based clinical decision-support prototype for community health workers, fine-tuned on WHO IMCI (Integrated Management of Childhood Illness) child-health protocols.

Overview

Name: LifeGuard
Type: Personal, open-source clinical decision-support prototype
Live demo: mrutyunjaypatil.dev/lifeguard
Purpose: Adapt a general-purpose small language model into a specialized assistant that helps community health workers triage and assess sick children using WHO IMCI methodology, including in low-resource and offline settings
Important: LifeGuard is a decision-support tool, NOT a medical device. In the project's own words, it is "a hackathon prototype for decision support and workflow exploration. It is not a medical device, does not diagnose, and should not replace local clinical protocols or professional judgment."
Approach: An iterative loop of data generation, Unsloth fine-tuning, GGUF conversion, and repeated evaluation against a held-out clinical scenario set

Key Features

Fine-tuned Gemma 4 E4B on WHO IMCI-style child-health assessments using Unsloth with LoRA adapters, improving on the base model on the project's 50-scenario evaluation set (+14 pp primary classification accuracy, +6 pp triage color accuracy)
WHO IMCI classification that returns Pink / Yellow / Green / Unknown triage colors along with key signs, recommended next actions, and missing information, without inventing diagnoses or treatment details
Offline-capable inference via a quantized GGUF model designed to run locally with llama.cpp-style runtimes, plus on-device inference paths in the mobile app (LiteRT-LM and Ollama backends)
Hosted live demo: a React chat interface backed by a Modal FastAPI inference endpoint, with a Cloudflare Worker proxy in front of the endpoint that adds per-IP rate limiting
Mobile scaffold (React Native / Expo) for Android and iPhone, with both on-device and hosted inference options and a structured drug-dose lookup tool
Open artifacts: Kaggle training and evaluation notebooks, raw kernel logs, evaluation JSON, and deployment guides committed to the repository
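The classification output described in the feature list (triage color, key signs, recommended next actions, missing information) could be held in a small typed structure. The sketch below is illustrative only: the field names and the dict shape accepted by the hypothetical `triage_from_dict` helper are assumptions, not LifeGuard's actual output format. Any color label that is not one of the four documented values collapses to Unknown, so a downstream screen never shows an invented category.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TriageColor(Enum):
    # The four documented LifeGuard labels; meanings follow standard IMCI color bands.
    PINK = "Pink"        # urgent referral
    YELLOW = "Yellow"    # specific treatment at the facility
    GREEN = "Green"      # home management
    UNKNOWN = "Unknown"  # not enough information to classify


@dataclass
class TriageResult:
    # Hypothetical field names chosen for this sketch.
    color: TriageColor
    key_signs: List[str] = field(default_factory=list)
    next_actions: List[str] = field(default_factory=list)
    missing_info: List[str] = field(default_factory=list)


def parse_color(raw: str) -> TriageColor:
    """Map a free-text color label to a TriageColor, falling back to UNKNOWN."""
    cleaned = raw.strip().casefold()
    for color in TriageColor:
        if color.value.casefold() == cleaned:
            return color
    return TriageColor.UNKNOWN


def triage_from_dict(payload: dict) -> TriageResult:
    """Build a TriageResult from a hypothetical JSON-like dict emitted by the model."""
    return TriageResult(
        color=parse_color(str(payload.get("triage_color", ""))),
        key_signs=list(payload.get("key_signs", [])),
        next_actions=list(payload.get("next_actions", [])),
        missing_info=list(payload.get("missing_info", [])),
    )


# Usage
result = triage_from_dict({"triage_color": "yellow", "key_signs": ["fast breathing"]})
print(result.color.value)  # Yellow
```

Mapping unrecognized labels to Unknown rather than guessing keeps the parser in line with the stated goal of not inventing classifications.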

How It Works

Dataset and fine-tuning

Dataset: 13,345 deduplicated examples (12,943 train / 402 validation) built from IMCI-style decision trees and reference material, spanning 12 task kinds, including facts, scenarios, counseling, multilingual, procedures, danger signs, classification, follow-up, dosing, function calling, and refusals
Format: Chat-style messages with user / assistant turns; tool-call examples are converted into Gemma 4 compatible alternating turns before training
Reference data: decision_trees.json (IMCI-style decision trees) and drug_doses.json (structured medication and dose lookup used by the tool layer)
Training: Gemma 4 E4B fine-tuned with Unsloth and LoRA on Kaggle, followed by GGUF conversion and a Hugging Face push, all captured in versioned notebooks (v1, v2 with weak-bucket top-up, and an interactive-chat notebook)
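The tool-call conversion step listed above can be sketched as follows, assuming each example is a list of `{"role", "content"}` dicts using the roles `user`, `assistant`, and `tool`, with assistant tool calls already serialized as text. Tool results are folded into a user turn under a hypothetical `Tool result:` prefix, and consecutive turns from the same side are merged so the conversation strictly alternates, starting with the user. This illustrates the idea; it is not the project's notebook code.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical marker for tool output; the project's real format may differ.
TOOL_PREFIX = "Tool result:\n"


def to_alternating_turns(messages: List[Message]) -> List[Message]:
    """Fold tool messages into user turns and merge consecutive same-role turns."""
    out: List[Message] = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "tool":
            # Tool output is presented to the model as if the user relayed it.
            role, content = "user", TOOL_PREFIX + content
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        if out and out[-1]["role"] == role:
            # Two turns in a row from the same side: join them into one turn.
            out[-1] = {"role": role, "content": out[-1]["content"] + "\n\n" + content}
        else:
            out.append({"role": role, "content": content})
    _check_alternation(out)
    return out


def _check_alternation(turns: List[Message]) -> None:
    """Raise if turns do not go user, assistant, user, assistant, ..."""
    for i, turn in enumerate(turns):
        expected = "user" if i % 2 == 0 else "assistant"
        if turn["role"] != expected:
            raise ValueError(f"turn {i} should be {expected}, got {turn['role']}")
```

A dose-lookup exchange (user question, assistant tool call, tool result, assistant answer) comes out as user / assistant / user / assistant, which is the shape an alternating-turn chat template expects.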

Deployment

Backend: A Modal app exposing a FastAPI service on an NVIDIA T4 GPU. It downloads the quantized model mrutyunjay-patil/lifeguard-gemma4-e4b-gguf-v2 (file gemma-4-e4b-it.Q4_K_M.gguf) from Hugging Face into a Modal Volume, validates it, and serves it with llama-cpp-python compiled for CUDA, using a 4096-token context with full GPU layer offload
Endpoints: GET /health returns model metadata; POST /generate accepts a prompt (1 to 4000 chars), max_tokens (default 512), temperature (default 0.1), top_p (default 0.85), and optional stop sequences
Frontend: A Vite + React chat UI with predefined clinical scenarios, talking to the backend through a Cloudflare Worker proxy that enforces per-IP rate limiting
Offline path: The same GGUF model can run locally, and the mobile app ships LiteRT-LM and Ollama inference backends alongside the hosted option, so an assessment can be run without a network connection
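A client for the `POST /generate` endpoint could validate its body before sending it. The dataclass below mirrors the documented defaults and the 1 to 4000 character prompt limit; the JSON field name `stop` and the bounds on `max_tokens`, `temperature`, and `top_p` are assumptions, since only the defaults are documented.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

MAX_PROMPT_CHARS = 4000  # documented upper bound on prompt length


@dataclass
class GenerateRequest:
    """Client-side mirror of the documented POST /generate body."""

    prompt: str
    max_tokens: int = 512              # documented default
    temperature: float = 0.1           # documented default
    top_p: float = 0.85                # documented default
    stop: Optional[List[str]] = None   # field name "stop" is an assumption

    def __post_init__(self) -> None:
        if not 1 <= len(self.prompt) <= MAX_PROMPT_CHARS:
            raise ValueError("prompt must be 1 to 4000 characters")
        # The checks below are assumed sanity bounds, not documented limits.
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")

    def to_json(self) -> str:
        """Serialize the body, leaving out `stop` when it is unset."""
        body = asdict(self)
        if body["stop"] is None:
            del body["stop"]
        return json.dumps(body)


# Usage: build a body to POST to the deployed endpoint (URL not shown here).
req = GenerateRequest(prompt="A 2-year-old has cough and fast breathing. Classify using IMCI.")
payload = req.to_json()
```

These field names line up with the keyword arguments of llama-cpp-python's `Llama.create_completion` (`max_tokens`, `temperature`, `top_p`, `stop`), which would let a server forward them with little translation; how the LifeGuard backend actually handles them is not shown here. Note that prompt tokens plus `max_tokens` still have to fit in the 4096-token context.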

Results and Evaluation

Evaluation runs on a fixed set of 50 IMCI scenarios and scores two metrics: exact-match accuracy on the primary classification and accuracy on the triage color (Pink / Yellow / Green). The fine-tuned model improves on the base model on both:

Metric	Base Gemma 4 E4B	LifeGuard v2	Delta
Primary classification accuracy	18% (9/50)	32% (16/50)	+14 pp
Triage color accuracy	72% (36/50)	78% (39/50)	+6 pp

The project is candid that the v2 GGUF "is not clinically perfect, but it demonstrates a real jump from the base model on the project evaluation set." Per-bucket breakdowns (cough, diarrhoea, fever, ear problem, malnutrition, young infant, danger signs, HIV, immunization, and more) and an LLM-judge variant are included in the artifacts/evaluations/ directory for both v1 and v2 runs.
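The two metrics and the per-bucket breakdown can be reproduced with a short scoring routine. This is a minimal sketch, assuming each scenario record carries a bucket name plus expected and predicted classification and color strings; the light normalization (whitespace collapse and casefolding) is an assumption, and the project's own exact-match scorer may be stricter. The LLM-judge variant is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ScoredCase:
    bucket: str           # e.g. "cough", "fever"
    expected_class: str
    predicted_class: str
    expected_color: str
    predicted_color: str


def _norm(label: str) -> str:
    # Assumed light normalization; the project's matcher may be stricter.
    return " ".join(label.split()).casefold()


def score(cases: List[ScoredCase]) -> Dict[str, Tuple[int, ...]]:
    """Return {bucket: (class_hits, color_hits, total)} plus an "overall" row."""
    table: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for c in cases:
        for key in (c.bucket, "overall"):
            row = table[key]
            row[0] += int(_norm(c.expected_class) == _norm(c.predicted_class))
            row[1] += int(_norm(c.expected_color) == _norm(c.predicted_color))
            row[2] += 1
    return {k: tuple(v) for k, v in table.items()}


def pct(hits: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * hits / total
```

As a check on the table arithmetic, `pct(16, 50) - pct(9, 50)` evaluates to `14.0` and `pct(39, 50) - pct(36, 50)` to `6.0`, matching the reported +14 pp and +6 pp deltas.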

Technical Stack

ML: Gemma 4 E4B, Unsloth fine-tuning with LoRA, GGUF Q4_K_M quantization, Hugging Face model hosting
Backend: Modal (serverless GPU), FastAPI, llama-cpp-python with CUDA
Frontend: Vite, React, TypeScript, Cloudflare Workers (rate-limiting proxy)
Mobile: React Native / Expo with on-device (LiteRT-LM, Ollama) and hosted inference backends
Tooling and artifacts: Kaggle notebooks, JSON evaluation reports, kernel logs, and deployment guides

Architecture Diagram
