Research

Methods, models, and clinical evaluations underwriting the platform.

Our research advances AI for healthcare: foundation models that learn from real patient data and are evaluated on real clinical endpoints. Published work from our team spans world models for longitudinal EHR, whole-genome encoders, molecular LLMs, and high-resolution vision–language models.

2026

World Model / Longitudinal EHR

The patient is not a moving document: a world-model training paradigm for longitudinal EHR

Introduces SMB-Structure, a world model for structured EHR that combines a Joint-Embedding Predictive Architecture (JEPA) with supervised fine-tuning (SFT) on next-token prediction. SFT trains the model to reconstruct future patient states in token space; JEPA predicts those futures in latent space from the initial representation alone, forcing the model to encode trajectory dynamics before the next state is observed. Validated on long-horizon prediction tasks across 40,000 patients.
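To make the pairing of a token-space objective with a latent-space objective concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the mean-squared-error latent loss, and the `jepa_weight` mixing coefficient are all assumptions chosen for illustration, and the encoders that would produce the logits and embeddings are omitted.

```python
import math

def cross_entropy(logits, target):
    """Token-space loss: negative log-probability of the observed next token."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def latent_mse(predicted, target):
    """Latent-space loss: mean squared error between predicted and target embeddings."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def combined_loss(logits, next_token, z_pred, z_future, jepa_weight=1.0):
    # jepa_weight is a hypothetical mixing coefficient, not a value from the paper.
    return cross_entropy(logits, next_token) + jepa_weight * latent_mse(z_pred, z_future)

# Toy inputs: three-token vocabulary, two-dimensional latent state.
loss = combined_loss([2.0, 0.5, -1.0], 0, [0.1, 0.4], [0.0, 0.5])
```

The point of the sketch is only that the two terms penalize different things: the first scores the predicted next token, the second scores a predicted future embedding against the embedding of the state that actually followed.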
2025

Oncology & Whole-Genome Sequencing

GenVarFormer: Predicting gene expression from long-range mutations in cancer

A whole-genome-sequencing foundation model trained to predict the effect of variants on gene expression. It distinguishes rare driver mutations from passenger mutations in the non-coding genome and reports state-of-the-art results on downstream cancer tasks.
2025

Molecular Language Model

Patient-specific biomolecular instruction tuning of Graph-LLMs

Links proteomic graph neural networks to language, creating a shared representation space between molecular and cellular foundation models. The approach generalizes to any graph-based representation at the cellular level.
2025

Electronic Health Records

Building the EHR foundation model via next-event prediction

Reframes EHRs as timestamped chains of clinical events and fine-tunes large language models to predict the next event, improving temporal reasoning over disease trajectories. Reports +4.6% AUROC over task-specific EHR models.
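The next-event framing can be illustrated with a small sketch that orders a patient's events by timestamp and pairs each history prefix with the event that follows it. The event codes, the tuple layout, and the `next_event_pairs` helper are hypothetical and are not taken from the paper; a real pipeline would also serialize these pairs into text for the language model.

```python
from datetime import datetime

# Hypothetical event records: (ISO timestamp, event code). Codes are illustrative only.
events = [
    ("2024-03-02T09:30", "LAB:HbA1c_high"),
    ("2024-01-10T14:00", "DX:type2_diabetes"),
    ("2024-01-10T14:20", "RX:metformin"),
]

def next_event_pairs(events):
    """Sort events by time and pair each history prefix with the event that follows it."""
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e[0]))
    codes = [code for _, code in ordered]
    return [(codes[:i], codes[i]) for i in range(1, len(codes))]

pairs = next_event_pairs(events)
for history, target in pairs:
    print(history, "->", target)
```

Each pair is one training example: the model sees the history and is asked for the target, so a record with n events yields n - 1 examples.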
2024

Vision–Language / Medical Imaging

Advancing high-resolution vision–language models in biomedicine

A foundational paper demonstrating the Standard Model approach across high-resolution biomedical imagery and language. It establishes the vision–language backbone that later scale-specific papers build on.
