The brain is the only omnimodal system that does it right

BrainVI turns it into an API.

Cognition API · Cognitive research lab for AI

Give machines a sense
of human emotion.

Agents and robots are fluent in language and blind to people. BrainVI is building a cortical-embedding API — a neuroscience-grounded read on how a human mind responds.

Get API key · Read the research

API in private beta · join the waitlist for early access

505 SUBJECTS · 755+ fMRI HOURS · 9 PUBLIC DATASETS · 81,924 CORTICAL VERTICES · fsaverage6 SURFACE · 6-STREAM ENCODER · CC0 / CC-BY LICENSED · 4 RESEARCH PAPERS

01↘ The gap

Intelligence reads your words.
It doesn't read you.

The blind spot

A robot in your home, an agent on your phone — each acts with no model of the person in front of it. Today's embeddings learn what looks alike on the internet, not how a mind responds.

The brain already solves this. Vision, sound, language, and feeling converge in one place — the cortex. That convergence is the representation machines are missing.

Principles · observed

α · The cortex is natively omnimodal.

β · Predict the brain, don't scan it.

γ · Honesty over flattery.

δ · Standing on prior art.

02↘ The thesis

The next frontier,
rooted in neuroscience.

Our bet: an embedding grounded in the cortex beats one scraped from the web. It carries how content is processed, not just what co-occurs online.

We call these cortical embeddings. Our flagship model already matches the published state of the art in brain encoding while training on about a third of the fMRI data.

Read: The Average Brain Is No Brain At All→

EMBEDDING · CONCEPT · PRIVATE BETA

# Cortical embedding: one omnimodal vector per stimulus
from BrainVI import MARY

# File names are placeholders
a = MARY.embed("clip_a.mp4")        # → 81,924-dim cortical vector
b = MARY.embed("clip_b.mp4")        # second stimulus to compare against
sim = MARY.similarity(a, b)         # brain-space similarity, not pixel-space

# How alike are two stimuli in a mind, not on the internet?
# Interface shown for illustration · API in private beta

↑ illustrative · not a live endpoint yet
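
One plausible reading of "brain-space similarity" is cosine similarity between two cortical vectors. The sketch below assumes that metric (the page does not say which one `MARY.similarity` uses) and swaps in short toy vectors for real 81,924-dim embeddings. `cosine_similarity` is a hypothetical helper for illustration, not part of the BrainVI API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for cortical embeddings (real ones would have 81,924 entries).
clip_a = [0.2, -0.1, 0.7, 0.0]
clip_b = [0.4, -0.2, 1.4, 0.0]   # same direction as clip_a, twice the magnitude
clip_c = [-0.7, 0.0, 0.2, 0.1]   # orthogonal to clip_a

same_direction = cosine_similarity(clip_a, clip_b)
unrelated = cosine_similarity(clip_a, clip_c)
```

Cosine similarity ignores vector magnitude, so two stimuli that drive the same pattern at different strengths still score as alike. Whether that is the desired behavior for cortical vectors is a design choice the page leaves open.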

03↘ The model

A six-stream
cortical encoder.

The encoder

MARY maps any stimulus to predicted cortical activity at 81,924 vertices of the fsaverage6 surface, using six backbone streams that each mirror one of the brain's processing pathways.

MARY Nano 1.0, our research-preview model, matches the Algonauts 2025 state of the art — Pearson r = 0.216 vs Meta TRIBE's 0.2146 (Schaefer-1000) — on just ~23h of fMRI, a third of TRIBE's data. On held-out films it generalizes at r = 0.170, rising to 0.185 with MARY Nano 1.1.
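
The Pearson r figures above are encoding scores: how well predicted responses track measured ones. A minimal sketch follows, assuming the score is computed per parcel over time and then averaged across parcels (an assumption; the page does not spell out the aggregation). The function names and toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_parcel_score(predicted, measured):
    """Average per-parcel correlation; each parcel is a list of time points."""
    scores = [pearson_r(p, m) for p, m in zip(predicted, measured)]
    return sum(scores) / len(scores)


# Toy data: two parcels, four time points each.
predicted = [[0.1, 0.4, 0.3, 0.8], [1.0, 0.0, 1.0, 0.0]]
measured = [[0.2, 0.8, 0.6, 1.6], [0.0, 1.0, 0.0, 1.0]]

# Parcel 0 is perfectly tracked, parcel 1 is perfectly inverted,
# so the two correlations cancel in the average.
score = mean_parcel_score(predicted, measured)
```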

01 · Vision

Primary visual cortex (V1) · occipital

Spatiotemporal motion — movement, dynamics, and timing.

02 · Scene / VL

Fusiform & lateral occipitotemporal

Vision-language — scene understanding & semantics.

03 · Audio

Superior temporal gyrus · A1

Audio — sound events, music, and ambience.

04 · Speech

Inferior frontal gyrus · Broca's area

Speech — words, voice, and prosody.

05 · Narrative

Prefrontal cortex

Long-context language — narrative & comprehension.

06 · On-screen text

Visual word-form area (VWFA)

On-screen text — captions & legible detail.

Frozen backbones are used as feature extractors only · we do not train them.
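
The "frozen backbone, trained head" pattern can be sketched as below. This is illustration only: the page does not say how MARY fuses its streams, so concatenation followed by a linear readout is an assumption, and the stream keys, feature sizes, and weights are all hypothetical.

```python
# Stream names follow the six streams listed above; the keys are hypothetical.
STREAMS = ["vision", "scene_vl", "audio", "speech", "narrative", "onscreen_text"]


def fuse(features_by_stream):
    """Concatenate per-stream feature vectors in a fixed stream order."""
    fused = []
    for name in STREAMS:
        fused.extend(features_by_stream[name])
    return fused


def linear_readout(fused, weights, bias):
    """Trainable head: one weight row and one bias per output vertex."""
    return [
        sum(w * f for w, f in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]


# Toy example: 2 features per stream -> 12 fused features -> 3 "vertices".
# In a real system the features would come from frozen pretrained models
# and only the readout weights would be learned.
features = {name: [float(i), 1.0] for i, name in enumerate(STREAMS)}
fused = fuse(features)

weights = [
    [0.0] * 12,                                        # ignores all features
    [1.0] * 12,                                        # sums all features
    [0.5 if j % 2 == 0 else 0.0 for j in range(12)],   # first feature of each stream
]
bias = [0.1, 0.0, 0.0]
vertex_pred = linear_readout(fused, weights, bias)
```

Keeping the backbones frozen means only the readout parameters are fit to fMRI data, which is one way a model can get by with a small number of scan hours.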

Read the MARY-Nano whitepaper→

04↘ The corpus

Trained on a clean corpus of real brains.

SUBJECTS

505

Across 9 public video-watching fMRI datasets.

fMRI HOURS

755+

Commercially-safe, CC0 / CC-BY licensed.

VERTICES

81,924

Cortical surface points, fsaverage6 — 4× the field's resolution.
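
The 81,924 figure follows from the mesh geometry: each fsaverage hemisphere is a recursively subdivided icosahedron with 10 · 4^n + 2 vertices at subdivision order n, and fsaverage6 is order 6. The arithmetic is sketched below. Reading "the field's resolution" as fsaverage5 is an assumption, since the page does not name the baseline.

```python
def icosphere_vertices(order):
    """Vertex count of an icosahedron subdivided `order` times."""
    return 10 * 4 ** order + 2


def fsaverage_vertices(order, hemispheres=2):
    """Total vertex count across hemispheres for an fsaverage mesh."""
    return hemispheres * icosphere_vertices(order)


fsaverage6_total = fsaverage_vertices(6)   # 2 * 40,962
fsaverage5_total = fsaverage_vertices(5)   # 2 * 10,242 (assumed baseline)
ratio = fsaverage6_total / fsaverage5_total
```

Each subdivision step roughly quadruples the vertex count, so going from order 5 to order 6 gives a ratio just under 4.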

OF TRIBE'S DATA

33%

MARY Nano 1.0 matched the Algonauts SOTA on ~23h of fMRI. The rest is runway.

Sources · all public

We train on openly licensed (CC0 / CC-BY) datasets. Through close ties to academic faculty, we also conduct our own fMRI studies.

05↘ The API · private beta

One signal.
Many surfaces.

Score how a scene lands in visual cortex. Compare two voices by predicted response. Read a user's engagement. Condition a generative model on a cortical target.

Capabilities below are in private beta · join the waitlist for access

01 · EMBED

Cortical embeddings

Turn any image, video, audio clip, or text into a single omnimodal cortical vector — a representation of how a mind processes it.

MARY · 81,924-DIM · OMNIMODAL

02 · SCORE

Region-level response

Read predicted activity across cortical networks — attention, emotion, memory, and more — each grounded in a brain region, not a proxy classifier.

YEO-7 NETWORKS · PER-SECOND
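
A region-level score of this kind can be sketched as a per-network average over predicted vertex activity, one value per network per second. The code assumes a vertex-to-network label array is available. The labels, activity values, and function names are hypothetical, and plain averaging is an assumption about how such scores might be pooled.

```python
# The seven networks of the Yeo 7-network parcellation.
YEO7 = [
    "Visual", "Somatomotor", "Dorsal Attention", "Ventral Attention",
    "Limbic", "Frontoparietal", "Default",
]


def network_means(vertex_activity, vertex_labels):
    """Mean predicted activity per network for one time step.

    vertex_labels[i] is an index into YEO7, or None for unlabeled
    vertices (for example the medial wall), which are skipped.
    """
    sums = [0.0] * len(YEO7)
    counts = [0] * len(YEO7)
    for value, label in zip(vertex_activity, vertex_labels):
        if label is None:
            continue
        sums[label] += value
        counts[label] += 1
    return {
        name: sums[i] / counts[i]
        for i, name in enumerate(YEO7)
        if counts[i] > 0
    }


def per_second_scores(activity_by_second, vertex_labels):
    """Apply network_means to each one-second frame of predicted activity."""
    return [network_means(frame, vertex_labels) for frame in activity_by_second]


# Toy example: six vertices, two seconds of predicted activity.
labels = [0, 0, 6, 6, None, 4]
activity = [
    [1.0, 3.0, 0.5, 1.5, 9.0, -2.0],
    [0.0, 0.0, 2.0, 4.0, 9.0, 1.0],
]
scores = per_second_scores(activity, labels)
```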

03 · DECODE

Cross-modal translation

Move between image, text, and audio through one shared cortical state, as shown in our research.