Give machines a sense
of human emotion.
Agents and robots are fluent in language and blind to people. BrainVI is building a cortical-embedding API — a neuroscience-grounded read on how a human mind responds.
It doesn't read you.
A robot in your home, an agent on your phone — each acts with no model of the person in front of it. Today's embeddings learn what looks alike on the internet, not how a mind responds.
The brain already solves this. Vision, sound, language, and feeling converge in one place — the cortex. That convergence is the representation machines are missing.
-
α
The cortex is natively omnimodal.
-
β
Predict the brain, don't scan it.
-
γ
Honesty over flattery.
-
δ
Standing on prior art.
rooted in neuroscience.
Our bet: an embedding grounded in the cortex beats one scraped from the web. It carries how content is processed, not just what co-occurs online.
We call these cortical embeddings. Our flagship model already matches the published state of the art in brain encoding — on a third of the training data.
Read: The Average Brain Is No Brain At All →

# Cortical embedding: one omnimodal vector per stimulus
from BrainVI import MARY

emb = MARY.embed("clip.mp4")   # → 81,924-dim cortical vector
sim = MARY.similarity(a, b)    # brain-space similarity, not pixel-space

# How alike are two stimuli in a mind, not on the internet?
# Interface shown for illustration · API in private beta
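The illustrative interface above does not say how brain-space similarity is computed. As a minimal sketch, assuming it behaves like cosine similarity between two cortical vectors, the comparison can be reproduced with NumPy alone. The vectors here are synthetic stand-ins, and `cosine_similarity` is a hypothetical helper, not part of the BrainVI API.

```python
import numpy as np

N_VERTICES = 81_924  # vertex count quoted in the text (fsaverage6 surface)


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (assumed metric, see lead-in)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Synthetic stand-ins for cortical embeddings; real vectors would come from the model.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(N_VERTICES)
emb_b = emb_a + 0.5 * rng.standard_normal(N_VERTICES)  # noisy copy of emb_a
emb_c = rng.standard_normal(N_VERTICES)                # unrelated vector

sim_ab = cosine_similarity(emb_a, emb_b)  # related pair: high similarity
sim_ac = cosine_similarity(emb_a, emb_c)  # unrelated pair: near zero
print(f"related: {sim_ab:.3f}  unrelated: {sim_ac:.3f}")
```

The related pair should score well above the unrelated one. Whether the production metric is cosine, correlation, or something learned is not stated in the source.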
cortical encoder.
MARY maps any stimulus to predicted cortical activity at the 81,924 vertices of the fsaverage6 surface. It uses six backbones that mimic the brain's pathways, one stream per sense.
MARY Nano 1.0, our research-preview model, matches the Algonauts 2025 state of the art — Pearson r = 0.216 vs Meta TRIBE's 0.2146 (Schaefer-1000) — on just ~23h of fMRI, a third of TRIBE's data. On held-out films it generalizes at r = 0.170, rising to 0.185 with MARY Nano 1.1.
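For readers unfamiliar with the metric, the sketch below shows one common way a brain-encoding score of this kind is computed: Pearson r between predicted and measured time series for each parcel, then averaged across parcels. The exact evaluation protocol behind the figures above is not given here, so the averaging scheme, the array shapes, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

N_TIMEPOINTS = 200  # hypothetical number of fMRI samples
N_PARCELS = 1000    # Schaefer-1000 parcellation has 1000 parcels


def pearson_per_parcel(pred, true):
    """Pearson r per column for arrays of shape (n_timepoints, n_parcels)."""
    pred = np.asarray(pred, dtype=np.float64)
    true = np.asarray(true, dtype=np.float64)
    if pred.shape != true.shape:
        raise ValueError("pred and true must have the same shape")
    pc = pred - pred.mean(axis=0)  # center each parcel's time series
    tc = true - true.mean(axis=0)
    num = (pc * tc).sum(axis=0)
    den = np.sqrt((pc ** 2).sum(axis=0) * (tc ** 2).sum(axis=0))
    # Parcels with zero variance get r = 0 instead of NaN.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


# Synthetic data: predictions weakly correlated with the "measured" signal.
rng = np.random.default_rng(0)
true = rng.standard_normal((N_TIMEPOINTS, N_PARCELS))
pred = 0.2 * true + rng.standard_normal((N_TIMEPOINTS, N_PARCELS))

r = pearson_per_parcel(pred, true)
score = float(r.mean())  # single summary number, averaged over parcels
print(f"mean Pearson r over {N_PARCELS} parcels: {score:.3f}")
```

Correlations in this range are typical for naturalistic fMRI encoding because much of the measured signal is noise unrelated to the stimulus.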
We train on openly licensed (CC0 / CC-BY) datasets. With close ties to academic faculty, we conduct our own fMRI studies.
Many surfaces.
Score how a scene lands in visual cortex. Compare two voices by predicted response. Read a user's engagement. Condition a generative model on a cortical target.
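Scoring how a stimulus lands in one part of cortex comes down to averaging predicted activity over the vertices that belong to that region or network. The sketch below is a minimal illustration under stated assumptions: the label array is random rather than a real atlas, the seven-way split only mirrors the Yeo-7 scheme in count, and `network_means` is a hypothetical helper, not a BrainVI function.

```python
import numpy as np

N_VERTICES = 81_924  # vertex count quoted in the text
N_NETWORKS = 7       # number of networks in the Yeo-7 scheme


def network_means(activity, labels, n_networks):
    """Mean activity per network.

    activity: (n_vertices,) floats, predicted response at each vertex.
    labels:   (n_vertices,) ints in [0, n_networks), network id per vertex.
    """
    activity = np.asarray(activity, dtype=np.float64)
    labels = np.asarray(labels)
    sums = np.bincount(labels, weights=activity, minlength=n_networks)
    counts = np.bincount(labels, minlength=n_networks)
    # Networks with no vertices are reported as NaN.
    return np.divide(sums, counts, out=np.full(n_networks, np.nan), where=counts > 0)


# Synthetic example: random labels (NOT a real atlas) and a boosted network 0.
rng = np.random.default_rng(0)
labels = rng.integers(0, N_NETWORKS, size=N_VERTICES)
activity = rng.standard_normal(N_VERTICES)
activity[labels == 0] += 1.0  # pretend network 0 responds strongly

means = network_means(activity, labels, N_NETWORKS)
top = int(np.nanargmax(means))  # index of the most responsive network
print("per-network mean activity:", np.round(means, 3), "top:", top)
```

A real atlas also leaves some vertices unlabeled (for example the medial wall), so those would need to be masked out before averaging. Applying the same function to each time step would give a per-second read-out.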
Turn any image, video, audio clip, or text into a single omnimodal cortical vector — a representation of how a mind processes it.
MARY · 81,924-DIM · OMNIMODAL
Read predicted activity across cortical networks (attention, emotion, memory, and more), each grounded in a brain region, not a proxy classifier.
YEO-7 NETWORKS · PER-SECOND
Move between modalities through one cortical state: image, text, and audio. Shown in our research.
RESEARCH PREVIEW
Drop-in multimodal memory for agents, indexed by cortical response: what a user saw, and how it landed.
AGENT INTEGRATION
-
2026 · The Average Brain Is No Brain At All: A Zero-Shot Evaluation of TRIBE v2 on Out-of-Distribution Naturalistic Video
-
2026 · MARY-Nano: A Six-Stream Multimodal Brain Encoder for In-Silico Neural Prediction
-
2026 · Cross-Modal Neural Translation via Synthetic fMRI
-
2026 · Retinotopic Decoding from Synthetic fMRI: Mapping Predicted Cortical Activity to Visual Field Images
for generative AI.
Cortical embeddings as a conditioning signal for generative models.