Tim Kellogg @timkellogg.me

i haven't fully dug in yet, i'm imagining that they're doing a fair bit of model merging of MoE experts

jul 9, 2025, 3:11 pm • 0 0
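For concreteness, a minimal sketch of what "merging MoE experts" could look like, assuming two checkpoints with identically shaped expert tensors and a simple linear blend; the key names and the blend strategy are placeholders for illustration, not a claim about what they actually do.

```python
# Minimal sketch: merging the expert FFNs of two MoE checkpoints by
# linear interpolation. Assumes both checkpoints share the same
# architecture and expert tensor shapes; the "experts" substring used to
# pick out expert weights is a placeholder, not a real model's key naming.
import torch

def merge_expert_weights(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Return a new state dict where expert tensors are alpha-blended."""
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        if "experts" in name:  # blend only the expert FFN weights
            merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        else:  # keep shared layers (attention, router, embeddings) from model A
            merged[name] = tensor_a.clone()
    return merged

# usage (hypothetical checkpoint paths):
# state_a = torch.load("moe_domain_a.pt", map_location="cpu")
# state_b = torch.load("moe_domain_b.pt", map_location="cpu")
# merged = merge_expert_weights(state_a, state_b, alpha=0.5)
```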

Replies

Caracter @caracter.bsky.social

This seems limited - who has enough data to train a MoE? Even if you merge, I assume there is a limit before you lose quality?

jul 9, 2025, 3:28 pm • 1 0
Ruo Shui @ruoshuiresearch.bsky.social

if each model is effectively a black-box container, it ends up being a self-contained (mini) network API. the weighting, if any outputs conflict, would be interesting (probably some level of "trust" measure)

jul 9, 2025, 3:15 pm • 1 0
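A toy sketch of the kind of "trust"-weighted conflict resolution suggested here, assuming each black-box model returns an answer and carries a scalar trust score; the model names and the scores are invented for illustration.

```python
# Toy sketch of trust-weighted conflict resolution across black-box models.
# The trust scores and model names are made up for illustration only.
from collections import defaultdict

def resolve(outputs: dict[str, str], trust: dict[str, float]) -> str:
    """Pick the answer with the highest total trust across models."""
    votes: dict[str, float] = defaultdict(float)
    for model, answer in outputs.items():
        votes[answer] += trust.get(model, 0.1)  # small default trust for unknown models
    return max(votes, key=votes.get)

outputs = {"labs_agent": "stop the drug", "meds_agent": "stop the drug", "policy_agent": "continue"}
trust = {"labs_agent": 0.6, "meds_agent": 0.8, "policy_agent": 0.4}
print(resolve(outputs, trust))  # -> "stop the drug" (1.4 vs 0.4)
```
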
Tim Kellogg @timkellogg.me

how is it “integrated” though? select the best experts and merge them?

jul 9, 2025, 3:16 pm • 0 0
Ruo Shui @ruoshuiresearch.bsky.social

One example is probably something like the architecture of Polaris, which consists of multiple domain-specific agents supporting a primary agent trained for nurse-like conversations:

The primary agent handles natural language processing and patient speech recognition, with a digital human avatar face communicating with the patient. The support agents, ranging from 50 to 100 billion parameters, provide a knowledge resource for labs, medications, nutrition, electronic health records, checklists, privacy and compliance, hospital and payor policy, and the need to bring in a human-in-the-loop.

jul 9, 2025, 3:21 pm • 0 0
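A rough sketch of how a Polaris-style primary agent might consult domain-specific support agents before replying; the agent names, the call_model hook, and the ESCALATE convention are assumptions for illustration, not the actual Polaris implementation.

```python
# Rough sketch of a constellation-style setup: a primary conversational
# agent consults domain-specific support agents and folds their findings
# into its reply. `call_model` and the agent names are placeholders.
from typing import Callable

SUPPORT_AGENTS = ["labs", "medications", "nutrition", "ehr", "privacy", "policy"]

def respond(patient_utterance: str, call_model: Callable[[str, str], str]) -> str:
    # 1. each support agent reviews the utterance for issues in its own domain
    findings = {
        name: call_model(name, f"Review for {name} issues: {patient_utterance}")
        for name in SUPPORT_AGENTS
    }
    # 2. escalate to a human if any support agent flags a concern
    if any("ESCALATE" in f for f in findings.values()):
        return "Let me bring in one of our nurses to help with that."
    # 3. otherwise the primary (nurse-conversation) agent composes the reply,
    #    grounded in the support agents' findings
    context = "\n".join(f"{k}: {v}" for k, v in findings.items())
    return call_model("primary", f"Patient said: {patient_utterance}\nFindings:\n{context}")
```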