i haven't fully dug in yet, i'm imagining that they're doing a fair bit of model merging of MoE experts
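To make "merging MoE experts" a bit more concrete, here's a rough sketch of one way that could work: upcycle several fine-tuned dense checkpoints into a single MoE layer by averaging the shared weights and keeping each checkpoint's FFN as its own expert. The checkpoint structure and the `"attn."`/`"ffn."` key prefixes are assumptions for illustration, not anything confirmed about their setup.

```python
import numpy as np

def merge_into_moe(checkpoints: list[dict[str, np.ndarray]]):
    """Upcycle dense checkpoints into (shared weights, expert bank)."""
    # each checkpoint's FFN weights become one expert
    experts = [{k: v for k, v in ckpt.items() if k.startswith("ffn.")}
               for ckpt in checkpoints]
    # simple uniform average of the remaining (shared) weights, model-soup style
    shared_keys = [k for k in checkpoints[0] if not k.startswith("ffn.")]
    shared = {k: np.mean([ckpt[k] for ckpt in checkpoints], axis=0)
              for k in shared_keys}
    return shared, experts  # experts would then sit behind a (newly trained) router

# e.g. merge_into_moe([math_ckpt, code_ckpt, chat_ckpt])
```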
This seems limited - who has enough data to train a MoE? And even if you merge, I assume there's a limit before you start losing quality?
if each model is effectively a black-box container, it ends up being a self-contained (mini) network API. how outputs get weighted when they conflict would be interesting (probably some level of "trust" measure)
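For the black-box-behind-an-API framing, a minimal sketch of what trust-weighted conflict resolution might look like; the endpoints, response shape, and trust values are all made up:

```python
from collections import defaultdict
import requests

ENDPOINTS = {
    "model_a": "http://model-a.internal/generate",  # hypothetical URLs
    "model_b": "http://model-b.internal/generate",
}
TRUST = {"model_a": 0.7, "model_b": 0.3}  # hypothetical per-model "trust" measure

def query_all(prompt: str) -> str:
    votes = defaultdict(float)
    for name, url in ENDPOINTS.items():
        # each model is just an opaque endpoint; assumes a {"text": ...} response
        answer = requests.post(url, json={"prompt": prompt}, timeout=30).json()["text"]
        votes[answer] += TRUST[name]  # conflicting answers accumulate trust-weighted votes
    return max(votes, key=votes.get)  # highest-trust answer wins
```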
how is it “integrated” though? select the best experts and merge them?
One example is probably something like the Polaris architecture, where multiple domain-specific agents support a primary agent trained for nurse-like conversations
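Roughly, that pattern might look like the sketch below: a primary agent drafts the reply and domain-specific support agents check or enrich it before it goes out. This is only an illustration of the pattern, not Polaris's actual components; the agent names and the `llm(...)` helper are placeholders.

```python
def llm(system: str, user: str) -> str:
    raise NotImplementedError("call whatever model backend you use here")

# hypothetical domain-specific support agents
SUPPORT_AGENTS = {
    "medication": "You check the draft reply for medication-safety issues.",
    "labs":       "You check the draft reply for lab-value interpretation errors.",
}

def respond(patient_msg: str) -> str:
    # primary agent drafts a nurse-like reply
    draft = llm("You are a nurse-like conversational agent.", patient_msg)
    # each specialist reviews the draft, and the primary agent revises accordingly
    for name, system_prompt in SUPPORT_AGENTS.items():
        note = llm(system_prompt, f"Patient: {patient_msg}\nDraft: {draft}")
        draft = llm("Revise the draft using the specialist note.",
                    f"Draft: {draft}\nNote from {name} agent: {note}")
    return draft
```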