Yumi is compatible with three major Large Language Model providers. You can choose which model acts as her "Mind" depending on your speed, cost, and intelligence preferences.
1. Groq (Llama-3.3-70b-versatile)
- The Vibe: Ultra-Fast, Conversational, Efficient.
- Speed: Sub-300ms time-to-first-token. It feels incredibly responsive, almost like speaking to a real person.
- Price: Free tier available on console.groq.com.
- Best For: Everyday standard conversations and fast question-answering.
2. OpenAI (GPT-4o)
- The Vibe: Highly Intelligent, Logical, Structured.
- Speed: Moderate (~800ms - 1.2s response time).
- Price: Paid API rates (platform.openai.com).
- Best For: Accurate complex reasoning, advanced task execution, and precise tool calling.
3. Anthropic (Claude-3.5-Sonnet)
- The Vibe: Deeply Emotional, Nuanced, Creative.
- Speed: Moderate (~1s - 1.5s response time).
- Price: Paid API rates (console.anthropic.com).
- Best For: Rich, immersive roleplay. Claude excels at maintaining subtle subtext, matching custom personalities, and simulating genuine empathy.
Model Initialization & Caching (llm.py)
To optimize system memory and speed, Yumi implements lazy instantiation and caching of the LLM agents inside src/yumi/agent/llm.py:
- When you switch personalities (e.g. from Caring to Tsundere), Yumi builds the prompt and compiles the LangChain agent for that personality.
- She caches this instance inside _agent_cache.
- Subsequent responses or personality switches back to previously loaded personas are instantaneous, requiring zero compile overhead!
Proceed to Structured Outputs to see how the LLM controls the Live2D body animations!