Thinking (LLM) Providers

Yumi is compatible with three major Large Language Model providers. You can choose which model acts as her "Mind" depending on how you prioritize speed, cost, and reasoning quality.


1. Groq (Llama-3.3-70b-versatile)

  • The Vibe: Ultra-Fast, Conversational, Efficient.
  • Speed: Sub-300ms time-to-first-token. It feels incredibly responsive, almost like speaking to a real person.
  • Price: Free tier available on console.groq.com.
  • Best For: Everyday standard conversations and fast question-answering.

2. OpenAI (GPT-4o)

  • The Vibe: Highly Intelligent, Logical, Structured.
  • Speed: Moderate (~800ms - 1.2s response time).
  • Price: Paid API rates (platform.openai.com).
  • Best For: Accurate complex reasoning, advanced task execution, and precise tool calling.

3. Anthropic (Claude-3.5-Sonnet)

  • The Vibe: Deeply Emotional, Nuanced, Creative.
  • Speed: Moderate (~1s - 1.5s response time).
  • Price: Paid API rates (console.anthropic.com).
  • Best For: Rich, immersive roleplay. Claude excels at maintaining subtle subtext, matching custom personalities, and simulating genuine empathy.
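
The three providers above can be summarized as a small lookup table. The following is only a sketch to make the comparison concrete: ProviderSpec, PROVIDERS, resolve_provider, and has_api_key are hypothetical names, not part of Yumi's code, and the environment variable names and the exact Anthropic model string are assumptions that should be checked against each provider's documentation.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSpec:
    """Hypothetical description of one LLM provider option."""
    model: str        # model identifier sent to the provider
    api_key_env: str  # environment variable assumed to hold the API key
    best_for: str     # short summary taken from the list above


# Assumed mapping; the Anthropic model string in particular is a guess.
PROVIDERS = {
    "groq": ProviderSpec(
        "llama-3.3-70b-versatile", "GROQ_API_KEY",
        "everyday conversation and fast question-answering"),
    "openai": ProviderSpec(
        "gpt-4o", "OPENAI_API_KEY",
        "complex reasoning and precise tool calling"),
    "anthropic": ProviderSpec(
        "claude-3-5-sonnet-latest", "ANTHROPIC_API_KEY",
        "rich, immersive roleplay"),
}


def resolve_provider(name: str) -> ProviderSpec:
    """Look up a provider by name, ignoring case and surrounding spaces."""
    key = name.strip().lower()
    if key not in PROVIDERS:
        raise ValueError(
            f"Unknown provider {name!r}; expected one of {sorted(PROVIDERS)}")
    return PROVIDERS[key]


def has_api_key(spec: ProviderSpec) -> bool:
    """Return True if the assumed environment variable is set and non-empty."""
    return bool(os.environ.get(spec.api_key_env))
```

Calling resolve_provider("Groq") returns the Groq entry, and has_api_key can be used to fail early with a clear message before any request is made.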

Model Initialization & Caching (llm.py)

To optimize system memory and speed, Yumi implements lazy instantiation and caching of the LLM agents inside src/yumi/agent/llm.py:

  • When you switch personalities (e.g. from Caring to Tsundere), Yumi builds the prompt and compiles the LangChain agent for that personality.
  • She caches this instance inside _agent_cache.
  • Subsequent responses, and switches back to a previously loaded persona, reuse the cached agent, so they skip the prompt-build and compile step entirely.

Proceed to Structured Outputs to see how the LLM controls the Live2D body animations!