Showcase & Demos

Yumi comes to life by combining multiple real-time pipelines. Below are core demonstrations of how she interacts, speaks, and switches personas on the fly.

Real-Time Conversation

Unlike standard chatbots that wait for a button press, Yumi listens to a continuous audio stream. As soon as you finish speaking, her Silero VAD (voice activity detection) detects the end of your utterance and triggers transcription, and she begins thinking.

User: "Hey Yumi, what time is it?"
Yumi: *Nods head, tilts slightly* "It's exactly 3:15 PM! Hope you are having a productive afternoon!"
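
The end-of-speech trigger described above can be sketched as a small state machine over per-frame speech probabilities, which is the kind of output a VAD model such as Silero produces. This is a minimal illustration, not Yumi's actual implementation: the `EndpointDetector` class, the 0.5 threshold, and the three-frame silence window are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class EndpointDetector:
    """Hypothetical helper: turns per-frame speech probabilities into utterance spans."""

    threshold: float = 0.5       # assumed cutoff above which a frame counts as speech
    min_silence_frames: int = 3  # assumed trailing silence that closes an utterance

    def segments(self, probs: Iterable[float]) -> List[Tuple[int, int]]:
        """Return (start, end) frame indices per utterance; end is exclusive."""
        spans: List[Tuple[int, int]] = []
        start: Optional[int] = None  # first speech frame of the open utterance
        last_speech = -1             # most recent frame classified as speech
        for i, p in enumerate(probs):
            if p >= self.threshold:
                if start is None:
                    start = i
                last_speech = i
            elif start is not None and i - last_speech >= self.min_silence_frames:
                # Enough silence has passed: the user has finished speaking.
                # This is the point where transcription would be triggered.
                spans.append((start, last_speech + 1))
                start = None
        if start is not None:
            # The stream ended while the user was still speaking.
            spans.append((start, last_speech + 1))
        return spans


if __name__ == "__main__":
    frames = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7]
    print(EndpointDetector().segments(frames))
```

The short dip at frame 3 does not end the utterance because it is shorter than the silence window. A longer window tolerates pauses mid-sentence at the cost of a slower response.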

Personality Hot-Swapping

Yumi supports six core personalities, and you can switch between them simply by asking her to change. Three of them are shown below. Watch how her language, expressions, and posture shift immediately:

1. Caring (Default)

Vibe: Empathetic, warm, encouraging.
Visuals: Bright smiles, gentle head tilts.
Prompt style: Supportive guidance.

2. Tsundere

Vibe: Teasing, initially cold but secretly affectionate.
Visuals: Pouting, crossing arms, slight blush.
Example: "H-huh? It's not like I wanted to talk to you or anything! ...but I guess I can help you out."

3. Kuudere

Vibe: Quiet, calm, rational, intellectual.
Visuals: Still, direct eye contact, minimal movement.
Example: "Understood. Please present your query, and I will analyze it optimally."
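
One way to picture hot-swapping is a registry of persona definitions plus a switcher that changes the active entry when the user names a persona. The sketch below is an assumption-laden illustration: `Persona`, `PersonaSwitcher`, and the keyword-matching rule are hypothetical, and only the three personas described above are included. How Yumi actually detects a switch request is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Persona:
    name: str
    vibe: str
    visuals: str


# Persona data copied from the descriptions above; the registry itself is hypothetical.
PERSONAS: Dict[str, Persona] = {
    "caring": Persona("Caring", "Empathetic, warm, encouraging.",
                      "Bright smiles, gentle head tilts."),
    "tsundere": Persona("Tsundere", "Teasing, initially cold but secretly affectionate.",
                        "Pouting, crossing arms, slight blush."),
    "kuudere": Persona("Kuudere", "Quiet, calm, rational, intellectual.",
                       "Still, direct eye contact, minimal movement."),
}


class PersonaSwitcher:
    """Hypothetical switcher: tracks the active persona and swaps it on request."""

    def __init__(self, default: str = "caring") -> None:
        self.active = PERSONAS[default]

    def handle_utterance(self, text: str) -> Optional[Persona]:
        """Swap if the utterance names a different persona; return it, else None."""
        lowered = text.lower()
        for key, persona in PERSONAS.items():
            # Naive keyword match, used here only to keep the example small.
            if key in lowered and persona is not self.active:
                self.active = persona
                return persona
        return None

    def system_prompt(self) -> str:
        """Build the prompt fragment that would steer language and expressions."""
        p = self.active
        return f"Personality: {p.name}. Vibe: {p.vibe} Visuals: {p.visuals}"


if __name__ == "__main__":
    switcher = PersonaSwitcher()
    switcher.handle_utterance("Can you be tsundere for a bit?")
    print(switcher.system_prompt())
```

Because the prompt is rebuilt from the active persona on every turn, a swap takes effect on the very next reply without restarting anything.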

Barge-In (Speech Interruption)

One of Yumi's most realistic features is barge-in (modeled after advanced voice systems like Gemini Live).

1. While Yumi is speaking out loud, you can start talking.
2. The local Silero VAD instantly detects speech onset.
3. An audio_start/interrupt socket message is fired.
4. The WebUI instantly cuts the playing audio buffer, Yumi closes her mouth, and she transitions back into her listening state.
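
The barge-in sequence above can be sketched as a two-state controller. This is an illustrative model only: `BargeInController`, the `State` enum, and the `{"type": "interrupt"}` payload are assumptions, and the real socket message format is not given in this guide. The `send` callback stands in for whatever socket client the application uses.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque, Iterable


class State(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class BargeInController:
    """Hypothetical controller modelling the barge-in steps listed above."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self.state = State.LISTENING
        self.playback: Deque[bytes] = deque()  # queued audio chunks awaiting playback
        self.mouth_open = False
        self._send = send                      # stand-in for the socket send function

    def start_speaking(self, chunks: Iterable[bytes]) -> None:
        """Queue synthesized audio and enter the speaking state."""
        self.playback.extend(chunks)
        self.mouth_open = True
        self.state = State.SPEAKING

    def on_speech_onset(self) -> bool:
        """Called when the VAD detects the user talking; True if it interrupted."""
        if self.state is not State.SPEAKING:
            return False  # nothing is playing, so there is nothing to interrupt
        self._send({"type": "interrupt"})  # placeholder payload, not the real schema
        self.playback.clear()              # cut the remaining audio immediately
        self.mouth_open = False
        self.state = State.LISTENING
        return True


if __name__ == "__main__":
    sent = []
    controller = BargeInController(sent.append)
    controller.start_speaking([b"chunk-1", b"chunk-2"])
    controller.on_speech_onset()
    print(controller.state, sent)
```

Guarding on the speaking state keeps ordinary speech onsets, which happen at the start of every user turn, from firing spurious interrupt messages.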

This makes conversations feel like a dialogue rather than a series of one-way monologues. To experience this yourself, proceed to the "Your First Conversation" guide.