Key Features

Yumi is engineered with an emphasis on local performance, visual fluidity, and cryptographic security.

🎙️ Real-time Voice Capture (VAD & STT)

Zero-Config Microphone Streams: Audio is captured through the browser's standard HTML5 media devices API and streamed over WebSockets.
Silero Neural Speech Detection: Runs locally on the CPU and separates speech from background noise such as keyboard clicks and room echo.
Ultra-Fast Whisper Inference: Choose local Faster-Whisper (quantized to int8 for fast CPU processing) or cloud-based Groq Whisper for transcription latencies of around 150 ms.

🧠 State-of-the-Art Brain (LangGraph)

Structured Output Control: Uses Pydantic schemas to strictly constrain the LLM's output format, preventing malformed responses.
Tool Integration: LangChain tools let Yumi call local system tools, fetch the weather, read the current date, and coordinate function calls.
Hot-Swappable Persona Matrix: Change Yumi's behavior instantly, even mid-sentence, by switching among six custom personalities.

🗣️ Lifelike Lip Sync & Expressive Visuals (Live2D)

Sub-Second Streaming Audio: ElevenLabs or CAMB.ai streaming audio chunks are pushed over WebSockets directly to the web client.
Real-time Waveform RMS Lip Sync: Computes the RMS amplitude of each playing audio buffer on the fly and uses it to open and close Yumi's mouth in sync with the voice.
Fluid Visuals: Powered by PixiJS 6 and the Cubism SDK for high-performance, GPU-accelerated rendering inside the browser.
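The RMS lip-sync idea above can be illustrated with a small sketch. It is written in Python for readability; in Yumi the equivalent arithmetic presumably runs in the browser client. The function names, the 16-bit mono PCM input format, and the `noise_floor`, `full_open`, and `factor` values are assumptions chosen for illustration, not values taken from Yumi's code.

```python
import math
import struct
from typing import List, Sequence


def pcm16_to_floats(chunk: bytes) -> List[float]:
    """Decode little-endian signed 16-bit mono PCM into floats in [-1.0, 1.0)."""
    count = len(chunk) // 2
    ints = struct.unpack(f"<{count}h", chunk[: count * 2])
    return [value / 32768.0 for value in ints]


def rms_amplitude(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio buffer (0.0 for an empty buffer)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mouth_open(rms: float, noise_floor: float = 0.02, full_open: float = 0.3) -> float:
    """Map an RMS value to a 0..1 mouth-open amount.

    Anything at or below noise_floor keeps the mouth closed; anything at or
    above full_open opens it fully. Both thresholds are illustrative guesses.
    """
    if full_open <= noise_floor:
        raise ValueError("full_open must be greater than noise_floor")
    scaled = (rms - noise_floor) / (full_open - noise_floor)
    return max(0.0, min(1.0, scaled))


def smooth(previous: float, target: float, factor: float = 0.5) -> float:
    """Move part of the way from the previous value toward the target to reduce jitter."""
    return previous + (target - previous) * factor
```

For each audio buffer, the client would decode the samples, compute `rms_amplitude`, pass the result through `mouth_open`, and optionally `smooth` it against the previous frame before writing it to the model's mouth-open parameter (commonly `ParamMouthOpenY` in Cubism models).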

🔐 Hardware-Encrypted Security (OS Keychain)

Zero Plaintext Keys on Disk: Unlike common setups that write API keys to .env or JSON configuration files, Yumi stores them through the keyring package.
OS-Level Vault Storage: Keys are saved in the operating system's own credential store:
- Windows: Windows Credential Manager.
- macOS: macOS Keychain.
- Linux: GNOME Keyring / KWallet via libsecret.
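As a rough sketch of how keys can be kept in the OS vault rather than in a file, the snippet below wraps the keyring package's `set_password` and `get_password` functions. The service name `"yumi"`, the helper names, and the optional `backend` argument are assumptions made for this example and may not match Yumi's actual code.

```python
from typing import Any, Optional

# Hypothetical service name under which entries appear in the OS vault.
SERVICE_NAME = "yumi"


def _default_backend() -> Any:
    """Import keyring lazily so the helpers can be exercised with a stand-in backend."""
    import keyring  # third-party: pip install keyring

    return keyring


def store_api_key(provider: str, api_key: str, backend: Optional[Any] = None) -> None:
    """Save an API key in the OS credential store, keyed by provider name."""
    if backend is None:
        backend = _default_backend()
    backend.set_password(SERVICE_NAME, provider, api_key)


def load_api_key(provider: str, backend: Optional[Any] = None) -> Optional[str]:
    """Return the stored key for a provider, or None if nothing has been saved."""
    if backend is None:
        backend = _default_backend()
    return backend.get_password(SERVICE_NAME, provider)
```

With keyring installed, `store_api_key("groq", "<your key>")` followed by `load_api_key("groq")` round-trips the value through whichever vault the platform provides. The `backend` argument exists only so the helpers can be tested without touching a real keychain.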

Ready to get started? Head directly to Attuning Senses to set up your APIs!