Gaze to Voice: Real-Time Storytelling

C# | Unity Engine | Gen AI

Playthrough Video

Tools

Engine: Unity (C#)
Gen AI: OpenAI GPT, ElevenLabs
Interaction: Gaze Tracking (Meta Quest)
Languages Supported: English and Korean (switchable via palm UI)
Platform: Meta Quest

Project Overview

In this prototype, I explored how AI-driven narratives can create personalised, emotional experiences in XR. The experience centres on a boy’s memories of a lost relationship. Simply by looking at certain objects in the scene, users trigger an emotional moment that unfolds in real time.

The generated narration adapts to the user's attention and emotional cues, making the story immersive and reflective. The system supports both Korean and English, and users can switch language at any time using a virtual button on their palm.

System Structure

The prototype generates its story in real time in response to user attention, passing through the following stages:

👁️ Gaze ➔ 📜 Prompt ➔ 🧠 GPT ➔ 🔊 Voice ➔ 🎧 Narration

When a user gazes at a meaningful object, the system builds a customised pre-prompt from that object’s emotional context. GPT then generates personalised narrative text from this pre-prompt, and ElevenLabs converts the text into natural-sounding voice narration on the fly.
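As a rough illustration of the pre-prompt step, the sketch below assembles a prompt from a gazed-at object and the currently selected language. It is plain C# with no Unity or OpenAI dependency, and every name in it (GazeTarget, NarrationLanguage, PrePromptBuilder, the prompt wording itself) is a hypothetical stand-in rather than the code used in the prototype.

```csharp
using System;

// Hypothetical types for illustration only; not taken from the project source.
public enum NarrationLanguage { English, Korean }

public sealed class GazeTarget
{
    public string Name { get; }
    public string EmotionalContext { get; }

    public GazeTarget(string name, string emotionalContext)
    {
        Name = name;
        EmotionalContext = emotionalContext;
    }
}

public static class PrePromptBuilder
{
    // Builds the text that would be sent to the language model
    // for a single gazed-at object. The wording is an assumption.
    public static string Build(GazeTarget target, NarrationLanguage language)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));

        string languageName =
            language == NarrationLanguage.Korean ? "Korean" : "English";

        return string.Join("\n", new[]
        {
            "You are narrating a boy's memories of a lost relationship.",
            $"The user is looking at: {target.Name}.",
            $"Emotional context of this object: {target.EmotionalContext}",
            $"Write two or three short sentences of first-person narration in {languageName}."
        });
    }
}
```

A call such as PrePromptBuilder.Build(new GazeTarget("music box", "a gift she left behind"), NarrationLanguage.Korean) would yield a prompt string that could then be passed to the text-generation request. Keeping the language as a parameter means the palm toggle only has to change one value for the next narration to switch language.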

Key Features

Gaze-Based Interaction: Look at objects to trigger dynamic narratives
Real-Time AI Narration: Text generation via GPT, voice-over via ElevenLabs
Dynamic Language Switching: Toggle between Korean and English via palm UI
Emotional Story Setting: A boy’s memories of longing for a lost girl, shaped by user attention
Exploration of AI-Driven XR Storytelling: Contextual, user-adaptive, emotional
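One common way to turn gaze into a trigger is a dwell timer, which fires only after the user has looked at the same object for a set duration. The write-up does not say whether the prototype uses a dwell threshold, so the sketch below is an assumption: GazeDwellTrigger and its parameters are hypothetical names, and the class is plain C# that a Unity script could call once per frame with the id of the object under the gaze ray and the frame's delta time.

```csharp
using System;

// Hypothetical helper for illustration; the dwell-time approach is an assumption.
public sealed class GazeDwellTrigger
{
    private readonly float _dwellSeconds;
    private string _currentTargetId;
    private float _elapsed;
    private bool _fired;

    public GazeDwellTrigger(float dwellSeconds)
    {
        if (dwellSeconds <= 0f)
            throw new ArgumentOutOfRangeException(nameof(dwellSeconds));
        _dwellSeconds = dwellSeconds;
    }

    // Call once per frame with the id of the gazed-at object (or null).
    // Returns the id on the frame the dwell completes, otherwise null.
    public string Update(string gazedTargetId, float deltaTime)
    {
        // Looking at a different object (or at nothing) restarts the timer.
        if (gazedTargetId != _currentTargetId)
        {
            _currentTargetId = gazedTargetId;
            _elapsed = 0f;
            _fired = false;
        }

        // Nothing under the gaze, or this object has already triggered.
        if (gazedTargetId == null || _fired) return null;

        _elapsed += deltaTime;
        if (_elapsed < _dwellSeconds) return null;

        _fired = true;
        return gazedTargetId;
    }
}
```

With this design, each object triggers at most once per continuous look, and looking away resets it, which avoids narration starting from a passing glance or restarting while the user keeps looking.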