한국어 | English

Gaze to Voice: Real-Time Storytelling

C# | Unity Engine | Gen AI
Playthrough Video



Tools

Engine: Unity (C#)
Gen AI: OpenAI GPT, ElevenLabs
Interaction: Gaze Tracking (Meta Quest)
Languages Supported: English and Korean (switchable via palm UI)
Platform: Meta Quest

Project Overview

In this prototype, I explored how AI-driven narratives can create personalised, emotional experiences in XR. The experience centres on a boy's memories of a lost relationship. By simply looking at certain objects in the scene, users can trigger an emotional moment that unfolds in real time.

The generated narration adapts to the user's attention and emotional cues, allowing for a deeply immersive and reflective storytelling experience. The system supports both Korean and English, and users can toggle the language dynamically using a virtual button on their palm.
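The palm-button language toggle could be wired up along these lines. This is a minimal sketch, not the project's actual code: the class, enum, and method names are illustrative, assuming a standard Unity UI button event hooked to an on-palm canvas.

```csharp
using UnityEngine;

// Illustrative sketch: flips the narration language when the palm UI
// button is pressed. Names here are hypothetical, not the project's API.
public class LanguageToggle : MonoBehaviour
{
    public enum NarrationLanguage { English, Korean }

    // Read by the narration system when it builds prompts and requests TTS.
    public NarrationLanguage Current { get; private set; } = NarrationLanguage.English;

    // Assigned to the palm button's OnClick event in the Inspector.
    public void OnPalmButtonPressed()
    {
        Current = Current == NarrationLanguage.English
            ? NarrationLanguage.Korean
            : NarrationLanguage.English;
        Debug.Log($"Narration language set to {Current}");
    }
}
```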

System Structure

The prototype runs a real-time, AI-driven storytelling pipeline that responds to user attention:

👁️ Gaze ➔ 📜 Prompt ➔ 🧠 GPT ➔ 🔊 Voice ➔ 🎧 Narration

When a user gazes at a meaningful object, the system builds a customised pre-prompt from that object's emotional context. GPT then generates personalised narrative text, and ElevenLabs converts it into natural-sounding voice narration on the fly.
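The pipeline above could be sketched roughly as follows. This is a hedged sketch, not the project's actual implementation: the interfaces `IGptClient` and `ITtsClient`, the prompt template, and all member names are illustrative stand-ins for whatever wrappers the project uses around the OpenAI and ElevenLabs APIs.

```csharp
using System.Threading.Tasks;
using UnityEngine;

// Hypothetical wrappers around the OpenAI and ElevenLabs APIs.
public interface IGptClient { Task<string> CompleteAsync(string prompt); }
public interface ITtsClient { Task<AudioClip> SynthesizeAsync(string text, string language); }

public class GazeNarrationController : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;
    [SerializeField] private float gazeHoldSeconds = 1.5f; // dwell time before triggering

    private IGptClient gptClient;          // wraps GPT text generation
    private ITtsClient ttsClient;          // wraps ElevenLabs synthesis
    private string currentLanguage = "en"; // "en" or "ko", switched via the palm UI

    // Called once the user's gaze has rested on an object for gazeHoldSeconds.
    private async Task NarrateAsync(string emotionalContext)
    {
        // 1. Gaze -> Prompt: build a pre-prompt from the object's emotional context.
        string prePrompt =
            $"Narrate, in {currentLanguage}, a brief memory tied to: {emotionalContext}";

        // 2. Prompt -> GPT: generate a short personalised narration line.
        string narration = await gptClient.CompleteAsync(prePrompt);

        // 3. GPT -> Voice -> Narration: synthesize the line and play it.
        AudioClip clip = await ttsClient.SynthesizeAsync(narration, currentLanguage);
        audioSource.PlayOneShot(clip);
    }
}
```

Running the GPT and TTS calls as awaited tasks keeps the main Unity thread free, so the scene stays responsive while the narration is generated.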

Key Features