Gaze to Voice

Real-time storytelling through gaze

In this prototype, I explored how AI-driven narratives can create personalised, emotional experiences in XR. The experience centres on a boy's memories of a lost relationship. By simply looking at certain objects in the scene, users can trigger an emotional moment that unfolds in real time.

The generated narration adapts to the user's attention and emotional cues, allowing for a deeply immersive and reflective storytelling experience. The system supports both Korean and English, and users can toggle the language dynamically using a virtual button on their palm.

Concept

Gaze to Voice treats attention as the primary input for narrative. Instead of pressing buttons or selecting dialogue options, the user discovers the story by looking: each meaningful object in the scene carries emotional context that the system responds to on the fly.

The prototype was developed as part of early exploration for 8pm and the Cat, testing whether real-time AI narration could feel intimate and responsive enough to support grief-driven storytelling in VR.

System Design

The prototype generates its story in real time in response to user attention. When a user gazes at a meaningful object, the system builds a customised pre-prompt from that object's emotional context. GPT turns that pre-prompt into personalised narrative text, and ElevenLabs converts the text into natural-sounding voice narration on the fly.
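As a rough illustration of the pre-prompt step, the sketch below assembles a prompt from a gazed object's emotional context and the currently selected language. It is not the prototype's actual code: the `GazeObject` type, the `build_pre_prompt` function, the prompt wording, and the example object are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass

# Hypothetical mapping from language code to the name used inside the prompt.
LANGUAGE_NAMES = {"en": "English", "ko": "Korean"}


@dataclass(frozen=True)
class GazeObject:
    """A scene object that carries emotional context (assumed structure)."""
    name: str
    emotional_context: str


def build_pre_prompt(obj: GazeObject, language: str = "en") -> str:
    """Assemble a pre-prompt for the text model from the gazed object."""
    if language not in LANGUAGE_NAMES:
        raise ValueError(f"unsupported language: {language!r}")
    # The wording below is illustrative, not the prompt used in the prototype.
    return (
        "You are narrating a boy's memories of a lost relationship. "
        f"The user is looking at: {obj.name}. "
        f"Emotional context: {obj.emotional_context}. "
        f"Respond in {LANGUAGE_NAMES[language]} with two or three short sentences."
    )


if __name__ == "__main__":
    # Example object invented for this sketch.
    swing = GazeObject("an empty swing", "where they used to wait for each other")
    print(build_pre_prompt(swing, language="ko"))
```

Keeping the language inside the prompt means a language toggle only has to change one argument; the rest of the flow stays the same.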

Pipeline

Gaze → Prompt → GPT → Narration (ElevenLabs)
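The pipeline can be sketched as a gaze trigger followed by three chained stages. The example below is a minimal sketch under stated assumptions: the dwell-time threshold is an assumption (the write-up does not say how a gaze is confirmed), `DwellTrigger` and `run_pipeline` are hypothetical names, and the text and speech stages are passed in as plain callables so that no real GPT or ElevenLabs call is implied.

```python
from typing import Callable, Optional


class DwellTrigger:
    """Fires once when gaze rests on one object for `threshold` seconds.

    Dwell-based triggering is an assumption made for this sketch.
    """

    def __init__(self, threshold: float = 1.5) -> None:
        self.threshold = threshold
        self._target: Optional[str] = None
        self._elapsed = 0.0
        self._fired = False

    def update(self, target: Optional[str], dt: float) -> Optional[str]:
        """Feed the current gaze target each frame; returns it when triggered."""
        if target != self._target:
            # Gaze moved to a new object (or to nothing): restart the timer.
            self._target = target
            self._elapsed = 0.0
            self._fired = False
            return None
        if target is None or self._fired:
            return None
        self._elapsed += dt
        if self._elapsed >= self.threshold:
            self._fired = True  # do not retrigger until gaze leaves and returns
            return target
        return None


def run_pipeline(
    target: str,
    make_prompt: Callable[[str], str],
    generate_text: Callable[[str], str],
    synthesize_speech: Callable[[str], bytes],
) -> bytes:
    """Gaze target -> prompt -> narrative text -> audio bytes."""
    prompt = make_prompt(target)
    text = generate_text(prompt)
    return synthesize_speech(text)


if __name__ == "__main__":
    trigger = DwellTrigger(threshold=1.0)
    for _ in range(4):
        hit = trigger.update("photo", dt=0.5)
        if hit:
            # Stand-in stages; real ones would call the text and voice services.
            audio = run_pipeline(
                hit,
                make_prompt=lambda t: f"Narrate a memory about the {t}.",
                generate_text=lambda p: f"[narration for: {p}]",
                synthesize_speech=lambda s: s.encode("utf-8"),
            )
            print(len(audio), "bytes of placeholder audio")
```

Injecting the stages as callables keeps the trigger logic testable without network access and makes it easy to swap either service.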

Key Features

Gaze-based interaction

Look at objects to trigger dynamic narratives.

Real-time AI narration

Text generation via GPT, voice-over via ElevenLabs.

Dynamic language switching

Toggle between Korean and English via palm UI.

Emotional story setting

A boy's memories of longing for a lost girl, shaped by user attention.

Exploration of AI-driven XR storytelling

Contextual, user-adaptive, emotional.