Yunge Wen
M.S. Computer Science @New York University
Research Intern at Multisensory Intelligence Lab @MIT Media Lab
Human-AI Interaction | Computer Vision | Perceptual Engineering
yw3776@nyu.edu
LinkedIn | Github | Google Scholar
RESEARCH IN PROGRESS
Sketch Copilot: Autoregressive Next Stroke Prediction
Yunge Wen, Kaichen Zhou, Paul Pu Liang, Jing Qian
Manuscript In Preparation
Stroke-Based Rendering (SBR) represents images with parametric strokes (e.g., XYWHΘ, RGBA) instead of raw pixels, enabling images to be synthesized as structured stroke compositions. However, prior work infers stroke parameters from a completed target image, so it offers little practical assistance to human artists while they draw. We present Sketch Copilot, an interactive drawing assistant that observes an artist’s early strokes and the current canvas state to predict the next stroke in real time. We curated 10,000 classic portrait paintings and generated stroke sequences that reflect human painting conventions by segmenting facial structures (eyes, nose, mouth) and incorporating depth cues, then trained a transformer-based model that takes the first few strokes and the partial canvas as input and outputs the most likely next stroke. The system thus provides predictive, human-aligned stroke guidance, bridging SBR research and real-world artistic creation.
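The interaction loop can be pictured with a minimal sketch, assuming a PyTorch setup: a transformer consumes the previous stroke parameters (XYWHΘ + RGBA, nine values per stroke) together with an embedding of the partial canvas and regresses the parameters of the next stroke. Module names and sizes below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class NextStrokePredictor(nn.Module):
    def __init__(self, stroke_dim=9, d_model=256, n_layers=6, n_heads=8):
        super().__init__()
        self.stroke_embed = nn.Linear(stroke_dim, d_model)
        # Small CNN that summarizes the partial canvas into one extra token.
        self.canvas_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, stroke_dim)  # regress next stroke params

    def forward(self, strokes, canvas):
        # strokes: (B, T, 9) previous strokes; canvas: (B, 3, H, W) partial render
        tokens = self.stroke_embed(strokes)
        canvas_token = self.canvas_encoder(canvas).unsqueeze(1)
        h = self.transformer(torch.cat([canvas_token, tokens], dim=1))
        return self.head(h[:, -1])  # parameters of the most likely next stroke

model = NextStrokePredictor()
next_stroke = model(torch.randn(1, 12, 9), torch.randn(1, 3, 128, 128))  # (1, 9)
```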
Real-Time Detection and Reconstruction of Smell
MIT Media Lab, Multisensory Intelligence Group
Olfaction remains an underexplored modality in machine learning, largely due to the cost and impracticality of obtaining molecular-level data. Building on our previous work SmellNet, we present a portable pipeline for real-time odor recognition and reproduction. Our system uses a 4-channel gas sensor array to capture odors and a transformer-based model, ScentRatioNet, to predict mixture ratios across a palette of 12 base odorants. A wearable olfactory interface then synthesizes and delivers the predicted scents. To support this work, we contribute a 55-class dataset of base and mixed odors, together with both computational and human evaluations. This approach moves beyond static odor libraries and represents a step toward mobile, real-world olfactory interaction.
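A minimal sketch of the prediction step, assuming PyTorch and layer sizes that are illustrative rather than the published ScentRatioNet configuration: a small transformer maps a window of 4-channel gas sensor readings to softmax-normalized mixture ratios over the 12 base odorants.

```python
import torch
import torch.nn as nn

class RatioPredictor(nn.Module):
    def __init__(self, n_channels=4, n_odorants=12, d_model=64):
        super().__init__()
        self.proj = nn.Linear(n_channels, d_model)              # per-timestep embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_odorants)

    def forward(self, x):
        # x: (B, T, 4) sensor readings over a sliding time window
        h = self.encoder(self.proj(x)).mean(dim=1)              # pool over time
        return torch.softmax(self.head(h), dim=-1)              # ratios sum to 1

ratios = RatioPredictor()(torch.randn(1, 200, 4))               # shape (1, 12)
```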
Graph-Based Gameplay Planning with LLMs
Yunge Wen*, Chenliang Huang*, Hangyu Zhou, Zhuo Zeng, Chun Ming Louis Po, Julian Togelius, Timothy Merino, Sam Earle
CHI Poster 2026 Under Review | https://arxiv.org/abs/2508.02132
Narrative archetypes (e.g., the Hero's Journey or the three-act structure) provide universal story structures that resonate across cultures and media and are central to video game storytelling, yet existing LLM-based methods lack narrative planning integrated with gameplay mechanics. We present Forking Garden, a framework that generates narrative arc-guided games from brief storylines. The user's input storyline is transformed into a branching story graph in which every possible path conforms to a coherent narrative arc, and each story node achieves multimodal alignment of gameplay elements. We develop an end-to-end interactive system that instantiates the framework.
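The branching story graph can be illustrated with a small sketch: each node carries a narrative-arc stage, and a recursive check confirms that every root-to-leaf path visits the stages in order. The stage names and the `StoryNode` structure are assumptions for illustration, not the framework's actual schema.

```python
from dataclasses import dataclass, field

ARC = ["setup", "rising_action", "climax", "resolution"]   # illustrative arc stages

@dataclass
class StoryNode:
    text: str
    stage: str                              # arc stage this story beat belongs to
    children: list = field(default_factory=list)

def paths_follow_arc(node, prev_stage_idx=-1):
    """True if every path from this node onward visits arc stages in order."""
    idx = ARC.index(node.stage)
    if idx < prev_stage_idx:
        return False
    if not node.children:                   # leaf: the path must end on the final stage
        return node.stage == ARC[-1]
    return all(paths_follow_arc(child, idx) for child in node.children)

root = StoryNode("A stranger arrives in the garden.", "setup", [
    StoryNode("They take the left fork.", "rising_action", [
        StoryNode("A confrontation at the gate.", "climax", [
            StoryNode("The garden is left behind.", "resolution")])])])
print(paths_follow_arc(root))               # True
```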
Smell with Genji: Rediscovering Human Perception through an Olfactory Game with AI
Awu Chen, Vera Yu Wu, Yunge Wen, Yaluo Wang, Jiaxuan Olivia Yin, Yichen Wang, Qian Xiang, Richard Zhang, Paul Pu Liang, and Hiroshi Ishii
CHI Poster & Interactive Demo 2026 Under Review
Genji-kō (源氏香) is a traditional Japanese incense game that structures olfactory experience through comparison, memory, and shared interpretation. We present Smell with Genji, an AI-mediated olfactory interaction system that reinterprets Genji-kō as a collaborative human–AI sensory experience. The system integrates the olfactory sensor and smell AI described in SmellNet with a mobile application and a co-smelling large language model, enabling an AI equipped with physical sensing to participate in olfactory experience alongside users.
Whispering Water: Materializing Human-AI Dialogue as Interactive Ripples
Ruipeng Wang*, Tawab Safi*, Yunge Wen*, Christina Cunningham, Hoi Ling Tang, and Behnaz Farahi
SIGGRAPH Art Paper 2026 Under Review
Whispering Water is an interactive installation that materializes human–AI dialogue through cymatic patterns on water. Participants confess secrets to a water surface, initiating a four-phase ritual—confession, contemplation, response, and release—through which their speech enters a multi-agent system that generates responses expressed via water. We propose a novel algorithm that decomposes speech into component waves and reconstructs them in water, establishing a translation between linguistic expression and the physics of material form. By rendering machine reasoning as emergent physical phenomena, the installation explores possibilities for emotional self-exploration through ambiguous, sensory-rich interfaces.
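As a rough illustration of the decomposition step only (the installation's actual algorithm is not reproduced here), an FFT can extract a speech buffer's dominant (frequency, amplitude) components, which a transducer could then replay as cymatic excitation on the water surface.

```python
import numpy as np

def dominant_components(signal, sample_rate, n_components=5):
    # For recorded speech a window function (e.g. np.hanning) would be applied first.
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    magnitudes = np.abs(spectrum)
    top = np.argsort(magnitudes)[-n_components:][::-1]       # strongest bins first
    return [(float(freqs[i]), float(magnitudes[i])) for i in top]

# Example with a synthetic 440 Hz + 880 Hz tone standing in for speech.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
print(dominant_components(tone, sr, n_components=2))          # [(440.0, ...), (880.0, ...)]
```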
Semantic Zooming Interface for Fly Neuron Connectome
Yunge Wen, Jizheng Dong, Yuancheng Shen, Yu Cheng, Emma Obermiller, Erdem Varol, Robert Krueger
Manuscript In Preparation
The FlyWire dataset reconstructs full adult fly brain neurons using slicing and 3D reconstruction techniques, but large-scale neuronal networks often appear visually entangled, making the connectome hard to interpret. We are creating an interactive visualization tool that integrates morphological clustering and single-neuron downsampling with multi-resolution rendering and semantic zooming, enabling intuitive, Google Maps-style exploration and significantly reducing visual clutter.
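A minimal sketch of the semantic-zoom logic, with illustrative thresholds and field names rather than the tool's actual API: the zoom level selects a rendering representation, and viewport culling limits how many neurons are drawn at once.

```python
def representation_for_zoom(zoom_level):
    """Map a normalized zoom level (0 = whole brain, 1 = single neuron) to a level of detail."""
    if zoom_level < 0.3:
        return "cluster_centroids"       # one glyph per morphological cluster
    if zoom_level < 0.7:
        return "downsampled_skeletons"   # simplified single-neuron skeletons
    return "full_morphology"             # full-resolution reconstructions

def visible_neurons(neurons, viewport):
    """Keep neurons whose 2D bounding box intersects the current viewport."""
    (vx0, vy0), (vx1, vy1) = viewport
    return [n for n in neurons
            if not (n["x1"] < vx0 or n["x0"] > vx1 or n["y1"] < vy0 or n["y0"] > vy1)]
```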
LLM-guided XR Window Management Tool
Jing Qian*, George X. Wang*, Xiangyu Li, Yunge Wen, Guande Wu, Sonia Castelo Quispe, Fumeng Yang, Claudio Silva.
IMWUT 2026 (UbiComp) Under Review | https://arxiv.org/abs/2511.15676
DuoZone is a mixed-initiative XR window management system that reduces cognitive load through a dual-zone architecture. The Recommendation Zone uses LLM-based NLP to parse voice/text task descriptions and automatically generates spatial layout templates with context-appropriate application sets. The Arrangement Zone enables precise manual refinement through direct 3D manipulation (drag, resize, snap). This separation allows AI to handle initial setup complexity while preserving user control for adjustments. A user study against baseline manual management demonstrated faster task completion, reduced mental workload, and higher perceived control, validating the mixed-initiative approach for spatial computing workflows.
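A sketch of the Recommendation Zone step under stated assumptions: the voice/text task description is sent to an LLM with a fixed JSON schema, and the reply is parsed into a spatial layout template. `call_llm` and the schema are hypothetical stand-ins, not DuoZone's actual interface.

```python
import json

LAYOUT_PROMPT = """You are an XR window manager. Given a task description,
return JSON of the form {"apps": [{"name": str, "zone": "left"|"center"|"right"}]}.
Task: {task}"""

def recommend_layout(task, call_llm):
    reply = call_llm(LAYOUT_PROMPT.replace("{task}", task))
    layout = json.loads(reply)                       # fails loudly on malformed output
    return [(app["name"], app["zone"]) for app in layout["apps"]]

# Example with a stubbed model reply:
stub = lambda prompt: '{"apps": [{"name": "Browser", "zone": "center"}, {"name": "Notes", "zone": "right"}]}'
print(recommend_layout("research a paper and take notes", stub))
```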
AVA-Align: Generating Rubric-Aligned Feedback from Long Classroom Videos
Ao Qu*, Yuxi Wen*, Jiayi Zhang*, Yunge Wen, Yibo Zhao, Alok Prakash, Andrés F. Salazar-Gómez, Paul Pu Liang, Jinhua Zhao
CHI 2026 under review | https://arxiv.org/abs/2509.18020
Classroom observation requires generating rubric-aligned feedback from long video recordings, yet existing video–language models struggle with long-context understanding, temporal precision, and instruction following in multimodal settings. We propose AVA-Align (Adaptive Video Agent with Alignment for long rubrics), an architecture designed to address these challenges. AVA-Align segments classroom recordings, synchronizes captions and transcripts at second-level resolution, and applies structured rubrics to identify pedagogical hotspots. For each hotspot, the system generates and validates rubric-aligned feedback grounded in video evidence and temporally accurate events. By combining targeted guidance, multimodal reasoning, and validation, AVA-Align ensures fidelity to classroom activities while producing actionable, standards-based feedback.
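One pipeline stage, segmenting the recording and aligning second-stamped transcript utterances to segments, can be sketched as follows; the segment length and utterance format are assumptions for illustration.

```python
def segment_transcript(utterances, video_seconds, segment_len=60):
    """utterances: list of (start_sec, end_sec, text). Returns one utterance list per segment."""
    n_segments = (video_seconds + segment_len - 1) // segment_len
    segments = [[] for _ in range(n_segments)]
    for start, end, text in utterances:
        segments[int(start) // segment_len].append((start, end, text))
    return segments

utts = [(12.4, 15.0, "Let's review yesterday's homework."),
        (75.2, 80.1, "Turn to your partner and discuss.")]
for i, seg in enumerate(segment_transcript(utts, video_seconds=120)):
    print(f"segment {i} ({i * 60}-{(i + 1) * 60}s): {len(seg)} utterance(s)")
```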
"See What I Imagine, Imagine What I See": Human-AI Co-Creation System for 360° Panoramic Video Generation in VR
Yunge Wen | https://arxiv.org/abs/2501.15456
Current immersive experiences in virtual reality are limited by pre-designed environments that constrain user creativity. To address this, we introduce Imagine360, a proof-of-concept system that integrates co-creation principles with AI agents for panoramic video generation in immersive settings. Imagine360 enables users to generate panoramic videos with AI assistance, evaluate outcomes in real time, provide refined speech-based prompts guided by the AI agent, and recenter the video’s focal point from an egocentric perspective. This co-creative approach establishes a transformative VR paradigm in which users seamlessly transition between “seeing” and “imagining,” shaping virtual environments through their own intent.
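One concrete operation mentioned above, recentering the panorama's focal point, can be sketched for an equirectangular frame, where a yaw change reduces to a horizontal pixel shift; whether Imagine360 implements it exactly this way is an assumption.

```python
import numpy as np

def recenter_equirectangular(frame, yaw_degrees):
    """frame: (H, W, 3) equirectangular image; positive yaw rotates the view to the right."""
    width = frame.shape[1]
    shift = int(round(yaw_degrees / 360.0 * width))
    return np.roll(frame, -shift, axis=1)            # wrap pixels around the seam

frame = np.zeros((512, 1024, 3), dtype=np.uint8)     # placeholder panoramic frame
recentered = recenter_equirectangular(frame, yaw_degrees=90)   # quarter-turn
```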
PROJECTS
Bayesian Motion Trajectory Prediction
Fine-tuned YOLOv8 on the VisDrone dataset to enhance small-object tracking, and used Kalman filters to track single and multiple objects and predict their motion trajectories (see the filter sketch below).
[Github]
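A minimal constant-velocity Kalman filter for 2D centroid tracking, with illustrative noise settings rather than the project's exact configuration:

```python
import numpy as np

class Kalman2D:
    def __init__(self, x, y, dt=1.0):
        self.state = np.array([x, y, 0.0, 0.0])                 # [x, y, vx, vy]
        self.P = np.eye(4) * 10.0                               # state covariance
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], float)  # constant-velocity model
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)  # we observe x, y only
        self.Q = np.eye(4) * 0.01                               # process noise
        self.R = np.eye(2) * 1.0                                # measurement noise

    def predict(self):
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]                                   # predicted position

    def update(self, zx, zy):
        z = np.array([zx, zy])
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)                # Kalman gain
        self.state = self.state + K @ (z - self.H @ self.state)
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = Kalman2D(100, 200)
kf.predict(); kf.update(102, 203)   # feed detections, e.g. YOLOv8 box centers
print(kf.predict())                 # next predicted centroid
```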
Neural Style Transfer
Reproduced the seminal 2015 image style transfer paper (Gatys et al.), with step-by-step visualization of intermediate content and style convolution results (core losses sketched below).
[Github]
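The reproduced objective combines a feature-map content loss with a Gram-matrix style loss; a compact sketch in PyTorch (layer choices and loss weights are assumed):

```python
import torch

def gram_matrix(features):
    # features: (C, H, W) activations from one convolutional layer
    c, h, w = features.shape
    flat = features.view(c, h * w)
    return flat @ flat.t() / (c * h * w)

def content_loss(gen_feat, content_feat):
    return torch.mean((gen_feat - content_feat) ** 2)

def style_loss(gen_feat, style_feat):
    return torch.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2)

# Total objective, optimized over the generated image's pixels:
#   loss = alpha * content_loss(...) + beta * sum(style_loss over chosen VGG layers)
```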
Video2Video Search
Trained a convolutional autoencoder on the COCO dataset, extracted feature embeddings from video frames, stored them in a vector database, and matched query images via vector similarity (retrieval step sketched below).
[Github]
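The retrieval step amounts to cosine similarity between the query embedding and stored frame embeddings; in this sketch plain NumPy stands in for the vector database.

```python
import numpy as np

def top_k_similar(query_vec, frame_vecs, k=5):
    q = query_vec / np.linalg.norm(query_vec)
    m = frame_vecs / np.linalg.norm(frame_vecs, axis=1, keepdims=True)
    scores = m @ q                                   # cosine similarity per stored frame
    top = np.argsort(scores)[-k:][::-1]
    return list(zip(top.tolist(), scores[top].tolist()))

frames = np.random.rand(1000, 128)                   # embeddings extracted from video frames
query = np.random.rand(128)                          # embedding of the query image
print(top_k_similar(query, frames, k=3))             # (frame index, similarity) pairs
```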
Enhancing LLM Accuracy with RAG
Created a Hugging Face web app demonstrating retrieval-augmented generation (RAG) for non-technical corporate users.
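A minimal RAG sketch, with the embedding model, passage corpus, and `call_llm` client as hypothetical placeholders rather than the demo's exact stack:

```python
import numpy as np

def answer_with_rag(question, passages, embed, call_llm, k=3):
    """Retrieve the k most similar passages and prepend them to the prompt."""
    doc_vecs = np.stack([embed(p) for p in passages])
    q = embed(question)
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(passages[i] for i in np.argsort(scores)[-k:][::-1])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```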
COMPUTATIONAL DESIGN