Yunge Wen
Master of Science in Computer Science @New York University
Research Intern at Multisensory Intelligence Lab @MIT Media Lab
Machine Learning | Multimodal Interaction
yw3776@nyu.edu
LinkedIn | GitHub | Animation Portfolio
RESEARCH IN PROGRESS
Real-Time Detection and Reconstruction of Smell
Alistair Pernigo*, Yunge Wen*, Dewei Feng, Wei Dai, Kaichen Zhou, Jas Brooks, Paul Pu Liang
Aiming for ICML 2026
Olfaction remains an underexplored modality in machine learning, largely due to the cost and impracticality of obtaining molecular-level data. Building on our previous work SmellNet, we present a portable pipeline for real-time odor recognition and reproduction. Our system uses a 4-channel gas sensor array to capture odors and a transformer-based model, ScentRatioNet, to predict mixture ratios across a palette of 12 base odorants. A wearable olfactory interface then synthesizes and delivers the predicted scents. To support this work, we contribute a 55-class dataset of base and mixed odors, together with both computational and human evaluations. This approach moves beyond static odor libraries and represents a step toward mobile, real-world olfactory interaction.
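A minimal sketch of the ratio-prediction step, assuming a small transformer encoder over the 4-channel sensor time series; layer sizes and names are illustrative placeholders, not the published ScentRatioNet architecture:

    import torch
    import torch.nn as nn

    class ScentRatioSketch(nn.Module):
        """Toy stand-in: sensor time series in, mixture ratios over 12 odorants out."""
        def __init__(self, n_channels=4, n_odorants=12, d_model=64, n_layers=2):
            super().__init__()
            self.embed = nn.Linear(n_channels, d_model)   # per-timestep sensor embedding
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(d_model, n_odorants)    # logits over the odorant palette

        def forward(self, x):                             # x: (batch, time, 4)
            h = self.encoder(self.embed(x))
            return torch.softmax(self.head(h.mean(dim=1)), dim=-1)  # ratios sum to 1

    model = ScentRatioSketch()
    ratios = model(torch.randn(1, 128, 4))                # e.g. 128 sensor readings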
Semantic Zooming Interface for Fly Neuron Connectome
Yunge Wen, Jizheng Dong, Yuancheng Shen, Yu Cheng, Emma Obermiller, Erdem Varol, Robert Krueger
Aiming for IEEE VIS 2026
The FlyWire dataset reconstructs the neurons of a full adult fly brain using slicing and 3D reconstruction techniques, but the resulting large-scale neuronal networks often appear visually entangled, making the connectome hard to interpret. We are building an interactive visualization tool that integrates morphological clustering and single-neuron downsampling with multi-resolution rendering and semantic zooming, enabling intuitive, Google Maps-style exploration that significantly reduces visual clutter.
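An illustrative sketch of the semantic-zoom logic, assuming a precomputed morphological clustering; the thresholds and data structures are hypothetical, not the tool's actual API:

    def select_renderables(zoom, clusters):
        """clusters: list of {'centroid': glyph, 'neurons': [skeleton, ...]} (hypothetical)."""
        if zoom < 2.0:   # zoomed out: one glyph per morphological cluster
            return [c['centroid'] for c in clusters]
        if zoom < 8.0:   # mid zoom: downsampled single-neuron skeletons
            return [downsample(n, factor=4) for c in clusters for n in c['neurons']]
        return [n for c in clusters for n in c['neurons']]   # close up: full resolution

    def downsample(skeleton, factor):
        return skeleton[::factor]   # keep every k-th node of the skeleton polyline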
Sketch Copilot: Autoregressive Next Stroke Prediction
Yunge Wen, Kaichen Zhou, Paul Pu Liang
Aiming for SIGGRAPH 2026
Stroke-Based Rendering (SBR) represents images with parametric strokes (e.g., XYWHΘ, RGBA) instead of raw pixels, allowing an image to be synthesized as a structured stroke composition. Prior work, however, infers stroke parameters from a finished target image, which offers little practical assistance to human artists. We present Sketch Copilot, an interactive drawing assistant that observes an artist’s early strokes and the current canvas state to predict the next stroke in real time. We curated 10,000 classic portrait paintings and generated stroke sequences that reflect human painting conventions by segmenting facial structures (eyes, nose, mouth) and incorporating depth cues, then trained a transformer-based model that takes the first few strokes and the partial canvas as input and outputs the most likely next stroke. The system thus provides predictive, human-aligned stroke guidance, bridging SBR research and real-world artistic creation.
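A minimal sketch of the next-stroke predictor, assuming strokes are 9-dimensional parameter vectors (x, y, w, h, θ, RGBA) and omitting the partial-canvas encoder for brevity; sizes are illustrative, not the trained model:

    import torch
    import torch.nn as nn

    STROKE_DIM = 9   # x, y, w, h, theta, r, g, b, a

    class NextStrokeSketch(nn.Module):
        def __init__(self, d_model=128):
            super().__init__()
            self.embed = nn.Linear(STROKE_DIM, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)
            self.out = nn.Linear(d_model, STROKE_DIM)

        def forward(self, strokes):                      # strokes: (batch, n, 9)
            n = strokes.size(1)
            causal = nn.Transformer.generate_square_subsequent_mask(n)
            h = self.encoder(self.embed(strokes), mask=causal)
            return self.out(h[:, -1])                    # parameters of the next stroke

    model = NextStrokeSketch()
    next_stroke = model(torch.rand(1, 5, STROKE_DIM))    # predict stroke 6 from strokes 1-5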
Genji-Go: Humans Compete with Smell AI
Awu Chen*, Yunge Wen*, Vera Wu*, Olivia Yin*, Hiroshi Ishii, Paul Pu Liang
Aiming for CHI Interactive Demo 2026
The Tale of Genji is a cornerstone of Japanese classical literature, and it inspired the traditional Genji-kō game, in which each chapter is associated with a unique fragrance and visual pattern. Players identify the corresponding chapter by matching scents with patterns. Originating in the Edo period (17th–19th century), the game holds a long-standing place in Japanese cultural history. Building on SmellNet, we developed an AI system capable of recognizing traditional incense. We collected 12 hours of data from five distinct incense types and trained a transformer-based model for olfactory recognition. We then designed a human–AI competitive game, where players and the AI agent attempt to identify fragrances in the spirit of Genji-kō.
AVA-Align: Generating Rubric-Aligned Feedback from Long Classroom Videos
Ao Qu*, Yuxi Wen*, Jiayi Zhang*, Yunge Wen, Yibo Zhao, Alok Prakash, Andrés F. Salazar-Gómez, Paul Pu Liang, Jinhua Zhao
CHI 2026 under review | https://arxiv.org/abs/2509.18020
Classroom observation requires generating rubric-aligned feedback from long video recordings, yet existing video–language models struggle with long-context understanding, temporal precision, and instruction following in multimodal settings. We propose AVA-Align (Adaptive Video Agent with Alignment for long rubrics), an architecture designed to address these challenges. AVA-Align segments classroom recordings, synchronizes captions and transcripts at second-level resolution, and applies structured rubrics to identify pedagogical hotspots. For each hotspot, the system generates and validates rubric-aligned feedback grounded in video evidence and temporally accurate events. By combining targeted guidance, multimodal reasoning, and validation, AVA-Align ensures fidelity to classroom activities while producing actionable, standards-based feedback.
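An illustrative sketch of the second-level alignment step, assuming captions and transcript arrive as (start, end, text) triples; the field names are assumptions, not AVA-Align's interfaces:

    from collections import defaultdict

    def align_by_second(captions, transcript):
        """captions, transcript: lists of (start_sec, end_sec, text)."""
        timeline = defaultdict(lambda: {"visual": [], "speech": []})
        for start, end, text in captions:
            for sec in range(int(start), int(end) + 1):
                timeline[sec]["visual"].append(text)
        for start, end, text in transcript:
            for sec in range(int(start), int(end) + 1):
                timeline[sec]["speech"].append(text)
        return dict(timeline)   # per-second view fed to rubric-based hotspot detection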
All Stories Are One Story: Emotional Arc Guided Procedural Game Level Generation
Yunge Wen*, Chenliang Huang*, Hangyu Zhou, Zhuo Zeng, Chun Ming Louis Po, Julian Togelius, Timothy Merino, Sam Earle
Aiming for CHI Interactive Demo 2026 | https://arxiv.org/abs/2508.02132
The emotional arc is a universal narrative structure underlying stories across cultures and media -- an idea central to structuralist narratology, often encapsulated in the phrase "all stories are one story." We present a framework for procedural game narrative generation that incorporates emotional arcs as a structural backbone for both story progression and gameplay dynamics. Leveraging established narratological theories and large-scale empirical analyses, we focus on two core emotional patterns -- Rise and Fall -- to guide the generation of branching story graphs. Each story node is automatically populated with characters, items, and gameplay-relevant attributes (e.g., health, attack), with difficulty adjusted according to the emotional trajectory. Implemented in a prototype action role-playing game (ARPG), our system demonstrates how emotional arcs can be operationalized using large language models (LLMs) and adaptive entity generation.
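A minimal sketch of the arc-guided difficulty idea, with illustrative Rise and Fall arcs and scaling constants rather than the paper's exact formulation:

    def rise_arc(t):                    # t in [0, 1]: story progress
        return 2 * t - 1                # valence rises from -1 to 1

    def fall_arc(t):
        return 1 - 2 * t                # valence falls from 1 to -1

    def node_attributes(t, arc=rise_arc, base_health=100, base_attack=10):
        valence = arc(t)                            # lower valence -> harder encounter
        difficulty = 1.0 + 0.5 * (1 - valence)      # maps valence [-1, 1] to [2.0, 1.0]
        return {"health": round(base_health * difficulty),
                "attack": round(base_attack * difficulty)}

    print(node_attributes(0.1, arc=fall_arc))   # early Fall node: easier enemies
    print(node_attributes(0.9, arc=fall_arc))   # late Fall node: harder enemies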
"See What I Imagine, Imagine What I See": Human-AI Co-Creation System for 360° Panoramic Video Generation in VR
Yunge Wen | https://arxiv.org/abs/2501.15456
Current immersive experiences in virtual reality are limited by pre-designed environments that constrain user creativity. To address this, we introduce Imagine360, a proof-of-concept system that integrates co-creation principles with AI agents for panoramic video generation in immersive settings. Imagine360 enables users to generate panoramic videos with AI assistance, evaluate outcomes in real time, provide refined speech-based prompts guided by the AI agent, and recenter the video’s focal point from an egocentric perspective. This co-creative approach establishes a transformative VR paradigm in which users seamlessly transition between “seeing” and “imagining,” shaping virtual environments through their own intent.
PROJECTS
Bayesian Motion Trajectory Prediction
Fine-tuned YOLOv8 on the VisDrone dataset to improve small-object detection, and used Kalman filters to track single and multiple objects and predict their motion trajectories (see the Kalman sketch below).
[GitHub]
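A minimal sketch of the Kalman predict/update steps under a constant-velocity model; the noise values are illustrative:

    import numpy as np

    dt = 1.0                                       # one frame
    F = np.array([[1, 0, dt, 0],                   # constant-velocity state transition
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    H = np.array([[1, 0, 0, 0],                    # only (x, y) is observed from detections
                  [0, 1, 0, 0]], dtype=float)
    Q, R = np.eye(4) * 0.01, np.eye(2) * 1.0       # process / measurement noise

    def predict(x, P):                             # extrapolate state to the next frame
        return F @ x, F @ P @ F.T + Q

    def update(x, P, z):                           # z: detected box center (x, y)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P

    x, P = np.zeros(4), np.eye(4)                  # state (x, y, vx, vy) and covariance
    x, P = update(*predict(x, P), z=np.array([10.0, 5.0]))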
Neural Style Transfer
Reproduced the seminal 2015 paper on neural style transfer, with step-by-step visualization of the content and style representations at each convolutional layer (see the loss sketch below).
[GitHub]
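A sketch of the two losses from Gatys et al. (2015): content loss compares feature maps directly, style loss compares their Gram matrices; the layer choice and weighting here are illustrative:

    import torch

    def gram_matrix(features):              # features: (channels, H, W)
        c, h, w = features.shape
        f = features.view(c, h * w)
        return f @ f.t() / (c * h * w)      # channel-wise feature correlations

    def content_loss(gen_feat, content_feat):
        return torch.mean((gen_feat - content_feat) ** 2)

    def style_loss(gen_feat, style_feat):
        return torch.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2)

    gen = torch.rand(64, 32, 32)            # e.g. activations from one conv layer
    total = content_loss(gen, torch.rand_like(gen)) + 1e3 * style_loss(gen, torch.rand_like(gen))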
Video2Video Search
Trained a convolutional autoencoder on the COCO dataset. Extracted feature embeddings from video frames, stored them in a vector database, and matched query images against them via vector similarity (see the retrieval sketch below).
[GitHub]
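A sketch of the retrieval step, shown here as brute-force cosine similarity in NumPy rather than the vector-database query used in the project:

    import numpy as np

    def cosine_top_k(query, index, k=5):
        """query: (d,) embedding; index: (n, d) stored frame embeddings."""
        q = query / np.linalg.norm(query)
        m = index / np.linalg.norm(index, axis=1, keepdims=True)
        scores = m @ q                                 # cosine similarity per stored frame
        top = np.argsort(-scores)[:k]
        return top, scores[top]

    index = np.random.rand(1000, 128)                  # 1000 stored frame embeddings
    hits, scores = cosine_top_k(np.random.rand(128), index)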
Enhancing LLM Accuracy with RAG
Created a Hugging Face web app demonstrating retrieval-augmented generation (RAG) for non-technical corporate users.
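A framework-free sketch of the RAG loop; embed() and generate() are placeholders for whichever embedding model and LLM the app uses, not a specific API:

    import numpy as np

    def retrieve(question, passages, embed, k=3):
        q = embed(question)
        scores = [float(np.dot(q, embed(p))) for p in passages]
        return [passages[i] for i in np.argsort(scores)[::-1][:k]]

    def answer(question, passages, embed, generate):
        context = "\n".join(retrieve(question, passages, embed))
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return generate(prompt)   # the LLM sees retrieved evidence, not just the question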
COMPUTATIONAL DESIGN