Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes

Abstract

As more applications of large language models (LLMs) emerge for creating 3D content in immersive environments, it is crucial to study user behaviour to identify interaction patterns and potential barriers that can guide the design of future LLM-assisted immersive content creation and editing systems. In an empirical user study with 12 participants, we combine quantitative usage data with post-experience questionnaire feedback to reveal common interaction patterns and key barriers in LLM-assisted 3D scene editing systems. We identify opportunities for improving natural language interfaces in 3D design tools and propose design recommendations for future LLM-integrated 3D content creation systems. Our findings demonstrate that LLM-assisted interactive systems can be used productively in immersive environments.

Media

Example Workflow
Example workflow for scene editing with our proposed ASSISTVR technique. Left: The user adopts the bulk modification strategy, selecting all blue objects in the original scene to modify their appearance together. Middle: The user adopts the incremental exploration strategy to modify the appearance of individual objects. Right: The final scene matches the target scene of one of the tasks in our empirical user study.

Publication

Junlong Chen, Jens Grubert, & Per Ola Kristensson. Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes. In Proceedings of IEEE Virtual Reality (VR) 2025 (to appear). arXiv preprint