EmbodiedFly: Embodied LLM Agent with an Autonomous Reconfigurable Drone

Abstract

Large Language Models (LLMs) have shown remarkable human-like capabilities for reasoning and for generating digital content. However, their ability to freely sense, interact with, and actuate the physical world remains significantly limited due to three fundamental challenges: (1) physical environments require specialized sensors for different tasks, yet deploying dedicated sensors for every application is impractical; (2) events and objects of interest are often localized to small areas within large spaces, making them difficult to detect with static sensor networks; and (3) foundation models need flexible actuation capabilities to meaningfully interact with the physical world. To bridge this gap, we introduce EmbodiedFly, an embodied LLM agent that combines a foundation model (FM) pipeline with a reconfigurable drone platform to observe, understand, and interact with the physical world. Our co-design approach features (1) an FM orchestration framework connecting multiple LLMs, vision-language models (VLMs), and an open-set object detection model; (2) a novel image segmentation technique that identifies task-relevant areas; and (3) a custom drone platform that autonomously reconfigures itself with appropriate sensors and actuators based on commands from the FM orchestration framework. Through real-world deployments, we demonstrate that EmbodiedFly completes diverse physical tasks with up to 85% higher success rates than traditional approaches that rely on static sensor deployments.
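The abstract describes the FM orchestration framework only at a high level. Purely for illustration, the sketch below shows one way such a command-to-reconfiguration loop might be wired together; every name in it (TaskPlan, ReconfigCommand, plan_task, reconfigure, run_mission, and the stub models) is a hypothetical placeholder for this sketch, not the paper's actual API.

```python
"""Hypothetical sketch of an FM orchestration loop in the spirit of
EmbodiedFly. All names here are illustrative placeholders."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskPlan:
    sensors: list[str]        # sensors to mount, e.g. ["rgb_camera", "thermal"]
    actuators: list[str]      # actuators needed for the task, e.g. ["gripper"]
    target_description: str   # open-vocabulary description for the detector


@dataclass
class ReconfigCommand:
    attach: list[str]  # modules the drone should pick up before flight


def plan_task(llm: Callable[[str], str], user_command: str) -> TaskPlan:
    """Ask an LLM to translate a natural-language command into a task plan.
    A real system would parse structured (e.g., JSON) LLM output; this
    sketch returns a fixed plan to stay self-contained."""
    _ = llm(f"Plan sensors/actuators for: {user_command}")
    return TaskPlan(sensors=["rgb_camera"], actuators=["gripper"],
                    target_description="red toolbox")


def reconfigure(plan: TaskPlan) -> ReconfigCommand:
    """Map the plan onto a reconfiguration command for the drone platform."""
    return ReconfigCommand(attach=plan.sensors + plan.actuators)


def run_mission(llm, detector, user_command: str) -> None:
    plan = plan_task(llm, user_command)
    command = reconfigure(plan)
    print(f"Attaching modules: {command.attach}")
    # In flight, an open-set detector would localize the described target
    # within task-relevant image regions identified by segmentation.
    boxes = detector(plan.target_description)
    print(f"Detected {len(boxes)} candidate region(s) for "
          f"'{plan.target_description}'")


if __name__ == "__main__":
    # Stub models so the sketch runs without any external services.
    stub_llm = lambda prompt: "plan"
    stub_detector = lambda desc: [(0.2, 0.3, 0.5, 0.6)]  # one dummy box
    run_mission(stub_llm, stub_detector, "find the red toolbox in the warehouse")
```

A real deployment would replace the stubs with calls to actual LLM, VLM, and open-set detector backends, and would route the reconfiguration command to the drone's module-swapping hardware rather than printing it.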

Publication
In ACM Transactions on Internet of Things
Minghui (Scott) Zhao
Ph.D. Candidate in Electrical Engineering

My research focuses on developing embodied and embedded AI systems that enable intelligent agents to perceive, understand, and act in the physical world through hardware-software co-design and physics-informed machine learning.