EmbodiedFly: Embodied LLM Agent with an Autonomous Reconfigurable Drone

Abstract

Large Language Models (LLMs) have shown remarkable human-like capabilities for reasoning and for generating digital content. However, their ability to freely sense, interact with, and actuate the physical world remains significantly limited due to three fundamental challenges: (1) physical environments require specialized sensors for different tasks, yet deploying dedicated sensors for every application is impractical; (2) events and objects of interest are often localized to small areas within large spaces, making them difficult to detect with static sensor networks; and (3) foundation models need flexible actuation capabilities to meaningfully interact with the physical world. To bridge this gap, we introduce EmbodiedFly, an embodied LLM agent that combines a foundation model (FM) pipeline with a reconfigurable drone platform to observe, understand, and interact with the physical world. Our co-design approach features (1) an FM orchestration framework connecting multiple LLMs, vision-language models (VLMs), and an open-set object detection model; (2) a novel image segmentation technique that identifies task-relevant areas; and (3) a custom drone platform that autonomously reconfigures itself with appropriate sensors and actuators based on commands from the FM orchestration framework. Through real-world deployments, we demonstrate that EmbodiedFly completes diverse physical tasks with up to 85% higher success rates than traditional approaches that rely on static sensor deployments.
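The abstract describes the FM orchestration framework only at a high level. Purely for illustration, the sketch below shows one way such a command-to-reconfiguration loop might be wired together; every name in it (TaskPlan, ReconfigCommand, plan_task, reconfigure, run_mission, and the stub models) is a hypothetical placeholder for this sketch, not the paper's actual API.

```python
"""Hypothetical sketch of an FM orchestration loop in the spirit of
EmbodiedFly. All names here are illustrative placeholders."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskPlan:
    sensors: list[str]        # sensors to mount, e.g. ["rgb_camera", "thermal"]
    actuators: list[str]      # actuators needed for the task, e.g. ["gripper"]
    target_description: str   # open-vocabulary description for the detector


@dataclass
class ReconfigCommand:
    attach: list[str]  # modules the drone should pick up before flight


def plan_task(llm: Callable[[str], str], user_command: str) -> TaskPlan:
    """Ask an LLM to translate a natural-language command into a task plan.
    A real system would parse structured (e.g., JSON) LLM output; this
    sketch returns a fixed plan to stay self-contained."""
    _ = llm(f"Plan sensors/actuators for: {user_command}")
    return TaskPlan(sensors=["rgb_camera"], actuators=["gripper"],
                    target_description="red toolbox")


def reconfigure(plan: TaskPlan) -> ReconfigCommand:
    """Map the plan onto a reconfiguration command for the drone platform."""
    return ReconfigCommand(attach=plan.sensors + plan.actuators)


def run_mission(llm, detector, user_command: str) -> None:
    plan = plan_task(llm, user_command)
    command = reconfigure(plan)
    print(f"Attaching modules: {command.attach}")
    # In flight, an open-set detector would localize the described target
    # within task-relevant image regions identified by segmentation.
    boxes = detector(plan.target_description)
    print(f"Detected {len(boxes)} candidate region(s) for "
          f"'{plan.target_description}'")


if __name__ == "__main__":
    # Stub models so the sketch runs without any external services.
    stub_llm = lambda prompt: "plan"
    stub_detector = lambda desc: [(0.2, 0.3, 0.5, 0.6)]  # one dummy box
    run_mission(stub_llm, stub_detector, "find the red toolbox in the warehouse")
```

A real deployment would replace the stubs with calls to actual LLM, VLM, and open-set detector backends, and would route the reconfiguration command to the drone's module-swapping hardware rather than printing it.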

Publication
In ACM Transactions on Internet of Things
Minghui (Scott) Zhao
Ph.D. Candidate in Electrical Engineering

My research focuses on developing embodied and embedded AI systems that enable intelligent agents to perceive, understand, and act in the physical world through hardware-software co-design and physics-informed machine learning.