Large Language Models (LLMs) have demonstrated remarkable human-like capabilities in reasoning and generating digital content. However, their ability to freely sense, interact with, and act upon the physical domain remains significantly limited due to three fundamental challenges: (1) physical environments require specialized sensors for different tasks, yet deploying dedicated sensors for every application is impractical; (2) events and objects of interest are often localized to small areas within large spaces, making them difficult to detect with static sensor networks; and (3) foundation models need flexible actuation capabilities to meaningfully interact with the physical world. To bridge this gap, we introduce EmbodiedFly, an embodied LLM agent that combines a foundation model (FM) pipeline with a reconfigurable drone platform to observe, understand, and interact with the physical world. Our co-design approach features (1) an FM orchestration framework connecting multiple LLMs, VLMs, and an open-set object detection model; (2) a novel image segmentation technique that identifies task-relevant areas; and (3) a custom drone platform that autonomously reconfigures itself with appropriate sensors and actuators based on commands from the FM orchestration framework. Through real-world deployments, we demonstrate that EmbodiedFly completes diverse physical tasks with up to 85% higher success rates than traditional approaches relying on static sensor deployments.