Control an Android Phone with Gemini 3.5 Flash Computer Use
Gemini 3.5 Flash has a built-in Computer Use capability that lets it control Android devices by analyzing screenshots and issuing structured actions such as taps, swipes, and text input. This guide walks through setting up an Android emulator on a Mac without Android Studio, installing Python dependencies, and running an agent loop that connects the Google GenAI SDK to ADB to execute the actions the model requests. Supported mobile actions include open_app, click, type, long_press, drag_and_drop, press_key, go_back, wait, list_apps, and take_screenshot, with coordinates expressed on a normalized 0-999 grid. The guide also covers connecting to physical or remote devices and includes notes on extending the approach to iOS, adding production robustness, and handling safety decisions.
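To make the normalized grid concrete, here is a minimal sketch of how a model-requested action might be translated into an ADB command. It rests on several assumptions that are not confirmed by the text above: each axis is divided into 1000 steps, action arguments arrive as a dict with `x` and `y` keys, and the helper names `to_pixels` and `adb_command` are hypothetical. Only the `adb shell input tap`, `input swipe`, and `input keyevent` commands are standard ADB.

```python
from typing import Optional

GRID = 1000  # assumption: the 0-999 grid divides each axis into 1000 steps


def to_pixels(x_norm: int, y_norm: int, width: int, height: int) -> tuple:
    """Map a point on the normalized 0-999 grid to device pixel coordinates."""
    if not (0 <= x_norm < GRID and 0 <= y_norm < GRID):
        raise ValueError(f"coordinates out of range: ({x_norm}, {y_norm})")
    # Integer arithmetic avoids floating-point rounding surprises.
    return x_norm * width // GRID, y_norm * height // GRID


def adb_command(action: str, args: dict, width: int, height: int,
                serial: Optional[str] = None) -> list:
    """Build (but do not run) the adb argv for a few of the listed actions.

    The argument keys "x" and "y" are hypothetical; check the actual
    function-call payload returned by the model before relying on them.
    """
    base = ["adb"] + (["-s", serial] if serial else []) + ["shell", "input"]
    if action == "click":
        x, y = to_pixels(args["x"], args["y"], width, height)
        return base + ["tap", str(x), str(y)]
    if action == "long_press":
        x, y = to_pixels(args["x"], args["y"], width, height)
        # A swipe that starts and ends at the same point, held for 1000 ms,
        # is a common way to emulate a long press.
        return base + ["swipe", str(x), str(y), str(x), str(y), "1000"]
    if action == "go_back":
        return base + ["keyevent", "KEYCODE_BACK"]
    raise NotImplementedError(f"action not covered by this sketch: {action}")
```

On a 1080x2400 screen, `adb_command("click", {"x": 500, "y": 500}, 1080, 2400)` returns `["adb", "shell", "input", "tap", "540", "1200"]`, which can then be passed to `subprocess.run(..., check=True)`. Building the command separately from running it keeps the mapping easy to test without a connected device.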