Open-source AI that pays for itself: Block's Vision for AI Integrations
Just when I think I've grasped the full landscape of AI coding assistants, another compelling tool I'd never encountered invariably surfaces. codename goose (hereafter "Goose"), an open-source agent used weekly by 5,000 Block employees, shows what happens when you give an LLM a toolbox. Built at Block (formerly Square) and released under an MIT license in January 2025, Goose runs locally on engineers’ machines and pairs large-language-model reasoning with a growing library of tool integrations to form a flexible automation platform. That blend turns the agent into a workhorse capable of tackling everything from autonomous code fixes to lightning-fast incident response, converting maintenance drudgery into review-only work. Thanks to the Model Context Protocol, Goose is a blank slate: its capabilities are defined entirely by whichever tools it plugs into, rather than by hard-coded specialties. The conversation that follows, heavily edited for clarity, features Jackie Brosamer and Brad Axen, members of the team behind Goose, and unpacks how the project came together, what it means for enterprise automation, and where it is heading next.
Introduction & Overview
What exactly is Goose? Goose is an open-source, on-machine AI agent designed to automate complex engineering and knowledge-work tasks from start to finish. Developed by Block (formerly Square), it combines large language model reasoning with tool integrations to create a flexible automation platform. The project was publicly released in January 2025 under an MIT license after about nine months of internal development.
Why did Block build yet another AI copilot? The team initially set out to leverage LLMs for developer tasks, recognizing they had become genuinely useful tools for building code. However, they quickly realized the potential extended far beyond developer workflows. Because LLMs are general-purpose, an agent with the right tools can automate tasks for design teams, support agents, and many other roles. Goose was created to be this flexible agent platform capable of handling diverse automation needs across the organization.
How widely is Goose used within Block? Approximately 5,000 people at Block use Goose weekly, including both developers and non-developers. Block runs the same open source version internally (a practice called "dogfooding"), adding only proprietary authentication and security connectors required for their corporate environment.
Architecture & Technical Integration
How central is the Model Context Protocol (MCP) to Goose? While Goose predated MCP, the team integrated the protocol soon after Anthropic released it. MCP is powerful because it provides a standard way for Goose to connect with any model (Anthropic, OpenAI, or open-source options) and integrate with diverse data sources like GitHub, Slack, Google Calendar, and custom internal systems. Goose is essentially a "blank slate" until connected to tools via MCP: its capabilities depend entirely on these connections. This makes it highly customizable without changing code.
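The "blank slate" idea can be made concrete with a small sketch. This is not the actual Goose or MCP SDK API; it is a minimal, hypothetical tool-registry pattern in plain Python, where the agent has no capabilities of its own and can only invoke whatever tools have been registered:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """A callable tool exposed to the agent; names and schema are illustrative."""
    name: str
    description: str
    run: Callable[..., Any]


class Agent:
    """A 'blank slate' agent: it can do nothing until tools are registered."""

    def __init__(self) -> None:
        self.tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self.tools:
            raise KeyError(f"no such tool: {name}")
        return self.tools[name].run(**kwargs)


agent = Agent()
# A stand-in for an MCP server exposing a GitHub tool.
agent.register(Tool("list_issues", "List open GitHub issues",
                    run=lambda repo: [f"{repo}#1: fix flaky test"]))
print(agent.call("list_issues", repo="block/goose"))
```

In a real MCP setup, each server advertises its tools and schemas over the protocol rather than via an in-process registry, but the customization story is the same: swapping the set of registered tools changes what the agent can do without changing the agent's code.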
What models work with Goose, and which are most popular? Goose supports any LLM, and users can hot-swap models mid-conversation. Currently, users gravitate toward "frontier models," the latest and most capable options. The Anthropic Sonnet family and OpenAI's reasoner models see the most use. Interestingly, users often employ different models for different tasks within the same conversation: one model for planning and design, then switching to Sonnet for execution. Gemini 2.5 Pro, with its 1-million-token context window, handles tasks requiring large amounts of content. The proactiveness of the Sonnet family is a key reason for its popularity.
How does Goose handle context window limitations? The team considers this a fundamental challenge requiring multiple strategies:
Smart summarization over long context windows
Selective context retrieval using RAG to identify which tools are relevant for specific queries
Enabling the agent to navigate information through tool-calling (like using a knowledge graph)
Multi-turn searching that outperforms simple semantic search
Having the agent iteratively search codebases using tools like ripgrep rather than dumping all results into the prompt
The goal is feeding the right tokens into the context window, not all available tokens.
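The selective-retrieval idea above can be sketched in a few lines. This is not Goose's implementation; it is a hypothetical example that uses crude word overlap as a stand-in for the embedding similarity a real RAG setup would use, to pick which tool schemas deserve space in the context window:

```python
def score(query: str, description: str) -> int:
    """Crude relevance score: shared-word count (a stand-in for
    embedding similarity in a real RAG pipeline)."""
    return len(set(query.lower().split()) & set(description.lower().split()))


def select_tools(query: str, tools: dict[str, str], k: int = 2) -> list[str]:
    """Return the k tool names most relevant to the query, so only their
    schemas are placed in the context window rather than all of them."""
    ranked = sorted(tools, key=lambda name: score(query, tools[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical tool descriptions, as an MCP server might advertise them.
tools = {
    "github_search": "search code and issues on github",
    "calendar_read": "read events from google calendar",
    "slack_post": "post a message to a slack channel",
}
print(select_tools("find the github issue about the failing search", tools))
```

With dozens of connected servers exposing hundreds of tools, this kind of pre-filtering is what keeps tool definitions from crowding out the tokens that actually matter.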
Can Goose work with local models? Yes, hobbyists in the open source community are seeing success running Goose with local models like Llama. While local models don't yet solve all enterprise-scale coding problems, they're becoming increasingly capable for many practical tasks.
Developer Experience & Workflow
How are new engineers onboarded to Goose? Goose is auto-installed on every new Block laptop. New users typically start with the chat interface, which works like any standard chat when the agent isn't using tools. As users interact, Goose proactively suggests available tools and capabilities, naturally guiding them into its full feature set. For instance, when discussing a codebase, Goose might ask if the user wants it to attempt changes, dynamically evolving the interaction.
How do developers typically use Goose in their workflow? Usage patterns vary by task type:
For one-off tasks like dashboards or data visualizations, engineers often let Goose generate everything ("vibe coding")
For maintained codebases, a common pattern is ~90% AI-generated code with engineers handling the final 10% for quality assurance
For longer-running processes, developers ask Goose to work on the side, then return to their IDE to review diffs and make minor tweaks
Notebook users can chat with Goose while it fills in cells for model training or data analysis
Can engineers still use other AI assistants alongside Goose? Block maintains a liberal "bring your own assistant" policy, recognizing that different tools suit different problems. Complex codebases might work better with autocomplete-style tools like Cursor, while Goose excels at volume fixes and smaller codebases. This experimentation helps the organization learn which patterns map to which assistants.
Practical Applications & Use Cases
What are "recipes" and why are they important? Recipes are asynchronous, trigger-driven workflows that Goose can execute autonomously. For example, Goose can monitor GitHub issues and automatically attempt fixes, or address security vulnerabilities. The human still reviews and gets credit for the PR, but the agent handles the initial autonomous work. This transforms maintenance drudgery into review-only work.
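A recipe of this shape can be sketched as a simple trigger-and-draft loop. This is not Goose's actual recipe format; every function here is a hypothetical stub standing in for real MCP tool calls (fetching issues, generating a fix), with the human review step deliberately left outside the loop:

```python
def fetch_open_issues() -> list[dict]:
    """Stub for a GitHub MCP tool; a real recipe would query the live API."""
    return [{"id": 42, "title": "bump vulnerable dependency", "label": "security"}]


def attempt_fix(issue: dict) -> str:
    """Stub for the agent's autonomous fix attempt; returns a branch name."""
    return f"goose/fix-{issue['id']}"


def run_recipe() -> list[str]:
    """One pass of a trigger-driven recipe: the agent drafts fixes for
    matching issues, and a human still reviews and merges each PR."""
    drafts = []
    for issue in fetch_open_issues():
        if issue["label"] == "security":
            branch = attempt_fix(issue)
            drafts.append(f"draft PR from {branch} for issue {issue['id']}")
    return drafts


print(run_recipe())
```

The point of the pattern is the division of labor: the trigger and first attempt run unattended, while authorship and accountability stay with the engineer who reviews the draft.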
What specific tasks can Goose automate end-to-end? Goose handles a wide range of tasks with varying levels of autonomy:
Fully autonomous: Small tasks like handling vulnerability tickets
Semi-autonomous: Automating parts of model training, generating feature definitions in complex Java systems, documenting sensitive models
Human-partnered: Design critiques, SRE support during incidents, creating websites, generating content in Google Docs or Slack
How does Goose handle hallucinations? Hallucinations still occur, such as suggesting non-existent libraries or methods. The most successful users are quick to "savagely discard" unproductive sessions rather than trying to correct the agent. Since Goose generates code quickly, starting over multiple times is still faster than manual coding. The key is focusing on tasks with objective validation—code that runs, passes tests, or SQL queries that can be explained—which makes hallucinations easier to catch.
How is Goose being used for incident response? Goose significantly reduces time to recovery (TTR) by processing volumes of data that would overwhelm humans. When a service is down, the agent can fan the last hour of system logs out across multiple parallel LLM calls and surface the resulting insights to human responders. This parallel processing of logs and system data can potentially make incident recovery 100x faster, compared to the more modest speedups seen in code generation.
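The fan-out pattern is worth making concrete. This is not Goose's incident tooling; it is a minimal sketch in which `summarize_chunk` is a hypothetical stub for one LLM call (here just a grep for errors), and a thread pool stands in for the parallel model requests:

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_chunk(chunk: list[str]) -> list[str]:
    """Stub for one LLM call; a real system would send the chunk to a model
    and ask it to flag anomalies."""
    return [line for line in chunk if "ERROR" in line]


def triage_logs(lines: list[str], chunk_size: int = 1000) -> list[str]:
    """Fan log chunks out across parallel model calls, then merge the
    surfaced findings for a human responder."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(summarize_chunk, chunks)
    return [finding for chunk_findings in results for finding in chunk_findings]


logs = [f"INFO request {i}" for i in range(2500)] + ["ERROR db connection refused"]
print(triage_logs(logs))
```

The speedup comes from the shape of the problem: log triage during an outage is embarrassingly parallel, so throughput scales with the number of concurrent model calls rather than with one human's reading speed.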
Security & Governance
What are the security considerations with MCP servers? The team acknowledges the current "wild west" landscape of MCP servers and recommends users perform due diligence on third-party MCPs. Block sits on the MCP steering committee with Anthropic, working to improve the ecosystem through:
Adding human confirmation flags for dangerous actions
Creating a curated registry (a PyPI-style index for MCP servers) with proper vetting processes
Evolving the protocol to address issues like long-lived connections through features like the streamable HTTP transport
How does working in financial services affect AI tool adoption? Contrary to expectations, regulatory controls are actually an asset when automating with AI. Non-engineering teams handling compliance work are enthusiastic about automating responses to forms and regulatory requirements. The existing constraints help ensure proper validation as more processes become automated, and code validation is often easier than validating outputs from, say, an executive assistant.
Multi-Agent Systems & Architecture
Does Goose support multi-agent architectures? Currently, the team focuses on practical patterns rather than complex multi-agent abstractions. They're implementing:
Multiple agents running in parallel for volume work (like handling support tickets)
Quick retries, giving the agent multiple fresh attempts at a task
Single-model context summarization, which currently outperforms splitting context between multiple models
They expect this may change as foundation models evolve, but specialized agent-to-agent communication protocols need more real-world production use cases before adding significant value.
How do reasoning models factor into Goose's capabilities? Block deliberately uses the most capable (and currently most expensive) reasoning models to prove what's possible, betting that costs will continue dropping dramatically. Reasoner models excel at tasks requiring precise instruction following, such as generating rich UI elements with exact specifications. The team mixes expensive reasoner models for critical tasks with cheaper models for bulk work.
Future Directions & Industry Impact
What's on the near-term roadmap? Key priorities for the next 6-12 months include:
A complete UX redesign moving away from "design by engineers" to create a more intuitive interface for AI agent interaction
Reducing context switching so users can stay within the tool
Productizing Goose-style agents for Block's customer-facing products to help small businesses and financial management
Implementing RAG and knowledge graphs to help agents select from the 40-50 MCP servers (with hundreds of tools) typically connected to a session
How is Goose being adopted outside Block? Partner companies are contributing to the codebase and using it internally. An interesting pattern emerging is data teams using Goose to replace traditional dashboards with conversational, agent-driven insights. Engineers work with data engineers to make Goose productive for answering questions, bridging the engineering experience to other roles like PMs and marketers.
How will tools like Goose affect engineering hiring and skills? The team sees this as augmentation rather than replacement, moving engineers up levels of abstraction. For interviews, they plan to give candidates harder problems but allow them to use LLMs, recognizing that the real differentiator is how well someone leverages these tools. The emphasized skills are changing - it's about giving engineers more leverage, not eliminating engineering work.
Ben Lorica edits the Gradient Flow newsletter. He helps organize the AI Conference, the AI Agent Conference, the NLP Summit, Ray Summit, and the Data+AI Summit. He is the host of the Data Exchange podcast. You can follow him on LinkedIn, Mastodon, Reddit, Bluesky, YouTube, or TikTok. This newsletter is produced by Gradient Flow.