Agent

使用 AI 驱动的浏览器智能体自动化复杂工作流。

文档索引

可在此获取完整文档索引：https://docs.stagehand.dev/llms.txt

在进一步浏览前，可使用该文件发现所有可用页面。

Agent

使用 AI 驱动的浏览器智能体自动化复杂工作流。

什么是 `agent()`？

await agent.execute("在 Browserbase 申请一份工作")

agent 会把高层级任务转换为全自主的浏览器工作流。你可以通过指定 LLM 提供商与模型、设置行为用的自定义指令，以及配置最大步数，来定制这个智能体。

为什么使用 `agent()`？

多步骤工作流

自动执行复杂操作序列。

跳转到 Agent 执行配置

视觉理解

像人类一样通过计算机视觉查看并理解 Web 界面。

查看原文

使用 `agent()`

在 Stagehand 中，有三种方式创建 Agent：

使用 Computer Use Agent（CUA 模式）
使用任意 LLM 的 Agent（DOM 模式）
使用视觉与 DOM 的 Agent（Hybrid 混合模式）

功能可用性

某些高级特性仅在特定 Agent 模式下可用：

功能	CUA	DOM	Hybrid
基础执行	✅	✅	✅
自定义工具	✅	✅	✅
MCP 集成	✅	✅	✅
系统提示词	✅	✅	✅
变量	❌	✅	✅
流式输出	❌	✅	✅
回调	❌	✅	✅
Abort signal	❌	✅	✅
消息续接	❌	✅	✅
排除工具	❌	✅	✅
结构化输出	❌	✅	✅
基于 DOM 的动作	❌	✅	✅
基于坐标的动作	✅	❌	✅
可视光标高亮	✅	❌	✅

Computer Use Agents

你可以像下面这样使用来自 Google、OpenAI、Anthropic 或 Microsoft 的专用 computer-use 模型，并将 mode 设置为 "cua"。若要比较不同 computer-use 模型的表现，可以访问它们的 evals 页面。

const agent = stagehand.agent({
    mode: "cua",
    model: "google/gemini-3-flash-preview",
    systemPrompt: "You are a helpful assistant...",
});

await agent.execute({
    instruction: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.",
    maxSteps: 20,
    highlightCursor: true
})

const agent = stagehand.agent({
    mode: "cua",
    model: "openai/computer-use-preview",
    systemPrompt: "You are a helpful assistant...",
});

await agent.execute({
    instruction: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.",
    maxSteps: 20,
    highlightCursor: true
})

const agent = stagehand.agent({
    mode: "cua",
    model: "anthropic/claude-sonnet-4-6",
    systemPrompt: "You are a helpful assistant...",
});

await agent.execute({
    instruction: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.",
    maxSteps: 20,
    highlightCursor: true
})

将 Stagehand Agent 与任意 LLM 搭配使用

不指定 provider 也可以使用 agent，从而接入任意模型或 LLM 提供商：

const agent = stagehand.agent();
await agent.execute("在 Browserbase 申请一份工作")

可用的 Agent 模型

查看如何在 Stagehand Agent 中使用不同模型的指南。

查看原文

Hybrid 模式

DOM 模式与 CUA 模式各有优缺点。Hybrid 模式将两者结合，让 Agent 同时访问基于坐标与基于 DOM 的工具，更好地弥补各自短板。

Google Hybrid
Anthropic Hybrid

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  experimental: true, // Hybrid 模式必需
});
await stagehand.init();

const agent = stagehand.agent({
  mode: "hybrid",
  model: "google/gemini-3-flash-preview",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com");

await agent.execute({
  instruction: "Click the sign up button and fill out the registration form",
  maxSteps: 20,
});

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  experimental: true, // Hybrid 模式必需
});
await stagehand.init();

const agent = stagehand.agent({
  mode: "hybrid",
  model: "anthropic/claude-haiku-4-5-20251001",
  systemPrompt: "You are a helpful assistant that interacts with web pages visually.",
});

await agent.execute({
  instruction: "Navigate the page and interact with the form elements",
  maxSteps: 15,
  highlightCursor: true, // 在 hybrid 模式中默认启用
});

`agent()` 的返回值是什么？

当你使用 agent() 时，Stagehand 会返回一个 Promise<AgentResult>，结构如下：

{
  success: true,
  message: "The first name and email fields have been filled successfully with 'John' and 'john@example.com'.",
  actions: [
    {
      type: 'ariaTree',
      reasoning: undefined,
      taskCompleted: true,
      pageUrl: 'https://example.com',
      timestamp: 1761598722055
    },
    {
      type: 'act',
      reasoning: undefined,
      taskCompleted: true,
      action: 'type "John" into the First Name textbox',
      playwrightArguments: {...},
      pageUrl: 'https://example.com',
      timestamp: 1761598731643
    },
    {
      type: 'close',
      reasoning: "The first name and email fields have been filled successfully.",
      taskCompleted: true,
      taskComplete: true,
      pageUrl: 'https://example.com',
      timestamp: 1761598732861
    }
  ],
  completed: true,
  // 仅在提供 `output` schema 时填充（仅 DOM / Hybrid 模式）
  output: {
    price: "$199",
    airline: "Delta"
  },
  usage: {
    input_tokens: 2040,
    output_tokens: 28,
    reasoning_tokens: 12,
    cached_input_tokens: 0,
    inference_time_ms: 14079
  }
}

自定义 Agent 工具

Stagehand Agent 自带浏览器自动化工具，但你也可以通过添加自己的自定义工具或排除内置工具来定制工具集。

添加自定义工具

自定义工具可以为 Agent 增加额外能力，以获得更细粒度的控制和更好的性能。与 MCP 集成不同，自定义工具是内联定义的，并且直接在你的应用内部执行。

定义自定义工具

使用 @browserbasehq/stagehand 导出的 tool 辅助函数来定义自定义工具：

import { tool } from "@browserbasehq/stagehand";
import { z } from "zod";

const agent = stagehand.agent({
  model: "openai/gpt-5",
  tools: {
    getWeather: tool({
      description: 'Get the current weather in a location',
      inputSchema: z.object({
        location: z.string().describe('The location to get weather for'),
      }),
      execute: async ({ location }) => {
        // 在这里编写你的自定义逻辑
        const weather = await fetchWeatherAPI(location);
        return {
          location,
          temperature: weather.temp,
          conditions: weather.conditions,
        };
      },
    }),
  },
  systemPrompt: 'You are a helpful assistant with access to weather data.',
});

await agent.execute("What's the weather in San Francisco and should I bring an umbrella?");

import { tool } from "@browserbasehq/stagehand";
import { z } from "zod";

const agent = stagehand.agent({
  mode: "cua",
  model: "anthropic/claude-sonnet-4-6",
  tools: {
    searchDatabase: tool({
      description: 'Search for records in the database',
      inputSchema: z.object({
        query: z.string().describe('The search query'),
        limit: z.number().optional().describe('Max results to return'),
      }),
      execute: async ({ query, limit = 10 }) => {
        const results = await db.search(query, limit);
        return { results };
      },
    }),

    calculatePrice: tool({
      description: 'Calculate the total price with tax',
      inputSchema: z.object({
        amount: z.number().describe('The base amount'),
        taxRate: z.number().describe('Tax rate as decimal (e.g., 0.08 for 8%)'),
      }),
      execute: async ({ amount, taxRate }) => {
        const total = amount * (1 + taxRate);
        return { total: total.toFixed(2) };
      },
    }),
  },
});

await agent.execute("Find products under $50 and calculate the total with 8% tax");

import { tool } from "@browserbasehq/stagehand";
import { z } from "zod";

const agent = stagehand.agent({
  model: "google/gemini-2.0-flash",
  tools: {
    sendEmail: tool({
      description: 'Send an email via SendGrid',
      inputSchema: z.object({
        to: z.string().email().describe('Recipient email address'),
        subject: z.string().describe('Email subject'),
        body: z.string().describe('Email body content'),
      }),
      execute: async ({ to, subject, body }) => {
        const response = await fetch('https://api.sendgrid.com/v3/mail/send', {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${process.env.SENDGRID_API_KEY}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            personalizations: [{ to: [{ email: to }] }],
            from: { email: 'noreply@example.com' },
            subject,
            content: [{ type: 'text/plain', value: body }],
          }),
        });

        return {
          sent: response.ok,
          messageId: response.headers.get('X-Message-Id'),
        };
      },
    }),
  },
});

await agent.execute("Fill out the contact form and send me a confirmation email at user@example.com");

自定义工具 vs MCP 集成

自定义工具	MCP 集成
与你的代码内联定义	连接到外部服务
直接执行函数	标准化协议
性能更好、上下文更优化	可跨应用复用
TypeScript 类型安全	可访问预构建集成
细粒度控制	基于网络的通信

排除内置工具

阻止 Agent 在执行期间使用某些特定的内置工具。当你希望限制 Agent 能力，或避免某些行为时，这很有用。

基础用法

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // excludeTools 必需
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com");

// 排除 screenshot 与 extract 工具
const result = await agent.execute({
  instruction: "Navigate through the website and click the submit button",
  maxSteps: 15,
  excludeTools: ["screenshot", "extract"],
});

按模式区分的可用工具

你可以排除哪些工具，取决于 Agent 的模式：

DOM 模式
Hybrid 模式

工具	说明
`act`	执行语义动作（点击、输入等）
`fillForm`	使用 DOM 选择器填写表单字段
`ariaTree`	获取页面的无障碍树
`extract`	从页面提取结构化数据
`goto`	导航到 URL
`scroll`	使用语义方向滚动（上 / 下 / 左 / 右）
`keys`	按下键盘按键
`navback`	在历史记录中后退
`screenshot`	截图
`think`	Agent 推理 / 规划步骤
`wait`	按时间或条件等待
`search`	Web 搜索（需要 `useSearch: true` 与 `BROWSERBASE_API_KEY`）

工具	说明
`click`	在特定坐标点击
`type`	在坐标处输入文本
`dragAndDrop`	从一个点拖拽到另一个点
`clickAndHold`	在坐标处点击并按住
`fillFormVision`	使用视觉 / 坐标填写表单
`act`	执行语义动作
`ariaTree`	获取无障碍树
`extract`	从页面提取数据
`goto`	导航到 URL
`scroll`	使用坐标滚动
`keys`	按下键盘按键
`navback`	后退导航
`screenshot`	截图
`think`	Agent 推理步骤
`wait`	按时间 / 条件等待
`search`	Web 搜索（需要 `useSearch: true` 与 `BROWSERBASE_API_KEY`）

使用场景

// 阻止 Agent 在执行期间截图
const result = await agent.execute({
  instruction: "Fill out the contact form",
  excludeTools: ["screenshot"],
});

// 阻止 Agent 提取数据
const result = await agent.execute({
  instruction: "Click through the signup flow",
  excludeTools: ["extract"],
});

// 禁用 Web 搜索能力
const result = await agent.execute({
  instruction: "Find information on the current page",
  excludeTools: ["search"],
});

Web 搜索

在 agent.execute() 中设置 useSearch: true，即可启用 search 工具。这会让 Agent 能够使用 Browserbase Search API 执行 Web 搜索，当 Agent 需要在导航前先查找 URL 或收集信息时非常有用。

const result = await agent.execute({
  instruction: "Find the latest pricing for Browserbase",
  useSearch: true,
  maxSteps: 20,
});

变量

使用变量可以把敏感数据（例如密码、API key 或个人信息）传给 Agent，而不会把真实值暴露给 LLM。Agent 只能看到变量名与描述，实际值会在运行时替换进去。

基础用法

const stagehand = new Stagehand({
  env: "LOCAL",
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com/login");

const result = await agent.execute({
  instruction: "Log into the website using my credentials",
  maxSteps: 10,
  variables: {
    username: {
      value: "john@example.com",
      description: "The user's email address for login"
    },
    password: {
      value: process.env.USER_PASSWORD,
      description: "The user's password for login"
    }
  }
});

变量与 act() 使用相同的类型。你可以传入简单值，也可以传入带描述的富对象：

// 简单值（与 act 相同的格式）
variables: {
  username: "john@example.com",
  password: "secret123",
}

// 带描述的富值（帮助 Agent 理解上下文）
variables: {
  username: { value: "john@example.com", description: "The login email" },
  password: { value: "secret123", description: "The login password" },
}

变量如何工作

LLM 只接收描述： Agent 会在系统提示词中看到变量名和描述，但永远看不到真实值。
占位符语法： 当 LLM 需要使用变量时，会使用 %variableName% 语法（例如："type %password% into the password field"）。
运行时替换： 真实值会在动作执行前的最后一刻被替换进去。
安全日志： 变量值永远不会出现在日志或工具输出中。

支持的工具

变量可与以下 Agent 工具配合使用：

DOM 模式
Hybrid 模式

工具	用法
`act`	在动作描述中使用 `%variableName%`
`fillForm`	在字段值中使用 `%variableName%`

工具	用法
`type`	在输入文本中使用 `%variableName%`
`fillFormVision`	在字段值中使用 `%variableName%`
`act`	在动作描述中使用 `%variableName%`

缓存优化

变量在设计上就对缓存友好：

缓存键只使用变量名，不使用变量值。
改变变量值（例如不同密码）不会使缓存执行失效。
这使你能够用不同凭据高效重放同一个工作流。

// 对敏感数据使用变量
variables: {
  apiKey: {
    value: process.env.API_KEY,
    description: "API key for authentication"
  }
}

// 不要在指令里硬编码敏感值
instruction: "Log in with password 'secret123'"

MCP 集成

Agent 可以通过 MCP（Model Context Protocol，模型上下文协议）集成外部工具与服务来增强能力。这让你的 Agent 不仅能做浏览器交互，还能访问外部 API 与数据源。

直接传入 URL
创建连接

const agent = stagehand.agent({
    mode: "cua",
    model: "openai/computer-use-preview",
    integrations: [
      `https://mcp.exa.ai/mcp?exaApiKey=${process.env.EXA_API_KEY}`,
    ],
    systemPrompt: `You have access to web search through Exa. Use it to find current information before browsing.`
});

await agent.execute("Search for the best headphones of 2025 and go through checkout for the top recommendation");

import { connectToMCPServer } from "@browserbasehq/stagehand";

const supabaseClient = await connectToMCPServer(
  `https://server.smithery.ai/@supabase-community/supabase-mcp/mcp?api_key=***`
);

const agent = stagehand.agent({
    mode: "cua",
    model: "openai/computer-use-preview",
    integrations: [supabaseClient],
    systemPrompt: `You can interact with Supabase databases. Use these tools to store and retrieve data.`
});

await agent.execute("Search for restaurants and save the first result to the database");

流式输出

启用流式模式后，你可以接收 Agent 的增量响应。这对于构建实时 UI 很有用，因为你可以在 Agent 推进过程中展示它的推理与状态。

启用流式模式

在 Agent 配置中设置 stream: true 以启用流式输出：

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // streaming 必需
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true, // 启用流式模式
});

const streamResult = await agent.execute({
  instruction: "Search for headphones on Amazon",
  maxSteps: 20,
});

// 增量输出文本流
for await (const delta of streamResult.textStream) {
  process.stdout.write(delta);
}

// 流结束后获取最终结果
const finalResult = await streamResult.result;
console.log("Completed:", finalResult.completed);

流对象属性

启用流式输出后，execute() 会返回一个 AgentStreamResult，包含：

属性	类型	说明
`textStream`	`AsyncIterable<string>`	Agent 的增量文本输出
`fullStream`	`AsyncIterable<StreamPart>`	包含工具调用与消息在内的全部流事件
`result`	`Promise<AgentResult>`	流结束后的最终结果

// 输出所有内容（工具调用、消息等）
for await (const event of streamResult.fullStream) {
  console.log(event);
}

回调

回调允许你接入 Agent 的执行生命周期，以便监控进度、记录事件或修改行为。

可用回调

非流式
流式

当 stream: false（默认）时，可使用以下回调：

回调	说明
`prepareStep`	在每个 LLM 步骤前调用，用于修改设置
`onStepFinish`	在每个步骤完成时调用

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

await agent.execute({
  instruction: "Fill out the contact form",
  maxSteps: 10,
  callbacks: {
    prepareStep: async (stepContext) => {
      console.log(`Starting step ${stepContext.stepNumber}`);
      return stepContext; // 返回修改后的或原始上下文
    },
    onStepFinish: async (event) => {
      console.log(`Step finished: ${event.finishReason}`);
      if (event.toolCalls) {
        for (const tc of event.toolCalls) {
          console.log(`Tool called: ${tc.toolName}`);
        }
      }
    },
  },
});

当 stream: true 时，还可以使用以下额外回调：

回调	说明
`prepareStep`	在每个 LLM 步骤前调用，用于修改设置
`onStepFinish`	在每个步骤完成时调用
`onChunk`	在每个流式分块到达时调用
`onFinish`	流结束时调用
`onError`	出错时调用
`onAbort`	流被中止时调用

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true,
});

const streamResult = await agent.execute({
  instruction: "Search for products",
  maxSteps: 15,
  callbacks: {
    onChunk: async (chunk) => {
      // 每个增量分块都会触发
      console.log("Chunk received:", chunk);
    },
    onStepFinish: async (event) => {
      console.log(`Step completed: ${event.finishReason}`);
    },
    onFinish: (event) => {
      console.log("Stream finished!");
      console.log("Total steps:", event.steps.length);
    },
    onError: ({ error }) => {
      console.error("Stream error:", error);
    },
    onAbort: (event) => {
      console.log("Stream aborted after", event.steps.length, "steps");
    },
  },
});

// 别忘了消费这个流
for await (const delta of streamResult.textStream) {
  process.stdout.write(delta);
}

await streamResult.result;

Abort Signal

你可以通过 AbortSignal 随时取消 Agent 执行。这对于实现超时控制或允许用户手动停止长任务非常有用。

基础用法

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // abort signal 必需
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const controller = new AbortController();

// 设置 30 秒超时
setTimeout(() => controller.abort(), 30000);

try {
  const result = await agent.execute({
    instruction: "Complete a complex multi-step task",
    maxSteps: 50,
    signal: controller.signal,
  });
} catch (error) {
  if (error.name === "AgentAbortError") {
    console.log("Task was cancelled");
  }
}

与流式输出一起中止

Abort signal 也适用于流式模式：

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true,
});

const controller = new AbortController();

const streamResult = await agent.execute({
  instruction: "Describe every element on the page",
  maxSteps: 50,
  signal: controller.signal,
  callbacks: {
    onAbort: (event) => {
      console.log(`Aborted after ${event.steps.length} steps`);
    },
  },
});

// 收到 10 个分块后中止
let chunkCount = 0;
for await (const delta of streamResult.textStream) {
  process.stdout.write(delta);
  chunkCount++;
  if (chunkCount >= 10) {
    controller.abort();
    break;
  }
}

// result promise 会以 AgentAbortError 拒绝
try {
  await streamResult.result;
} catch (error) {
  console.log("Stream was aborted:", error.message);
}

自定义中止原因

你可以在中止时传入原因：

controller.abort("User cancelled the operation");

// 错误消息会包含你的原因
// Error: "User cancelled the operation"

消息续接

通过把上一次结果中的 messages 传回去，你可以在多次 Agent 执行之间延续对话。这对于多轮交互，或在保持上下文的前提下把复杂任务拆成多个步骤，非常有用。

基础续接

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // message continuation 必需
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com/products");

// 第一次执行：搜索产品
const firstResult = await agent.execute({
  instruction: "Search for wireless headphones and note the top 3 results",
  maxSteps: 10,
});

console.log("First task:", firstResult.message);

// 延续相同上下文：继续追问
const secondResult = await agent.execute({
  instruction: "Now filter by price under $100 and tell me which of those 3 are still available",
  maxSteps: 10,
  messages: firstResult.messages, // 传入上一轮对话
});

console.log("Follow-up:", secondResult.message);

// 继续基于历史上下文执行动作
const thirdResult = await agent.execute({
  instruction: "Add the cheapest one to the cart",
  maxSteps: 10,
  messages: secondResult.messages, // 串联会话
});

console.log("Final action:", thirdResult.message);

结构化输出

定义一个 Zod schema，以便在 Agent 完成任务时接收结构化数据。当你需要从 Agent 执行结果中提取特定信息（例如价格、日期或其他结构化数据）时，这很有用。

import { z } from "zod";

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // structured output 必需
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://www.google.com/flights");

const result = await agent.execute({
  instruction: "Find the cheapest flight from NYC to LA for next week",
  maxSteps: 20,
  output: z.object({
    price: z.string().describe("The price of the flight"),
    airline: z.string().describe("The airline name"),
    departureTime: z.string().describe("Departure time"),
    arrivalTime: z.string().describe("Arrival time"),
  }),
});

// 访问结构化输出
console.log(result.output);
// { price: "$199", airline: "Delta", departureTime: "8:00 AM", arrivalTime: "11:30 AM" }

const result = await agent.execute({
  instruction: "Extract all items from the shopping cart",
  output: z.object({
    items: z.array(z.object({
      name: z.string().describe("Product name"),
      quantity: z.number().describe("Quantity in cart"),
      unitPrice: z.string().describe("Price per item"),
      totalPrice: z.string().describe("Total price for this item"),
    })).describe("List of items in the cart"),
    subtotal: z.string().describe("Cart subtotal before tax"),
    tax: z.string().optional().describe("Tax amount if shown"),
    total: z.string().describe("Final total"),
  }),
});

console.log(`Found ${result.output?.items.length} items in cart`);
console.log(`Total: ${result.output?.total}`);

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true,
});

const streamResult = await agent.execute({
  instruction: "Find the top 3 search results",
  output: z.object({
    results: z.array(z.object({
      title: z.string().describe("The title of the search result"),
      url: z.string().url().describe("The URL of the search result"),
      snippet: z.string().describe("A brief description or snippet"),
    })).max(3).describe("Top 3 search results"),
  }),
});

// 输出文本流
for await (const delta of streamResult.textStream) {
  process.stdout.write(delta);
}

// 从最终结果中获取结构化输出
const finalResult = await streamResult.result;
console.log(finalResult.output?.results);

Agent 执行配置

你可以通过 maxSteps 参数控制 Agent 为完成任务最多可以执行多少步。

基础示例
复杂任务

// 设置 maxSteps，以控制 Agent 最多能执行多少个动作
await agent.execute({
  instruction: "Sign me up for a library card",
  maxSteps: 15 // 如果任务未完成，Agent 会在 15 步后停止
});

对于复杂任务，请提高 maxSteps 上限，并检查任务是否成功。

// 需要更多动作的复杂多步骤任务
const result = await agent.execute({
  instruction: "Find and apply for software engineering jobs, filtering by remote work and saving 3 applications",
  maxSteps: 30, // 复杂工作流使用更高上限
});

// 检查任务是否成功完成
if (result.success === true) {
  console.log("Task completed successfully!");
} else {
  console.log("Task failed or was incomplete");
}

最佳实践

遵循这些最佳实践可以提升 Agent 的成功率、减少执行时间，并尽量降低任务完成过程中的意外错误。

从正确的页面开始

在执行任务之前，先导航到目标页面：

这样做
不要这样做

await page.goto('https://github.com/browserbase/stagehand');
await agent.execute('Get me the latest PR on the stagehand repo');

await agent.execute('Go to GitHub and find the latest PR on browserbase/stagehand');

说得更具体

提供更详细的指令，通常能得到更好的结果：

这样做
不要这样做

await agent.execute("Find Italian restaurants in Brooklyn that are open after 10pm and have outdoor seating");

await agent.execute("Find a restaurant");

故障排查

Agent 会在完成任务前停止

问题： Agent 在完成你要求的任务前就停止了。

解决方案：

检查 Agent 是否触发了 maxSteps 上限（默认是 20）。
对复杂任务提高 maxSteps：例如 maxSteps: 30 或更高。
将非常复杂的任务拆成更小的连续执行。

// 对复杂任务提高 maxSteps
await agent.execute({
  instruction: "Complete the multi-page registration form with all required information",
  maxSteps: 40 // 为复杂任务提高上限
});

// 或拆成更小的任务，并检查成功状态
const firstResult = await agent.execute({
  instruction: "Fill out page 1 of the registration form",
  maxSteps: 15
});

// 只有第一次成功才继续
if (firstResult.success === true) {
  await agent.execute({
    instruction: "Navigate to page 2 and complete remaining fields",
    maxSteps: 15
  });
} else {
  console.log("First task failed, stopping execution");
}

Agent 无法点击正确的元素

问题： Agent 点击了错误的元素，或无法与正确的 UI 组件交互。

解决方案：

确保使用合适的视口尺寸：Stagehand 默认使用 1288x711（对 Computer Use 模型最优）。
尽量避免修改视口尺寸，因为其他尺寸可能降低性能。

下一步

Act

高效执行基于 observe 结果的动作。

查看原文

Extract

从观察到的元素中提取结构化数据。

查看原文

Agent

Agent

什么是 agent()？

为什么使用 agent()？

使用 agent()

功能可用性

Computer Use Agents

将 Stagehand Agent 与任意 LLM 搭配使用

Hybrid 模式

agent() 的返回值是什么？

自定义 Agent 工具

添加自定义工具

定义自定义工具

自定义工具 vs MCP 集成

排除内置工具

基础用法

按模式区分的可用工具

使用场景

Web 搜索

变量

基础用法

变量如何工作

支持的工具

缓存优化

最佳实践

MCP 集成

流式输出

启用流式模式

流对象属性

回调

可用回调

Abort Signal

基础用法

与流式输出一起中止

自定义中止原因

消息续接

基础续接

结构化输出

Agent 执行配置

最佳实践

从正确的页面开始

说得更具体

故障排查

下一步

First Steps

The Basics

Configuration

Best Practices

Integrations

什么是 `agent()`？

为什么使用 `agent()`？

使用 `agent()`

`agent()` 的返回值是什么？