多步骤工作流
自动执行复杂操作序列。
文档索引
可在此获取完整文档索引:https://docs.stagehand.dev/llms.txt
在进一步浏览前,可使用该文件发现所有可用页面。
使用 AI 驱动的浏览器智能体自动化复杂工作流。
agent()?await agent.execute("在 Browserbase 申请一份工作")agent 会把高层级任务转换为全自主的浏览器工作流。你可以通过指定 LLM 提供商与模型、设置行为用的自定义指令,以及配置最大步数,来定制这个智能体。
agent()?多步骤工作流
自动执行复杂操作序列。
视觉理解
像人类一样通过计算机视觉查看并理解 Web 界面。
agent()在 Stagehand 中,有三种方式创建 Agent:
某些高级特性仅在特定 Agent 模式下可用:
| 功能 | CUA | DOM | Hybrid |
|---|---|---|---|
| 基础执行 | ✅ | ✅ | ✅ |
| 自定义工具 | ✅ | ✅ | ✅ |
| MCP 集成 | ✅ | ✅ | ✅ |
| 系统提示词 | ✅ | ✅ | ✅ |
| 变量 | ❌ | ✅ | ✅ |
| 流式输出 | ❌ | ✅ | ✅ |
| 回调 | ❌ | ✅ | ✅ |
| Abort signal | ❌ | ✅ | ✅ |
| 消息续接 | ❌ | ✅ | ✅ |
| 排除工具 | ❌ | ✅ | ✅ |
| 结构化输出 | ❌ | ✅ | ✅ |
| 基于 DOM 的动作 | ❌ | ✅ | ✅ |
| 基于坐标的动作 | ✅ | ❌ | ✅ |
| 可视光标高亮 | ✅ | ❌ | ✅ |
你可以像下面这样使用来自 Google、OpenAI、Anthropic 或 Microsoft 的专用 computer-use 模型,并将 mode 设置为 "cua"。若要比较不同 computer-use 模型的表现,可以访问它们的 evals 页面。
const agent = stagehand.agent({ mode: "cua", model: "google/gemini-3-flash-preview", systemPrompt: "You are a helpful assistant...",});
await agent.execute({ instruction: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.", maxSteps: 20, highlightCursor: true})const agent = stagehand.agent({ mode: "cua", model: "openai/computer-use-preview", systemPrompt: "You are a helpful assistant...",});
await agent.execute({ instruction: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.", maxSteps: 20, highlightCursor: true})const agent = stagehand.agent({ mode: "cua", model: "anthropic/claude-sonnet-4-6", systemPrompt: "You are a helpful assistant...",});
await agent.execute({ instruction: "Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.", maxSteps: 20, highlightCursor: true})不指定 provider 也可以使用 agent,从而接入任意模型或 LLM 提供商:
const agent = stagehand.agent();await agent.execute("在 Browserbase 申请一份工作")可用的 Agent 模型
查看如何在 Stagehand Agent 中使用不同模型的指南。
DOM 模式与 CUA 模式各有优缺点。Hybrid 模式将两者结合,让 Agent 同时访问基于坐标与基于 DOM 的工具,更好地弥补各自短板。
const stagehand = new Stagehand({ env: "BROWSERBASE", experimental: true, // Hybrid 模式必需});await stagehand.init();
const agent = stagehand.agent({ mode: "hybrid", model: "google/gemini-3-flash-preview",});
const page = stagehand.context.pages()[0];await page.goto("https://example.com");
await agent.execute({ instruction: "Click the sign up button and fill out the registration form", maxSteps: 20,});const stagehand = new Stagehand({ env: "BROWSERBASE", experimental: true, // Hybrid 模式必需});await stagehand.init();
const agent = stagehand.agent({ mode: "hybrid", model: "anthropic/claude-haiku-4-5-20251001", systemPrompt: "You are a helpful assistant that interacts with web pages visually.",});
await agent.execute({ instruction: "Navigate the page and interact with the form elements", maxSteps: 15, highlightCursor: true, // 在 hybrid 模式中默认启用});agent() 的返回值是什么?当你使用 agent() 时,Stagehand 会返回一个 Promise<AgentResult>,结构如下:
{ success: true, message: "The first name and email fields have been filled successfully with 'John' and 'john@example.com'.", actions: [ { type: 'ariaTree', reasoning: undefined, taskCompleted: true, pageUrl: 'https://example.com', timestamp: 1761598722055 }, { type: 'act', reasoning: undefined, taskCompleted: true, action: 'type "John" into the First Name textbox', playwrightArguments: {...}, pageUrl: 'https://example.com', timestamp: 1761598731643 }, { type: 'close', reasoning: "The first name and email fields have been filled successfully.", taskCompleted: true, taskComplete: true, pageUrl: 'https://example.com', timestamp: 1761598732861 } ], completed: true, // 仅在提供 `output` schema 时填充(仅 DOM / Hybrid 模式) output: { price: "$199", airline: "Delta" }, usage: { input_tokens: 2040, output_tokens: 28, reasoning_tokens: 12, cached_input_tokens: 0, inference_time_ms: 14079 }}Stagehand Agent 自带浏览器自动化工具,但你也可以通过添加自己的自定义工具或排除内置工具来定制工具集。
自定义工具可以为 Agent 增加额外能力,以获得更细粒度的控制和更好的性能。与 MCP 集成不同,自定义工具是内联定义的,并且直接在你的应用内部执行。
使用 @browserbasehq/stagehand 导出的 tool 辅助函数来定义自定义工具:
import { tool } from "@browserbasehq/stagehand";import { z } from "zod";
const agent = stagehand.agent({ model: "openai/gpt-5", tools: { getWeather: tool({ description: 'Get the current weather in a location', inputSchema: z.object({ location: z.string().describe('The location to get weather for'), }), execute: async ({ location }) => { // 在这里编写你的自定义逻辑 const weather = await fetchWeatherAPI(location); return { location, temperature: weather.temp, conditions: weather.conditions, }; }, }), }, systemPrompt: 'You are a helpful assistant with access to weather data.',});
await agent.execute("What's the weather in San Francisco and should I bring an umbrella?");import { tool } from "@browserbasehq/stagehand";import { z } from "zod";
const agent = stagehand.agent({ mode: "cua", model: "anthropic/claude-sonnet-4-6", tools: { searchDatabase: tool({ description: 'Search for records in the database', inputSchema: z.object({ query: z.string().describe('The search query'), limit: z.number().optional().describe('Max results to return'), }), execute: async ({ query, limit = 10 }) => { const results = await db.search(query, limit); return { results }; }, }),
calculatePrice: tool({ description: 'Calculate the total price with tax', inputSchema: z.object({ amount: z.number().describe('The base amount'), taxRate: z.number().describe('Tax rate as decimal (e.g., 0.08 for 8%)'), }), execute: async ({ amount, taxRate }) => { const total = amount * (1 + taxRate); return { total: total.toFixed(2) }; }, }), },});
await agent.execute("Find products under $50 and calculate the total with 8% tax");import { tool } from "@browserbasehq/stagehand";import { z } from "zod";
const agent = stagehand.agent({ model: "google/gemini-2.0-flash", tools: { sendEmail: tool({ description: 'Send an email via SendGrid', inputSchema: z.object({ to: z.string().email().describe('Recipient email address'), subject: z.string().describe('Email subject'), body: z.string().describe('Email body content'), }), execute: async ({ to, subject, body }) => { const response = await fetch('https://api.sendgrid.com/v3/mail/send', { method: 'POST', headers: { 'Authorization': `Bearer ${process.env.SENDGRID_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ personalizations: [{ to: [{ email: to }] }], from: { email: 'noreply@example.com' }, subject, content: [{ type: 'text/plain', value: body }], }), });
return { sent: response.ok, messageId: response.headers.get('X-Message-Id'), }; }, }), },});
await agent.execute("Fill out the contact form and send me a confirmation email at user@example.com");| 自定义工具 | MCP 集成 |
|---|---|
| 与你的代码内联定义 | 连接到外部服务 |
| 直接执行函数 | 标准化协议 |
| 性能更好、上下文更优化 | 可跨应用复用 |
| TypeScript 类型安全 | 可访问预构建集成 |
| 细粒度控制 | 基于网络的通信 |
阻止 Agent 在执行期间使用某些特定的内置工具。当你希望限制 Agent 能力,或避免某些行为时,这很有用。
const stagehand = new Stagehand({ env: "LOCAL", experimental: true, // excludeTools 必需});await stagehand.init();
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929",});
const page = stagehand.context.pages()[0];await page.goto("https://example.com");
// 排除 screenshot 与 extract 工具const result = await agent.execute({ instruction: "Navigate through the website and click the submit button", maxSteps: 15, excludeTools: ["screenshot", "extract"],});你可以排除哪些工具,取决于 Agent 的模式:
| 工具 | 说明 |
|---|---|
act | 执行语义动作(点击、输入等) |
fillForm | 使用 DOM 选择器填写表单字段 |
ariaTree | 获取页面的无障碍树 |
extract | 从页面提取结构化数据 |
goto | 导航到 URL |
scroll | 使用语义方向滚动(上 / 下 / 左 / 右) |
keys | 按下键盘按键 |
navback | 在历史记录中后退 |
screenshot | 截图 |
think | Agent 推理 / 规划步骤 |
wait | 按时间或条件等待 |
search | Web 搜索(需要 useSearch: true 与 BROWSERBASE_API_KEY) |
| 工具 | 说明 |
|---|---|
click | 在特定坐标点击 |
type | 在坐标处输入文本 |
dragAndDrop | 从一个点拖拽到另一个点 |
clickAndHold | 在坐标处点击并按住 |
fillFormVision | 使用视觉 / 坐标填写表单 |
act | 执行语义动作 |
ariaTree | 获取无障碍树 |
extract | 从页面提取数据 |
goto | 导航到 URL |
scroll | 使用坐标滚动 |
keys | 按下键盘按键 |
navback | 后退导航 |
screenshot | 截图 |
think | Agent 推理步骤 |
wait | 按时间 / 条件等待 |
search | Web 搜索(需要 useSearch: true 与 BROWSERBASE_API_KEY) |
// 阻止 Agent 在执行期间截图const result = await agent.execute({ instruction: "Fill out the contact form", excludeTools: ["screenshot"],});
// 阻止 Agent 提取数据const result = await agent.execute({ instruction: "Click through the signup flow", excludeTools: ["extract"],});
// 禁用 Web 搜索能力const result = await agent.execute({ instruction: "Find information on the current page", excludeTools: ["search"],});在 agent.execute() 中设置 useSearch: true,即可启用 search 工具。这会让 Agent 能够使用 Browserbase Search API 执行 Web 搜索,当 Agent 需要在导航前先查找 URL 或收集信息时非常有用。
const result = await agent.execute({ instruction: "Find the latest pricing for Browserbase", useSearch: true, maxSteps: 20,});使用变量可以把敏感数据(例如密码、API key 或个人信息)传给 Agent,而不会把真实值暴露给 LLM。Agent 只能看到变量名与描述,实际值会在运行时替换进去。
const stagehand = new Stagehand({ env: "LOCAL",});await stagehand.init();
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929",});
const page = stagehand.context.pages()[0];await page.goto("https://example.com/login");
const result = await agent.execute({ instruction: "Log into the website using my credentials", maxSteps: 10, variables: { username: { value: "john@example.com", description: "The user's email address for login" }, password: { value: process.env.USER_PASSWORD, description: "The user's password for login" } }});变量与 act() 使用相同的类型。你可以传入简单值,也可以传入带描述的富对象:
// 简单值(与 act 相同的格式)variables: { username: "john@example.com", password: "secret123",}
// 带描述的富值(帮助 Agent 理解上下文)variables: { username: { value: "john@example.com", description: "The login email" }, password: { value: "secret123", description: "The login password" },}%variableName% 语法(例如:"type %password% into the password field")。变量可与以下 Agent 工具配合使用:
| 工具 | 用法 |
|---|---|
act | 在动作描述中使用 %variableName% |
fillForm | 在字段值中使用 %variableName% |
| 工具 | 用法 |
|---|---|
type | 在输入文本中使用 %variableName% |
fillFormVision | 在字段值中使用 %variableName% |
act | 在动作描述中使用 %variableName% |
变量在设计上就对缓存友好:
// 对敏感数据使用变量variables: { apiKey: { value: process.env.API_KEY, description: "API key for authentication" }}// 不要在指令里硬编码敏感值instruction: "Log in with password 'secret123'"Agent 可以通过 MCP(Model Context Protocol,模型上下文协议)集成外部工具与服务来增强能力。这让你的 Agent 不仅能做浏览器交互,还能访问外部 API 与数据源。
const agent = stagehand.agent({ mode: "cua", model: "openai/computer-use-preview", integrations: [ `https://mcp.exa.ai/mcp?exaApiKey=${process.env.EXA_API_KEY}`, ], systemPrompt: `You have access to web search through Exa. Use it to find current information before browsing.`});
await agent.execute("Search for the best headphones of 2025 and go through checkout for the top recommendation");import { connectToMCPServer } from "@browserbasehq/stagehand";
const supabaseClient = await connectToMCPServer( `https://server.smithery.ai/@supabase-community/supabase-mcp/mcp?api_key=***`);
const agent = stagehand.agent({ mode: "cua", model: "openai/computer-use-preview", integrations: [supabaseClient], systemPrompt: `You can interact with Supabase databases. Use these tools to store and retrieve data.`});
await agent.execute("Search for restaurants and save the first result to the database");启用流式模式后,你可以接收 Agent 的增量响应。这对于构建实时 UI 很有用,因为你可以在 Agent 推进过程中展示它的推理与状态。
在 Agent 配置中设置 stream: true 以启用流式输出:
const stagehand = new Stagehand({ env: "LOCAL", experimental: true, // streaming 必需});await stagehand.init();
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929", stream: true, // 启用流式模式});
const streamResult = await agent.execute({ instruction: "Search for headphones on Amazon", maxSteps: 20,});
// 增量输出文本流for await (const delta of streamResult.textStream) { process.stdout.write(delta);}
// 流结束后获取最终结果const finalResult = await streamResult.result;console.log("Completed:", finalResult.completed);启用流式输出后,execute() 会返回一个 AgentStreamResult,包含:
| 属性 | 类型 | 说明 |
|---|---|---|
textStream | AsyncIterable<string> | Agent 的增量文本输出 |
fullStream | AsyncIterable<StreamPart> | 包含工具调用与消息在内的全部流事件 |
result | Promise<AgentResult> | 流结束后的最终结果 |
// 输出所有内容(工具调用、消息等)for await (const event of streamResult.fullStream) { console.log(event);}回调允许你接入 Agent 的执行生命周期,以便监控进度、记录事件或修改行为。
当 stream: false(默认)时,可使用以下回调:
| 回调 | 说明 |
|---|---|
prepareStep | 在每个 LLM 步骤前调用,用于修改设置 |
onStepFinish | 在每个步骤完成时调用 |
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929",});
await agent.execute({ instruction: "Fill out the contact form", maxSteps: 10, callbacks: { prepareStep: async (stepContext) => { console.log(`Starting step ${stepContext.stepNumber}`); return stepContext; // 返回修改后的或原始上下文 }, onStepFinish: async (event) => { console.log(`Step finished: ${event.finishReason}`); if (event.toolCalls) { for (const tc of event.toolCalls) { console.log(`Tool called: ${tc.toolName}`); } } }, },});当 stream: true 时,还可以使用以下额外回调:
| 回调 | 说明 |
|---|---|
prepareStep | 在每个 LLM 步骤前调用,用于修改设置 |
onStepFinish | 在每个步骤完成时调用 |
onChunk | 在每个流式分块到达时调用 |
onFinish | 流结束时调用 |
onError | 出错时调用 |
onAbort | 流被中止时调用 |
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929", stream: true,});
const streamResult = await agent.execute({ instruction: "Search for products", maxSteps: 15, callbacks: { onChunk: async (chunk) => { // 每个增量分块都会触发 console.log("Chunk received:", chunk); }, onStepFinish: async (event) => { console.log(`Step completed: ${event.finishReason}`); }, onFinish: (event) => { console.log("Stream finished!"); console.log("Total steps:", event.steps.length); }, onError: ({ error }) => { console.error("Stream error:", error); }, onAbort: (event) => { console.log("Stream aborted after", event.steps.length, "steps"); }, },});
// 别忘了消费这个流for await (const delta of streamResult.textStream) { process.stdout.write(delta);}
await streamResult.result;你可以通过 AbortSignal 随时取消 Agent 执行。这对于实现超时控制或允许用户手动停止长任务非常有用。
const stagehand = new Stagehand({ env: "LOCAL", experimental: true, // abort signal 必需});await stagehand.init();
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929",});
const controller = new AbortController();
// 设置 30 秒超时setTimeout(() => controller.abort(), 30000);
try { const result = await agent.execute({ instruction: "Complete a complex multi-step task", maxSteps: 50, signal: controller.signal, });} catch (error) { if (error.name === "AgentAbortError") { console.log("Task was cancelled"); }}Abort signal 也适用于流式模式:
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929", stream: true,});
const controller = new AbortController();
const streamResult = await agent.execute({ instruction: "Describe every element on the page", maxSteps: 50, signal: controller.signal, callbacks: { onAbort: (event) => { console.log(`Aborted after ${event.steps.length} steps`); }, },});
// 收到 10 个分块后中止let chunkCount = 0;for await (const delta of streamResult.textStream) { process.stdout.write(delta); chunkCount++; if (chunkCount >= 10) { controller.abort(); break; }}
// result promise 会以 AgentAbortError 拒绝try { await streamResult.result;} catch (error) { console.log("Stream was aborted:", error.message);}你可以在中止时传入原因:
controller.abort("User cancelled the operation");
// 错误消息会包含你的原因// Error: "User cancelled the operation"通过把上一次结果中的 messages 传回去,你可以在多次 Agent 执行之间延续对话。这对于多轮交互,或在保持上下文的前提下把复杂任务拆成多个步骤,非常有用。
const stagehand = new Stagehand({ env: "LOCAL", experimental: true, // message continuation 必需});await stagehand.init();
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929",});
const page = stagehand.context.pages()[0];await page.goto("https://example.com/products");
// 第一次执行:搜索产品const firstResult = await agent.execute({ instruction: "Search for wireless headphones and note the top 3 results", maxSteps: 10,});
console.log("First task:", firstResult.message);
// 延续相同上下文:继续追问const secondResult = await agent.execute({ instruction: "Now filter by price under $100 and tell me which of those 3 are still available", maxSteps: 10, messages: firstResult.messages, // 传入上一轮对话});
console.log("Follow-up:", secondResult.message);
// 继续基于历史上下文执行动作const thirdResult = await agent.execute({ instruction: "Add the cheapest one to the cart", maxSteps: 10, messages: secondResult.messages, // 串联会话});
console.log("Final action:", thirdResult.message);定义一个 Zod schema,以便在 Agent 完成任务时接收结构化数据。当你需要从 Agent 执行结果中提取特定信息(例如价格、日期或其他结构化数据)时,这很有用。
import { z } from "zod";
const stagehand = new Stagehand({ env: "LOCAL", experimental: true, // structured output 必需});await stagehand.init();
const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929",});
const page = stagehand.context.pages()[0];await page.goto("https://www.google.com/flights");
const result = await agent.execute({ instruction: "Find the cheapest flight from NYC to LA for next week", maxSteps: 20, output: z.object({ price: z.string().describe("The price of the flight"), airline: z.string().describe("The airline name"), departureTime: z.string().describe("Departure time"), arrivalTime: z.string().describe("Arrival time"), }),});
// 访问结构化输出console.log(result.output);// { price: "$199", airline: "Delta", departureTime: "8:00 AM", arrivalTime: "11:30 AM" }const result = await agent.execute({ instruction: "Extract all items from the shopping cart", output: z.object({ items: z.array(z.object({ name: z.string().describe("Product name"), quantity: z.number().describe("Quantity in cart"), unitPrice: z.string().describe("Price per item"), totalPrice: z.string().describe("Total price for this item"), })).describe("List of items in the cart"), subtotal: z.string().describe("Cart subtotal before tax"), tax: z.string().optional().describe("Tax amount if shown"), total: z.string().describe("Final total"), }),});
console.log(`Found ${result.output?.items.length} items in cart`);console.log(`Total: ${result.output?.total}`);const agent = stagehand.agent({ model: "anthropic/claude-sonnet-4-5-20250929", stream: true,});
const streamResult = await agent.execute({ instruction: "Find the top 3 search results", output: z.object({ results: z.array(z.object({ title: z.string().describe("The title of the search result"), url: z.string().url().describe("The URL of the search result"), snippet: z.string().describe("A brief description or snippet"), })).max(3).describe("Top 3 search results"), }),});
// 输出文本流for await (const delta of streamResult.textStream) { process.stdout.write(delta);}
// 从最终结果中获取结构化输出const finalResult = await streamResult.result;console.log(finalResult.output?.results);你可以通过 maxSteps 参数控制 Agent 为完成任务最多可以执行多少步。
// 设置 maxSteps,以控制 Agent 最多能执行多少个动作await agent.execute({ instruction: "Sign me up for a library card", maxSteps: 15 // 如果任务未完成,Agent 会在 15 步后停止});对于复杂任务,请提高 maxSteps 上限,并检查任务是否成功。
// 需要更多动作的复杂多步骤任务const result = await agent.execute({ instruction: "Find and apply for software engineering jobs, filtering by remote work and saving 3 applications", maxSteps: 30, // 复杂工作流使用更高上限});
// 检查任务是否成功完成if (result.success === true) { console.log("Task completed successfully!");} else { console.log("Task failed or was incomplete");}遵循这些最佳实践可以提升 Agent 的成功率、减少执行时间,并尽量降低任务完成过程中的意外错误。
在执行任务之前,先导航到目标页面:
await page.goto('https://github.com/browserbase/stagehand');await agent.execute('Get me the latest PR on the stagehand repo');await agent.execute('Go to GitHub and find the latest PR on browserbase/stagehand');提供更详细的指令,通常能得到更好的结果:
await agent.execute("Find Italian restaurants in Brooklyn that are open after 10pm and have outdoor seating");await agent.execute("Find a restaurant");问题: Agent 在完成你要求的任务前就停止了。
解决方案:
maxSteps 上限(默认是 20)。maxSteps:例如 maxSteps: 30 或更高。// 对复杂任务提高 maxStepsawait agent.execute({ instruction: "Complete the multi-page registration form with all required information", maxSteps: 40 // 为复杂任务提高上限});
// 或拆成更小的任务,并检查成功状态const firstResult = await agent.execute({ instruction: "Fill out page 1 of the registration form", maxSteps: 15});
// 只有第一次成功才继续if (firstResult.success === true) { await agent.execute({ instruction: "Navigate to page 2 and complete remaining fields", maxSteps: 15 });} else { console.log("First task failed, stopping execution");}问题: Agent 点击了错误的元素,或无法与正确的 UI 组件交互。
解决方案:
1288x711(对 Computer Use 模型最优)。