Browser-use 提示词解密
本文通过 browser-use 的 agent,给定一个任务并查看该 agent 与 LLM 的完整通信。模板任务如下:
'1. Go to https://www.bing.com '
"2. Search for 'browser use' in the search bar "
'3. Click on first page url '
'4. Return the page title'
完整 Prompts
第一步
- prompt 中,指定了通用操作(common action),比如:表单填充(form fill)、导航和提取(navigation and extraction)等
- Deepseek 理解任务,并返回的 action 选择了 go_to_url
Deepseek 返回:
{
"current_state": {
"evaluation_previous_goal": "Unknown - The task has just started, and no actions have been taken yet.",
"memory": "Task started. Step 1: Go to https://www.bing.com. 0 out of 4 steps completed.",
"next_goal": "Navigate to https://www.bing.com."
},
"action": [
{
"go_to_url": {
"url": "https://www.bing.com"
}
}
]
}
第二步
点击查看完整 prompt:p2.log
- prompt 中,提供了可以操作的按钮、编号及其信息。
- Deepseek 在表单填充的可选 action 中(input_text、click_element)中,选择了先输入、再点击,并给出了指定的编号
Deepseek 返回:
{
"current_state": {
"evaluation_previous_goal": "Success - Successfully navigated to https://www.bing.com as per the task.",
"memory": "Navigated to Bing homepage. Next step is to search for 'browser use' in the search bar.",
"next_goal": "Search for 'browser use' in the search bar."
},
"action": [
{
"input_text": {
"index": 13,
"text": "browser use"
}
},
{
"click_element": {
"index": 14
}
}
]
}
第三步
Deepseek 返回:
{
"current_state": {
"evaluation_previous_goal": "Success - Successfully input 'browser use' into the search bar. Suggestions appeared as expected.",
"memory": "Input 'browser use' into the search bar. Next step is to click the search button to execute the search.",
"next_goal": "Click the search button to execute the search for 'browser use'."
},
"action": [
{
"click_element": {
"index": 14
}
}
]
}
其他
可复制完整的上诉 prompt 给 deepseek,然后查看相关结果
本地测试
Install UV
参考 https://github.com/astral-sh/uv,用于管理 python 环境,依赖等。
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
基础环境
参考:https://github.com/browser-use/browser-use
# 新文件夹
$ mkdir browser-use-demo
# 创建虚拟 python 环境,避免影响系统的主环境
$ uv venv --python 3.12
# 激活虚拟环境
$ source .venv/bin/activate
# 安装 browser-use
$ uv pip install browser-use
# 安装 chrome
# 安装 browser-use 后,子依赖 playwright 就会被安装
# 在 ./.venv/bin/playwright 中,主机环境无此命令
$ playwright install chromium
Demo
from langchain_openai import ChatOpenAI
from browser_use import Agent
import os
from pydantic import SecretStr
import asyncio
from dotenv import load_dotenv
load_dotenv()
# 打印所有 prompt
import langchain
langchain.verbose = True
langchain.debug = True
# 获取 API_KEY
# 具体配置放 .env 文件中
api_key = os.getenv('DEEPSEEK_API_KEY', '')
if not api_key:
raise ValueError('DEEPSEEK_API_KEY is not set')
# Initialize the model
llm=ChatOpenAI(base_url='https://api.deepseek.com/v1', model='deepseek-chat', api_key=SecretStr(api_key))
async def main():
agent = Agent(
# task="Compare the price of gpt-4o and DeepSeek-V3",
task=(
'1. Go to https://www.bing.com '
"2. Search for 'browser use' in the search bar "
'3. Click on first page url '
'4. Return the page title'
),
llm=llm,
use_vision=False,
)
await agent.run()
asyncio.run(main())
Debug
VSCode 中,打开命令 Control + Command + P
,选择 Python: Select interpreter
,参考:
- https://stackoverflow.com/questions/54009081/how-can-i-debug-a-python-code-in-a-virtual-environment-using-vscode
- https://code.visualstudio.com/docs/python/environments
如果要调试依赖文件,需要创建 .vscode/launch.json
文件,并指定 justMyCode
为 false,默认为 true,无法断点进依赖。
{
"version": "0.2.0",
"configurations": [
{
"name": "Python Debugger: Current File",
"type": "debugpy",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
// debug python module
"justMyCode": false
},
]
}
Comments
Leave a comment