Building a Cross-Model Agent: Protocol Adaptation for OpenAI, Claude, and Gemini

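Abstract: This article examines the core API protocol differences among OpenAI, Anthropic (Claude), and Google (Gemini). Drawing on the source code of the OpenCode framework, it then discusses how to integrate newer Agent capabilities such as Skills and MCP (Model Context Protocol).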

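1. Introduction: Why Adapt Across Models?

Binding an AI Agent tightly to a single vendor carries significant risks:

Cost volatility: token pricing differs widely from model to model.
Service stability: any single API can suffer outages or rate limiting.
Uneven strengths: Claude performs well at code generation, while Gemini offers a very long context window.

For an Agent to draw on the strengths of each model, we need to bridge the protocol gaps between them.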

2. Protocol Differences in Depth

The table below summarizes the key differences in how the three major vendors' APIs are called:

| Feature | OpenAI v1 | Anthropic | Google Gemini |
| --- | --- | --- | --- |
| User Message | `{ role: "user", content: "..." }` | `{ role: "user", content: "..." }` | `{ role: "user", parts: [...] }` |
| Image Input | `image_url` (URL/Base64) | `source` (URL/Base64) | `inlineData` (Base64) / `fileData` |
| Tools Definition | `tools` (JSON Schema) | `tools` (JSON Schema) | `function_declarations` (OpenAPI) |
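To make the message-level differences concrete, here is a minimal TypeScript sketch that maps one provider-neutral conversation onto the three request bodies. The `UnifiedMessage` type and the function names are hypothetical and exist only for illustration. The sketch assumes plain-text messages, and the field names follow the curl examples later in this section plus each vendor's documented system-prompt field.

```typescript
// Hypothetical provider-neutral message; not taken from any SDK.
type UnifiedMessage = { role: "system" | "user" | "assistant"; text: string };

const systemText = (messages: UnifiedMessage[]): string =>
  messages.filter((m) => m.role === "system").map((m) => m.text).join("\n");

const chatTurns = (messages: UnifiedMessage[]): UnifiedMessage[] =>
  messages.filter((m) => m.role !== "system");

// OpenAI Chat Completions: the system prompt is just another message
// (role "developer" on newer models, "system" on older ones).
function toOpenAI(model: string, messages: UnifiedMessage[]) {
  return {
    model,
    messages: messages.map((m) => ({
      role: m.role === "system" ? "developer" : m.role,
      content: m.text,
    })),
  };
}

// Anthropic Messages API: the system prompt moves to a top-level `system`
// field, and `max_tokens` is required.
type AnthropicRequest = {
  model: string;
  max_tokens: number;
  system?: string;
  messages: { role: string; content: string }[];
};

function toAnthropic(model: string, messages: UnifiedMessage[], maxTokens = 1024): AnthropicRequest {
  const request: AnthropicRequest = {
    model,
    max_tokens: maxTokens,
    messages: chatTurns(messages).map((m) => ({ role: m.role, content: m.text })),
  };
  const system = systemText(messages);
  if (system) request.system = system;
  return request;
}

// Gemini generateContent: text lives in `parts`, the assistant role is
// "model", and the system prompt goes into `systemInstruction`.
type GeminiRequest = {
  systemInstruction?: { parts: { text: string }[] };
  contents: { role: string; parts: { text: string }[] }[];
};

function toGemini(messages: UnifiedMessage[]): GeminiRequest {
  const request: GeminiRequest = {
    contents: chatTurns(messages).map((m) => ({
      role: m.role === "assistant" ? "model" : "user",
      parts: [{ text: m.text }],
    })),
  };
  const system = systemText(messages);
  if (system) request.systemInstruction = { parts: [{ text: system }] };
  return request;
}
```

Business code then only ever builds `UnifiedMessage[]`; choosing a vendor becomes a matter of which function serializes the request.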

2.1 OpenAI

Base protocol: Chat Completions API, the de facto industry standard.

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "VAR_chat_model_id",
    "messages": [
      {
        "role": "developer",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

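Vision: Vision Guide, covering image_url details and multi-image handling. Note that the example below targets the Responses API (/v1/responses), whose content parts are typed input_text and input_image, rather than Chat Completions.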

curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1-mini",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "what is in this image?"},
          {
            "type": "input_image",
            "image_url": "https://api.nga.gov/iiif/a2e6da57-3cd1-4235-b20e-95dcaefed6c8/full/!800,800/0/default.jpg"
          }
        ]
      }
    ]
  }'

Tool calling: Function Calling, covering the tools definition and the tool_calls response flow.


# Define a list of callable tools for the model
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                }
            },
            "required": ["location"],
        },
    },
]
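One detail is easy to miss: the flat tool shape above, with `name` at the top level, is the one used by the Responses API, whereas Chat Completions nests the same fields under a `function` key. The sketch below converts between the two. The type and function names are hypothetical, introduced only for illustration.

```typescript
type JsonSchema = Record<string, unknown>;

// Flat shape, as in the tool definition above (Responses API).
type ResponsesTool = {
  type: "function";
  name: string;
  description?: string;
  parameters: JsonSchema;
};

// Nested shape expected by Chat Completions.
type ChatCompletionsTool = {
  type: "function";
  function: { name: string; description?: string; parameters: JsonSchema };
};

// Move everything except `type` under the `function` key.
function toChatCompletionsTool(tool: ResponsesTool): ChatCompletionsTool {
  const { type, ...fn } = tool;
  return { type, function: fn };
}

const weatherTool: ResponsesTool = {
  type: "function",
  name: "get_weather",
  description: "Get the current weather in a given location",
  parameters: {
    type: "object",
    properties: { location: { type: "string" } },
    required: ["location"],
  },
};

const nestedWeatherTool = toChatCompletionsTool(weatherTool);
```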

2.2 Anthropic (Claude)

Base protocol: Messages API, with a top-level system parameter and a content-block structure.

curl https://api.anthropic.com/v1/messages \
    -H 'Content-Type: application/json' \
    -H 'anthropic-version: 2023-06-01' \
    -H "X-Api-Key: $ANTHROPIC_API_KEY" \
    --max-time 600 \
    -d '{
          "max_tokens": 1024,
          "messages": [
            {
              "content": "Hello, world",
              "role": "user"
            }
          ],
          "model": "claude-opus-4-6"
        }'

Vision: Vision, covering the base64 encoding requirements and how to construct an image block.

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "source": {
              "type": "base64",
              "media_type": "image/jpeg",
              "data": "'"$BASE64_IMAGE_DATA"'"
            }
          },
          {
            "type": "text",
            "text": "Describe this image."
          }
        ]
      }
    ]
  }'

Tool calling: Tool Use, detailing the interaction between tool_use and tool_result blocks.

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in San Francisco?"
      }
    ]
  }'
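The request above only declares the tool. When Claude decides to call it, the assistant turn contains a `tool_use` block, and the client answers with a `user` message carrying a matching `tool_result` block. The sketch below shows that round trip in isolation. The `ToolHandler` type and `buildToolResultMessage` are hypothetical helpers, and the tool itself is stubbed.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type AssistantBlock = TextBlock | ToolUseBlock;

// Hypothetical local tool implementation: takes the model's input, returns text.
type ToolHandler = (input: Record<string, unknown>) => string;

// Build the follow-up `user` message answering every tool_use block
// found in the assistant's content array.
function buildToolResultMessage(
  assistantContent: AssistantBlock[],
  handlers: Record<string, ToolHandler>,
) {
  const results: ToolResultBlock[] = [];
  for (const block of assistantContent) {
    if (block.type !== "tool_use") continue;
    const handler = handlers[block.name];
    if (handler) {
      // tool_use_id must echo the id of the tool_use block being answered.
      results.push({ type: "tool_result", tool_use_id: block.id, content: handler(block.input) });
    } else {
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: `Unknown tool: ${block.name}`,
        is_error: true,
      });
    }
  }
  return { role: "user" as const, content: results };
}

// Example assistant turn (ids are made up for illustration).
const assistantTurn: AssistantBlock[] = [
  { type: "text", text: "Let me check the weather." },
  { type: "tool_use", id: "toolu_example_1", name: "get_weather", input: { location: "San Francisco, CA" } },
];

const followUp = buildToolResultMessage(assistantTurn, {
  get_weather: (input) => `Sunny in ${String(input["location"])}`,
});
```

The follow-up message is appended to `messages` after the assistant turn and the request is sent again, so the model can produce its final answer.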

2.3 Google Gemini

Base protocol: REST API Reference. Structurally this is the most different of the three; message content is carried in a parts array.

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts":[{"text": "Write a story about a magic backpack."}]
        }]
       }'

Vision: Vision, covering the difference between inlineData (Base64) and fileData (File API).

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
    "contents": [{
    "parts":[
        {
            "inline_data": {
            "mime_type":"image/jpeg",
            "data": "'"$(base64 $B64FLAGS $IMG_PATH)"'"
            }
        },
        {"text": "Caption this image."}
    ]
    }]
}'
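With all three vision requests in view, the image part is the piece an adapter has to rewrite. The sketch below converts one base64 image into each vendor's shape, using the same field names as the curl examples in this section. `UnifiedImage` and the function names are hypothetical, and URL-based and File API inputs are left out.

```typescript
// Hypothetical provider-neutral image: a media type plus base64 payload.
type UnifiedImage = { mediaType: string; base64: string };

// OpenAI Responses API: a single `image_url` string, here a data URL.
function toOpenAIImagePart(img: UnifiedImage) {
  return {
    type: "input_image",
    image_url: `data:${img.mediaType};base64,${img.base64}`,
  };
}

// Anthropic: an `image` block whose `source` carries the raw base64 data.
function toAnthropicImagePart(img: UnifiedImage) {
  return {
    type: "image",
    source: { type: "base64", media_type: img.mediaType, data: img.base64 },
  };
}

// Gemini: an `inline_data` part, spelled as in the curl example above.
function toGeminiImagePart(img: UnifiedImage) {
  return { inline_data: { mime_type: img.mediaType, data: img.base64 } };
}

// Placeholder payload; a real caller would base64-encode the image bytes.
const sampleImage: UnifiedImage = { mediaType: "image/jpeg", base64: "AAAA" };
const imageParts = {
  openai: toOpenAIImagePart(sampleImage),
  anthropic: toAnthropicImagePart(sampleImage),
  gemini: toGeminiImagePart(sampleImage),
};
```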

Tool calling: Function Calling, covering function_declarations configuration and the functionResponse format.

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "What is the weather like in San Francisco?"
          }
        ]
      }
    ],
    "tools": [
      {
        "functionDeclarations": [
          {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
              "type": "object",
              "properties": {
                "location": {
                  "type": "string",
                  "description": "The city and state, e.g. San Francisco, CA"
                }
              },
              "required": ["location"]
            }
          }
        ]
      }
    ]
  }'
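When Gemini chooses to call the function, the model turn contains a `functionCall` part, and the client replies with a `functionResponse` part. The sketch below shows one way to build that reply. The handler map and `answerFunctionCalls` are hypothetical. Two further assumptions are the `{ result: ... }` wrapper, which is only a common convention because `response` accepts any JSON object, and sending the reply under the `user` role.

```typescript
type GeminiPart =
  | { text: string }
  | { functionCall: { name: string; args: Record<string, unknown> } }
  | { functionResponse: { name: string; response: Record<string, unknown> } };

type GeminiContent = { role: "user" | "model"; parts: GeminiPart[] };

// Hypothetical local function implementation keyed by function name.
type FunctionHandler = (args: Record<string, unknown>) => unknown;

// Answer every functionCall part in the model turn with a functionResponse part.
function answerFunctionCalls(
  modelTurn: GeminiContent,
  handlers: Record<string, FunctionHandler>,
): GeminiContent {
  const parts: GeminiPart[] = [];
  for (const part of modelTurn.parts) {
    if (!("functionCall" in part)) continue;
    const { name, args } = part.functionCall;
    const handler = handlers[name];
    parts.push({
      functionResponse: {
        name,
        response: handler ? { result: handler(args) } : { error: `Unknown function: ${name}` },
      },
    });
  }
  return { role: "user", parts };
}

const geminiModelTurn: GeminiContent = {
  role: "model",
  parts: [{ functionCall: { name: "get_weather", args: { location: "San Francisco, CA" } } }],
};

const geminiReply = answerFunctionCalls(geminiModelTurn, {
  get_weather: (args) => ({ forecast: `Sunny in ${String(args["location"])}` }),
});
```

Compared with Anthropic's `tool_result`, there is no call id to echo back in this shape; the reply is matched to the call by function name.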

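3. Solution: Unified Interface

To decouple business logic from any one vendor, we define a standard set of TypeScript interfaces and implement adapters underneath them. We recommend building on Vercel AI SDK Core (the ai package), which already handles most of the low-level streaming and basic normalization work.

3.1 Standard Definition

A common LanguageModelV3CallOptions interface hides the parameter differences between vendors:

Reference: LanguageModelV3CallOptions in vercel/ai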

// Excerpt from the Vercel AI SDK type definitions
export type LanguageModelV3CallOptions = {
  prompt: LanguageModelV3Prompt;
  responseFormat?:
    | { type: 'text' }
    | {
        type: 'json';
        schema?: JSONSchema7;
        name?: string;
        description?: string;
      };
  tools?: Array<LanguageModelV3FunctionTool | LanguageModelV3ProviderTool>;
  toolChoice?: LanguageModelV3ToolChoice;
  // ... remaining fields omitted
};

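3.2 Adapter Pattern

Use the Vercel AI SDK (the ai package) as the foundation, and build the middleware and adapter layers on top of it.

Tools protocol adaptation: each provider implements its own protocol conversion for tool calling, mapping the unified tools definition to that vendor's native format.

Google: converted to the functionDeclarations format. See google-prepare-tools.ts
Anthropic: converted to Anthropic's native tool format. See anthropic-prepare-tools.ts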
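As a rough illustration of what such a conversion involves, the sketch below maps one unified tool definition to the two native shapes shown in section 2. It is not the code in google-prepare-tools.ts or anthropic-prepare-tools.ts. `UnifiedTool`, `prepareTools`, and the per-provider functions are hypothetical names, and provider-defined tools and `toolChoice` are ignored.

```typescript
// Hypothetical unified tool definition: a name, a description, a JSON Schema.
type UnifiedTool = {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
};

// Anthropic: one entry per tool, with the schema under `input_schema`.
function prepareAnthropicTools(tools: UnifiedTool[]) {
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.inputSchema,
  }));
}

// Gemini: declarations grouped under `functionDeclarations`, with the
// schema under `parameters`. This sketch puts them all in one `tools` entry.
function prepareGeminiTools(tools: UnifiedTool[]) {
  return [
    {
      functionDeclarations: tools.map((t) => ({
        name: t.name,
        description: t.description,
        parameters: t.inputSchema,
      })),
    },
  ];
}

type Provider = "anthropic" | "google";

// Single entry point: business code passes unified tools and a provider id.
function prepareTools(provider: Provider, tools: UnifiedTool[]) {
  switch (provider) {
    case "anthropic":
      return prepareAnthropicTools(tools);
    case "google":
      return prepareGeminiTools(tools);
  }
}

const unifiedWeatherTool: UnifiedTool = {
  name: "get_weather",
  description: "Get the current weather in a given location",
  inputSchema: {
    type: "object",
    properties: { location: { type: "string" } },
    required: ["location"],
  },
};

const anthropicTools = prepareTools("anthropic", [unifiedWeatherTool]);
const geminiTools = prepareTools("google", [unifiedWeatherTool]);
```

Keeping each conversion in its own function means that supporting another vendor adds one function and one `case`, with no change to the callers.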

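4. Integrating New Capabilities: Skills

Skills are OpenCode's core mechanism for extending Agent capabilities. More than simple context injection, they form a system for managing and distributing domain knowledge that is loaded on demand. Through the skill tool, an Agent can acquire at runtime, much like installing a plugin, the specialized knowledge, workflow SOPs, and coding conventions a task requires, without preloading large amounts of information into the initial prompt.

Core Mechanism and Workflow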

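Registration and Discovery
- At startup the Agent registers a generic skill tool whose description contains a dynamically generated <available_skills> list.
- The LLM becomes aware of potential capabilities through this list and triggers a call only when it recognizes a matching task intent.
On-Demand Loading
- The LLM calls skill({ name: "..." }) to request a specific capability.
- The Agent reads the corresponding local resource file (Markdown), which packages that domain's detailed instructions, API definitions, code templates, and best practices.
Context Injection
- The loaded content is returned as a tool_result in the form of a <skill_content> block.
- The LLM then has this knowledge in its current context, effectively "learning" the skill, so that its subsequent steps can follow the specified engineering conventions or business logic.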
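The three steps above can be sketched in a few lines of TypeScript. This is not OpenCode's implementation. `Skill`, `createSkillTool`, and the exact layout inside the `<available_skills>` and `<skill_content>` tags are assumptions made for illustration, and skill bodies are held in memory here rather than read from Markdown files so that the sketch stays self-contained.

```typescript
// Hypothetical skill record; a real Agent would read `content` from a Markdown file.
type Skill = { name: string; description: string; content: string };

function createSkillTool(skills: Skill[]) {
  const byName = new Map(skills.map((s): [string, Skill] => [s.name, s]));

  // Discovery: only names and one-line descriptions go into the tool description.
  const listing = skills.map((s) => `  ${s.name}: ${s.description}`).join("\n");

  return {
    name: "skill",
    description: `Load a skill by name.\n<available_skills>\n${listing}\n</available_skills>`,

    // On-demand loading and context injection: the returned string is what
    // the Agent would send back to the model as the tool result.
    execute(args: { name: string }): string {
      const skill = byName.get(args.name);
      if (!skill) return `Skill "${args.name}" not found.`;
      return `<skill_content name="${skill.name}">\n${skill.content}\n</skill_content>`;
    },
  };
}

// Made-up example skills.
const skillTool = createSkillTool([
  {
    name: "release-checklist",
    description: "Steps to follow before cutting a release",
    content: "1. Run the test suite.\n2. Update the changelog.",
  },
  {
    name: "ui-components",
    description: "Conventions for the internal component library",
    content: "Prefer composition over inheritance.",
  },
]);

const loadedSkill = skillTool.execute({ name: "release-checklist" });
```

Only the short listing is paid for on every request; the full body of a skill enters the context only after the model asks for it.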

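Design Advantages

Token efficiency: avoids stuffing all documentation into the context window at once, which substantially reduces token consumption and cuts down interference from irrelevant information.
High extensibility: adding a capability only requires adding a Skill description file, with no change to the Agent's core code.
Domain depth: allows highly detailed guidance for specific domains (such as an internal component library or a DevOps workflow), helping the Agent's output meet production-grade standards.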

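5. Summary

Building a cross-model Agent is not easy, but a layered architecture makes the challenges manageable:

Protocol adaptation layer: identify the differences among OpenAI, Claude, and Gemini in system prompts, vision, and tools, and hide them behind a unified interface (such as the Vercel AI SDK).
Capability extension layer:
- Use the Skills mechanism to load domain knowledge on demand, easing the token bottleneck.
- Beyond that, adopt the MCP protocol for standardized interconnection with tools in the external ecosystem.

This architecture keeps the Agent portable across models and lays a solid foundation for evolving its capabilities in the future.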