
# Batch Inference

export const BatchApiModels = () => {
  if (typeof document === "undefined") {
    return null;
  } else {
    let attempts = 0;
    const maxAttempts = 50;
    const INIT_DISPLAY_COUNT = 3;
    const interval = setInterval(() => {
      const clientComponent = document.getElementById("batch-api-models");
      if (clientComponent && window.ppInfraRemoteData?.llmModels?.status === 'loaded') {
        const modelList = window.ppInfraRemoteData.llmModels.data.filter(model => {
          return (model.endpoints || []).includes('batch-api');
        });
        let displayModels = modelList.slice(0, INIT_DISPLAY_COUNT).map(model => {
          return `<li><span class="model-id-item">${model.id}</span></li>`;
        }).join('');
        let showMoreButton = '';
        if (modelList.length > INIT_DISPLAY_COUNT) {
          showMoreButton = `<button id="show-more-batch-api-model-btn" style="margin-left: 32px; color: rgb(40 116 255)">Show more</button>`;
        }
        clientComponent.innerHTML = `
          <ul>${displayModels}</ul>
          ${showMoreButton}
        `;
        document.getElementById('show-more-batch-api-model-btn')?.addEventListener('click', () => {
          clientComponent.innerHTML = `
            <ul>${modelList.map(model => {
            return `<li><span class="model-id-item">${model.id}</span></li>`;
          }).join('')}</ul>
          `;
        });
        clearInterval(interval);
      }
      attempts++;
      if (attempts >= maxAttempts) {
        clearInterval(interval);
      }
    }, 200);
    return <div id="batch-api-models"></div>;
  }
};

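The LLM Batch Inference API processes large volumes of inference requests asynchronously and is fully compatible with the OpenAI API standard.

The Batch Inference API is a cost-effective option when you do not need inference results immediately. It offers higher rate limits than online calls and aims to deliver results within a 48-hour window.

This API is well suited to:

* Running evaluations and data analysis
* Classifying large datasets
* Generating document summaries offline

Supported models: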

<BatchApiModels />

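## Quick Start

### 1. Prepare the batch file

The Batch Inference API takes a .jsonl file as input, in which each line describes one API inference request. The available endpoints are `/v1/chat/completions` and `/v1/completions`.

<Warning>
  For compatibility with the OpenAI API, set the `endpoint` parameter to `/v1/chat/completions` or `/v1/completions`.
</Warning>

Every request must include a unique `custom_id`, which you use to locate its result in the output file once the batch completes. The parameters in each line's `body` field are sent to the endpoint as the actual inference request parameters.

<Warning>
  All requests in a single JSONL file must use the same model. Do not mix requests for different models in one batch.
</Warning>

The following sample input file contains 2 requests: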

```JSON theme={null}
{"custom_id": "request-1", "body": {"model": "deepseek/deepseek-v3-0324", "messages": [{"role": "user", "content": "Hello, world!"}], "max_tokens": 400}}
{"custom_id": "request-2", "body": {"model": "deepseek/deepseek-v3-0324", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
```
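
An input file in this shape can also be generated from a list of prompts. The following is a minimal sketch rather than part of any SDK: the helper name `build_batch_lines`, the `request-N` ID scheme, and the default `max_tokens` value are illustrative assumptions. It produces one JSON object per line, gives every request a unique `custom_id`, and uses a single model throughout.

```python
import json


def build_batch_lines(prompts, model, max_tokens=400):
    """Return one JSON string per request, ready to be written as JSONL.

    Hypothetical helper: the name and the "request-N" ID scheme are
    illustrative choices, not something the API prescribes.
    """
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{i}",  # unique within this file
            "body": {
                "model": model,  # the same model for every request
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes newlines, so each request stays on one line.
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


# Usage (writes the file that step 2 uploads):
# lines = build_batch_lines(["Hello, world!"], "deepseek/deepseek-v3-0324")
# with open("batch_input.jsonl", "w", encoding="utf-8") as f:
#     f.write("\n".join(lines) + "\n")
```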

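### 2. Upload the batch input file

Upload the batch input file so that you can reference it when creating the batch job. Use the Files API to upload your .jsonl file with `purpose` set to `batch`. Note that the file is retained for 15 days.

<Tip>
  To get an API key, see [Manage API keys](/support/api-key).
</Tip>

Code examples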

**Python**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ppio.com/openai/v1",
    api_key="<Your API Key>",
)

batch_input_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

print(batch_input_file)
```

**Curl**

```bash theme={null}
export API_KEY="<Your API Key>"

curl --request POST \
  --url https://api.ppio.com/openai/v1/files \
  --header "Authorization: Bearer ${API_KEY}" \
  --form 'file=@"/your/batch_input.jsonl"' \
  --form 'purpose="batch"'
```

Sample response after a successful upload:

```
{
    "id": "file_d2co***as73c0cjd0",
    "object": "file",
    "bytes": 238,
    "filename": "batch_input.jsonl",
    "created_at": 1754894162,
    "purpose": "batch",
    "metadata": {
        "total_requests": 2
    }
}
```

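### 3. Create the batch job

After the input file is uploaded, start a batch job using the ID of the uploaded file object. The completion window is fixed at `48h` and cannot currently be changed.

Code examples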

**Python**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ppio.com/openai/v1",
    api_key="<Your API Key>",
)

batch = client.batches.create(
    input_file_id="file_d2co***as73c0cjd0",  # ID returned by the upload in step 2
    endpoint="/v1/chat/completions",
    completion_window="48h",
)
print(batch)
```

**Curl**

```bash theme={null}
export API_KEY="<Your API Key>"

curl --request POST \
  --url https://api.ppio.com/openai/v1/batches \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${API_KEY}" \
  --data '{
      "input_file_id": "file_d2co***as73c0cjd0",
      "endpoint": "/v1/chat/completions",
      "completion_window": "48h"
  }'
```

This request returns a Batch object containing your batch job's metadata, as in the following example:

```JSON theme={null}
{
    "id": "batch_d2cq***73a68lu0",
    "object": "batch",
    "endpoint": "/v1/chat/completions",
    "input_file_id": "file_d2co***as73c0cjd0",
    "output_file_id": "",
    "error_file_id": "",
    "completion_window": "48h",
    "in_progress_at": null,
    "expires_at": null,
    "finalizing_at": null,
    "completed_at": null,
    "failed_at": null,
    "expired_at": null,
    "cancelling_at": null,
    "cancelled_at": null,
    "status": "validating",
    "errors": "",
    "version": 0,
    "created_at": "2025-08-11T16:31:52.949816948+08:00",
    "updated_at": null,
    "created_by": "8f242aa1-f725-4a67-8***9-cb68025e0976",
    "created_by_key_id": "key_cc19f96c***e7390644a37da21",
    "remark": "",
    "total": 0,
    "completed": 0,
    "failed": 0,
    "metadata": null,
    "request_counts": {
        "total": 0,
        "completed": 0,
        "failed": 0
    }
}
```

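### 4. Check the batch job status

You can check the status of a batch job at any time to get its latest information.

The Batch object's status takes the following values: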

<table class="table table-big">
  <thead>
    <tr>
      <th>Status</th>
      <th>Description</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>VALIDATING</td>
      <td>The input file is being validated before the batch job starts</td>
    </tr>

    <tr>
      <td>PROGRESS</td>
      <td>The batch job is in progress</td>
    </tr>

    <tr>
      <td>COMPLETED</td>
      <td>The batch was processed successfully</td>
    </tr>

    <tr>
      <td>FAILED</td>
      <td>Batch processing failed</td>
    </tr>

    <tr>
      <td>EXPIRED</td>
      <td>The batch job passed its deadline</td>
    </tr>

    <tr>
      <td>CANCELLING</td>
      <td>The batch job is being cancelled</td>
    </tr>

    <tr>
      <td>CANCELLED</td>
      <td>The batch job has been cancelled</td>
    </tr>
  </tbody>
</table>

Code examples

**Python**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ppio.com/openai/v1",
    api_key="<Your API Key>",
)
batch = client.batches.retrieve("batch_d2cq***73a68lu0")
print(batch)
```

**Curl**

```bash theme={null}
export API_KEY="<Your API Key>"

curl --request GET \
  --url https://api.ppio.com/openai/v1/batches/{batch_id} \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${API_KEY}"
```
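
Because a batch can take up to 48 hours, a status check is usually wrapped in a polling loop. The sketch below is one possible way to do that, not an official client feature: the function name `wait_for_batch` and the parameters `poll_seconds`, `max_polls`, and `sleep` are assumptions made for illustration. It treats COMPLETED, FAILED, EXPIRED, and CANCELLED from the table above as terminal, and compares statuses case-insensitively because the sample response shows `validating` in lower case.

```python
import time

# Terminal states, taken from the status table above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "EXPIRED", "CANCELLED"}


def wait_for_batch(retrieve, batch_id, poll_seconds=60.0, max_polls=None,
                   sleep=time.sleep):
    """Poll until the batch reaches a terminal status, then return it.

    `retrieve` is any callable that takes a batch ID and returns an object
    with a `status` attribute, for example `client.batches.retrieve`.
    """
    polls = 0
    while True:
        batch = retrieve(batch_id)
        polls += 1
        # Case-insensitive comparison (assumption: casing may vary).
        if str(batch.status).upper() in TERMINAL_STATUSES:
            return batch
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(
                f"batch {batch_id} is still {batch.status} after {polls} polls"
            )
        sleep(poll_seconds)
```

With the client from the example above, a call might look like `wait_for_batch(client.batches.retrieve, "batch_d2cq***73a68lu0")`. Passing `retrieve` and `sleep` as arguments keeps the loop easy to test without network access.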

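### 5. Retrieve the results

When batch inference completes, download the output file using the `output_file_id` field of the Batch object.

The output file is deleted 30 days after batch inference finishes, so retrieve it through the API in good time.

Code examples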

**Python**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ppio.com/openai/v1",
    api_key="<Your API Key>",
)

# Pass the `output_file_id` from the Batch object; the ID below is a placeholder.
content = client.files.content("example-250811-1")
print(content.read())
```

**Curl**

```bash theme={null}
export API_KEY="<Your API Key>"

curl --request GET \
  --url https://api.ppio.com/openai/v1/files/{file_id}/content \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${API_KEY}"
```

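The response returns the raw file content. In a batch output file, each line contains a response like the following (pretty-printed here for readability):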

```json theme={null}
{
  "custom_id": "request-2589",
  "error": null,
  "id": "batch_req_task_d2c",
  "response": {
    "body": {
      "id": "29e1432c-edfb-44a4-b531-c23c600abfae",
      "object": "chat.completion",
      "created": 1754902266,
      "model": "deepseek-test",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! 👋 How can I assist you today? 😊"
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 13,
        "total_tokens": 18
      }
    },
    "request_id": "request-2589",
    "status_code": 200
  }
}
```
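
Since results are matched to requests by `custom_id`, it is often convenient to index the output file by that field. The following sketch assumes each record occupies a single line and that `response.body` has the chat completion shape shown in the sample; the function name `index_results` is hypothetical. Requests with a non-null `error` or a status code other than 200 map to `None`.

```python
import json


def index_results(raw):
    """Map each custom_id to its assistant text, or None if the request failed.

    Accepts the downloaded file content as bytes or str.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    results = {}
    for line in raw.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        response = record.get("response") or {}
        body = response.get("body") or {}
        choices = body.get("choices") or []
        succeeded = (
            record.get("error") is None and response.get("status_code") == 200
        )
        text = choices[0]["message"]["content"] if succeeded and choices else None
        results[record["custom_id"]] = text
    return results
```

For example, `index_results(content.read())` on the downloaded output file returns a dictionary such as `{"request-1": "...", "request-2": "..."}`.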

## Usage notes

### Limits

1. Each batch job can contain up to 50,000 requests.<br />
2. The maximum input file size per batch job is 100MB.
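
These limits, together with the unique `custom_id` and single-model rules from step 1, can be checked locally before uploading. The sketch below is an illustrative pre-flight check, not the validation the service itself performs: the name `find_problems` is hypothetical, and reading "100MB" as 100 × 1024 × 1024 bytes is an assumption.

```python
import json

MAX_REQUESTS = 50_000           # per-batch request limit stated above
MAX_BYTES = 100 * 1024 * 1024   # assumption: "100MB" is read as 100 MiB


def find_problems(data):
    """Return a list of human-readable problems found in JSONL bytes."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes; the limit is {MAX_BYTES}")
    seen_ids, models, count = set(), set(), 0
    for number, line in enumerate(data.decode("utf-8").split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines
        count += 1
        try:
            request = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(request, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        custom_id = request.get("custom_id")
        if not isinstance(custom_id, str) or not custom_id:
            problems.append(f"line {number}: missing or non-string custom_id")
        elif custom_id in seen_ids:
            problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)
        body = request.get("body")
        models.add(body.get("model") if isinstance(body, dict) else None)
    if count > MAX_REQUESTS:
        problems.append(f"{count} requests; the limit is {MAX_REQUESTS}")
    if len(models) > 1:
        problems.append(f"more than one model in file: {sorted(map(str, models))}")
    return problems
```

An empty list means the file passed these local checks; for instance, `find_problems(open("batch_input.jsonl", "rb").read())`.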

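### Error handling

Errors encountered during batch processing are recorded in a separate error file, accessible through the `error_file_id` field. Common error codes include: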

<table class="table table-big">
  <thead>
    <tr>
      <th>Error code</th>
      <th>Description</th>
      <th>Resolution</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>400</td>
      <td>Invalid request format</td>
      <td>Check the JSONL syntax and required fields</td>
    </tr>

    <tr>
      <td>401</td>
      <td>Authentication failed</td>
      <td>Verify your API key</td>
    </tr>

    <tr>
      <td>404</td>
      <td>Batch job not found</td>
      <td>Check the batch job ID</td>
    </tr>

    <tr>
      <td>429</td>
      <td>Rate limit exceeded</td>
      <td>Reduce the request frequency</td>
    </tr>

    <tr>
      <td>500</td>
      <td>Server error</td>
      <td>Contact us</td>
    </tr>
  </tbody>
</table>
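
For a 429 returned by the file or batch management calls, one common way to reduce request frequency is to retry with exponential backoff. The following is a generic sketch of that technique, not behaviour documented for this API: the helper name `call_with_backoff` and its parameters are hypothetical, and the delays double from `base_delay` on each failed attempt.

```python
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call fn(); when it raises `retry_on`, wait and try again.

    The wait before retry n (counting from 0) is base_delay * 2**n seconds.
    The last error is re-raised once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the `openai` Python package, `retry_on` could be `openai.RateLimitError`, for example `call_with_backoff(lambda: client.batches.retrieve(batch_id), openai.RateLimitError)`. The client constructor also accepts a `max_retries` option, which may be sufficient on its own.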

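### Batch job expiration

Batch jobs that do not complete within 48 hours move to the EXPIRED status. Unfinished requests are cancelled, while completed requests are made available in the output file. You pay only for the tokens consumed by completed requests. The service makes a best effort to complete batch jobs within 48 hours.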
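
After a batch expires, the requests that never ran can be collected into a new input file and resubmitted. The sketch below shows one way to find them by comparing `custom_id` values; the function name `unfinished_requests` is hypothetical, and it assumes that every finished request has exactly one line in the output file. Requests recorded only in the error file would also be returned, so inspect that file before resubmitting.

```python
import json


def unfinished_requests(input_text, output_text):
    """Return the input lines whose custom_id has no record in the output."""
    done = {
        json.loads(line)["custom_id"]
        for line in output_text.split("\n")
        if line.strip()
    }
    return [
        line
        for line in input_text.split("\n")
        if line.strip() and json.loads(line)["custom_id"] not in done
    ]
```

Writing the returned lines to a new .jsonl file, one per line, gives an input file for a follow-up batch.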

## All Batch Inference APIs

1. [Create a batch job](/models/reference-llm-create-batch)
2. [Retrieve a batch job](/models/reference-llm-retrieve-batch)
3. [Cancel a batch job](/models/reference-llm-cancel-batch)
4. [List batch jobs](/models/reference-llm-list-batches)
5. [Upload a file](/models/reference-llm-upload-batch-input-file)
6. [List files](/models/reference-llm-list-files)
7. [Retrieve a file](/models/reference-llm-query-file)
8. [Delete a file](/models/reference-llm-delete-file)
9. [Retrieve file content](/models/reference-llm-retrieve-file-content)