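Harness Markdown Table Templates
This document does one thing:
it collects the tables most often used for Harness runs, comparisons, and regression checks into Markdown templates that can be copied as-is.
If you already understand Harness as a "run and verification scaffold", you can use this page purely as a template reference.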
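1. Task Run Manifest Table
| Field | Meaning | Example |
| --- | --- | --- |
| `Run Batch ID` | Batch identifier for this round of tasks | `batch-2026-04-29-a` |
| `Task Set Name` | Name of the task set | `research-core-set` |
| `Compared Version` | Current version under test | `agent-v5` |
| `Baseline Version` | Version used as the baseline | `agent-v4` |
| `Task Count` | Number of tasks | `25` |
| `Run Status` | Status of the run | `finished` |
| `Notes` | Remarks | `Includes 4 historical failure samples` |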
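
When the manifest is produced by a script rather than filled in by hand, the same fields can be rendered into a Markdown table automatically. The following is a minimal sketch, not part of Harness itself: `render_manifest` and `escape_cell` are hypothetical helper names, and the two-column `Field | Value` layout is an assumed variant of the template above for a filled-in record.

```python
def escape_cell(text: str) -> str:
    """Escape pipe characters so a value cannot break the table row."""
    return text.replace("|", "\\|")


def render_manifest(fields: dict[str, str]) -> str:
    """Render field/value pairs as a two-column Markdown table."""
    lines = ["| Field | Value |", "| --- | --- |"]
    for name, value in fields.items():
        lines.append(f"| `{name}` | `{escape_cell(value)}` |")
    return "\n".join(lines)


# Example values are taken from the template table above.
manifest = {
    "Run Batch ID": "batch-2026-04-29-a",
    "Task Set Name": "research-core-set",
    "Compared Version": "agent-v5",
    "Baseline Version": "agent-v4",
    "Task Count": "25",
    "Run Status": "finished",
}
print(render_manifest(manifest))
```

Escaping `|` matters because a raw pipe inside a cell would otherwise be read as a column separator.
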
2. Single-Task Trajectory Summary Table
| Step | Goal | Action | Tool | Observation | State Update |
| --- | --- | --- | --- | --- | --- |
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |

This table is suited to:
- Reviewing a single failure case
- Comparing the execution paths of two versions
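
For the second use, comparing two execution paths, a quick first pass is to find the step at which the paths stop matching. The sketch below assumes each trajectory is a list of `(action, tool)` pairs taken from the Action and Tool columns; `first_divergence` is a hypothetical helper name, and the sample actions are made up for illustration.

```python
from itertools import zip_longest
from typing import Optional

Step = tuple[str, str]  # (action, tool), as in the trajectory table columns


def first_divergence(baseline: list[Step], current: list[Step]) -> Optional[int]:
    """Return the 1-based step number where two paths first differ.

    Returns None when both paths are identical. A path that ends early
    counts as diverging at the first step the other path still has.
    """
    pairs = zip_longest(baseline, current)  # pads the shorter path with None
    for step_no, (b, c) in enumerate(pairs, start=1):
        if b != c:
            return step_no
    return None


# Hypothetical trajectories for two versions of the same task.
v4_path = [("search", "web"), ("read", "browser"), ("answer", "none")]
v5_path = [("search", "web"), ("answer", "none")]
print(first_divergence(v4_path, v5_path))
```

The returned step number tells you which row of the trajectory table to inspect first in each version.
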
3. Regression Comparison Table
| Task ID | Baseline Result | Current Result | Status | Notes |
| --- | --- | --- | --- | --- |
| `task-001` | pass | pass | unchanged | |
| `task-002` | fail | pass | improved | |
| `task-003` | pass | fail | regressed | |
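
The Status column follows mechanically from the two result columns, so it can be derived rather than typed. This is a minimal sketch under the assumption that results are only ever `pass` or `fail`, as in the sample rows; `classify` and `regression_table` are hypothetical names.

```python
VALID_RESULTS = {"pass", "fail"}


def classify(baseline: str, current: str) -> str:
    """Map a (baseline, current) result pair to a Status value."""
    if baseline not in VALID_RESULTS or current not in VALID_RESULTS:
        raise ValueError(f"unexpected result pair: {baseline!r}, {current!r}")
    if baseline == current:
        return "unchanged"
    return "improved" if current == "pass" else "regressed"


def regression_table(baseline: dict[str, str], current: dict[str, str]) -> str:
    """Render the comparison table; raises KeyError if a task is missing from `current`."""
    lines = [
        "| Task ID | Baseline Result | Current Result | Status | Notes |",
        "| --- | --- | --- | --- | --- |",
    ]
    for task_id in sorted(baseline):
        b, c = baseline[task_id], current[task_id]
        lines.append(f"| `{task_id}` | {b} | {c} | {classify(b, c)} | |")
    return "\n".join(lines)


# Same three tasks as the sample rows above.
baseline_results = {"task-001": "pass", "task-002": "fail", "task-003": "pass"}
current_results = {"task-001": "pass", "task-002": "pass", "task-003": "fail"}
print(regression_table(baseline_results, current_results))
```

Note that `unchanged` covers both pass-to-pass and fail-to-fail; if persistent failures need separate tracking, that would require an extra status that the template does not define.
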
4. Failure Distribution Table
| Failure Type | Count | Example Task | Notes |
| --- | --- | --- | --- |
| Wrong Tool | | | |
| Wrong Argument | | | |
| Missing Evidence | | | |
| Looping | | | |
| Premature Stop | | | |
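
The Count and Example Task columns can be filled from a list of labeled failures. The sketch below assumes each failure has already been tagged with one of the five types in the table; `failure_table` is a hypothetical helper, and the task IDs in the sample data are invented for illustration.

```python
from collections import Counter

# Row order matches the template table above.
FAILURE_TYPES = [
    "Wrong Tool",
    "Wrong Argument",
    "Missing Evidence",
    "Looping",
    "Premature Stop",
]


def failure_table(failures: list[tuple[str, str]]) -> str:
    """Render the distribution table from (task_id, failure_type) pairs."""
    counts = Counter(ftype for _, ftype in failures)
    examples: dict[str, str] = {}
    for task_id, ftype in failures:
        examples.setdefault(ftype, task_id)  # keep the first task seen per type
    lines = [
        "| Failure Type | Count | Example Task | Notes |",
        "| --- | --- | --- | --- |",
    ]
    for ftype in FAILURE_TYPES:
        example = f"`{examples[ftype]}`" if ftype in examples else ""
        lines.append(f"| {ftype} | {counts[ftype]} | {example} | |")
    return "\n".join(lines)


# Hypothetical labeled failures.
sample_failures = [
    ("task-003", "Wrong Tool"),
    ("task-007", "Looping"),
    ("task-011", "Wrong Tool"),
]
print(failure_table(sample_failures))
```

Types with no occurrences still get a row with a count of 0, which keeps the table shape stable across runs and makes run-to-run diffs easier to read.
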
5. Version Regression Release Table
| Check | Result | Notes |
| --- | --- | --- |
| Core Task Set Pass | | |
| Historical Failure Set Pass | | |
| No New Critical Regression | | |
| Cost Within Budget | | |
| Latency Within SLA | | |
| High-risk Flow Reviewed | | |
| Release Decision | | |
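
One possible way to fill the Release Decision row is to require every other check to pass. The template itself does not state a decision rule, so the all-checks-must-pass policy below is an assumption, as are the names `release_decision` and the `go` / `no-go` labels.

```python
# The six checks that precede "Release Decision" in the table above.
CHECKS = [
    "Core Task Set Pass",
    "Historical Failure Set Pass",
    "No New Critical Regression",
    "Cost Within Budget",
    "Latency Within SLA",
    "High-risk Flow Reviewed",
]


def release_decision(results: dict[str, bool]) -> str:
    """Return "go" only if every check is present and True (assumed policy)."""
    missing = [check for check in CHECKS if check not in results]
    if missing:
        raise ValueError(f"checks not recorded: {missing}")
    return "go" if all(results[check] for check in CHECKS) else "no-go"


# Hypothetical outcome: everything passes except the latency check.
outcome = {check: True for check in CHECKS}
outcome["Latency Within SLA"] = False
print(release_decision(outcome))
```

Raising on a missing check, rather than treating it as a pass, prevents an unrecorded item from silently approving a release.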