Research Agent: A Code Implementation Example
If the minimal Agent code example mostly answers:
what the skeleton of your first Agent should look like
then this document tackles a different question, one closer to research-style tasks:
how should the code for a Research Agent, one that keeps investigating, gathering material, and converging on evidence, actually be organized?
Many people understand the concept of a Research Agent but still get stuck on these points once they start writing code:
- Should a research task be broken into dimensions first?
- What should actually be stored in state?
- How should search and read tools be layered?
- When should retrieval continue, and when should it stop?
- How should the system handle information gaps or conflicting sources?
- How do you finally converge from a pile of material into a well-grounded conclusion?
So this document does not cover a "universal research system" or high-risk automated execution. It focuses on one thing only:
a reasonably complete TypeScript pseudocode skeleton built around the research loop.
The point is not flashy code, but making the core loop of a research-style Agent visible:
research goal -> plan research dimensions -> search -> read -> extract evidence -> assess gaps/conflicts -> keep exploring or stop -> form conclusions
1. When this code skeleton fits
Not every task needs a Research Agent.
If the user asks a very direct question, such as:
Why is this endpoint returning 500?
then what you probably want is a Debug Agent, not a Research Agent.
A Research Agent fits tasks like these:
- They require synthesizing multiple sources, not a single fact
- They involve comparison across several dimensions
- The first round of information is usually not enough, so follow-up research is needed
- The final output is not "excerpted material" but "a formed judgment"
For example:
- Investigating whether an AI framework suits an internal knowledge assistant
- Comparing several vector databases for a specific business scenario
- Analyzing whether a new model fits an existing workflow
- Studying the cost, limits, and risk boundaries of a technical approach
What these tasks share is this:
the value lies not in finding content, but in organizing the exploration process.
2. Define a research-style task first
To make the code structure concrete, let's fix one example task:
investigate whether a given AI framework is suitable as an enterprise-internal knowledge assistant.
This goal is inherently a synthesized judgment, not single-point Q&A.
The system must answer at least:
- What problems does it solve?
- What scenarios does it fit?
- What scenarios does it not fit?
- How large is the adoption cost for the team?
- What are its known limitations and risks?
In other words, a Research Agent's output is not just "information found", but:
- Structured conclusions
- The evidence behind each conclusion
- The uncertainty that still remains
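Those three outputs can be sketched as a single result shape. This is only an illustration; the names `ResearchReport` and `Conclusion` are assumptions made for this sketch, not types used in the skeleton later in the document:

```typescript
// Hypothetical shape of a Research Agent's final output:
// structured conclusions, pointers to the evidence behind each one,
// and the uncertainty that still remains, kept explicit.
type Conclusion = {
  dimension: string;     // which research dimension this answers
  verdict: string;       // the structured conclusion itself
  evidenceIds: string[]; // pointers back to supporting evidence
};

type ResearchReport = {
  conclusions: Conclusion[];
  openQuestions: string[]; // remaining uncertainty, kept visible
};

const report: ResearchReport = {
  conclusions: [
    {
      dimension: "capability_fit",
      verdict: "Covers the retrieval and orchestration needs",
      evidenceIds: ["capability_fit-3"],
    },
  ],
  openQuestions: ["Long-term maintenance cost is still unclear"],
};
```

Keeping `evidenceIds` as pointers rather than copied text is what later allows a conclusion to be traced back to its evidence.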
3. Plan research dimensions first; don't search immediately
The first mistake a Research Agent tends to make is:
searching a lot right away, then spinning in circles inside the results.
A steadier approach is to first break the research task into dimensions.
In this case, for example:
- capability_fit: does capability coverage meet the requirements
- integration_cost: integration and engineering-rework cost
- operational_risk: stability, observability, governance boundaries
- known_limitations: known limits, gaps, and unsuitable scenarios
The benefit is not "formal completeness"; it is that the later loop gets explicit targets:
- Which dimension is this round of retrieval filling?
- Which dimension already has enough?
- Which dimension still has weak evidence?
- Which sources are actually discussing the same thing?
So a first-version Research Agent is well worth keeping an explicit "research dimension plan".
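The dimension plan above can simply be kept as data that the loop iterates over. A minimal sketch (the `researchPlan` name is an assumption for illustration; the full skeleton later uses a richer `ResearchDimension` type):

```typescript
// An explicit research-dimension plan matching the four dimensions
// named above. Keeping it as plain data lets every loop iteration
// state which dimension it is trying to fill.
const researchPlan = [
  { id: "capability_fit", question: "Does capability coverage meet the requirements?", priority: "high" },
  { id: "integration_cost", question: "What is the integration and engineering-rework cost?", priority: "high" },
  { id: "operational_risk", question: "What are the stability, observability, and governance risks?", priority: "high" },
  { id: "known_limitations", question: "What known limits and unsuitable scenarios exist?", priority: "medium" },
] as const;

// The loop can now always answer: "which dimension is this round for?"
const highPriority = researchPlan.filter((d) => d.priority === "high");
```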
4. How to design the state
A Research Agent's state usually matters more than a minimal Agent's.
What hurts research tasks most is not "nothing found" but:
- What was already found gets scattered
- Gaps go unrecorded
- Conflicting sources get overwritten
- Only a vague conclusion survives
Here is a state structure well suited to a first implementation:
type ResearchDimensionId =
| "capability_fit"
| "integration_cost"
| "operational_risk"
| "known_limitations";
type ResearchDimension = {
id: ResearchDimensionId;
question: string;
priority: "high" | "medium" | "low";
done: boolean;
confidence: number;
};
type Evidence = {
id: string;
dimensionId: ResearchDimensionId;
claim: string;
sourceTitle: string;
sourceUrl: string;
snippet: string;
sourceType: "official_doc" | "blog" | "benchmark" | "community";
supportLevel: "strong" | "medium" | "weak";
extractedAtStep: number;
};
type InformationGap = {
dimensionId: ResearchDimensionId;
question: string;
reason: "not_enough_evidence" | "conflict_detected" | "too_generic";
};
type ConflictRecord = {
dimensionId: ResearchDimensionId;
claimA: string;
claimB: string;
sourceA: string;
sourceB: string;
resolutionStatus: "open" | "resolved" | "needs_human_review";
note: string;
};
type ToolCallRecord = {
step: number;
toolName: string;
args: Record<string, unknown>;
summary: string;
};
type ResearchState = {
goal: string;
dimensions: ResearchDimension[];
evidence: Evidence[];
gaps: InformationGap[];
conflicts: ConflictRecord[];
notes: string[];
toolCalls: ToolCallRecord[];
currentStep: number;
maxSteps: number;
done: boolean;
stopReason?: string;
};
Compared with a minimal Agent, this structure adds several genuinely important kinds of information:
- dimensions: tells the system which facets this research covers
- evidence: stores not raw text but "citable evidence units"
- gaps: explicitly records what is still unclear
- conflicts: explicitly records contradictory claims
These fields directly determine whether the research loop can advance reliably.
5. Why evidence cannot be just raw text
Many first implementations store evidence as:
const evidence: string[] = [];
This runs, but two problems show up quickly:
- You don't know which dimension a passage supports
- You don't know where it came from, so you cannot compare credibility
A Research Agent should instead store evidence as "structured claims".
That is, each piece of evidence should answer at least:
- What judgment does it support?
- Where does it come from?
- Is it strong or weak?
- Which research dimension does it belong to?
Only then can the synthesis stage answer:
- Which dimension already has enough evidence?
- Which conclusion rests only on weak evidence?
- Which conflict is merely a difference in source tier?
In other words:
a Research Agent does not dump material into an array; it organizes material into comparable evidence objects.
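As a sketch of that idea, here is a hypothetical `toEvidence` helper that wraps a raw passage into a comparable claim. The field set mirrors the `Evidence` type from section 4; the strong-vs-medium default rule is an assumption made purely for illustration:

```typescript
type SourceType = "official_doc" | "blog" | "benchmark" | "community";

type Evidence = {
  id: string;
  dimensionId: string;
  claim: string;
  sourceUrl: string;
  snippet: string;
  sourceType: SourceType;
  supportLevel: "strong" | "medium" | "weak";
};

// Illustrative helper: instead of pushing raw strings into an array,
// record what the passage supports, where it came from, and how strong it is.
function toEvidence(
  dimensionId: string,
  claim: string,
  source: { url: string; type: SourceType; text: string }
): Evidence {
  return {
    id: `${dimensionId}-${Date.now()}`,
    dimensionId,
    claim,
    sourceUrl: source.url,
    snippet: source.text.slice(0, 160), // keep a short citable excerpt
    sourceType: source.type,
    // Crude default: official docs count as strong, everything else medium.
    supportLevel: source.type === "official_doc" ? "strong" : "medium",
  };
}

const item = toEvidence("capability_fit", "Supports tool orchestration", {
  url: "https://example.com/docs",
  type: "official_doc",
  text: "The framework supports tool orchestration and retrieval integration.",
});
```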
6. Tool design: separate retrieval from read
A first implementation easily ends up with one catch-all tool, such as:
search_and_read_everything
A tool like that usually creates two problems:
- The decision layer doesn't know whether it is "finding sources" or "reading content"
- The logs make it hard to see how the research loop is advancing
A clearer design splits tools into two layers:
- retrieval: find candidate sources
- read: fetch the detailed content of a selected source
For example:
type ToolResult<T> = {
ok: boolean;
data?: T;
error?: string;
};
type SearchHit = {
title: string;
url: string;
snippet: string;
sourceType: "official_doc" | "blog" | "benchmark" | "community";
};
type ToolDefinition<TArgs, TResult> = {
name: string;
description: string;
run: (args: TArgs) => Promise<ToolResult<TResult>>;
};
const searchSourcesTool: ToolDefinition<
{ query: string; dimensionId: ResearchDimensionId },
{ hits: SearchHit[] }
> = {
name: "search_sources",
description: "Search candidate sources for a research dimension",
async run(args) {
// Stubbed data: a real implementation would call a search API with args.query.
return {
ok: true,
data: {
hits: [
{
title: "Framework official docs",
url: "https://example.com/docs",
snippet: "Architecture, features, and deployment model",
sourceType: "official_doc",
},
],
},
};
},
};
const readSourceTool: ToolDefinition<
{ url: string },
{ title: string; content: string }
> = {
name: "read_source",
description: "Read the full content of a selected source",
async run(args) {
// Stubbed data: a real implementation would fetch and parse args.url.
return {
ok: true,
data: {
title: "Framework official docs",
content: "Detailed content here...",
},
};
},
};
This layering has several direct benefits:
- Clearer decisions: find first, then read
- Clearer tool logs: you can see whether the research is broadening its sources or deepening its understanding
- Controllable cost: the system won't read every long document up front
The feel of a research loop often lives exactly in this rhythm of "broaden the search surface first, then read the key sources in depth".
7. What the decision layer should decide
A first-version Research Agent's decision layer doesn't need to be complicated.
At its core it only decides three things:
- Which dimension to research next
- Whether to continue with search or with read
- Whether the information is already sufficient to converge
So the first version's action types can be very simple:
type ResearchAction =
| {
type: "search";
dimensionId: ResearchDimensionId;
query: string;
reason: string;
}
| {
type: "read";
dimensionId: ResearchDimensionId;
url: string;
reason: string;
}
| {
type: "finish";
reason: string;
};
Notice that there are deliberately no high-risk actions here.
The point of this example is not automatically executing system changes, but:
continuously finding evidence around a research task, filling gaps, and deciding when to converge.
8. When to keep searching vs. keep reading
A particularly important question inside the research loop:
should the next step be more searching, or deeper reading of the sources already found?
A reasonably robust rule of thumb:
- If a dimension has almost no sources, search first
- If a dimension has candidate sources but the claims are still unclear, read first
- If a conflict has appeared, prioritize a read of a higher-quality source
- If all the evidence is too generic, search again with a more specific question
This judgment can be written as a very simple heuristic:
function chooseNextAction(state: ResearchState): ResearchAction {
const openGap = state.gaps[0];
if (!openGap) {
return {
type: "finish",
reason: "No explicit gaps remain and evidence is sufficient across dimensions",
};
}
const dimensionEvidence = state.evidence.filter(
(item) => item.dimensionId === openGap.dimensionId
);
if (dimensionEvidence.length === 0) {
return {
type: "search",
dimensionId: openGap.dimensionId,
query: openGap.question,
reason: "Need initial sources for this dimension",
};
}
const hasStrongEvidence = dimensionEvidence.some(
(item) => item.supportLevel === "strong"
);
if (!hasStrongEvidence) {
return {
type: "read",
dimensionId: openGap.dimensionId,
url: pickBestSourceUrl(state, openGap.dimensionId),
reason: "Need deeper reading to strengthen evidence quality",
};
}
return {
type: "finish",
reason: "Current gaps are sufficiently addressed",
};
}
What matters most here is not how "smart" the rules are, but that:
whether to keep searching or keep reading is driven by explicit state.
9. When to continue, when to stop
A Research Agent fears two extremes most:
- Stopping too early: summarizing after the first round finds a little material
- Never stopping: retrieving endlessly while research quality stops improving
So the stop conditions are best made explicit too.
Common stop conditions include:
- Every high-priority dimension has at least 1 to 2 usable pieces of evidence
- No critical conflicts remain unresolved
- The information gain from another retrieval round is already low
- The max step count or budget limit is reached
This can be a standalone function:
function shouldStop(state: ResearchState): { stop: boolean; reason: string } {
const highPriorityDimensions = state.dimensions.filter(
(dimension) => dimension.priority === "high"
);
const uncovered = highPriorityDimensions.filter((dimension) => {
const count = state.evidence.filter(
(item) => item.dimensionId === dimension.id
).length;
return count < 2;
});
const openCriticalConflicts = state.conflicts.filter(
(conflict) => conflict.resolutionStatus === "open"
);
if (state.currentStep >= state.maxSteps) {
return { stop: true, reason: "Reached max research steps" };
}
if (uncovered.length > 0) {
return {
stop: false,
reason: "Some high-priority dimensions still lack enough evidence",
};
}
if (openCriticalConflicts.length > 0) {
return {
stop: false,
reason: "Important conflicts remain unresolved",
};
}
return {
stop: true,
reason: "Coverage is sufficient and key conflicts are resolved",
};
}
The value of doing this:
- Why the research loop ended is no longer a black box
- The logs can record an explicit stop reason
- Later evals can check whether the system tends to stop too early
10. A fairly complete TypeScript pseudocode implementation
The code below is not a production implementation, but it is enough to express a complete research loop.
Focus on the structure; don't fixate on interface details.
type ResearchDimensionId =
| "capability_fit"
| "integration_cost"
| "operational_risk"
| "known_limitations";
type ResearchDimension = {
id: ResearchDimensionId;
question: string;
priority: "high" | "medium" | "low";
done: boolean;
confidence: number;
};
type Evidence = {
id: string;
dimensionId: ResearchDimensionId;
claim: string;
sourceTitle: string;
sourceUrl: string;
snippet: string;
sourceType: "official_doc" | "blog" | "benchmark" | "community";
supportLevel: "strong" | "medium" | "weak";
extractedAtStep: number;
};
type InformationGap = {
dimensionId: ResearchDimensionId;
question: string;
reason: "not_enough_evidence" | "conflict_detected" | "too_generic";
};
type ConflictRecord = {
dimensionId: ResearchDimensionId;
claimA: string;
claimB: string;
sourceA: string;
sourceB: string;
resolutionStatus: "open" | "resolved" | "needs_human_review";
note: string;
};
type ToolCallRecord = {
step: number;
toolName: string;
args: Record<string, unknown>;
summary: string;
};
type SearchHit = {
title: string;
url: string;
snippet: string;
sourceType: "official_doc" | "blog" | "benchmark" | "community";
};
type ResearchAction =
| {
type: "search";
dimensionId: ResearchDimensionId;
query: string;
reason: string;
}
| {
type: "read";
dimensionId: ResearchDimensionId;
url: string;
reason: string;
}
| {
type: "finish";
reason: string;
};
type ResearchState = {
goal: string;
dimensions: ResearchDimension[];
evidence: Evidence[];
gaps: InformationGap[];
conflicts: ConflictRecord[];
notes: string[];
toolCalls: ToolCallRecord[];
currentStep: number;
maxSteps: number;
done: boolean;
stopReason?: string;
};
type ToolResult<T> = {
ok: boolean;
data?: T;
error?: string;
};
const searchSourcesTool = {
name: "search_sources",
description: "Search candidate sources for a research dimension",
async run(args: {
query: string;
dimensionId: ResearchDimensionId;
}): Promise<ToolResult<{ hits: SearchHit[] }>> {
// Stubbed data: a real implementation would query a search backend here.
return {
ok: true,
data: {
hits: [
{
title: "Official docs overview",
url: "https://example.com/docs/overview",
snippet: "Capabilities and system architecture",
sourceType: "official_doc",
},
{
title: "Community implementation notes",
url: "https://example.com/community/post",
snippet: "Practical integration experience",
sourceType: "community",
},
],
},
};
},
};
const readSourceTool = {
name: "read_source",
description: "Read full content for a selected source",
async run(args: { url: string }): Promise<ToolResult<{ title: string; content: string }>> {
// Stubbed data: a real implementation would fetch args.url here.
return {
ok: true,
data: {
title: "Official docs overview",
content:
"The framework supports tool orchestration, retrieval integration, and evaluation hooks, but requires explicit state management for multi-step workflows.",
},
};
},
};
function initializeResearchState(goal: string): ResearchState {
return {
goal,
dimensions: [
{
id: "capability_fit",
question: "Does the framework cover the core capabilities needed for an internal knowledge assistant?",
priority: "high",
done: false,
confidence: 0,
},
{
id: "integration_cost",
question: "What is the integration and maintenance cost for an existing engineering team?",
priority: "high",
done: false,
confidence: 0,
},
{
id: "operational_risk",
question: "What are the observability, safety, and reliability risks in production use?",
priority: "high",
done: false,
confidence: 0,
},
{
id: "known_limitations",
question: "What limitations or unsuitable scenarios are repeatedly mentioned across sources?",
priority: "medium",
done: false,
confidence: 0,
},
],
evidence: [],
gaps: [],
conflicts: [],
notes: [],
toolCalls: [],
currentStep: 0,
maxSteps: 8,
done: false,
};
}
function rebuildGaps(state: ResearchState) {
const newGaps: InformationGap[] = [];
for (const dimension of state.dimensions) {
const relatedEvidence = state.evidence.filter(
(item) => item.dimensionId === dimension.id
);
if (relatedEvidence.length < 2) {
newGaps.push({
dimensionId: dimension.id,
question: dimension.question,
reason: "not_enough_evidence",
});
continue;
}
const onlyWeakEvidence = relatedEvidence.every(
(item) => item.supportLevel === "weak"
);
if (onlyWeakEvidence) {
newGaps.push({
dimensionId: dimension.id,
question: `${dimension.question} Use stronger and more direct sources.`,
reason: "too_generic",
});
}
}
for (const conflict of state.conflicts) {
if (conflict.resolutionStatus === "open") {
newGaps.push({
dimensionId: conflict.dimensionId,
question: `Resolve conflict between ${conflict.sourceA} and ${conflict.sourceB}`,
reason: "conflict_detected",
});
}
}
state.gaps = newGaps;
}
function pickBestSourceUrl(state: ResearchState, dimensionId: ResearchDimensionId): string {
// Prefer a URL selected during an earlier search for the same dimension;
// fall back to a default source when none was recorded.
const preferred = state.toolCalls
.filter(
(call) =>
call.toolName === "search_sources" &&
call.args.dimensionId === dimensionId
)
.map((call) => String(call.args.selectedUrl ?? ""))
.find(Boolean);
return preferred || "https://example.com/docs/overview";
}
function chooseNextAction(state: ResearchState): ResearchAction {
rebuildGaps(state);
const stopCheck = shouldStop(state);
if (stopCheck.stop) {
return { type: "finish", reason: stopCheck.reason };
}
const nextGap = state.gaps[0];
const evidenceForDimension = state.evidence.filter(
(item) => item.dimensionId === nextGap.dimensionId
);
if (evidenceForDimension.length === 0) {
return {
type: "search",
dimensionId: nextGap.dimensionId,
query: nextGap.question,
reason: "Need initial candidate sources",
};
}
return {
type: "read",
dimensionId: nextGap.dimensionId,
url: pickBestSourceUrl(state, nextGap.dimensionId),
reason: "Need deeper evidence from a selected source",
};
}
function shouldStop(state: ResearchState): { stop: boolean; reason: string } {
if (state.currentStep >= state.maxSteps) {
return { stop: true, reason: "Reached max research steps" };
}
const highPriorityDimensions = state.dimensions.filter(
(dimension) => dimension.priority === "high"
);
const missingCoverage = highPriorityDimensions.some((dimension) => {
const evidenceCount = state.evidence.filter(
(item) => item.dimensionId === dimension.id
).length;
return evidenceCount < 2;
});
if (missingCoverage) {
return {
stop: false,
reason: "High-priority dimensions still need more coverage",
};
}
const openConflicts = state.conflicts.some(
(conflict) => conflict.resolutionStatus === "open"
);
if (openConflicts) {
return {
stop: false,
reason: "Conflicts remain unresolved",
};
}
return {
stop: true,
reason: "Coverage and evidence quality are sufficient for synthesis",
};
}
function extractEvidenceFromRead(
state: ResearchState,
dimensionId: ResearchDimensionId,
url: string,
title: string,
content: string
): Evidence[] {
return [
{
id: `${dimensionId}-${state.currentStep}`,
dimensionId,
claim: "The framework supports multi-step orchestration but requires explicit workflow and state design.",
sourceTitle: title,
sourceUrl: url,
snippet: content.slice(0, 160),
sourceType: "official_doc",
supportLevel: "strong",
extractedAtStep: state.currentStep,
},
];
}
function detectConflicts(state: ResearchState) {
const capabilityEvidence = state.evidence.filter(
(item) => item.dimensionId === "capability_fit"
);
const mentionsEasy = capabilityEvidence.find((item) =>
item.claim.includes("easy")
);
const mentionsHard = capabilityEvidence.find((item) =>
item.claim.includes("requires explicit workflow")
);
if (mentionsEasy && mentionsHard) {
state.conflicts.push({
dimensionId: "capability_fit",
claimA: mentionsEasy.claim,
claimB: mentionsHard.claim,
sourceA: mentionsEasy.sourceUrl,
sourceB: mentionsHard.sourceUrl,
resolutionStatus: "open",
note: "Need narrower evidence about what is easy by default versus what still requires custom engineering.",
});
}
}
function updateDimensionStatus(state: ResearchState) {
for (const dimension of state.dimensions) {
const relatedEvidence = state.evidence.filter(
(item) => item.dimensionId === dimension.id
);
const strongCount = relatedEvidence.filter(
(item) => item.supportLevel === "strong"
).length;
dimension.confidence = Math.min(1, strongCount / 2);
dimension.done = relatedEvidence.length >= 2 && dimension.confidence >= 0.5;
}
}
function logStep(
state: ResearchState,
event: string,
extra: Record<string, unknown> = {}
) {
console.log(
JSON.stringify({
step: state.currentStep,
goal: state.goal,
event,
...extra,
})
);
}
function buildFinalReport(state: ResearchState): string {
const lines: string[] = [];
lines.push(`# Research Summary`);
lines.push(`Goal: ${state.goal}`);
lines.push("");
lines.push(`## Dimension Status`);
for (const dimension of state.dimensions) {
lines.push(
`- ${dimension.id}: done=${dimension.done}, confidence=${dimension.confidence.toFixed(2)}`
);
}
lines.push("");
lines.push(`## Key Evidence`);
for (const item of state.evidence) {
lines.push(
`- [${item.dimensionId}] ${item.claim} (${item.sourceTitle}, ${item.supportLevel})`
);
}
lines.push("");
lines.push(`## Remaining Gaps`);
if (state.gaps.length === 0) {
lines.push(`- None`);
} else {
for (const gap of state.gaps) {
lines.push(`- [${gap.dimensionId}] ${gap.question}`);
}
}
lines.push("");
lines.push(`## Stop Reason`);
lines.push(`- ${state.stopReason ?? "unknown"}`);
return lines.join("\n");
}
async function runResearchAgent(goal: string) {
const state = initializeResearchState(goal);
rebuildGaps(state);
while (!state.done) {
const action = chooseNextAction(state);
logStep(state, "decision_made", {
actionType: action.type,
reason: action.reason,
});
if (action.type === "finish") {
state.done = true;
state.stopReason = action.reason;
break;
}
if (action.type === "search") {
const result = await searchSourcesTool.run({
query: action.query,
dimensionId: action.dimensionId,
});
if (!result.ok || !result.data) {
state.notes.push(`Search failed for ${action.dimensionId}: ${result.error ?? "unknown error"}`);
state.currentStep += 1;
continue;
}
const topHit = result.data.hits[0];
state.toolCalls.push({
step: state.currentStep,
toolName: searchSourcesTool.name,
args: {
query: action.query,
dimensionId: action.dimensionId,
selectedUrl: topHit?.url,
},
summary: `Found ${result.data.hits.length} candidate sources`,
});
state.notes.push(
`For ${action.dimensionId}, selected ${topHit?.title ?? "no hit"} for deeper reading.`
);
}
if (action.type === "read") {
const result = await readSourceTool.run({ url: action.url });
if (!result.ok || !result.data) {
state.notes.push(`Read failed for ${action.url}: ${result.error ?? "unknown error"}`);
state.currentStep += 1;
continue;
}
state.toolCalls.push({
step: state.currentStep,
toolName: readSourceTool.name,
args: { url: action.url },
summary: `Read ${result.data.title}`,
});
const extracted = extractEvidenceFromRead(
state,
action.dimensionId,
action.url,
result.data.title,
result.data.content
);
state.evidence.push(...extracted);
detectConflicts(state);
updateDimensionStatus(state);
rebuildGaps(state);
}
state.currentStep += 1;
}
return {
report: buildFinalReport(state),
state,
};
}
What's most worth taking from this pseudocode is not the number of types, but this main line:
- Initialize the research dimensions first
- Rebuild gaps at the start of each round
- Then decide between search, read, and finish
- After reading, distill the content into evidence
- Then run conflict detection, dimension updates, and the stop check
That is a fairly typical research loop.
11. Handling information gaps
Much of a Research Agent's value lies in its ability to notice:
the current information is not enough yet.
A first version needs no elaborate reflection module, but the system should at least recognize a few common kinds of gap:
- A high-priority dimension has not been covered yet
- A dimension has material, but all of it is too generic
- Only a single source exists, with no cross-support
- A conflict has been found, but no higher-quality evidence has converged it yet
So a practical approach is:
- Rebuild gaps at the end of every round
- Turn each gap into the next round's query input
- Explicitly distinguish "missing sources" from "missing high-quality evidence"
That way the system is not looping mechanically; it keeps advancing around "what is still missing".
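The "gap becomes the next query" step can be sketched as follows. `gapToQuery` is an illustrative helper, not part of the skeleton above; the point is that the reason for a gap should shape the next search:

```typescript
type InformationGap = {
  dimensionId: string;
  question: string;
  reason: "not_enough_evidence" | "conflict_detected" | "too_generic";
};

// Illustrative: different gap reasons lead to different queries, so
// "missing sources" and "missing strong evidence" are not handled
// by the same mechanical search.
function gapToQuery(gap: InformationGap): string {
  switch (gap.reason) {
    case "not_enough_evidence":
      // No sources yet: search broadly with the dimension question itself.
      return gap.question;
    case "too_generic":
      // Sources exist but are vague: ask for concrete specifics.
      return `${gap.question} concrete limitations and benchmarks`;
    case "conflict_detected":
    default:
      // Conflicting claims: steer toward authoritative sources.
      return `${gap.question} official documentation`;
  }
}

const query = gapToQuery({
  dimensionId: "integration_cost",
  question: "What is the integration cost?",
  reason: "too_generic",
});
```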
12. Handling source conflicts
In research tasks, conflicting sources are the norm, not an anomaly.
What really matters is not "avoiding conflicts" but:
once a conflict appears, can the system keep it explicit and continue looking for higher-quality evidence?
A first version can start from a very plain principle:
- Official docs are usually best for "what is supported and where the boundaries are"
- Practitioner blogs are best for "adoption cost and pitfalls"
- Community discussion best exposes "real friction points", but its credibility is often uneven
So when a conflict appears, don't rush to pick a side. First do two things:
- Record which dimension the conflict belongs to
- Keep looking for higher-quality sources closer to the original facts
If it still cannot be resolved, don't pretend the problem disappeared. Keep it in the final report, marked as:
- needs_human_review
- or "current evidence is insufficient to conclude"
A Research Agent's credibility often comes precisely from this restraint, from not forcing a premature convergence.
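The source-type rules of thumb above can be encoded as a naive ranking used when trying to close a conflict. This is a deliberately crude sketch; `sourceRank` and `tryResolve` are illustrative names, not part of the skeleton:

```typescript
type SourceType = "official_doc" | "blog" | "benchmark" | "community";

// Naive source ranking reflecting the rules of thumb above.
const sourceRank: Record<SourceType, number> = {
  official_doc: 3,
  benchmark: 2,
  blog: 1,
  community: 0,
};

type Claim = { text: string; sourceType: SourceType };

// Illustrative resolution: only prefer one claim when its source
// outranks the other; otherwise keep the conflict open for review.
function tryResolve(a: Claim, b: Claim): Claim | "needs_human_review" {
  if (sourceRank[a.sourceType] > sourceRank[b.sourceType]) return a;
  if (sourceRank[b.sourceType] > sourceRank[a.sourceType]) return b;
  return "needs_human_review";
}
```

Note that a tie does not force a choice; the conflict stays open, which matches the restraint described above.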
13. What the logs should record
Without logs, a Research Agent is nearly impossible to debug.
A minimal log should record at least these kinds of events:
- The current step number
- The current decision type
- The research dimension in focus
- A summary of the tool arguments
- How many pieces of evidence were extracted
- Whether a gap or conflict was found
- The final stop reason
For example, each step could output:
logStep(state, "decision_made", {
actionType: action.type,
dimensionId:
action.type === "finish" ? undefined : action.dimensionId,
reason: action.reason,
});
The value of these logs is not spectacle; they help you answer some very important questions:
- Why did this task stop at step 3?
- Why does this dimension never get any evidence?
- Why does the system keep choosing search but never enter read?
- Why does the final conclusion look like a guess?
Very often, research-quality problems eventually surface as loop or logging problems.
14. What a minimal eval looks like
A first-version Research Agent does not need a comprehensive evaluation suite up front.
It is more practical to start with a minimal eval; even 5 to 10 fixed research questions are valuable.
Focus on these aspects:
14.1 Coverage
Were the high-priority research dimensions covered?
14.2 Evidence Quality
Do the conclusions rest mainly on high-quality sources?
14.3 Groundedness
Can the final conclusions point back to concrete evidence?
14.4 Conflict Handling
When conflicts appear, does the system ignore them, or keep them and continue verifying?
14.5 Efficiency
Did many steps produce little obvious information gain?
A minimal eval record like this is enough:
type EvalResult = {
taskId: string;
finished: boolean;
totalSteps: number;
coveredHighPriorityDimensions: number;
totalHighPriorityDimensions: number;
openConflicts: number;
groundedConclusion: boolean;
};
Don't chase a precise "research score" in the first version. Aim first at reliably detecting these problems:
- Stopping too early
- Missing dimensions
- Evidence that is too weak
- Conflicts being ignored
- Many steps without quality improvement
Once these phenomena are visible, your system is already in an iterable state.
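These problem classes can be flagged mechanically from the minimal `EvalResult` record. A sketch; the step threshold is an assumption, not a tuned value:

```typescript
type EvalResult = {
  taskId: string;
  finished: boolean;
  totalSteps: number;
  coveredHighPriorityDimensions: number;
  totalHighPriorityDimensions: number;
  openConflicts: number;
  groundedConclusion: boolean;
};

// Illustrative checks for the failure modes listed above.
function flagProblems(result: EvalResult): string[] {
  const problems: string[] = [];
  if (!result.finished) problems.push("stopped before finishing");
  if (result.coveredHighPriorityDimensions < result.totalHighPriorityDimensions)
    problems.push("missed high-priority dimensions");
  if (result.openConflicts > 0) problems.push("ignored open conflicts");
  if (!result.groundedConclusion) problems.push("conclusion not grounded in evidence");
  if (result.totalSteps > 10 && !result.groundedConclusion)
    problems.push("many steps without a quality gain");
  return problems;
}
```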
15. Principles most worth keeping in a first implementation
Finally, a few very practical principles.
15.1 Make the research process clear before making it "smarter"
Don't start by adding complex reflection, multi-Agent collaboration, or automatic task-tree expansion.
For a first-version Research Agent, what matters more is:
- Clear research dimensions
- A clear evidence structure
- Visible gaps and conflicts
- Clear stop conditions
15.2 Layered search and read beats one catch-all tool
The research loop depends heavily on rhythm.
Only after splitting the layers can you clearly see:
- When the system is broadening its source surface
- When it is reading in depth
- When it is strengthening evidence quality
15.3 Let uncertainty exist
Research tasks do not always yield a fully consistent answer.
So good output is not "always confident"; it is:
- Which conclusions already have sufficient grounding
- Where conflicts still remain
- Where human judgment is still needed
15.4 Build a minimal eval early; don't wait until the system is large
Without a minimal eval, a Research Agent easily grows ever more complex without you knowing whether it has actually gotten better.
16. One-sentence summary
The point of implementing a Research Agent is not wrapping "search" in code; it is reliably expressing the research loop itself: multi-step exploration around a goal, discovering gaps, organizing evidence, and converging on conclusions.