sanguo_moziplus_v2/docs/design/agent-routing-redesign.md

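# Agent Routing Mechanism Redesign

**Version**: v2.0
**Author**: Pang Tong (deputy strategist) 🐦
**Date**: 2026-05-17
**Status**: Pending review
**Trigger**: E2E testing exposed the review stage dispatching the wrong Agent (Zhang Fei was assigned to review his own work); the root cause is hard-coded routing in the Daemon
**Reviewer**: Sima Yi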

---

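## 1. Problem Diagnosis

### 1.1 Root cause of the bug

During the task lifecycle, `assignee` is set only in the execution stage (Zhang Fei claims the task → assignee="zhangfei-dev"). When the task reaches the review stage, `decide()` takes the Level 2 branch: `task.assignee` is in the registered list, so the task is dispatched to Zhang Fei again.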
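The failure can be reproduced in a few lines. The sketch below is a simplified stand-in for the old Level 2 rule, not the actual Daemon code; `Task`, `REGISTERED`, `legacy_decide`, and the fallback value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: str
    status: str
    assignee: str = ""


# Hypothetical stand-in for the Daemon's list of registered Agents.
REGISTERED = {"zhangfei-dev", "simayi-challenger"}


def legacy_decide(task: Task) -> str:
    """Simplified Level 2 rule: reuse the assignee whenever it is registered."""
    if task.assignee in REGISTERED:
        return task.assignee
    return "pangtong-fujunshi"  # hypothetical fallback, for illustration only


# The executor claimed the task earlier, so assignee is still set at review time.
task = Task(id="test-e2e-001", status="review", assignee="zhangfei-dev")
print(legacy_decide(task))  # the author of the code is asked to review it
```

Because the rule never looks at `status`, any stage after execution inherits the executor as its handler.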

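### 1.2 The deeper problem

**The Daemon is making decisions that the AI should make.** The v2.6 architecture defines the roles explicitly:

| Dimension | v2.6 design goal | Current implementation |
|------|-------------|---------|
| Decision maker | Agent (decides autonomously on the blackboard) | Daemon (hard-coded if-else) |
| Daemon role | Courier (executes the decisions recorded on the blackboard) | Scheduler (decides who does what) |
| Orchestration | AI agents pick up work autonomously on the blackboard | Config-table driven (no AI judgment) |

The original T3-10 design text says "config-table driven, not AI judgment", which contradicts the core v2.6 principle.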

---

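## 2. Research Findings

### 2.1 Academic frontier

| Source | Key finding | Value for us |
|------|---------|-------------|
| **bMAS** arXiv 2507.01701 | A Control Unit (LLM-driven) dynamically selects Agents based on the current blackboard content | Routing itself can be an LLM call rather than if-else |
| **Self-Selection** arXiv 2510.01285 | Tasks are not explicitly assigned; each Agent decides whether to participate based on its own capabilities | The most AI-native pattern and our evolution target |
| **MasRouter** arXiv 2601.04861 | Selects model size dynamically by task complexity, plus a confidence mechanism | Confidence threshold + dynamic scoring from historical performance |
| **AgentGate** arXiv 2604.06696 | Small 3B-7B models make structured routing decisions | Validates that "routing can be AI" is feasible |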

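### 2.2 Production practice

| Project | Pattern | Takeaway |
|------|------|------|
| **Microsoft Conductor** (open-sourced 2026.05) | Deterministic YAML orchestration | Layered hybrid of deterministic flow + dynamic LLM routing |
| **Azure Agent Patterns** | 5 patterns: sequential / concurrent / group chat / **Handoff** / Magentic | **Handoff**: after finishing, the Agent itself decides whom to hand off to |
| **AWS dynamic dispatch** | Event-driven + context-aware routing | Routing becomes an event, not polling |
| **Claude Code Agent Teams** | Lead coordinator + context isolation | The lead handles decomposition, assignment, and monitoring; subagents receive only the relevant context |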

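### 2.3 Leads from existing research

- architecture-v2.6.md: **"Agent decides, Daemon executes"**; the Daemon is a courier, not a decision maker
- shared-consciousness-research.md: the Control Unit is LLM-driven, not rule-based routing
- v2.6-research-01: Hermes hallucination gating, i.e. do not trust an Agent's own claim of completion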

---

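## 3. Design Principles

| # | Principle | Description |
|---|------|------|
| P1 | Routing decisions live in the Agent layer, not the Daemon layer | "Who should do this task" is decided by the Agent itself or by an LLM; the Daemon only executes |
| P2 | The current Agent knows best who is needed next | Whoever just finished the work knows best whom to hand off to (Azure Handoff) |
| P3 | Routing is auditable | Every routing decision is recorded on the blackboard and can be traced back |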

---

## 4. Three Routing Modes

### 4.1 Overview

```
┌───────────────────────────────────────────────────────────┐
│                  Routing decision entry                   │
│                  Dispatcher.decide(task)                  │
└────────┬──────────────────┬────────────────────┬──────────┘
         │                  │                    │
  ┌──────▼──────┐  ┌────────▼────────┐  ┌────────▼────────┐
  │   Mode A    │  │     Mode B      │  │     Mode C      │
  │ LLM routing │  │  Agent handoff  │  │ Self-selection  │
  │(centralized)│  │ (decentralized) │  │ (decentralized) │
  └──────┬──────┘  └────────┬────────┘  └────────┬────────┘
         │                  │                    │
  LLM picks Agent   Executor says who    Agents claim work
```

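### 4.2 Mode A: LLM routing (centralized)

**Scenario**: initial assignment (pending → claimed), exception escalation (failed/blocked), and cases with no explicit handoff instruction.

**Mechanism**: the Daemon makes one lightweight LLM API call, passing in the task information, the Agent capability profiles, and the load status; the LLM returns the selected Agent, a reason, and a confidence score.

**Key point**: this does not spawn an Agent session; it is a single API call of ~300 tokens (~1-2s, <¥0.01).

```
Input: task description + 6 Agent profiles + load
Output: {"agent_id": "xxx", "reason": "...", "confidence": 0.9}
Constraints: ~200 token response, temperature=0.1
```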
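Since the output contract is a small JSON object, the Daemon can check it before acting on it. The following is a minimal sketch of such a check, assuming the three documented keys; `parse_route_output` is a hypothetical helper, not part of the existing codebase.

```python
import json
from typing import Optional


def parse_route_output(raw: str, known_agents: set) -> Optional[dict]:
    """Return a validated decision dict, or None if the LLM output is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    agent_id = data.get("agent_id")
    if not isinstance(agent_id, str) or agent_id not in known_agents:
        return None

    try:
        confidence = float(data.get("confidence", 0.5))
    except (TypeError, ValueError):
        return None
    if confidence != confidence:  # NaN never equals itself
        return None
    # Clamp to the documented 0.0-1.0 range.
    confidence = max(0.0, min(1.0, confidence))

    return {
        "agent_id": agent_id,
        "reason": str(data.get("reason", "")),
        "confidence": confidence,
    }
```

Returning `None` for anything malformed lets the caller treat bad output the same way as a low-confidence decision and route to the fallback Agent.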

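### 4.3 Mode B: Declarative Agent handoff (decentralized) ⭐ Most frequent

**Scenario**: after finishing the current stage, the Agent explicitly declares what is needed next.

**Mechanism**: the Agent attaches a `next_capability` field when it calls POST /status:

```json
{
  "status": "review",
  "agent": "zhangfei-dev",
  "next_capability": "review",
  "handoff_note": "Code implemented; please review quality and security"
}
```

The Daemon reads `next_capability`, looks up the Agent capability profiles to find a match (excluding the current executor), and spawns it directly.

**This is the most AI-native of the modes implemented in this phase**: whoever just finished the work knows best who is needed next. No LLM call is required, so routing is a local table lookup with negligible latency (1-2 ms in the audit log example in §6.2).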
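Because `next_capability` comes from an Agent, the Daemon may want to check it before using it for routing. Below is a minimal sketch of that check; `extract_handoff`, the `KNOWN_CAPABILITIES` subset, and the 500-character cap on `handoff_note` are assumptions for illustration, not part of the current API.

```python
from typing import Optional

# Subset of the capabilities listed in the profile example in section 5.1.
KNOWN_CAPABILITIES = {"coding", "review", "quality_check", "data", "deploy", "planning"}

MAX_NOTE_LENGTH = 500  # assumed limit, to keep notes from bloating the blackboard


def extract_handoff(payload: dict, known: Optional[set] = None) -> Optional[dict]:
    """Pull a validated handoff request out of a POST /status payload."""
    known = KNOWN_CAPABILITIES if known is None else known
    capability = payload.get("next_capability")
    # Reject anything that is not a capability some profile actually declares.
    if not isinstance(capability, str) or capability not in known:
        return None
    return {
        "from_agent": str(payload.get("agent", "")),
        "next_capability": capability,
        "handoff_note": str(payload.get("handoff_note", ""))[:MAX_NOTE_LENGTH],
    }
```

An Agent can only name a capability, never a specific Agent, so the exclusion of the current executor stays under the Daemon's control.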

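### 4.4 Mode C: Agent self-selection (decentralized), future evolution

**Scenario**: the Daemon broadcasts the task requirement and each Agent decides for itself whether to claim it.

**Not implemented in the current phase**; room is kept for it. The data structures (agent_profiles, capabilities) stay the same; only "Daemon looks up the table and dispatches" changes to "Daemon broadcasts + Agents claim".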
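To illustrate why the data structures can stay the same, here is a rough sketch of broadcast-and-claim over the same profile fields. `Broadcast`, `claim`, and `wants` are hypothetical names; a real implementation would need an atomic claim in the blackboard database rather than an in-memory flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Broadcast:
    task_id: str
    capability: str
    claimed_by: Optional[str] = None

    def claim(self, agent_id: str) -> bool:
        """First claim wins; later claims are rejected."""
        if self.claimed_by is None:
            self.claimed_by = agent_id
            return True
        return False


def wants(profile: dict, need: Broadcast, current_load: int) -> bool:
    """An Agent's local decision: capability matches and capacity is available."""
    return (need.capability in profile.get("capabilities", [])
            and current_load < profile.get("max_concurrent", 1))


profiles = {
    "zhangfei-dev": {"capabilities": ["coding"], "max_concurrent": 1},
    "simayi-challenger": {"capabilities": ["review"], "max_concurrent": 2},
}
need = Broadcast(task_id="test-e2e-001", capability="review")
for agent_id, profile in profiles.items():
    if wants(profile, need, current_load=0) and need.claim(agent_id):
        break
```

The profile lookup that the Daemon performs in Mode B moves into `wants`, which each Agent evaluates for itself.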

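### 4.5 Mode selection logic

```python
def decide(self, task, action_type=""):
    # Deterministic fast path (0ms, no LLM call)
    if self._is_deterministic(task, action_type):
        return self._deterministic_route(task, action_type)

    # Mode B: the Agent declared next_capability → match directly
    if task.next_capability:
        agent = self._match_capability(task.next_capability,
                                       exclude={task.assignee})
        if agent:
            return agent
        # No eligible Agent for the declared capability: fall through to Mode A

    # Mode A: no explicit handoff → LLM routing
    return self._llm_route(task, action_type)
```

**The deterministic fast path** covers:
- Mechanical checks (L1_guardrail, format_check) → executed locally by the Daemon
- An existing assignee outside a lifecycle transition (e.g. crashed → retry by the same Agent) → reuse it directly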
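The two fast-path cases above can be classified with a small pure function. This is a sketch only; `fast_path` and its return labels are hypothetical, and the `"retry"` action type is taken from the Dispatcher code in section 5.3.

```python
from typing import Optional

LOCAL_ACTIONS = {"L1_guardrail", "format_check"}


def fast_path(action_type: str, assignee: Optional[str]) -> Optional[str]:
    """Return 'local', 'reuse_assignee', or None when real routing is needed."""
    if action_type in LOCAL_ACTIONS:
        return "local"            # mechanical check, no Agent involved
    if action_type == "retry" and assignee:
        return "reuse_assignee"   # same Agent retries after a crash
    return None                   # fall through to Mode B / Mode A
```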

---

## 5. Core Component Design

### 5.1 Agent capability profile (Agent Profile)

Each Agent declares its own capabilities in configuration (**not hard-coded in the Daemon**):

```yaml
# config/default.yaml → extension of the agents section
agents:
  zhangfei-dev:
    capabilities: [coding, implementation, scripting]
    can_review: false
    max_concurrent: 1

  simayi-challenger:
    capabilities: [review, quality_check, debate]
    can_review: true
    max_concurrent: 2

  guanyu-dev:
    capabilities: [risk, compliance, position_check]
    can_review: true
    max_concurrent: 1

  zhaoyun-data:
    capabilities: [data, acquisition, cleaning, verification]
    can_review: false
    max_concurrent: 1

  jiangwei-infra:
    capabilities: [deploy, infrastructure, docker, vnpy]
    can_review: false
    max_concurrent: 1

  pangtong-fujunshi:
    capabilities: [planning, coordination, escalation, strategy]
    can_review: true
    is_fallback: true
    max_concurrent: 3
```

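On startup the Daemon reads the configuration and writes it to the `agent_profiles` table on the blackboard. In the future this can evolve into Agents registering themselves.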
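The startup sync could look roughly like the sketch below. The column layout of `agent_profiles` is an assumption (the design does not specify it), and `sync_agent_profiles` is a hypothetical helper; capabilities are stored as a JSON array in a TEXT column.

```python
import json
import sqlite3


def sync_agent_profiles(conn: sqlite3.Connection, agents: dict) -> int:
    """Upsert every configured Agent profile; return the number of rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS agent_profiles (
               agent_id       TEXT PRIMARY KEY,
               capabilities   TEXT NOT NULL,              -- JSON array
               can_review     INTEGER NOT NULL DEFAULT 0,
               is_fallback    INTEGER NOT NULL DEFAULT 0,
               max_concurrent INTEGER NOT NULL DEFAULT 1
           )"""
    )
    rows = [
        (
            agent_id,
            json.dumps(profile.get("capabilities", [])),
            int(bool(profile.get("can_review", False))),
            int(bool(profile.get("is_fallback", False))),
            int(profile.get("max_concurrent", 1)),
        )
        for agent_id, profile in agents.items()
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO agent_profiles VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

`INSERT OR REPLACE` keeps the sync idempotent, so restarting the Daemon with an edited config overwrites stale rows instead of failing on the primary key.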

### 5.2 LLM router (LLMDriver)

```python
import json
from dataclasses import dataclass

from openai import OpenAI


@dataclass
class RouteDecision:
    agent_id: str
    reason: str
    confidence: float


class LLMDriver:
    """bMAS Control Unit: lightweight LLM routing decision."""

    def __init__(self, model: str, api_base: str, api_key: str):
        self.model = model
        self.client = OpenAI(base_url=api_base, api_key=api_key)

    def route(self, task, agent_profiles, active_agents) -> RouteDecision:
        # _build_prompt fills in the routing prompt template shown below
        prompt = self._build_prompt(task, agent_profiles, active_agents)

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
            max_tokens=200,
            temperature=0.1,
        )

        result = json.loads(response.choices[0].message.content)
        # A missing confidence defaults to 0.5, which is below the 0.7
        # threshold and therefore ends up on the fallback path.
        return RouteDecision(
            agent_id=result["agent_id"],
            reason=result.get("reason", ""),
            confidence=result.get("confidence", 0.5),
        )
```

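**Routing prompt template**:

```
You are a task router. Based on the task requirements and Agent capabilities, select the most suitable Agent.

## Current task
- ID: {task_id}
- Title: {task_title}
- Status: {task_status}
- Description: {task_description}
- Previous executor: {previous_assignee}

## Available Agents
{for each Agent: ID, capability list, current load}

## Constraints
1. For review/quality_check, do not select the previous executor
2. Among equally capable Agents, prefer the one with the lowest load
3. The Agent must match the capabilities the task requires

## Output
{"agent_id": "...", "reason": "...", "confidence": 0.0-1.0}
```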
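One way to fill this template is sketched below. It builds the prompt by joining lines rather than with `str.format`, because the output section contains literal braces. `build_routing_prompt` and the dict-shaped `task` are assumptions; the real `_build_prompt` may take a task object instead.

```python
def build_routing_prompt(task: dict, agent_profiles: dict, active_agents: dict) -> str:
    """Render the routing prompt from a task dict, profiles, and current load."""
    agent_lines = []
    for agent_id, profile in sorted(agent_profiles.items()):
        caps = ", ".join(profile.get("capabilities", []))
        load = active_agents.get(agent_id, 0)
        limit = profile.get("max_concurrent", 1)
        agent_lines.append(f"- {agent_id}: capabilities=[{caps}], load={load}/{limit}")

    sections = [
        "You are a task router. Based on the task requirements and Agent "
        "capabilities, select the most suitable Agent.",
        "",
        "## Current task",
        f"- ID: {task['id']}",
        f"- Title: {task.get('title', '')}",
        f"- Status: {task.get('status', '')}",
        f"- Description: {task.get('description', '')}",
        f"- Previous executor: {task.get('previous_assignee') or 'none'}",
        "",
        "## Available Agents",
        *agent_lines,
        "",
        "## Constraints",
        "1. For review/quality_check, do not select the previous executor",
        "2. Among equally capable Agents, prefer the one with the lowest load",
        "3. The Agent must match the capabilities the task requires",
        "",
        "## Output",
        '{"agent_id": "...", "reason": "...", "confidence": 0.0-1.0}',
    ]
    return "\n".join(sections)
```

Sorting the Agents makes the prompt deterministic for a given state, which keeps audit entries comparable across runs.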

### 5.3 Dispatcher rewrite

```python
class Dispatcher:
    def __init__(self, config, counter):
        self.counter = counter
        self.agent_profiles = config.get("agent_profiles", {})
        routing = config.get("routing", {})
        self.llm = LLMDriver(
            model=routing.get("model", "zhipu/glm-5.1"),
            api_base=routing.get("api_base", ""),
            api_key=routing.get("api_key", ""),
        )
        # Read the threshold from the routing config instead of hard-coding 0.7
        self.confidence_threshold = routing.get("confidence_threshold", 0.7)
        self.LOCAL_ACTIONS = {"L1_guardrail", "format_check", "file_exists_check"}

    def decide(self, task, action_type="") -> dict:
        # ── Fast path: deterministic routing ──
        if action_type in self.LOCAL_ACTIONS:
            return {"level": "local",
                    "reason": "Mechanical check, executed locally by the Daemon"}

        # Retry goes back to the same Agent
        if action_type == "retry" and task.assignee:
            return {"level": "full_agent", "agent_id": task.assignee,
                    "reason": "Retry with the original executor",
                    "mode": "deterministic"}

        # ── Mode B: declarative Agent handoff ──
        if task.next_capability:
            agent = self._match_capability(task.next_capability,
                                            exclude={task.assignee})
            if agent:
                return {"level": "full_agent", "agent_id": agent,
                        "reason": f"Executor handoff: needs {task.next_capability}",
                        "mode": "agent_handoff"}

        # ── Mode A: LLM routing ──
        decision = self.llm.route(task, self.agent_profiles,
                                   self.counter.active_agents)

        # Validity check: unknown Agent or low confidence → fallback Agent
        if (decision.agent_id not in self.agent_profiles
                or decision.confidence < self.confidence_threshold):
            return {"level": "full_agent", "agent_id": "pangtong-fujunshi",
                    "reason": (f"LLM decision rejected (agent={decision.agent_id}, "
                               f"confidence={decision.confidence}): {decision.reason}"),
                    "mode": "fallback"}

        return {"level": "full_agent", "agent_id": decision.agent_id,
                "reason": decision.reason, "mode": "llm_route",
                "confidence": decision.confidence}

    def _match_capability(self, capability, exclude=None):
        """Match an Agent from the capability profiles."""
        candidates = [
            aid for aid, prof in self.agent_profiles.items()
            if aid not in (exclude or set())
            and capability in prof.get("capabilities", [])
        ]
        if not candidates:
            return None
        if len(candidates) == 1:
            return candidates[0]
        # Several candidates: pick the one with the lowest current load
        return min(candidates, key=lambda a: self.counter.active_agents.get(a, 0))
```
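The Mode A call can also fail outright (API timeout, malformed JSON, missing keys). One possible way to keep `decide()` from raising is to wrap the call and send failures to the fallback Agent, as in the sketch below. `route_with_fallback` is a hypothetical helper and this policy is only a proposal for the review; `RouteDecision` is redefined here so the block is self-contained.

```python
from dataclasses import dataclass
from typing import Callable

FALLBACK_AGENT = "pangtong-fujunshi"  # the profile marked is_fallback in section 5.1


@dataclass
class RouteDecision:
    agent_id: str
    reason: str
    confidence: float


def route_with_fallback(route: Callable[[], RouteDecision],
                        known_agents: set,
                        threshold: float = 0.7) -> dict:
    """Run an LLM routing call; any error or weak decision goes to the fallback."""
    try:
        decision = route()
    except Exception as exc:  # timeout, JSON decode error, KeyError, ...
        return {"level": "full_agent", "agent_id": FALLBACK_AGENT,
                "reason": f"LLM routing failed: {type(exc).__name__}",
                "mode": "fallback"}

    if decision.agent_id not in known_agents or decision.confidence < threshold:
        return {"level": "full_agent", "agent_id": FALLBACK_AGENT,
                "reason": f"LLM decision rejected: {decision.reason}",
                "mode": "fallback"}

    return {"level": "full_agent", "agent_id": decision.agent_id,
            "reason": decision.reason, "mode": "llm_route",
            "confidence": decision.confidence}
```

Inside the Dispatcher this would be called as `route_with_fallback(lambda: self.llm.route(...), set(self.agent_profiles))`, so the audit table records the failure as a `fallback` row instead of losing the tick.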

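### 5.4 assignee semantic change

| Dimension | Current | Changed to |
|------|------|------|
| Meaning of `assignee` | Task owner (for the whole lifecycle) | **Executor of the current stage** (updated as the status transitions) |
| New `previous_assignee` | None | Stores the previous stage's executor (for exclusion and auditing) |

```python
# Update on each status transition
task.previous_assignee = task.assignee
task.assignee = new_agent_id
```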
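Wrapped into a transition helper, the update might look like the following sketch. The `Task` dataclass and `advance_stage` are hypothetical, and clearing `next_capability` after the handoff is an assumption the design does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: str
    status: str
    assignee: Optional[str] = None
    previous_assignee: Optional[str] = None
    next_capability: Optional[str] = None


def advance_stage(task: Task, new_status: str, new_agent_id: str) -> Task:
    """Move the task to a new stage and rotate the executor fields."""
    task.previous_assignee = task.assignee
    task.assignee = new_agent_id
    task.status = new_status
    # Assumption: the handoff request is consumed once routing has happened,
    # so a stale value cannot trigger Mode B again on the next transition.
    task.next_capability = None
    return task


task = Task(id="test-e2e-001", status="working",
            assignee="zhangfei-dev", next_capability="review")
advance_stage(task, "review", "simayi-challenger")
```

After the call, `assignee` names the reviewer and `previous_assignee` names the original executor, which is what the review-exclusion rule and the audit table need.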

---

## 6. Routing Audit

Every routing decision is written to the `routing_decisions` table on the blackboard.

### 6.1 Table schema

```sql
CREATE TABLE IF NOT EXISTS routing_decisions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id TEXT NOT NULL,
    from_status TEXT,           -- previous status
    to_status TEXT,             -- target status
    mode TEXT NOT NULL,         -- deterministic / agent_handoff / llm_route / fallback
    selected_agent TEXT NOT NULL,
    previous_agent TEXT,        -- executor of the previous stage
    reason TEXT,                -- routing reason
    confidence REAL,            -- LLM confidence (Mode A only)
    model TEXT,                 -- LLM model used (Mode A only)
    latency_ms INTEGER,         -- routing latency
    created_at TEXT DEFAULT (datetime('now')),
    FOREIGN KEY (task_id) REFERENCES tasks(id)
);

-- IF NOT EXISTS keeps the DDL idempotent, matching the table definition
CREATE INDEX IF NOT EXISTS idx_routing_task ON routing_decisions(task_id);
```
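Writing an audit row from a Dispatcher decision could be sketched as follows. `record_decision` is a hypothetical helper; the DDL is repeated without the foreign key so the example runs standalone, and it only handles Agent routes (a `local` decision has no `agent_id` and would violate `selected_agent NOT NULL`).

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS routing_decisions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id TEXT NOT NULL,
    from_status TEXT,
    to_status TEXT,
    mode TEXT NOT NULL,
    selected_agent TEXT NOT NULL,
    previous_agent TEXT,
    reason TEXT,
    confidence REAL,
    model TEXT,
    latency_ms INTEGER,
    created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_routing_task ON routing_decisions(task_id);
"""


def record_decision(conn, task_id, from_status, to_status, decision,
                    previous_agent=None, model=None, latency_ms=None) -> int:
    """Insert one audit row for an Agent route and return its row id."""
    cur = conn.execute(
        """INSERT INTO routing_decisions
               (task_id, from_status, to_status, mode, selected_agent,
                previous_agent, reason, confidence, model, latency_ms)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (task_id, from_status, to_status, decision["mode"], decision["agent_id"],
         previous_agent, decision.get("reason"), decision.get("confidence"),
         model, latency_ms),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
row_id = record_decision(
    conn, "test-e2e-001", "working", "review",
    {"agent_id": "simayi-challenger", "mode": "agent_handoff",
     "reason": "Executor handoff: needs review"},
    previous_agent="zhangfei-dev", latency_ms=2,
)
```

`confidence` and `model` stay NULL for Mode B rows, matching the "Mode A only" comments in the schema.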

### 6.2 Audit log example

```
task=test-e2e-001 | pending→claimed | mode=llm_route
  → zhangfei-dev (confidence=0.95, reason="Coding task matches the coding capability")
  → model=zhipu/glm-5.1, latency=1200ms

task=test-e2e-001 | working→review | mode=agent_handoff
  → simayi-challenger (reason="Executor handoff: needs review")
  → latency=2ms

task=test-e2e-001 | review→done | mode=agent_handoff
  → pangtong-fujunshi (reason="Review passed; hand off to the coordinator to wrap up")
  → latency=1ms
```

---

## 7. Routing Model Configuration

### 7.1 Backend configuration

```yaml
# Added to config/default.yaml
routing:
  model: "zhipu/glm-5.1"     # default routing model
  api_base: ""                # empty = use the OpenClaw Gateway
  api_key: ""                 # empty = use the OpenClaw default
  confidence_threshold: 0.7   # below this value, fall back
  max_tokens: 200
  temperature: 0.1
```
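A typed view of this section, with the same defaults, might look like the sketch below. `RoutingConfig` and `load_routing_config` are hypothetical names, and the input is assumed to be the already-parsed YAML as a plain dict.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RoutingConfig:
    model: str = "zhipu/glm-5.1"
    api_base: str = ""
    api_key: str = ""
    confidence_threshold: float = 0.7
    max_tokens: int = 200
    temperature: float = 0.1


def load_routing_config(raw: dict) -> RoutingConfig:
    """Build a RoutingConfig from parsed config, ignoring unknown keys."""
    section = raw.get("routing") or {}
    allowed = {f.name for f in fields(RoutingConfig)}
    cfg = RoutingConfig(**{k: v for k, v in section.items() if k in allowed})

    if not 0.0 <= cfg.confidence_threshold <= 1.0:
        raise ValueError("routing.confidence_threshold must be within [0, 1]")
    if cfg.max_tokens <= 0:
        raise ValueError("routing.max_tokens must be positive")
    return cfg
```

Validating at load time means a typo such as `confidence_threshold: 7` fails at Daemon startup rather than silently sending every Mode A decision to the fallback Agent.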

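### 7.2 Frontend configuration entry

Add a "Routing model" configuration area at the top of the existing `ModelConfig.tsx` page:

```
┌─────────────────────────────────────────────┐
│ 🎯 Routing model (Control Unit)             │
│ ┌─────────────────────┐ ┌───────┐           │
│ │ zhipu/glm-5.1     ▾ │ │ Apply │           │
│ └─────────────────────┘ └───────┘           │
│ LLM used for task routing (a lightweight,   │
│ fast model is recommended)                  │
├─────────────────────────────────────────────┤
│ 🐦 Pang Tong  pangtong-fujunshi             │
│ Current: zhipu/glm-5.1                      │
│ ...                                         │
```

- The model dropdown reuses the `knownModels` already registered in OpenClaw (the same data source as Agent model selection)
- The selection is saved through the backend API `PATCH /api/config/routing`
- It works the same way as calling `api.setModel`: it goes through the Gateway model configuration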

### 7.3 API

```python
# Added to blackboard_routes.py
@api_route("GET", "/api/config/routing")
def get_routing_config(request):
    return {"model": config.routing.model,
            "confidence_threshold": config.routing.confidence_threshold}

@api_route("PATCH", "/api/config/routing")
def set_routing_config(request):
    new_model = request.json.get("model")
    if not new_model:
        return {"ok": False, "error": "model is required"}
    # Validate that the model is in OpenClaw's list of registered models
    # (the check itself is not shown here)
    config.routing.model = new_model
    config.save()
    return {"ok": True}
```
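The model check mentioned in the handler comment could be factored into a pure function like the sketch below. `validate_routing_patch` is hypothetical, and `known_models` stands in for whatever list OpenClaw exposes; how that list is fetched is not specified in this document.

```python
from typing import Optional, Tuple


def validate_routing_patch(payload: dict, known_models: set) -> Tuple[bool, Optional[str]]:
    """Return (ok, error) for a PATCH /api/config/routing body."""
    model = payload.get("model")
    if not isinstance(model, str) or not model.strip():
        return False, "model must be a non-empty string"
    if model not in known_models:
        return False, f"unknown model: {model}"
    return True, None
```

Keeping the check free of request and config objects makes it easy to unit-test and to reuse if the frontend dropdown and the API ever read from different sources.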

---

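## 8. Change List

### 8.1 Data model

| Change | Type | Description |
|------|------|------|
| New `agent_profiles` config section | Config | Each Agent declares its capability list |
| New `routing` config section | Config | Routing model + parameters |
| New `next_capability` column on tasks | DDL | The capability the Agent declares as needed next |
| New `previous_assignee` column on tasks | DDL | Stores the previous stage's executor |
| New `routing_decisions` table | DDL | Routing audit log |
| `assignee` semantic change | Logic | From "task owner" to "executor of the current stage" |

### 8.2 Code

| File | Change |
|------|------|
| `dispatcher.py` | Rewrite: add LLMDriver + Mode A/B routing logic (Mode C is deferred to Phase 2) |
| `config/default.yaml` | Add the `agent_profiles` + `routing` config sections |
| `blackboard_routes.py` | The status API accepts `next_capability`; add the routing config API |
| `ticker.py` | Use the new dispatcher; write routing results to routing_decisions |
| `blackboard/db.py` | Add the routing_decisions table DDL; add the new columns to the tasks table |
| `ModelConfig.tsx` | Add the routing model configuration area |

### 8.3 Unchanged

| Unchanged | Reason |
|------|------|
| State machine (pending→claimed→working→review→done) | The state transition semantics are correct |
| Agent prompt template (S2) | Agents still follow the 4-step flow and only pass one extra field in POST /status |
| Spawner logic | The spawn mechanism is unchanged |
| Frontend Dashboard core layout | Only one area is added to ModelConfig |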

---

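## 9. Mapping to Existing Practice

| Practice | Counterpart in this design |
|------|----------|
| bMAS Control Unit (LLM-driven) | Mode A: implemented by LLMDriver as a lightweight API call |
| Azure Handoff (Agent handoff) | Mode B: next_capability + handoff_note |
| Self-selection (arXiv 2510.01285) | Mode C: future evolution, data structures reserved |
| MasRouter (confidence) | Confidence threshold + fallback mechanism |
| Microsoft Conductor (deterministic + dynamic hybrid) | Layering of the fast path (deterministic) + LLM routing (dynamic) |
| Hallucination gating (Hermes) | Validity check on LLM output + confidence threshold |
| "Agent decides, Daemon executes" (v2.6 principle) | Mode B is the most direct implementation: the Agent itself decides whom to hand off to |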

---

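## 10. Evolution Roadmap

```
Phase 1 (this implementation): Mode A + Mode B
  - LLMDriver routing (initial assignment, exception scenarios)
  - Declarative Agent handoff (the most frequent scenario)
  - Routing audit table
  - Frontend routing model configuration

Phase 2 (future): Mode C
  - Same agent_profiles and capabilities data structures
  - Daemon broadcasts the need → Agents claim on their own
  - Very low migration cost (data structures unchanged; only the consumption pattern changes)

Phase 3 (further out): experience-driven routing
  - Routing audit data feeds back into the LLM prompt (historical match success rate)
  - Agent reliability scoring (following MasRouter)
  - Dynamic capability discovery (profiles update automatically after an Agent completes a new type of task)
```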

---

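## 11. Review Points for Sima Yi

Please focus on:

1. **LLMDriver exception handling**: whether the fallback strategy for API timeouts and failures is reasonable
2. **Mode B safety**: whether `next_capability` needs validation when an Agent declares it (to prevent malicious targeting)
3. Scope of impact of the **assignee semantic change**: whether other modules depend on "assignee = task owner"
4. **routing_decisions table design**: whether the columns are sufficient and the index is reasonable
5. **Config API security**: whether changing the routing model requires authentication
6. **Performance impact**: whether Mode A's ~2s latency is acceptable within the tick cycle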

---

## 12. References

- bMAS: arXiv 2507.01701, Blackboard LLM Multi-Agent System
- Self-Selection: arXiv 2510.01285, Agent self-selection pattern
- MasRouter: arXiv 2601.04861, Confidence-Aware Routing
- AgentGate: arXiv 2604.06696, structured routing engine
- Microsoft Conductor: github.com/microsoft/conductor, deterministic orchestration
- Azure Agent Patterns: learn.microsoft.com, Handoff pattern
- v2.6 research report: docs/research/shared-consciousness-research.md
- v2.6 architecture design: docs/design/architecture-v2.6.md
- T3-10 dispatch criteria: docs/design/topic3-challenge-review-proposal.md §5.4