auto-sync: 2026-05-15 12:07:12

2026-05-15 12:07:12 +08:00
parent ba13a28016
commit 1b1fe1e54c
1 changed files with 14 additions and 4 deletions
@@ -15,6 +15,7 @@
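 | v2.6 | 2026-05-15 | **Architecture refactor**: Shared Workspace (Blackboard) replaces the DAG engine as the orchestration core |
 | v2.6.1 | 2026-05-15 | Review feedback from Sima Yi (司马懿) + Mail retirement decision + quality gates + decision records + engineering fixes |
 | v2.6.2 | 2026-05-15 | Topic 1 design decisions: three-layer execution model, refill (session continuation) mechanism, AI-driven retry, Guardrail system, the must_haves trio, tiered review matrix |
+| v2.6.2.1 | 2026-05-15 | Review feedback from Sima Yi (司马懿): L2/L3 distinction criterion, timeout correction, outputs linked to attempt, asynchronous Scope Guard, automatic risk_level |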

 ---

@@ -202,6 +203,7 @@ CREATE TABLE IF NOT EXISTS outputs (
    content_path TEXT,                      -- file path (artifacts live under the task directory)
    summary TEXT,                           -- one-sentence summary
    metadata TEXT,                          -- JSON: {files_changed, lines_added, ...}
+    attempt_number INTEGER DEFAULT 1,       -- links to task_attempts.attempt_number
    created_at TEXT NOT NULL DEFAULT (datetime('now')),

    FOREIGN KEY (task_id) REFERENCES tasks(id)
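@@ -465,6 +467,12 @@ Daemon **does not**:
 - An L2 sub is one-shot and single-task ("help me check whether this output is within scope"); it exits as soon as it finishes
 - An L3 agent is a full blackboard participant (reads global state, decides autonomously, writes back to multiple tables)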

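+**Criterion for distinguishing L2 from L3**: whether it reads the blackboard's global state.
+- L2: does not read the blackboard's global context; it judges using only specific fields of the current task. Local data (e.g. the scope_declaration text + task.truths) is passed at spawn time, and the sub exits after returning its result.
+- L3: reads the blackboard's global state (tasks + comments + outputs + decisions + observations) and makes global decisions. Only the task ID + trigger reason are passed at spawn time; the Agent reads the blackboard itself.
+
+This distinction determines the content of the spawn message: L2 passes data, L3 passes a pointer.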
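
The data-versus-pointer distinction can be illustrated with a minimal Python sketch. The function names `build_l2_message` and `build_l3_message` and the JSON field layout are hypothetical, chosen only for illustration; they are not the document's actual `build_spawn_message` implementation.

```python
import json


def build_l2_message(scope_declaration: str, truths: list[str]) -> str:
    """L2 (hypothetical helper): embed the local data in the message itself.

    The sub never reads the blackboard, so everything it needs travels here.
    """
    payload = {
        "layer": "L2",
        "scope_declaration": scope_declaration,
        "truths": truths,
    }
    return json.dumps(payload, ensure_ascii=False)


def build_l3_message(task_id: str, trigger_reason: str) -> str:
    """L3 (hypothetical helper): pass only a pointer plus the trigger reason.

    The Agent is expected to read the blackboard on its own using task_id.
    """
    payload = {
        "layer": "L3",
        "task_id": task_id,
        "trigger_reason": trigger_reason,
    }
    return json.dumps(payload, ensure_ascii=False)
```

Under these assumptions, an L2 message grows with the data being checked, while an L3 message stays the same size no matter how much state the blackboard holds.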
+
 ### 4.2 Daemon Tick Loop

 Modeled on the Hermes Dispatcher, but more lightweight:
@@ -595,8 +603,8 @@ def build_spawn_message(task_id: str, trigger_reason: str, comments_since: str =
 |------|-------------|------|------|
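 | Agent is making progress | New observations on the blackboard | No intervention (unlimited continuation) | L1 |
 | Agent has no progress but the session is active | No new observations, but the session is still alive | No intervention (it may be thinking) | L1 |
-| timeout (session idle) + output meets the bar | session idle + outputs table has content | Hallucination gate verifies the output → continue the flow if it passes | L1→L2 |
-| timeout + output below the bar | session idle + outputs is empty | L2 spawns a sub to send a reminder so the Agent continues (hung-agent handling) | L2 |
+| timeout (agent run returns a timeout) + output meets the bar | agent run returns a timeout + outputs table has content | Hallucination gate verifies the output → continue the flow if it passes | L1→L2 |
+| timeout (agent run returns a timeout) + output below the bar | agent run returns a timeout + outputs is empty | L2 spawns a sub to send a reminder so the Agent continues (hung-agent handling) | L2 |
 | timeout + output below the bar + still no progress after the reminder | Second timeout | Return the task to pending and record failure_detail | L1 |
 | Non-timeout error (process exited) | Process is dead | Enter the AI error-correction flow | L3 |
 | Hard-limit timeout | In working state for more than 3x the estimated effort | Force reclaim and record an event | L1 |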
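
The rows above can be read as a small decision function. The following is a minimal sketch, not the Daemon's actual code: the `TaskSnapshot` fields, the action names, and the order in which the checks run are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical view of one task as the Daemon might see it on a tick."""
    run_timed_out: bool       # the agent run returned a timeout
    process_exited: bool      # the process died with a non-timeout error
    has_outputs: bool         # the outputs table has content for this task
    reminder_sent: bool       # an L2 reminder was already sent once
    working_seconds: float    # time spent in the working state
    estimated_seconds: float  # estimated effort for the task


def decide(s: TaskSnapshot) -> tuple[str, str]:
    """Return (action, layer). The check order is an assumption."""
    # Hard limit: working for more than 3x the estimate forces a reclaim.
    if s.working_seconds > 3 * s.estimated_seconds:
        return ("force_reclaim", "L1")
    # Non-timeout error: the process is dead, so hand over to AI correction.
    if s.process_exited and not s.run_timed_out:
        return ("ai_error_correction", "L3")
    if s.run_timed_out:
        if s.has_outputs:
            # Output exists: let the hallucination gate verify it.
            return ("hallucination_gate", "L1->L2")
        if s.reminder_sent:
            # Second timeout with no output: recycle to pending.
            return ("recycle_to_pending", "L1")
        # First timeout with no output: spawn a sub to send a reminder.
        return ("spawn_reminder_sub", "L2")
    # Progressing, or idle but alive: do not intervene.
    return ("no_intervention", "L1")
```

For example, a snapshot with `run_timed_out=True`, `has_outputs=False`, and `reminder_sent=False` maps to the reminder row, and the same snapshot with `reminder_sent=True` maps to the recycle row.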
@@ -634,6 +642,8 @@ def build_spawn_message(task_id: str, trigger_reason: str, comments_since: str =
 | Sima Yi (司马懿) | Content failure (review not passed) | reviews table (verdict=needs_revision + issues) |
 | Pang Tong (庞统) | Direction failure (requirements drift) | decisions table (reason for replanning) |

+| Agent (on retry) | Output of the new attempt | outputs table (with attempt_number) |
+
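 **What the Agent can see on retry**: events (failure records) + reviews (review comments) + comments (discussion) on the blackboard. Everything lives on the blackboard, so it is read naturally at spawn time.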

 **Design rationale**:
@@ -651,7 +661,7 @@ def build_spawn_message(task_id: str, trigger_reason: str, comments_since: str =

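 | Guardrail | Trigger | Detection method | On finding a problem |
 |-----------|---------|---------|-----------|
-| **Scope Guard** | When the Agent writes scope_declaration | L2 sub compares scope_declaration vs task.truths | Write an observation (severity=warning) → Daemon triggers the follow-up judgment chain |
+| **Scope Guard** | When the Agent, after claiming the task, writes scope-related decisions while working | L2 sub asynchronously compares scope_declaration vs task.truths | Write an observation (severity=warning); the Daemon triggers a judgment by Pang Tong (庞统) on the next tick |
 | **Output Guard** | When the Agent writes an output | L1 mechanical check (file exists, format is correct, fields are non-empty) + L2 semantic check | Mechanical failures are rejected outright; semantic issues are written as an observation |
 | **Format Guard** | When the Agent writes any structured data | L1 JSON Schema validation | Format errors are rejected outright and must be redone |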

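@@ -979,7 +989,7 @@ Every key decision made during Agent execution must be recorded in the blackboard's decisions
 | **Low risk** | Research reports, documentation updates, log viewing | One stage (Output Guardrail mechanical check) | Daemon, automatic |
 | **Research** | Technical research, solution exploration | One stage (Pang Tong confirms the direction) | Pang Tong (庞统) |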

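-**The risk level is set by Pang Tong (庞统) when creating the task** (task_type field); the Daemon uses it to decide the review flow.
+**Risk level**: set by Pang Tong (庞统) when creating the task. The default is `standard`. Pang Tong's Skill has built-in rules: a task created with task_type `strategy` or `deploy` is automatically set to `high`, and a `research` task is automatically set to `research`. Pang Tong does not need to judge this manually.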
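
The defaulting rule amounts to a lookup with a fallback. A minimal sketch follows, assuming a hypothetical helper named `default_risk_level`; how the Skill actually encodes the rule is not shown in this diff.

```python
# Mapping taken from the rule stated above; any other task_type falls back
# to "standard". The helper name and table name are hypothetical.
RISK_BY_TASK_TYPE = {
    "strategy": "high",
    "deploy": "high",
    "research": "research",
}


def default_risk_level(task_type: str) -> str:
    """Return the risk level implied by task_type, defaulting to 'standard'."""
    return RISK_BY_TASK_TYPE.get(task_type, "standard")
```

Keeping the rule as a table means a new task type can be given a risk level by adding one entry, without touching the fallback.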

 **Design rationale**:
 - v1.0 practice: every node required a review by Sima Yi (司马懿), which was too heavy for simple tasks