sanguo/sanguo_moziplus_v2


cfdaily 98eb15125d


chore(docs): archive the §20 review documents to archive-3.0 and append the review history

- review-v3-vs-head-pangtong.md → archive-3.0/
- review-v3-vs-head-simayi.md → archive-3.0/
- step5-audit-report.md → archive-3.0/
- step5-impact-analysis.md → archive-3.0/
- §20 adds the §19 review and verification history (summary of key findings and fix status)

2026-06-13 08:49:41 +08:00


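Step 5 Dual Audit Report

Summary

Design consistency check items: 8
Special-logic coverage check items: 22
Consistent / covered: 24
Deviations / omissions: 6 (3 severe / 3 minor)

Deviations and Omissions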

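| # | Dimension | Design requirement / old logic | Actual code | Severity | Recommendation |
|---|---|---|---|---|---|
| D1 | B1.2 pre_spawn | Old `_mail_on_checks_passed`: `if not _mail_auto_working(): raise RuntimeError`, so a pre_spawn failure aborts the spawn | New `_handler_on_checks_passed`: the return value of `_handler.pre_spawn(...)` is not checked, and `handler_marked_working = True` runs unconditionally | Severe | Change to `if not _handler.pre_spawn(...): raise RuntimeError("handler_pre_spawn_failed")` |
| D2 | B3.1 PromptContext | Old `_build_mail_prompt` parses `from_agent` and `performative` from the must_haves JSON and passes them to the template | New `spawner._build_spawn_message` builds the PromptContext without `from_agent` and `mail_type`; both are empty strings | Severe | Extract `from` and `performative` from the `must_haves` JSON and fill them into the PromptContext |
| D3 | B1.3 inform outcome whitelist | Old `_mail_auto_complete`: the inform type has an outcome whitelist `{"completed", "claimed", "no_reply"}`; outcomes outside the whitelist skip auto-done | New `MailHandler.verify_completion`: inform always returns True and does not check the outcome | Minor | CRASH_OUTCOMES is already handled by the base class. The remaining abnormal outcomes (session_revived / api_error / fallback_timeout) are very rare, and the old logic merely left the task unmarked and waited for the ticker to redeliver, so the end result differs little. Strict alignment would still require adding the whitelist check |
| D4 | A. Design §6 retry logic | The design document requires the retry logic to use `handler = TaskTypeRegistry.get_by_project(project_id); if handler: return handler.build_retry_prompt(...)` | spawner L1118-1130: the retry prompt still hardcodes `is_mail = project_id == "_mail"` | Minor | Does not affect runtime at present (the old `_build_mail_prompt` is still present and usable), but is inconsistent with the design document |
| D5 | B1.5 _check_reply semantic difference | Old `_mail_check_reply`: `SELECT id FROM tasks WHERE id != ? AND must_haves LIKE ?`, which checks whether another task's must_haves contains the current task_id (i.e. an in_reply_to match) | New `MailHandler._check_reply`: `SELECT COUNT(*) FROM comments WHERE task_id=? AND author != 'daemon' AND comment_type != 'system'`, which checks whether the current task has any non-system comment | Severe | The two queries have entirely different semantics. The old logic checks for a reply task in the mail table (linked through in_reply_to in must_haves); the new logic checks the current task's comments. This may change the hallucination-gate behavior for request-type mail |
| D6 | B1.3 mark-done retry mechanism | Old `_mail_auto_complete`: marking done is wrapped in an outer `for attempt in range(3)` loop | New `BaseTaskHandler._mark_task_status`: has 3 retries since the H1 fix | Minor | ✅ Fixed. Note that the old code used separate retry loops for marking done and marking failed, whereas the new code routes both through `_mark_task_status`. The behavior is equivalent |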
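To make the D5 difference concrete, the following is a minimal sketch that runs both queries against an in-memory SQLite database. The two SQL statements are the ones quoted in the table; everything else is an assumption made for illustration: the column layout of `tasks` and `comments`, the `%task_id%` LIKE pattern, and the helper names `old_check_reply`, `new_check_reply`, and `build_demo_db`.

```python
import json
import sqlite3


def old_check_reply(conn: sqlite3.Connection, task_id: str) -> bool:
    """Old semantics: is there another task whose must_haves mentions task_id?"""
    row = conn.execute(
        "SELECT id FROM tasks WHERE id != ? AND must_haves LIKE ?",
        (task_id, f"%{task_id}%"),  # LIKE pattern is an assumption
    ).fetchone()
    return row is not None


def new_check_reply(conn: sqlite3.Connection, task_id: str) -> bool:
    """New semantics: does the task itself have any non-system comment?"""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM comments WHERE task_id=? "
        "AND author != 'daemon' AND comment_type != 'system'",
        (task_id,),
    ).fetchone()
    return count > 0


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: one request mail with a comment but no reply task."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, must_haves TEXT)")
    conn.execute(
        "CREATE TABLE comments (task_id TEXT, author TEXT, comment_type TEXT)"
    )
    conn.execute(
        "INSERT INTO tasks VALUES (?, ?)",
        ("mail-1", json.dumps({"performative": "request"})),
    )
    conn.execute(
        "INSERT INTO comments VALUES (?, ?, ?)", ("mail-1", "agent-a", "note")
    )
    return conn
```

On this demo data the new query reports a reply (a non-system comment exists) while the old query does not (no other task references `mail-1`). Inserting a second task whose must_haves contains `{"in_reply_to": "mail-1"}` makes the old query return True as well.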

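Confirmed Consistent Items

A. Design consistency

| # | Dimension | Checkpoint | Result |
|---|---|---|---|
| A1 | §6 dispatcher | After classify_outcome, call handler.post_complete | ✅ The on_complete closure is replaced by handler.post_complete |
| A2 | §6 dispatcher | on_checks_passed → handler.pre_spawn | ✅ _handler_on_checks_passed calls handler.pre_spawn (but the return value is not checked, see D1) |
| A3 | §6 dispatcher | guardrail skip → decided by the handler | ✅ `is_handler_task = handler is not None` |
| A4 | §6 spawner | _build_prompt → handler.build_prompt | ✅ The handler path calls handler.build_prompt(ctx) |
| A5 | §6 spawner | _build_api_section → handler lookup | ✅ When a handler exists, success_status is taken from handler.target_success_status |
| A6 | §6 ticker | Virtual project scan → registry.virtual_projects() | ✅ Loops over `TaskTypeRegistry.virtual_projects()` |
| A7 | §6 ticker | check_completion → handler.check_completion | ✅ The timeout check calls `handler.check_completion(task.id, db_path)` |
| A8 | §6 compatibility period | The design says "keep the old logic during the compatibility period" | ✅ Projects without a handler take the old path (legacy_on_complete) |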
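The registry-first dispatch with a legacy fallback (A6, A8, and the D4 recommendation) can be sketched as follows. Only the names `TaskTypeRegistry`, `get_by_project`, `virtual_projects`, and `build_retry_prompt` come from the report; the class body, the `register` method, the `EchoHandler` stand-in, and all signatures are assumptions, not the project's actual code.

```python
from typing import Callable, Dict, List, Optional


class EchoHandler:
    """Hypothetical handler used only to exercise the dispatch below."""

    def build_retry_prompt(self, task_id: str) -> str:
        return f"handler:{task_id}"


class TaskTypeRegistry:
    """Minimal stand-in for the registry named in the design document."""

    _handlers: Dict[str, EchoHandler] = {}

    @classmethod
    def register(cls, project_id: str, handler: EchoHandler) -> None:
        cls._handlers[project_id] = handler

    @classmethod
    def get_by_project(cls, project_id: str) -> Optional[EchoHandler]:
        return cls._handlers.get(project_id)

    @classmethod
    def virtual_projects(cls) -> List[str]:
        return list(cls._handlers)


def build_retry_prompt(
    project_id: str, task_id: str, legacy_builder: Callable[[str], str]
) -> str:
    """Prefer the registered handler; fall back to the legacy path otherwise."""
    handler = TaskTypeRegistry.get_by_project(project_id)
    if handler is not None:
        return handler.build_retry_prompt(task_id)
    return legacy_builder(task_id)  # compatibility-period path
```

With this shape, a check such as `project_id == "_mail"` is no longer needed at the call site: registering a handler for `_mail` is enough to route its retries through the handler, and unregistered projects keep the old behavior.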

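B. Special-logic coverage

| # | Dimension | Checkpoint | Result |
|---|---|---|---|
| B1 | 1.1 guardrail | Handler projects skip it; _general and others go through guardrail | ✅ |
| B2 | 1.2 _mail_auto_working | `BEGIN IMMEDIATE` + status check + mark working | ✅ `_auto_mark_working` is identical |
| B3 | 1.3 request | No reply → mark failed + notify | ✅ MailHandler.on_failure calls `_mark_task_status(failed)` + `notify_mail_failed` |
| B4 | 1.4 _mail_revert_to_pending | Spawn failure reverts working → pending | ✅ The exception handler uses `BEGIN IMMEDIATE` + a status check before reverting |
| B5 | 1.6 Task review verdict read | approved → done | ✅ handle_review_complete |
| B6 | 1.6 Task review | Non-approved → @mention assignee + stay in review | ✅ After the H3 fix: stays in review + INSERT comment with comment_type='review' |
| B7 | 1.6 Task executor three-signal verification | output/comment/terminal status → review | ✅ verify_completion is identical |
| B8 | 1.7 Legacy dispatch path | handler replaces is_mail_legacy | ✅ handler_legacy looks up the registry |
| B9 | 2.1 _transition_status assignee clearing | Handler projects do not clear the assignee | ✅ |
| B10 | 2.2 Skip claimed status | Handler projects skip claimed and go straight to working | ✅ |
| B11 | 2.3 _dispatch_reviews skip | Handler projects do not go through review | ✅ |
| B12 | 2.5 startup recovery | `_general` + virtual_projects() | ✅ No duplicate scanning |
| B13 | 3.1 _build_api_section | Correctly obtains success_status when a handler exists | ✅ |
| B14 | B4.1 TaskHandler.post_complete | Distinguishes the executor and review flows | ✅ Determined by reading the DB status |
| B15 | B4.2 MailHandler.post_complete | Unified base-class flow | ✅ |
| B16 | B4.3 ToolchainHandler.post_complete | Unified base-class flow | ✅ |
| B17 | B1.5 _check_reply conservative exception handling | Old: return True (conservative) / New: return False | See D5 |
| B18 | CRASH_OUTCOMES set | Matches the old ROLLBACK_CURRENT_AGENT_OUTCOMES | ✅ Identical |
| B19 | B2.1 _toolchain ticker scan | _toolchain is scanned by the ticker | ✅ When _toolchain has a blackboard.db, tick_project handles it |
| B20 | B2.3 All handler projects skip claimed | _toolchain skips it too | ✅ All handler projects are handled uniformly |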
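The `BEGIN IMMEDIATE` + status check + mark working pattern from B2 can be illustrated with Python's standard `sqlite3` module. This is a sketch under stated assumptions rather than the project's `_auto_mark_working`: the `tasks(id, status)` layout, the choice of `pending` as the required starting status, and the function name `auto_mark_working` are all hypothetical.

```python
import sqlite3


def auto_mark_working(conn: sqlite3.Connection, task_id: str) -> bool:
    """Atomically move a task from 'pending' to 'working'.

    BEGIN IMMEDIATE takes the write lock up front, so no other writer can
    change the status between the SELECT and the UPDATE.
    Returns False when the task is missing or not in 'pending'.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT status FROM tasks WHERE id = ?", (task_id,)
        ).fetchone()
        ok = row is not None and row[0] == "pending"  # assumed precondition
        if ok:
            conn.execute(
                "UPDATE tasks SET status = 'working' WHERE id = ?", (task_id,)
            )
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return ok


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with a single pending task (hypothetical schema)."""
    # isolation_level=None disables implicit transactions so that the
    # explicit BEGIN IMMEDIATE above is the only transaction control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO tasks VALUES ('t1', 'pending')")
    return conn
```

A second call for the same task returns False because the status is already `working`, which is the signal a caller can use to abort the spawn (see D1). The B4 revert would presumably follow the same shape with the two statuses swapped.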

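Fix Priorities

| Priority | # | Fix |
|---|---|---|
| P0 | D1 | dispatcher _handler_on_checks_passed checks the pre_spawn return value |
| P0 | D2 | spawner PromptContext extracts from_agent and mail_type from must_haves |
| P0 | D5 | MailHandler._check_reply restores the old query semantics (check in_reply_to in must_haves) |
| P1 | D3 | inform outcome whitelist (optional, minimal impact) |
| P2 | D4 | retry prompt uses the handler path instead of the hardcoded check |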
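The two smaller P0 fixes (D1 and D2) might look roughly like the sketch below. The error string `handler_pre_spawn_failed` and the JSON keys `from` and `performative` are taken from the report; the `PromptContext` fields shown, the function names, and the handler signature are assumptions for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptContext:
    """Hypothetical subset of the real PromptContext fields."""

    task_id: str
    from_agent: str = ""
    mail_type: str = ""


def context_from_must_haves(task_id: str, must_haves: Optional[str]) -> PromptContext:
    """D2: fill from_agent and mail_type from the must_haves JSON."""
    try:
        data = json.loads(must_haves) if must_haves else {}
    except json.JSONDecodeError:
        data = {}  # malformed JSON falls back to empty fields
    if not isinstance(data, dict):
        data = {}
    return PromptContext(
        task_id=task_id,
        from_agent=str(data.get("from", "")),
        mail_type=str(data.get("performative", "")),
    )


def run_pre_spawn(handler, task_id: str) -> bool:
    """D1: abort the spawn when pre_spawn reports failure."""
    if not handler.pre_spawn(task_id):
        raise RuntimeError("handler_pre_spawn_failed")
    return True  # only now is it safe to record handler_marked_working
```

Raising instead of continuing keeps the old `_mail_on_checks_passed` behavior: a task that could not be marked working is never spawned.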
