[moz] docs: §18 Mail Handler Verify/Prompt hardening design

2026-06-16 12:47:04 +08:00
parent 627982db09
commit f1e513cba2
1 changed files with 215 additions and 2 deletions
@@ -1,10 +1,11 @@
 ---
 title: "TaskTypeRegistry + Handler Architecture Refactor"
 created: 2026-06-10
-version: v3.0
+version: v3.1
 ---
 > Status: ✅ Complete (Steps 1-5 all merged, 394 passed)
 > v3.1 adds §18: Mail Handler Verify/Prompt hardening (2026-06-16, in progress)
 # §1 Current State Analysis (v3.0 update note: §1-§13 kept as-is, §14-§18 added, §3/§5/§7 updated)
@@ -985,7 +986,219 @@ handler.post_complete(task_id, agent_id, outcome, db_path)
 ---
-## §14. Mail Failure Notification Mechanism
+## §18. Mail Handler Verify/Prompt Hardening
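 > Date: 2026-06-16 | Author: 庞统 | Status: Directions 1-5 all confirmed
 ## 18.1 Problem Background
 ### Triggering Incident
 After the daemon restarted on 2026-06-12, leftover E2E test mails backlogged in the _mail DB (created 5/18 to 6/1, type=request, performative="text") were dispatched to agents. The agents processed them normally and produced text output (for example, "Read, no action needed"), but `verify_completion` returned no_reply, so the tasks were marked failed. That triggered `notify_mail_failed`, which produced 38 `[投递失败]` (delivery failed) notification mails, one round roughly every 2.5 minutes, for 10 rounds.
 ### Root Cause Chain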
 ```
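 E2E test script bug (type="text")
  → mail_routes.py does not validate the type value and passes it through as-is
  → performative="text" ≠ "inform" → goes through _check_reply
  → _check_reply looks for an in_reply_to task, but the agent did not reply via the Mail API
  → verify fails → on_failure marks the task failed
  → notify_mail_failed sends a [投递失败] (delivery failed) notification
  → the notification is itself a task, so the cycle repeats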
 ```
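 ### Comparison of verify Across the Three Handlers
 | Dimension | TaskHandler | MailHandler | ToolchainHandler |
 |------|------------|-------------|------------------|
 | verify signals | output / comment (≥50 characters) / terminal_status (three signals) | in_reply_to task (single signal) | action_report / output / comment (≥20 characters) (three-level fallback) |
 | inform handling | N/A | Passes immediately (no execution evidence checked) | N/A |
 | After verify fails | **Stays working** (overrides post_complete) | **Marked failed** (base post_complete + mail on_failure) | Marked failed (base post_complete + tc on_failure) |
 | Agent output persistence | Relies on the agent POSTing output/comment itself | **None** (agent output lives only in memory) | Relies on the agent POSTing an action_report itself |
 **Key findings**:
 1. MailHandler inherits from BaseTaskHandler and does not override `post_complete` → a verify failure goes through the base `on_failure` → the task is marked failed
 2. TaskHandler overrides `post_complete` → a verify failure leaves the task working so the ticker can retry
 3. MailHandler's verify has only the `in_reply_to` path, with no fallback
 4. The inform type passes immediately (`VerifyResult(True)`) without checking any execution evidence, although inform means "no reply needed", not "no check needed"
 5. The E2E tests use `TestClient(app)` to write to the production `_mail DB`, and the test script uses the non-standard `type="text"`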
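 ## 18.2 Fix Directions
 ### Direction 1: Align mail verify with the toolchain pattern (✅ confirmed)
 **Problem**: mail verify has only one path, the in_reply_to task. Both task and toolchain have multi-level fallbacks (outputs / comments).
 **Solution**: Align mail with the toolchain pattern: the prompt adds an action report requirement, and verify checks action_report first, then falls back to outputs, then to comments. The in_reply_to reply mail drops from being the only signal to the fourth-priority signal, used for the request type only.
 #### Prompt hardening (MailApiSection)
 Following ToolchainApiSection, append an action report requirement to the mail prompt: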
 ```
 ### You must submit an action report when finished
 After processing the mail, you must submit an action report:
 curl -s -X POST "http://localhost:8083/api/projects/_mail/tasks/{task_id}/comments" \
  -H "Content-Type: application/json" \
  -d '{"author": "{agent_id}", "comment_type": "action_report", "body": "summary of the processing result"}'
 ⚠️ Tasks without an action report will be marked as failed.
 ```
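For callers that prefer Python over curl, the same request can be sketched with the standard library. The URL, path, and JSON fields are copied from the curl example above; the function name `build_action_report_request` is a hypothetical helper for illustration, not part of the codebase.

```python
import json
import urllib.request

# Base URL and path copied from the curl example in the prompt.
BASE_URL = "http://localhost:8083/api/projects/_mail/tasks"


def build_action_report_request(task_id: str, agent_id: str,
                                summary: str) -> urllib.request.Request:
    """Build (but do not send) the POST that files an action report."""
    payload = {
        "author": agent_id,
        "comment_type": "action_report",
        "body": summary,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{task_id}/comments",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is a separate step (`urllib.request.urlopen(req, timeout=5)`) and only works while the daemon is listening on localhost:8083.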
 #### verify rework (MailHandler.verify_completion)
 ```python
 def verify_completion(self, task_id, db_path) -> VerifyResult:
     performative = self._parse_performative(task_id, db_path)
     # 1. Check for an action_report comment first (applies to every type)
     if self._has_action_report(task_id, db_path):
         return VerifyResult(True, "has_action_report", "action_report found")
     # 2. Fallback: outputs
     if self._has_outputs(task_id, db_path):
         return VerifyResult(True, "has_output", "output found")
     # 3. Fallback: a substantive comment (≥20 characters, not from system)
     if self._has_comment(task_id, db_path):
         return VerifyResult(True, "has_comment", "substantive comment found")
     # 4. request only: check for an in_reply_to reply mail
     if performative == "request" and self._check_reply(task_id, db_path):
         return VerifyResult(True, "has_reply", "in_reply_to found")
     return VerifyResult(False, "no_action",
                         "no action_report, no output, no comment, no reply")
 ```
 Note: the action_report is submitted to the moziplus DB (comments table), not to Gitea. Gitea comments are for cross-agent collaboration and are not what verify checks.
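As a rough illustration of what the comment-based checks might look like against the comments table, here is a minimal sketch using an in-memory SQLite database. The column names (`task_id`, `author`, `comment_type`, `body`), the convention that system comments carry `author = 'system'`, and the helper names are all assumptions made for this example; the real moziplus schema may differ.

```python
import sqlite3

MIN_COMMENT_CHARS = 20  # threshold taken from step 3 of verify_completion


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with an ASSUMED comments schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "task_id TEXT, author TEXT, comment_type TEXT, body TEXT)"
    )
    return conn


def has_action_report(conn: sqlite3.Connection, task_id: str) -> bool:
    """True if the task has at least one action_report comment."""
    row = conn.execute(
        "SELECT 1 FROM comments "
        "WHERE task_id = ? AND comment_type = 'action_report' LIMIT 1",
        (task_id,),
    ).fetchone()
    return row is not None


def has_substantive_comment(conn: sqlite3.Connection, task_id: str) -> bool:
    """True if a non-system comment of at least MIN_COMMENT_CHARS exists."""
    row = conn.execute(
        "SELECT 1 FROM comments "
        "WHERE task_id = ? AND author != 'system' AND LENGTH(body) >= ? "
        "LIMIT 1",
        (task_id, MIN_COMMENT_CHARS),
    ).fetchone()
    return row is not None
```

Both helpers stop at the first matching row, since verify only needs to know whether evidence exists.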
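 ### Direction 2: Tighten prompt constraints (✅ confirmed)
 **Problem**: The current mail prompt only gives a curl example, with no hard constraint requiring the agent to output a processing result. Once the agent decides the mail only needs to be read, it skips ahead without creating an in_reply_to task.
 **Solution**: Add JSON output constraints to the mail request/inform prompts (modeled on toolchain's Red Flags pattern).
 #### MailContextSection hardening
 For the **request type**, append: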
 ```
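 ### Output requirements
 - Your reply must contain the actual result of processing the mail
 - If this is the first time you receive it: process it normally and output the result
 - If it is a duplicate mail (you have already processed a mail with the same ID): output "previously processed" + a summary of the earlier result
 - ⚠️ "Read" and "no action needed" are not valid processing results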
 ```
 For the **inform type**, append:
 ```
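 ### Output requirements
 - Your reply must confirm that the mail was processed (read/executed/recorded); saying only "Read" is not enough
 - If it is a duplicate mail: output "previously processed" + a summary of the result
 - ⚠️ "Read" is not a valid output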
 ```
 **MailConstraintsSection** appends Red Flags:
 ```
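 | Agent thought | Red Flag rebuttal |
 |------------|--------------|
 | "Reading it is enough" | ❌ Wrong! You must output a processing result or confirm execution |
 | "Ignore duplicate mails" | ❌ Wrong! Output "previously processed" + a summary of the result |
 | "No reply needed" | ❌ Wrong! A request must be replied to, and an inform must be confirmed as processed |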
 ```
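 ### Direction 3: inform must also be checked for execution evidence (✅ confirmed)
 **Problem**: inform verify currently returns `VerifyResult(True)` directly without checking any execution evidence. inform means "no reply needed", not "no check needed".
 **Solution**: Change inform verify to check whether the agent produced substantive output (comment/output). It takes a different verification path from request, but both must be verified.
 **Files changed**: `src/daemon/mail_handler.py`, `verify_completion` method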
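The difference between the two paths can be shown with a small pure function. This is only a sketch: the `Evidence` dataclass and the `verify` function are hypothetical stand-ins for the DB-backed checks, and the result codes are borrowed from the `verify_completion` listing in Direction 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Evidence:
    """Hypothetical summary of what the agent left behind for one task."""
    action_report: bool = False
    outputs: int = 0
    substantive_comments: int = 0
    reply_found: bool = False


def verify(performative: str, ev: Evidence) -> Tuple[bool, str]:
    """inform and request share the first three checks; only request
    may additionally pass on an in_reply_to reply."""
    if ev.action_report:
        return True, "has_action_report"
    if ev.outputs > 0:
        return True, "has_output"
    if ev.substantive_comments > 0:
        return True, "has_comment"
    if performative == "request" and ev.reply_found:
        return True, "has_reply"
    return False, "no_action"
```

With this shape, an inform mail with no evidence no longer passes, and a reply alone counts only for request.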
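 ### Direction 4: Keep the task working when verify fails (✅ confirmed)
 **Problem**: MailHandler inherits from BaseTaskHandler, so a verify failure goes through the base `on_failure` and the task is marked failed. TaskHandler, by contrast, overrides `post_complete` and leaves the task working when verify fails.
 **Original design intent** (§2 design doc): "Not passed → stay working, ticker rechecks (at most 3 times, then mark failed)".
 **Solution**: MailHandler overrides `post_complete` so that a verify failure does not mark the task failed and leaves it working. The ticker's `_check_timeouts` acts as the timeout backstop:
 - `check_completion` passes (a reply exists) → done
 - `check_completion` does not pass → marked failed after the timeout
 - Runaway Guard (§15, dispatch_count ≥ 10) is the final safeguard against an infinite loop
 **Files changed**: `src/daemon/mail_handler.py`, adding a `post_complete` override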
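A minimal sketch of the override, under heavy assumptions: the `BaseTaskHandler` below is a simplified stand-in that keeps task status in a dict instead of the DB, and only the `post_complete(task_id, agent_id, outcome, db_path)` signature is taken from this document. It is meant to show the behavioral difference, not the real class layout.

```python
from dataclasses import dataclass


@dataclass
class VerifyResult:
    ok: bool
    code: str = ""
    detail: str = ""


class BaseTaskHandler:
    """Stand-in base class: a verify failure is marked failed."""

    def __init__(self):
        self.status = {}  # task_id -> status; replaces the real DB here

    def verify_completion(self, task_id, db_path) -> VerifyResult:
        raise NotImplementedError

    def on_failure(self, task_id, db_path):
        self.status[task_id] = "failed"

    def post_complete(self, task_id, agent_id, outcome, db_path):
        result = self.verify_completion(task_id, db_path)
        if result.ok:
            self.status[task_id] = "done"
        else:
            self.on_failure(task_id, db_path)
        return result


class MailHandler(BaseTaskHandler):
    def __init__(self, tasks_with_evidence):
        super().__init__()
        self.tasks_with_evidence = set(tasks_with_evidence)

    def verify_completion(self, task_id, db_path) -> VerifyResult:
        if task_id in self.tasks_with_evidence:
            return VerifyResult(True, "has_action_report")
        return VerifyResult(False, "no_action")

    def post_complete(self, task_id, agent_id, outcome, db_path):
        result = self.verify_completion(task_id, db_path)
        # Verify failure leaves the task working; the ticker's timeout
        # check decides later whether it becomes done or failed.
        self.status[task_id] = "done" if result.ok else "working"
        return result
```

Because `on_failure` is never reached on a verify failure, no `[投递失败]` notification is generated at this point.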
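 ### Direction 5: type validation + E2E fix + DB cleanup (✅ confirmed)
 #### 5.1 mail_routes.py type validation
 **Problem**: `mail_type = body.get("type")` is passed straight through, so whatever is sent gets stored. `"text"` is not a standard value.
 **Solution**: Validate type at creation time and allow only `inform` / `request`. An invalid value is corrected to the default, which is `inform` when in_reply_to is set and `request` otherwise.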
 ```python
 mail_type = body.get("type")
 if mail_type is None:
    mail_type = "inform" if in_reply_to else "request"
 elif mail_type not in ("inform", "request"):
    # Non-standard value: correct it to the default
    mail_type = "inform" if in_reply_to else "request"
 ```
 **Files changed**: `src/api/mail_routes.py`
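The same rule can be packaged as a standalone function, which makes it easy to unit-test outside the route. The name `normalize_mail_type` is hypothetical; the logic mirrors the snippet above, where both the missing and the invalid case fall back to the same default.

```python
from typing import Optional

VALID_MAIL_TYPES = ("inform", "request")


def normalize_mail_type(mail_type: Optional[str],
                        in_reply_to: Optional[str]) -> str:
    """Return a valid mail type; missing or non-standard values fall
    back to 'inform' for replies and 'request' otherwise."""
    if mail_type in VALID_MAIL_TYPES:
        return mail_type
    return "inform" if in_reply_to else "request"
```

For example, `normalize_mail_type("text", None)` yields `"request"`, which is the case that caused the incident in 18.1.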
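 #### 5.2 _parse_performative fault tolerance
 **Problem**: `meta.get("performative", meta.get("type", "request"))` returns "text" when performative="text", which is not equal to "inform", so verification goes through _check_reply.
 **Solution**: Recognize only the two values `inform` and `request`; treat everything else as `request`.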
 ```python
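 def _parse_performative(self, task_id, db_path) -> str:
     # meta is the task's metadata dict; loading it from db_path is elided here
     raw = meta.get("performative", meta.get("type", "request"))
     if raw == "inform":
         return "inform"
     return "request"  # any non-standard value is treated as request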
 ```
 **Files changed**: `src/daemon/mail_handler.py`, `_parse_performative` method
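To make the edge cases concrete, here is a sketch of the same logic as a pure function over an already-loaded metadata dict. The function name `parse_performative` and the dict-only signature are assumptions for illustration; the real method loads `meta` from the DB.

```python
def parse_performative(meta: dict) -> str:
    """Map task metadata to 'inform' or 'request'.

    'performative' takes precedence over 'type'; anything other than
    'inform' (including 'text' or a missing key) is treated as 'request'.
    """
    raw = meta.get("performative", meta.get("type", "request"))
    return "inform" if raw == "inform" else "request"
```

Note that a task carrying `performative="text"` together with `type="inform"` still resolves to `request`, because `performative` is read first.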
 #### 5.3 E2E test fix
 **Problem**: `tests/e2e/test_e2e_v27.py` creates test mails with `type="text"` and uses `TestClient(app)` to write to the production `_mail DB`.
 **Fix**:
 1. Change every `type="text"` to `type="inform"` or `type="request"`
 2. Clean up the test mails after the E2E run (the tasks recorded in the `mail_ids` list)
 **Files changed**: `tests/e2e/test_e2e_v27.py`
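One way the cleanup in step 2 could be structured is a context manager that hands the test a `mail_ids` list and deletes everything recorded in it on exit, even if the test raises. This is a sketch only: `tracked_mail_ids` and the `delete_mail` callback are hypothetical, and how a mail is actually deleted (API call or direct DB access) is left to the caller.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def tracked_mail_ids(delete_mail: Callable[[str], None]) -> Iterator[List[str]]:
    """Yield a list for recording created mail IDs; delete them on exit."""
    mail_ids: List[str] = []
    try:
        yield mail_ids
    finally:
        # Runs on success and on failure, so leftovers never accumulate.
        for mail_id in mail_ids:
            delete_mail(mail_id)
```

A test would wrap its body in `with tracked_mail_ids(delete_fn) as mail_ids:` and append each created ID to `mail_ids`.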
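 #### 5.4 Production DB cleanup
 **Problem**: The production `_mail DB` still holds a large number of leftover E2E test mails (created 5/18 to 6/3, with titles such as "筛选测试" (filter test), "详情测试" (detail test), "已读测试" (read test), and "任务分配" (task assignment)).
 **Solution**: Remove these test leftovers manually (a one-off operation, no code change required).
 ## 18.3 Scope of Impact
 | File | Change type | Affected area |
 |------|---------|--------|
 | `src/daemon/mail_handler.py` | verify + post_complete + prompt section | MailHandler core logic |
 | `src/api/mail_routes.py` | type validation | Mail API creation entry point |
 | `tests/e2e/test_e2e_v27.py` | type value fix + cleanup | E2E tests |
 | Production `_mail DB` | Remove test leftovers | One-off operation |
 ## 18.4 Verification Plan
 1. Unit tests: mail_handler verify/prompt changes
 2. Integration tests: the full mail dispatch → verify → done/working path
 3. Regression tests: the full `pytest -m "not e2e"` suite
 4. Manual verification: create inform/request mails and confirm verify behaves correctly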
 ---
 # §14. Mail Failure Notification Mechanism
 ### 20.1 Background