[moz] docs: §18 Mail Handler Verify/Prompt hardening design

2026-06-16 12:47:04 +08:00
parent 627982db09
commit f1e513cba2
1 changed files with 215 additions and 2 deletions
@@ -1,10 +1,11 @@
 ---
 title: "TaskTypeRegistry + Handler Architecture Refactor"
 created: 2026-06-10
-version: v3.0
+version: v3.1
 ---

 > Status: ✅ Done (Steps 1-5 all merged, 394 passed)
+> v3.1 adds §18: Mail Handler Verify/Prompt hardening (2026-06-16, in progress)

 # §1 Current State Analysis (v3.0 update note: §1-§13 kept as-is, §14-§18 added, §3/§5/§7 updated)

@@ -985,7 +986,219 @@ handler.post_complete(task_id, agent_id, outcome, db_path)

 ---

-## §14. Mail Failure Notification Mechanism
+## §18. Mail Handler Verify/Prompt Hardening
+
+> Date: 2026-06-16 | Author: 庞统 | Status: directions 1-5 all confirmed
+
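+## 18.1 Background
+
+### Trigger event
+
+After the daemon restart on 2026-06-12, leftover E2E test mails backlogged in the _mail DB (created 5/18~6/1, type=request, performative="text") were dispatched to agents. The agents processed them normally and produced text output (for example "已阅，无需处理", i.e. "read, no action needed"), but `verify_completion` judged them as no_reply, so the tasks were marked failed and `notify_mail_failed` fired. This produced 38 `[投递失败]` (delivery failed) notification mails, one round roughly every 2.5 minutes, for 10 rounds.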
+
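+### Root-cause chain
+
+```
+E2E test script bug (type="text")
+  → mail_routes.py does not validate the type value and passes it through
+  → performative="text" ≠ "inform" → goes through _check_reply
+  → _check_reply looks for an in_reply_to task; the agent did not reply via the Mail API
+  → verify fails → on_failure marks the task failed
+  → notify_mail_failed sends a [投递失败] (delivery failed) notification
+  → the notification is itself a task, so the cycle repeats
+```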
+
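+### Verify comparison across the three handlers
+
+| Dimension | TaskHandler | MailHandler | ToolchainHandler |
+|------|------------|-------------|------------------|
+| verify signals | output / comment (≥50 chars) / terminal_status (three signals) | in_reply_to task (single signal) | action_report / output / comment (≥20 chars) (three-layer fallback) |
+| inform handling | N/A | passes directly (no execution evidence checked) | N/A |
+| After verify fails | **stays working** (overrides post_complete) | **marked failed** (base post_complete + mail on_failure) | marked failed (base post_complete + tc on_failure) |
+| Agent output persistence | relies on the agent POSTing output/comment | **none** (agent output lives only in memory) | relies on the agent POSTing action_report |
+
+**Key findings**:
+1. MailHandler inherits from BaseTaskHandler and does not override `post_complete` → on verify failure it goes through the base `on_failure` → marked failed
+2. TaskHandler overrides `post_complete` → on verify failure the task stays working and the ticker retries
+3. MailHandler's verify has only the single `in_reply_to` path, with no fallback
+4. The inform type passes directly (`VerifyResult(True)`) without checking any execution evidence. inform means "no reply needed", not "no check needed"
+5. The E2E tests use `TestClient(app)` to write to the production `_mail DB`, and the test script uses the non-standard `type="text"`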
+
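+## 18.2 Fix directions
+
+### Direction 1: align mail verify with the toolchain pattern (✅ confirmed)
+
+**Problem**: mail verify has only the single in_reply_to task path, whereas task and toolchain both have multi-layer fallbacks (outputs / comments).
+
+**Approach**: align mail with the toolchain pattern. The prompt gains an action report requirement, and verify checks action_report first → falls back to outputs → falls back to comments. The in_reply_to reply mail is demoted from the only signal to the fourth-priority signal, used for the request type only.
+
+#### Prompt hardening (MailApiSection)
+
+Following ToolchainApiSection, append an action report requirement to the mail prompt:
+```
+### You must submit an action report when done
+After handling the mail, you must submit an action report:
+curl -s -X POST "http://localhost:8083/api/projects/_mail/tasks/{task_id}/comments" \
+  -H "Content-Type: application/json" \
+  -d '{"author": "{agent_id}", "comment_type": "action_report", "body": "summary of the handling result"}'
+
+⚠️ Tasks without an action report will be marked failed.
+```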
+
+#### Verify rework (MailHandler.verify_completion)
+
+```python
+def verify_completion(self, task_id, db_path) -> VerifyResult:
+    performative = self._parse_performative(task_id, db_path)
+
+    # 1. Check for an action_report comment first (applies to all types)
+    if self._has_action_report(task_id, db_path):
+        return VerifyResult(True, "has_action_report", "action_report found")
+
+    # 2. Fallback: outputs
+    if self._has_outputs(task_id, db_path):
+        return VerifyResult(True, "has_output", "output found")
+
+    # 3. Fallback: a substantive comment (≥20 chars, not from system)
+    if self._has_comment(task_id, db_path):
+        return VerifyResult(True, "has_comment", "substantive comment found")
+
+    # 4. request only: check for an in_reply_to reply mail
+    if performative == "request":
+        if self._check_reply(task_id, db_path):
+            return VerifyResult(True, "has_reply", "in_reply_to found")
+
+    return VerifyResult(False, "no_action",
+                        "no action_report, no output, no comment, no reply")
+```
+
+Note: the action_report is submitted to the moziplus DB (comments table), not to Gitea. Gitea comments are for cross-agent collaboration and are not what verify checks.
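The fallback order above can be exercised without a database. The following is a minimal sketch, not the project's implementation: `MailTask` is a hypothetical in-memory stand-in for a `_mail` task row, the `VerifyResult` field names are assumed, and "not from system" is assumed to mean `author != "system"`.

```python
from dataclasses import dataclass, field
from typing import List

MIN_COMMENT_CHARS = 20  # threshold taken from the design above


@dataclass
class VerifyResult:
    ok: bool
    signal: str = ""
    detail: str = ""


@dataclass
class MailTask:
    """Hypothetical in-memory stand-in for a task row in the _mail DB."""
    performative: str = "request"
    comments: List[dict] = field(default_factory=list)  # author / comment_type / body
    outputs: List[str] = field(default_factory=list)
    has_reply: bool = False  # True when an in_reply_to task exists


def verify_completion(task: MailTask) -> VerifyResult:
    # 1. An action_report comment passes for every performative.
    if any(c.get("comment_type") == "action_report" for c in task.comments):
        return VerifyResult(True, "has_action_report", "action_report found")
    # 2. Fallback: any persisted output.
    if task.outputs:
        return VerifyResult(True, "has_output", f"output_count={len(task.outputs)}")
    # 3. Fallback: substantive comments (assumed: non-system author, >= 20 chars).
    substantive = [
        c for c in task.comments
        if c.get("author") != "system" and len(c.get("body", "")) >= MIN_COMMENT_CHARS
    ]
    if substantive:
        return VerifyResult(True, "has_comment", f"comment_count={len(substantive)}")
    # 4. Only request mails may pass on an in_reply_to reply.
    if task.performative == "request" and task.has_reply:
        return VerifyResult(True, "has_reply", "in_reply_to found")
    return VerifyResult(False, "no_action",
                        "no action_report, no output, no comment, no reply")
```

Under these assumptions an inform mail with no persisted evidence fails verification even if a reply exists, which matches the intent that inform is checked rather than passed unconditionally.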
+
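+### Direction 2: stronger prompt constraints (✅ confirmed)
+
+**Problem**: the current mail prompt only gives a curl example, with no hard constraint requiring the agent to output a handling result. Once the agent decides "已阅" ("read"), it skips ahead and creates no in_reply_to task.
+
+**Approach**: add JSON output constraints to the mail request/inform prompts (following toolchain's Red Flags pattern).
+
+#### MailContextSection hardening
+
+For the **request type**, append:
+```
+### Output requirements
+- Your reply must contain the actual result of handling the mail
+- If this is the first time you receive it: handle it normally and output the result
+- If it is a duplicate mail (you have already handled a mail with the same ID): output "previously handled" plus a summary of the earlier result
+- ⚠️ "Read" and "no action needed" are not valid handling results
+```
+
+For the **inform type**, append:
+```
+### Output requirements
+- Your reply must confirm that the mail was handled (read/executed/recorded); saying only "read" is not enough
+- If it is a duplicate mail: output "previously handled" plus a summary of the result
+- ⚠️ "Read" is not a valid output
+```
+
+**MailConstraintsSection** gains these Red Flags:
+```
+| Agent thought | Red Flag rebuttal |
+|------------|--------------|
+| "Just reading it is enough" | ❌ Wrong! You must output a handling result or confirm execution |
+| "Ignore duplicate mails" | ❌ Wrong! Output "previously handled" plus a result summary |
+| "No reply needed" | ❌ Wrong! A request must be replied to, and an inform must be confirmed as handled |
+```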
+
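+### Direction 3: inform must also show execution evidence (✅ confirmed)
+
+**Problem**: inform verify currently returns `VerifyResult(True)` directly, without checking any execution evidence. inform means "no reply needed", not "no check needed".
+
+**Approach**: change inform verify to check whether the agent produced substantive output (comment/output). inform and request take different verification paths, but both must be verified.
+
+**Files changed**: `src/daemon/mail_handler.py`, `verify_completion` method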
+
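+### Direction 4: keep the task working when verify fails (✅ confirmed)
+
+**Problem**: MailHandler inherits from BaseTaskHandler, so on verify failure it goes through the base `on_failure` → marked failed. TaskHandler, by contrast, overrides `post_complete` and leaves the task working on verify failure.
+
+**Original design intent** (§2 design doc): "Not passed → stay working, the ticker rechecks (at most 3 times, then mark failed)".
+
+**Approach**: MailHandler overrides `post_complete` so that a verify failure does not mark the task failed and it stays working. The ticker's `_check_timeouts` acts as the timeout backstop:
+- `check_completion` passes (a reply exists) → done
+- `check_completion` does not pass → marked failed after the timeout
+- Runaway Guard (§15, dispatch_count ≥ 10) is the final safeguard against an infinite loop
+
+**Files changed**: `src/daemon/mail_handler.py`, new `post_complete` override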
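The state transitions described above can be sketched as two small functions. This is an illustration under stated assumptions, not the actual override: `TaskState`, the simplified signatures, and the `timeout_seconds` parameter are hypothetical, and the real `post_complete` takes `(task_id, agent_id, outcome, db_path)`.

```python
from dataclasses import dataclass

MAX_DISPATCH_COUNT = 10  # Runaway Guard threshold cited from §15


@dataclass
class TaskState:
    """Hypothetical in-memory view of a mail task."""
    status: str = "working"
    dispatch_count: int = 0
    age_seconds: float = 0.0


def post_complete(task: TaskState, verified: bool) -> str:
    """Mark the task done on success; on verify failure leave it working."""
    if verified:
        task.status = "done"
    # No on_failure call here: failure is decided later by the ticker.
    return task.status


def check_timeouts(task: TaskState, verified: bool, timeout_seconds: float) -> str:
    """Ticker-side backstop: recheck, then fail on timeout or runaway dispatch."""
    if task.status != "working":
        return task.status
    if verified:
        task.status = "done"
    elif task.age_seconds >= timeout_seconds or task.dispatch_count >= MAX_DISPATCH_COUNT:
        task.status = "failed"
    return task.status
```

In this sketch a failed verify never triggers a failure notification by itself. Only the timeout or the Runaway Guard moves a task to failed, which is what breaks the notification loop described in 18.1.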
+
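+### Direction 5: type validation + E2E fix + DB cleanup (✅ confirmed)
+
+#### 5.1 mail_routes.py type validation
+
+**Problem**: `mail_type = body.get("type")` is passed straight through, so whatever is sent gets stored. `"text"` is not a standard value.
+
+**Approach**: on creation, validate that type is one of `inform` / `request`. A missing or invalid value falls back to the default (`inform` when `in_reply_to` is set, otherwise `request`).
+
+```python
+mail_type = body.get("type")
+if mail_type is None:
+    mail_type = "inform" if in_reply_to else "request"
+elif mail_type not in ("inform", "request"):
+    # Non-standard value: correct it to the default
+    mail_type = "inform" if in_reply_to else "request"
+```
+
+**Files changed**: `src/api/mail_routes.py`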
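A standalone version of this validation makes the fallback behavior easy to test. This is a minimal sketch: the function name `normalize_mail_type` is hypothetical, and it assumes `in_reply_to` is read from the same request body.

```python
VALID_MAIL_TYPES = ("inform", "request")


def normalize_mail_type(body: dict) -> str:
    """Return a valid mail type, falling back to the default for bad input."""
    mail_type = body.get("type")
    if mail_type in VALID_MAIL_TYPES:
        return mail_type
    # Missing or non-standard value: replies default to inform, others to request.
    return "inform" if body.get("in_reply_to") else "request"
```

For example, `normalize_mail_type({"type": "text"})` yields `"request"`, so the stray E2E mails from 18.1 would have been stored with a standard type.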
+
+#### 5.2 _parse_performative fault tolerance
+
+**Problem**: `meta.get("performative", meta.get("type", "request"))` returns "text" when performative="text", which is not equal to "inform", so the task goes through _check_reply.
+
+**Approach**: accept only the two values `inform` and `request`, and treat everything else as `request`.
+
+```python
+def _parse_performative(self, task_id, db_path) -> str:
+    # meta is the metadata dict of task_id (loading it from db_path is omitted here)
+    raw = meta.get("performative", meta.get("type", "request"))
+    if raw == "inform":
+        return "inform"
+    return "request"  # any non-standard value is treated as request
+```
+
+**Files changed**: `src/daemon/mail_handler.py`, `_parse_performative` method
+
+#### 5.3 E2E test fix
+
+**Problem**: `tests/e2e/test_e2e_v27.py` creates test mails with `type="text"` and uses `TestClient(app)` to write to the production `_mail DB`.
+
+**Fix**:
+1. Change every `type="text"` to `type="inform"` or `type="request"`
+2. Clean up the test mails after the E2E run (the tasks recorded in the `mail_ids` list)
+
+**Files changed**: `tests/e2e/test_e2e_v27.py`
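One way to make the cleanup in step 2 hard to forget is to tie it to a context manager, so deletion runs even when a test fails. The sketch below is hypothetical: the `tasks` table, its columns, and the helper name `tracked_mail_ids` are assumptions for illustration, and it uses an in-memory SQLite database rather than the real `_mail DB`.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def tracked_mail_ids(conn: sqlite3.Connection) -> Iterator[List[str]]:
    """Yield a list for recording created mail task IDs; delete them on exit."""
    mail_ids: List[str] = []
    try:
        yield mail_ids
    finally:
        if mail_ids:
            # Assumed schema: a tasks table keyed by id.
            conn.executemany("DELETE FROM tasks WHERE id = ?", [(m,) for m in mail_ids])
            conn.commit()


def demo() -> int:
    """Create one pre-existing row and one test row, then count what survives."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, type TEXT)")
    conn.execute("INSERT INTO tasks VALUES ('keep-1', 'request')")
    with tracked_mail_ids(conn) as mail_ids:
        conn.execute("INSERT INTO tasks VALUES ('e2e-1', 'inform')")
        mail_ids.append("e2e-1")
    return conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
```

Here `demo()` leaves only the pre-existing row behind. Pointing the tests at a temporary database instead of production would remove the need for cleanup altogether, but that goes beyond the fix listed above.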
+
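+#### 5.4 Production DB cleanup
+
+**Problem**: a large number of E2E test mails remain in the production `_mail DB` (created 5/18~6/3, such as "筛选测试" (filter test), "详情测试" (detail test), "已读测试" (read test), and "任务分配" (task assignment)).
+
+**Approach**: clean up these test leftovers by hand (a one-off operation that needs no code change).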
+
+## 18.3 Scope of impact
+
+| File | Change type | Affected area |
+|------|---------|--------|
+| `src/daemon/mail_handler.py` | verify + post_complete + prompt section | MailHandler core logic |
+| `src/api/mail_routes.py` | type validation | Mail API creation entry point |
+| `tests/e2e/test_e2e_v27.py` | type value fix + cleanup | E2E tests |
+| Production `_mail DB` | remove test leftovers | one-off operation |
+
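+## 18.4 Verification plan
+
+1. Unit tests: mail_handler verify/prompt changes
+2. Integration tests: the full mail dispatch → verify → done/working path
+3. Regression tests: the full `pytest -m "not e2e"` suite
+4. Manual verification: create inform/request mails and confirm that verify behaves correctly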
+
+---
+
+# §14. Mail Failure Notification Mechanism

 ### 20.1 Background