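fix: add synchronize action dispatch to _handle_pull_request

Jiang Wei's investigation found that _handle_pull_request handled only opened/closed, so a Gitea pull_request event with action=synchronize was silently dropped. _handle_pr_synchronize already existed but was never dispatched to. Fix: add an `elif action == "synchronize"` dispatch branch. The pull_request_sync registration is kept as a second safeguard.
docs: renumber design documents (20→14, 24→15) + update status labels of completed documents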
2026-06-13 14:42:38 +08:00 · 2026-06-13 10:12:39 +08:00 · 2026-06-13 01:36:24 +00:00 · 2026-06-13 09:35:15 +08:00 · 2026-06-13 01:34:39 +00:00 · 2026-06-13 09:33:59 +08:00
26 changed files with 332 additions and 38 deletions
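The synchronize fix described in the first commit adds one more `elif` branch. As a minimal sketch of an alternative shape (not the code in this PR), the same routing can be written as a lookup table that logs unknown actions instead of dropping them silently. The three handlers below are hypothetical stand-ins that only record their calls, and `dispatch_pull_request` and `_PR_ACTION_HANDLERS` are illustrative names.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the real handlers; each only records its call.
calls: List[str] = []


async def _handle_pr_opened(payload: Dict[str, Any]) -> None:
    calls.append("opened")


async def _handle_pr_closed(payload: Dict[str, Any]) -> None:
    calls.append("closed")


async def _handle_pr_synchronize(payload: Dict[str, Any]) -> None:
    calls.append("synchronize")


Handler = Callable[[Dict[str, Any]], Awaitable[None]]

# One table entry per action, so a missing action is easy to spot in review.
_PR_ACTION_HANDLERS: Dict[str, Handler] = {
    "opened": _handle_pr_opened,
    "closed": _handle_pr_closed,
    "synchronize": _handle_pr_synchronize,
}


async def dispatch_pull_request(payload: Dict[str, Any]) -> bool:
    """Route a pull_request payload by action; return False if unhandled."""
    action = payload.get("action")
    handler = _PR_ACTION_HANDLERS.get(action)
    if handler is None:
        # Log instead of dropping silently, so gaps show up in the logs.
        logger.warning("unhandled pull_request action: %r", action)
        return False
    await handler(payload)
    return True
```

For example, `asyncio.run(dispatch_pull_request({"action": "synchronize"}))` reaches the synchronize handler, while an unlisted action such as `labeled` returns `False` and leaves a warning in the log.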
@@ -6,7 +6,7 @@
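 **Based on**: PRD-v3.0 §4 four-phase architecture + architecture-v3.0.md
 **Author**: Pang Tong (deputy strategist) 🐦
 **Date**: 2026-05-29
-**Status**: Implementation complete, pending E2E verification
+**Status**: ✅ Completed (E2E verification passed)
 **Reviewer**: Sima Yi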
 ---
@@ -2,7 +2,7 @@
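 **Date**: 2026-05-30
 **Author**: Pang Tong
-**Status**: Revised to v1.1 (per Sima Yi's 2026-05-30 review comments)
+**Status**: ✅ Completed (spawner/ticker/dispatcher all use use_main_session=True)
 **Prerequisite**: `01-four-phase-loop.md` (the four-phase loop E2E verification exposed the session explosion problem)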
 ---
@@ -3,7 +3,7 @@
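 > Version: v1.1  
 > Date: 2026-05-30  
 > Author: Pang Tong (deputy strategist)  
-> Status: v1.1 revision (Sima Yi's review comments incorporated)  
+> Status: ✅ Completed (@mention + mention_queue implemented)  
 > Prerequisites: #02 Main Session + Delegation, #03 Prompt Evolution  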
 ---
@@ -3,7 +3,7 @@
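 > Version: v1.2  
 > Date: 2026-06-03  
 > Author: Pang Tong (deputy strategist)  
-> Status: Pending review (v1.2)  
+> Status: ✅ Completed (the 7 _startup_recover methods are implemented)  
 > Prerequisite: spawner-monitor-design.md §5 A0 (agent crash recovery)  
 > Changes: v1.2 makes two key improvements: (1) working→pending keeps current_agent so the same agent picks the task back up; (2) reviewing is restored precisely to its preceding state instead of being forced to done  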
@@ -1,6 +1,6 @@
 # #07 Spawner Acquire-First Design
-> Status: #07.1 implemented ✅ | #07.2 implemented ✅ | #07.3 in design
+> Status: ✅ Completed (#07.1-#07.2 implemented)
 > Author: Pang Tong
 > Date: 2026-06-01
 > Reviewer: Sima Yi
@@ -233,9 +233,9 @@ def _revive_session(agent_id: str) -> bool:
                pass
 ```
-### 4.5 O5: compact detection (§24 rotation-only v3)
+### 4.5 O5: compact detection (§15 rotation-only v3)
-§24 design document: `docs/design/24-compact-detection-fix.md`
+§15 design document: `docs/design/24-compact-detection-fix.md`
 **Detection method**: read the last 2 MB of the gateway log and filter `[compaction] rotated active transcript` events by sessionKey.
 If the most recent rotation event falls within a 120s window → treat a compact cycle as in progress (it may still be in post-compact retry).
@@ -243,7 +243,7 @@ def _revive_session(agent_id: str) -> bool:
 The old method `_check_recent_compaction_jsonl` (which scans the session jsonl for `type=compaction` events) is kept as a fallback.
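The detection method above (tail the log, filter by sessionKey, apply a 120s window) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's `_check_compact_in_progress_gateway`: it assumes every log line begins with a timezone-aware ISO-8601 timestamp followed by a space, and `read_tail` and `compact_in_progress` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

ROTATION_MARKER = "[compaction] rotated active transcript"
TAIL_BYTES = 2 * 1024 * 1024  # read only the last 2 MB of the log
WINDOW_SECONDS = 120


def read_tail(path: Path, max_bytes: int = TAIL_BYTES) -> str:
    """Return up to max_bytes from the end of the file, decoded leniently."""
    size = path.stat().st_size
    with path.open("rb") as fh:
        fh.seek(max(0, size - max_bytes))
        return fh.read().decode("utf-8", errors="replace")


def compact_in_progress(log_text: str, session_key: str, now: datetime,
                        window_seconds: int = WINDOW_SECONDS) -> bool:
    """True if the newest rotation event for session_key is in the window."""
    latest: Optional[datetime] = None
    for line in log_text.splitlines():
        if ROTATION_MARKER not in line or session_key not in line:
            continue
        # Assumed format: the line starts with an ISO-8601 timestamp.
        stamp = line.split(" ", 1)[0]
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # e.g. a partial first line after seeking into the file
        if latest is None or ts > latest:
            latest = ts
    if latest is None:
        return False
    return timedelta(0) <= now - latest <= timedelta(seconds=window_seconds)
```

Seeking to `size - max_bytes` can land in the middle of a line, which is why lines with an unparsable timestamp are skipped rather than treated as errors.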
 ```python
-# §24 v3: compact detection prefers gateway log rotation events
+# §15 v3: compact detection prefers gateway log rotation events
 if result["status"] not in ("idle", "unknown", None):
    session_key = f"agent:{agent_id}:main"
    result["recent_compact"] = AgentSpawner._check_compact_in_progress_gateway(
@@ -3,7 +3,7 @@
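 > Version: v1.1
 > Date: 2026-06-03
 > Author: Pang Tong (deputy strategist)
-> Status: Pending review (v1.1)
+> Status: ✅ Completed (rebuttal on_complete + goal gate implemented)
 > Prerequisites: #04 Blackboard Collaboration (@mention) + #08 Classify Outcome
 > Related: T4 review system improvements
 > Changes: v1.1 incorporates Sima Yi's review feedback: verdict reads the reviews table, and the rebuttal mention spawn carries an on_complete callback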
@@ -3,7 +3,7 @@
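 > Version: v1.1
 > Date: 2026-06-03
 > Author: Pang Tong (deputy strategist)
-> Status: Pending final review (v1.1)
+> Status: ✅ Completed (SSE + TaskModal auto-refresh implemented)
 > Prerequisite: #04 Blackboard Collaboration (@mention + comment)
 > Related: architecture-v3.0.md T3
 > Changes: v1.1 incorporates Sima Yi's review feedback: the file that triggers checkpoint SSE is corrected to checkpoint_routes.py, and every SSE payload now includes project_id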
@@ -1,6 +1,6 @@
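 # Three Kingdoms Team Toolchain and Development Workflow Design
-> **Status**: v3.3: #19 four-layer context rework merged + CI fixes + A13 revision
+> **Status**: ✅ Completed (E2E verification passed, all 8 steps PASS)
 > **Author**: Pang Tong (deputy strategist) 🐦
 > **Reviewer**: Sima Yi (Zhongda) 🗡️
 > **Date**: 2026-06-06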
@@ -554,6 +554,16 @@ jobs:
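 - revert may produce merge conflicts → manual intervention when deployment fails
 - Rolling back database changes requires manual intervention → schema changes must be forward compatible (add fields only; never delete or modify them); violations are blocked by a CI check (or by manual review)
 ### 7.6 Deployment success notification (draft)
 > **Status**: Draft, not implemented. See `archive-3.0/22-cd-production.md` for the detailed plan.
 The current deploy.yml lacks a Mail notification after a successful deployment (CI failure and deploy failure notifications are already implemented). Planned approach:
 - Append a notification step at the end of the deploy job
 - Query the Gitea API for the author of the associated PR
 - Send a success notification through the Mail API to the PR author + pangtong-fujunshi
 - For direct pushes, notify jiangwei-infra + pangtong-fujunshi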
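The recipient rule in the draft above fits in a few lines. The sketch below covers only recipient selection; the Gitea API lookup and the Mail API call are deliberately left out because the draft does not specify them. `deploy_success_recipients` is a hypothetical helper name, and passing `None` for a direct push is an assumption.

```python
from typing import List, Optional

ALWAYS_NOTIFY = "pangtong-fujunshi"
DIRECT_PUSH_OWNER = "jiangwei-infra"


def deploy_success_recipients(pr_author: Optional[str]) -> List[str]:
    """Pick recipients: the PR author if known, else the direct-push owner."""
    primary = pr_author if pr_author else DIRECT_PUSH_OWNER
    recipients = [primary]
    # Avoid a duplicate entry when the PR author is pangtong-fujunshi.
    if ALWAYS_NOTIFY not in recipients:
        recipients.append(ALWAYS_NOTIFY)
    return recipients
```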
 ---
 ## §8. Verification workflow integration
@@ -4,6 +4,8 @@ created: 2026-06-10
 version: v3.0
 ---
 > Status: ✅ Completed (Steps 1-5 all merged, 394 passed)
 # §1 Current state analysis (v3.0 update note: §1-§13 kept as-is, §14-§18 added, §3/§5/§7 updated)
 # §1 Current state analysis
@@ -950,7 +952,151 @@ handler.post_complete(task_id, agent_id, outcome, db_path)
 ---
-# §18 Design decision record
+## §14. Mail failure notification mechanism
 ### 20.1 Background
 Mail is point-to-point A→B communication; a failure should notify sender A rather than always @pangtong.
 Current mechanism (implemented in v1.3):
 - `_mark_task("failed")` for a _mail project: calls `mail_notify.notify_mail_failed` to notify the sender
 - `_mark_task("failed")` for a Task project: @pangtong-fujunshi (original F2 logic unchanged)
 - `_mail_auto_complete` with no_reply_found: marks the mail failed, then notifies the sender
 - Recursion guard: a failed mail with `must_haves.system_notify=true` triggers no further notification
 ### 20.2 Failure scenarios and retry mechanisms
 All possible failure paths and their retry/wait mechanisms (retry cap max_retries=3, agent_timeout=630s):
 | Failure type | Mechanism | Retries | Time per attempt | cooldown | Max total time |
 |---|---|---|---|---|---|
 | `gateway_timeout` | refill retry | 3 | 630s | none | ~31.5 min |
 | `crashed` | ticker fallback | 3 | ~2-5 min | 60s + 30s ticker | ~15 min |
 | `api_error` (rate_limit) | push to pending (**to be changed to refill retry**) | 3 | ~2.5 min | 120s | ~8 min |
 | `compact_interrupted` | refill retry | 3 | 630s | 60s | ~34 min |
 | `gateway_unreachable` | refill retry | 3 | 630s | 60s | ~34 min |
 | `lock_conflict` | refill retry | 3 | 630s | 60s | ~34 min |
 | `fallback_timeout` | refill retry (A3b) | 3 | 630s | 60s | ~34 min |
 | `compact_wait` | monitor wait | 3 | 630s | none | ~31.5 min |
 | `compact_hanging` | monitor → release | 3 | 630s | 300s | ~31.5 min + ticker |
 | `max_monitor_timeouts` | monitor cap | 3 | 630s | none | ~31.5 min |
 | `session_stuck` | revive once | 1 | ~30s | none | ~30 s |
 | `compact_failed` | no retry | 0 | n/a | 300s | failed immediately |
 | `auth_failed` | no retry | 0 | n/a | n/a | failed immediately |
 | `agent_error` | no retry | 0 | n/a | 300s | failed immediately |
 | `no_reply_found` | no retry | 0 | n/a | n/a | failed immediately |
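The totals for the 630s rows can be checked with simple arithmetic. The table does not state its formula, so this sketch shows two plausible readings for the rows with a 60s cooldown; both round to the ~34 minutes listed. `worst_case_seconds` and the `cooldown_after_last` flag are hypothetical.

```python
def worst_case_seconds(attempts: int, per_attempt: int, cooldown: int = 0,
                       cooldown_after_last: bool = True) -> int:
    """Total time if every attempt runs to its full timeout."""
    pauses = attempts if cooldown_after_last else max(attempts - 1, 0)
    return attempts * per_attempt + pauses * cooldown


def minutes(seconds: int) -> float:
    return seconds / 60


# No cooldown (e.g. gateway_timeout): 3 x 630s = 1890s = 31.5 min
no_cooldown = minutes(worst_case_seconds(3, 630))
# 60s cooldown after every attempt: 3 x (630 + 60) = 2070s = 34.5 min
with_cooldown = minutes(worst_case_seconds(3, 630, 60))
# 60s cooldown only between attempts: 3 x 630 + 2 x 60 = 2010s = 33.5 min
between_only = minutes(worst_case_seconds(3, 630, 60, cooldown_after_last=False))
```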
 ### 20.3 Trigger points
 | Trigger point | File | Description |
 |---|---|---|
 | `_mark_task(failed)` | spawner.py | _mail project → notify_mail_failed; Task project → @pangtong |
 | `_mail_auto_complete` no_reply_found | dispatcher.py | Agent exited normally without replying to the request → mark failed → notify the sender |
 ### 20.4 Implementation locations
 - `src/daemon/mail_notify.py`: `notify_mail_failed` + `_is_mail_project` + notification template
 - `src/daemon/spawner.py`: _mail/Task branching inside `_mark_task`
 - `src/daemon/dispatcher.py`: calls notify after no_reply_found in `_mail_auto_complete`
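 ### 20.5 Notification design (v2.0, AI Native)
 The notification supplies ample factual information and no hard-coded handling advice. The receiving AI decides the next step itself.
 **Notification structure**:
 ```
 Mail delivery failure notification
 📧 Original mail: 「{title}」
 👤 Recipient: {to_agent}
 ❌ Failure reason: {reason_human_readable} ({reason_raw})
 📊 Retry status: {attempt_info}
 📋 Context:
 {detail_formatted}
 Common failure reasons for reference:
 • no_reply_found: the recipient did not reply (the agent failed to recognize or process this mail)
 • crashed / max_crash_count: the recipient's process crashed while processing (automatically retried 3 times)
 • max_retries: refill retries exhausted (automatically retried 3 times, about 34 minutes in total)
 • max_api_retry_count: consecutive API failures reached the cap (rate_limit/500/503)
 • max_monitor_timeouts: processing timeouts reached the cap (about 31.5 minutes in total)
 • gateway_timeout: agent execution timed out (refill retry already attempted)
 • session_stuck: agent session hung (lock PID dead, revive failed)
 • revive_failed: recovery failed after the session hung
 • auth_failed: agent authentication failed (configuration problem)
 • fallback_exhausted: both the primary and fallback models failed
 • agent_failed: the recipient explicitly marked the task failed
 • compact_failed: context compaction failed
 • compact_hanging: context compaction did not finish for a long time (waited more than 31.5 minutes)
 • compact_interrupted: context compaction was interrupted (automatically retried 3 times)
 • gateway_unreachable: Gateway unreachable (automatically retried 3 times)
 • lock_conflict: session lock conflict (automatically retried 3 times)
 • Other: check the system logs
 -- Automated system notification
 ```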
 **Human-readable reason mapping**:
 | reason_raw | reason_human_readable | detail extraction |
 |---|---|---|
 | `no_reply_found` | Recipient did not reply | no extra information |
 | `crashed` | Process crashed while processing | first 200 characters of stderr_preview |
 | `max_crash_count` | Consecutive crashes reached the cap | count + stderr_preview |
 | `max_retries` | Refill retries exhausted | count + retry_field |
 | `max_api_retry_count` | Consecutive API failures reached the cap | count |
 | `max_monitor_timeouts` | Processing timeouts reached the cap | count + elapsed_seconds |
 | `gateway_timeout` | Agent execution timed out | retry_count |
 | `session_stuck` | Session hung | stuck_count |
 | `revive_failed` | Recovery failed after the hang | stuck_count |
 | `auth_failed` | Authentication failed | stderr_preview |
 | `fallback_exhausted` | All models failed | fallback_count + fallback_reason |
 | `agent_failed` | Recipient explicitly marked failed | none |
 | `compact_failed` | Context compaction failed | stderr_preview |
 | `compact_hanging` | Compaction did not finish for a long time | compact_wait_count |
 | `compact_interrupted` | Compaction was interrupted | none |
 | `gateway_unreachable` | Gateway unreachable | stderr_preview |
 | `lock_conflict` | Session lock conflict | none |
 | default | Unknown reason | reason + stderr_preview (if any) |
 **Retry status format**:
 - With retries: `"Automatically retried {count} times, about {total_time} in total"`
 - Without retries: `"Cannot retry ({reason_human_readable})"`
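 ### 20.6 Recursion guard
 A system notification mail (from=system) can itself fail:
 - Check `must_haves.system_notify=true` → skip the recursive notification
 - system is not a valid agent → route the notification to pangtong-fujunshi to handle on its behalf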
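The two guard rules above reduce to one small decision function. This is a sketch of the rule only, not the body of `notify_mail_failed`; `resolve_notify_target` is a hypothetical name, and the set of valid agents is passed in rather than read from `AGENT_IDS`.

```python
from typing import Any, Dict, Optional, Set

FALLBACK_AGENT = "pangtong-fujunshi"


def resolve_notify_target(must_haves: Dict[str, Any], from_agent: str,
                          valid_agents: Set[str]) -> Optional[str]:
    """Return who to notify about a failed mail, or None to skip."""
    # Rule 1: a failed system notification never triggers another one.
    if must_haves.get("system_notify"):
        return None
    # Rule 2: senders that are not real agents (e.g. "system") fall back.
    if from_agent not in valid_agents:
        return FALLBACK_AGENT
    return from_agent
```

Checking the flag first matters: a notification mail sent from `system` would otherwise fall through to the fallback agent and could loop.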
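 ### 20.7 Pending changes
 #### P1: make api_error rate_limit a recoverable retry
 **Current**: in `_classify_outcome`, rate_limit/500/503 → `api_error` with `should_retry=False`, taking the push-to-pending path.
 **Change to**: `should_retry=True`, taking the refill retry path. cooldown 60s. The cap stays at 3.
 **Files changed**: the `api_error` branch of `_classify_outcome` in `src/daemon/spawner.py`.
 **Impact**: the `api_retry_count` mechanism can be retired (unified under `retry_count`), but it is kept for now for backward compatibility.
 #### P2: notification template update (v2.0)
 **Current**: `_NOTIFY_TEMPLATE` in `mail_notify.py` is a static template that is not passed detail.
 **Change to**: a dynamic template that, based on reason, selects the human-readable text, extracts detail information, and formats the retry status.
 **Files changed**: `src/daemon/mail_notify.py`.
 **Added**: the `_REASON_MAP` dictionary (reason → human-readable text + detail extraction function).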
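 ### 20.8 Unchanged
 | Item | Reason |
 |---|---|
 | F2 @pangtong logic for Tasks | Task failed still goes to @pangtong; only Mail differs |
 | Detection logic for no_reply_found | Notification is added only after detection; the detection itself is unchanged |
 | Completion logic for inform mails | inform goes straight to done, so no_reply_found cannot occur |
 | from validation on the external API | system does not go through HTTP, so it cannot be forged externally |
 ---
 # §21 Design decision record
 This section records key discussions and decisions made during design, for future reference.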
@@ -1009,3 +1155,38 @@ handler.post_complete(task_id, agent_id, outcome, db_path)
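 **Conclusion**: the L2 RoleSkillSection now injects an index plus a guiding sentence (~100 tokens) that directs the agent to use `read` to load the full Skill text (the L3 layer). This follows Hermes's progressive Skill loading pattern.
 ---
 ## §22. Review and verification history
 ### Step 2-5 back-to-back review (2026-06-10/11)
 Before Step 2-5 (the five-layer Task architecture refactor) was merged, Pang Tong and Sima Yi each independently completed a back-to-back review of v3.0 → HEAD.
 **Review scope**: v3.0 tag → HEAD (6 commits, +1584/-134 lines, 9 files)
 **Key findings and fixes**:
 | # | Issue | Severity | Status |
 |---|------|--------|------|
 | A1 | dispatcher review verdict handling lost | Critical | ✅ Fixed (PR #24) |
 | A2 | Handler registration initialization missing | Critical | ✅ Fixed |
 | D1 | pre_spawn return value not checked | Major | ✅ Fixed (H1, 3 retries) |
 | D2 | PromptContext missing from_agent/mail_type | Major | ✅ Fixed |
 | D5 | _check_reply semantic difference | Major | ✅ Fixed (tasks table query restored) |
 | D3 | inform outcome whitelist | Minor | ⚪ Kept (already covered by CRASH_OUTCOMES) |
 | D4 | retry prompt hard-coded | Minor | ⚪ Kept (old method deprecated) |
 | D6 | retry when marking done | Minor | ✅ Fixed (unified in _mark_task_status) |
 **Handler defect fixes (before Step 5)**:
 | # | Fix | Status |
 |---|------|------|
 | H1 | _mark_task_status retries 3 times | ✅ |
 | H2 | review @mention comment_type | ✅ |
 | H3 | non-approved review stays in review | ✅ |
 **Back-to-back design-vs-code consistency check**: 13 topics (01-13); 4 major deviations fixed, 6 minor ones kept.
 **Detailed review records**: see the `archive-3.0/` directory.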
 ---
@@ -1,6 +1,6 @@
-# §24: Compact detection fix
+# §15: Compact detection fix
-> Status: **v5 implemented** (gateway log + jsonl pairing)
+> Status: ✅ Completed (gateway log + jsonl pairing)
 > Author: Pang Tong
 > Date: 2026-06-11 (v4), 2026-06-13 (v5)
 > Framework: based on §07 Spawner Acquire-First
@@ -342,6 +342,8 @@ async def _handle_pull_request(payload: Dict[str, Any]) -> None:
        await _handle_pr_opened(payload)
    elif action == "closed":
        await _handle_pr_closed(payload)
    elif action == "synchronize":
        await _handle_pr_synchronize(payload)
 async def _handle_pr_opened(payload: Dict[str, Any]) -> None:
@@ -1,4 +1,4 @@
-"""Mail failure notification: notify the sender as system"""
+"""Mail failure notification v2.0: notify the sender as system (AI Native)"""
 from __future__ import annotations
@@ -6,7 +6,7 @@ import json
 import logging
 from datetime import datetime
 from pathlib import Path
-from typing import Optional
+from typing import Dict, Optional
 from src.blackboard.models import Task
 from src.blackboard.operations import Blackboard
@@ -15,21 +15,121 @@ from src.config.agents import AGENT_IDS
 logger = logging.getLogger(__name__)
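-# Mail notification body template (one template covering every possible failure reason and suggestion)
-_NOTIFY_TEMPLATE = """Your mail could not be delivered.
-📧 Original mail: 「{title}」
-👤 Recipient: {to_agent}
-❌ Failure reason: {reason}
-Common failure reasons and suggested handling:
-• no_reply_found: the recipient did not reply. Resend the mail, or make contact through a blackboard task
-• auth_failed: the recipient failed authentication. Check the agent configuration and ask Jiang Wei (jiangwei-infra) to investigate
-• crash_limit: the recipient crashed repeatedly while processing. System fault; retry later
-• task_timeout: processing timed out. Resend, or make contact another way
-• Other reasons: ask the deputy strategist (pangtong-fujunshi) to investigate
--- Automated system notification"""
+# ── Human-readable reason + detail extraction ──────────────────
+def _extract_stderr(detail: dict, max_len: int = 200) -> str:
+    """Extract stderr_preview from detail"""
+    preview = (detail or {}).get("stderr_preview", "")
+    if preview and len(preview) > max_len:
+        preview = preview[:max_len] + "..."
+    return preview
+def _fmt_retry_info(reason: str, detail: dict) -> str:
+    """Format the retry status description"""
+    _NO_RETRY_REASONS = {
+        "no_reply_found", "auth_failed", "agent_error",
+        "agent_failed", "compact_failed",
+    }
+    if reason in _NO_RETRY_REASONS:
+        reason_human = _REASON_MAP.get(reason, _REASON_MAP.get("_default", ("unknown reason", lambda d: "")))[0]
+        return f"Cannot retry ({reason_human})"
+    count = (detail or {}).get("count", 0)
+    fallback_count = (detail or {}).get("fallback_count", 0)
+    if count > 0:
+        return f"Automatically retried {count} times"
+    if fallback_count > 0:
+        return f"Automatically retried {fallback_count} times (fallback)"
+    return "The system attempted recovery, but it still failed"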
+# reason_raw → (reason_human_readable, detail_format_fn)
+_REASON_MAP: Dict[str, tuple] = {
+    "no_reply_found": ("recipient did not reply (the agent failed to recognize or process this mail)", lambda d: ""),
+    "crashed": ("process crashed while processing", lambda d: f"stderr: {_extract_stderr(d)}" if _extract_stderr(d) else "no stderr output"),
+    "max_crash_count": ("consecutive crashes reached the cap", lambda d: f"crashed {d.get('count', '?')} times"),
+    "max_retries": ("refill retries exhausted (automatically retried)", lambda d: f"retried {d.get('count', '?')} times"),
+    "max_api_retry_count": ("consecutive API failures reached the cap", lambda d: f"API retried {d.get('count', '?')} times"),
+    "max_monitor_timeouts": (
+        "processing timeouts reached the cap",
+        lambda d: f"timed out {d.get('count', '?')} times, "
+                  f"about {d.get('elapsed_seconds', 0) // 60} minutes in total"),
+    "gateway_timeout": ("agent execution timed out (refill retry already attempted)", lambda d: ""),
+    "session_stuck": ("session hung (lock PID dead)", lambda d: f"hung {d.get('stuck_count', '?')} times"),
+    "revive_failed": ("session recovery failed", lambda d: f"hung {d.get('stuck_count', '?')} times"),
+    "auth_failed": ("agent authentication failed (configuration problem)", lambda d: f"stderr: {_extract_stderr(d)}" if _extract_stderr(d) else ""),
+    "fallback_exhausted": (
+        "both the primary and fallback models failed",
+        lambda d: f"fallback {d.get('fallback_count', '?')} times, "
+                  f"reason: {d.get('fallback_reason', 'unknown')}"),
+    "agent_error": (
+        "agent internal error",
+        lambda d: f"stderr: {_extract_stderr(d)}" if _extract_stderr(d) else ""),
+    "agent_failed": ("recipient explicitly marked the task failed", lambda d: ""),
+    "compact_failed": ("context compaction failed", lambda d: f"stderr: {_extract_stderr(d)}" if _extract_stderr(d) else ""),
+    "compact_hanging": ("context compaction did not finish for a long time", lambda d: ""),
+    "compact_interrupted": ("context compaction was interrupted (automatically retried)", lambda d: ""),
+    "gateway_unreachable": (
+        "Gateway unreachable (automatically retried)",
+        lambda d: f"stderr: {_extract_stderr(d)}"
+                  if _extract_stderr(d) else ""),
+    "lock_conflict": ("session lock conflict (automatically retried)", lambda d: ""),
+    "max_retry_count": ("retries exhausted", lambda d: f"retried {d.get('count', '?')} times"),
+    "max_lock_retry_count": ("lock conflict retries exhausted", lambda d: f"retried {d.get('count', '?')} times"),
+    "max_connect_retry_count": ("connection retries exhausted", lambda d: f"retried {d.get('count', '?')} times"),
+    "_default": ("unknown reason", lambda d: f"stderr: {_extract_stderr(d)}" if _extract_stderr(d) else ""),
+}
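+# Common failure reasons for reference (AI Native: a knowledge base that lets the receiving AI decide for itself)
+_REASON_REFERENCE = """Common failure reasons for reference:
+• no_reply_found: the recipient did not reply (the agent failed to recognize or process this mail)
+• crashed / max_crash_count: the recipient's process crashed while processing (automatically retried 3 times)
+• max_retries: refill retries exhausted (automatically retried 3 times, about 34 minutes in total)
+• max_api_retry_count: consecutive API failures reached the cap (rate_limit/500/503)
+• max_monitor_timeouts: processing timeouts reached the cap (about 31.5 minutes in total)
+• gateway_timeout: agent execution timed out (refill retry already attempted)
+• session_stuck: agent session hung (lock PID dead, revive failed)
+• revive_failed: recovery failed after the session hung
+• auth_failed: agent authentication failed (configuration problem)
+• fallback_exhausted: both the primary and fallback models failed
+• agent_failed: the recipient explicitly marked the task failed
+• compact_failed: context compaction failed
+• compact_hanging: context compaction did not finish for a long time (waited more than 31.5 minutes)
+• compact_interrupted: context compaction was interrupted (automatically retried 3 times)
+• gateway_unreachable: Gateway unreachable (automatically retried 3 times)
+• lock_conflict: session lock conflict (automatically retried 3 times)
+• Other: check the system logs"""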
+def _build_notify_text(title: str, to_agent: str, reason: str,
+                       detail: Optional[dict] = None) -> str:
+    """Build the notification body (v2.0 AI Native)"""
+    reason_human, detail_fn = _REASON_MAP.get(reason, _REASON_MAP["_default"])
+    detail_info = detail_fn(detail or {})
+    retry_info = _fmt_retry_info(reason, detail or {})
+    lines = [
+        "Mail delivery failure notification",
+        "",
+        f"📧 Original mail: 「{title}」",
+        f"👤 Recipient: {to_agent}",
+        f"❌ Failure reason: {reason_human} ({reason})",
+        f"📊 Retry status: {retry_info}",
+    ]
+    if detail_info:
+        lines.append("📋 Context:")
+        lines.append(f"  {detail_info}")
+    lines.append("")
+    lines.append(_REASON_REFERENCE)
+    lines.append("")
+    lines.append("-- Automated system notification")
+    return "\n".join(lines)
 def _is_mail_project(db_path: Path) -> bool:
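@@ -43,7 +143,7 @@ def notify_mail_failed(db_path: Path, original_mail_id: str,
    """After a mail fails, send a notification mail to the sender as system
    Creates the Task directly through Blackboard, without going through the HTTP API.
-    Recursion guard: check the original mail's must_hives.system_notify; skip if it is true.
+    Recursion guard: check the original mail's must_haves.system_notify; skip if it is true.
    If the sender is not a valid agent (e.g. system) → notify Pang Tong to handle it on the sender's behalf, avoiding a broadcast storm.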
    """
    try:
@@ -83,12 +183,8 @@ def notify_mail_failed(db_path: Path, original_mail_id: str,
                           original_mail_id, from_agent)
            target_agent = "pangtong-fujunshi"
-        # Build the notification body
-        text = _NOTIFY_TEMPLATE.format(
-            title=title,
-            to_agent=to_agent,
-            reason=reason,
-        )
+        # Build the notification body (v2.0 AI Native)
+        text = _build_notify_text(title, to_agent, reason, detail)
        # Create the notification mail Task
        notify_id = f"mail-{int(datetime.now().timestamp() * 1000)}"
@@ -845,6 +845,8 @@ curl -X POST http://{api_host}:{api_port}/api/projects/{project_id}/tasks/{task_
                cls.get("retry_field", "retry_count")
            )
        elif outcome == "api_error":
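            # A9: [DEPRECATED] api_error now uses should_retry=True and takes the refill retry path.
            # In theory this branch is no longer hit; it is kept as a safety fallback.
            # A9: 429/API error → release counter (on_complete) + push back to pending + cooldown
            # Capped: once api_retry_count reaches max_retries, mark the task failed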
            await self._do_on_complete_async(on_complete, agent_id, outcome)
@@ -1842,7 +1844,8 @@ curl -X POST http://{api_host}:{api_port}/api/projects/{project_id}/tasks/{task_
                        "retry_field": "retry_count", "cooldown_seconds": 60}
            if any(kw in stderr_lower for kw in [
                   "rate_limit", "500", "503", "api error"]):
-                return {"outcome": "api_error", "should_retry": False}
+                return {"outcome": "api_error", "should_retry": True,
                        "retry_field": "retry_count", "cooldown_seconds": 60}
            if any(kw in stderr_lower for kw in [
                   "compaction-diag", "context-overflow"]):
                return {"outcome": "compact_failed", "should_retry": False}
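The keyword checks in this hunk are evaluated in order, and the first match wins. The standalone sketch below reproduces only the two branches visible here as an ordered rule table; it is not the project's `_classify_outcome`. `classify_stderr`, `_RULES`, and the `"unknown"` fallback outcome are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Ordered rules: (keywords, result). Earlier rules take precedence.
_RULES: List[Tuple[List[str], Dict[str, Any]]] = [
    (["rate_limit", "500", "503", "api error"],
     {"outcome": "api_error", "should_retry": True,
      "retry_field": "retry_count", "cooldown_seconds": 60}),
    (["compaction-diag", "context-overflow"],
     {"outcome": "compact_failed", "should_retry": False}),
]


def classify_stderr(stderr: str) -> Dict[str, Any]:
    """Return the result of the first rule whose keyword appears in stderr."""
    stderr_lower = stderr.lower()
    for keywords, result in _RULES:
        if any(kw in stderr_lower for kw in keywords):
            return dict(result)  # copy so callers cannot mutate the table
    # Hypothetical fallback; the real function has more branches.
    return {"outcome": "unknown", "should_retry": False}
```

One property of plain substring matching worth keeping in mind: `"500"` also matches inside longer numbers, so a message such as `took 15003 ms` is classified as `api_error` by this sketch.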
@@ -165,14 +165,16 @@ class TestClassifyErrorApi:
            1, {"status": "error"}, "rate_limit exceeded", None
        )
        assert result["outcome"] == "api_error"
-        assert result["should_retry"] is False
+        assert result["should_retry"] is True
        assert result["cooldown_seconds"] == 60
    def test_stderr_500(self):
        result = Spawner._classify_outcome(
            1, {"status": "error"}, "HTTP 500 Internal Server Error", None
        )
        assert result["outcome"] == "api_error"
-        assert result["should_retry"] is False
+        assert result["should_retry"] is True
        assert result["cooldown_seconds"] == 60
 class TestClassifyErrorCompact:
Author	SHA1	Message	Date
cfdaily	fe7f914681	fix: add synchronize action dispatch to _handle_pull_request (CI: lint passed, test passed). Jiang Wei's investigation found that _handle_pull_request handled only opened/closed, so a Gitea pull_request event with action=synchronize was silently dropped. _handle_pr_synchronize already existed but was never dispatched to. Fix: add an elif action == synchronize dispatch branch. The pull_request_sync registration is kept as a second safeguard.	2026-06-13 14:42:38 +08:00
cfdaily	eccb4d2723	docs: renumber design documents (20→14, 24→15) + update status labels of completed documents (CI: lint passed, test passed)	2026-06-13 10:12:39 +08:00
pangtong-fujunshi	9e2145171a	Merge PR #57	2026-06-13 01:36:24 +00:00
cfdaily	67cad2dd96	fix: add agent_error entry to _REASON_MAP (G2) (CI: lint passed, test passed). spawner can produce the agent_error reason; with no mapping it previously fell through to _default and displayed 'unknown reason'.	2026-06-13 09:35:15 +08:00
pangtong-fujunshi	79da0bd07e	Merge PR #56	2026-06-13 01:34:39 +00:00
cfdaily	a116f7e6c0	fix: comment typo must_hives → must_haves (CI: lint passed, test passed)	2026-06-13 09:33:59 +08:00
cfdaily	7fb4d988ec	fix: lint fixes + api_error test update (CI: lint passed, test passed) - mail_notify: f-string backslash fix, over-long line fix, unused import - test_classify_outcome: api_error should_retry changed to True	2026-06-13 09:29:52 +08:00
cfdaily	f4dd9ff78d	feat(daemon): Mail failure notification v2.0: api_error retry + notification enhancements (CI: lint failed, test skipped). P1: api_error rate_limit/500/503 changed to a recoverable retry (should_retry=True, 60s cooldown) P2: notification template made dynamic (human-readable reason + detail information + retry status + AI Native knowledge base) Design document: §20.7 (20-task-type-architecture.md)	2026-06-13 09:27:17 +08:00
pangtong-fujunshi	6520e78c0b	Merge PR #55	2026-06-13 01:23:33 +00:00
cfdaily	0169823b72	chore(docs): merge mail-failure-notification into §20, update the design (CI: lint passed, test passed) - mail-failure-notification.md → archive-3.0/ - §20 gains a new §20 Mail failure notification mechanism (v2.0 AI Native) - complete table of failure scenarios and retry durations - human-readable reason mapping - notification template enhancement (detail passed in + retry status) - api_error rate_limit to be changed to a recoverable retry - §18→§21, §19→§22 renumbered accordingly	2026-06-13 09:22:32 +08:00
pangtong-fujunshi	77252c39c6	Merge PR #54	2026-06-13 00:59:11 +00:00
cfdaily	5a80d6c5cd	chore(docs): renumber gateway-watchdog.md to 99 (CI: lint passed, test passed)	2026-06-13 08:58:04 +08:00
pangtong-fujunshi	322263585d	Merge PR #53	2026-06-13 00:54:39 +00:00
cfdaily	c7b4b262b1	chore(docs): archive §13-sim §18 §21 §25 to archive-3.0 (CI: lint passed, test passed) - 13-toolchain-and-dev-workflow-simulation.md → archive-3.0/ (simulation report, covered by §16) - 18-toolchain-e2e-test.md → archive-3.0/ (E2E test record, referenced by §13) - 21-e2e-verification-handler.md → archive-3.0/ (Handler verification, covered by §20 §19) - 25-gitea-mention-toolchain.md → archive-3.0/ (@mention integration, covered by §13 §16)	2026-06-13 08:53:23 +08:00
pangtong-fujunshi	e43d87f3db	Merge PR #52	2026-06-13 00:53:09 +00:00
cfdaily	b07e311921	chore(docs): archive §22 §23 to archive-3.0, append §7.6 to §13 (CI: lint passed, test passed) - 22-cd-production.md → archive-3.0/ (deployment success notification draft) - 23-toolchain-pr-lifecycle.md → archive-3.0/ (full PR lifecycle, covered by §13 §16) - §13 §7 gains a new §7.6 Deployment success notification (draft reference)	2026-06-13 08:51:35 +08:00
pangtong-fujunshi	6ca9b19876	Merge PR #51	2026-06-13 00:50:36 +00:00
cfdaily	98eb15125d	chore(docs): archive §20 review documents to archive-3.0, append review history (CI: lint passed, test passed) - review-v3-vs-head-pangtong.md → archive-3.0/ - review-v3-vs-head-simayi.md → archive-3.0/ - step5-audit-report.md → archive-3.0/ - step5-impact-analysis.md → archive-3.0/ - §20 gains a new §19 Review and verification history (summary of key findings + fix status)	2026-06-13 08:49:41 +08:00
pangtong-fujunshi	a01bedb193	Merge PR #50	2026-06-13 00:35:50 +00:00