From a3aeb0a4922c2756e2164362ab9b84cfa2e41ff0 Mon Sep 17 00:00:00 2001
From: cfdaily
Date: Tue, 2 Jun 2026 14:44:53 +0800
Subject: [PATCH] auto-sync: 2026-06-02 14:44:53

---
 .../08-classify-outcome-optimization.md       | 44 +++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/docs/design/08-classify-outcome-optimization.md b/docs/design/08-classify-outcome-optimization.md
index 2f3c6a8..738ebfb 100644
--- a/docs/design/08-classify-outcome-optimization.md
+++ b/docs/design/08-classify-outcome-optimization.md
@@ -37,6 +37,50 @@ architecture-v3.0.md §8.2 describes three-layer concurrency control; implemented in code, but has
 2. **Registry cleanup**: clean up the registry when a project is deleted, and clean up orphans during discover
 3. **Concurrency control alignment**: mark AgentProfile.max_concurrent as TODO or wire it to the counter
 
+## 2.1 Phase -1: Gateway liveness check (new)
+
+### Problem
+
+Spawner Phases 0-4 only check the agent's local file state (sessions.json, lock); they do not check whether the Gateway process is alive. If the Gateway is down or restarting, the `openclaw agent` command hangs for 10+ minutes before failing, wasting resources.
+
+### Solution
+
+Add a Gateway liveness check before Phase 0:
+
+```python
+async def _probe_gateway(self, timeout: float = 3.0) -> bool:
+    """Probe the Gateway with a TCP connect plus a WebSocket Upgrade handshake."""
+    try:
+        reader, writer = await asyncio.wait_for(
+            asyncio.open_connection('127.0.0.1', 18789), timeout=timeout
+        )
+        writer.write(
+            b'GET /ws HTTP/1.1\r\n'
+            b'Host: 127.0.0.1:18789\r\n'
+            b'Upgrade: websocket\r\n'
+            b'Connection: Upgrade\r\n'
+            b'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n'
+            b'Sec-WebSocket-Version: 13\r\n\r\n'
+        )
+        await writer.drain()
+        resp = await asyncio.wait_for(reader.read(256), timeout=timeout)
+        writer.close()
+        return b'101' in resp
+    except Exception:
+        return False
+```
+
+### Handling
+
+Gateway unreachable → `AgentBusyError(reason="gateway_down")` → task stays pending → ticker retries automatically after 30s.
+
+### Performance
+
+- Latency: ~1-10 ms (normal case)
+- No new dependencies (plain asyncio)
+- Negligible load on the Gateway (equivalent to one ordinary HTTP request)
+- Verification script: `scripts/gateway_monitor.py` (probes every 10s and logs the results)
+
 ## 3. Classify Outcome decision tree v2.0
 
 ### 3.1 Core classification principles