Fix CUDA OOM on 8 GB GPU by keeping Whisper and ChatTTS mutually exclusive.

Release GPU memory before switching between TTS and ASR, lower the TTS token limits, and set PYTORCH_CUDA_ALLOC_CONF in PM2.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 17:03:37 +08:00
parent 82f99c0b89
commit 0cce6cda7c
7 changed files with 169 additions and 40 deletions
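The commit message says GPU memory is released before switching between TTS and ASR. One way to picture that mutual unloading is a single "slot" that holds at most one model: acquiring Whisper drops ChatTTS first, and vice versa. This is a minimal sketch under assumptions; `ExclusiveGpuSlot` and `release_gpu_memory` are hypothetical names, not the project's code. `torch.cuda.empty_cache()` can only return cached blocks that no live tensor occupies, which is why the slot drops its own reference and runs `gc.collect()` before calling it.

```python
import gc
from typing import Any, Callable, Dict, Optional


def release_gpu_memory() -> None:
    """Collect unreachable objects, then return PyTorch's cached CUDA blocks."""
    gc.collect()
    try:
        import torch  # optional here, so the sketch also runs without PyTorch
    except ImportError:
        return
    if torch.cuda.is_available():
        # Frees only cached blocks that no live tensor still occupies.
        torch.cuda.empty_cache()


class ExclusiveGpuSlot:
    """Hypothetical helper: keeps at most one named model resident on the GPU."""

    def __init__(
        self,
        loaders: Dict[str, Callable[[], Any]],
        release: Callable[[], None] = release_gpu_memory,
    ) -> None:
        self._loaders = loaders        # name -> zero-argument model loader
        self._release = release        # called after a model is dropped
        self._model: Any = None
        self.active_name: Optional[str] = None

    def acquire(self, name: str) -> Any:
        """Return model `name`, unloading whichever other model is resident."""
        if name == self.active_name:
            return self._model         # already loaded: reuse, no reload
        self.unload()                  # e.g. drop ChatTTS before loading Whisper
        self._model = self._loaders[name]()
        self.active_name = name
        return self._model

    def unload(self) -> None:
        """Drop the resident model (if any) and release GPU memory."""
        if self.active_name is None:
            return
        self._model = None             # the slot must hold the only reference
        self.active_name = None
        self._release()


if __name__ == "__main__":
    # Stand-in loaders; a real app would load Whisper / ChatTTS here.
    slot = ExclusiveGpuSlot({
        "whisper": lambda: "whisper-model",
        "chattts": lambda: "chattts-model",
    })
    slot.acquire("whisper")            # transcription
    slot.acquire("chattts")            # unloads Whisper first, then loads ChatTTS
    print(slot.active_name)            # chattts
```

Routing every model load through one slot means no code path can forget to unload the other model first; the cost is a full reload whenever the app alternates between transcription and synthesis.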
@@ -733,10 +733,26 @@ nvidia-smi
 fuser -v /dev/nvidia*
 ```

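-Whisper and ChatTTS are never both resident at peak VRAM, but the peak is high when a model is first loaded. Recommendations:
+Whisper and ChatTTS **cannot both stay resident** in 8 GB of VRAM (doing so causes CUDA OOM). The application now unloads one automatically before using the other: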

- Lock the power limit at 120 W
- `max_memory_restart: "6G"` is already set in the PM2 config
+- Unload ChatTTS before transcription
+- Unload Whisper before synthesis or voice locking
+
+If OOM still occurs:
+
+```bash
+pm2 restart trading_studio
+nvidia-smi   # confirm no other process is using the GPU
+```
+
+Lower the synthesis peak in `.env`:
+
+```ini
+TTS_MAX_CHARS_PER_CHUNK=150
+TTS_MAX_NEW_TOKEN=768
+```
+
+PM2 is already configured with `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to reduce allocator fragmentation. Locking the power limit at 120 W is still recommended.

 ### 10.3 Whisper model fails to load
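The hunk above does not show how `TTS_MAX_CHARS_PER_CHUNK` is consumed. A plausible reading is that input text is split into chunks of at most that many characters before each ChatTTS call, so smaller chunks mean shorter generations and a lower VRAM peak. The sketch below assumes exactly that; `max_chars_per_chunk` and `split_for_tts` are hypothetical names, the default of 150 mirrors the value suggested for `.env`, and breaking after Chinese or ASCII sentence punctuation is an illustrative choice, not the project's confirmed behavior.

```python
import os
import re
from typing import List


def max_chars_per_chunk(default: int = 150) -> int:
    """Read TTS_MAX_CHARS_PER_CHUNK from the environment (assumed semantics)."""
    raw = os.environ.get("TTS_MAX_CHARS_PER_CHUNK", "")
    try:
        value = int(raw)
    except ValueError:
        return default                 # unset or malformed: fall back
    return value if value > 0 else default


def split_for_tts(text: str, limit: int) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers breaking after sentence punctuation; a single sentence longer
    than `limit` is hard-split at the limit.
    """
    # Zero-width split after sentence-ending punctuation keeps the punctuation.
    sentences = [s for s in re.split(r"(?<=[。！？!?.;；])", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        while len(sentence) > limit:   # oversize sentence: flush, then hard-split
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)     # current chunk is full; start a new one
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    text = "今天行情波动较大。请注意控制仓位！" * 10
    for chunk in split_for_tts(text, max_chars_per_chunk()):
        print(len(chunk), chunk)
```

Sentence-aware splitting keeps prosody natural at chunk boundaries, while the hard split guarantees the per-call length bound even for text without punctuation.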