Fix CUDA OOM by mutually unloading Whisper and ChatTTS on 8GB GPU.

Release GPU memory before TTS/ASR switches, lower TTS token limits, and set PYTORCH_CUDA_ALLOC_CONF in PM2.

Co-authored-by: Cursor <cursoragent@cursor.com>
dekun
2026-06-12 17:03:37 +08:00
parent 82f99c0b89
commit 0cce6cda7c
7 changed files with 169 additions and 40 deletions
+19 -3
@@ -733,10 +733,26 @@ nvidia-smi
fuser -v /dev/nvidia*
```
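Whisper and ChatTTS do not both hold their maximum VRAM at the same time, but peak usage is high when a model is first loaded. Recommendations:
Whisper and ChatTTS **cannot both stay resident** in 8GB of VRAM (this causes CUDA OOM). The application now unloads them automatically so that only one is loaded at a time:
- Lock the power limit at 120W
- `max_memory_restart: "6G"` is already set in the PM2 config
- ChatTTS is unloaded before recognition
- Whisper is unloaded before synthesis / voice locking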
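The mutual unloading described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's actual code: `ExclusiveGpuModels`, `register`, `acquire`, and `release` are hypothetical names, and the loaders stand in for whatever actually loads Whisper or ChatTTS. `torch.cuda.empty_cache()` is only called if PyTorch is importable and CUDA is available.

```python
import gc
from typing import Any, Callable, Dict, Optional


class ExclusiveGpuModels:
    """Hypothetical helper: keeps at most one registered model loaded at a time."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._name: Optional[str] = None
        self._model: Any = None

    @property
    def loaded(self) -> Optional[str]:
        """Name of the currently loaded model, or None."""
        return self._name

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        """Register a zero-argument loader (placeholder for the real load call)."""
        self._loaders[name] = loader

    def acquire(self, name: str) -> Any:
        """Return the requested model, unloading any other model first."""
        if name not in self._loaders:
            raise KeyError(f"no loader registered for {name!r}")
        if self._name == name:
            return self._model
        self.release()
        self._model = self._loaders[name]()
        self._name = name
        return self._model

    def release(self) -> None:
        """Drop this helper's reference and ask PyTorch to free cached blocks."""
        self._model = None
        self._name = None
        gc.collect()
        try:
            import torch  # optional: the sketch also runs without PyTorch
        except ImportError:
            return
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```

With this shape, the ASR path would call `acquire("whisper")` and the TTS path `acquire("chattts")`. Memory is only returned if no other code still holds a reference to the unloaded model.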
If OOM still occurs:
```bash
pm2 restart trading_studio
nvidia-smi # confirm no other process is using the GPU
```
Lower the synthesis peak in `.env`:
```ini
TTS_MAX_CHARS_PER_CHUNK=150
TTS_MAX_NEW_TOKEN=768
```
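As a rough illustration of how a per-chunk character limit can bound synthesis peak memory, the sketch below splits text into chunks no longer than the limit, preferring sentence boundaries. It is an assumption-laden example, not the application's implementation: `split_for_tts` and `limit_from_env` are hypothetical names, and the fallback of 150 simply mirrors the value shown above.

```python
import os
import re
from typing import List


def limit_from_env(default: int = 150) -> int:
    """Read TTS_MAX_CHARS_PER_CHUNK from the environment (fallback is an assumption)."""
    return int(os.environ.get("TTS_MAX_CHARS_PER_CHUNK", str(default)))


def split_for_tts(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split right after sentence-ending punctuation (Chinese and Western).
    sentences = [s for s in re.split(r"(?<=[。!?;!?.;])", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        pieces = [sentence[i:i + max_chars]
                  for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately, so a smaller limit means shorter individual generations at the cost of more calls.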
PM2 is already configured with `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to mitigate fragmentation. Locking the power limit at 120W is recommended.
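When the service is started outside PM2 (for example during local debugging), the same allocator setting could be applied from Python before PyTorch is imported. This is a minimal sketch; `ensure_alloc_conf` is a hypothetical helper, and it leaves any value already present in the environment untouched.

```python
import os


def ensure_alloc_conf(default: str = "expandable_segments:True") -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF if it is unset and return the effective value.

    Call this before importing torch so the setting is in place when the
    CUDA caching allocator is initialized.
    """
    return os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", default)
```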
### 10.3 Whisper model fails to load