Fix CUDA OOM on 8GB GPUs by unloading Whisper before loading ChatTTS, and vice versa.

Release GPU memory before TTS/ASR switches, lower TTS token limits, and set PYTORCH_CUDA_ALLOC_CONF in PM2.

Co-authored-by: Cursor <cursoragent@cursor.com>
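
The mutual unloading described in the message can be sketched as follows. This is a minimal illustration, not the code in this commit: `ExclusiveModelSlot`, `release_gpu_memory`, and the loader callables are hypothetical names. The only PyTorch calls used are `torch.cuda.is_available()` and `torch.cuda.empty_cache()`, and `torch` is imported lazily so the sketch also runs where it is not installed.

```python
import gc
from typing import Any, Callable, Dict, Optional


def release_gpu_memory() -> None:
    """Collect unreachable objects, then ask PyTorch to free its cached CUDA blocks."""
    gc.collect()
    try:
        import torch  # optional: skipped on machines without PyTorch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


class ExclusiveModelSlot:
    """Hypothetical helper that keeps at most one named model resident at a time."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders
        self._name: Optional[str] = None
        self._model: Any = None

    @property
    def resident(self) -> Optional[str]:
        """Name of the model currently held, or None."""
        return self._name

    def get(self, name: str) -> Any:
        """Return the requested model, evicting the other one first if needed."""
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        if self._name != name:
            # Drop our reference first so the old weights become collectable,
            # then release memory before the new model is loaded.
            self._model = None
            self._name = None
            release_gpu_memory()
            self._model = self._loaders[name]()
            self._name = name
        return self._model
```

A caller would register one loader per model, for example `ExclusiveModelSlot({"whisper": load_whisper, "chattts": load_chattts})` with assumed loader functions, and call `slot.get("chattts")` before synthesis. The eviction only frees memory if no other code still holds a reference to the previous model.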
dekun
2026-06-12 17:03:37 +08:00
parent 82f99c0b89
commit 0cce6cda7c
7 changed files with 169 additions and 40 deletions
+4
@@ -13,3 +13,7 @@ OLLAMA_PORT=11434
# WHISPER_MODEL_DIR=/opt/Trading_Studio/models/whisper
# WHISPER_MODEL_SIZE=small
# HF_ENDPOINT=https://hf-mirror.com
# Lower these if you hit OOM on 8GB of VRAM (synthesis is split into chunks)
# TTS_MAX_CHARS_PER_CHUNK=150
# TTS_MAX_NEW_TOKEN=768
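
The chunked synthesis that `TTS_MAX_CHARS_PER_CHUNK` controls could look roughly like the sketch below. It is an assumption about the approach, not the project's implementation: `split_for_tts` is a hypothetical name, the sentence splitting is deliberately naive, and the fallback of 150 simply mirrors the commented-out value above.

```python
import os
import re
from typing import List, Optional


def split_for_tts(text: str, max_chars: Optional[int] = None) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars is None:
        # Assumed fallback, taken from the example value in the env file.
        max_chars = int(os.environ.get("TTS_MAX_CHARS_PER_CHUNK", "150"))
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Naive split after ASCII or CJK sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?。!?])\s*", text)
    sentences = [p.strip() for p in parts if p.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the audio concatenated, so peak memory depends on the chunk size rather than on the full text. Joining sentences with a space suits English; for Chinese text an empty separator may be preferable.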