12 Commits

Author SHA1 Message Date
dekun 541df29722 Fix inconsistent voice across TTS segments
Use the same manual_seed for every chunk and normalize per-segment peaks before concat so long voiceovers no longer sound like different speakers between segments.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:46:25 +08:00
dekun bdc63c04df Add voice history, default preset voice, and one-click tab
Keep synthesized wav files browsable with playback and download, default to preset steady male voice, show one-click pipeline as the first tab, and reduce post-synthesis UI flicker.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:37:53 +08:00
dekun 7c50b13c57 Fix TTS synthesis UI stuck on loading state
Enable Gradio queue, immediate pending feedback, segment progress, and gr.update for Audio so long syntheses show logs and playback correctly.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:02:34 +08:00
dekun eb71e28427 Add local GPU preset voices with dropdown selection.
Generate ChatTTS sample_random_speaker presets without cloud APIs; choose clone or preset in the synthesis UI.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 17:28:17 +08:00
dekun 8be34a2fd5 Fix ChatTTS CUDA device-side assert with text sanitization and GPU recovery.
Re-enable KV cache by default, normalize digits and unsafe chars, disable per-chunk split_text, and reload ChatTTS after CUDA errors.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 17:13:57 +08:00
dekun 1779449bba Fix ChatTTS "recursion depth exceeded" error on empty generation.
Disable ensure_non_empty retries, set min_new_token, always refine text, and use per-chunk manual_seed.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 17:10:26 +08:00
dekun 0cce6cda7c Fix CUDA OOM on 8GB GPU by having Whisper and ChatTTS unload each other.
Release GPU memory before TTS/ASR switches, lower TTS token limits, and set PYTORCH_CUDA_ALLOC_CONF in PM2.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 17:03:37 +08:00
dekun 82f99c0b89 Fix ChatTTS "Corrupt input data" error by correcting speaker params.
Use spk_smp plus txt_smp for voice clone instead of mis-encoding into spk_emb; migrate legacy speaker_emb.pt and improve error hints.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 16:41:23 +08:00
dekun f36056d293 Add TTS markdown sanitization and expand deployment docs.
Strip Markdown and stage directions before ChatTTS synthesis with chunked long scripts; document model pre-download, server-update, and microphone HTTPS notes.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 16:31:06 +08:00
dekun 39e29fe6a9 Load mobile audio via ffmpeg to avoid librosa PySoundFile warnings.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 16:05:55 +08:00
dekun aacdffac77 Fix ChatTTS load: pre-download via HF mirror, avoid GitHub timeout.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 15:16:27 +08:00
dekun 5e95d3af2f Initial commit: add Trading Studio voice-over pipeline for quant trading review videos.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 13:19:44 +08:00
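
Note on commit 541df29722: the per-segment peak normalization before concatenation that it describes could look roughly like the sketch below. This is an illustration only, not the repository's code. The function names normalize_peak and concat_normalized, the target_peak value of 0.9, and the use of plain Python lists of float samples are all assumptions; the actual pipeline most likely works on NumPy or torch arrays.

```python
from typing import List, Sequence


def normalize_peak(segment: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale one segment so its largest absolute sample equals target_peak.

    Hypothetical helper: the name and the 0.9 default are assumptions.
    """
    peak = max((abs(s) for s in segment), default=0.0)
    if peak == 0.0:
        # Silent or empty segment: there is nothing to scale.
        return list(segment)
    gain = target_peak / peak
    return [s * gain for s in segment]


def concat_normalized(
    segments: Sequence[Sequence[float]], target_peak: float = 0.9
) -> List[float]:
    """Normalize every segment to the same peak, then join them in order."""
    merged: List[float] = []
    for segment in segments:
        merged.extend(normalize_peak(segment, target_peak))
    return merged


if __name__ == "__main__":
    quiet = [0.1, -0.2, 0.05]   # a segment that came out quiet
    loud = [0.5, -1.0, 0.25]    # a segment that came out loud
    merged = concat_normalized([quiet, loud])
    # Both halves now share the same peak level.
    print(max(abs(s) for s in merged[:3]), max(abs(s) for s in merged[3:]))
```

Normalizing each segment on its own, rather than the whole concatenated file, keeps a quiet segment from sounding further away than a loud one next to it. The same commit pairs this with reusing one manual_seed across chunks.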