Documentation of Issues in the AI Charlie Debate Bot Project
As of January 12, 2026, this document compiles the major issues encountered during development and troubleshooting of the AI Charlie Debate ChatBot (a Gradio-based web app for text/voice debates with an AI mimicking Charlie Kirk, using LM Studio for text generation, faster-whisper for STT, and f5-tts-mlx for TTS/voice cloning). The project was initially developed on Windows with an NVIDIA RTX PRO 6000 (Blackwell architecture, sm_120 compute capability) but shifted to a Mac M1 (Apple Silicon).
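For context, the sketch below shows how these pieces might be wired together. It is a minimal, hypothetical reconstruction, not the project's actual code: it assumes LM Studio's OpenAI-compatible local server on localhost:1234, faster-whisper's WhisperModel API, and f5-tts-mlx invoked as a CLI subprocess; file names, the system prompt, the reference transcript, and the exact f5-tts-mlx flags are placeholders/assumptions.

```python
# Minimal sketch of the debate pipeline: STT -> LM Studio -> TTS.
# Assumes LM Studio's OpenAI-compatible server on localhost:1234 and
# f5-tts-mlx run as a CLI subprocess; paths and prompts are placeholders.
import subprocess
from openai import OpenAI
from faster_whisper import WhisperModel

stt = WhisperModel("base", device="cpu", compute_type="int8")  # speech-to-text
llm = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def transcribe(wav_path: str) -> str:
    segments, _ = stt.transcribe(wav_path)
    return " ".join(seg.text for seg in segments)

def debate_reply(user_text: str) -> str:
    resp = llm.chat.completions.create(
        model="local-model",  # whichever model is loaded in LM Studio
        messages=[
            {"role": "system", "content": "Debate in the style of Charlie Kirk."},
            {"role": "user", "content": user_text},
        ],
    )
    return resp.choices[0].message.content

def speak(text: str, out_path: str = "reply.wav") -> str:
    # f5-tts-mlx CLI call; flag names may differ by version.
    subprocess.run(
        ["python", "-m", "f5_tts_mlx.generate",
         "--text", text,
         "--ref-audio", "charlie_ref.wav",
         "--ref-text", "Reference transcript here.",
         "--output", out_path,
         "--q", "8"],
        check=True,
    )
    return out_path
```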
1. Slow TTS Generation and Sound Output
- Description: TTS audio generation takes 20–60+ seconds per response on the Mac M1, making the app feel unresponsive, even though the quantized model (--q 4) output is excellent, running at 210+ TPS in real time on my RTX PRO 6000 (see below).
- Causes:
- f5-tts-mlx is optimized for Apple Silicon but still slow on M1/M2 (older chips); newer M3/M4 are faster, but the M1 struggles with model load/inference. It is just as slow on my PC.
- Quantization itself is fast; the wait mainly comes from the TTS model on the PC.
- Attempts to Solve: Tried prompts to shorten responses and debug prints to track the subprocess, but latency remains high (see the timing sketch after this issue).
- Status: Seems unsolvable on the M1 or a GPU-less PC without switching models.
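A small timing wrapper around the TTS subprocess can confirm that the wait is in TTS generation rather than in transcription or text generation. This is a sketch reusing the hypothetical f5-tts-mlx CLI call from the pipeline sketch above; file names and flags are assumptions.

```python
# Rough timing sketch to confirm the latency lives in the TTS step.
# Reuses the assumed f5-tts-mlx CLI invocation; flags may differ by version.
import subprocess
import time

def timed_tts(text: str, out_path: str = "reply.wav") -> float:
    start = time.perf_counter()
    subprocess.run(
        ["python", "-m", "f5_tts_mlx.generate",
         "--text", text,
         "--ref-audio", "charlie_ref.wav",
         "--ref-text", "Reference transcript here.",
         "--output", out_path,
         "--q", "4"],
        check=True,
    )
    elapsed = time.perf_counter() - start
    print(f"TTS took {elapsed:.1f}s for {len(text)} characters")
    return elapsed
```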
2. Voice Cloning Quality: Garbled/Broken Sound, Not Speaking the Text
- Description: Generated WAV files are sometimes uninterpretable noise/gibberish/artifacts instead of clear speech matching the text. Voice cloning fails to capture Charlie Kirk's timbre, prosody, or content accurately.
- Causes:
- Aggressive quantization (--q 4) introduces artifacts/gibberish; higher precision (--q 8) is better but slower.
- Reference clip quality: the clip is 24 kHz mono s16 and has no noise, silence, or multiple speakers.
- From browsing GitHub issues: open bugs in f5-tts-mlx report "weird artefacts, gibberish output" on Apple Silicon.
- Standalone tests succeed with short audio, but the longer text of app responses exacerbates the issues.
- Attempts to Solve: Multiple conversions (ffmpeg to 24 kHz/s16/mono; see the sketch after this issue), a shortened REF_TEXT, and higher quantization. Standalone runs generate short clips, but the quality is poor and the app output is garbled.
- Status: Appears unsolvable with the current f5-tts-mlx.
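For reference, the conversion step mentioned under "Attempts to Solve" can be scripted as below. This is a sketch with placeholder file names, assuming ffmpeg is available on PATH.

```python
# Sketch of the reference-clip conversion: re-encode to 24 kHz, mono,
# signed 16-bit PCM. File names are placeholders; requires ffmpeg on PATH.
import subprocess

def prepare_ref_clip(src: str = "charlie_raw.mp3",
                     dst: str = "charlie_ref.wav") -> str:
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-ac", "1",            # mono
         "-ar", "24000",        # 24 kHz sample rate
         "-sample_fmt", "s16",  # signed 16-bit samples
         dst],
        check=True,
    )
    return dst
```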
3. No Support for RTX PRO 6000 (Blackwell GPU, sm_120 Architecture)
- Description: The app doesn't utilize the RTX PRO 6000 GPU in the Windows/PC setup; CUDA compute capability sm_120 (Blackwell) is not supported by key dependencies, forcing a CPU fallback (slow / no acceleration).
- Causes:
- PyTorch stable versions in early 2026 (e.g., 2.5–2.8) don't fully support sm_120.
- Attempts to Solve: Tried enabling the GPU in TTS (gpu=True), but dependencies fail to compile for sm_120 (a capability-check sketch follows this issue).
- Status: Seems unsolvable in my current setup.
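A quick way to check whether an installed PyTorch build actually ships kernels for the GPU's compute capability is to compare the device capability against the compiled architecture list. This is a diagnostic sketch using standard torch.cuda APIs, not a fix.

```python
# Diagnostic: does this PyTorch build include kernels for the GPU's
# compute capability (sm_120 on Blackwell)?
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Device capability: sm_{major}{minor}")
    print("Compiled arch list:", torch.cuda.get_arch_list())
    # If sm_120 is missing from the compiled arch list, CUDA kernels are
    # unavailable for this GPU, which matches the CPU-fallback behavior above.
```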
4. Hold-to-Record Button Not Clickable/Responsive