Building a macOS-Inspired TTS UI in Rust: A Simple Yet Powerful Text-to-Speech Interface
The Project
Simple-ttsui is a Rust GUI application built with egui that provides a clean interface for text-to-speech (TTS). Instead of relying on command-line scripts for text processing and audio playback, it offers a visual way to drive TTS, with real-time feedback and configurable settings.
The application currently integrates with Kokoro, an open-weight TTS model. Its high-quality voice synthesis makes it well suited to local, offline speech generation with no cloud services required.
Core Features
Streamlined Pipeline Management
The application segments text before handing it to the TTS engine. It then streams the resulting chunks sequentially, in their original order, so the speech flows naturally. Rather than splitting on a fixed number of characters or words, it uses semantic chunking. This means it preserves sentence boundaries, handles punctuation appropriately, respects paragraph breaks, and sizes chunks for efficient processing.
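To make sentence-aware chunking concrete, here is a minimal sketch. It is not the project's actual chunker. The names `chunk_text`, `split_sentences`, `push_normalized` and the `max_chars` limit are all hypothetical. It only treats '.', '!' and '?' followed by whitespace as sentence ends, so an abbreviation like "e.g." would still cause a split.

```rust
/// Split `text` into chunks of roughly `max_chars` characters, breaking only
/// at paragraph breaks ("\n\n") and sentence ends. A single sentence longer
/// than `max_chars` is kept whole rather than cut mid-sentence.
fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    for paragraph in text.split("\n\n") {
        // Chunks never span a paragraph break.
        let mut current = String::new();
        for sentence in split_sentences(paragraph) {
            let needed = current.chars().count() + 1 + sentence.chars().count();
            // Flush the current chunk if this sentence would overflow it.
            if !current.is_empty() && needed > max_chars {
                chunks.push(std::mem::take(&mut current));
            }
            if !current.is_empty() {
                current.push(' ');
            }
            current.push_str(&sentence);
        }
        if !current.is_empty() {
            chunks.push(current);
        }
    }
    chunks
}

/// Split after '.', '!' or '?' when followed by whitespace or the end of the
/// text, so decimals like "3.14" stay intact. Whitespace is normalized.
fn split_sentences(paragraph: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = paragraph.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        let boundary = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |next| next.is_whitespace());
        if boundary {
            push_normalized(&mut sentences, &current);
            current.clear();
        }
    }
    // Keep any trailing text that lacks final punctuation.
    push_normalized(&mut sentences, &current);
    sentences
}

/// Collapse runs of whitespace (including newlines) into single spaces.
fn push_normalized(out: &mut Vec<String>, s: &str) {
    let normalized = s.split_whitespace().collect::<Vec<_>>().join(" ");
    if !normalized.is_empty() {
        out.push(normalized);
    }
}
```

For example, `chunk_text("Hello there. How are you? Fine!", 20)` yields two chunks: "Hello there." and "How are you? Fine!". The second and third sentences are packed together because they fit within the limit.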
Visual Configuration Interface
The GUI lets you switch between Kokoro voice styles, adjust the speech rate in real time, and configure the TTS engine paths visually. Voice mixing is also supported: a style string such as "af_sky.4+af_nicole.5" blends two voices in the given ratios.
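The following sketch shows one way a UI could validate a blended style string before passing it on. It is hypothetical. `parse_voice_mix` and `VoiceWeight` are invented for illustration. It also assumes that the digits after the last '.' are a decimal fraction (".4" means 0.4) and that a voice with no suffix has weight 1.0. The exact semantics Kokoros applies may differ.

```rust
/// One component of a blended voice, e.g. name "af_sky" with weight 0.4.
#[derive(Debug, PartialEq)]
struct VoiceWeight {
    name: String,
    weight: f32,
}

/// Parse a style string such as "af_sky.4+af_nicole.5" (hypothetical helper).
fn parse_voice_mix(style: &str) -> Result<Vec<VoiceWeight>, String> {
    style
        .split('+')
        .map(|part| -> Result<VoiceWeight, String> {
            let part = part.trim();
            // Split "af_sky.4" into the voice name and its weight digits.
            let (name, weight) = match part.rsplit_once('.') {
                Some((name, digits)) => {
                    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
                        return Err(format!("invalid weight in {part:?}"));
                    }
                    // ASSUMPTION: ".4" is read as the decimal fraction 0.4.
                    let w: f32 = format!("0.{digits}")
                        .parse()
                        .map_err(|_| format!("invalid weight in {part:?}"))?;
                    (name, w)
                }
                // ASSUMPTION: no explicit weight means an unblended voice.
                None => (part, 1.0),
            };
            if name.is_empty() {
                return Err(format!("missing voice name in {part:?}"));
            }
            Ok(VoiceWeight { name: name.to_string(), weight })
        })
        .collect()
}
```

Collecting into `Result<Vec<_>, _>` stops at the first malformed component. This lets the settings panel reject a typo such as "af_sky.x" before any synthesis is attempted.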
Unix Philosophy Integration
Following the Unix philosophy, the application reads input from stdin, so it composes naturally with other commands in a pipeline:
echo "Hello, world" | ./target/release/ttsui
cat speech.txt | ./target/release/ttsui
curl -s https://example.com/article.txt | ./target/release/ttsui
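Here is a minimal sketch of how a GUI program can tell piped input apart from an interactive launch. It uses `std::io::IsTerminal`, which requires Rust 1.70 or newer. The function names are hypothetical, and the behaviour of starting with an empty text box on a terminal is an assumption. The reading logic is generic over `Read` so it can be exercised without a real pipe.

```rust
use std::io::{self, IsTerminal, Read};

/// Read all text from `reader` unless it is an interactive terminal.
/// Returns None for a terminal, or for input that is only whitespace.
fn read_input<R: Read>(mut reader: R, is_terminal: bool) -> io::Result<Option<String>> {
    if is_terminal {
        return Ok(None);
    }
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(if text.trim().is_empty() { None } else { Some(text) })
}

/// Check the real stdin. Piped text (`echo ... | ttsui`) is read up front.
/// A terminal means nothing was piped in, so the GUI would start empty.
fn read_piped_stdin() -> io::Result<Option<String>> {
    let stdin = io::stdin();
    let is_terminal = stdin.is_terminal();
    read_input(stdin.lock(), is_terminal)
}
```

Reading everything before the window opens keeps the pipeline semantics simple: `cat speech.txt | ttsui` behaves like any other filter that consumes its whole input.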
The application automatically detects which audio player is installed (aplay, paplay, play, ffplay, or mpg123) and falls back to an alternative when one is missing. This keeps it compatible across different Unix configurations.
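Player detection of this kind can be sketched as a search over `PATH`. This is an illustration, not the project's code. `find_player_in` and `find_player` are hypothetical. The list order is assumed to be the preference order, and only file existence is checked, not the executable bit.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Players named in the article, tried in this (assumed) order of preference.
const PLAYERS: [&str; 5] = ["aplay", "paplay", "play", "ffplay", "mpg123"];

/// Return the first candidate that exists as a file in any directory of a
/// PATH-style list. Each candidate is checked across all directories before
/// moving on, so an earlier candidate always wins over a later one.
fn find_player_in(path_var: &OsStr, candidates: &[&str]) -> Option<PathBuf> {
    let dirs: Vec<PathBuf> = env::split_paths(path_var).collect();
    candidates
        .iter()
        .find_map(|name| dirs.iter().map(|dir| dir.join(name)).find(|p| p.is_file()))
}

/// Search the real PATH of the running process.
fn find_player() -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    find_player_in(&path, &PLAYERS)
}
```

Taking the PATH value as a parameter keeps the search logic testable against a temporary directory, independent of whatever is installed on the machine.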
Development Process
Technology Stack Selection
Rust and egui were chosen for a few reasons. egui is a simple, fast, and highly portable immediate-mode GUI library. Because the interface is rebuilt every frame from the current application state, it suits real-time TTS status updates, and it works across platforms.
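A common way to feed real-time status into an immediate-mode UI is to run the slow work on a background thread. That thread reports progress over a channel, and the UI drains it without blocking at the start of each frame. The sketch below shows this pattern with only the standard library. `Status`, `UiState`, and `spawn_worker` are hypothetical names, not taken from the project. In an eframe app, `poll` would typically be called inside `App::update`, followed by `ctx.request_repaint()` so egui keeps redrawing while speech is in progress.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Status messages a background TTS worker might report (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Speaking { chunk: usize, total: usize },
    Finished,
}

/// UI-side state that the GUI app struct would own.
struct UiState {
    rx: Receiver<Status>,
    latest: Option<Status>,
}

impl UiState {
    /// Call once at the start of each frame. `try_recv` never blocks, so the
    /// frame is drawn immediately even while the worker is busy.
    fn poll(&mut self) {
        while let Ok(status) = self.rx.try_recv() {
            self.latest = Some(status);
        }
    }
}

/// Start a worker thread that reports progress through a channel.
fn spawn_worker(total: usize) -> UiState {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for chunk in 1..=total {
            // A real worker would synthesize and play `chunk` here.
            let _ = tx.send(Status::Speaking { chunk, total });
        }
        let _ = tx.send(Status::Finished);
    });
    UiState { rx, latest: None }
}
```

Because only the newest message is kept, the UI always shows the current state. The worker never has to wait for the interface.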
Integration Approach
While the application currently integrates with Kokoro TTS, the architecture is designed to be flexible. The program could be adapted to work with various modern AI TTS APIs. Kokoro (run here through the Kokoros Rust implementation) was chosen because, at the time of writing, it ranked as the best-performing model according to Hugging Face statistics. The configuration structure shows this modular integration:
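impl Default for KokoroConfig {
    fn default() -> Self {
        // Resolve paths from the user's home directory, falling back to the current directory.
        let home = std::env::var("HOME").unwrap_or_else(|_| ".".to_string());
        Self {
            // Kokoros CLI binary, ONNX model, and voice data, all expected under ~/kokoros
            exec_path: format!("{}/kokoros/target/release/koko", home),
            model_path: format!("{}/kokoros/checkpoints/kokoro-v1.0.onnx", home),
            voice_data: format!("{}/kokoros/voices-v1.0.bin", home),
            speed: 1.1,                          // speech rate; values above 1.0 are faster
            voice_style: "af_heart".to_string(), // default Kokoro voice
            chunking: ChunkingConfig::default(),
        }
    }
}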
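Since these defaults point at fixed locations, a UI can warn early when a file is missing. The sketch below is hypothetical. It reproduces only the three path fields and assumes they are `String`s, because the struct definition is not shown here. `missing_files` is an invented helper, not part of the project.

```rust
use std::path::Path;

/// Only the three path fields of `KokoroConfig`, typed as `String`
/// (an assumption; the real struct definition is not shown).
struct KokoroConfig {
    exec_path: String,
    model_path: String,
    voice_data: String,
}

impl KokoroConfig {
    /// Hypothetical helper: list the configured files that do not exist, so a
    /// settings panel could highlight them before any synthesis starts.
    fn missing_files(&self) -> Vec<&str> {
        [self.exec_path.as_str(), self.model_path.as_str(), self.voice_data.as_str()]
            .iter()
            .copied()
            .filter(|p| !Path::new(p).is_file())
            .collect()
    }
}
```

Checking up front turns a confusing failure inside the spawned `koko` process into a clear message naming the path to fix.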
Pipeline Architecture
The pipeline runs through five stages: input via stdin, the intelligent chunking engine, TTS processing through Kokoro, coordinated audio playback, and real-time UI feedback. The configuration system balances ease of use with flexibility through sensible defaults, awareness of the environment, and runtime adjustment.
graph TD
A[Input via stdin] --> B[Input Processing]
B --> C[Intelligent Chunking Engine]
C --> D[TTS Processing via Kokoro]
D --> E[Coordinated Audio Playback]
E --> F[Real-time UI Feedback]
G[Configuration System] --> H[Sensible Defaults]
G --> I[Environment Awareness]
G --> J[Runtime Adjustment]
H --> B
I --> B
J --> E
style A fill:#e1f5fe,stroke:#0277bd
style C fill:#f3e5f5,stroke:#7b1fa2
style D fill:#fff3e0,stroke:#ef6c00
style E fill:#e8f5e8,stroke:#2e7d32
style F fill:#fff8e1,stroke:#f57f17
style G fill:#fce4ec,stroke:#c2185b
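The hand-off between TTS processing and coordinated playback can be sketched as two threads joined by a bounded channel. This is an illustration under stated assumptions, not the project's implementation. `synthesize` is a hypothetical stand-in that returns the chunk's bytes instead of calling Kokoro, and the consumer collects buffers instead of playing them. With a capacity of 1, at most one synthesized chunk waits in the queue while the previous one plays. The FIFO channel also guarantees chunks come out in their original order.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the TTS call: returns the chunk's bytes so the
/// sketch runs without Kokoro installed.
fn synthesize(chunk: &str) -> Vec<u8> {
    chunk.as_bytes().to_vec()
}

/// Synthesize chunks on one thread while a second thread consumes them.
fn run_pipeline(chunks: Vec<String>) -> Vec<Vec<u8>> {
    // Bounded channel: the producer blocks once one finished chunk is queued.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(1);
    let producer = thread::spawn(move || {
        for chunk in &chunks {
            if tx.send(synthesize(chunk)).is_err() {
                break; // the playback side hung up
            }
        }
        // `tx` is dropped here, which ends the consumer's loop.
    });
    // Stand-in for the audio player: a real one would feed each buffer to the
    // detected player and wait for it to finish before taking the next.
    let consumer = thread::spawn(move || rx.iter().collect::<Vec<_>>());
    producer.join().unwrap();
    consumer.join().unwrap()
}
```

Overlapping synthesis of the next chunk with playback of the current one hides most of the synthesis latency. The small buffer keeps memory use flat for long inputs.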
Build and Usage
Setup
# Clone and build
git clone https://github.com/JRoshthen1/simple-ttsui
cd simple-ttsui
cargo build --release
# Install the Kokoros TTS dependency into ~/kokoros, where the default config looks for it
git clone https://github.com/lucasjinreal/Kokoros ~/kokoros
cd ~/kokoros
pip install -r scripts/requirements.txt
python scripts/fetch_voices.py
cargo build --release
Running the Application
# Basic usage (run from the simple-ttsui directory)
echo "Hello, world" | ./target/release/ttsui
# Process text files
cat my_speech.txt | ./target/release/ttsui
# Interactive mode
./target/release/ttsui
Future Improvement Ideas
Plugin System: Support for additional TTS engines
Memory Optimization: Reduce memory footprint for large text processing
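A plugin system for additional engines could start from a small trait that every backend implements. The sketch below is speculative and not part of the current code. `TtsEngine`, `SilentEngine`, and `select` are hypothetical names. The dummy backend returns one zero byte per character instead of real audio, just to show registration and dispatch.

```rust
use std::error::Error;

/// Hypothetical plugin interface: each backend turns text into audio bytes.
trait TtsEngine {
    fn name(&self) -> &str;
    fn synthesize(&self, text: &str, voice: &str, speed: f32) -> Result<Vec<u8>, Box<dyn Error>>;
}

/// A dummy backend used only to demonstrate the interface.
struct SilentEngine;

impl TtsEngine for SilentEngine {
    fn name(&self) -> &str {
        "silent"
    }
    fn synthesize(&self, text: &str, _voice: &str, _speed: f32) -> Result<Vec<u8>, Box<dyn Error>> {
        // One zero byte per character in place of real audio.
        Ok(vec![0u8; text.chars().count()])
    }
}

/// Pick a registered engine by name, e.g. from a dropdown in the GUI.
fn select<'a>(engines: &'a [Box<dyn TtsEngine>], name: &str) -> Option<&'a dyn TtsEngine> {
    engines.iter().find(|e| e.name() == name).map(|e| &**e)
}
```

A Kokoro backend would then be just another `TtsEngine`, and the chunking and playback stages would not need to know which engine produced the audio.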
Conclusion
The project serves as an excellent example of how modern Rust GUI development can create applications that are both technically sophisticated and user-friendly, bringing desktop-class interfaces to Unix command-line workflows.
For the complete source code, visit the GitHub repository at https://github.com/JRoshthen1/simple-ttsui.