Building a macOS-Inspired TTS UI in Rust: A Simple Yet Powerful Text-to-Speech Interface

The Project

Simple-ttsui is a Rust-based GUI application built with egui that provides an elegant interface for TTS operations. Rather than relying on command-line scripts for text processing and audio playback, this tool offers a visual approach to managing TTS operations with real-time feedback and configurable settings.

The application currently integrates with Kokoro, an open-weight TTS model that delivers high-quality voice synthesis and is well suited to local, offline text-to-speech generation without cloud services.

Core Features

Streamlined Pipeline Management

The application splits input text into segments sized for TTS processing and streams them sequentially, so chunks are synthesized in order and the speech flows naturally. Rather than splitting on a fixed character or word count, it uses semantic chunking: it preserves sentence boundaries, handles punctuation appropriately, and respects paragraph breaks while keeping chunk sizes efficient to process.
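
As a rough illustration of sentence-preserving chunking, here is a minimal sketch. It is not the project's actual chunker: the function name `chunk_sentences`, the `max_chars` parameter, and the boundary rules (a `.`, `!`, or `?` followed by whitespace, or a blank line) are all assumptions made for this example.

```rust
/// Split `text` into chunks of at most `max_chars` characters, breaking only
/// at sentence or paragraph boundaries. A single sentence longer than
/// `max_chars` is kept whole as its own oversized chunk.
/// (Hypothetical helper, not taken from the simple-ttsui source.)
pub fn chunk_sentences(text: &str, max_chars: usize) -> Vec<String> {
    // Pass 1: cut the text into sentences.
    let mut sentences: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        // Terminal punctuation only counts when followed by whitespace or
        // end of input, so "3.14" is not split in the middle.
        let ends_sentence = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |next| next.is_whitespace());
        // Two consecutive newlines mark a paragraph break.
        let paragraph_break = c == '\n' && chars.peek() == Some(&'\n');
        if ends_sentence || paragraph_break {
            let sentence = current.trim();
            if !sentence.is_empty() {
                sentences.push(sentence.to_string());
            }
            current.clear();
        }
    }
    let tail = current.trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }

    // Pass 2: greedily pack whole sentences into chunks.
    let mut chunks = Vec::new();
    let mut chunk = String::new();
    for sentence in sentences {
        let needed = chunk.chars().count() + 1 + sentence.chars().count();
        if !chunk.is_empty() && needed > max_chars {
            chunks.push(std::mem::take(&mut chunk));
        }
        if !chunk.is_empty() {
            chunk.push(' ');
        }
        chunk.push_str(&sentence);
    }
    if !chunk.is_empty() {
        chunks.push(chunk);
    }
    chunks
}
```

Packing whole sentences greedily keeps each chunk close to the size limit without ever cutting a sentence in half, which is the property that matters for natural-sounding output.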

Visual Configuration Interface

The GUI provides easy switching between different Kokoro voice styles, real-time adjustment of speech rate, and visual configuration of TTS engine paths. Voice mixing is already supported with ratios like "af_sky.4+af_nicole.5" for blended vocal characteristics.
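
To make the mixing syntax concrete, the sketch below parses a spec such as "af_sky.4+af_nicole.5" into name and weight pairs. It assumes that the digits after the final dot are a decimal fraction (so `.4` reads as 0.4) and that a voice without a suffix gets weight 1.0. Both rules, and the names `VoiceWeight` and `parse_voice_mix`, are assumptions for illustration, not the parser Kokoro or simple-ttsui actually uses.

```rust
/// One voice and its blend weight. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
pub struct VoiceWeight {
    pub name: String,
    pub weight: f32,
}

/// Parse a mix spec like "af_sky.4+af_nicole.5".
/// Assumption: the digits after the last '.' are read as "0.<digits>".
pub fn parse_voice_mix(spec: &str) -> Result<Vec<VoiceWeight>, String> {
    spec.split('+')
        .map(|part| {
            let part = part.trim();
            if part.is_empty() {
                return Err(format!("empty voice entry in {spec:?}"));
            }
            match part.rsplit_once('.') {
                // "name.digits": a weighted voice.
                Some((name, digits))
                    if !name.is_empty()
                        && !digits.is_empty()
                        && digits.chars().all(|c| c.is_ascii_digit()) =>
                {
                    let weight: f32 = format!("0.{digits}")
                        .parse()
                        .map_err(|e| format!("bad weight in {part:?}: {e}"))?;
                    Ok(VoiceWeight { name: name.to_string(), weight })
                }
                // A dot is present but the suffix is not numeric.
                Some(_) => Err(format!("malformed voice entry {part:?}")),
                // No dot: a single unweighted voice.
                None => Ok(VoiceWeight { name: part.to_string(), weight: 1.0 }),
            }
        })
        .collect()
}
```

Returning a `Result` lets the UI reject a malformed spec before it is ever handed to the TTS engine.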

Unix Philosophy Integration

Following Unix philosophy, the application accepts input via stdin, enabling powerful pipeline compositions:

echo "Hello, world" | ./target/release/ttsui
cat speech.txt | ./target/release/ttsui
curl -s https://example.com/article.txt | ./target/release/ttsui
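
On the Rust side, accepting piped input could look roughly like the sketch below. It uses the standard library's `IsTerminal` trait (stable since Rust 1.70) to tell a pipe from an interactive terminal. The function names `read_text` and `read_piped_stdin` are made up for this example, and the real application may handle stdin differently.

```rust
use std::io::{self, IsTerminal, Read};

/// Read everything from `reader` and return the trimmed text,
/// or `None` if the input is empty or only whitespace.
pub fn read_text<R: Read>(mut reader: R) -> io::Result<Option<String>> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    let trimmed = buf.trim();
    Ok(if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    })
}

/// Return piped text if stdin is a pipe or file, and `None` when stdin is
/// an interactive terminal (so the GUI can start with an empty editor).
pub fn read_piped_stdin() -> io::Result<Option<String>> {
    let stdin = io::stdin();
    if stdin.is_terminal() {
        return Ok(None);
    }
    read_text(stdin.lock())
}
```

Keeping `read_text` generic over `Read` means the same logic can be exercised with an in-memory byte slice instead of a real pipe.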

The system automatically detects available audio players (aplay, paplay, play, ffplay, mpg123) and falls back appropriately, ensuring compatibility across different Unix configurations.
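
One way to implement this kind of detection is to walk a preference-ordered list of player names and return the first one found in the `PATH` directories. The sketch below does that with the standard library only. The helper names are hypothetical, and it checks only that a file exists, not that it is executable, which is a simplification.

```rust
use std::env;
use std::path::PathBuf;

/// Players to try, in order of preference (the list named in the article).
pub const PLAYER_CANDIDATES: [&str; 5] = ["aplay", "paplay", "play", "ffplay", "mpg123"];

/// Look for a file called `name` in each directory, in order.
pub fn find_in_dirs(name: &str, dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter().map(|dir| dir.join(name)).find(|path| path.is_file())
}

/// Return the first candidate that exists in `dirs`, falling back to the
/// next candidate whenever one is missing.
pub fn detect_player(candidates: &[&str], dirs: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find_map(|name| find_in_dirs(name, dirs))
}

/// Directories listed in the `PATH` environment variable (empty if unset).
pub fn path_dirs() -> Vec<PathBuf> {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).collect())
        .unwrap_or_default()
}
```

A caller would combine these as `detect_player(&PLAYER_CANDIDATES, &path_dirs())` and report a clear error when the result is `None`.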

Development Process

Technology Stack Selection

Rust and egui were chosen for practical reasons: egui is a simple, fast, and highly portable immediate-mode GUI library. Because an immediate-mode UI is redrawn from the current state every frame, it suits real-time TTS status updates, and its portability supports cross-platform use.

Integration Approach

While the application currently integrates with Kokoro TTS, the architecture is designed to be flexible. The program could be adapted to work with other modern AI TTS APIs, but Kokoro was chosen because, at the time of writing, it ranked as the best-performing model according to Hugging Face statistics. The default configuration shows how the engine-specific settings are kept in one place:

impl Default for KokoroConfig {
    fn default() -> Self {
        let home = std::env::var("HOME").unwrap_or_else(|_| ".".to_string());
        Self {
            exec_path: format!("{}/kokoros/target/release/koko", home),
            model_path: format!("{}/kokoros/checkpoints/kokoro-v1.0.onnx", home),
            voice_data: format!("{}/kokoros/voices-v1.0.bin", home),
            speed: 1.1,
            voice_style: "af_heart".to_string(),
            chunking: ChunkingConfig::default(),
        }
    }
}
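
The `Default` implementation above implies a struct shape along the following lines. This is a sketch, not the project's definitions: the field types are inferred from the values assigned above (with `speed` assumed to be `f32`), the `ChunkingConfig` fields and their values are purely hypothetical, and `with_home` is an illustrative constructor added so the defaults can be built without reading the `HOME` environment variable.

```rust
/// Hypothetical chunking settings; field names and values are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkingConfig {
    pub max_chars: usize,
    pub respect_paragraphs: bool,
}

impl Default for ChunkingConfig {
    fn default() -> Self {
        Self { max_chars: 400, respect_paragraphs: true }
    }
}

/// Field names match the article's `Default` impl; the types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct KokoroConfig {
    pub exec_path: String,
    pub model_path: String,
    pub voice_data: String,
    pub speed: f32,
    pub voice_style: String,
    pub chunking: ChunkingConfig,
}

impl KokoroConfig {
    /// Same defaults as the article's `Default` impl, but with the home
    /// directory passed in explicitly. (Illustrative helper.)
    pub fn with_home(home: &str) -> Self {
        Self {
            exec_path: format!("{home}/kokoros/target/release/koko"),
            model_path: format!("{home}/kokoros/checkpoints/kokoro-v1.0.onnx"),
            voice_data: format!("{home}/kokoros/voices-v1.0.bin"),
            speed: 1.1,
            voice_style: "af_heart".to_string(),
            chunking: ChunkingConfig::default(),
        }
    }
}
```

With a shape like this, runtime adjustment from the UI can use struct update syntax, for example `KokoroConfig { speed: 0.9, ..config.clone() }`, leaving every other setting untouched.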

Pipeline Architecture

The pipeline has five stages: input processing via stdin, an intelligent chunking engine, TTS processing through Kokoro, coordinated audio playback, and real-time UI feedback. The configuration system balances ease of use with flexibility through sensible defaults, environment awareness, and runtime adjustment.

  graph TD
    A[Input via stdin] --> B[Input Processing]
    B --> C[Intelligent Chunking Engine]
    C --> D[TTS Processing via Kokoro]
    D --> E[Coordinated Audio Playback]
    E --> F[Real-time UI Feedback]
    
    G[Configuration System] --> H[Sensible Defaults]
    G --> I[Environment Awareness]
    G --> J[Runtime Adjustment]
    
    H --> B
    I --> B
    J --> E
    
    style A fill:#e1f5fe,stroke:#0277bd
    style C fill:#f3e5f5,stroke:#7b1fa2
    style D fill:#fff3e0,stroke:#ef6c00
    style E fill:#e8f5e8,stroke:#2e7d32
    style F fill:#fff8e1,stroke:#f57f17
    style G fill:#fce4ec,stroke:#c2185b
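
The hand-off between the TTS stage and the playback stage can be sketched with a bounded channel: one thread synthesizes chunks in order while the caller plays them as they arrive. This is only an illustration of sequential streaming under assumed names. `run_pipeline`, the closure parameters, and the buffer size of 2 are not taken from the project, and the closures stand in for the real Kokoro call and audio player.

```rust
use std::sync::mpsc;
use std::thread;

/// Synthesize `chunks` on a worker thread and play the results in order
/// on the calling thread. (Hypothetical sketch of sequential streaming.)
pub fn run_pipeline<S, P>(chunks: Vec<String>, synthesize: S, mut play: P)
where
    S: Fn(&str) -> Vec<u8> + Send + 'static,
    P: FnMut(usize, Vec<u8>),
{
    // Bounded channel: synthesis may run at most two chunks ahead of
    // playback, which caps how much audio is held in memory.
    let (tx, rx) = mpsc::sync_channel::<(usize, Vec<u8>)>(2);

    let producer = thread::spawn(move || {
        for (index, chunk) in chunks.iter().enumerate() {
            let audio = synthesize(chunk);
            // Stop early if the playback side has gone away.
            if tx.send((index, audio)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the receiver loop below.
    });

    // A single producer sends in order, so chunks arrive in order.
    for (index, audio) in rx {
        play(index, audio);
    }
    producer.join().expect("synthesis thread panicked");
}
```

Because playback of one chunk overlaps with synthesis of the next, the listener hears the first sentence without waiting for the whole text to be processed.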

Build and Usage

Setup

# Clone and build
git clone https://github.com/JRoshthen1/simple-ttsui
cd simple-ttsui
cargo build --release

# Install the Kokoro TTS dependency (the default paths above expect it in ~/kokoros)
git clone https://github.com/lucasjinreal/Kokoros ~/kokoros
cd ~/kokoros
pip install -r scripts/requirements.txt
python scripts/fetch_voices.py
cargo build --release

Running the Application

# Basic usage (run from the simple-ttsui directory)
echo "Hello, world" | ./target/release/ttsui

# Process text files
cat my_speech.txt | ./target/release/ttsui

# Interactive mode
./target/release/ttsui

Future Improvement Ideas

Plugin System: Support for additional TTS engines

Memory Optimization: Reduce memory footprint for large text processing
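
The plugin idea could take the form of a trait that each TTS backend implements, with a small registry for looking engines up by name. The sketch below is speculative: `TtsEngine`, `EngineRegistry`, and the placeholder `SilenceEngine` are invented for illustration and do not exist in the current code base.

```rust
/// What a pluggable TTS backend might expose. (Hypothetical interface.)
pub trait TtsEngine {
    /// Short identifier used to select the engine in the UI.
    fn name(&self) -> &str;
    /// Turn text into raw audio bytes, or return an error message.
    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String>;
}

/// Holds the available engines and finds them by name.
pub struct EngineRegistry {
    engines: Vec<Box<dyn TtsEngine>>,
}

impl EngineRegistry {
    pub fn new() -> Self {
        Self { engines: Vec::new() }
    }

    pub fn register(&mut self, engine: Box<dyn TtsEngine>) {
        self.engines.push(engine);
    }

    pub fn get(&self, name: &str) -> Option<&dyn TtsEngine> {
        self.engines
            .iter()
            .map(|engine| engine.as_ref())
            .find(|engine| engine.name() == name)
    }
}

/// Placeholder engine that emits one zero byte per input byte.
/// It exists only to show the trait in use.
pub struct SilenceEngine;

impl TtsEngine for SilenceEngine {
    fn name(&self) -> &str {
        "silence"
    }

    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String> {
        Ok(vec![0u8; text.len()])
    }
}
```

A Kokoro-backed engine would implement the same trait by invoking the `koko` executable, so the rest of the pipeline would not need to know which backend is active.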

Conclusion

The project shows how modern Rust GUI development can produce applications that are both technically sophisticated and user-friendly, bringing desktop-class interfaces to Unix command-line workflows.

For the complete source code, visit the GitHub repository: https://github.com/JRoshthen1/simple-ttsui
