Speech Audio
The Speech Audio system in Story creates immersive, cinematic experiences by allowing characters to emit voice lines and AI-generated speech. The system integrates with Minecraft's sound engine and supports both pre-recorded audio files and real-time AI voice generation through ElevenLabs.
Audio Features
Voice Playback
Randomized Voice Selection: Voices are selected from preconfigured sound files per gender (e.g., feminine_01, masculine_05)
Anti-Repetition Logic: Prevents the same voice line from playing twice in a row
Proximity Sound Emission: Characters emit sounds at their location, audible to nearby players
Custom Voice Support: Individual NPCs can have unique voices through ElevenLabs integration
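The randomized selection and anti-repetition behavior described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the VoicePool class and its method names are hypothetical, and the sound keys simply follow the feminine_01 naming pattern shown above.

```python
import random


class VoicePool:
    """Picks a random voice line, never the same one twice in a row."""

    def __init__(self, sounds, rng=None):
        if not sounds:
            raise ValueError("a voice pool needs at least one sound")
        self.sounds = list(sounds)
        self.rng = rng or random.Random()
        self.last = None  # key of the most recently played sound

    def next_sound(self):
        # Exclude the previous pick unless it is the only option available.
        candidates = [s for s in self.sounds if s != self.last] or self.sounds
        self.last = self.rng.choice(candidates)
        return self.last


# Hypothetical pool of five feminine voice lines: feminine_01 .. feminine_05
feminine = VoicePool([f"feminine_{i:02d}" for i in range(1, 6)])
picks = [feminine.next_sound() for _ in range(20)]
```

Remembering only the last pick is enough to rule out immediate repeats; a pool with a single sound still works, since the filter falls back to the full list.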
AI Voice Generation (v0.2.0+)
ElevenLabs Integration: Real-time voice generation for dynamic dialogue
Character-Specific Voices: Each NPC can have a unique voice profile
Contextual Speech: AI generates appropriate tone and emotion based on conversation context
Client-Side Mod Required: Real-time AI voice playback requires the Story-Client Fabric mod to be installed on each player's client
Voice Types
Pre-Recorded Audio
Simlish-Style: Non-language vocalizations similar to The Sims games
Gender-Based Pools: Separate voice collections for masculine and feminine characters
Emotional Variants: Different tones for various emotional states
Ambient Sounds: Background vocalizations for atmosphere
AI-Generated Speech
Natural Language: Full text-to-speech with proper pronunciation
Character Consistency: Each NPC maintains their unique voice across conversations
Emotional Range: AI adjusts tone based on context and character personality
Dynamic Content: Can speak any text, including player names and dynamic content
Client Mod Dependency: Requires Story-Client Fabric mod for real-time playback
Configuration
Basic Audio Settings
misc:
  voiceGenerationEnabled: true
  scheduleVoiceGenerationEnabled: true
  playerVoiceGenerationEnabled: true
  elevenLabsApiKey: "your_api_key_here"
NPC Voice Settings
name: Elder Huran
role: Default role
location: Lysathara
context: "Elder Huran is ambitious, has the quirk of no quirks, is motivated by protecting their home, and their flaw is obsessive. They speak in a passionate tone."
appearance: ""
customVoice: DQCYGgKbvha45IXs96FO # define this if you want an ElevenLabs voice
Voice Assignment
Default Voices
Characters use gender-based voice pools by default
Voice selection is randomized from available options
Anti-repetition ensures variety in voice playback
Custom Voices
Set the customVoice field in NPC data for unique voices
Use ElevenLabs voice IDs for specific character voices
Custom voices override default gender-based selection
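The override rule can be pictured with a small sketch. It assumes NPC data is available as a dictionary; the customVoice key comes from the NPC settings above, while the gender key, the pool contents, and the resolve_voice function are hypothetical names used only for illustration.

```python
import random

# Hypothetical pool contents; real pools are defined by the plugin's sound files.
GENDER_POOLS = {
    "feminine": ["feminine_01", "feminine_02"],
    "masculine": ["masculine_01", "masculine_05"],
}


def resolve_voice(npc, rng=random):
    """Return ("elevenlabs", voice_id) or ("prerecorded", sound_key)."""
    custom = (npc.get("customVoice") or "").strip()
    if custom:
        # A custom ElevenLabs voice ID always wins over the gender pool.
        return ("elevenlabs", custom)
    # "gender" is an assumed field name, defaulting to the feminine pool.
    pool = GENDER_POOLS[npc.get("gender", "feminine")]
    return ("prerecorded", rng.choice(pool))
```

For example, an NPC with customVoice set to DQCYGgKbvha45IXs96FO resolves to that ElevenLabs voice, while an NPC with an empty or missing customVoice falls through to a random pre-recorded sound.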
Voice Generation
AI-generated voices are created on-demand during conversations
Voices are cached for performance and consistency
Fallback to pre-recorded audio if AI generation fails
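A rough sketch of how on-demand generation, caching, and fallback could fit together is shown below. The VoiceService class is hypothetical, and the generate callable stands in for whatever actually calls ElevenLabs; nothing here reflects the plugin's real internals.

```python
class VoiceService:
    """Generate speech on demand, cache results, fall back on failure."""

    def __init__(self, generate, fallback_sound):
        self.generate = generate              # callable(voice_id, text) -> bytes
        self.fallback_sound = fallback_sound  # pre-recorded sound key
        self.cache = {}                       # (voice_id, text) -> audio bytes

    def speak(self, voice_id, text):
        key = (voice_id, text)
        if key in self.cache:
            # Reuse audio generated earlier for the same voice and text.
            return ("ai", self.cache[key])
        try:
            audio = self.generate(voice_id, text)
        except Exception:
            # Any generation error falls back to the pre-recorded sound.
            return ("prerecorded", self.fallback_sound)
        self.cache[key] = audio
        return ("ai", audio)
```

Keying the cache on both voice ID and text means a repeated line is generated once and then replayed, while failed generations are not cached and will be retried the next time the line is requested.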
Usage Examples
Basic Voice Playback
Player: "Hello there!"
NPC: [Plays voice line] "Greetings, traveler. How may I assist you?"
AI-Generated Speech
Player: "Tell me about the Brotherhood"
NPC: [AI generates voice] "The Brotherhood of the Unseen Path is an ancient organization..."
Note: AI-generated speech requires the Story-Client Fabric mod for real-time playback
Custom Voice Character
name: "Maester Valen"
customVoice: "eleven_labs_voice_id_here"
context: "A wise teacher with a distinctive, scholarly tone"
Audio Quality and Performance
Optimization Features
Voice Caching: Generated voices are stored for reuse (for example, in schedule barks, or when several players are in the same spot during generation)
Compression: Audio files are optimized for Minecraft's sound system
Fallback System: Pre-recorded audio plays if AI generation fails
Performance Monitoring: System tracks generation times and success rates
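Tracking generation times and success rates needs very little state. The sketch below is one possible shape for such a tracker; the GenerationMetrics class and its fields are hypothetical and are not taken from the plugin.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationMetrics:
    """Collects timings of successful generations and counts failures."""

    durations_ms: list = field(default_factory=list)
    failures: int = 0

    def record(self, duration_ms, ok):
        if ok:
            self.durations_ms.append(duration_ms)
        else:
            self.failures += 1

    @property
    def success_rate(self):
        total = len(self.durations_ms) + self.failures
        # With no requests recorded yet, report a perfect rate.
        return len(self.durations_ms) / total if total else 1.0

    @property
    def average_ms(self):
        if not self.durations_ms:
            return 0.0
        return sum(self.durations_ms) / len(self.durations_ms)
```

Only successful requests contribute to the average, so a burst of fast failures does not make generation look quicker than it is.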
Quality Settings
Sample Rate: Optimized for Minecraft's audio engine
Bit Depth: Balanced quality and file size
Format: Compatible with Minecraft's sound system
Length Limits: Prevents excessively long audio generation
Integration with Game Systems
Citizens Plugin
NPCs emit pre-recorded audio at their physical location (this does not apply to real-time ElevenLabs voices)
Sound travels realistically through the world
Players hear audio based on game audio settings
Pre-recorded audio: Works with standard Minecraft audio
AI-generated audio: Requires Story-Client mod for real-time playback
Conversation System
Audio plays automatically during AI conversations
Voice generation happens in real-time
Text and audio are synchronized for natural flow
Quest System
Quest-related dialogue includes voice generation
Important story moments use enhanced audio
Character voices maintain consistency across quest interactions
Best Practices
Voice Design
Character Consistency: Use the same voice for each NPC across all interactions
Appropriate Tone: Match voice characteristics to character personality
Performance Balance: Balance audio quality with generation speed
Fallback Planning: Always have pre-recorded alternatives
Performance Optimization
Voice Caching: Enable caching for frequently used voices
Batch Generation: Generate multiple voices during off-peak times
Quality Settings: Adjust quality based on server performance
Monitoring: Track generation success rates and adjust accordingly
User Experience
Volume Control: Ensure voices are audible but not overwhelming
Proximity Awareness: Use realistic sound falloff distances
Contextual Audio: Match voice tone to conversation context
Accessibility: Provide text alternatives for hearing-impaired players
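To reason about falloff distances, it can help to model how loud a voice line is at a given distance. The sketch below assumes a linear falloff over a 16-block base range, with volumes above 1.0 extending the range rather than the loudness. Both the constant and the perceived_volume function are assumptions for illustration and should be checked against the behavior of the server version in use.

```python
import math

BASE_RANGE = 16.0  # assumed audible range in blocks at volume 1.0


def perceived_volume(source, listener, volume=1.0):
    """Approximate loudness (0.0 to 1.0) heard at the listener's position."""
    distance = math.dist(source, listener)
    # Volumes above 1.0 are assumed to widen the range, not raise the gain.
    max_range = BASE_RANGE * max(volume, 1.0)
    if distance >= max_range:
        return 0.0
    return min(volume, 1.0) * (1.0 - distance / max_range)
```

Under these assumptions, a player 8 blocks from an NPC speaking at volume 1.0 hears the line at half strength, and a player 16 or more blocks away hears nothing.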
Troubleshooting
Common Issues
No Audio Playing: Check ElevenLabs API key and voice generation settings
Poor Audio Quality: Verify API quota and network connection
Voice Inconsistency: Ensure custom voice IDs are correctly set
Performance Issues: Monitor generation times and enable caching
AI Voice Not Playing: Verify players have the Story-Client Fabric mod installed
Client Mod Issues: Check that the Story-Client mod is compatible with the server version
Debug Features
Audio Logging: Track voice generation requests and responses
Performance Metrics: Monitor generation times and success rates
Fallback Testing: Verify pre-recorded audio plays when AI fails
Voice Validation: Check that custom voice IDs exist in ElevenLabs
Future Development
Planned Features
Voice Cloning: Create custom voices from sample audio
Emotional AI: More sophisticated emotional voice generation
Multi-Language Support: Voices in different languages
Voice Customization: Player-created voice profiles
Performance Improvements
Local Generation: On-server voice generation for better performance
Advanced Caching: Smarter voice caching and management
Batch Processing: Generate multiple voices simultaneously
Quality Optimization: Better balance of quality and performance
API Integration
ElevenLabs Configuration
API Key: Required for AI voice generation
Voice IDs: Specific voices for different character types
Rate Limits: Respect API usage limits and quotas
Error Handling: Graceful fallback when API is unavailable
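Since the API key is required for AI voice generation, a quick sanity check of the misc settings before startup can catch the most common mistake. The sketch below assumes the misc section has already been parsed into a dictionary; the check_voice_config function is hypothetical, and it only ties the key requirement to voiceGenerationEnabled because how the other two flags interact with the key is not specified here.

```python
def check_voice_config(misc):
    """Return a list of problems found in the parsed `misc` section."""
    problems = []
    flags = (
        "voiceGenerationEnabled",
        "scheduleVoiceGenerationEnabled",
        "playerVoiceGenerationEnabled",
    )
    for flag in flags:
        # A missing flag is treated as false; anything non-boolean is flagged.
        if not isinstance(misc.get(flag, False), bool):
            problems.append(f"{flag} must be true or false")
    key = str(misc.get("elevenLabsApiKey") or "").strip()
    if misc.get("voiceGenerationEnabled") is True and key in ("", "your_api_key_here"):
        problems.append("elevenLabsApiKey is missing or still the placeholder")
    return problems
```

An empty result means nothing obviously wrong was found; it does not confirm that the key is valid or that quota remains on the ElevenLabs account.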
Voice Management
Voice Library: Maintain collection of available voices
Character Assignment: Link voices to specific NPCs
Quality Control: Monitor and maintain voice quality
Backup Systems: Ensure audio always plays, even if AI fails