Speech Audio

The Speech Audio system in Story lets characters emit voice lines and AI-generated speech for a more immersive, cinematic experience. It integrates with Minecraft's sound engine and supports both pre-recorded audio files and real-time AI voice generation through ElevenLabs.

Audio Features

Voice Playback

  • Randomized Voice Selection: A voice line is picked at random from the preconfigured sound files for the character's gender (e.g., feminine_01, masculine_05)

  • Anti-Repetition Logic: Prevents the same voice line from playing twice in a row

  • Proximity Sound Emission: Characters emit sounds at their location, audible to nearby players

  • Custom Voice Support: Individual NPCs can have unique voices through ElevenLabs integration
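
The randomized selection and anti-repetition behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the plugin's actual code: the VoicePicker class, its next_line method, and the pool entries other than feminine_01 are hypothetical names chosen for the example.

```python
import random


class VoicePicker:
    """Random voice-line picker that never repeats the previous pick."""

    def __init__(self, pool, rng=None):
        if not pool:
            raise ValueError("voice pool must not be empty")
        self.pool = list(pool)
        self.rng = rng or random.Random()
        self.last = None  # most recently played line, if any

    def next_line(self):
        # Drop the previous line from the candidates; if the pool has only
        # one entry, fall back to the full pool so something still plays.
        candidates = [v for v in self.pool if v != self.last] or self.pool
        self.last = self.rng.choice(candidates)
        return self.last


# Hypothetical pool; only feminine_01 is named in the text above.
picker = VoicePicker(["feminine_01", "feminine_02", "feminine_03"])
played = [picker.next_line() for _ in range(20)]
```

Excluding only the most recent line keeps the selection random while guaranteeing that no line plays twice in a row, as long as the pool has at least two entries.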

AI Voice Generation (v0.2.0+)

  • ElevenLabs Integration: Real-time voice generation for dynamic dialogue

  • Character-Specific Voices: Each NPC can have a unique voice profile

  • Contextual Speech: AI generates appropriate tone and emotion based on conversation context

  • Client-Side Mod Required: Real-time AI voice playback requires the Story-Client Fabric mod to be installed on each player's client

Voice Types

Pre-Recorded Audio

  • Simlish-Style: Non-language vocalizations similar to The Sims games

  • Gender-Based Pools: Separate voice collections for masculine and feminine characters

  • Emotional Variants: Different tones for various emotional states

  • Ambient Sounds: Background vocalizations for atmosphere

AI-Generated Speech

  • Natural Language: Full text-to-speech with proper pronunciation

  • Character Consistency: Each NPC maintains their unique voice across conversations

  • Emotional Range: AI adjusts tone based on context and character personality

  • Dynamic Content: Can speak any text, including player names and other text generated at runtime

  • Client Mod Dependency: Requires Story-Client Fabric mod for real-time playback

Configuration

Basic Audio Settings

misc:
  voiceGenerationEnabled: true
  scheduleVoiceGenerationEnabled: true
  playerVoiceGenerationEnabled: true
  elevenLabsApiKey: "your_api_key_here"
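
A quick sanity check for these settings might look like the sketch below. It assumes the misc section has already been loaded into a plain dictionary, and it assumes that each of the three flags depends on a working ElevenLabs key; the function name check_voice_config is hypothetical and not part of Story.

```python
PLACEHOLDER_KEY = "your_api_key_here"
VOICE_FLAGS = (
    "voiceGenerationEnabled",
    "scheduleVoiceGenerationEnabled",
    "playerVoiceGenerationEnabled",
)


def check_voice_config(misc):
    """Return a list of human-readable problems found in the misc section."""
    problems = []
    enabled = [flag for flag in VOICE_FLAGS if misc.get(flag) is True]
    key = (misc.get("elevenLabsApiKey") or "").strip()
    # Assumption: any enabled voice flag needs a real ElevenLabs API key.
    if enabled and (not key or key == PLACEHOLDER_KEY):
        problems.append(
            "elevenLabsApiKey is missing or still the placeholder, but "
            + ", ".join(enabled)
            + " is enabled"
        )
    return problems


# The misc section from the example above, as a plain dictionary.
misc = {
    "voiceGenerationEnabled": True,
    "scheduleVoiceGenerationEnabled": True,
    "playerVoiceGenerationEnabled": True,
    "elevenLabsApiKey": "your_api_key_here",
}
problems = check_voice_config(misc)
```

With the placeholder key left in place, problems contains one entry; replacing the key with a real value yields an empty list.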

NPC Voice Settings

npcs/elder_huran.yml
name: Elder Huran
role: Default role
location: Lysathara
context:
  "Elder Huran is ambitious, has the quirk of no quirks, is motivated by protecting their home, and their flaw is obsessive. They speak in a passionate tone."
appearance: ""
customVoice: DQCYGgKbvha45IXs96FO # optional: ElevenLabs voice ID; set it to give this NPC an AI voice

Voice Assignment

Default Voices

  • Characters use gender-based voice pools by default

  • Voice selection is randomized from available options

  • Anti-repetition ensures variety in voice playback

Custom Voices

  • Set the customVoice field in the NPC's data file to give that NPC a unique voice

  • Use ElevenLabs voice IDs for specific character voices

  • Custom voices override default gender-based selection
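
The override rule above can be expressed as a small lookup. This is a sketch under stated assumptions: resolve_voice and gender_pools are hypothetical names, and the gender field is an assumption, since the example NPC file does not show how gender is stored.

```python
def resolve_voice(npc, gender_pools):
    """Return ("custom", voice_id) or ("pool", [sound names]) for an NPC."""
    custom = (npc.get("customVoice") or "").strip()
    if custom:
        # A custom ElevenLabs voice ID takes priority over the gender pool.
        return ("custom", custom)
    # "gender" is a hypothetical field; the example NPC file does not show one.
    return ("pool", gender_pools[npc.get("gender", "masculine")])


gender_pools = {
    "feminine": ["feminine_01"],
    "masculine": ["masculine_05"],
}
elder = {"name": "Elder Huran", "customVoice": "DQCYGgKbvha45IXs96FO"}
guard = {"name": "Guard", "customVoice": "", "gender": "feminine"}

elder_voice = resolve_voice(elder, gender_pools)
guard_voice = resolve_voice(guard, gender_pools)
```

Here elder_voice resolves to the custom ElevenLabs ID, while guard_voice falls through to the feminine pool because its customVoice is empty.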

Voice Generation

  • AI-generated voices are created on-demand during conversations

  • Voices are cached for performance and consistency

  • The system falls back to pre-recorded audio if AI generation fails
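
The generate, cache, and fall back flow can be sketched like this. It is a simplified illustration, not Story's implementation: VoiceService, fake_generate, and fake_fallback are hypothetical stand-ins, and the cache key (voice ID plus text) is an assumption about what makes two requests reusable.

```python
class VoiceService:
    """Generates speech on demand, caches results, and falls back on failure."""

    def __init__(self, generate, fallback):
        self._generate = generate  # callable(voice_id, text) -> bytes, may raise
        self._fallback = fallback  # callable() -> bytes (pre-recorded clip)
        self._cache = {}

    def speak(self, voice_id, text):
        key = (voice_id, text)
        if key in self._cache:
            return self._cache[key]  # reuse audio generated earlier
        try:
            audio = self._generate(voice_id, text)
        except Exception:
            # Failures are not cached, so a later attempt can still succeed.
            return self._fallback()
        self._cache[key] = audio
        return audio


calls = []


def fake_generate(voice_id, text):
    # Stand-in for the real API call; records each invocation.
    calls.append((voice_id, text))
    if text == "fail":
        raise RuntimeError("simulated API outage")
    return f"{voice_id}:{text}".encode()


def fake_fallback():
    return b"simlish_clip"


service = VoiceService(fake_generate, fake_fallback)
first = service.speak("voice_a", "Greetings, traveler.")
second = service.speak("voice_a", "Greetings, traveler.")  # served from cache
failed = service.speak("voice_a", "fail")  # generator raises, fallback plays
```

The second call never reaches the generator, and the failed call returns the pre-recorded clip instead of raising.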

Usage Examples

Basic Voice Playback

Player: "Hello there!"
NPC: [Plays voice line] "Greetings, traveler. How may I assist you?"

AI-Generated Speech

Player: "Tell me about the Brotherhood"
NPC: [AI generates voice] "The Brotherhood of the Unseen Path is an ancient organization..."

Note: AI-generated speech requires the Story-Client Fabric mod on the player's client for real-time playback.

Custom Voice Character

name: "Maester Valen"
customVoice: "eleven_labs_voice_id_here"
context: "A wise teacher with a distinctive, scholarly tone"

Audio Quality and Performance

Optimization Features

  • Voice Caching: Generated audio is stored for reuse, for example in schedule barks or when several players are near the same NPC during generation

  • Compression: Audio files are optimized for Minecraft's sound system

  • Fallback System: Pre-recorded audio plays if AI generation fails

  • Performance Monitoring: System tracks generation times and success rates
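
As a rough illustration of the performance monitoring item, the sketch below tracks generation times and success rates. GenerationStats is a hypothetical helper; the text does not describe how Story records or exposes these metrics.

```python
class GenerationStats:
    """Tracks how long voice generation takes and how often it succeeds."""

    def __init__(self):
        self.durations = []  # seconds, successful generations only
        self.failures = 0

    def record(self, seconds, ok):
        if ok:
            self.durations.append(seconds)
        else:
            self.failures += 1

    @property
    def success_rate(self):
        total = len(self.durations) + self.failures
        return len(self.durations) / total if total else 0.0

    @property
    def average_seconds(self):
        if not self.durations:
            return 0.0
        return sum(self.durations) / len(self.durations)


stats = GenerationStats()
stats.record(1.0, ok=True)
stats.record(3.0, ok=True)
stats.record(0.5, ok=False)  # failed request; its duration is not averaged
stats.record(2.0, ok=True)
```

After these four records, stats.success_rate is 0.75 and stats.average_seconds is 2.0.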

Quality Settings

  • Sample Rate: Optimized for Minecraft's audio engine

  • Bit Depth: Balanced quality and file size

  • Format: Compatible with Minecraft's sound system

  • Length Limits: Prevents excessively long audio generation

Integration with Game Systems

Citizens Plugin

  • NPCs emit pre-recorded audio at their physical location (this does not apply to real-time ElevenLabs voice)

  • Volume falls off with distance from the NPC, following Minecraft's normal sound attenuation

  • Players hear audio based on game audio settings

  • Pre-recorded audio: Works with standard Minecraft audio

  • AI-generated audio: Requires Story-Client mod for real-time playback

Conversation System

  • Audio plays automatically during AI conversations

  • Voice generation happens in real time

  • Text and audio are synchronized for natural flow

Quest System

  • Quest-related dialogue includes voice generation

  • Important story moments use enhanced audio

  • Character voices maintain consistency across quest interactions

Best Practices

Voice Design

  1. Character Consistency: Use the same voice for each NPC across all interactions

  2. Appropriate Tone: Match voice characteristics to character personality

  3. Performance Balance: Balance audio quality with generation speed

  4. Fallback Planning: Always have pre-recorded alternatives

Performance Optimization

  1. Voice Caching: Enable caching for frequently used voices

  2. Batch Generation: Generate multiple voices during off-peak times

  3. Quality Settings: Adjust quality based on server performance

  4. Monitoring: Track generation success rates and adjust accordingly

User Experience

  1. Volume Control: Ensure voices are audible but not overwhelming

  2. Proximity Awareness: Use realistic sound falloff distances

  3. Contextual Audio: Match voice tone to conversation context

  4. Accessibility: Provide text alternatives for hearing-impaired players
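
For the proximity point, it helps to know roughly how far a sound carries. The sketch below uses the commonly documented vanilla rule that a sound at volume 1.0 is audible within about 16 blocks and that volumes above 1.0 scale that radius rather than the loudness. Treat the constant as an assumption: resource packs can define a different attenuation distance, and Story may use its own values.

```python
import math

BASE_RADIUS = 16.0  # blocks, for a sound played at volume 1.0 (assumed default)


def audible_radius(volume):
    """Approximate radius, in blocks, within which a sound can be heard."""
    # Volumes below 1.0 make the sound quieter but do not shrink the radius.
    return BASE_RADIUS * max(volume, 1.0)


def can_hear(npc_pos, player_pos, volume=1.0):
    """True if the player is inside the sound's audible radius."""
    return math.dist(npc_pos, player_pos) <= audible_radius(volume)


npc = (0.0, 64.0, 0.0)
near_player = (10.0, 64.0, 0.0)
far_player = (20.0, 64.0, 0.0)

near_ok = can_hear(npc, near_player)
far_ok = can_hear(npc, far_player)
far_ok_loud = can_hear(npc, far_player, volume=2.0)
```

A player 10 blocks away hears the line at volume 1.0, a player 20 blocks away does not, and doubling the volume extends the radius to 32 blocks.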

Troubleshooting

Common Issues

  • No Audio Playing: Check ElevenLabs API key and voice generation settings

  • Poor Audio Quality: Verify API quota and network connection

  • Voice Inconsistency: Ensure custom voice IDs are correctly set

  • Performance Issues: Monitor generation times and enable caching

  • AI Voice Not Playing: Verify players have the Story-Client Fabric mod installed

  • Client Mod Issues: Check that the Story-Client mod is compatible with the server version

Debug Features

  • Audio Logging: Track voice generation requests and responses

  • Performance Metrics: Monitor generation times and success rates

  • Fallback Testing: Verify pre-recorded audio plays when AI fails

  • Voice Validation: Check that custom voice IDs exist in ElevenLabs
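
Voice validation can also be done by hand outside the game. The sketch below queries the ElevenLabs REST API for a single voice, using the public GET /v1/voices/{voice_id} endpoint with the xi-api-key header. The function names are hypothetical, and this is not how Story itself performs the check.

```python
import urllib.error
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_voice_request(voice_id, api_key):
    """Build a GET request for one voice; the ID is escaped for use in a URL."""
    url = f"{API_BASE}/voices/{urllib.parse.quote(voice_id, safe='')}"
    return urllib.request.Request(url, headers={"xi-api-key": api_key})


def voice_exists(voice_id, api_key, timeout=10):
    """Return True if the API answers 200 for this voice ID.

    Any HTTP error status (unknown voice, rejected key, and so on) counts
    as "not validated". Network failures raise urllib.error.URLError.
    """
    request = build_voice_request(voice_id, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


# Building the request does not contact the network.
request = build_voice_request("DQCYGgKbvha45IXs96FO", "your_api_key_here")
```

Calling voice_exists with a real key and the customVoice value from an NPC file tells you whether that ID is usable before you restart the server.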

Future Development

Planned Features

  • Voice Cloning: Create custom voices from sample audio

  • Emotional AI: More sophisticated emotional voice generation

  • Multi-Language Support: Voices in different languages

  • Voice Customization: Player-created voice profiles

Performance Improvements

  • Local Generation: On-server voice generation for better performance

  • Advanced Caching: Smarter voice caching and management

  • Batch Processing: Generate multiple voices simultaneously

  • Quality Optimization: Better balance of quality and performance

API Integration

ElevenLabs Configuration

  • API Key: Required for AI voice generation

  • Voice IDs: Specific voices for different character types

  • Rate Limits: Respect API usage limits and quotas

  • Error Handling: Graceful fallback when API is unavailable
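
One common way to respect rate limits is to retry with exponential backoff and give up after a few attempts, at which point the pre-recorded fallback can take over. The sketch below is generic: RateLimited and with_backoff are hypothetical names, and the retry count and delays are placeholder values, not Story's settings.

```python
import time


class RateLimited(Exception):
    """Raised by the caller's request function on a rate-limit response (e.g. HTTP 429)."""


def with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with delays of 1x, 2x, 4x base_delay."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries; let the caller fall back
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake request that is rate limited twice, then succeeds.
attempts = {"count": 0}
delays = []


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimited()
    return b"audio"


# delays.append stands in for time.sleep so the demo runs instantly.
result = with_backoff(flaky_request, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the helper testable: the demonstration records the delays instead of actually waiting.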

Voice Management

  • Voice Library: Maintain collection of available voices

  • Character Assignment: Link voices to specific NPCs

  • Quality Control: Monitor and maintain voice quality

  • Backup Systems: Ensure audio always plays, even if AI fails
