Speech Audio

The Speech Audio system in Story creates immersive, cinematic experiences by allowing characters to emit voice lines and AI-generated speech. The system integrates with Minecraft's sound engine and supports both pre-recorded audio files and real-time AI voice generation through ElevenLabs.

Audio Features

Voice Playback

Randomized Voice Selection: Voices are selected from preconfigured sound files per gender (e.g., feminine_01, masculine_05)
Anti-Repetition Logic: Prevents the same voice line from playing twice in a row
Proximity Sound Emission: Characters emit sounds at their location, audible to nearby players
Custom Voice Support: Individual NPCs can have unique voices through ElevenLabs integration
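
The randomized selection and anti-repetition behavior described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the plugin's actual code: the pool contents, the `VoicePicker` class, and the choice to track the last line per gender pool are all assumptions made for the example.

```python
import random

# Hypothetical pools; the real sound names come from the plugin's configured sound files.
VOICE_POOLS = {
    "feminine": ["feminine_01", "feminine_02", "feminine_03"],
    "masculine": ["masculine_01", "masculine_02", "masculine_05"],
}


class VoicePicker:
    """Picks a random voice line from a pool, never the same line twice in a row."""

    def __init__(self, pools, rng=None):
        self.pools = pools
        self.rng = rng or random.Random()
        self.last_played = {}  # pool name -> most recently played sound name

    def pick(self, gender):
        pool = self.pools[gender]
        last = self.last_played.get(gender)
        # Exclude the previous line unless it is the only one in the pool.
        candidates = [sound for sound in pool if sound != last] or pool
        choice = self.rng.choice(candidates)
        self.last_played[gender] = choice
        return choice
```

Calling `VoicePicker(VOICE_POOLS).pick("feminine")` repeatedly returns a different line from the previous call each time, as long as the pool holds more than one sound.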

AI Voice Generation (v0.2.0+)

ElevenLabs Integration: Real-time voice generation for dynamic dialogue
Character-Specific Voices: Each NPC can have a unique voice profile
Contextual Speech: AI generates appropriate tone and emotion based on conversation context
Client-Side Mod Required: Real-time AI voice playback requires the Story-Client Fabric mod to be installed on each player's client

Voice Types

Pre-Recorded Audio

Simlish-Style: Non-language vocalizations similar to The Sims games
Gender-Based Pools: Separate voice collections for masculine and feminine characters
Emotional Variants: Different tones for various emotional states
Ambient Sounds: Background vocalizations for atmosphere

AI-Generated Speech

Natural Language: Full text-to-speech with proper pronunciation
Character Consistency: Each NPC maintains their unique voice across conversations
Emotional Range: AI adjusts tone based on context and character personality
Dynamic Content: Can speak any text, including player names and other content produced at runtime
Client Mod Dependency: Requires Story-Client Fabric mod for real-time playback

Configuration

Basic Audio Settings

misc:
  voiceGenerationEnabled: true
  scheduleVoiceGenerationEnabled: true
  playerVoiceGenerationEnabled: true
  elevenLabsApiKey: "your_api_key_here"
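
A quick sanity check for these settings might look like the sketch below. It assumes the `misc` section has already been parsed into a dictionary. The key names come from the settings above, but the `check_voice_config` function and the rule it applies (any enabled generation flag needs a real API key) are assumptions, not documented plugin behavior.

```python
GENERATION_FLAGS = (
    "voiceGenerationEnabled",
    "scheduleVoiceGenerationEnabled",
    "playerVoiceGenerationEnabled",
)


def check_voice_config(misc):
    """Return a list of human-readable problems found in the parsed `misc` section."""
    problems = []
    api_key = (misc.get("elevenLabsApiKey") or "").strip()
    wants_ai_voice = any(misc.get(flag, False) for flag in GENERATION_FLAGS)
    # Treat the placeholder from the example config as "not set" (assumption).
    if wants_ai_voice and api_key in ("", "your_api_key_here"):
        problems.append("AI voice generation is enabled but elevenLabsApiKey is not set")
    return problems
```

An empty list means the sketch found nothing wrong. It does not contact ElevenLabs, so it cannot tell whether a key is valid.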

NPC Voice Settings

npcs/elder_huran.yml

name: Elder Huran
role: Default role
location: Lysathara
context:
  "Elder Huran is ambitious, has the quirk of no quirks, is motivated by protecting their home, and their flaw is obsessive. They speak in a passionate tone."
appearance: ""
customVoice: DQCYGgKbvha45IXs96FO # optional: ElevenLabs voice ID; set this to give the NPC a custom voice

Voice Assignment

Default Voices

Characters use gender-based voice pools by default
Voice selection is randomized from available options
Anti-repetition ensures variety in voice playback

Custom Voices

Set the customVoice field in the NPC's data file to give that NPC a unique voice
Use ElevenLabs voice IDs for specific character voices
Custom voices override default gender-based selection
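
The override rule can be illustrated with a small sketch. The `customVoice` key matches the NPC file shown earlier. The `resolve_voice` function, its return shape, and the separate `gender` argument are hypothetical, since this page does not say how an NPC's gender is stored.

```python
def resolve_voice(npc_data, gender):
    """Decide which voice source an NPC uses.

    npc_data: parsed NPC file as a dict (may contain "customVoice").
    gender:   name of the default voice pool, e.g. "feminine" (hypothetical argument).
    """
    custom = (npc_data.get("customVoice") or "").strip()
    if custom:
        # A custom ElevenLabs voice ID takes priority over the gender pool.
        return ("elevenlabs", custom)
    return ("pool", gender)
```

An NPC file without `customVoice`, or with an empty value, resolves to the default gender-based pool.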

Voice Generation

AI-generated voices are created on-demand during conversations
Voices are cached for performance and consistency
The system falls back to pre-recorded audio if AI generation fails
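
A minimal sketch of the generate, cache, and fall back flow follows. The `VoiceService` class, the cache key of voice ID plus text, and the two injected callables are assumptions chosen for illustration. The real plugin may key and store its cache differently.

```python
class VoiceService:
    """Generates speech on demand, caches results, and falls back on failure."""

    def __init__(self, generate, fallback):
        self.generate = generate  # callable(voice_id, text) -> bytes, may raise
        self.fallback = fallback  # callable() -> bytes of pre-recorded audio
        self.cache = {}           # (voice_id, text) -> generated audio bytes

    def speak(self, voice_id, text):
        key = (voice_id, text)
        if key in self.cache:
            return self.cache[key]  # reuse earlier generation
        try:
            audio = self.generate(voice_id, text)
        except Exception:
            # Generation failed: play pre-recorded audio and do not cache the failure.
            return self.fallback()
        self.cache[key] = audio
        return audio
```

Failures are deliberately not cached in this sketch, so a later request for the same line tries generation again.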

Usage Examples

Basic Voice Playback

Player: "Hello there!"
NPC: [Plays voice line] "Greetings, traveler. How may I assist you?"

AI-Generated Speech

Player: "Tell me about the Brotherhood"
NPC: [AI generates voice] "The Brotherhood of the Unseen Path is an ancient organization..."

Note: AI-generated speech requires the Story-Client Fabric mod for real-time playback

Custom Voice Character

name: "Maester Valen"
customVoice: "eleven_labs_voice_id_here"
context: "A wise teacher with a distinctive, scholarly tone"

Audio Quality and Performance

Optimization Features

Voice Caching: Generated audio is stored for reuse, for example in schedule barks or when several players are in the same spot during generation
Compression: Audio files are optimized for Minecraft's sound system
Fallback System: Pre-recorded audio plays if AI generation fails
Performance Monitoring: System tracks generation times and success rates
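
Tracking generation times and success rates could be done along these lines. The `GenerationStats` class and its method names are hypothetical. The sketch only shows the bookkeeping, not how the plugin reports it.

```python
import time


class GenerationStats:
    """Collects durations of successful generations and counts failures."""

    def __init__(self):
        self.durations = []  # seconds taken by each successful generation
        self.failures = 0

    def record(self, seconds, ok):
        if ok:
            self.durations.append(seconds)
        else:
            self.failures += 1

    @property
    def attempts(self):
        return len(self.durations) + self.failures

    def success_rate(self):
        # None when nothing has been attempted yet.
        return len(self.durations) / self.attempts if self.attempts else None

    def average_seconds(self):
        # Average over successful generations only.
        return sum(self.durations) / len(self.durations) if self.durations else None

    def timed(self, func, *args):
        """Run func(*args), record how long it took, and re-raise any error."""
        start = time.perf_counter()
        try:
            result = func(*args)
        except Exception:
            self.record(time.perf_counter() - start, ok=False)
            raise
        self.record(time.perf_counter() - start, ok=True)
        return result
```

Wrapping each generation request in `stats.timed(...)` keeps the measurement in one place.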

Quality Settings

Sample Rate: Optimized for Minecraft's audio engine
Bit Depth: Balanced quality and file size
Format: Compatible with Minecraft's sound system
Length Limits: Prevents excessively long audio generation

Integration with Game Systems

Citizens Plugin

NPCs emit pre-recorded audio at their physical location (this does not apply to real-time ElevenLabs voice)
Sound travels realistically through the world
Players hear audio based on game audio settings
Pre-recorded audio: Works with standard Minecraft audio
AI-generated audio: Requires Story-Client mod for real-time playback
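
The split between the two audio paths can be summarized in a short sketch. The `Delivery` enum and `choose_delivery` function are hypothetical names. In particular, returning `NONE` for players without the client mod is an assumption: this page says only that AI voice requires the mod, not what those players receive instead.

```python
from enum import Enum


class Delivery(Enum):
    POSITIONAL_SOUND = "positional_sound"  # standard Minecraft sound at the NPC's location
    CLIENT_MOD = "client_mod"              # real-time playback through the Story-Client mod
    NONE = "none"                          # no AI audio for this player (assumption)


def choose_delivery(ai_generated, has_client_mod):
    """Pick how a voice line reaches one player."""
    if not ai_generated:
        return Delivery.POSITIONAL_SOUND
    return Delivery.CLIENT_MOD if has_client_mod else Delivery.NONE
```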

Conversation System

Audio plays automatically during AI conversations
Voice generation happens in real-time
Text and audio are synchronized for natural flow

Quest System

Quest-related dialogue includes voice generation
Important story moments use enhanced audio
Character voices maintain consistency across quest interactions

Best Practices

Voice Design

Character Consistency: Use the same voice for each NPC across all interactions
Appropriate Tone: Match voice characteristics to character personality
Performance Balance: Balance audio quality with generation speed
Fallback Planning: Always have pre-recorded alternatives

Performance Optimization

Voice Caching: Enable caching for frequently used voices
Batch Generation: Generate multiple voices during off-peak times
Quality Settings: Adjust quality based on server performance
Monitoring: Track generation success rates and adjust accordingly

User Experience

Volume Control: Ensure voices are audible but not overwhelming
Proximity Awareness: Use realistic sound falloff distances
Contextual Audio: Match voice tone to conversation context
Accessibility: Provide text alternatives for players who are deaf or hard of hearing

Troubleshooting

Common Issues

No Audio Playing: Check ElevenLabs API key and voice generation settings
Poor Audio Quality: Verify API quota and network connection
Voice Inconsistency: Ensure custom voice IDs are correctly set
Performance Issues: Monitor generation times and enable caching
AI Voice Not Playing: Verify players have the Story-Client Fabric mod installed
Client Mod Issues: Check that the Story-Client mod is compatible with the server version

Debug Features

Audio Logging: Track voice generation requests and responses
Performance Metrics: Monitor generation times and success rates
Fallback Testing: Verify pre-recorded audio plays when AI fails
Voice Validation: Check that custom voice IDs exist in ElevenLabs

Future Development

Planned Features

Voice Cloning: Create custom voices from sample audio
Emotional AI: More sophisticated emotional voice generation
Multi-Language Support: Voices in different languages
Voice Customization: Player-created voice profiles

Performance Improvements

Local Generation: On-server voice generation for better performance
Advanced Caching: Smarter voice caching and management
Batch Processing: Generate multiple voices simultaneously
Quality Optimization: Better balance of quality and performance

API Integration

ElevenLabs Configuration

API Key: Required for AI voice generation
Voice IDs: Specific voices for different character types
Rate Limits: Respect API usage limits and quotas
Error Handling: Graceful fallback when API is unavailable

Voice Management

Voice Library: Maintain collection of available voices
Character Assignment: Link voices to specific NPCs
Quality Control: Monitor and maintain voice quality
Backup Systems: Ensure audio always plays, even if AI fails

Last updated 4 months ago
