Your Voice Agents Just Got Eyes: What ElevenLabs' Multimodal Update Means for Developers
ElevenLabs just gave its voice agents the ability to "see" images and PDFs during real-time calls. Here's how the new multimodal support and scoped conversation analysis will change how you build and debug voice apps.
TL;DR:
- Multimodal Support: Voice agents can now process images and PDFs mid-conversation using the new sendMultimodalMessage function in the JS SDK.
- Scoped Conversation Analysis: Debugging multi-agent workflows is now drastically easier, allowing you to isolate metrics for specific sub-agents rather than parsing entire call transcripts.
- Apple Ecosystem Upgrades: The new Swift SDK v3.1.2 brings ultra-low latency LiveKit WebRTC and reactive SwiftUI integration for Mac and iOS developers.
- Workflow Overrides: Developers can now restrict specific agents to distinct tool_ids and knowledge_base documents to prevent hallucinations.
If you build or use voice AI tools daily, you already know the frustration of a "blind" voice agent. Imagine a user trying to read a 16-character router serial number aloud to a support bot, or spelling out a complex foreign address. It's a massive friction point that text-to-speech (TTS) and speech-to-text (STT) alone simply cannot solve.
In a major update to its ElevenAgents platform, ElevenLabs has fundamentally changed this dynamic. By introducing Multimodal Support and Scoped Conversation Analysis, the company is aggressively pivoting from a specialized voice-cloning provider into a comprehensive "Agentic AI" powerhouse.
Here is exactly what this means for your daily workflows, your app development, and the future of voice interfaces.
Multimodal Support: The End of "Spelling It Out"
The most immediately impactful feature for end-users is the addition of multimodal message support. The JavaScript SDK (@elevenlabs/client) now includes a sendMultimodalMessage hook.
Instead of forcing users to choose between a text chat or a voice call, developers can now build hybrid interactions. During a live, real-time voice conversation, a user can upload a photo of a broken product, a screenshot of an error code, or a PDF of a receipt. The agent can "see" this visual data and respond verbally in real-time.
This is a massive leap for data extraction and CRM integration. By allowing users to augment their voice with visual context, businesses can drastically reduce call times and eliminate the hallucination risks associated with poor phonetic transcriptions of complex data.
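To make this concrete, here is a minimal sketch of sending visual context during a live call. The sendMultimodalMessage hook name comes from the SDK update above, but the exact payload shape (text plus base64-encoded attachments) is an assumption for illustration; check the @elevenlabs/client documentation for the real MultimodalMessageInput fields.

```javascript
// Build a multimodal message payload from user text and uploaded files.
// NOTE: this payload shape is a hypothetical sketch, not the documented schema.
function buildMultimodalMessage(text, files) {
  return {
    text,
    attachments: files.map((f) => ({
      name: f.name,
      mimeType: f.mimeType,
      data: f.base64, // file contents, base64-encoded
    })),
  };
}

// Example: the user snaps a photo of a router label mid-call
// instead of reading the serial number aloud.
const message = buildMultimodalMessage("Here is the serial number sticker", [
  { name: "router.jpg", mimeType: "image/jpeg", base64: "/9j/placeholder" },
]);

// In a real app you would hand this to the SDK hook named in this article:
// const { sendMultimodalMessage } = useConversationControls();
// sendMultimodalMessage(message);
```

The key design point is that the voice channel stays open the whole time: the image rides alongside the audio stream rather than forcing the user into a separate text chat.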
Scoped Conversation Analysis: Debugging the Multi-Agent Mess
As enterprises have started deploying ElevenAgents for complex tasks, they've run into a scaling problem: debugging a multi-agent workflow is a nightmare.
Previously, if you had a "Greeting Agent" that routed to a "Billing Agent" or a "Tech Support Agent," conversation analysis was applied to the entire call transcript. If an evaluation failed, pinpointing exactly which sub-agent dropped the ball required tedious manual review.
With Scoped Conversation Analysis, developers can now apply evaluation criteria and data collection items to either the full conversation or a specific agent node.
Technical Implementation
For the developers under the hood, here is how the new tools shape up:
| Feature | Technical Implementation |
|---|---|
| Analysis API | POST /v1/convai/conversations/{id}/analysis/run |
| New Schema | ScopedAnalysisResult (array containing per-agent evaluation breakdowns) |
| JS SDK Hook | useConversationControls().sendMultimodalMessage |
| Input Type | MultimodalMessageInput (exported from @elevenlabs/client) |
| Workflow Config | PromptAgentAPIModelOverrideConfig now includes tool_ids and knowledge_base |
By utilizing the new tool_ids and knowledge_base overrides, you can ensure your Billing Agent only has access to billing APIs, while your Tech Support Agent only searches your technical documentation. This sandbox approach is the most effective way to reduce hallucinations in production environments.
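The sandboxing idea might look like the following sketch. The two field names (tool_ids, knowledge_base) come from the PromptAgentAPIModelOverrideConfig change above; the surrounding structure and all IDs are hypothetical placeholders.

```javascript
// Hypothetical per-agent override config: each sub-agent is restricted to
// its own tools and knowledge base documents.
const workflowOverrides = {
  billing_agent: {
    tool_ids: ["tool_refund_api", "tool_invoice_lookup"], // billing APIs only
    knowledge_base: ["kb_billing_policies"],
  },
  tech_support_agent: {
    tool_ids: ["tool_diagnostics"],
    knowledge_base: ["kb_product_manuals"], // technical docs only
  },
};

// Sanity check worth running in CI: no two agents should share a tool,
// otherwise the sandbox boundary is leaking.
function toolsAreDisjoint(overrides) {
  const seen = new Set();
  for (const agent of Object.values(overrides)) {
    for (const id of agent.tool_ids) {
      if (seen.has(id)) return false;
      seen.add(id);
    }
  }
  return true;
}
```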
Mac and iOS Developers Get a Massive Boost
ElevenLabs has clearly prioritized the Apple ecosystem in this rollout. If you are building voice apps for Mac or iOS, the new Swift SDK v3.1.2 brings several quality-of-life improvements.
The SDK now utilizes LiveKit WebRTC for ultra-low latency audio streaming, ensuring that conversational prosody feels natural and uninterrupted. Furthermore, it features deep SwiftUI Integration. The SDK is fully reactive, meaning your iOS app's UI will automatically update its transcripts and visual states as the AI speaks, requiring zero manual state management from the developer.
ElevenLabs also added environment-specific agent connections, making it far easier for iOS devs to toggle between "Development" and "Production" versions of their agents while testing on TestFlight.
The Competitive Landscape: ElevenLabs vs. OpenAI
Industry analysts are already dubbing ElevenLabs the "audio layer" of the internet, but they face stiff competition. OpenAI's Realtime API offers a highly capable "single-brain" multimodal experience.
However, where ElevenLabs continues to win is in production-ready prosody. While pure LLM-voice models might have a slight edge in raw latency, ElevenLabs' underlying models (like the newly available Eleven v3 and Scribe v2) offer unmatched voice quality, emotional nuance, and character consistency. With the addition of "Versioning" for A/B testing live traffic and structured "Agent Test Folders" for automated testing, ElevenLabs is clearly targeting serious, enterprise-grade developers who need granular control over their voice outputs.
The Privacy Angle: Cloud vs. Local
While the ability to send images, PDFs, and real-time voice data to a cloud-based agent is incredibly powerful, it also introduces significant privacy and cost concerns. Every multimodal message sent to a cloud API consumes tokens, and transmitting sensitive documents (like invoices or personal IDs) to third-party servers is often a non-starter for healthcare, finance, and privacy-conscious users.
If you love the power of voice AI but need to keep your data strictly on your own hardware, cloud APIs aren't the only way forward.
About FreeVoice Reader
FreeVoice Reader is a privacy-first voice AI suite that runs 100% locally on your device:
- Mac App - Lightning-fast dictation, natural TTS, voice cloning, meeting transcription
- iOS App - Custom keyboard for voice typing in any app
- Android App - Floating voice overlay with custom commands
- Web App - 900+ premium TTS voices in your browser
One-time purchase. No subscriptions. Your voice never leaves your device.
Transparency Notice: This article was written by AI, reviewed by humans. We fact-check all content for accuracy and ensure it provides genuine value to our readers.