productivity

I Built a Ruthless AI Negotiator to Prep for My Salary Review. Here's the Setup.

Prepping for a high-stakes meeting by talking to a mirror is dead. Here's how to set up an ultra-low latency, full-duplex voice AI that argues back, cuts you off, and helps you secure the bag.

FreeVoice Reader Team
FreeVoice Reader Team
#Productivity#Voice AI#Local LLMs

The Bottom Line

You are going to build a hyper-realistic, interrupting AI voice agent to practice your highest-stakes professional conversations—without leaking your proprietary data to the cloud.

The Problem with "Walkie-Talkie" AI

Practicing a pitch against ChatGPT is like playing tennis against a brick wall. It's polite, it waits its turn, and it never interrupts you.

But real high-stakes conversations—salary negotiations, M&A battles, hostile investor pitches—don't work like that. They are messy. People cut you off. They sigh. They use aggressive vocal tones to throw you off your game.

Until recently, practicing against voice AI was a strictly turn-based affair. You talked, you waited three seconds, the AI talked. It was passive Text-to-Speech (TTS).

Now, the standard has shifted to Full-Duplex Speech-to-Speech (S2S).

The AI can now listen to you while it's speaking. If you ramble, it barge-ins and cuts you off. Using models like Hume EVI 3 or ElevenLabs v3, it injects emotional prosody—sounding genuinely skeptical or frustrated.

Welcome to the era of the AI Sparring Partner.

The Engine Room: Cloud vs. Local Models

To build a sparring partner that doesn't break the immersion, you need ultra-low latency. If the AI takes longer than 500 milliseconds to respond, your brain registers it as a machine.

Here is what the 2026 performance landscape actually looks like:

The Cloud Heavyweights

If you have a fast internet connection and don't care about data privacy, cloud APIs are terrifyingly good.

  • Cartesia Sonic 3 (The Speed King): Pulling an insane 40-90ms Time-to-First-Audio (TTFA). At ~$0.05/minute, it's essential for rapid-fire pitch simulations where delays ruin the adrenaline spike.
  • ElevenLabs Conversational AI v3: Still the gold standard for raw emotional range. It costs ~$0.08/minute, but if you want an AI that authentically sounds like a disappointed CFO, this is it.
  • OpenAI Realtime API (GPT-5.3 Codex/4o): Native audio-to-audio. No separate STT (Speech-to-Text) and TTS modules. It handles turn-taking flawlessly, but at ~$0.30/minute total, the bill adds up fast.

The Local Rebels (Free & Private)

Power users are aggressively abandoning cloud APIs for sensitive simulations. If you are practicing a pitch involving unreleased financial data, you cannot pipe that to a server.

  • Kokoro-82M: The breakout star of the year. At just 82 million parameters, this model runs entirely locally on Mac (Metal) and Windows (CUDA) with about 1GB of VRAM. It's free, completely private, and pulls an RTF (Real-Time Factor) of 0.05x on an RTX 5090.
  • NVIDIA PersonaPlex-7B: A heavier full-duplex conversational model that streams understanding and generation simultaneously.
MetricCartesia Sonic 3OpenAI RealtimeKokoro-82M (Local)Whisper Large v3 (Turbo)
Latency (TTFA)40-90ms250ms120ms (on RTX 4090)N/A (STT only)
Accuracy (WER)N/A< 2.5%N/A1.8% - 2.1%
VRAM RequiredCloudCloud~1GB~6GB
Pricing$0.05/min$0.30/minFree (Local)Free (Local)

The "Salary Battle" Workflow

Reddit communities like r/SaaS and r/SillyTavernAI have been quietly perfecting the "Salary Battle" setup. Here is how you can replicate it today.

Step 1: The Local Stack Grab the Voice-Chat-AI repo from GitHub. You are going to use Ollama as the brain (running a smart, quantized model like Llama 3), Whisper.cpp for your ears, and Kokoro-82M for the AI's mouth.

Step 2: Context Loading (RAG) Feed the AI your actual job description, your brag sheet, and the raw text from your boss's LinkedIn profile.

Step 3: The System Prompt You need to force the AI out of its polite, default state. Use this exact prompt:

"Act as a highly skeptical CFO at my company. You have a strict budget constraint and believe I am already overpaid. Be firm, interrupt me immediately if I start to ramble, and use dense corporate jargon. Do not offer a raise easily."

Step 4: The Session Start talking. Practice the "barge-in." When the AI starts saying, "We just don't have the runway for—", cut it off mid-sentence with your counter-metric. If you're using a framework like Pipecat, the AI will instantly stop generating audio, listen to your interruption, and dynamically pivot its argument.

What to Do Now

If you're tired of recording yourself on Voice Memos and cringing at the playback, it's time to upgrade your workflow.

  1. Test the Speed: Check out platforms like Retell AI or Vapi in your browser to feel what sub-100ms conversational latency actually feels like.
  2. Go Local: Download Kokoro-82M from HuggingFace to get cloud-quality voice generation running locally on your hardware for zero ongoing cost.
  3. Build the Pipeline: Look into Pipecat (an open-source Python framework for voice AI) to wire up your own full-duplex sparring agent over the weekend.

About FreeVoice Reader

FreeVoice Reader is a privacy-first voice AI suite that runs 100% locally on your device:

  • Mac App - Lightning-fast dictation, natural TTS, voice cloning, meeting transcription
  • iOS App - Custom keyboard for voice typing in any app
  • Android App - Floating voice overlay with custom commands
  • Web App - 900+ premium TTS voices in your browser

One-time purchase. No subscriptions. Your voice never leaves your device.

Try FreeVoice Reader →

Transparency Notice: This article was written by AI, reviewed by humans. We fact-check all content for accuracy and ensure it provides genuine value to our readers.

Try Free Voice Reader for Mac

Experience lightning-fast, on-device speech technology with our Mac app. 100% private, no ongoing costs.

  • Fast Dictation - Type with your voice
  • Read Aloud - Listen to any text
  • Agent Mode - AI-powered processing
  • 100% Local - Private, no subscription

Related Articles

Found this article helpful? Share it with others!