AI Weekly: DeepSeek V4 Is Coming for ChatGPT, Falcon-H1R Punches Above Its Weight, and 4 More Stories
This week in AI: DeepSeek targets mid-February for V4 (insiders say it beats Claude at coding), a 7B model outperforms rivals several times its size, and OpenAI goes open-source. Your weekly roundup of the stories that actually matter.
This Week in AI
It's been exactly one year since DeepSeek's R1 model triggered a $750 billion stock market wipeout — and the company is about to do it again. Meanwhile, small models are embarrassing big ones, OpenAI is releasing open-weight models (yes, really), and the Chinese open-source community is having a moment. Here's what you need to know.
🔥 DeepSeek V4 Targets Mid-February Launch — Insiders Say It Beats Claude and ChatGPT at Coding
The biggest story this week: DeepSeek is preparing to release V4 around February 17th, coinciding with Lunar New Year.
People with direct knowledge of the project claim V4 outperforms both Anthropic's Claude and OpenAI's GPT series on internal benchmarks, particularly on extremely long code prompts. The model incorporates DeepSeek's newly published Engram memory architecture and the techniques from its Manifold-Constrained Hyper-Connections paper, co-authored by founder Liang Wenfeng.
The backstory matters: DeepSeek originally planned to release R2 in May 2025, but founder Liang delayed it because he wasn't satisfied with its performance. That perfectionism seems to be paying off — V4 is reportedly the result of that extended development.
Why it matters for you: If V4 delivers on these claims, expect another round of price drops across the industry. DeepSeek already offers frontier-level performance at $0.07/million tokens — roughly 50x cheaper than GPT-5. For anyone building AI workflows or using coding assistants, this could reshape what you pay (or don't pay) for AI.
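To put numbers on that, here's a quick back-of-the-envelope cost comparison. The DeepSeek rate comes from the figure above; the GPT-5 rate is just the ~50x multiple applied to it, not a published price sheet, so treat both as illustrative:

```python
# Back-of-the-envelope API cost comparison using the article's figures.
# The GPT-5 rate is inferred from the "roughly 50x" claim above, not a
# published price sheet, so treat both numbers as illustrative.

DEEPSEEK_PER_M = 0.07             # USD per million tokens (from the article)
GPT5_PER_M = DEEPSEEK_PER_M * 50  # implied by the ~50x price gap

def monthly_cost(tokens_per_day: int, rate_per_m: float) -> float:
    """USD cost for 30 days of usage at a per-million-token rate."""
    return tokens_per_day * 30 / 1_000_000 * rate_per_m

# Example: a coding assistant burning 2M tokens a day.
for name, rate in [("DeepSeek", DEEPSEEK_PER_M), ("GPT-5 (implied)", GPT5_PER_M)]:
    print(f"{name}: ${monthly_cost(2_000_000, rate):,.2f}/month")
# -> DeepSeek: $4.20/month; GPT-5 (implied): $210.00/month
```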
Falcon-H1R 7B: A 7-Billion-Parameter Model That Embarrasses Models Several Times Its Size
The Technology Innovation Institute (TII) unveiled Falcon-H1R 7B, a compact model built on a Transformer–Mamba hybrid architecture that delivers genuinely surprising results:
- 88.1% on AIME-24 math benchmark (beating 15B-parameter Apriel 1.5's 86.2%)
- 68.6% on LiveCodeBench v6 coding tasks (outperforming the 32B-parameter Qwen3 by ~7 points)
- Processes ~1,500 tokens/second per GPU at batch size 64
This isn't just academic. Falcon-H1R fits on consumer hardware — it's a genuine local AI option for developers and power users running Apple Silicon Macs or modest GPUs.
What you can do: If you're running local models via Ollama or LM Studio, keep an eye on Falcon-H1R quantizations. This could be the best code-assist model you can run on a MacBook.
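For the curious, here's roughly what that workflow looks like with the ollama Python client. The model tag below is hypothetical, since no official Falcon-H1R entry has a confirmed registry name yet:

```python
# Minimal local code-assist sketch using the ollama Python client
# (pip install ollama). The model tag below is hypothetical: check the
# Ollama registry, or import a GGUF quantization yourself, for the real name.
import ollama

response = ollama.chat(
    model="falcon-h1r:7b",  # hypothetical tag, substitute the actual one
    messages=[
        {"role": "user", "content": "Write a Python function that merges two sorted lists."}
    ],
)
print(response["message"]["content"])
```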
OpenAI Goes Open-Source: GPT-oss-120B and GPT-oss-20B Released Under Apache 2.0
In a move nobody predicted two years ago, OpenAI released open-weight models under the Apache 2.0 license. GPT-oss-120B and GPT-oss-20B are optimized for agentic workflows, tool use, and few-shot function calling.
The 20B variant is specifically optimized for consumer hardware deployment, which signals OpenAI is taking the local AI movement seriously. This is a company that fought tooth and nail against open-source for years.
Why it matters: The 20B model could be a game-changer for on-device AI. If it runs well on Apple Silicon, expect it to show up in Mac apps within weeks.
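Because local runtimes like vLLM and Ollama expose OpenAI-compatible endpoints, tool use with these models should look like any standard chat-completions request. A minimal sketch, assuming a local server on port 8000 and a hypothetical get_weather tool:

```python
# Sketch: function calling against a locally served gpt-oss-20b through an
# OpenAI-compatible endpoint (e.g. vLLM or Ollama). The base_url, api_key,
# and model name are assumptions about your local setup, and get_weather is
# a hypothetical tool for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss-20b",  # assumed local model name
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```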
GLM-4.7 (Thinking) Leads Open-Source Rankings
GLM-4.7 has quietly taken the top spot in open-source model rankings this month. The numbers are hard to argue with:
- 89% on LiveCodeBench — matching or exceeding GPT-5 on coding
- 95% on reasoning benchmarks
- Completely free to download and use
Combined with Qwen3-Next (1T+ parameters via MoE, 92.3% on AIME25, Apache 2.0 license) and DeepSeek V3.2, the open-source ecosystem now has three frontier-class model families competing head-to-head with proprietary offerings.
⚡ Quick Hits
- NVIDIA PersonaPlex-7B — A speech-to-speech model that generates spoken English responses directly from spoken English input. Interesting for real-time voice assistants.
- SmolDocling (256M params) — A compact vision-language model that performs end-to-end document conversion, turning scanned PDFs into structured text at a fraction of the compute cost (see the sketch after this list).
- China Open Source Highlights collection — Hugging Face's curated January 2026 collection. The top-liked models on Hugging Face are no longer majority US-developed. The gap between Chinese and Western frontier models has shrunk from months to weeks.
- One year since the DeepSeek crash — On January 27, 2025, the S&P 500 shed $750B and Nvidia alone lost $590B after DeepSeek R1 launched. US hyperscalers have since increased AI infrastructure spending to $600B+. The lesson: efficient models didn't kill demand — they expanded it.
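On the SmolDocling item: the model comes out of the same IBM research effort as the open-source docling library, so conversion is only a few lines. A minimal sketch; whether your installed docling version routes pages through SmolDocling by default is an assumption, but the converter API shown is standard docling usage:

```python
# Sketch: PDF-to-structured-text conversion with the docling library
# (pip install docling), which comes from the same IBM research group as
# SmolDocling. Whether your docling version uses SmolDocling by default
# is an assumption; the converter API itself is standard.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("scanned_report.pdf")  # local path or URL
print(result.document.export_to_markdown())
```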
What We're Watching Next Week
DeepSeek V4's February 17 launch date is approaching fast. Anthropic is expected to release Claude 5 in "early 2026" — February or March seems likely. And with OpenAI's open-weight models now in the wild, expect the local AI community to start publishing benchmarks and quantizations on r/LocalLLaMA within days.
About FreeVoice Reader
FreeVoice Reader is a privacy-first voice AI suite for Mac. It runs 100% locally on Apple Silicon, offering:
- Lightning-fast dictation powered by Parakeet and Whisper models
- Natural text-to-speech with 9 Kokoro voices
- Voice cloning from short audio samples
- Meeting transcription with speaker identification
No cloud, no subscriptions, no data collection. Your voice never leaves your device.
Transparency Notice: This article was written by AI, reviewed by humans. We fact-check all content for accuracy and ensure it provides genuine value to our readers.