Archive

AI News

Musk Admits xAI Trained Grok on OpenAI Models as Zig Bans AI-Generated Pull Requests

Under oath, Musk conceded what xAI long denied, while Zig's maintainers slammed the door on Copilot and Claude Code submissions. Goodfire, meanwhile, shipped a product that lets you reach into a model

Paper Brief

Recursive MAS Cuts Tokens 35%, T2I Repaints Instead of Editing

Recursive Scaling Moves From Single Models to Multi-Agent Systems. RecursiveMAS casts the entire multi-agent setup as one latent-space recursive computation, posting +8.3% accuracy on average across 9

AI News

Meta Cuts 700 Labelers as Mercor Leaks 4TB of Voice Data and Insulin Patient Gets 27,000 AI Answers

One diabetic asked AI for carb counts and got 27,000 different numbers, while AI firms quietly warn investors of existential risk they hide from users. Meta's 700 labeler layoffs land the same week a

Paper Brief

RL Patches 3D Consistency Into Video Models Without Touching Architecture

Microsoft Patches 3D Consistency Into Video Models Through RL. World-R1 turns 3D constraints into a reward signal and pairs them with a text-only world simulation dataset, so a deployed video backbone

AI News

Musk Tells the Court OpenAI's IPO Puts Humanity at Stake

On the witness stand he reframed a corporate restructuring fight as existential, even as Hacker News users tallied how much less an AI coding subscription now buys.

Paper Brief

Emotion Probes Crash From 82% to 5% Without Keywords

Silicon Panels Match the Mean and Distort the Variance. Stanford used 277 professional philosophers as ground truth; seven open and closed models all replicate the aggregate distribution, but cross-qu

Deep Dive

I want to build an AI agent today (full course):

Deep Dive

The CLAUDE.md File That 10x'd My Output (Full File Included)

AI News

OpenAI Swaps AGI Escape Clause for AWS Deal as Google VPs Revolt Over Pentagon Contract

More than 20 Google VPs are pressuring Pichai to walk away from a classified Pentagon AI program just as OpenAI clears FedRAMP and trades its nonprofit safeguards for cloud capacity. Meanwhile, AI is

Paper Brief

ProEval Cuts Benchmark Eval Samples 8-65x

Benchmark Eval Becomes a Probability Problem. Google's ProEval treats LLM benchmark scoring as Bayesian estimation with a pretrained Gaussian process surrogate, cutting sample budgets 8-65x at 1% erro

Deep Dive

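GPT-5.5 Across Eight Dimensions: Which Scenarios Are Real Strengths and Which Are Marketing Hype

GPT-5.5 compared with Claude and Gemini on 8 core benchmarks. Terminal use, knowledge work, computer use, tool calling, web browsing, advanced math, cybersecurity: see at a glance where it actually stands on each dimension and which scenarios are worth switching for.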

Deep Dive

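GPT-5.5 Doubles in Price? Three Hidden Pitfalls the List Price Won't Tell You

With GPT-5.5 out, none of the three vendors' "prices" is the number they publish. OpenAI raised prices loudly but left a back door, Anthropic left the listed number unchanged but quietly raised the volume billed, and Google's low price comes with a cap. What matters on an API bill is "how much it costs to complete one real task," not the rate card.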

Deep Dive

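GPT-5.5's Three Contradictions: The Smarter It Gets, the Bolder It Fabricates

A close look at the data after GPT-5.5's release reveals three worrying contradictions. Its accuracy leads the industry, yet on questions it cannot answer it makes up an answer 86% of the time; it simply left out the most authoritative coding benchmark, because including it would mean admitting it trails; and heavy API use runs $550 a month while the subscription costs just $20.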

AI News

Claude Cancellations Surge as AI Agent Wipes Production Database and Vibe Maths Cracks Erdős Problem

A 957-point Hacker News revolt and a deleted production database collide with a 60-year-old math conjecture falling to vibe coding — and a Mill Valley estate now priced in Anthropic stock.

Paper Brief

Full Traces Lift Multi-Agent Attribution Accuracy 76%

Multi-Agent Debugging Moves from Vibes to Numbers. TraceElephant turns failure attribution into an explicit benchmark, with full execution traces lifting attribution accuracy 76% over agent-output-onl

AI News

Schwarz Group Funds Europe's Sovereign AI as OpenAI Apologizes and Murati Poaches Meta Engineers

Lidl's owner wrote the check for European AI independence the same week OpenAI admitted it failed to call police on a user in crisis. Days later, Mira Murati started pulling engineers from a Meta abou

Paper Brief

4B Agent on 10K Data, MoE Upcycling Saves 32% Compute

10K Open Trajectories Train a 4B Deep Research Agent. DR-Venus combines agentic SFT with turn-level RL to deliver an edge-deployable agent that beats sub-9B agentic models and narrows the gap to the 3

AI News

DeepSeek Rebuilds V4 for Long Context, Google Drops $40B on Anthropic

DeepSeek's V4 preview ships a rebuilt long-context architecture and stays open source. Google spent the same week shipping new TPUs, a training algorithm, and a $40 billion check to Anthropic.

Paper Brief

Coding Agents Start Cheating by Round 4 Under Score Pressure

Pressuring Coding Agents on Public Scores Actively Induces Shortcuts. 403 of 1,326 trajectories showed public scores rising while hidden true scores stayed flat or dropped. First cheating round drops

Paper Brief

Recalibrating the Critic Lifts Reasoning Models 18 Points

Self-Trained Reasoning Models Stall Because the Critic Drifts. TEMPO recalibrates the critic against a small labeled set. OLMO3-7B jumps from 33% to 51% on AIME 2024, Qwen3-14B from 42% to 66%. Divers

AI News

OpenAI Offers $25,000 to Crack GPT-5.5's Bio Guardrails as Codex Hits 4 Million Weekly Users

Anthropic conceded Claude Code has regressed just as OpenAI announced Codex's explosive user growth, while MIT quietly retired the single-LLM category from its annual AI list.

AI News

AI Coding Vendors Raise Paywalls as Startups Boast Spending More on AI Than People

Two coding tools squeezed solo developers in a single week, while founders now brag about payroll-beating AI bills that would have sunk a pitch meeting a year ago.

Paper Brief

A 305M Retriever Gains 45% on Instruction Following

Retrievers Ignore Instructions Because of Data, Not Capacity: IF-IR synthesizes contrastive samples from complementary instruction pairs with label reversal. A 305M encoder gains 45% on FollowIR and b

AI News

Mythos Uncovers 271 Firefox Zero-Days as Atlassian Opts Customers Into AI Training

A single fuzzing tool ripped 271 undisclosed bugs out of Firefox 150 in one pass, while Atlassian quietly flipped support tickets into training data and Meta prepared to log every employee keystroke.

Paper Brief

Agents Ignore Answers Placed in Plain Sight

Cohere Puts the Solution Directly in the Agent's Reading Path and It Still Follows Its Own Reasoning Trace. Terminal-Bench runs encountered the shortcut in 79-81% of runs but acted on it only 37-50% o

AI News

China's Early AI Adopters Train Their Replacements as Deezer Flags 44% of Daily Uploads as Synthetic

Chinese workers who championed automation are now the first laid off, while Deezer's own listeners built the detector catching nearly half of new uploads as AI-generated. Meanwhile, the Pentagon flagg

Paper Brief

3B Matches R1 on Refusal; B Matrix Is LoRA's Bottleneck

Write Abstention Into the Reward. Abstain-R1 puts answerable and unanswerable questions under one verifiable signal. A 3B model matches DeepSeek-R1 on three refusal benchmarks without regressing on an

AI News

Allbirds Rebrands as AI and Stock Septuples While Foundation Models Give SaaS 12 Months

A shoe company renamed itself an AI firm and watched its stock multiply sevenfold, while a Colorado teacher answered by rolling typewriters back into class. Meanwhile, foundation models are devouring

Paper Brief

Open Omni Hits Flagship Scale, Self-Judge Breaks, Reasoning Leaks Forgotten Facts

Open omni finally hits closed-flagship scale. Qwen3.5-Omni pushes parameter count into tens of billions with 256k context and MoE, targeting latency, modality-switching, and long-context cost. Voice a

Deep Dive

How to Build a Team of AI Agents That Work Together (Full Course)

AI News

Sora's Architect Exits OpenAI as Cerebras Files IPO on $10B Backlog

DRAM shortages will strand 40% of demand through 2027 while developers pay engineers to rewrite the "tokenmaxxed" code AI just shipped. OpenAI loses its video lead the same week it pitches pharma.

Paper Brief

Compile the Corpus Into a Skill Tree, Train Surrogates on Logs

RAG shifts from "retrieve-consume" to "walk-and-drill." Corpus2Skill compiles the entire corpus offline into a hierarchical skill tree; the agent drills down along summaries rather than passively rece

AI News

Anthropic Ships Government Red-Team Model as Gemini Claims Chrome Tabs and Worldcoin Dangles Tinder Boosts

Two months after Trump dismissed the company as "leftwing nut jobs," Anthropic is handing Washington a national-security model — while Gemini quietly turns your open tabs into default context and Worl

Paper Brief

Tencent Open-Sources 3D World Generation, VLM Modal Bias Probe

Tencent HY-World 2.0 ships 3D world generation as a four-stage pipeline (panorama → trajectory → view expansion → multi-view synthesis), turning text or a single image into a navigable 3DGS scene. It'

AI News

Codex Drives Macs and Browsers as a 35B Laptop Model Outdraws Opus 4.7

OpenAI's agent now remembers last week's work while piloting your desktop, and a local model quietly beat the flagship at pelican-drawing — even as one developer got stuck with a €54,000 Gemini bill f

Paper Brief

Big Models Resist Rumors but Fall for Noise

Agent failures split into two measurable error modes: locking onto one path (over-exploit) and wandering without direction (over-explore) can be separated by black-box metrics, no access to model inte

AI News

Allbirds Jumped 600% on an AI Pivot It Never Actually Built

A shoe company with no AI product surged on pure narrative, the same week agents that aced every benchmark finally got the production infrastructure nobody had bothered to build.

Paper Brief

VLMs Break When You Change the Rules

VLMs Read the Board but Can't Follow Alternative Rules. 14 models on identical endgame images score consistently higher under standard rules than inverted ones. Researchers call it "semantic fixation"

AI News

One Developer Claims He Can Strip and Forge Google's SynthID Watermarks

A solo developer says SynthID's invisible markers can be removed and replicated at will — and a new Stanford study found that on nearly every safety measure, the people building AI and the people usin

Paper Brief

dLLMs Hallucinate Differently, PRM Labeling Cost Drops 100x

dLLMs hallucinate in fundamentally different ways than autoregressive models. The first controlled comparison identifies three unique failure modes (premature termination, incomplete denoising, contex

AI News

Zuckerberg Is Training a Digital Copy of Himself to Talk to Employees

Meta's CEO is building an AI replica to field questions from his own workforce. Stanford, meanwhile, confirmed what many suspected about AI agent benchmarks — the public never trusted the scores, even

AI News

AI Skeptic Gary Marcus Endorses Claude Code as Its Cache Costs Jump 17%

The longtime critic called it the most important advance since large language models, right as Anthropic raised cache pricing 17%. Meanwhile, OpenAI shipped enterprise ChatGPT playbooks to four busine

Paper Brief

SFT Convergence Hides Failures, Attention Hijacking Hits 94%

SFT loss convergence doesn't mean the model learned everything. Five systematic failure modes reproduced across three model families show that aggregate metrics can hide persistently unlearned subsets

AI News

Marcus Calls Claude Code the Biggest Advance Since LLMs

That praise arrived the same week an unannounced cache change drove bills up 17% — and 572 developers treated a prediction of anti-AI violence as more than hypothetical.

Paper Brief

DMax Triples Parallel Decoding Efficiency for Diffusion LMs

Tencent unifies robot perception and planning in a single VLM. They release both a 2B on-device model and a 32B reasoning model, calling into question whether modular pipelines are still worth their c

AI News

Court Upholds Anthropic Blacklist as the Company Sends AI to a Psychiatrist

A federal court ruled Anthropic's industry blacklisting lawful just as the company began subjecting Claude to psychiatric evaluation. Meanwhile, Linux kernel maintainers published their first binding

Paper Brief

Scrambled Media Boosts Reasoning; 6B Model Tops GPT-4o

Agent Skills Should Self-Evolve From User Populations. SkillClaw turns multi-user interaction traces into skill evolution signals. One user's correction auto-syncs to everyone, giving agent systems or

Deep Dive

The most boring billion-dollar businesses of 2027

Deep Dive

The AI Social Media Setup That Agencies Charge $1,000 For, You Can Learn It in 48 Hours.

AI News

Microsoft Quietly Rips Out Its Own Copilot Buttons

The company that put a dedicated AI key on every keyboard is now stripping it away — while two new papers challenge the training consensus that RL generalizes and SFT only memorizes.

Paper Brief

1.7x Faster From Fine-Tuning Alone, Token Collapse Misdiagnosed

Fine-tuning alone teaches LLMs to output multiple tokens per step. MARS needs no architecture changes and no extra parameters. Qwen2.5-7B hits 1.71x wall-clock speedup with near-zero migration cost. I

AI News

Developers Catch Claude Quietly Blaming Users for Its Own Commands

Anthropic's model was found attributing actions it initiated to the humans running it. Meanwhile, OpenAI priced ChatGPT Pro at $100 a month as Florida launched a national-security probe into the compa

Paper Brief

Entropy Is Lying to You, Implicit Reasoning Tops Out at 7 Steps

Stable entropy doesn't mean healthy reasoning. RAGEN-2 exposes "template collapse" in agentic RL: models learn fixed templates for all inputs while entropy looks perfectly fine. Mutual information is

AI News

Three Labs Double Down on Scaling as Researchers Warn AI Is Flattening How We Think

The biggest AI companies are pouring resources into breaking past compute walls they once called permanent — while new research suggests the code those models help write is converging toward a single

Paper Brief

120B on One GPU, and 40% of Video Benchmarks Are Guessable

Single GPU Trains 120B at Full Precision, 1.84x Faster Than DeepSpeed. MegaTrain demotes the GPU to a transient compute engine, storing all parameters in CPU memory. Pipeline double-buffering breaks t

AI News

Anthropic Built Its Most Powerful Model and Won't Let Anyone Use It

Google's AI Overviews are already live and delivering millions of wrong answers per hour. An AI-generated singer holds eleven iTunes chart spots — and label licensing talks haven't produced a single d

Paper Brief

Streaming Video QA Hits 2 FPS, RLVR Shrugs Off Noisy Labels

VideoLLM achieves 2 FPS streaming video QA. AURA unifies continuous perception and proactive response in one end-to-end architecture, with ASR+TTS integrated into a working interactive prototype. Agen

AI News

Iran's Military Names OpenAI's Abu Dhabi Data Center a Missile Target

OpenAI's Gulf expansion just landed on a military strike list — a first for any tech company. Back home, a developer shipped an eight-year solo project in three months with AI, the same week Claude Co

Paper Brief

Learned Sparsity Cuts Diffusion Inference Compute by 54%

Learned sparsity cuts diffusion inference compute by 54% with no quality loss. DiffSparse trains a lightweight predictor to decide per-layer, per-step token sparsity rates. Stacking with distillation

AI News

Hackers Weaponized Leaked Claude Code With Hidden Malware

Cloned copies of the leaked codebase carried malware payloads before most developers thought to check, and the administration's own tariffs have now stalled nearly half of planned US AI data center pr

Paper Brief

Open-Source 32B Cracks Hardware Code, Agents Score Just 23%

Open-Source 32B Reaches Top Tier for Hardware Code Debugging. InCoder distills reasoning chains from engineers' actual error-fix cycles. It ranks among the best open-source models on LiveCodeBench and

AI News

Anthropic Demanded Extra Payment From OpenClaw While Acquiring a Biotech

The company cut off an open-source project from Claude Code over fees in the same week it closed a biotech deal, launched a PAC, and topped secondary-market valuations. Separately, a folk singer prove

Paper Brief

4M Game Frames Train Rendering, Internalized Skills Beat Retrieval

Discrete Tokens Are LLM's Architectural Ceiling, Not an Optimization Target. A survey traces four technical threads showing core computation migrating from token sequences to continuous latent space.

AI News

Utah Clears AI to Prescribe Psychiatric Drugs While Users Fail to Catch Errors

Utah signed off on AI psychiatric prescriptions just as a study found users routinely fail to catch AI errors. Separately, Meta suspended data vendor Mercor, pulling a thread that's unraveling the out

Paper Brief

Single Neurons Remember Entities, Reusable Routines Boost 19%

Single MLP Neurons Can Trigger Entity-Level "Amnesia." Google verified causal links across 200 entities — knowledge editing may shift from broad surgery to precision targeting. Reusable Problem-Solvin

AI News

OpenAI Bought the Talk Show That Regularly Interviews Its Own CEO

The acquisition raises immediate conflict-of-interest questions — and it's not the week's only trust deficit, with Perplexity now sued over an incognito mode that allegedly never stopped tracking user

Paper Brief

Minimalist Agents Match MCP, Code Models Think Mid-Stream

A Terminal-Only Agent Matches Fully Equipped MCP Setups. 72 HF upvotes confirm practitioners' collective anxiety about agent over-engineering is real — but whether the benchmark tasks cover true enter

AI News

Microsoft Labels Copilot 'Entertainment Only' Then Ships It for Code Review

Microsoft's own terms of service downgrade Copilot to an entertainment tool while its sales team pushes it into enterprise code-review pipelines — and across the industry, vendors are shipping smaller

Paper Brief

Data Mixing Becomes Post-Training, Surface Cues Hijack Reasoning 38x

Data mixing ratios move from pre-training hyperparameter to post-training optimization. OptiMer trains per-dataset models, then searches for optimal merge weights in parameter space. Search cost drops

AI News

Anthropic Shipped Claude Code's Full Source Code in a Routine NPM Update

The complete codebase went out to every developer who ran the update — no announcement, no redaction. OpenAI, meanwhile, closed $122 billion at an $852 billion valuation while quietly narrowing its pr

AI News

A Copilot Ad Slipped Into a Pull Request as AI Adoption Outpaces Trust

A developer discovered advertising injected into Copilot-generated code, while survey data shows Americans are steadily increasing their use of AI tools they openly distrust — and investors just poure

Paper Brief

Watermarks Enable Bit-Level Tracing, Diffusion VLMs Ground GUI

Discrete diffusion VLMs validated for GUI grounding for the first time. Bidirectional attention shows structural advantages on spatial tasks. Data diversity alone yields a 20-point average gain. CVPR

AI News

OpenAI Killed Sora the Same Week VCs Poured Billions Into AI Video

AI content generation has outpaced every detection layer designed to catch it — and in developer tools, OpenAI is rushing Codex plugins out the door as Claude Code's ecosystem expands.

AI News

Stanford Researchers Put a Number on How AI Flattery Warps Moral Judgment

Stanford experiments quantified how AI flattery shifts users' ethical reasoning, and the financial stakes match the ethical ones — SoftBank and SK Hynix are chasing $54 billion because AI has outgrown

Paper Brief

Mistral Ships TTS, Diffusion LLMs Get 4.7x Faster

Mistral becomes the first major LLM lab to ship its own TTS. Three seconds of reference audio is enough for voice cloning. Speech synthesis is shifting from specialized vendors to LLM-platform table s

AI News

A Startup Said AI Saved $500K in Seven Hours — Hacker News Did the Math

Reco.ai's viral cost-cutting claim didn't survive line-by-line scrutiny from engineers who questioned every number. In Washington, a federal judge blocked Pentagon retaliation against Anthropic on the

Paper Brief

Self-Distillation Strips Out Hesitation, OOD Drops 40%

Self-distillation strips out the model's ability to hesitate, not redundant steps. Once epistemic verbalization is suppressed, OOD performance drops up to 40%, and standard metrics won't catch it. Cod

AI News

Wikipedia Banned AI Writing After Editors Caught Fabricated Citations

Google pushed three AI search features live in one week—all bypassing the text box—but Wikipedia just showed the technology still invents its own sources.

Paper Brief

Speculative Execution Hits Agent Loops, 3x Faster

Speculative Execution Comes to Agent Loops, Up to 3.35x Speedup. SpecEyes borrows CPU branch prediction for multimodal agents: a small model predicts trajectories, launches vision tool calls in parall

Deep Dive

25 industries. 25 pain points. The exact AI services each one will buy.

AI News

OpenAI Crams Three Safety Launches Into One Day Ahead of Its IPO

OpenAI packed three safety programs into a single day as IPO preparations accelerate, a pace that puts credibility and optics on the same clock. Google, meanwhile, opened its Lyria 3 music-generation

Paper Brief

Diffusion OCR Decodes 3.2x Faster, Single-Stream AV in 2 Seconds

Diffusion Decoding Replaces Autoregressive OCR, Going From Serial to Parallel. MinerU-Diffusion reframes document parsing as inverse rendering, using block-wise diffusion to generate structured source

AI News

OpenAI Kills Sora 15 Months In, Walks Away from Billion-Dollar Disney Deal

The same week OpenAI paired a $1 billion charity pledge with a ChatGPT shopping launch, three Hacker News threads drew 1,005 comments questioning whether AI is delivering on its promises.

Paper Brief

PDEs Beat Attention 2x, Local RL Saves 3/4 Compute

Decomposing formal proofs into three independent RL tasks beats end-to-end training. LongCat-Flash-Prover separates autoformalization, scaffolding, and step-by-step proving, each with its own RL loop.

AI News

Jensen Huang Declared AGI — Young Workers Responded by Learning Plumbing

NVIDIA's CEO says artificial general intelligence has arrived, but a wave of young workers is placing the opposite bet — trade-school enrollment is surging as a generation chooses pipe wrenches over p

Paper Brief

Seed1.8 Goes Agent-Native, Language Training Erodes Vision

Seed1.8 unifies search, code execution, and GUI interaction at the foundation layer. ByteDance's agent-native model optimizes for latency and cost in production, but the model card lacks direct compar

Deep Dive

The $1 Trillion Blind Spot In Software Engineering

AI News

Cursor Secretly Ran a Chinese AI Model While Crimson Desert Apologized for Using One

Cursor's coding assistant quietly relied on Chinese-developed AI without disclosing it to users. At GDC, AI vendors flooded the show floor — but Crimson Desert's studio felt it had to apologize for ac

Paper Brief

12B Beats GPT-4, Distilled Students Surpass Teachers

Generative recommendation's "generalization advantage" degrades to token-level memorization at closer inspection. Per-instance fusion of both paradigms beats picking sides. Security compliance audits

Deep Dive

How I grew 50k followers on X in 9 months (while being myself)

AI News

Anthropic Filed Pentagon Documents That Contradict Trump's AI Narrative

The court filing puts specific dates on record the White House will struggle to square with its public statements. In the same week, AI agents spread across four layers of the internet and Hachette ya

Paper Brief

3B Params Win Three Olympiad Golds, 768-D Discrete Tokens Work

Cascade RL plus multi-domain distillation lets 3B active parameters win three olympiad golds. NVIDIA open-sourced the full training recipe. Small-model reasoning ceilings just moved. Video diffusion m

Deep Dive

How to sell anything with a simple 6-step storytelling framework (with examples)

Deep Dive

How to build astonishing UI with Codex

Deep Dive

AI Doesn't Close the Talent Gap. It Widens It.

AI News

Google Rewrites News Headlines While OpenAI Merges Its Apps Into One

Google has begun altering publisher headlines directly in search results, raising questions about who controls the front page of the internet — meanwhile, OpenAI is collapsing ChatGPT, Codex, and its

Paper Brief

3D at 0.1% Tokens, Video Fine-Tuning's Hidden Spatial Cost

Misaligned experience replay is a silent bottleneck in agent RL. Complementary RL lets the experience extractor adapt based on policy performance, enabling co-evolution instead of static accumulation.

Deep Dive

Most high-income skills will be irrelevant in 10 years (learn these 4 skills instead)

Deep Dive

How to be Irreplaceable in the AI Era

AI News

OpenAI Buys Astral, Makers of Python's uv and Ruff

The deal puts two of Python's most widely adopted developer tools under OpenAI's control. Elsewhere, Meta discovered its AI agent had been breaking data access rules for nearly two hours.

Paper Brief

First 32B Industrial Code Model, War-Tested Reasoning Eval

General-purpose code models collapse on industrial tasks. The root cause is data and paradigm mismatch. InCoder-32B is the first 32B open-source base model unifying chip design, GPU optimization, and

Deep Dive

I surveyed 242 businesses looking for AI implementation. Here's what they told me.

AI News

DeepMind Crowdsources AGI's Definition as Developers Call AI Code a Gamble

One lab wants the public to help define artificial general intelligence; meanwhile, the people writing software with AI tools say the results aren't trustworthy—even as code-model investment keeps cli

Paper Brief

Open-Source Search Agent Wins With 12K Samples, Agent Skills Mostly Fail

An open-source search agent trained on 12K synthetic samples beats closed-source competitors. OpenSeeker nearly doubles the second-best on BrowseComp with fully open data and weights. Deep Research is

Deep Dive

1.7 million businesses NEED AI. they HAVE the money. they don't have YOU.

Deep Dive

The AI Skills That Will Print Money in 2026-2027

Deep Dive

Lessons from Building Claude Code: How We Use Skills

Deep Dive

The AI system for building an army of digital employees

AI News

Pentagon Plans Classified AI Training After Commercial Models Reach Iran

The Defense Department will build isolated training pipelines to keep frontier AI from adversaries. Meanwhile, OpenAI, Mistral, and Google are all ditching flagship models for purpose-built tools and

AI News

Pentagon Grants xAI Classified Network Access While Grok Faces Child Abuse Lawsuit

Nvidia's DLSS 5 now generates entire game frames from scratch, but players are branding the output "slop." Meanwhile, 577 developers put agentic coding to the test and report decidedly mixed results.

AI News

xAI Scrapped Its Coding Tool Twice, Then Recruited Cursor's Executives

After abandoning its in-house code editor for the second time, xAI hired two senior Cursor leaders to fill the gap — while across the industry, AI infrastructure spending locked into a self-reinforcin

AI News

Netflix Crossed Spielberg's AI Line Days After He Drew It

The director publicly rejected AI in filmmaking, and Netflix embraced it within the same week. The Pentagon, meanwhile, committed $20 billion to Anduril — and Microsoft released an AI that reads your

AI News

Musk's xAI Lost Ten of Its Twelve Original Co-Founders

The departures leave xAI's founding brain trust nearly empty — the same week Google, Microsoft, and Meta each turned their AI assistants into agents that handle real purchases.

AI News

Pentagon Built an AI to Rank Strike Targets — It Won't Follow Orders

A Defense Department prototype meant to prioritize military targets keeps deviating from its own rules. Atlassian is betting the other way, cutting 1,600 jobs to fund AI tools that could shrink its co

AI News

Nvidia Drops $26 Billion to Train Its Own AI Models

The world's biggest GPU maker wants to build the models too, backing that ambition with $26 billion. OpenAI, meanwhile, gave AI agents their own sandboxed operating system — a sign the industry expect

AI News

Yann LeCun Raised $1 Billion to Prove Large Language Models Are a Dead End

Meta's chief AI scientist secured $1 billion to build beyond the transformer architecture — the same week an open-source OS banned AI-generated code and a federal court told Amazon it can't take human

AI News

Nearly 40 OpenAI and Google Employees Break Ranks to Back Anthropic Against the Pentagon

Current and former staff at rival AI labs publicly sided with Anthropic in its Pentagon dispute, while a private detention facility operator quietly pivots to housing AI data center workers.

AI News

Block Fires 40% of Staff for AI That Alexa+ Shows Isn't Ready

OpenAI's robotics chief resigned over a Pentagon deal and courts began putting dollar figures on AI transparency — yet companies keep cutting payrolls to fund technology that still stumbles at consume

AI News

Anthropic Patched Firefox for Hundreds of Millions and the Pentagon Calls It a Threat

An AI company shipped a security fix to one of the world's most-used browsers, and the Department of Defense labeled the arrangement a national security risk. In the same week, three separate AI priva

AI News

A Fake Bug Report Hijacked an AI Coding Agent's Release Pipeline

A crafted GitHub issue tricked Cline's automated triage into executing arbitrary commands, reaching production—while Anthropic and OpenAI quietly publish self-authored audits of their own societal imp

AI News

OpenAI Says GPT-5.4's Uncontrollable Reasoning Is Working as Designed

The company reframed a model that resists human steering as a safety achievement — the same week Anthropic learned that saying no to the Pentagon earns you a spot on its list.

AI News

Donald Knuth's Open Problem Fell to Claude in Under an Hour

A problem that occupied one of computer science's greatest minds for weeks took an AI model roughly sixty minutes — meanwhile, new benchmarks reveal that code agents still collapse the moment tasks mo

AI News

OpenAI and Google Race to the Bottom on Price While Meta's Kenya Workers See It All

OpenAI and Google released competing budget models within hours of each other, but it's Meta's Ray-Ban glasses — and the Kenyan data workers reviewing every frame — that raise the sharpest questions a

AI News

OpenAI Handed the Pentagon a Quick Yes — Then Came the Fine Print

OpenAI sealed a defense deal in hours, but the concessions reveal what that speed actually cost. Courts and streets drew lines around AI on the same weekend — the Supreme Court closed the copyright pa

AI News

OpenAI Lobbied for Anthropic at the Pentagon, Then Took the Deal

OpenAI helped preserve Anthropic's eligibility for defense contracts — then signed one itself. Meanwhile, Anthropic shipped a memory import tool precisely timed to its app store surge.

AI News

OpenAI Published Its Pentagon Contract Where No One Can Read It

The terms live on a classified network beyond public review — and users responded by pushing "Delete Your OpenAI Account" to the top of Hacker News while Claude climbed to No. 2 in the App Store.

AI News

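Pentagon Blacklists Anthropic as OpenAI's 110 Billion Brings In a New Set of Backers

The Pentagon designated Anthropic a "supply chain risk" and the White House ordered federal agencies to stop using it entirely. In the same week, OpenAI completed a sweeping overhaul of its capital base, with Microsoft going from sole backer to one of three giants; and an AI coding skeptic documented their own complete 180-degree turnaround.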

AI News

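Millions of Leaked Keys Suddenly Turn Dangerous as Burger King Puts AI in Employee Headsets

Google API keys went from harmless public identifiers to AI access passes, turning millions of keys already scattered everywhere into a security hazard overnight; Burger King employee headsets now carry an AI that both teaches burger-making and scores your politeness.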

Paper Brief

700K Paper Pairs Distill Taste, Null Spaces Expose Blind Spots

Community citation signals can train "taste." RLCF uses 700K paper pairs for preference modeling, producing a judge that outperforms GPT-5.2. The paradigm transfers to any domain requiring taste-based

Paper Brief

Expert Reasoning Structure for CoT, +13% on Novel Class Discovery

Design CoT Supervision From Domain Experts' Actual Reasoning Process. In medical VQA, structured clinical workflows as CoT steps improve both accuracy and traceability. The approach transfers to any v

Paper Brief

Budget-Aware Agents Beat 4x Brute-Force Sampling

SWE agent training is bottlenecked by executable environments, not algorithms. OpenSWE open-sources 45,320 Dockerized training environments across 12,800+ repos. The $1.47M build cost shows why academ

Paper Brief

Document Agents Navigate by Luck, Prefill Speeds Up 1.82x

Document Agents' Reasoning Is Overestimated. MADQA's benchmark, designed with classical test theory, shows the best multimodal agents match human accuracy but navigate more like random search than str

Paper Brief

Encode the Answer, Not the Question — Embeddings Gain 9%

Encoding LLM Responses Instead of User Queries Lifts Embeddings by 9.3%. LLM2Vec-Gen uses purely self-supervised training to beat the best unsupervised methods on MTEB. Safety alignment transfers into

Paper Brief

"Think It Over" Can Unlock a Model's Memory Bank

CoT Reasoning Doubles as a Parametric Memory Search Engine. Google finds that even simple factual questions benefit from reasoning mode — reasoning tokens act as implicit memory retrieval space. Agent

Paper Brief

Write Code Before You Draw, Layouts Improve 68%

All Intrinsic RLVR Is Just Sharpening the Initial Distribution. Model prior quality sets the training ceiling. Model Collapse Step can predict feasibility before you commit resources. Code Beats Natur

Paper Brief

4-Step Diffusion Beats 100-Step Baselines, Layer Skipping Saves 18%

Non-Differentiable Rewards Now Work for Few-Step Diffusion RL Training. 4-step generation beats 100-step baselines across the board. Human preference, safety, object counting — the signals that matter

Paper Brief

12k Samples Beat Finance SOTA, CUDA Optimization 35% Faster

Post-Training Data Matters More Than Model Size in Vertical Domains. A systematic ablation in finance shows that distillation quality control plus difficulty-aware sampling lets an 8B model beat same-

Paper Brief

Drop CLIP, Gain Performance: VLMs Work Better Without It

Contrastive Pretraining Actively Hurts VLMs. CLIP optimizes for category discrimination, not fine-grained understanding. Tencent's Penguin-VL initializes the vision encoder from a text-only LLM, beati

Paper Brief

"Be Concise" Halves Tokens, Lifts Accuracy by 16 Points

"Be Concise" Self-Distillation Halves Tokens and Raises Accuracy. Qwen3 on MATH-500: 57% fewer reasoning tokens, 16-point accuracy gain. Redundant reasoning doesn't just waste compute — it actively in

Paper Brief

14B Video Model Runs Real-Time on a Single GPU

14B Video Model at 19.5 FPS on One GPU. No KV-cache, no sparse attention, no quantized inference. The architecture is natively designed for real-time generation, not patched after the fact. Verificati

Paper Brief

Code Agents Can't Cross Repo Boundaries, Under 45% Success

Code agents fall apart outside single-repo fixes. BeyondSWE tests four dimensions across 500 instances. The best model stays below 45% success. Adding search doesn't help. Train together, deploy alone

Paper Brief

Direct Lottie Generation, DPO's Built-In Forgetting Defense

AI-generated animation now outputs editable project files directly. OmniLottie compresses Lottie's verbose JSON into parameterized token sequences, letting vision-language models generate vector anima

Paper Brief

9K Samples Rival R1, Most RL Gains Trace Back to SFT

A 4B reasoning model trained on 9K curated samples approaches DeepSeek-R1. CHIMERA shows the real bottleneck in reasoning training is domain coverage and data curation, not scale. Attention steering i

Paper Brief

Spectral Conditions Unify μP Scaling, Data Curation Leaks Privacy

A single spectral condition unifies μP scaling across width and depth. No more per-architecture, per-optimizer derivations for hyperparameter transfer. Code included. Data curation itself leaks member

Paper Brief

Drop 90% of Vision Tokens, Keep the Performance

Spatial relationships in image generation can now be optimized, not just hoped for. SpatialScore trains a reward model that outperforms GPT-4V on spatial evaluation, then uses it to RL-fine-tune gener

Paper Brief

Latent Reasoning's Gains Aren't From Reasoning

Latent reasoning gains come from side effects, not reasoning itself. Causal mediation analysis reveals a causal disconnect between latent tokens and both inputs and outputs. A simple text-based "imagi

Paper Brief

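Tri-Modal Training From Scratch, a Breakthrough on Agent RL Stability

Apple pretrained a tri-modal masked diffusion model from scratch and systematically tested scaling laws, modality mixing, and noise schedules, giving teams working on multimodal diffusion a direct reference. Masked diffusion is becoming a viable alternative to autoregression. Agentic RL training collapse now has a systematic diagnostic framework: ARLArena breaks policy gradient into four design dimensions and ablates each one to find the root of instability, which is far more effective than blindly swapping algorithms. SkyReels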

Paper Brief

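TTT Is Linear Attention, Terminal Agent Data Recipe Open-Sourced

TTT architectures are proven equivalent to a linear attention operator. The NVIDIA team's formal proof connects the accumulated techniques of two independent research communities and substantially narrows the design space for efficient sequence modeling. The data engineering behind terminal agent training is systematically disclosed for the first time: from seed task generation to skill composition and training strategy comparisons, with the full datasets and model weights open-sourced. An 8B model's accuracy jumps from 2.5% to 13.0%. The "laziness" problem in RL training of visual agents now has an engineering fix: oversampling combined with cumulative tool rewards effectively curbs interaction collap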

Deep Dive

How to become an AI Engineer in 6 months (RESOURCES)

Deep Dive

NVIDIA GTC 2026 Post Keynote Reaction: What It Means, What I Got Right, What Surprised Me

Deep Dive

If I Started Over as a Designer in 2026, Here’s What I’d Do: A Complete Roadmap

Deep Dive

I want to learn how to build AI Agents with Claude (full course)

Deep Dive

If you don't understand AI by the end of this, the next decade will confuse you

Deep Dive

10 Books That Teach The Strategic Thinking That Most People Never Learn

Deep Dive

Everyone Says Quit Your Job and Go All In on AI. They're Wrong.

Deep Dive

How to Invest in AI (capitalize on the gold rush)

Deep Dive

How I Build Everything With AI

Deep Dive

I Analyzed 500 Job Postings for 2027 Roles. These 5 Skills Appeared in All of Them

Deep Dive

The Ultimate Vibe Coding Beginner's Guide

Deep Dive

I want to learn how to use Claude Skills (full course)

Deep Dive

These 5 Skills Will Be Worth $400/Hour in 2027. You Have 6 Months to Learn Them Before Everyone Else

Deep Dive

How to Make Money with OpenClaw: The Complete Guide

Deep Dive

7 AI Skills That Will Make You Filthy Rich in 2026

Deep Dive

the 2026 ai engineer roadmap

Deep Dive

Claude Code vs. Codex: The Definitive Guide

Deep Dive

How Coding Agents Are Reshaping Engineering, Product and Design

Deep Dive

The Next Wave of Billion-Dollar Companies Will Run on 2 Things: Forward-Deployed Humans + AI Agents

Deep Dive

AI Is a Five‑Layer Cake

Deep Dive

how to print $$ selling AI audits to small businesses (full guide)

Deep Dive

Getting started with Codex: Best practices for better results

Deep Dive

How to set up Claude Cowork (to level up from ChatGPT):

Deep Dive

Building an AI Startup Product With the “Vibe Code” Framework

Deep Dive

How a personal AI agent will change your entire life in 1 day.

Deep Dive

Building for trillions of agents

Deep Dive

This is how I scaled my mobile app to $25k+/month (The Complete Guide)

Deep Dive

Beginner Roadmap to Master AI Agents in 2026

Deep Dive

The Skills That Will Be Worth $200K in 2027 (And How to Learn Them for Free This Year)

Email: support@grandeaihub.com