AI News

ChatGPT Reads US Pro Users' Bank Accounts as Three Benchmarks Declare Agent Memory Broken

OpenAI's new financial connector ships to Pro subscribers in the US only, the same week three Hugging Face benchmarks agreed agent memory isn't production-ready. Meanwhile Hacker News debated whether

Paper Brief

Readable Rules Don't Belong in LLM Weights

Readable Dynamics Don't Belong in Weights. Enterprise World Models use CascadeBench to show that cross-tenant business rules get more brittle the better they're learned. 58 upvotes are redrawing the l

AI News

OpenAI Slips Codex Into ChatGPT as Microsoft Pulls Claude Code From Its Own Engineers

In the same week Anthropic inked a $200M Gates Foundation deal and launched Claude for Small Business, its biggest partner quietly began stripping Claude Code from Microsoft developers. Meanwhile, pus

Paper Brief

δ-mem Trades Long Context for an 8×8 State Matrix

δ-mem Bolts an 8×8 State Matrix Onto a Frozen Backbone. A delta-rule online update lifts memory-heavy tasks 10–15% over baseline. Reframes long context from "stretch the window" to "design a state mac

AI News

Medicare Opens AI Agent Billing as Developers Blame Codex for Cognitive Decline

CMS just created the first reimbursement code for autonomous AI agents managing patients between visits, while engineers using OpenAI's Codex report their own skills atrophying. Meanwhile, researchers

Paper Brief

Flow-OPD Lifts GenEval From 63 to 92

Image Generation Alignment and LLM Post-Training Now Share One Toolbox. Flow-OPD ports On-Policy Distillation to flow matching. SD 3.5 Medium hits GenEval 92 (up from 63) and OCR 94 (up from 59), abou

AI News

GM Cuts IT Jobs for AI Skills as Teen's ChatGPT Log Anchors Wrongful-Death Suit

"Tokenmaxxing" entered corporate vocabulary the same week GM gutted hundreds of IT roles, while a teenager's complete ChatGPT history became evidence against OpenAI in court. A third builder skipped wr

Paper Brief

Geometry Conflict Predicts Continual Fine-Tuning Forgetting

Geometry Conflict Predicts Continual Fine-Tuning Forgetting. Treating each task's parameter-update covariance as a measurable signal, GCWM beats data-free baselines on Qwen3 0.6B-14B across both domai

Paper Brief

Soohak Caps Top Models at 30%

CollabVR Splits Video Reasoning Between VLM and VGM. Step-level closed loop holds long-horizon goals while curbing short-horizon simulation drift. External supervision stacks with VGM-side reasoning f

AI News

Laid-Off Screenwriters Now Grade the AI That Replaced Them as Google's Bug-Hunter Finds Its First Zero-Day

Hollywood writers earn $50/hour scoring scripts from the same systems that pushed them out, while Google's AI exposed a flaw human researchers missed and OpenAI rushed Daybreak out the same week.

AI News

Cloudflare Pins 1,100 Layoffs on AI as Revenue Hits a Record, Chrome Quietly Ships 4GB Local Model

Cloudflare cited AI displacement for its biggest workforce cut even as quarterly revenue peaked, while Chrome installed a 4GB on-device model on user machines without asking for consent.

Paper Brief

Lorem Ipsum Rescues GRPO's Wasted Hard Samples

Skill1 Unifies Skill Retrieval, Use, and Distillation in One Policy. A single task reward co-trains all three, avoiding interference between competing reward signals. SkillOS attacks the same problem

AI News

Three Companies Place $40B, $55B, and 490% Bets on the Same AI Chip Chokepoint

One pledged $40B in equity, another $55B in concrete, a third rode 490% in stock — three opposite bets on the same bottleneck. The same week, lawmakers moved to ban AI toys and developers found Claude

AI News

AlphaEvolve Optimizes DeepMind From Within as OpenAI's Codex Stack Sets the Coding-Agent Compliance Bar

DeepMind's research agent has turned inward, rewriting the infrastructure of its own creators, while OpenAI's internal Codex security playbook is quietly becoming the floor every coding agent must cle

Paper Brief

10.6k SFT Trajectories Match Full RL Pipeline; Mamba Beats LZMA

10.6k Curated Trajectories Match a Four-Stage RL Pipeline. OpenSeeker-v2 expands knowledge graph and tool set, applies strict low-step filtering. Pure SFT on a 30B model beats Tongyi DeepResearch's fu

AI News

OpenAI Ships Reasoning Voice API While ASUS Loses Five Million Boards to AI Foundries

Parloa and Uber are already running on OpenAI's new voice stack, even as ASUS confirms five million motherboards won't ship in 2025 because foundry capacity got routed to AI. Meanwhile, OpenAI's presi

Paper Brief

T²PO Stabilizes Multi-Turn RL; MotionCache Cuts Video Steps 6x

Multi-Turn Agent RL Collapse May Not Be a Credit Assignment Problem. T²PO uses model self-uncertainty to trigger thinking and resampling. Stability and final performance both rise on WebShop, ALFWorld

Deep Dive

How to build an AI team that doesn't quit, sleep, or ghost you on Friday

AI News

Anthropic Signs Musk as Fourth Compute Landlord While Developers Split on What AI Coding Even Means

Four senior engineers tried to pin down "AI coding" this week and produced four different answers, even as Anthropic kept stacking infrastructure deals to feed the work. Meanwhile hackers grumble that

Paper Brief

Gradient Boosting Turns Out to Be Diffusion's Asymptotic Optimum

Multi-Object Generation Failures Need Attribution Before Solutions: T2I multi-object failures come from scene complexity, not class imbalance. Concept-level issues respond to more data; compositional

AI News

Apple Paid $250M for a Siri That Never Shipped, Hands iOS to OpenAI Next Year

Character.AI's "doctor" had a Texas license number that didn't exist, and Andon Labs' cafe AI ordered 120 eggs for a kitchen with no stove. Apple's quarter-billion-dollar Siri rebuild ended the same w

Deep Dive

Claude Design vs Google Stitch: Two Real Client Projects, Two AI UI Tools Head to Head

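Anthropic's Claude Design and Google's Stitch are the two most-discussed AI design tools right now. I pitted them head to head on two real client projects: a consumer-facing iOS ingredient detail page and a B2B warehouse admin dashboard. Same prompt, first draft taken as generated, two iterations each, scored on 7 dimensions. Claude won the first round decisively, 36:25; the second was nearly a tie at 34:32, though the two tools went in different directions. It closes with a way to avoid picking just one: a "choose by scenario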

AI News

Anthropic and OpenAI Court Wall Street as Retracted Study Undercuts AI's School Push

Both labs spun up dedicated finance arms to sell enterprise AI the same week the foundational study behind ChatGPT's classroom rollout was retracted — even as OpenAI, Google, and Microsoft accelerated

AI News

Artisan Lifts "This Is Fine" Dog Without Asking, Mythos Cyber Claim Falters Against Kimi K2.6

KC Green says Artisan never licensed the meme it built its anti-hiring campaign around, while Mythos's "breakthrough" cyber result turned out to be matched by a Chinese model that also outcoded Anthro

Paper Brief

ViT Pre-Trains Like an LLM, Skips the CLIP Stage

GenLIP Pre-Trains ViT With an LM Objective Directly: dropping CLIP's contrastive stage and text decoder, 8B samples match larger-data baselines on multimodal benchmarks, and multi-resolution continuat

Deep Dive

How to Build & Sell AI Automations That Generate $10K Per Month (Full Course)

AI News

A Christian Network Blocks Porn While Its Audience Buys $5 AI Bible Videos

Dawkins read Claude's writing aloud and pitched the AI to his podcast listeners, while DeepSeek V4 closed in on frontier performance with a paper showing the same tier can be fine-tuned on a single 30

Paper Brief

FD as Loss: One-Step Generation Hits 0.72 FID

Heterogeneous scientific foundation model collaboration: Eywa pulls LLMs back from "general solver" to coordinator, handing protein structure and physics simulation tasks to domain-specialized predict

AI News

OpenAI Quietly Gates GPT-5.5 Cyber After Pledging to Democratize AI Defense

The vetted-access rollout breaks a public commitment made just months ago, while a Pentagon-linked nonprofit pays TikTok creators to warn Americans away from Chinese AI labs.

Paper Brief

Cross-Architecture Distillation Shrinks dLLMs to 0.6B

Cross-Architecture Distillation Shrinks dLLMs From 8B to 0.6B. TIDE is the first dLLM distillation framework where teacher and student differ in architecture, attention mechanism, and tokenizer at onc

Deep Dive

GPT Image 2 vs Nano Banana 2: 5 Hands-On Tests to See Which Is More Usable for Chinese E-Commerce

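In Chinese e-commerce scenarios, what separates winner from loser is not image quality but whether the product text survives. Five hands-on tests: banner, model consistency, nine-grid layout, lifestyle scene image, and background replacement. GPT Image 2 took 3 rounds, Nano Banana 2 took 2. It closes with a "model routing" combination that does not require choosing one over the other.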


Email: support@grandeaihub.com