Daily TEA – The CEOs Who Fell for the Happy Path

AI delusions, verification, model “sleep”, MFA token theft, DeepSeek usage surge

May 28, 2026

Hello, dear TEA-mates! Here is what you need to know today.

1. 🧠 Language models may need “sleep”

Transformer-based LLMs struggle as context grows because the cost of self-attention rises quadratically with sequence length. A new approach proposes a “sleep-like” consolidation step: the model periodically reprocesses recent context offline, updates persistent fast weights via a learned local rule, clears the KV cache, then resumes inference with the consolidated memory. Tests on synthetic reasoning tasks and a long-horizon math reasoning setting show that performance improves as “sleep duration” (the number of offline passes) increases, with the biggest gains on deeper reasoning cases. (Read More)

🫖 TEA For Thought: “The point of dreaming.”
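
For readers who think in code, here is a toy sketch of the observe, sleep, resume loop described in item 1. It is not the paper's method: the class name, the Hebbian outer-product update (a stand-in for the learned local rule), the learning rate, and the list used as a "KV cache" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Toy stand-in for an LLM. Every name here is hypothetical."""
    dim: int
    lr: float = 0.1
    fast_weights: list = field(default_factory=list)  # persistent dim x dim memory
    kv_cache: list = field(default_factory=list)      # context seen since last sleep

    def __post_init__(self):
        if not self.fast_weights:
            self.fast_weights = [[0.0] * self.dim for _ in range(self.dim)]

    def observe(self, vec):
        """Awake phase: new context accumulates in the cache."""
        self.kv_cache.append(list(vec))

    def sleep(self, passes):
        """Offline phase: replay the cache, update fast weights, clear the cache."""
        for _ in range(passes):
            for vec in self.kv_cache:
                for i in range(self.dim):
                    for j in range(self.dim):
                        # Assumed Hebbian update, NOT the learned rule from the paper.
                        self.fast_weights[i][j] += self.lr * vec[i] * vec[j]
        self.kv_cache.clear()

    def recall(self, query):
        """Read consolidated memory as the matrix-vector product W @ query."""
        return [sum(w * q for w, q in zip(row, query)) for row in self.fast_weights]


model = ToyModel(dim=2)
model.observe([1.0, 0.0])
model.sleep(passes=3)          # more passes means a stronger memory trace
print(len(model.kv_cache))     # the cache is empty after sleeping
print(model.recall([1.0, 0.0]))
```

In this toy, the memory of the observed vector survives in the fast weights after the cache is cleared, and its strength grows with the number of passes. That only loosely mirrors the reported "sleep duration" effect.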

2. 🔍 Nobody owns “correct”, teams need “verifiable”

Harvey AI rebuilt a document review system that was already working because it was “insufficiently verifiable”, meaning it did not show evidence and reasoning at a per-statement level. The piece argues that “insufficiently verifiable” is a failure mode separate from “incorrect”, and that many orgs optimize for benchmark scores while missing whether outputs can be checked, trusted, and safely acted on. It highlights how AI evaluation can be gamed by non-answers that look good numerically, and why verification needs deterministic guardrails for scope and permissions, not just LLM judging. (Read More)

🫖 TEA For Thought: “Insufficiently verifiable is a failure mode distinct from incorrect.”
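
To make "verifiable" from item 2 concrete, here is a minimal sketch of a deterministic guardrail that runs without any LLM judge. It assumes each generated statement carries a document ID and a verbatim quote as evidence. The `Statement` type, the `unverifiable` function, and the sample lease text are hypothetical and do not describe Harvey AI's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str     # the claim the model produced
    doc_id: str   # document the evidence is claimed to come from
    quote: str    # verbatim evidence span


def unverifiable(statements, documents, allowed_docs):
    """Return (statement, reason) pairs that fail the deterministic checks."""
    failures = []
    for s in statements:
        if s.doc_id not in allowed_docs:
            failures.append((s, "out of scope"))      # scope/permission check
        elif not s.quote.strip():
            failures.append((s, "no evidence"))       # nothing to check against
        elif s.quote not in documents.get(s.doc_id, ""):
            failures.append((s, "quote not found"))   # evidence is not in the source
    return failures


documents = {"lease.txt": "The tenant shall pay rent on the first day of each month."}
statements = [
    Statement("Rent is due on the first of the month.", "lease.txt",
              "pay rent on the first day of each month"),
    Statement("The lease bans pets.", "lease.txt", "no pets allowed"),
    Statement("The landlord covers utilities.", "memo.txt", "utilities are included"),
]
for s, reason in unverifiable(statements, documents, allowed_docs={"lease.txt"}):
    print(f"{reason}: {s.text}")
```

A statement that passes these checks can still be wrong. The check only guarantees that a reviewer can trace each claim to an in-scope source, which is the distinction between "verifiable" and "correct" that the piece draws.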

3. 🤖 CEOs and “AI psychosis” (happy-path delusions)

A TechCrunch piece argues that some CEOs overestimate what agents can do because they only see AI demos on the “happy path” and sit far from the last-mile work: reviewing code, finding bugs, catching hallucinated libraries, and dealing with edge cases. It cites Box CEO Aaron Levie’s framing that executives are uniquely prone to “AI psychosis”, and notes that 2026 layoffs have already nearly matched 2025 levels, with many companies pointing to AI as a rationale. It also points to research suggesting that measured productivity gains often lag perceived gains, and that organizational bottlenecks can shift to executives when AI increases output volume. (Read More)

🫖 TEA For Thought: “But the thing is, a CEO is not just focusing on here and now, but also what AI could do later down the road. A lot of things that AI does now can be buggy, but with the improvement of models, which happens so fast and so frequently, they will not have to be the blocker anymore.”

4. 🔐 MFA resets and token theft are the new front door

The pattern now emerging as dominant in financial services intrusions is not password theft but getting MFA reset (often through social engineering) and capturing OAuth tokens that grant persistent access. The piece cites multiple reports: CrowdStrike describing vishing over Microsoft Teams to convince employees to reset MFA and register attacker devices, the FBI warning about token theft via the device code flow, and Verizon DBIR data showing a shift in vulnerability exploitation and identity-based attack paths. The takeaway: MFA protects password-based login, but attackers increasingly bypass it through resets, stolen tokens, and workflow abuse. (Read More)

🫖 TEA For Thought: “This is even scarier when we give agents so much access via OAuth.”

5. 📈 DeepSeek tops OpenRouter usage rankings

OpenRouter usage data reportedly shows DeepSeek-V4-Flash leading global model usage with a weekly invocation volume of 3.43 trillion tokens, while total model usage reached 28.9 trillion tokens from May 18 to May 24, up 7.4% week-over-week. The article attributes growth to broader AI agent adoption and workloads like code generation, long-document processing, and enterprise retrieval that drive very large token consumption. It also reports Chinese model usage at 9.223 trillion tokens weekly versus US model usage at 4.93 trillion, with China leading for multiple consecutive weeks. (Read More)

🫖 TEA For Thought: “This will only go up exponentially.”
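
A quick back-of-the-envelope check on the figures in item 5, as a small Python sketch. It uses only the numbers quoted above and assumes they all refer to the same May 18 to May 24 week. The variable names are ours, not OpenRouter's.

```python
# All figures in trillions of tokens, as quoted in the summary.
deepseek_v4_flash = 3.43
total_usage = 28.9
wow_growth = 0.074          # 7.4% week-over-week
china_usage = 9.223
us_usage = 4.93

# Share of all tokens served by the top model.
top_model_share = deepseek_v4_flash / total_usage

# Implied total for the previous week: this_week = last_week * (1 + growth).
previous_week_total = total_usage / (1 + wow_growth)

# How many times larger Chinese model usage is than US model usage.
china_to_us_ratio = china_usage / us_usage

print(f"Top model share:     {top_model_share:.1%}")
print(f"Previous week total: {previous_week_total:.2f}T")
print(f"China/US ratio:      {china_to_us_ratio:.2f}x")
```

Under that assumption, the top model accounts for roughly 12% of all tokens, the previous week's total comes out at about 26.9 trillion, and Chinese model usage is about 1.87 times US model usage. The Chinese and US figures together cover about 14.2 trillion of the 28.9 trillion total, and the summary does not say how the remainder is classified.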

🛠️ Skill of the Day

Reality Check Prompt: sanity-test an “AI can do this” claim before you commit budget, headcount, or reputation.

You are a skeptical operations lead. Your job is to test whether an “AI agent can do X” claim is real, safe, and worth shipping.

Task claim: [PASTE THE CLAIM HERE]

Context: [WHO WILL USE IT, WHAT SYSTEMS IT TOUCHES, WHAT “DONE” MEANS]

Do this:

1. List the hidden steps the demo is skipping (last-mile work, edge cases, approvals, handoffs).
2. Identify the top 10 failure modes (incorrect output, missing context, wrong scope/permissions, compliance, data leakage).
3. Propose a minimum viable pilot that forces reality (smallest scope, clear success metrics, human review points).
4. Define a “stop rule”: what evidence would make us pause or kill the project.
5. Output a one-page decision memo with: Go/No-Go, risks, mitigations, and the next 7-day plan.

Constraints: