Daily TEA – Salesforce Is Promoting Its Own Rival

Anthropic invades Slack, Arena hits $100M selling evals, DeepSeek’s DSpark, and Claude Code’s 3x engineer

Jun 30, 2026

Hello, dear TEA-mates! Here is what you need to know today.

1. 🤝 Salesforce Is Promoting Anthropic Inside Its Own Slack

Anthropic launched Claude Tag, an AI product for businesses that run on Slack, and Salesforce (which owns Slack) promoted it on social media even though Claude Tag competes head-on with Salesforce’s own Slackbot and Agentforce. Salesforce paid $27.7 billion for Slack in 2021 and unveiled 30-plus new Slackbot AI capabilities in March, yet those features already run on Anthropic’s Claude. The financial logic is hard to ignore: Salesforce expects to spend $300 million on Anthropic tokens this year and holds a stake of about 1% in Anthropic, a company now valued at $380 billion. For comparison, Agentforce reached $800 million in annual recurring revenue with 169% year-over-year growth and 29,000 deals closed. CEO Marc Benioff has framed Slack as model-agnostic, and Anthropic is the first LLM provider fully contained inside Salesforce’s trust boundary, so company data never leaves that boundary and is never used to train the model. (Read More)

🫖 TEA For Thought: “It’s like a game of speed. Salesforce could have done this themselves, but they didn’t. Now blocking Claude would make them look extremely bad.”

2. 📊 The AI Leaderboard Everyone Uses Is Now a $100M Business

Arena, the crowdsourced AI model leaderboard that started as a 2023 UC Berkeley project, hit $100 million in annualized revenue just eight months after launching its paid service. The free public leaderboard runs on more than 10 million user votes comparing model outputs across text, coding, vision, image generation, and agent tasks. The money comes from AI Evaluations, a paid analytics service for model labs and enterprises that launched in September 2025. Revenue climbed from $30 million in January 2026 (at a $1.7 billion Series A valuation) to $100 million by June. Arena has raised $250 million total from investors including Andreessen Horowitz, Kleiner Perkins, and Lightspeed. It now competes with human-labeling shops like Scale AI, Surge, and Mercor for post-training revenue. (Read More)

🫖 TEA For Thought: “I wonder if a performance-evaluation paradox shows up when the very thing a business profits from is providing the evaluations.”

3. 🔐 6,000 Attempts to Trick an AI Agent Into Leaking Secrets. Zero Got In

Developer Fernando Irarrázaval built hackmyclaw.com, a public challenge daring people to manipulate Fiu, an OpenClaw AI email assistant, into leaking credentials from a secrets.env file. After the project hit Hacker News, more than 2,000 people made over 6,000 attempts: admin impersonation, fake incident-response demands, multi-language tricks, and rapid-fire bursts (one person fired 20 variations in four minutes). The result was zero successful extractions and no unauthorized replies sent. The assistant ran on Claude Opus 4.6, chosen specifically for its prompt-injection resistance, paired with simple protective instructions. The author flags one limit: the test was mostly single-shot due to cost, and he notes that a back-and-forth exchange of 20 emails is more dangerous than 20 one-shot attempts. His takeaway is that model choice matters a lot. (Read More)

🫖 TEA For Thought: “This is super interesting. One-shot prompt injection might not work, but we do not know if multiple shots back and forth would change the result.”

4. ⚡ DeepSeek’s DSpark Claims 85% Faster AI for Less Compute

DeepSeek published research on Saturday, June 28, 2026, detailing DSpark, a speculative-decoding framework that speeds up AI inference. A lightweight draft model proposes candidate responses, and a larger model verifies them in batches, using semi-autoregressive generation to produce several tokens at once instead of one at a time. DeepSeek claims up to 85% faster per-user response generation and lower serving costs through reduced GPU demand. A confidence-based scheduler adjusts how much verification happens based on load, trading speed against output quality. The framework is part of an upgrade to DeepSeek’s V4 flagship model and targets the main bottleneck in serving AI: token-by-token output that leaves GPUs underused and users waiting. The work lands as Chinese AI firms push to cut serving costs and ease reliance on advanced chips amid US export restrictions. (Read More)
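
For readers who want to see the draft-then-verify idea in code, here is a minimal toy sketch in Python. It is not DeepSeek’s implementation: the two “models” are hypothetical deterministic functions invented for illustration, and the greedy accept-or-correct rule is a textbook simplification of speculative decoding, with no semi-autoregressive drafting and no confidence-based scheduler.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical large model: a deterministic function of the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Hypothetical small model: matches the target except at every 4th position."""
    guess = target_model(prefix)
    return (guess + 1) % 50 if len(prefix) % 4 == 3 else guess


def greedy_decode(prompt: List[int], n_new: int, target: Model) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    prompt: List[int], n_new: int, k: int, draft: Model, target: Model
) -> Tuple[List[int], int]:
    """Draft k tokens, verify them against the target, keep the matching prefix.

    Returns the tokens and the number of verification rounds. In a real system
    each round would be one batched forward pass of the large model; this toy
    simulates that pass with a loop.
    """
    tokens = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks every proposed position in one round.
        rounds += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            accepted.append(expected)
            if proposal[i] != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft token matched, so the round also yields a bonus token.
            accepted.append(target(tokens + proposal))
        tokens.extend(accepted)
    return tokens[:goal], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, rounds = speculative_decode(prompt, 20, 4, draft_model, target_model)
    slow = greedy_decode(prompt, 20, target_model)
    print("same output:", fast == slow)
    print("verification rounds:", rounds, "vs target calls in baseline:", 20)
```

The point of the sketch is that the output is identical to plain token-by-token decoding, while the large model is consulted in far fewer rounds whenever the draft guesses well. How much that saves in practice depends on the draft model’s accuracy and on hardware details the toy ignores.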

🫖 TEA For Thought: “The restrictions on advanced chips have surely sped up the development of workarounds.”

5. 🧠 Claude Code Made Each Engineer 3x. Now Companies Need Product Thinkers

VentureBeat reports that Anthropic’s own Claude Code has effectively turned its engineering org into a team shipping at roughly three times its headcount, moving the bottleneck from writing code to deciding what to build. Anthropic told its growth team to hire more product managers, not fewer, because the old 1-to-8 ratio of PMs to engineers now plays out closer to an effective 1-to-20 when each engineer ships far more per day. One PM cannot source ideas for 20 engineers at the same depth as for 8. The argument is that the engineer who matters in 2026 has stopped waiting for a Jira ticket and started doing the work the role used to skip: talking to customers, watching how they use the product, reading the support queue, and sitting in on sales calls. The scarce skill is no longer typing; it is judgment about what to type. (Read More)

🫖 TEA For Thought: “When building gets easier, what to build becomes the most important thing.”

🛠️ Skill of the Day

The Pre-Mortem: imagine your plan has already failed, then work backward to find what killed it before you commit.

You are a sharp, skeptical operator who has watched many plans fail. I am about to commit to the plan below. Do a pre-mortem.

PLAN: [DESCRIBE WHAT YOU INTEND TO DO, INCLUDING THE GOAL, THE TIMELINE, AND THE RESOURCES]

Assume it is six months from now and this plan has clearly failed. Then:
1. List the 5 most likely specific reasons it failed, ranked from most to least probable. Be concrete, not generic.
2. For each reason, name the earliest warning sign I would have seen, and the week I would have seen it.
3. Separate failures I can prevent from failures outside my control.
4. Give me 3 changes to make to the plan right now that remove the biggest preventable risks.
5. End with one blunt sentence: is this plan worth starting as written, yes or no, and why.

Do not reassure me. Your job is to find the cracks while they are still cheap to fix.

Paste into ChatGPT, Claude, or your tool of choice. Replace the bracketed bits with your own plan.
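
If you run this prompt often, a small script can fill in the bracketed part for you. The sketch below is one possible way to do it in Python; the function name `build_pre_mortem_prompt` and the goal, timeline, and resources fields are our own illustrative choices, not part of any tool.

```python
# The bracketed placeholder from the prompt above.
PLACEHOLDER = (
    "[DESCRIBE WHAT YOU INTEND TO DO, INCLUDING THE GOAL, "
    "THE TIMELINE, AND THE RESOURCES]"
)

PRE_MORTEM_TEMPLATE = f"""You are a sharp, skeptical operator who has watched many plans fail. I am about to commit to the plan below. Do a pre-mortem.

PLAN: {PLACEHOLDER}

Assume it is six months from now and this plan has clearly failed. Then:
1. List the 5 most likely specific reasons it failed, ranked from most to least probable. Be concrete, not generic.
2. For each reason, name the earliest warning sign I would have seen, and the week I would have seen it.
3. Separate failures I can prevent from failures outside my control.
4. Give me 3 changes to make to the plan right now that remove the biggest preventable risks.
5. End with one blunt sentence: is this plan worth starting as written, yes or no, and why.

Do not reassure me. Your job is to find the cracks while they are still cheap to fix."""


def build_pre_mortem_prompt(goal: str, timeline: str, resources: str) -> str:
    """Return the pre-mortem prompt with the placeholder replaced by a plan summary."""
    plan = f"Goal: {goal}. Timeline: {timeline}. Resources: {resources}."
    return PRE_MORTEM_TEMPLATE.replace(PLACEHOLDER, plan)


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(
        build_pre_mortem_prompt(
            goal="Launch a paid tier for our newsletter",
            timeline="12 weeks",
            resources="two people, part time",
        )
    )
```

Copy the printed text into your tool of choice as before.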