Daily TEA – Agents, Facts & First Dates

AI security, factuality, NASA, voice dating, NeurIPS

Dec 22, 2025

Hello, dear TEA-mates—here’s what you need to know today.

1.🛡️ AI Agent “ARTEMIS” Rivals Human Hackers in Real-World Pen Tests

A new paper introduces ARTEMIS, a multi‑agent cybersecurity scaffold evaluated against ten professional penetration testers on a live university network of about 8,000 hosts across 12 subnets. ARTEMIS discovered nine valid vulnerabilities with an 82% valid submission rate, placing second overall and outperforming nine of the ten human participants in both technical sophistication and submission quality. The study finds that AI agents have clear advantages in systematic enumeration, parallel exploitation, and cost (some ARTEMIS variants run at 18 dollars per hour versus 60 dollars per hour for human testers), but they still show gaps, including higher false‑positive rates and difficulty with GUI‑based tasks. (Read More)
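
To make the cost gap concrete, here is a small Python sketch. Only the 18 and 60 dollars per hour figures come from the study; the flat hourly billing model and the 40-hour engagement length are illustrative assumptions, not numbers from the paper.

```python
# Hourly rates quoted in the study summary (US dollars per hour).
ARTEMIS_RATE = 18.0  # some ARTEMIS variants
HUMAN_RATE = 60.0    # professional penetration testers


def engagement_cost(rate_per_hour: float, hours: float) -> float:
    """Cost of an engagement billed at a flat hourly rate (a simplifying assumption)."""
    return rate_per_hour * hours


def relative_savings(agent_rate: float, human_rate: float) -> float:
    """Fraction of the human hourly cost saved by using the agent instead."""
    return 1.0 - agent_rate / human_rate


if __name__ == "__main__":
    hours = 40  # hypothetical engagement length, not from the paper
    print(engagement_cost(ARTEMIS_RATE, hours))  # 720.0
    print(engagement_cost(HUMAN_RATE, hours))    # 2400.0
    print(f"{relative_savings(ARTEMIS_RATE, HUMAN_RATE):.0%}")  # 70%
```

A flat hourly comparison leaves out costs the study itself flags, such as the human time needed to triage ARTEMIS's higher false‑positive rate, so the 70% figure is best read as an upper bound on savings.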

🫖 TEA For Thought: Even if the AI agent future brings more cybersecurity attacks, this paper shows that AI agents can already outperform most human professionals at finding the weaknesses defenders need to fix. The future may be agents on top of agents on top of agents, where peer review and robust project design are the best answers to cybersecurity threats.

2.🧾 FACTS Leaderboard Sets a New Standard for Measuring Model “Factuality”

The FACTS Leaderboard paper proposes an online benchmark suite that uses automated judge models to evaluate how well language models generate factually accurate text across diverse scenarios. It aggregates performance over four sub-leaderboards: Multimodal (image-based questions), Parametric (closed-book world knowledge from model parameters), Search (information-seeking with a search API), and Grounding v2 (long-form answers grounded in provided documents). By averaging scores across these components and maintaining both public and private splits, the suite aims to provide a robust, balanced, and hard‑to‑game measure of overall “factuality” that model developers and external participants can track over time. (Read More)
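
The aggregation step can be sketched in a few lines of Python. This assumes an equally weighted mean of the four sub-leaderboard scores on a 0 to 1 scale; the paper's exact weighting and its handling of public versus private splits are not modeled here, and the scores below are made-up placeholders, not reported results.

```python
from statistics import mean

# The four FACTS sub-leaderboards named in the paper.
COMPONENTS = ("Multimodal", "Parametric", "Search", "Grounding v2")

# Hypothetical scores in [0, 1]; illustrative only, not results from the paper.
SUB_SCORES = {
    "Multimodal": 0.50,
    "Parametric": 0.60,
    "Search": 0.70,
    "Grounding v2": 0.80,
}


def facts_overall(sub_scores: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean of the four sub-leaderboard scores."""
    if set(sub_scores) != set(COMPONENTS):
        raise ValueError(f"expected exactly these components: {COMPONENTS}")
    return mean(sub_scores[name] for name in COMPONENTS)


if __name__ == "__main__":
    print(f"{facts_overall(SUB_SCORES):.3f}")  # 0.650
```

With equal weights, no single component can carry the overall score: a model that is strong on closed-book recall but weak on grounding is pulled toward the middle.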

🫖 TEA For Thought: The real question is who gets to define what counts as a “fact.” If we judge models by how closely they match original documents, it might be more accurate to call this a “non-hallucination rate” instead of factuality. “Fact,” like “truth,” can be misleading, especially when we never see the full picture.

3.🚀 NASA Finally Gets a Full-Time Administrator in Jared Isaacman

NASA now has a full-time leader after the Senate confirmed billionaire pilot and private astronaut Jared Isaacman as the agency’s 15th administrator, 377 days after President Trump nominated him in late 2024. Isaacman, known for commanding multiple private spaceflights and organizing the first private spacewalk, was confirmed in a 67–30 vote despite concerns over his lack of political experience and questions about financial conflicts of interest. At 42, he becomes NASA’s youngest administrator and arrives with an agenda to streamline a bureaucratic agency, guided by his Project Athena reform framework, while pushing forward ambitious long‑term spaceflight goals. (Read More)

🫖 TEA For Thought: It may feel like old news by now, but rooting for Jared still matters—because in the end, it’s the mission that counts most.

4.🎙️ Voice AI Dating App “Known” Turns Conversations Into Real-Life Dates

San Francisco-based startup Known has built a dating app that replaces forms and swipes with a voice AI onboarding interview, in which an AI agent spends an average of 26 minutes (sometimes more than an hour) learning a user’s preferences, values, and personality. In its local beta, the company reports that roughly 80% of introductions led to in‑person dates, a conversion rate far higher than that of typical swipe-based apps. Known has raised 9.7 million dollars from investors including Forerunner and NFX. Once onboarding is complete, the AI proposes matches, lets users ask agents questions about potential partners, and sets tight time windows for accepting intros and agreeing on dates, aiming to reduce ghosting and nudge people toward offline meetings. (Read More)

🫖 TEA For Thought: This is a great AI use case—identifying a real pain point in modern dating and using AI to solve it in a way that actually gets people off their screens and into real-life conversations.

5.🧠 NeurIPS 2025 Shows AI’s Brainpower Beats Doomscrolling

A new Wall Street Journal feature on NeurIPS 2025 describes how the once‑small machine-learning conference has grown into a sprawling AI mega‑event with more than 24,000 attendees and thousands of accepted papers focused on large language models, foundation models, and real-world deployment. Researchers and industry leaders there debated the limits of current systems—especially around reasoning, bias, and evaluation—while also treating NeurIPS as a de facto global summit for AI policy, investment, and talent. The piece frames the conference as a more substantive, intellectually demanding alternative to passive, short‑form content consumption, highlighting how dense technical talks and rigorous debate have become part of mainstream tech culture. (Read More)

🫖 TEA For Thought: When technology and intelligence become the main attraction, that beats scrolling through brain-rot short videos any day.