# The AI Tea

> A calm daily AI briefing with short, source-backed updates and plain-English explainers for curious readers.

- Product promise: Three calm, source-backed AI updates a day, plus plain-English explainers for the ideas that keep coming up.
- Current edition: June 2026 Edition · Issue No. 002
- Freshness rule: public posts use source material no older than 14 days.
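
The freshness rule can be expressed as a simple date check. The sketch below is illustrative only: the function names, the day-level granularity, and the choice to treat exactly 14 days as still fresh are assumptions, not a description of how The AI Tea actually enforces the rule.

```python
from datetime import date, timedelta

# Taken from the freshness rule: sources may be at most 14 days old.
MAX_SOURCE_AGE = timedelta(days=14)


def source_age(source_published: date, post_published: date) -> timedelta:
    """Return the gap between the source's publication date and the post's."""
    return post_published - source_published


def is_fresh(source_published: date, post_published: date) -> bool:
    """True when the source is not dated after the post and is at most 14 days old.

    Assumption: the boundary is inclusive, so a 14-day-old source still passes.
    """
    age = source_age(source_published, post_published)
    return timedelta(0) <= age <= MAX_SOURCE_AGE


# Hypothetical dates, chosen only to exercise the check.
print(is_fresh(date(2026, 5, 22), date(2026, 6, 1)))  # 10-day gap
print(is_fresh(date(2026, 5, 12), date(2026, 6, 1)))  # 20-day gap
```

With these hypothetical dates, the first source falls inside the 14-day window and the second does not.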

## Topic hubs
- [AI Tools](https://theaitea.news/topics/ai-tools/): AI tools are the apps, APIs, and workflows that make AI useful in daily work. This desk focuses on what launched, who it helps, and whether it is worth watching.
- [AI News](https://theaitea.news/topics/ai-news/): AI news moves quickly. This desk filters the noise and explains the updates most likely to affect users, builders, companies, and policy watchers.
- [AI Research](https://theaitea.news/topics/ai-research/): AI research can be dense. This desk translates papers, benchmarks, evaluation work, and model behavior into practical context.
- [AI Safety](https://theaitea.news/topics/ai-safety/): AI safety is about whether systems behave reliably, securely, and responsibly. This desk avoids panic and focuses on practical risk signals.
- [AI Agents](https://theaitea.news/topics/ai-agents/): AI agents are systems that can plan steps and use tools. This desk tracks where agentic workflows are becoming real and where they still need caution.
- [AI Policy](https://theaitea.news/topics/ai-policy/): AI policy shapes how models are tested, released, bought, and governed. This desk explains policy changes in practical terms.

## Glossary
- AI agent: An AI system that can plan steps, use tools, and work toward a goal with less step-by-step prompting from a person. A useful agent still needs permissions, constraints, monitoring, and a clear definition of success.
- Benchmark: A structured test used to compare how well AI systems perform on a task. Benchmarks are useful for comparison, but they can go stale, be overfit by models tuned to them, or drift away from real user workflows.
- Model release: A public launch or update of an AI model, often with new capabilities, pricing, safety notes, or developer access. The important question is not only what the model can do, but who can use it and under what limits.
- Source freshness: The gap between when the original source was published and when The AI Tea published its explanation. News posts must use sources no older than 14 days so readers are not handed stale AI updates as if they were new.
- Primary source: The original announcement, paper, documentation, report, or policy page behind a story, rather than a secondhand summary. Primary sources make it easier to verify claims and catch missing context.
- RAG (retrieval-augmented generation): A method in which an AI system looks up documents or data before answering, so the response can be grounded in specific sources instead of relying only on model memory.
- Open-weight model: A model whose trained weights are publicly available under a license, allowing developers to run, inspect, or adapt it more directly than a closed API model. Open weights do not automatically mean unrestricted use.
- Evaluation: A process for testing an AI system’s capability, safety, reliability, or limitations before or after release. Good evaluations explain the task, the limits, and what result would count as failure.
- Tool use: When an AI system calls external tools such as search, code execution, databases, calendars, or APIs. Tool use can make AI more useful, but it also creates permission, security, and reliability questions.
- Context window: The amount of text, files, or conversation history an AI model can consider at once. Larger context windows can help with long documents, but they do not guarantee better reasoning or perfect memory.
- Hallucination: A confident AI answer that is wrong, unsupported, or invented. Source links, retrieval, evaluations, and human review can reduce the risk, but not eliminate it entirely.
- Guardrail: A rule, model, workflow, or product limit designed to reduce unsafe or unwanted AI behavior. Guardrails are useful, but they need testing because attackers and edge cases can bypass weak ones.
- Inference: The stage when an AI model is actually being used to produce an answer, image, audio clip, prediction, or action. Inference cost, speed, and reliability matter for real products.
- Fine-tuning: Additional training that adapts a model for a narrower task, style, or domain. Fine-tuning can help consistency, but it does not replace source grounding, evaluation, or careful data handling.
- AI policy: Rules, standards, laws, procurement requirements, or public commitments that shape how AI systems are built, released, purchased, and monitored.
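
To make the RAG entry above concrete, here is a toy sketch of the "look up first, then answer" pattern. It is a simplification under stated assumptions: retrieval is plain word overlap (real systems typically use search indexes or embeddings), the function names are hypothetical, and the final call to a model is left out, so the sketch stops at building the grounded prompt.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents sharing the most words with the question.

    Assumption: word overlap stands in for a real retrieval backend.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Place the retrieved sources ahead of the question so an answer can cite them."""
    sources = retrieve(question, documents)
    context = "\n".join(f"- {source}" for source in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Hypothetical mini document store built from two glossary-style sentences.
docs = [
    "The context window is the amount of text a model can consider at once.",
    "Fine-tuning adapts a model for a narrower task.",
]
print(build_prompt("What is a context window?", docs))
```

The point of the design is that the model is handed specific source text alongside the question, which is what lets an answer be checked against its sources.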

## Latest briefings
- [OpenAI expands content provenance with C2PA conformance and SynthID watermarks](https://theaitea.news/updates/openai-content-provenance-c2pa-synthid-verification-tool/): OpenAI says it is now C2PA conformant, is adding SynthID watermarks to its AI images, and is previewing a public tool to verify OpenAI image origin.
- [Notion launches a Developer Platform with Workers, external agents, and a CLI](https://theaitea.news/updates/notion-developer-platform-workers-external-agents-cli/): Notion launched a Developer Platform with sandboxed Workers, an External Agent API, and an `ntn` CLI to deploy code and connect agents inside workspaces.
- [EngiAI proposes a multi-agent benchmark for LLM-driven engineering design](https://theaitea.news/updates/engiai-multi-agent-engineering-design-benchmark/): An arXiv paper proposes EngiBench and EngiAI, a LangGraph-based system to benchmark agent workflows across tool use, retrieval, and HPC orchestration.
- [DeepMind demos an AI-enabled pointer and Gemini help in Chrome](https://theaitea.news/updates/deepmind-ai-enabled-pointer-gemini-chrome/): DeepMind shared interaction principles and demos for an AI-enabled mouse pointer, and says Gemini in Chrome can answer questions about the exact part of a webpage you point to.
- [DeepMind and Singapore launch a national partnership for frontier AI](https://theaitea.news/updates/deepmind-singapore-national-ai-partnership/): DeepMind says it is expanding its Singapore work with new programs in healthcare, education, and sustainability, as part of Google’s national AI partnership with the Singapore Government.
- [Microsoft Research releases MagenticLite and small models for local agents](https://theaitea.news/updates/microsoft-magenticlite-magenticbrain-fara1-5/): Microsoft Research released MagenticLite plus two small models, MagenticBrain and Fara1.5, aiming to run agentic workflows across the browser and local files on a user’s machine.
- [NVIDIA outlines verified skills and signing for AI agent capabilities](https://theaitea.news/updates/nvidia-verified-agent-skills-governance/): NVIDIA describes a “verified agent skills” catalog with scanning, signing, and machine-readable skill cards to help teams trust and audit reusable agent capabilities.
- [Anthropic posts a disclosure dashboard for Claude Mythos security findings](https://theaitea.news/updates/anthropic-cvd-dashboard-mythos-vulnerability-disclosures/): Anthropic’s dashboard says Claude Mythos Preview has generated thousands of vulnerability findings, with 1,596 issues disclosed across 281 open-source projects as of May 22, 2026.
- [GRAFT proposes graph-tokenized LLMs for dependency-aware tool planning](https://theaitea.news/updates/graft-graph-tokenized-llm-tool-planning/): A May 12 arXiv paper proposes GRAFT, which maps tools to special tokens and trains on sampled trajectories so that multi-step tool plans follow dependency constraints more reliably.
- [Stability AI releases Stable Audio 3.0 with open-weight models](https://theaitea.news/updates/stability-stable-audio-3-open-weights/): Stability AI released Stable Audio 3.0, including open-weight Small and Medium checkpoints trained on licensed data, plus a Large model offered via its API for higher-volume use.
- [Cohere releases Command A+, an Apache-licensed open-source MoE model](https://theaitea.news/updates/cohere-command-a-plus-open-source/): Cohere released Command A+, an Apache 2.0 open-source MoE model positioned for agentic workflows, multimodal inputs, and long context while targeting self-hosted enterprise deployment.
- [Study finds a knowing–doing gap in LLM tool use decisions](https://theaitea.news/updates/knowing-doing-gap-llm-tool-use/): A new arXiv paper reports that LLMs can often recognize when a tool is needed but still fail to execute the tool call, creating a reliability gap for agent workflows.
- [Google updates the Gemini app with Daily Brief, Spark, and new models](https://theaitea.news/updates/gemini-app-agentic-spark-daily-brief/): Google says the Gemini app is adding Daily Brief and Gemini Spark plus models like Gemini 3.5 Flash and Gemini Omni, aiming to make the assistant more proactive.
- [DeepMind unveils Co-Scientist, a multi-agent AI for hypothesis generation](https://theaitea.news/updates/deepmind-co-scientist-multi-agent-research/): Google DeepMind introduced Co-Scientist, a multi-agent Gemini system for generating and ranking research hypotheses, alongside a Nature paper and an experimental tool.
- [Anthropic acquires Stainless to strengthen SDK and MCP tooling](https://theaitea.news/updates/anthropic-acquires-stainless-sdk-mcp/): Anthropic says it acquired Stainless, the company behind its official API SDKs, to bring SDK and MCP server tooling in-house for better developer and agent connectivity.
- [Anthropic and PwC expand alliance to deploy Claude at scale](https://theaitea.news/updates/anthropic-pwc-expand-claude-alliance/): Anthropic and PwC say they are expanding their alliance, including rolling out Claude Code and Cowork, creating a joint center of excellence, and training 30,000 PwC staff.
- [arXiv paper: compare reasoning models by correcting for length](https://theaitea.news/updates/reasoning-models-trajectory-length-correction/): A new arXiv paper studies hidden-state trajectories during chain-of-thought and argues you must correct for response length before comparing “reasoning” behavior across tasks.
- [OpenAI previews Codex in the ChatGPT mobile app](https://theaitea.news/updates/openai-codex-mobile-chatgpt-preview/): OpenAI says Codex is now available in preview in the ChatGPT mobile app, letting you check in on long-running work and approve or redirect it from your phone.
- [Hugging Face and IBM open a leaderboard for AI agent testing](https://theaitea.news/updates/the-open-agent-leaderboard/): Hugging Face and IBM Research introduced an Open Agent Leaderboard to compare how well AI agents handle tool use and multi-step tasks.
- [Hugging Face shows how to fine-tune NVIDIA Cosmos for robot video](https://theaitea.news/updates/fine-tuning-nvidia-cosmos-predict-2-5-with-lora-dora-for-robot-video-generation/): Hugging Face and NVIDIA shared a workflow for fine-tuning Cosmos Predict with LoRA and DoRA methods for robot-video generation tasks.
- [OpenAI and Dell bring Codex closer to enterprise infrastructure](https://theaitea.news/updates/openai-and-dell-partner-to-bring-codex-to-hybrid-and-on-premise-enterprise/): OpenAI and Dell are partnering to support Codex in hybrid and on-premise enterprise environments where companies need tighter data controls.
- [AutoScout24 says AI workflows are changing its engineering cycle](https://theaitea.news/updates/autoscout24-scales-engineering-with-ai-powered-workflows/): OpenAI’s AutoScout24 case study shows how one marketplace company is using ChatGPT and Codex to speed engineering work and improve code review.
- [OpenAI launches DeployCo to help companies put AI into production](https://theaitea.news/updates/openai-launches-deployco-to-help-businesses-build-around-intelligence/): OpenAI launched DeployCo, a services-style effort meant to help companies turn frontier AI models into working business systems.
- [OpenAI adds stronger voice models for realtime API apps](https://theaitea.news/updates/advancing-voice-intelligence-with-new-models-in-the-api/): OpenAI introduced newer API voice models for realtime conversation, translation, and transcription workflows that developers can build into apps.
- [OpenAI explains how it sandboxes Codex on Windows](https://theaitea.news/updates/building-a-safe-effective-sandbox-to-enable-codex-on-windows/): OpenAI detailed the Windows sandbox work behind Codex, showing how coding agents can be given useful access without unlimited system permissions.
- [Sea Limited describes using Codex across engineering teams](https://theaitea.news/updates/sea-s-view-on-the-future-of-agentic-software-development-with-codex/): OpenAI’s Sea Limited case study shows how a large Asian technology company is thinking about Codex and agentic software development.
- [OpenAI updates ChatGPT context handling for sensitive conversations](https://theaitea.news/updates/helping-chatgpt-better-recognize-context-in-sensitive-conversations/): OpenAI says it is improving how ChatGPT recognizes context in sensitive conversations, especially when risk signals appear over time.
- [Databricks brings GPT-5.5 into enterprise agent workflows](https://theaitea.news/updates/databricks-brings-gpt-5-5-to-enterprise-agent-workflows/): OpenAI says Databricks is using GPT-5.5 for enterprise agent workflows after benchmark gains on office-style knowledge tasks.
- [Hugging Face outlines AWS building blocks for model training and inference](https://theaitea.news/updates/building-blocks-for-foundation-model-training-and-inference-on-aws/): Hugging Face and AWS published a practical overview of infrastructure pieces teams use to train, deploy, and serve foundation models.
- [OpenAI and Malta expand national access to ChatGPT Plus](https://theaitea.news/updates/openai-and-malta-partner-to-bring-chatgpt-plus-to-all-citizens/): OpenAI and Malta announced a partnership to give citizens ChatGPT Plus access and training, turning AI adoption into a national digital-skills project.
- [MathArena paper argues benchmarks are saturating](https://theaitea.news/updates/matharena-evaluation-platform-for-llm-mathematics/): A new arXiv paper expands MathArena into a continuously maintained evaluation platform for LLM mathematical reasoning, aiming to reduce benchmark saturation and improve comparisons.
- [Anthropic releases finance agents and Microsoft 365 add-ins](https://theaitea.news/updates/anthropic-finance-agent-templates-microsoft-365-addins/): Anthropic says it is releasing ten finance agent templates and Claude add-ins for Microsoft 365, so teams can run governed workflows across Excel, PowerPoint, Word, and Outlook.
- [OpenAI rotates macOS certificates after TanStack npm attack](https://theaitea.news/updates/openai-tanstack-npm-supply-chain-attack-response/): OpenAI says a TanStack npm compromise impacted two employee devices and it is rotating code-signing certificates, requiring macOS app updates by June 12, 2026.
- [Meta paper argues compute-optimal scaling should count bytes, not tokens](https://theaitea.news/updates/meta-compute-optimal-tokenization-bytes-not-tokens/): Meta researchers say tokenization changes scaling behavior and report results suggesting compute-optimal training should track data in bytes, not tokens.
- [NVIDIA and ServiceNow expand partnership for governed autonomous agents](https://theaitea.news/updates/nvidia-servicenow-project-arc-openshell-governed-agents/): NVIDIA says it is expanding work with ServiceNow on governed autonomous agents, including ServiceNow’s Project Arc and an OpenShell-based runtime for sandboxed, policy-controlled execution.
- [OpenAI launches a Deployment Company for enterprise AI rollouts](https://theaitea.news/updates/openai-deployment-company-forward-deployed-engineers/): OpenAI says it is launching the OpenAI Deployment Company and agreeing to acquire Tomoro to bring Forward Deployed Engineers into customer deployments from day one.
- [Meta introduces NeuralBench to benchmark EEG and NeuroAI models](https://theaitea.news/updates/meta-neuralbench-eeg-benchmark-for-neuroai-models/): Meta researchers introduce NeuralBench and NeuralBench-EEG, a unified benchmark intended to compare brain-signal AI models across dozens of tasks and many datasets through one framework.
- [Anthropic donates Petri alignment audits to independent Meridian Labs](https://theaitea.news/updates/anthropic-donates-petri-alignment-tool-to-meridian-labs/): Anthropic says it is handing Petri, its open-source alignment auditing toolbox, to Meridian Labs and releasing Petri 3.0 with more adaptable and realistic behavior tests.
- [AWS makes its MCP Server generally available for AI agents](https://theaitea.news/updates/aws-mcp-server-ga-auditable-agent-access/): AWS says its MCP Server is generally available, letting AI agents call AWS APIs and read current documentation under IAM guardrails with CloudTrail and CloudWatch visibility.
- [DeepMind highlights new impact results for AlphaEvolve, its Gemini-powered coding agent](https://theaitea.news/updates/deepmind-alphaevolve-gemini-coding-agent-impact-acopf-deepconsensus/): Google DeepMind says AlphaEvolve, a Gemini-powered coding agent, found algorithm and infrastructure improvements, citing gains in genomics, grid optimization, and systems tuning.
- [Anthropic proposes Model Spec Midtraining to improve alignment generalization](https://theaitea.news/updates/anthropic-model-spec-midtraining-alignment-generalization-msm/): Anthropic describes Model Spec Midtraining (MSM), a training stage that teaches models their behavior spec, and reports large drops in agentic misalignment on scenario tests.
- [OpenAI launches GPT-Realtime-2, Translate, and Whisper for live voice apps](https://theaitea.news/updates/openai-realtime-2-translate-whisper-models-realtime-api/): OpenAI says three Realtime API audio models (GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper) support voice agents that reason, translate, and transcribe in real time.
- [OpenAI introduces Advanced Account Security for passkeys and recovery keys](https://theaitea.news/updates/openai-advanced-account-security-passkeys-recovery-keys/): OpenAI added an opt-in Advanced Account Security mode that requires passkeys or security keys, tightens recovery, and shortens sessions.
- [AgentFloor benchmark tests how far small open-weight models go in tool use](https://theaitea.news/updates/agentfloor-tool-use-ladder-open-weight-routing/): A new arXiv paper introduces AgentFloor, a 30-task tool-use benchmark, and reports that many routine agent steps work well on smaller open-weight models.
- [Gemini API File Search adds multimodal retrieval and page-level citations](https://theaitea.news/updates/gemini-api-file-search-multimodal-citations-metadata/): Google says Gemini API File Search now supports images plus text, metadata filtering, and page citations to ground RAG responses.
- [OpenAI updates GPT-5.5 Instant with fewer hallucinations and new controls](https://theaitea.news/updates/openai-gpt-5-5-instant-smarter-clearer-personalized/): OpenAI says GPT-5.5 Instant, ChatGPT’s default model, is more accurate, cuts hallucinated claims in internal tests, and adds visibility into what context was used for personalization.
- [NIST says DeepSeek V4 Pro trails the frontier by about eight months](https://theaitea.news/updates/nist-caisi-evaluation-deepseek-v4-pro-frontier-lag/): NIST’s CAISI says its evaluation of DeepSeek V4 Pro finds the model lags the frontier by about eight months, based on benchmarks spanning cyber, coding, science, reasoning, and math.
- [NIST expands CAISI agreements for pre-deployment frontier AI testing](https://theaitea.news/updates/nist-caisi-frontier-ai-testing-agreements-google-microsoft-xai/): NIST’s CAISI signed new agreements with Google DeepMind, Microsoft, and xAI to run pre-deployment evaluations and expand federal research on AI security.
- [Meta releases RL-R CHAT, an egocentric conversation dataset for hearing AI](https://theaitea.news/updates/meta-rlr-chat-egocentric-conversation-dataset-hearing/): Meta Reality Labs released RL-R CHAT, an egocentric multimodal dataset of group conversations to support hearing-assist and speech enhancement research.
- [Google’s ReasoningBank aims to help agents learn from past runs](https://theaitea.news/updates/google-reasoningbank-agent-memory-learning-from-experience/): Google says ReasoningBank stores distilled reasoning strategies from both successes and failures, and reports improved tool-using agent performance on web navigation and coding benchmarks.
- [OpenAI explains how it runs low-latency voice AI with WebRTC](https://theaitea.news/updates/openai-low-latency-voice-ai-webrtc-relay-transceiver/): OpenAI describes a relay-plus-transceiver WebRTC design that keeps voice sessions stable while avoiding huge public UDP port ranges in Kubernetes.
- [Anthropic research shows how safety classifiers can be backdoored via data poisoning](https://theaitea.news/updates/anthropic-poisoning-constitutional-classifiers-backdoor/): Anthropic researchers report that a small, roughly constant number of poisoned fine-tuning examples can install a backdoor in constitutional classifiers without obvious robustness losses.
- [Anthropic updates its Responsible Scaling Policy to expand external review](https://theaitea.news/updates/anthropic-rsp-3-2-external-review-risk-reports/): Anthropic updated its Responsible Scaling Policy to version 3.2, expanding how its Long-Term Benefit Trust can request and approve external review of risk reports.
- [OpenAI brings its models, Codex, and Managed Agents to AWS](https://theaitea.news/updates/openai-on-aws-bedrock-codex-managed-agents/): OpenAI says AWS customers can access its frontier models, Codex, and Bedrock Managed Agents in limited preview inside existing AWS security and billing workflows.
- [OpenAI says Stargate is ahead of schedule on adding US compute capacity](https://theaitea.news/updates/openai-stargate-compute-infrastructure/): OpenAI says it surpassed its 10 GW by 2029 infrastructure milestone ahead of schedule and is evaluating additional data-center sites to meet rising AI demand.