OpenAI Reclaims the Frontier with GPT-5.5 'Spud'
OpenAI's latest release marks a significant moment in the ongoing AI race. The company describes GPT-5.5 as a "new class of intelligence": a "worker-class" model built to complete tasks rather than just answer questions in conversation. Our initial read of the benchmarks OpenAI provided supports that claim.
The model sets new public records across multiple evaluations. It scored an impressive 82.7% on Terminal-Bench 2.0 and demonstrated performance comparable to industry professionals on 84.9% of GDPval tasks. In mathematics, it jumped from 27.1% to 35.4% on FrontierMath Tier 4, even contributing to a new mathematical proof about Ramsey numbers.
For developers, a key highlight is its performance on coding tasks. GPT-5.5 trails slightly on SWE-Bench Pro, though OpenAI noted that the leading model showed signs of memorization on that specific evaluation. OpenAI also used GPT-5.5 to rewrite its own GPU code, improving infrastructure efficiency.
The API is priced at $5 per million input tokens and $30 per million output tokens, which OpenAI positions as half the cost of competing coding models. The model is now rolling out to users on paid ChatGPT plans.
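To make the pricing concrete, here is a minimal sketch of how a per-request cost works out at the quoted rates. The `estimate_cost` helper and the token counts in the example are hypothetical, chosen only for illustration; they are not part of any OpenAI SDK.

```python
# Prices quoted above, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 5.00
OUTPUT_PRICE_PER_MILLION = 30.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Assumed workload: 200,000 input tokens and 20,000 output tokens.
print(f"${estimate_cost(200_000, 20_000):.2f}")  # prints $1.60
```

Output tokens cost six times as much as input tokens at these rates, so for long agentic runs the length of the generated output is likely to dominate the bill.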
Anthropic Responds with Claude Enhancements
Not to be outdone, Anthropic has pushed several updates to its Claude ecosystem. A major new feature is the introduction of built-in memory for Claude Managed Agents, allowing the AI to learn from each session and retain context over time. These memories are stored in editable files, giving users full control.
Anthropic also expanded Claude's utility by adding new connectors to popular everyday applications. Users can now interact with services like TripAdvisor, Booking.com, Spotify, Instacart, and Uber directly within the chat interface. This move aims to make Claude a more integrated assistant for daily tasks.
In a show of transparency, the company also published a detailed post-mortem on recent user reports of degraded quality in Claude Code. It identified and fixed three separate bugs affecting Claude Code, the Agent SDK, and Claude Cowork, and has reset usage limits for subscribers.
A Flood of New and Specialized AI Tools
Beyond the two industry giants, the market saw a flurry of new AI tools and model releases. Microsoft made its Copilot more agentic, setting "Agent" as the default mode in Office apps to enable multi-step actions across documents. Google also added AI Overviews to Gmail, allowing users to query their inbox with natural language.
New Models and Developer Infrastructure
Several new models have been announced. DeepSeek unveiled its V4 Flash and Pro series, featuring a 1-million-token context window. Alibaba released Qwen3.6-27B, and Tencent open-sourced its Hy3 preview model. For developers building with these models, new infrastructure tools are emerging:
- Agentspan: An open-source framework for building more durable and resilient AI agents.
- Band: Pitched as a missing infrastructure layer for multi-agent systems.
- turbopuffer: A high-speed, cost-effective search engine used by companies like Cursor and Notion.
- Google Stitch: An open-source spec (DESIGN.md) to help coding agents understand and implement a product's visual identity.
Productivity and Niche Applications
The consumer and business tool landscape also expanded. We noted several intriguing new applications designed to streamline workflows:
| Tool Name | Primary Function |
|---|---|
| Clico | Browser extension that writes using context from open tabs. |
| FocuSee | Automatically converts screen recordings into product videos. |
| Docsio | Generates a complete, branded documentation site from a URL. |
| AirJelly | A proactive desktop agent that flags overdue tasks. |
| Sinceerly | A tool designed to "humanize" AI-generated text. |