Frontier Model Upgrades and Agentic Coding
The landscape of AI tool updates shifted significantly today with Anthropic's release of Claude Opus 4.7. The model brings substantial improvements in long-running task execution and visual reasoning. Image input now supports up to 2,576 pixels on the long edge, which amounts to more than three times the total pixel count previous versions accepted. Pricing is unchanged from its predecessor: $5 per million input tokens and $25 per million output tokens.
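If you pre-process images on the client, the 2,576-pixel long-edge limit implies a simple aspect-preserving downscale. The Python sketch below is illustrative only. `fit_long_edge` is a hypothetical helper, not part of any Anthropic SDK, and it handles only the long-edge limit, so check the official vision documentation for any total-pixel or file-size caps.

```python
MAX_LONG_EDGE = 2576  # long-edge limit quoted for Claude Opus 4.7 image input


def fit_long_edge(width: int, height: int, max_edge: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_edge.

    Images already within the limit are returned unchanged (no upscaling).
    This is a hypothetical helper for illustration, not an SDK function.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    scale = max_edge / long_edge
    # Round to the nearest pixel, never going below 1.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    w, h = fit_long_edge(4032, 3024)  # a typical 12 MP phone photo
    print(w, h)  # prints: 2576 1932
```

Skipping upscaling is a deliberate choice: enlarging a small image adds pixels without adding detail.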
Note that the new tokenizer can produce up to 35% more tokens for identical prompts, so at the same per-token rates the effective cost of a request can rise by as much as 35%. Opus 4.7 also introduces an ultrareview command for catching bugs and design flaws. It scores 64.3% on SWE-bench Pro, while Anthropic's unreleased Mythos Preview model still holds the frontier lead at 77.8%.
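Because a tokenizer that emits more tokens raises cost proportionally, the impact is simple arithmetic. The sketch below estimates a request's cost at the quoted $5 and $25 per million input and output tokens. The 1.35 multiplier is the quoted upper bound, and applying it equally to input and output tokens is an assumption; real multipliers vary by content, so measure with actual token counts. `request_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (quoted pricing)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (quoted pricing)


def request_cost(input_tokens: int, output_tokens: int, token_multiplier: float = 1.0) -> float:
    """Estimate the USD cost of one request.

    token_multiplier scales both token counts to model a tokenizer that
    emits more tokens for the same text (assumed to affect input and
    output equally).
    """
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    return (scaled_in * INPUT_PRICE_PER_MTOK + scaled_out * OUTPUT_PRICE_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    base = request_cost(100_000, 10_000)
    worst = request_cost(100_000, 10_000, token_multiplier=1.35)
    print(f"{base:.4f} {worst:.4f}")  # prints: 0.7500 1.0125
```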
OpenAI responded with a major Codex upgrade, turning it from a standard coding assistant into a full desktop superapp. Codex now supports background computer use, letting it operate Mac applications autonomously even when they expose no API. Developers also gain parallel agents, an Atlas-powered in-app browser, and the inline gpt-image-1.5 model for rapid mockups. Persistent memory and automations let Codex resume complex tasks across multiple days.
Specialized Platforms and Open-Source Models
Perplexity introduced Personal Computer, an agent application for the Mac. It securely connects to local folders to read, search, and edit files around the clock, and it integrates natively with Apple Mail and iMessage. The effect is to shift how people use the operating system, away from issuing explicit step-by-step commands and toward handing an agent goals that it completes probabilistically.
OpenAI also launched GPT-Rosalind, a specialized model built exclusively for biological research and drug discovery. The model can query lab databases, generate biological hypotheses, and read complex scientific papers. Early testing with partners such as Moderna and Amgen reportedly shows it outperforming human scientists on specific prediction tasks.
In the open-source arena, Alibaba released Qwen3.6-35B-A3B, a sparse mixture-of-experts model with 35 billion total parameters, of which roughly 3 billion are active per token. Its vision capabilities rival those of top-tier proprietary models.
Design, Workflow, and Development Environments
Canva has officially repositioned itself as an AI platform with design tools, marking the shift with its Canva AI 2.0 launch ahead of an IPO expected to test its $42 billion valuation. Users can now describe a design in plain English and receive an editable result, with persistent memory carrying context across requests. Meanwhile, Windsurf 2.0 launched its Agent Command Center, which integrates Devin so that local and cloud agents can collaborate.
Google updated AI Mode in Chrome to support side-by-side browsing, so users can keep a web page open next to the AI chat while the conversation retains context. Vercel announced the general availability of Workflows for building long-running, durable systems. Developers can also try Ternary Bonsai, a 1.58-bit language model family whose weights are ternary (typically -1, 0, or +1, which carries log2(3) ≈ 1.58 bits each); this dramatically reduces memory usage while maintaining high benchmark scores.
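The memory savings of ternary weights are easy to estimate. The sketch below compares raw weight storage at 16 bits against the ideal ternary rate of log2(3) bits, and shows one common packing scheme that fits five ternary values in a byte (3^5 = 243 ≤ 256, so 1.6 bits per weight). The 8-billion parameter count is hypothetical, not a Ternary Bonsai size, and this packing is a generic technique rather than Bonsai's actual file format; real formats also store scale factors, so actual files are somewhat larger.

```python
import math

BITS_PER_TERNARY_WEIGHT = math.log2(3)  # ~1.585 bits: information in one of {-1, 0, +1}


def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes), ignoring scales and metadata."""
    return num_params * bits_per_weight / 8 / 1e9


def pack_ternary(weights: list[int]) -> bytes:
    """Pack ternary weights five per byte as base-3 digits (first weight is least significant)."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        value = 0
        for w in reversed(weights[i:i + 5]):
            if w not in (-1, 0, 1):
                raise ValueError(f"not a ternary weight: {w}")
            value = value * 3 + (w + 1)  # map -1/0/+1 to digits 0/1/2
        out.append(value)  # at most 242, so it fits in one byte
    return bytes(out)


def unpack_ternary(data: bytes, count: int) -> list[int]:
    """Inverse of pack_ternary; count is the original number of weights."""
    weights: list[int] = []
    for byte in data:
        value = byte
        for _ in range(5):
            weights.append(value % 3 - 1)  # digit 0/1/2 back to -1/0/+1
            value //= 3
    return weights[:count]  # drop padding digits from the final byte


if __name__ == "__main__":
    n = 8_000_000_000  # hypothetical parameter count for illustration
    print(f"16-bit:  {weight_memory_gb(n, 16):.2f} GB")  # 16.00 GB
    print(f"ternary: {weight_memory_gb(n, BITS_PER_TERNARY_WEIGHT):.2f} GB")  # 1.58 GB
    print(f"packed:  {weight_memory_gb(n, 8 / 5):.2f} GB")  # 1.60 GB
```

Base-3 packing trades a little decode arithmetic for storage close to the theoretical minimum; a simpler 2-bits-per-weight layout is faster to unpack but uses about 26% more space than 1.58 bits.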
Emerging AI Startups and 3D Generators
Several specialized tools also made news in today's AI tool updates. Factory raised $150 million for its autonomous coding agents, which switch models based on task complexity. Tencent open-sourced HY-World 2.0, which generates editable 3D scenes with physics-aware movement. Alibaba's ATH team also introduced Happy Oyster in beta for creating interactive 3D environments on the fly.
For those looking to optimize their setups, several new guides and applications emerged. A new Hugging Face tool helps contributors port transformer models to mlx-lm through agent-assisted pull requests. Developers can also run models locally with Ollama, which works without internet access once the model weights have been downloaded. Finally, a batch of smaller launches rounds out the day's long list: Aippy for game creation, ZenCreator for video control, Airbrush Studio for portraits, Fathom for meeting histories, Composio for connecting agents to APIs, and Viktor for Slack brand protection.
| Model | SWE-bench Pro score | Key feature |
|---|---|---|
| Claude Opus 4.7 | 64.3% | Vision input up to 2,576 px on the long edge |
| Anthropic Mythos Preview | 77.8% | Frontier reasoning (unreleased) |
| Claude Opus 4.6 | 53.4% | Previous-generation baseline |