
GPT-5.4 and Copilot Cowork Lead AI Tool Updates

This week's major AI tool updates are led by OpenAI's release of GPT-5.4, a powerful new model with enhanced coding and vision capabilities, and Microsoft's launch of Copilot Cowork, an agentic AI designed to automate complex tasks across the Microsoft 365 ecosystem. Anthropic also enhanced Claude with new features, alongside a wave of new developer tools for building, testing, and deploying AI agents.

Microsoft and OpenAI Unveil Major Enterprise Tools

The AI tooling landscape saw several major releases this week, with the largest vendors targeting enterprise productivity and developer efficiency. Microsoft, in collaboration with Anthropic, launched Copilot Cowork, a new feature for Microsoft 365. The tool is designed to execute multi-step tasks across applications such as Outlook, Teams, and Excel, serving as a workflow manager built on Microsoft's "Work IQ" intelligence layer.

Unlike desktop-only agents, Copilot Cowork operates in the cloud, leveraging a deep understanding of a user's emails, files, and meetings to automate tasks such as meeting preparation and customer follow-ups. It is currently in a limited research preview and will be bundled into a new $99/month E7 enterprise tier.

Not to be outdone, OpenAI released GPT-5.4 in "thinking" and "pro" variants. The model offers a 1-million-token context window, improved vision, and more efficient tool use, significantly improving its performance on computer operations and financial tasks. OpenAI also launched ChatGPT for Excel, a sidebar extension, and Codex Security, an AI application security agent that evolved from Project Aardvark and is free to Enterprise customers for one month.

Anthropic Enhances Claude's Capabilities

Anthropic continues to build out its ecosystem with several key updates. The new /loop skill in Claude Code allows users to schedule recurring tasks within a single session for up to three days. For enterprise clients, Anthropic introduced Code Review by Claude, which uses a team of agents to analyze GitHub pull requests for errors and vulnerabilities, with an average cost of $15-25 per review.
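For teams budgeting around the quoted $15-25 average, the arithmetic can be sketched as below. This is only a rough estimate under stated assumptions: the function name and the example volume of 200 reviews per month are hypothetical, and actual per-review costs may fall outside the quoted average range.

```python
def review_cost_range(reviews_per_month, low_per_review=15.0, high_per_review=25.0):
    """Return a (low, high) estimate of monthly spend in USD.

    The default per-review figures are the average range quoted in the
    article; real costs per review may vary.
    """
    if reviews_per_month < 0:
        raise ValueError("reviews_per_month must be non-negative")
    return (reviews_per_month * low_per_review,
            reviews_per_month * high_per_review)


# Hypothetical volume: a team that opens 200 pull requests per month.
low, high = review_cost_range(200)
print(f"200 reviews/month: ${low:,.0f} to ${high:,.0f}")
# prints "200 reviews/month: $3,000 to $5,000"
```

At that hypothetical volume, the quoted average implies roughly $3,000 to $5,000 per month.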

Additionally, the new Claude Marketplace enables enterprises to use their Anthropic spending commitments to pay for other AI applications, such as GitLab and Replit, consolidating their AI expenditure.

A Surge in Developer and Agent-Focused Tools

The developer community received a wealth of new tools aimed at building and managing AI agents. Andrej Karpathy released autoresearch, an open-source project where agents autonomously iterate on and improve LLM training code, achieving an 11% speedup in early tests. Another key release is Paperclip, an open-source tool that organizes AI agents into a company-like structure with org charts, budgets, and goal alignment.

The rise of agentic frameworks like Paperclip and Copilot Cowork indicates a clear industry shift from single-prompt chatbots to autonomous systems that manage complex, multi-step workflows. This move places a new emphasis on security, governance, and orchestration.

Several new platforms have emerged to provide the necessary infrastructure for this shift. 21st Agents and Terminal Use offer runtime environments, sandboxing, and billing for integrating agents into applications. For developers working locally, Agent Safehouse provides macOS-native sandboxing.

Specialized Tools for Code and Content Generation

A variety of specialized tools also launched, targeting developer needs from code review to content creation. The new code review tools compare as follows:

| Tool Name | Key Feature | Focus Area |
| --- | --- | --- |
| Warden by Sentry | A set of skills to review every PR | Codebase-wide analysis |
| Vet by Imbue | Fast and local review | Ensuring agent instructions were followed |
| OpenReview | Open-source and self-hosted | Powered by Vercel AI Cloud |
| Code Review by Claude | Multi-agent analysis | Deep analysis of logic and security |

Other notable releases include:

  • Cursor Automations: Build always-on agents that run on a schedule or are triggered by events.

  • Air by JetBrains: An agentic development environment for working with agents from different providers.

  • Context Hub: An open-source tool from Andrew Ng that provides coding agents with up-to-date API documentation.

  • IronClaw: An open-source alternative to OpenClaw focused on hardware-enforced security for always-on agents.

  • Manus: A tool to automatically generate videos from blog posts, press releases, or other written content.

  • Figure's Helix 02: A robot demoed tidying a living room 100% autonomously, showcasing advances in real-world agent application.

These releases underscore a maturing market where the focus is shifting from foundational models to the practical application, management, and security of AI agents in both consumer and enterprise settings.

Tamás Bőzsöny
Partnership Manager, System Auditor

Meet Tamás Bőzsöny, Senior Systems Auditor at testified.ai. With 22 years in digital media forensics and 15 years as a software workflow coach, Tamás leverages his background as a professional accountant to audit AI tools for UI efficiency, technical integrity, and financial ROI.