Windsurf Introduces Arena Mode to Compare AI Models During Development

Windsurf has introduced Arena Mode inside its IDE, allowing developers to compare large language models side by side while they work on real coding tasks. The feature lets users evaluate models directly within their existing development context rather than relying on public benchmarks or external evaluation websites.

Arena Mode runs two Cascade agents in parallel on the same prompt, with the underlying model identities hidden during the session. Developers interact with both agents using their normal workflow, including access to their codebase, tools, and context. After reviewing the outputs, users can select which response performed better, and those votes are used to calculate model rankings. The results feed into both a personal leaderboard based…
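The announcement does not say how votes are turned into rankings. One common way to rank competitors from blind pairwise "which was better" votes is an Elo-style rating update. The sketch below assumes that approach purely for illustration; it is not Windsurf's documented method. The function name `elo_rankings`, the parameters `k` and `base`, and the model names are hypothetical.

```python
from collections import defaultdict

def elo_rankings(votes, k=32.0, base=1000.0):
    """Rank models from pairwise votes using an Elo-style update.

    Hypothetical sketch: `votes` is an iterable of (winner, loser) model-name
    pairs, one per comparison where the user preferred `winner`.
    """
    # Every model starts at the same base rating the first time it appears.
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        rw, rl = ratings[winner], ratings[loser]
        # Expected probability that `winner` is preferred, under the logistic Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        # Move both ratings by the same amount, so the total rating is conserved.
        delta = k * (1.0 - expected_w)
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    # Highest-rated model first.
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    sample_votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
    for name, rating in elo_rankings(sample_votes):
        print(f"{name}: {rating:.1f}")
```

Under this assumption, a per-user ranking could be produced by passing in only that user's votes, while an aggregate ranking would use votes from everyone. Elo is order-sensitive; a batch method such as Bradley-Terry fitting is a frequently used alternative when all votes are available at once.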
