# Devtool Arena Changelog

All notable changes to the Devtool Arena leaderboard are documented here.

## June 24, 2026

Synced new **Browser CLI** runs across Codex CLI and Claude Code CLI. **Browserbase** enters the **Codex CLI** top 5 at #4 with a score of 81, while **Steel** reaches #6 on Claude Code CLI; **Firecrawl** remains #1 on Codex CLI with 88.

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Browserbase | Browser | Cloud browser infrastructure for reliable web automation at scale | 81 | B | #4 of 55 | #1 of 4 |
| Steel | Browser | Cloud browser automation API and CLI for running browser sessions | 72 | C | #15 of 55 | #2 of 4 |
| Browser Use | Browser | AI-powered browser automation framework for web interactions | 69 | C | #18 of 55 | #3 of 4 |
| Anchor Browser | Browser | Headless browser automation API for web scraping and testing | 66 | C | #21 of 55 | #4 of 4 |

New company rankings above are based on the Codex CLI leaderboard.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 88 | B | — |
| 2 | WorkOS | Auth | 85 | B | — |
| 3 | LiveKit | Voice Infra | 82 | B | — |
| 4 | Browserbase | Browser | 81 | B | New |
| 5 | Jina AI | Search | 78 | B | Down 1 |
| 6 | Deepgram | Voice STT | 76 | B | Down 1 |
| 6 | Resend | Email | 76 | B | Down 1 |
| 8 | E2B | Sandboxes | 75 | B | Down 1 |
| 9 | Stripe | Payment | 73 | C | Down 1 |
| 9 | Datadog | Observability | 73 | C | Down 1 |

Top 10 based on the latest completed Codex CLI evaluation for each company.
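
The Movement column can be read as the change in rank since the previous Codex CLI snapshot (June 17). A minimal sketch of that reading, assuming movement is previous rank minus current rank and that "New" marks a company with no earlier rank on this leaderboard. The function name `movement` is hypothetical and not taken from the leaderboard's own code:

```python
UNCHANGED = "\u2014"  # the dash the tables use for "no movement"


def movement(previous_rank, current_rank):
    """Describe a rank change; previous_rank is None for a first appearance."""
    if previous_rank is None:
        return "New"
    delta = previous_rank - current_rank  # positive means the company climbed
    if delta > 0:
        return f"Up {delta}"
    if delta < 0:
        return f"Down {-delta}"
    return UNCHANGED


# Ranks copied from the June 17 and June 24 Codex CLI tables in this changelog.
june_17 = {"Firecrawl": 1, "WorkOS": 2, "LiveKit": 3, "Jina AI": 4, "Deepgram": 5}
june_24 = {
    "Firecrawl": 1, "WorkOS": 2, "LiveKit": 3,
    "Browserbase": 4, "Jina AI": 5, "Deepgram": 6,
}

changes = {name: movement(june_17.get(name), rank) for name, rank in june_24.items()}
print(changes)
```

Under this reading, Browserbase shows as "New" and every company it displaced shows as "Down 1", which matches the table above.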

### Claude Code CLI Browser Results

**Steel** ranks #6 with a score of 79. **Browserbase** ranks #11 with 76, **Anchor Browser** ranks #15 with 73, and **Browser Use** ranks #16 with 72.

## June 19, 2026

Synced the latest **Claude Code API** run for **Reducto**, lifting it into the top 10 in a three-way tie for #5 with a score of 84 and making it the leading Document Parsing result. **Firecrawl** remains #1 overall with a score of 93.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 93 | A | — |
| 2 | Box | Storage | 89 | B | — |
| 3 | ElevenLabs | Voice TTS | 86 | B | — |
| 3 | TinyFish | Search | 86 | B | — |
| 5 | Fireworks AI | Inference | 84 | B | — |
| 5 | Nimble | Search | 84 | B | — |
| 5 | Reducto | Document Parsing | 84 | B | — |
| 8 | Chroma | Vector Databases | 83 | B | Down 1 |
| 8 | Datadog | Observability | 83 | B | Down 1 |
| 8 | Daytona | Sandboxes | 83 | B | Down 1 |

Top 10 based on the latest completed Claude Code API evaluation for each company.

Synced the latest **Claude Code API** Document Parsing runs for **LlamaParse** and **Reducto**. **Reducto** leads Document Parsing with a score of 76; **LlamaParse** improves to 74 and ties **Unstructured**, while **Firecrawl** remains #1 overall with 93.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 93 | A | — |
| 2 | Box | Storage | 89 | B | — |
| 3 | ElevenLabs | Voice TTS | 86 | B | — |
| 3 | TinyFish | Search | 86 | B | — |
| 5 | Fireworks AI | Inference | 84 | B | — |
| 5 | Nimble | Search | 84 | B | — |
| 7 | Chroma | Vector Databases | 83 | B | — |
| 7 | Datadog | Observability | 83 | B | — |
| 7 | Daytona | Sandboxes | 83 | B | — |
| 7 | OpenCageData | Geocoding | 83 | B | — |

Top 10 based on the latest completed Claude Code API evaluation for each company.

Synced the latest **Claude Code API** evaluations, including fresh Document Parsing runs for **Reducto**, **Unstructured**, and **Extend.ai**. **Firecrawl** remains #1 with a score of 93; **Box** is now #2 with 89, while **ElevenLabs** and **TinyFish** climb into a tie for #3.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 93 | A | — |
| 2 | Box | Storage | 89 | B | — |
| 3 | ElevenLabs | Voice TTS | 86 | B | Up 5 |
| 3 | TinyFish | Search | 86 | B | Up 5 |
| 5 | Fireworks AI | Inference | 84 | B | — |
| 5 | Nimble | Search | 84 | B | — |
| 7 | Chroma | Vector Databases | 83 | B | — |
| 7 | Datadog | Observability | 83 | B | Up 1 |
| 7 | Anchor Browser | Browser | 83 | B | — |
| 7 | Daytona | Sandboxes | 83 | B | — |

Top 10 based on the latest completed Claude Code API evaluation for each company.

## June 17, 2026

Synced the latest **Codex CLI** evaluations, adding **5 new companies** across Code Review and Inference: Pioneer AI, Semgrep, CodeRabbit, Greptile, and Qodo. **Firecrawl** remains #1 with a score of 88, while **LiveKit**, **Jina AI**, **Deepgram**, **Resend**, **E2B**, **Datadog**, and **Stripe** each move up one spot as Nimble drops out of the top 10 after its newer run.

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Pioneer AI | Inference | Inference platform for running and integrating AI models | 2 | D | #50 of 50 | #4 of 4 |
| Semgrep | Code Review | Static analysis and code security scanning CLI | 13 | D | #44 of 50 | #1 of 4 |
| CodeRabbit | Code Review | AI code review tooling for pull requests and repositories | 12 | D | #45 of 50 | #2 of 4 |
| Greptile | Code Review | Codebase intelligence and AI code review tooling | 12 | D | #45 of 50 | #2 of 4 |
| Qodo | Code Review | AI code quality and test generation tooling | 9 | D | #48 of 50 | #4 of 4 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 88 | B | — |
| 2 | WorkOS | Auth | 85 | B | — |
| 3 | LiveKit | Voice Infra | 82 | B | Up 1 |
| 4 | Jina AI | Search | 78 | B | Up 1 |
| 5 | Deepgram | Voice STT | 76 | B | Up 1 |
| 5 | Resend | Email | 76 | B | Up 1 |
| 7 | E2B | Sandboxes | 75 | B | Up 1 |
| 8 | Datadog | Observability | 73 | C | Up 1 |
| 8 | Stripe | Payment | 73 | C | Up 1 |
| 10 | Agentmail | Email | 72 | C | — |

Top 10 based on the latest completed Codex CLI evaluation for each company.
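
Each table uses only a company's latest completed run, which is why Nimble can leave the top 10 after a newer run even though its June 2 score was 84. A rough sketch of that selection rule, under assumptions: the `Run` record, its field names, and every value marked hypothetical below are illustrative, not the leaderboard's actual schema or data:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Run:
    company: str
    finished: date
    completed: bool
    score: int


def latest_completed(runs):
    """Return {company: Run}, keeping only each company's newest completed run."""
    latest = {}
    for run in runs:
        if not run.completed:
            continue  # failed or in-progress runs never replace a result
        kept = latest.get(run.company)
        if kept is None or run.finished > kept.finished:
            latest[run.company] = run
    return latest


runs = [
    Run("Nimble", date(2026, 6, 2), True, 84),      # score from the June 2 entry
    Run("Nimble", date(2026, 6, 16), True, 60),     # hypothetical newer run
    Run("Nimble", date(2026, 6, 17), False, 0),     # hypothetical incomplete run
    Run("Firecrawl", date(2026, 5, 20), True, 88),  # score from the May 20 entry
]
selected = latest_completed(runs)
print({name: run.score for name, run in selected.items()})
```

This differs from a best-score rule: a newer, lower run replaces an older, higher one.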

## June 12, 2026

Synced the latest **Claude Code MCP** Document Parsing result for **Extend.ai**, improving its best score to **75** and keeping it #1 in Document Parsing. **Exa** remains #1 overall with a score of 93; the top 10 is unchanged.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Exa | Search | 93 | A | — |
| 2 | Tavily | Search | 90 | A | — |
| 3 | ElevenLabs | Voice TTS | 88 | B | — |
| 3 | Jina AI | Search | 88 | B | — |
| 5 | Firecrawl | Search | 87 | B | — |
| 6 | Stripe | Payment | 84 | B | — |
| 7 | Descope | Auth | 83 | B | — |
| 8 | Nimble | Search | 82 | B | — |
| 9 | Clerk | Auth | 81 | B | — |
| 9 | LanceDB | Vector Databases | 81 | B | — |

Top 10 based on the latest completed Claude Code MCP evaluation for each company.

Added **Document Parsing** to the **Codex MCP leaderboard** with **Extend.ai** entering the top 10 and **Reducto** at #12. **Daytona** takes #1 with 88, while **Jina AI** moves up to #3 (+5) and **Firecrawl** falls to #9 (-5).

### New Categories

| Category | Description |
|----------|-------------|
| Document Parsing | Document extraction and OCR-style parsing APIs for PDFs, images, and structured data |

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Extend.ai | Document Parsing | Document processing API for extracting structured data from PDFs, images, and business documents | 78 | B | #10 of 70 | #1 of 2 |
| Reducto | Document Parsing | Document parsing API for converting PDFs and complex documents into structured data | 76 | B | #12 of 70 | #2 of 2 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Daytona | Sandboxes | 88 | B | New |
| 2 | Exa | Search | 87 | B | New |
| 3 | Jina AI | Search | 85 | B | Up 5 |
| 3 | Scrapfly | Webscraping | 85 | B | — |
| 5 | Stripe | Payment | 83 | B | Down 1 |
| 6 | Descope | Auth | 81 | B | — |
| 6 | ElevenLabs | Voice TTS | 81 | B | — |
| 6 | Tavily | Search | 81 | B | Down 2 |
| 9 | Firecrawl | Search | 79 | B | Down 5 |
| 10 | Extend.ai | Document Parsing | 78 | B | New |
| 10 | Render | Cloud Hosting | 78 | B | — |

Top 10 based on the latest completed Codex MCP evaluation for each company.

## June 2, 2026

Corrected **Nimble** on the **Codex API** and **Codex CLI** leaderboards after documentation/help text was mistakenly counted as tool errors. Nimble now has **0 errors**, ranking **#3 on Codex CLI** with a score of 84 and **#18 on Codex API** with a score of 77.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 88 | B | — |
| 2 | WorkOS | Auth | 85 | B | — |
| 3 | Nimble | Search | 84 | B | — |
| 4 | LiveKit | Voice Infra | 82 | B | Down 1 |
| 5 | Jina AI | Search | 78 | B | Down 1 |
| 6 | Deepgram | Voice STT | 76 | B | Down 1 |
| 6 | Resend | Email | 76 | B | Down 1 |
| 8 | E2B | Sandboxes | 75 | B | Down 1 |
| 9 | Datadog | Observability | 73 | C | Down 1 |
| 9 | Stripe | Payment | 73 | C | Down 1 |

Top 10 based on the latest completed Codex CLI evaluation for each company. Codex API was also corrected for Nimble: #18 of 125 with a score of 77.

## June 1, 2026

Expanded both the **Claude Code** and **Codex** leaderboards with **3 new categories** (Browser, Document Parsing, Storage) and **12 new companies**. **Box** leads the new Storage category with scores of 85 (Claude Code) and 83 (Codex). **Vapi** and **Bolna** join Voice Telephony, and **Diffbot** joins Webscraping.

### New Categories

| Category | Description |
|----------|-------------|
| Browser | Headless browser automation and cloud browser APIs for web scraping and testing |
| Document Parsing | Document extraction and parsing APIs for PDFs, images, and structured data |
| Storage | Cloud storage integration APIs for file upload, sharing, and management |

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Anchor Browser | Browser | Headless browser automation API for web scraping and testing | 77 | B | #22 of 118 | #1 of 3 |
| Browserbase | Browser | Cloud browser infrastructure for reliable web automation at scale | 72 | C | #30 of 118 | #2 of 3 |
| Browser Use | Browser | AI-powered browser automation framework for web interactions | 62 | C | #65 of 118 | #3 of 3 |
| Unstructured | Document Parsing | Open-source document preprocessing and extraction for LLM pipelines | 64 | C | #54 of 118 | #1 of 4 |
| Sensible | Document Parsing | Intelligent document extraction API for structured data from PDFs and images | 64 | C | #55 of 118 | #2 of 4 |
| Reducto | Document Parsing | Document intelligence API for extracting structured data from complex documents | 34 | D | #81 of 118 | #3 of 4 |
| LlamaParse | Document Parsing | LlamaIndex document parser for extracting text and tables from PDFs | 32 | D | #87 of 118 | #4 of 4 |
| Box | Storage | Cloud content management and file sharing platform with enterprise security | 85 | B | #11 of 118 | #1 of 2 |
| Dropbox | Storage | Cloud storage and file synchronization service with collaboration features | 70 | C | #35 of 118 | #2 of 2 |
| Vapi | Voice Telephony | Voice AI platform for building conversational voice agents and assistants | 60 | C | #74 of 118 | #3 of 8 |
| Bolna | Voice Telephony | Voice AI infrastructure for building telephony and voice agent applications | 37 | D | #80 of 118 | #5 of 8 |
| Diffbot | Webscraping | AI-powered web data extraction and knowledge graph API | 65 | C | #48 of 118 | #2 of 2 |

Scores in the New Companies table above are the Claude Code API scores.

### Scores by Platform

**Claude Code API:**
- Box: 85 | Anchor Browser: 77 | Browserbase: 72 | Dropbox: 70
- Diffbot: 65 | Sensible: 64 | Unstructured: 64 | Browser Use: 62
- Vapi: 60 | Bolna: 37 | Reducto: 34 | LlamaParse: 32

**Codex API:**
- Box: 83 | Dropbox: 76 | Browserbase: 73 | Diffbot: 72
- LlamaParse: 67 | Reducto: 66 | Browser Use: 65 | Anchor Browser: 65
- Vapi: 60 | Unstructured: 33 | Bolna: 30 | Sensible: 28

## May 21, 2026

Re-ran the **Claude Code leaderboard** with updated evaluations. **LiveKit** returns to the top 10 at #7 with a score of 89, pushing the previous rank-7 group to #8. **Jina AI** achieves 88, joining a 6-way tie at #8 alongside Datadog, Cloudflare Workers, OpenCageData, ElevenLabs, and TinyFish. **Modal** improves to 78.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | You.com | Search | 93 | A | — |
| 1 | Exa | Search | 93 | A | — |
| 1 | Firecrawl | Search | 93 | A | — |
| 4 | Tavily | Search | 90 | A | — |
| 4 | Scrapfly | Webscraping | 90 | A | — |
| 4 | OpenRouter | Inference | 90 | A | — |
| 7 | LiveKit | Voice Infra | 89 | B | — |
| 8 | Datadog | Observability | 88 | B | Down 1 |
| 8 | Cloudflare Workers | Sandboxes | 88 | B | Down 1 |
| 8 | OpenCageData | Geocoding | 88 | B | Down 1 |
| 8 | ElevenLabs | Voice TTS | 88 | B | Down 1 |
| 8 | TinyFish | Search | 88 | B | Down 1 |
| 8 | Jina AI | Search | 88 | B | — |

Top 10 based on the latest completed Claude Code evaluation for each company.
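
The table above has 13 rows because tied companies share a rank and the six-way tie at #8 is shown in full. A minimal sketch of that tie handling, assuming standard competition ranking (1, 1, 1, 4, ...); `competition_rank` and `top_n_with_ties` are hypothetical names, not the leaderboard's own code:

```python
def competition_rank(scores):
    """Rank (name, score) pairs; equal scores share a rank and later ranks skip."""
    ordered = sorted(scores, key=lambda pair: pair[1], reverse=True)
    ranked = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the row above: reuse its rank
        else:
            rank = position       # otherwise the rank is the 1-based position
        ranked.append((rank, name, score))
    return ranked


def top_n_with_ties(scores, n=10):
    """Keep every row whose rank is at most n, so a tie is never split."""
    return [row for row in competition_rank(scores) if row[0] <= n]


# Scores from the May 21 table, plus Modal's 78 from the same entry.
# The real leaderboard has more companies; this is only a sample.
may_21 = [
    ("You.com", 93), ("Exa", 93), ("Firecrawl", 93),
    ("Tavily", 90), ("Scrapfly", 90), ("OpenRouter", 90),
    ("LiveKit", 89),
    ("Datadog", 88), ("Cloudflare Workers", 88), ("OpenCageData", 88),
    ("ElevenLabs", 88), ("TinyFish", 88), ("Jina AI", 88),
    ("Modal", 78),
]
top = top_n_with_ties(may_21)
print([rank for rank, _, _ in top])
```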

## May 20, 2026

Re-ran the **Codex CLI leaderboard** with updated evaluations. **Firecrawl** climbs to #1 (+2) with a score of 88. **WorkOS** enters the top 3 at #2. **LiveKit** drops to #3 (-2) and **Jina AI** to #4 (-2). **Deepgram** moves up to #5 (+1).

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 88 | B | Up 2 |
| 2 | WorkOS | Auth | 85 | B | — |
| 3 | LiveKit | Voice Infra | 82 | B | Down 2 |
| 4 | Jina AI | Search | 78 | B | Down 2 |
| 5 | Deepgram | Voice STT | 76 | B | Up 1 |
| 5 | Resend | Email | 76 | B | Down 1 |
| 7 | E2B | Sandboxes | 75 | B | Down 3 |
| 8 | Datadog | Observability | 73 | C | — |
| 8 | Stripe | Payment | 73 | C | Down 1 |
| 10 | Render | Cloud Hosting | 72 | C | — |

Top 10 based on the latest completed Codex CLI evaluation for each company.

## May 19, 2026

Expanded both the **Claude Code** and **Codex leaderboards** to **111 companies** with **7 new categories** (Geocoding, Verification, Unified API, CMS, Code Review, Proxy, Talent Directory) and **21 new companies**. **OpenCageData** enters at #7 on Claude Code with a score of 88. **You.com**, **Exa**, and **Firecrawl** share a three-way tie for #1 on Claude Code, each scoring 93.

### New Categories

| Category | Description |
|----------|-------------|
| Geocoding | Location lookup, address geocoding, and reverse geocoding APIs |
| Verification | Identity verification, phone number validation, and fraud detection APIs |
| Unified API | Aggregated APIs abstracting multiple underlying services behind one interface |
| CMS | Headless content management and delivery APIs |
| Code Review | Automated code analysis, review, and quality-gate APIs |
| Proxy | Residential and datacenter proxy network APIs for web access |
| Talent Directory | Developer and talent directory APIs for sourcing and profile data |

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| OpenCageData | Geocoding | Forward and reverse geocoding API built on open data | 88 | B | #9 of 111 | #1 of 1 |
| LMNT | Voice TTS | Ultra-low-latency text-to-speech API for real-time voice applications | 81 | B | #28 of 111 | #2 of 5 |
| Rev AI | Voice STT | Asynchronous and real-time speech-to-text transcription API | 73 | C | #61 of 111 | #3 of 5 |
| Restate | Durable Workflow | Durable execution framework for distributed workflows and async services | 73 | C | #59 of 111 | #7 of 8 |
| Apideck | Unified API | Unified API layer aggregating CRM, HR, accounting, and other SaaS data | 72 | C | #62 of 111 | #1 of 1 |
| Agora | Voice Infra | Real-time voice, video, and live streaming infrastructure SDK | 69 | C | #71 of 111 | #2 of 5 |
| HitPay | Payment | Payment platform for Southeast Asian businesses | 64 | C | #77 of 111 | #8 of 11 |
| Prelude | Verification | Phone number verification and OTP API for user authentication | 59 | C | #82 of 111 | #2 of 2 |
| Surge | Verification | Identity and KYC verification API for onboarding flows | 59 | C | #81 of 111 | #1 of 2 |
| Nimble | Search | Web data extraction and SERP API for structured search results | 58 | C | #84 of 111 | #7 of 9 |
| Daily | Voice Infra | Real-time audio and video infrastructure API for live communication apps | 57 | C | #87 of 111 | #3 of 5 |
| BetterStack | Observability | Uptime monitoring, log management, and incident response platform | 61 | C | #78 of 111 | #2 of 2 |
| BlindPay | Stablecoin | Stablecoin payment rails for cross-border transfers | 44 | D | #96 of 111 | #5 of 6 |
| Upsun | Cloud Hosting | PaaS platform for deploying and scaling web applications | 34 | D | #100 of 111 | #5 of 5 |
| Ash | Sandboxes | Cloud sandbox environments for ephemeral dev and test workloads | 30 | D | #102 of 111 | #8 of 8 |
| Speechmatics | Voice STT | Enterprise-grade speech recognition API with broad language support | 21 | D | #106 of 111 | #5 of 5 |
| 100ms | Voice Infra | Real-time audio and video conferencing infrastructure SDK | 17 | D | #107 of 111 | #5 of 5 |
| Macroscope | Code Review | Automated code review and engineering analytics platform | 17 | D | #109 of 111 | #1 of 1 |
| Strapi | CMS | Open-source headless CMS with a REST and GraphQL content API | 17 | D | #110 of 111 | #1 of 1 |
| Massive | Proxy | Residential proxy network API for web data access at scale | 25 | D | #105 of 111 | #1 of 1 |
| Roster | Talent Directory | Developer talent directory and sourcing API | 8 | D | #111 of 111 | #1 of 1 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | You.com | Search | 93 | A | — |
| 1 | Exa | Search | 93 | A | New |
| 1 | Firecrawl | Search | 93 | A | Up 4 |
| 4 | Tavily | Search | 90 | A | Down 2 |
| 4 | Scrapfly | Webscraping | 90 | A | New |
| 4 | OpenRouter | Inference | 90 | A | Up 4 |
| 7 | Datadog | Observability | 88 | B | Down 4 |
| 7 | Cloudflare Workers | Sandboxes | 88 | B | Down 4 |
| 7 | OpenCageData | Geocoding | 88 | B | New |
| 7 | ElevenLabs | Voice TTS | 88 | B | New |
| 7 | TinyFish | Search | 88 | B | Up 1 |

Top 10 based on the latest completed Claude Code evaluation for each company.

## May 16, 2026

Re-ran the **Codex CLI leaderboard** across most companies. **LiveKit** returns to #1 with 89; **Firecrawl** drops to #3 (-2). Added **3 new companies**: Telnyx and Vonage (Voice Telephony) and Nylas (Email).

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Telnyx | Voice Telephony | Programmable voice, SMS, and fax API for real-time cloud communications | 52 | D | #18 of 39 | #1 of 3 |
| Vonage | Voice Telephony | Cloud communications platform for voice, SMS, and video APIs | 50 | D | #19 of 39 | #2 of 3 |
| Nylas | Email | Email, calendar, and contacts API for embedding communication features | 27 | D | #29 of 39 | #3 of 3 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | LiveKit | Voice Infra | 89 | B | — |
| 2 | Jina AI | Search | 88 | B | — |
| 3 | Firecrawl | Search | 84 | B | Down 2 |
| 4 | Resend | Email | 82 | B | Up 1 |
| 4 | E2B | Sandboxes | 82 | B | Up 1 |
| 6 | Deepgram | Voice STT | 81 | B | — |
| 7 | Stripe | Payment | 80 | B | — |
| 8 | Datadog | Observability | 77 | B | — |
| 9 | Sprites | Sandboxes | 75 | B | — |
| 10 | Agentmail | Email | 73 | C | — |

Top 10 based on the latest completed Codex CLI evaluation for each company.

## May 15, 2026

**Tavily** climbs to #2 on the **Claude Code leaderboard** with a new best score of 90, up from 86 (+2 ranks). **Nebius Token Factory** joins the Inference category. **TinyFish** also adds a Codex run with a score of 82.

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Nebius Token Factory | Inference | Fast LLM inference API with flexible model routing and token-based pricing | 74 | C | #40 of 94 | #5 of 9 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | You.com | Search | 93 | A | — |
| 2 | Tavily | Search | 90 | A | Up 2 |
| 3 | Datadog | Observability | 88 | B | Down 1 |
| 3 | Cloudflare Workers | Sandboxes | 88 | B | Down 1 |
| 5 | Firecrawl | Search | 86 | B | Down 1 |
| 5 | Vercel | Sandboxes | 86 | B | Down 1 |
| 5 | Stripe | Payment | 86 | B | Down 1 |
| 8 | OpenRouter | Inference | 84 | B | — |
| 8 | TinyFish | Search | 84 | B | — |
| 8 | Jina AI | Search | 84 | B | — |

Top 10 based on the latest completed Claude Code evaluation for each company.

## May 13, 2026

Expanded the **Codex MCP leaderboard** to **93 companies**. Added **5 new companies** including TinyFish, Convex, and Anchor Browser. **You.com** enters at #1 with a score of 93, the highest MCP score yet, while **Firecrawl** climbs to #4 (+2).

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| TinyFish | Search | AI-powered search and discovery API | 84 | B | #10 of 93 | #3 of 8 |
| Convex | Database | Reactive backend-as-a-service with real-time database and serverless functions | 73 | C | #46 of 93 | #1 of 1 |
| Anchor Browser | Browser Automation | Headless browser automation API for web scraping and testing | 67 | C | #58 of 93 | #1 of 1 |
| TABStack | Search | Developer search and code intelligence API | 54 | D | #77 of 93 | #5 of 8 |
| Freestyle | Sandboxes | Cloud sandbox environments for running untrusted code | 53 | D | #78 of 93 | #7 of 7 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | You.com | Search | 93 | A | New |
| 2 | Datadog | Observability | 88 | B | New |
| 2 | Cloudflare Workers | Sandboxes | 88 | B | New |
| 4 | Vercel | Sandboxes | 86 | B | New |
| 4 | Firecrawl | Search | 86 | B | Up 2 |
| 4 | Stripe | Payment | 86 | B | Down 2 |
| 4 | Tavily | Search | 86 | B | Down 1 |
| 8 | OpenRouter | Inference | 84 | B | New |
| 8 | Jina AI | Search | 84 | B | Down 7 |
| 8 | TinyFish | Search | 84 | B | New |

Top 10 based on the latest completed Codex MCP evaluation for each company.

## May 9, 2026

Updated the **Codex MCP leaderboard** with gpt-5.5 runs across **68 companies**. **Jina AI** holds #1 with 85; **Stripe** enters at #2 with 83. Added **8 new companies** including Telnyx, Twilio, Resemble AI, Pipecat, and Fireworks AI.

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Telnyx | Voice Telephony | Programmable voice, messaging, and SIP trunking API | 66 | C | #13 of 68 | #1 of 7 |
| Plivo | Voice Telephony | Voice and SMS API platform for building communications apps | 59 | C | #19 of 68 | #2 of 7 |
| Twilio | Voice Telephony | Cloud communications platform for voice, SMS, and video | 57 | C | #22 of 68 | #3 of 7 |
| Resemble AI | Voice TTS | AI voice cloning and text-to-speech synthesis API | 47 | D | #27 of 68 | #3 of 3 |
| Sinch | Voice Telephony | Cloud communications API for voice, SMS, and messaging | 10 | D | #52 of 68 | #4 of 7 |
| Vonage | Voice Telephony | Programmable communications API for voice and messaging | 10 | D | #52 of 68 | #4 of 7 |
| Pipecat | Voice Infra | Open-source framework for real-time voice AI pipelines | 4 | D | #66 of 68 | #2 of 2 |
| Fireworks AI | Inference | Fast inference API for open-source LLMs and image models | 4 | D | #66 of 68 | #5 of 5 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Jina AI | Search | 85 | B | — |
| 2 | Stripe | Payment | 83 | B | — |
| 3 | ElevenLabs | Voice TTS | 81 | B | — |
| 3 | Tavily | Search | 81 | B | — |
| 5 | Scrapfly | Webscraping | 80 | B | — |
| 6 | Firecrawl | Search | 78 | B | Down 4 |
| 6 | Render | Cloud Hosting | 78 | B | — |
| 8 | Descope | Auth | 77 | B | — |
| 9 | Agentmail | Email | 74 | C | Down 6 |
| 10 | Qdrant | Vector Databases | 72 | C | — |

Top 10 based on the latest completed Codex MCP evaluation for each company.

## May 8, 2026

Launched the **Codex MCP leaderboard** with **53 companies** scored. Added **Voice Telephony** as a new category with Bandwidth and Infobip. **Jina AI** leads at #1 with a score of 81; **Firecrawl** and **Agentmail** round out the top 3.

### New Categories

| Category | Description |
|----------|-------------|
| Voice Telephony | Telephony and programmable voice call APIs |

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Nylas | Email | Email and calendar API for reading, sending, and managing messages | 19 | D | #9 of 53 | #3 of 3 |
| Coinbase Payments | Stablecoin | Stablecoin and crypto payments API from Coinbase | 10 | D | #37 of 53 | #1 of 2 |
| Fireblocks | Stablecoin | Digital asset custody and stablecoin transfer infrastructure | 10 | D | #39 of 53 | #2 of 2 |
| Bandwidth | Voice Telephony | Programmable voice and messaging carrier API | 9 | D | #43 of 53 | #1 of 2 |
| Infobip | Voice Telephony | Omnichannel communications platform with voice call and SMS APIs | 4 | D | #53 of 53 | #2 of 2 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Jina AI | Search | 81 | B | New |
| 2 | Firecrawl | Search | 71 | C | New |
| 3 | Agentmail | Email | 68 | C | New |
| 4 | Resend | Email | 65 | C | New |
| 5 | Clerk | Auth | 63 | C | New |
| 6 | Cartesia | Voice TTS | 61 | C | New |
| 7 | Chroma | Vector Databases | 48 | D | New |
| 8 | Cerebras | Inference | 34 | D | New |
| 9 | Nylas | Email | 19 | D | New |
| 10 | Qdrant | Vector Databases | 13 | D | New |

Top 10 based on the latest completed Codex MCP evaluation for each company.

## April 30, 2026

Updated the **Claude Code API**, **Claude Code CLI**, **Codex API**, and **Codex CLI** leaderboards. This snapshot reflects the latest completed runs on this date, with **108 companies** currently listed on the leaderboard and **954 completed evaluations**.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | ElevenLabs | Voice TTS | 92 | A | — |
| 1 | Firecrawl | Search | 92 | A | — |
| 3 | Auth0 | Auth | 91 | A | — |
| 4 | Tavily | Search | 90 | A | — |
| 5 | LiveKit | Voice Infra | 89 | B | — |
| 6 | Datadog | Observability | 88 | B | — |
| 7 | Stripe | Payment | 86 | B | — |
| 8 | Jina AI | Search | 85 | B | — |
| 9 | OpenRouter | Inference | 84 | B | — |
| 10 | Deepgram | Voice STT | 83 | B | — |

Top 10 based on the latest completed evaluation for each company across all platforms.

## April 29, 2026

Updated the **Codex CLI leaderboard**. This snapshot reflects the latest completed Codex CLI run on this date, with **88 companies** currently listed on the leaderboard and **216 completed Codex CLI evaluations**.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 86 | B | — |
| 2 | Jina AI | Search | 85 | B | — |
| 3 | ElevenLabs | Voice TTS | 84 | B | — |
| 3 | Tavily | Search | 84 | B | — |
| 5 | E2B | Sandboxes | 82 | B | — |
| 5 | Resend | Email | 82 | B | — |
| 7 | Stripe | Payment | 80 | B | — |
| 8 | AssemblyAI | Voice STT | 79 | B | — |
| 8 | Mollie | Payment | 79 | B | — |
| 8 | WorkOS | Auth | 79 | B | — |

Top 10 based on the latest completed Codex CLI evaluation for each company.

## April 26, 2026

Added the **Codex CLI leaderboard**. This snapshot reflects the latest completed Codex CLI end-to-end run on this date, with **74 companies** currently listed on the leaderboard and **33 completed Codex CLI evaluations**.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | LiveKit | Voice Infra | 89 | B | — |
| 2 | Firecrawl | Search | 86 | B | — |
| 3 | Jina AI | Search | 85 | B | — |
| 4 | Tavily | Search | 84 | B | — |
| 5 | E2B | Sandboxes | 82 | B | — |
| 5 | Resend | Email | 82 | B | — |
| 7 | Stripe | Payment | 80 | B | — |
| 8 | WorkOS | Auth | 79 | B | — |
| 9 | Datadog | Observability | 77 | B | — |
| 10 | Sprites | Sandboxes | 75 | B | — |

Top 10 based on the latest completed Codex CLI end-to-end evaluation for each company.

## April 24, 2026

Added the **Codex API leaderboard**. This snapshot reflects the latest completed Codex API run on this date, with **74 companies** run.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | ElevenLabs | Voice TTS | 92 | A | — |
| 1 | Firecrawl | Search | 92 | A | — |
| 3 | Tavily | Search | 90 | A | — |
| 4 | Datadog | Observability | 88 | B | — |
| 5 | Stripe | Payment | 86 | B | — |
| 6 | OpenRouter | Inference | 84 | B | — |
| 7 | Deepgram | Voice STT | 83 | B | — |
| 8 | Cartesia | Voice TTS | 79 | B | — |
| 8 | Daytona | Sandboxes | 79 | B | — |
| 8 | Mollie | Payment | 79 | B | — |

Top 10 based on the latest completed Codex API evaluation for each company.

## April 15, 2026

Added the **MCP leaderboard**. This snapshot reflects the latest completed MCP run on this date, with **74 companies** run on the leaderboard and **50 companies** currently tracked with MCP metadata in the registry.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Chroma | Vector Databases | 78 | B | — |
| 2 | Agentmail | Email | 77 | B | — |
| 2 | Qdrant | Vector Databases | 77 | B | — |
| 2 | Tavily | Search | 77 | B | — |
| 5 | Firecrawl | Search | 76 | B | — |
| 5 | Jina AI | Search | 76 | B | — |
| 7 | ElevenLabs | Voice TTS | 75 | B | — |
| 7 | MeetGeek | Meeting Bot | 75 | B | — |
| 9 | Deepgram | Voice STT | 74 | C | — |
| 9 | Descope | Auth | 74 | C | — |

Top 10 based on the latest completed MCP evaluation for each company.
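
The Grade column throughout this changelog is consistent with fixed score cutoffs: every listed score of 90 or more is an A, 75 to 89 is a B, the lowest C shown is 57, and the highest D shown is 54. A sketch under those observations follows. The exact C/D boundary is not stated anywhere in the changelog, so the default `c_cutoff` is an assumption, and `grade` is a hypothetical helper rather than the leaderboard's own function:

```python
def grade(score, c_cutoff=57):
    """Map a 0-100 score to a letter grade using cutoffs inferred from the tables.

    c_cutoff is an assumption: the changelog shows 57 as a C and 54 as a D,
    so the real boundary could be 55, 56, or 57.
    """
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= c_cutoff:
        return "C"
    return "D"


# Scores as they appear in the April 15 table above.
for company, score in [("Chroma", 78), ("MeetGeek", 75), ("Deepgram", 74)]:
    print(company, score, grade(score))
```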

## April 9, 2026

Added the **Claude Code CLI leaderboard**. This snapshot reflects the latest completed Claude Code CLI end-to-end run on this date, with **74 companies** run on the leaderboard and **35 companies** currently tracked with CLI metadata in the registry.

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | LiveKit | Voice Infra | 83 | B | — |
| 2 | Firecrawl | Search | 80 | B | — |
| 3 | Jina AI | Search | 79 | B | — |
| 3 | Resend | Email | 79 | B | — |
| 5 | Daytona | Sandboxes | 78 | B | — |
| 5 | Tavily | Search | 78 | B | — |
| 7 | Vercel | Sandboxes | 77 | B | — |
| 8 | Sprites | Sandboxes | 74 | C | — |
| 8 | Stripe | Payment | 74 | C | — |
| 8 | WorkOS | Auth | 74 | C | — |

Top 10 based on the latest completed Claude Code CLI end-to-end evaluation for each company.

## April 5, 2026

Added 2 new categories, Observability and Email, to cover APIs that didn't fit existing buckets, along with their first companies, **Datadog** and **Agentmail**.

### New Categories

| Category | Description |
|----------|-------------|
| Observability | Monitoring, metrics, and event tracking APIs |
| Email | Programmatic email inbox and messaging APIs |

### New Companies

| Company | Category | Description | Score | Grade | Overall Rank | Category Rank |
|---------|----------|-------------|-------|-------|--------------|---------------|
| Datadog | Observability | Python SDK for submitting custom metrics and events | 83 | B | #3 of 72 | #1 of 1 |
| Agentmail | Email | Programmatic email inboxes, send/receive, and AI agent email | 75 | B | #17 of 72 | #1 of 1 |

### Top Rankings

| Rank | Company | Category | Score | Grade | Movement |
|------|---------|----------|-------|-------|----------|
| 1 | Firecrawl | Search | 86 | B | — |
| 2 | Jina AI | Search | 84 | B | — |
| 3 | Datadog | Observability | 83 | B | New |
| 4 | Auth0 | Auth | 82 | B | Down 1 |
| 4 | Prefect | Durable Workflow | 82 | B | Down 1 |
| 4 | Stripe | Payment | 82 | B | Down 1 |
| 7 | Tavily | Search | 81 | B | Down 1 |
| 7 | PayPal | Payment | 81 | B | Down 1 |
| 9 | You.com | Search | 80 | B | Down 1 |
| 9 | ElevenLabs | Voice TTS | 80 | B | Down 1 |

## March 2026

Launched the Devtool Arena leaderboard with **72 API companies** scored.

### Initial Launch

- **Inference** — Groq, TogetherAI, Perplexity, OpenRouter, Cerebras, Fireworks AI, DeepInfra, SambaNova
- **Auth** — Clerk, Auth0, Stytch, Scalekit, WorkOS, Descope
- **Payment** — Stripe, PayPal, Square, Adyen, Razorpay, Mollie, Paddle, Chargebee, LemonSqueezy, Airwallex
- **Search** — Exa, Tavily, You.com, Brave Search, Firecrawl, Jina AI
- **Voice STT** — Deepgram, AssemblyAI, OpenAI Whisper
- **Voice TTS** — ElevenLabs, Rime
- **Voice Infra** — LiveKit
- **Sandboxes** — Vercel, Cloudflare Workers, E2B, Daytona, Modal, Fly.io
- **Vector Databases** — Chroma, Pinecone, Weaviate, Qdrant, LanceDB, Zilliz Cloud
- **Stablecoin** — Stripe (Crypto), Coinbase Payments, Circle, Fireblocks, DFNS, Triple-A
- **Meeting Bot** — Recall.ai, Meeting BaaS, MeetGeek, Meetstream, CueMeet
- **Durable Workflow** — Temporal, Prefect, Cadence, Camunda, Netflix Conductor, AWS Step Functions, Akka
- **Cloud Hosting** — Vercel, Cloudflare, Netlify, Railway, Render, DigitalOcean

Canonical URL: https://usesapient.com/leaderboard/changelog
