LLM Leaderboards

cecli works best with LLMs that are skilled at writing and editing code. These benchmarks evaluate an LLM's ability to follow instructions and edit code successfully without human intervention. cecli's polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.
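
For each model, the leaderboard reports the share of exercises solved, the total API cost of the run, and how often the model produced edits in its requested format. Below is a minimal sketch of how those columns could be aggregated from per-exercise results; the `ExerciseResult` record and its field names are illustrative assumptions, not cecli's actual benchmark harness.

```python
from dataclasses import dataclass

@dataclass
class ExerciseResult:
    """Outcome of one benchmark exercise for one model (illustrative fields)."""
    passed: bool              # did the final code pass the exercise's test suite?
    cost_usd: float           # API cost incurred while attempting the exercise
    well_formed_edits: bool   # were the model's edits in the requested edit format?

def leaderboard_row(model: str, results: list[ExerciseResult]) -> dict:
    """Aggregate per-exercise results into one leaderboard row."""
    n = len(results)  # 225 exercises in the polyglot benchmark
    return {
        "model": model,
        "percent_correct": 100.0 * sum(r.passed for r in results) / n,
        "cost_usd": sum(r.cost_usd for r in results),
        "correct_edit_format": 100.0 * sum(r.well_formed_edits for r in results) / n,
    }
```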

cecli polyglot coding leaderboard

| Model | Percent correct | Cost | Correct edit format | Edit format |
|---|---|---|---|---|
| gpt-5 (high) | 88.0% | $29.08 | 91.6% | diff |
| gpt-5 (medium) | 86.7% | $17.69 | 88.4% | diff |
| o3-pro (high) | 84.9% | $146.32 | 97.8% | diff |
| gemini-2.5-pro-preview-06-05 (32k think) | 83.1% | $49.88 | 99.6% | diff-fenced |
| gpt-5 (low) | 81.3% | $10.37 | 86.7% | diff |
| o3 (high) | 81.3% | $21.23 | 94.7% | diff |
| grok-4 (high) | 79.6% | $59.62 | 97.3% | diff |
| gemini-2.5-pro-preview-06-05 (default think) | 79.1% | $45.60 | 100.0% | diff-fenced |
| o3 (high) + gpt-4.1 | 78.2% | $17.55 | 100.0% | architect |
| o3 | 76.9% | $13.75 | 93.8% | diff |
| Gemini 2.5 Pro Preview 05-06 | 76.9% | $37.41 | 97.3% | diff-fenced |
| DeepSeek-V3.2-Exp (Reasoner) | 74.2% | $1.30 | 97.3% | diff |
| Gemini 2.5 Pro Preview 03-25 | 72.9% | | 92.4% | diff-fenced |
| claude-opus-4-20250514 (32k thinking) | 72.0% | $65.75 | 97.3% | diff |
| o4-mini (high) | 72.0% | $19.64 | 90.7% | diff |
| DeepSeek R1 (0528) | 71.4% | $4.80 | 94.6% | diff |
| claude-opus-4-20250514 (no think) | 70.7% | $68.63 | 98.7% | diff |
| DeepSeek-V3.2-Exp (Chat) | 70.2% | $0.88 | 98.2% | diff |
| claude-3-7-sonnet-20250219 (32k thinking tokens) | 64.9% | $36.83 | 97.8% | diff |
| DeepSeek R1 + claude-3-5-sonnet-20241022 | 64.0% | $13.29 | 100.0% | architect |
| o1-2024-12-17 (high) | 61.7% | $186.50 | 91.5% | diff |
| claude-sonnet-4-20250514 (32k thinking) | 61.3% | $26.58 | 97.3% | diff |
| claude-3-7-sonnet-20250219 (no thinking) | 60.4% | $17.72 | 93.3% | diff |
| o3-mini (high) | 60.4% | $18.16 | 93.3% | diff |
| Qwen3 235B A22B diff, no think, Alibaba API | 59.6% | | 92.9% | diff |
| Kimi K2 | 59.1% | $1.24 | 92.9% | diff |
| DeepSeek R1 | 56.9% | $5.42 | 96.9% | diff |
| claude-sonnet-4-20250514 (no thinking) | 56.4% | $15.82 | 98.2% | diff |
| gemini-2.5-flash-preview-05-20 (24k think) | 55.1% | $8.56 | 95.6% | diff |
| DeepSeek V3 (0324) | 55.1% | $1.12 | 99.6% | diff |
| Quasar Alpha | 54.7% | | 98.2% | diff |
| o3-mini (medium) | 53.8% | $8.86 | 95.1% | diff |
| Grok 3 Beta | 53.3% | $11.03 | 99.6% | diff |
| Optimus Alpha | 52.9% | | 97.3% | diff |
| gpt-4.1 | 52.4% | $9.86 | 98.2% | diff |
| claude-3-5-sonnet-20241022 | 51.6% | $14.41 | 99.6% | diff |
| Grok 3 Mini Beta (high) | 49.3% | $0.73 | 99.6% | whole |
| DeepSeek Chat V3 (prev) | 48.4% | $0.34 | 98.7% | diff |
| gemini-2.5-flash-preview-04-17 (default) | 47.1% | $1.85 | 85.3% | diff |
| chatgpt-4o-latest (2025-03-29) | 45.3% | $19.74 | 64.4% | diff |
| gpt-4.5-preview | 44.9% | $183.18 | 97.3% | diff |
| gemini-2.5-flash-preview-05-20 (no think) | 44.0% | $1.14 | 93.8% | diff |
| gpt-oss-120b (high) | 41.8% | $0.74 | 79.1% | diff |
| Qwen3 32B | 40.0% | $0.76 | 83.6% | diff |
| gemini-exp-1206 | 38.2% | | 98.2% | whole |
| Gemini 2.0 Pro exp-02-05 | 35.6% | | 100.0% | whole |
| Grok 3 Mini Beta (low) | 34.7% | $0.79 | 100.0% | whole |
| o1-mini-2024-09-12 | 32.9% | $18.58 | 96.9% | whole |
| gpt-4.1-mini | 32.4% | $1.99 | 92.4% | diff |
| claude-3-5-haiku-20241022 | 28.0% | $6.06 | 91.1% | diff |
| chatgpt-4o-latest (2025-02-15) | 27.1% | $14.37 | 93.3% | diff |
| QwQ-32B + Qwen 2.5 Coder Instruct | 26.2% | | 100.0% | architect |
| gpt-4o-2024-08-06 | 23.1% | $7.03 | 94.2% | diff |
| gemini-2.0-flash-exp | 22.2% | | 100.0% | whole |
| qwen-max-2025-01-25 | 21.8% | | 90.2% | diff |
| QwQ-32B | 20.9% | | 67.6% | diff |
| gemini-2.0-flash-thinking-exp-01-21 | 18.2% | | 77.8% | diff |
| gpt-4o-2024-11-20 | 18.2% | $6.74 | 95.1% | diff |
| DeepSeek Chat V2.5 | 17.8% | $0.51 | 92.9% | diff |
| Qwen2.5-Coder-32B-Instruct | 16.4% | | 99.6% | whole |
| Llama 4 Maverick | 15.6% | | 99.1% | whole |
| yi-lightning | 12.9% | | 92.9% | whole |
| command-a-03-2025-quality | 12.0% | | 99.6% | whole |
| Codestral 25.01 | 11.1% | $1.98 | 100.0% | whole |
| openhands-lm-32b-v0.1 | 10.2% | | 95.1% | whole |
| gpt-4.1-nano | 8.9% | $0.43 | 94.2% | whole |
| Qwen2.5-Coder-32B-Instruct | 8.0% | | 71.6% | diff |
| gemma-3-27b-it | 4.9% | | 100.0% | whole |
| gpt-4o-mini-2024-07-18 | 3.6% | $0.32 | 100.0% | whole |