ChemSets Leaderboard
How Capable Are Chemistry LLMs?
⚠️ This leaderboard is outdated
Please check out the new leaderboard for the updated benchmark MolecularIQ:
https://huggingface.co/spaces/ml-jku/molecularIQ_leaderboard
| Rank | Model | Size | Reasoning | SymMolic | ChemIQ | Ether0 |
|---|---|---|---|---|---|---|
| 🥇 | GPT-oss-120b-high | 120B (A5B) | Yes | 57.200 | 65.600 | 18.900 |
| 🥈 | GPT-oss-20b-high | 20B (A4B) | Yes | 51.100 | 47.400 | 13.500 |
| 🥉 | Qwen3-Think-235B | 235B (A22B) | Yes | 50.100 | 65.500 | 9.200 |
| 4 | GPT-oss-120b-medium | 120B (A5B) | Yes | 43.100 | 36.900 | 15.900 |
| 5 | Qwen3-Think-30B | 30B (A3B) | Yes | 34.800 | 31.700 | 4.100 |
| 6 | GPT-oss-20b-medium | 20B (A4B) | Yes | 33.800 | 20.800 | 10.000 |
| 7 | Qwen3-32b | 32B | Yes | 28.200 | 22.600 | 2.800 |
| 8 | Qwen3-14b | 14B | Yes | 24.900 | 12.200 | 3.700 |
| 9 | Qwen3-8b | 8B | Yes | 19.300 | 12.000 | 4.100 |
| 10 | Llama-molinst | 8B | No | 8.900 | 3.200 | 0.600 |
| 11 | LlaSMol-Mistral | 7B | No | 3.600 | 1.600 | 0.400 |
| 12 | ChemDFM-8B | 8B | No | 3.300 | 1.100 | 1.900 |
| 13 | Txgemma-27b | 27B | No | 3.000 | 4.000 | 3.000 |
| 14 | ChemLLM-7B | 7B | No | 2.400 | 0.700 | 0.400 |
| 15 | Ether0 | 24B | Yes | 2.400 | 13.100 | 45.900 |
| 16 | ChemDFM-13B | 13B | No | 2.300 | 1.400 | 0.900 |
| 17 | Txgemma-9b | 9B | No | 0.700 | 2.600 | 3.900 |
Scoring: Each rollout receives a binary reward (1 for correct, 0 for incorrect). A question's score is the average reward across three rollouts, and each column value is the average of these per-question scores across all questions in that category.
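The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual evaluation code: the function names and the example rewards are hypothetical, and the factor of 100 is an assumption based on the table values appearing to be on a 0 to 100 scale.

```python
def question_score(rollout_rewards):
    """Average binary reward over the rollouts of a single question."""
    if not rollout_rewards:
        raise ValueError("at least one rollout is required")
    if any(r not in (0, 1) for r in rollout_rewards):
        raise ValueError("rewards must be 0 or 1")
    return sum(rollout_rewards) / len(rollout_rewards)


def category_score(rewards_per_question, scale=100.0):
    """Average of per-question scores over all questions in one category.

    `scale=100.0` is an assumption: it converts the mean in [0, 1]
    to a percentage-like value, matching the look of the table.
    """
    scores = [question_score(r) for r in rewards_per_question]
    return scale * sum(scores) / len(scores)


# Hypothetical rewards: four questions, three rollouts each.
example_rewards = [
    [1, 1, 0],  # correct in 2 of 3 rollouts
    [0, 0, 0],  # never correct
    [1, 1, 1],  # always correct
    [0, 1, 0],  # correct in 1 of 3 rollouts
]

print(f"{category_score(example_rewards):.3f}")
```

Because every question has the same number of rollouts, averaging per question first and then across questions gives the same result as averaging all rewards at once. The two-step form is kept here only to mirror the description.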
Rank: Based on SymMolic score (descending)
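A minimal sketch of this ordering, using a handful of (model, SymMolic) pairs copied from the table. The helper name `rank_by_symmolic` is hypothetical. The tie-breaking rule is not stated on this page, so the sketch simply relies on Python's stable sort, which keeps tied entries (here ChemLLM-7B and Ether0 at 2.400) in their input order; that behavior is an assumption, not the leaderboard's documented rule.

```python
def rank_by_symmolic(rows):
    """Return (rank, model, score) tuples sorted by score, highest first.

    Ties keep their input order because sorted() is stable; the real
    leaderboard's tie-break rule is not documented, so this is an assumption.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, model, score)
            for rank, (model, score) in enumerate(ordered, start=1)]


# A subset of rows from the table, deliberately listed out of order.
subset = [
    ("Qwen3-Think-235B", 50.1),
    ("ChemLLM-7B", 2.4),
    ("GPT-oss-120b-high", 57.2),
    ("Ether0", 2.4),
    ("GPT-oss-20b-high", 51.1),
]

for rank, model, score in rank_by_symmolic(subset):
    print(f"{rank:>2}  {model:<20} {score:.3f}")
```

The ranks printed here are positions within the five-row subset, not the ranks shown in the full table.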