
OpenChat’s latest model surpasses both ChatGPT and Grok

OpenChat has introduced a state-of-the-art open-source 7B LLM, surpassing both ChatGPT (as of March) and Grok-1 in performance across various benchmarks.

Key Improvements:

1. Improved training approach.
2. Enhanced learning abilities within a specific context.
3. Advanced coding capabilities surpassing previous versions.

Benchmarks

| Model                 | # Params | Average  | MT-Bench     | HumanEval       | BBH MC   | AGIEval  | TruthfulQA    | MMLU         | GSM8K        | BBH CoT     |
|-----------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------|
| **OpenChat-3.5-0106** | **7B**   | **64.5** | 7.8          | **71.3**        | **51.5** | **49.1** | 61.0          | 65.8         | **77.4**     | 62.2        |
| OpenChat-3.5-1210     | **7B**   | 63.8     | 7.76         | 68.9            | 49.5     | 48.0     | **61.8**      | 65.3         | 77.3         | 61.8        |
| OpenChat-3.5          | **7B**   | 61.6     | 7.81         | 55.5            | 47.6     | 47.4     | 59.1          | 64.3         | 77.3         | 63.5        |
| ChatGPT (March)*      | ???B     | 61.5     | **7.94**     | 48.1            | 47.6     | 47.1     | 57.7          | **67.3**     | 74.9         | **70.1**    |
|                       |          |          |              |                 |          |          |               |              |              |             |
| OpenHermes 2.5        | 7B       | 59.3     | 7.54         | 48.2            | 49.4     | 46.5     | 57.5          | 63.8         | 73.5         | 59.9        |
| OpenOrca Mistral      | 7B       | 52.7     | 6.86         | 38.4            | 49.4     | 42.9     | 45.9          | 59.3         | 59.1         | 58.1        |
| Zephyr-β^             | 7B       | 34.6     | 7.34         | 22.0            | 40.6     | 39.0     | 40.8          | 39.8         | 5.1          | 16.0        |
| Mistral               | 7B       | -        | 6.84         | 30.5            | 39.0     | 38.0     | -             | 60.1         | 52.2         | -           |
| Open-source SOTA**    | 13B-70B  | 61.4     | 7.71         | 73.2            | 49.7     | 41.7     | 62.3          | 63.7         | 82.3         | 41.4        |
|                       |          |          | WizardLM 70B | WizardCoder 34B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | MetaMath 70B | Flan-T5 11B |
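
The table does not say how the Average column is computed. The reported values are consistent with a plain mean over the eight benchmark columns after multiplying MT-Bench (scored out of 10) by 10. That scaling is an assumption inferred from the numbers, not something the source states. A minimal Python sketch that checks this against the first four rows (the names `SCORES`, `REPORTED`, and `average` are illustrative only):

```python
# Scores copied from the table above, in column order:
# MT-Bench, HumanEval, BBH MC, AGIEval, TruthfulQA, MMLU, GSM8K, BBH CoT
SCORES = {
    "OpenChat-3.5-0106": [7.8, 71.3, 51.5, 49.1, 61.0, 65.8, 77.4, 62.2],
    "OpenChat-3.5-1210": [7.76, 68.9, 49.5, 48.0, 61.8, 65.3, 77.3, 61.8],
    "OpenChat-3.5": [7.81, 55.5, 47.6, 47.4, 59.1, 64.3, 77.3, 63.5],
    "ChatGPT (March)": [7.94, 48.1, 47.6, 47.1, 57.7, 67.3, 74.9, 70.1],
}

# Values from the table's Average column.
REPORTED = {
    "OpenChat-3.5-0106": 64.5,
    "OpenChat-3.5-1210": 63.8,
    "OpenChat-3.5": 61.6,
    "ChatGPT (March)": 61.5,
}


def average(scores, mt_bench_scale=10.0):
    """Mean of all columns, with MT-Bench (first entry) rescaled.

    The x10 rescaling is an assumption inferred from the reported numbers.
    """
    rescaled = [scores[0] * mt_bench_scale] + scores[1:]
    return sum(rescaled) / len(rescaled)


for model, scores in SCORES.items():
    print(f"{model}: computed {average(scores):.2f}, reported {REPORTED[model]}")
```

Under this assumption, each computed mean lands within rounding distance of the reported average, which suggests MT-Bench carries the same weight as the other seven benchmarks.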

Comparison with X.AI Grok

| Model                 | License     | # Params | Average  | MMLU   | HumanEval | MATH     | GSM8K    |
|-----------------------|-------------|---------|----------|--------|-----------|----------|----------|
| **OpenChat-3.5-0106** | Apache-2.0  | **7B**  | **61.0** | 65.8   | **71.3**  | **29.3** | **77.4** |
| OpenChat-3.5-1210     | Apache-2.0  | **7B**  | 60.1     | 65.3   | 68.9      | 28.9     | 77.3     |
| OpenChat-3.5          | Apache-2.0  | **7B**  | 56.4     | 64.3   | 55.5      | 28.6     | 77.3     |
| Grok-0                | Proprietary | 33B     | 44.5     | 65.7   | 39.7      | 15.7     | 56.8     |
| Grok-1                | Proprietary | ???B    | 55.8     | **73** | 63.2      | 23.9     | 62.9     |
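
The headline comparison rests on the Average column, but the per-benchmark picture is mixed. As a rough illustration, the Python sketch below subtracts Grok-1's scores from OpenChat-3.5-0106's using only the figures in the table above (the variable and function names are illustrative):

```python
# Scores copied from the Grok comparison table above.
BENCHMARKS = ["MMLU", "HumanEval", "MATH", "GSM8K"]
OPENCHAT_0106 = [65.8, 71.3, 29.3, 77.4]
GROK_1 = [73.0, 63.2, 23.9, 62.9]


def margins(ours, theirs):
    """Per-benchmark difference (ours minus theirs), rounded to one decimal."""
    return {
        name: round(a - b, 1)
        for name, a, b in zip(BENCHMARKS, ours, theirs)
    }


diff = margins(OPENCHAT_0106, GROK_1)
for name, delta in diff.items():
    leader = "OpenChat-3.5-0106" if delta > 0 else "Grok-1"
    print(f"{name}: {delta:+.1f} ({leader} leads)")
```

On these figures Grok-1 keeps the lead on MMLU, while OpenChat-3.5-0106 is ahead on HumanEval, MATH, and GSM8K, which is what lifts its average above Grok-1's.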

Having another open-source LLM at the level of ChatGPT and Grok matters in its own right, whatever one makes of the Grok comparison. It marks another significant win for open-source AI.

You can check out the GitHub repo here.



Alice Büşra Alçınar
https://geekberry.net/
Computer & Software Engineer | GreyHat Hacker | Translator & Teacher in 9 languages | TouristGuide | Cook | Gamer | Writer | ATC Specialist | Delegate of Rissho Uni ⛩ #WomenInTech
