OpenChat's Latest Model Surpasses Both ChatGPT And Grok

OpenChat has released OpenChat-3.5-0106, a state-of-the-art open-source 7B LLM that outscores both ChatGPT (March version) and Grok-1 on average across the benchmarks below, although it does not lead on every individual benchmark.

Key Improvements:

1. An improved training approach.
2. Better in-context learning.
3. Stronger coding performance than previous versions (HumanEval 71.3, up from 68.9 for OpenChat-3.5-1210 and 55.5 for OpenChat-3.5).

Benchmarks

| Model                 | # Params | Average  | MT-Bench     | HumanEval       | BBH MC   | AGIEval  | TruthfulQA    | MMLU         | GSM8K        | BBH CoT     |
|-----------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------|
| **OpenChat-3.5-0106** | **7B**   | **64.5** | 7.8          | **71.3**        | **51.5** | **49.1** | 61.0          | 65.8         | **77.4**     | 62.2        |
| OpenChat-3.5-1210     | **7B**   | 63.8     | 7.76         | 68.9            | 49.5     | 48.0     | **61.8**      | 65.3         | 77.3         | 61.8        |
| OpenChat-3.5          | **7B**   | 61.6     | 7.81         | 55.5            | 47.6     | 47.4     | 59.1          | 64.3         | 77.3         | 63.5        |
| ChatGPT (March)*      | ???B     | 61.5     | **7.94**     | 48.1            | 47.6     | 47.1     | 57.7          | **67.3**     | 74.9         | **70.1**    |
|                       |          |          |              |                 |          |          |               |              |              |             |
| OpenHermes 2.5        | 7B       | 59.3     | 7.54         | 48.2            | 49.4     | 46.5     | 57.5          | 63.8         | 73.5         | 59.9        |
| OpenOrca Mistral      | 7B       | 52.7     | 6.86         | 38.4            | 49.4     | 42.9     | 45.9          | 59.3         | 59.1         | 58.1        |
| Zephyr-β^             | 7B       | 34.6     | 7.34         | 22.0            | 40.6     | 39.0     | 40.8          | 39.8         | 5.1          | 16.0        |
| Mistral               | 7B       | -        | 6.84         | 30.5            | 39.0     | 38.0     | -             | 60.1         | 52.2         | -           |
| Open-source SOTA**    | 13B-70B  | 61.4     | 7.71         | 73.2            | 49.7     | 41.7     | 62.3          | 63.7         | 82.3         | 41.4        |
|                       |          |          | WizardLM 70B | WizardCoder 34B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | MetaMath 70B | Flan-T5 11B |
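
The source does not say how the Average column is computed. The sketch below checks one reading that is consistent with the numbers: an unweighted mean of the eight benchmark columns, with MT-Bench (scored out of 10) multiplied by 10 so it sits on the same 0-100 scale as the rest. That rescaling is an assumption inferred from the figures, not something the article states, and the helper name `reconstructed_average` is purely illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, in column order:
# MT-Bench, HumanEval, BBH MC, AGIEval, TruthfulQA, MMLU, GSM8K, BBH CoT
# Each entry maps a model to (reported average, list of benchmark scores).
ROWS = {
    "OpenChat-3.5-0106": ("64.5", ["7.8", "71.3", "51.5", "49.1", "61.0", "65.8", "77.4", "62.2"]),
    "OpenChat-3.5-1210": ("63.8", ["7.76", "68.9", "49.5", "48.0", "61.8", "65.3", "77.3", "61.8"]),
    "OpenChat-3.5": ("61.6", ["7.81", "55.5", "47.6", "47.4", "59.1", "64.3", "77.3", "63.5"]),
    "ChatGPT (March)": ("61.5", ["7.94", "48.1", "47.6", "47.1", "57.7", "67.3", "74.9", "70.1"]),
}


def reconstructed_average(scores):
    """Unweighted mean of the eight columns, rounded to one decimal place."""
    values = [Decimal(s) for s in scores]
    # ASSUMPTION: MT-Bench is on a 0-10 scale and is multiplied by 10
    # before averaging; the article does not confirm this.
    values[0] *= 10
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for model, (reported, scores) in ROWS.items():
    print(f"{model:<20} reported={reported}  reconstructed={reconstructed_average(scores)}")
```

Under this reading the reconstructed values match the reported averages for all four rows, which also shows how much the gap between OpenChat-3.5-0106 and ChatGPT depends on HumanEval: it is the only column where the two differ by more than 10 points.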

Comparison with X.AI Grok

| Model                 | License     | # Param | Average  | MMLU   | HumanEval | MATH     | GSM8K    |
|-----------------------|-------------|---------|----------|--------|-----------|----------|----------|
| **OpenChat-3.5-0106** | Apache-2.0  | **7B**  | **61.0** | 65.8   | **71.3**  | **29.3** | **77.4** |
| OpenChat-3.5-1210     | Apache-2.0  | **7B**  | 60.1     | 65.3   | 68.9      | 28.9     | 77.3     |
| OpenChat-3.5          | Apache-2.0  | **7B**  | 56.4     | 64.3   | 55.5      | 28.6     | 77.3     |
| Grok-0                | Proprietary | 33B     | 44.5     | 65.7   | 39.7      | 15.7     | 56.8     |
| Grok-1                | Proprietary | ???B    | 55.8     | **73** | 63.2      | 23.9     | 62.9     |
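
To make the head-to-head with Grok-1 concrete, the following minimal sketch computes the per-benchmark margin between OpenChat-3.5-0106 and Grok-1 from the scores in the table above. The function name `margins` and the constant names are illustrative only. `Decimal` is used so that the one-decimal scores subtract exactly.

```python
from decimal import Decimal

# Scores copied from the Grok comparison table above.
BENCHMARKS = ("MMLU", "HumanEval", "MATH", "GSM8K")
OPENCHAT_0106 = ("65.8", "71.3", "29.3", "77.4")
GROK_1 = ("73", "63.2", "23.9", "62.9")


def margins(ours, theirs):
    """Return {benchmark: our score minus their score} in percentage points."""
    return {
        name: Decimal(a) - Decimal(b)
        for name, a, b in zip(BENCHMARKS, ours, theirs)
    }


for name, delta in margins(OPENCHAT_0106, GROK_1).items():
    # The "+" format flag prints an explicit sign for positive margins.
    print(f"{name:<10}{delta:+}")
```

OpenChat-3.5-0106 leads on HumanEval (by 8.1 points), MATH (5.4) and GSM8K (14.5), while Grok-1 leads on MMLU by 7.2 points, so the higher average reflects wins on three of the four benchmarks rather than a clean sweep.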

Whatever one makes of the Grok comparison, having another open-source LLM at the level of ChatGPT (and Grok) matters. It marks another significant win for open-source AI.

You can check out the GitHub repo here.
