OpenChat’s latest model surpasses both ChatGPT and Grok

February 1, 2024

OpenChat has introduced OpenChat-3.5-0106, an open-source 7B LLM that outscores both ChatGPT (March version) and Grok-1 on average across the benchmarks below. ChatGPT still leads on MT-Bench, MMLU, and BBH CoT, and Grok-1 leads on MMLU.

Key Improvements:

1. Improved training approach.
2. Better in-context learning.
3. Advanced coding capabilities surpassing previous versions.

Benchmarks

| Model                 | # Params | Average  | MT-Bench     | HumanEval       | BBH MC   | AGIEval  | TruthfulQA    | MMLU         | GSM8K        | BBH CoT     |
|-----------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------|
| **OpenChat-3.5-0106** | **7B**   | **64.5** | 7.8          | **71.3**        | **51.5** | **49.1** | 61.0          | 65.8         | **77.4**     | 62.2        |
| OpenChat-3.5-1210     | **7B**   | 63.8     | 7.76         | 68.9            | 49.5     | 48.0     | **61.8**      | 65.3         | 77.3         | 61.8        |
| OpenChat-3.5          | **7B**   | 61.6     | 7.81         | 55.5            | 47.6     | 47.4     | 59.1          | 64.3         | 77.3         | 63.5        |
| ChatGPT (March)*      | ???B     | 61.5     | **7.94**     | 48.1            | 47.6     | 47.1     | 57.7          | **67.3**     | 74.9         | **70.1**    |
|                       |          |          |              |                 |          |          |               |              |              |             |
| OpenHermes 2.5        | 7B       | 59.3     | 7.54         | 48.2            | 49.4     | 46.5     | 57.5          | 63.8         | 73.5         | 59.9        |
| OpenOrca Mistral      | 7B       | 52.7     | 6.86         | 38.4            | 49.4     | 42.9     | 45.9          | 59.3         | 59.1         | 58.1        |
| Zephyr-β^             | 7B       | 34.6     | 7.34         | 22.0            | 40.6     | 39.0     | 40.8          | 39.8         | 5.1          | 16.0        |
| Mistral               | 7B       | -        | 6.84         | 30.5            | 39.0     | 38.0     | -             | 60.1         | 52.2         | -           |
| Open-source SOTA**    | 13B-70B  | 61.4     | 7.71         | 73.2            | 49.7     | 41.7     | 62.3          | 63.7         | 82.3         | 41.4        |
| (SOTA model)          |          |          | WizardLM 70B | WizardCoder 34B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | MetaMath 70B | Flan-T5 11B |
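
The table does not say how the Average column is computed. The sketch below tests one plausible reading: a plain mean of the eight benchmark columns, with MT-Bench (scored out of 10) multiplied by 10 first so it sits on the same scale as the others. That scaling factor is an assumption, not something the article states, and the names `SCORES`, `MT_BENCH_SCALE`, and `table_average` are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, in column order:
# MT-Bench, HumanEval, BBH MC, AGIEval, TruthfulQA, MMLU, GSM8K, BBH CoT
SCORES = {
    "OpenChat-3.5-0106": ["7.8", "71.3", "51.5", "49.1", "61.0", "65.8", "77.4", "62.2"],
    "OpenChat-3.5-1210": ["7.76", "68.9", "49.5", "48.0", "61.8", "65.3", "77.3", "61.8"],
    "OpenChat-3.5": ["7.81", "55.5", "47.6", "47.4", "59.1", "64.3", "77.3", "63.5"],
    "ChatGPT (March)": ["7.94", "48.1", "47.6", "47.1", "57.7", "67.3", "74.9", "70.1"],
}

# Assumption (not stated in the article): MT-Bench is rescaled to 0-100.
MT_BENCH_SCALE = Decimal(10)


def table_average(row):
    """Mean of the eight scores with MT-Bench rescaled, rounded to one decimal."""
    values = [Decimal(v) for v in row]
    values[0] *= MT_BENCH_SCALE  # first column is MT-Bench
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


averages = {model: table_average(row) for model, row in SCORES.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Under that assumption, the computed values match the Average column for these four rows. `Decimal` is used instead of floats so that rounding to one decimal place is not affected by binary floating-point error.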

Comparison with X.AI Grok

| Model                 | License     | # Params | Average  | MMLU   | HumanEval | MATH     | GSM8K    |
|-----------------------|-------------|----------|----------|--------|-----------|----------|----------|
| **OpenChat-3.5-0106** | Apache-2.0  | **7B**   | **61.0** | 65.8   | **71.3**  | **29.3** | **77.4** |
| OpenChat-3.5-1210     | Apache-2.0  | **7B**   | 60.1     | 65.3   | 68.9      | 28.9     | 77.3     |
| OpenChat-3.5          | Apache-2.0  | **7B**   | 56.4     | 64.3   | 55.5      | 28.6     | 77.3     |
| Grok-0                | Proprietary | 33B      | 44.5     | 65.7   | 39.7      | 15.7     | 56.8     |
| Grok-1                | Proprietary | ???B     | 55.8     | **73** | 63.2      | 23.9     | 62.9     |
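
The Average column in the Grok comparison looks like a plain mean of the four benchmark columns. The sketch below checks that reading and prints the per-benchmark gap between OpenChat-3.5-0106 and Grok-1. The averaging rule is inferred from the numbers rather than stated in the article, and `GROK_TABLE`, `mean_score`, and `gaps` are illustrative names.

```python
from decimal import Decimal, ROUND_HALF_UP

BENCHMARKS = ["MMLU", "HumanEval", "MATH", "GSM8K"]

# Scores copied from the comparison table, in the order of BENCHMARKS.
GROK_TABLE = {
    "OpenChat-3.5-0106": ["65.8", "71.3", "29.3", "77.4"],
    "OpenChat-3.5-1210": ["65.3", "68.9", "28.9", "77.3"],
    "OpenChat-3.5": ["64.3", "55.5", "28.6", "77.3"],
    "Grok-0": ["65.7", "39.7", "15.7", "56.8"],
    "Grok-1": ["73", "63.2", "23.9", "62.9"],
}


def mean_score(row):
    """Unweighted mean of the four scores, rounded half-up to one decimal."""
    values = [Decimal(v) for v in row]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


grok_averages = {model: mean_score(row) for model, row in GROK_TABLE.items()}

# Per-benchmark difference: positive means OpenChat-3.5-0106 scores higher.
gaps = {
    name: Decimal(ours) - Decimal(theirs)
    for name, ours, theirs in zip(
        BENCHMARKS, GROK_TABLE["OpenChat-3.5-0106"], GROK_TABLE["Grok-1"]
    )
}

for model, avg in grok_averages.items():
    print(f"{model}: average {avg}")
for name, gap in gaps.items():
    print(f"{name}: {gap:+}")
```

The computed means agree with the Average column for all five rows. The gaps show that the lead comes from HumanEval, MATH, and GSM8K, while Grok-1 remains ahead on MMLU.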

The Grok comparison is intriguing from a research standpoint, but the more important point is that another open-source LLM now performs at the ChatGPT (and Grok) level. It is another significant win for open-source AI.

You can check out the GitHub repo here.

What do you think?
