OpenChat has introduced a state-of-the-art open-source 7B LLM, surpassing both ChatGPT (as of March) and Grok-1 in performance across various benchmarks.
Key Improvements:
1. Improved training approach.
2. Enhanced learning abilities within a specific context.
3. Advanced coding capabilities surpassing previous versions.
Benchmarks
| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT | |-----------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------| | **OpenChat-3.5-0106** | **7B** | **64.5** | 7.8 | **71.3** | **51.5** | **49.1** | 61.0 | 65.8 | **77.4** | 62.2 | | OpenChat-3.5-1210 | **7B** | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | **61.8** | 65.3 | 77.3 | 61.8 | | OpenChat-3.5 | **7B** | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 | | ChatGPT (March)* | ???B | 61.5 | **7.94** | 48.1 | 47.6 | 47.1 | 57.7 | **67.3** | 74.9 | **70.1** | | | | | | | | | | | | | | OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 | | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 | | Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 | | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - | | Open-source SOTA** | 13B-70B | 61.4 | 7.71 | 73.2 | 49.7 | 41.7 | 62.3 | 63.7 | 82.3 | 41.4 | | | | | WizardLM 70B | WizardCoder 34B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | MetaMath 70B | Flan-T5 11B |
Comparison with X.AI Grok
| | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k | |-----------------------|-------------|---------|----------|--------|-----------|----------|----------| | **OpenChat-3.5-0106** | Apache-2.0 | **7B** | **61.0** | 65.8 | **71.3** | **29.3** | **77.4** | | OpenChat-3.5-1210 | Apache-2.0 | **7B** | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 | | OpenChat-3.5 | Apache-2.0 | **7B** | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 | | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 | | Grok-1 | Proprietary | ???B | 55.8 | **73** | 63.2 | 23.9 | 62.9 |
The presence of another open-source LLM at the ChatGPT (and Grok) level holds great importance, regardless of the intriguing Grok comparison in research. This achievement marks another significant triumph for open-source AI.
You can check-out the GitHub repo here.
What do you think?
+1
+1
+1
+1
+1


