
LLM Benchmark Comparison

A grouped bar chart with error bars comparing the accuracy (%) of three large language models (GPT-4, Claude 3 and Llama 3) across five NLP tasks.


Python

import matplotlib.pyplot as plt
import numpy as np

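# Task labels; '\n' wraps long names onto two lines under each tick
tasks = ['Text\nClassification', 'Named Entity\nRecognition', 'Sentiment', 'Question\nAnswering', 'Summarization']

# Accuracy (%) per task for each model, in the same order as `tasks`
gpt4 = np.array([96.2, 94.5, 95.8, 92.3, 88.5])
claude = np.array([95.8, 93.8, 96.2, 91.8, 89.2])
llama = np.array([92.5, 90.2, 93.5, 88.5, 85.2])
# Symmetric error (± percentage points) per task, shared by all three models
err = np.array([1.2, 1.5, 1.0, 1.8, 2.2])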

fig, ax = plt.subplots(figsize=(11, 6), facecolor='#ffffff')
ax.set_facecolor('#ffffff')

# One group per task; the three bars in a group sit at x - width, x, x + width
x = np.arange(len(tasks))
width = 0.25

# Draw one bar series per model, with identical error-bar styling
series = [
    (gpt4, -width, 'GPT-4', '#6CF527'),
    (claude, 0, 'Claude 3', '#F5B027'),
    (llama, width, 'Llama 3', '#5314E6'),
]
for values, offset, label, color in series:
    ax.bar(x + offset, values, width, yerr=err, label=label,
           color=color, edgecolor='white', linewidth=1.5, capsize=4,
           error_kw={'ecolor': '#374151', 'elinewidth': 1.5})

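# Dashed reference line at 90% accuracy; the label makes it appear in the legend
ax.axhline(y=90, color='#F5276C', linestyle='--', linewidth=2, alpha=0.7,
           label='90% reference')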

ax.set_xlabel('NLP Task', fontsize=12, color='#374151', fontweight='600')
ax.set_ylabel('Accuracy (%)', fontsize=12, color='#374151', fontweight='600')
ax.set_title('Large Language Model Benchmark Comparison', fontsize=15, 
             color='#1f2937', fontweight='bold', pad=20)

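# Put one tick at the centre of each task group
ax.set_xticks(x)
ax.set_xticklabels(tasks, fontsize=10)
ax.legend(facecolor='#ffffff', edgecolor='#e5e7eb', fontsize=10)
ax.tick_params(colors='#6b7280', labelsize=10)
# Zoom the y-axis to 80-100% so small differences are visible
# (bars no longer start at zero, so bar heights are not proportional to accuracy)
ax.set_ylim(80, 100)
# Light horizontal gridlines only; hide the top/right frame and soften the rest
ax.grid(True, axis='y', alpha=0.4, color='#e5e7eb')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#d1d5db')
ax.spines['bottom'].set_color('#d1d5db')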

plt.tight_layout()
plt.show()
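The hard-coded offsets `x - width`, `x` and `x + width` centre three bars on each tick. If you want the same layout for a different number of models, a minimal sketch (assuming evenly spaced series of equal width) shifts series `i` of `n` by `(i - (n - 1) / 2) * width`. The helper name `group_offsets` is hypothetical and not part of Matplotlib.

```python
def group_offsets(n_series: int, width: float) -> list[float]:
    """Return the x-offset of each bar series so the group is centred on its tick.

    Hypothetical helper: series i is shifted by (i - (n_series - 1) / 2) * width,
    which reproduces -width, 0, +width for three series.
    """
    if n_series < 1:
        raise ValueError("n_series must be at least 1")
    centre = (n_series - 1) / 2
    return [(i - centre) * width for i in range(n_series)]


# Three series with width 0.25, as in the chart above
print(group_offsets(3, 0.25))  # [-0.25, 0.0, 0.25]
```

Each offset would then be added to `x` in the corresponding `ax.bar` call. Because the ticks come from `np.arange`, they are one unit apart, so keeping `n_series * width` below 1 leaves a gap between neighbouring task groups.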

Library: Matplotlib
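Because the y-axis is clipped with `ax.set_ylim(80, 100)`, it is worth confirming that every error bar (value ± err) still fits inside that window. The sketch below reuses the chart's values in plain Python; the function name `error_bar_range` is hypothetical and simply mirrors how a symmetric `yerr` spans `value - err` to `value + err`.

```python
def error_bar_range(series, err):
    """Return (lowest lower bound, highest upper bound) across all series.

    Hypothetical helper: each bar's error bar spans value - e to value + e,
    matching a symmetric `yerr` in Matplotlib's bar().
    """
    lows = [v - e for values in series for v, e in zip(values, err)]
    highs = [v + e for values in series for v, e in zip(values, err)]
    return min(lows), max(highs)


gpt4 = [96.2, 94.5, 95.8, 92.3, 88.5]
claude = [95.8, 93.8, 96.2, 91.8, 89.2]
llama = [92.5, 90.2, 93.5, 88.5, 85.2]
err = [1.2, 1.5, 1.0, 1.8, 2.2]

low, high = error_bar_range([gpt4, claude, llama], err)
print(f"error bars span {low:.1f} to {high:.1f}")  # error bars span 83.0 to 97.4
```

Both bounds fall inside 80 to 100, so no error bar is cut off. Matplotlib's `yerr` also accepts a 2×N array of separate lower and upper errors; for asymmetric bars, subtract the first row and add the second instead of using one `err` list for both.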

