Inference Engine Arena is an open-source platform designed to help you benchmark and compare different LLM inference engines. With the rapid proliferation of inference engines such as vLLM, SGLang, and Ollama (and even higher-level frameworks like Dynamo and the vLLM Production Stack), it can be challenging to determine which one performs best for your specific use case.
We handle the complexity of logging and comparing experiments across different engines running on various hardware with different models against diverse workloads. This frees individuals and enterprises from tedious logging and pipeline work, allowing them to focus on their business logic instead.
Inference Engine Arena helps you find the most cost-effective and performant inference engine for your workload. Individuals can start benchmarking and find a suitable configuration for their use case in minutes instead of hours; enterprises can converge on the best configuration in hours rather than weeks.
Inference Engine Arena consists of two major components:
Arena Logging System: Think of it as the “Postman for inference benchmarking” — a powerful all-in-one tool that simplifies complex workflows. It helps users start, manage, configure, stop, and monitor inference engines, and execute experiments quickly. It enables:
Using predefined benchmarks or configuring custom workflows
Running batch experiments with different engine parameters and benchmark configurations (see the sketch after this list)
Storing results in a well-organized manner
Displaying results in a dashboard and local leaderboard for easy comparison and analysis
Eliminating the need for scattered spreadsheets to track experiment results
Generating visualizations, reports, and reproducible commands
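To make the batch-experiment idea concrete, here is a minimal sketch of what a workload definition file could look like in YAML. The schema shown (keys such as `experiments`, `engine_args`, and `benchmarks`) is purely illustrative and is not the project's actual format; the engine flags themselves (`gpu-memory-utilization`, `max-num-seqs`, `mem-fraction-static`) are standard vLLM and SGLang options.

```yaml
# Illustrative batch-experiment definition. The key names below are
# hypothetical; see the repository's own example configs for the real schema.
experiments:
  - engine: vllm
    model: meta-llama/Llama-3.1-8B-Instruct
    engine_args:
      gpu-memory-utilization: [0.8, 0.9]   # swept: one sub-run per value
      max-num-seqs: [128, 256]
  - engine: sglang
    model: meta-llama/Llama-3.1-8B-Instruct
    engine_args:
      mem-fraction-static: [0.8, 0.9]

benchmarks:
  - name: sharegpt               # a predefined benchmark
  - name: synthetic-long-input   # a custom workload
    input-len: 8192
    output-len: 256
    request-rate: [1, 4, 16]     # swept request rates
```

Under a scheme like this, each combination of engine arguments and benchmark settings would be recorded as its own sub-run, so results can be compared side by side in the dashboard and local leaderboard.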
Arena Leaderboard: The “ChatBot Arena” for inference engines — a community-driven ranking system that helps everyone identify the best performers. It provides reference results for various engines running on different hardware with different models against different benchmarks:
Each record represents a specific benchmark sub-run with particular hardware, model, and engine parameters
Community-uploaded benchmark results
Filtering capabilities to focus on relevant metrics
Detailed configuration information for each record
One-command reproduction of results using command line or YAML files (a sketch of such a file follows below)
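As an illustration of what one-command reproduction might look like, the hypothetical YAML below pins everything a leaderboard record would need to be replayed: hardware, engine version, model, engine arguments, and benchmark settings. The field names are assumptions made for the sake of the example; the authoritative format is whatever reproduction file is published alongside each record.

```yaml
# Hypothetical reproduction file for a single leaderboard record.
# Field names are illustrative, not the project's actual schema.
record: example-record-0001          # placeholder identifier
hardware: NVIDIA H100 80GB
engine: vllm
engine_version: "0.6.3"
model: meta-llama/Llama-3.1-8B-Instruct
engine_args:
  tensor-parallel-size: 1
  gpu-memory-utilization: 0.9
benchmark:
  name: sharegpt
  num-prompts: 1000
  request-rate: 8
```

Because the record carries its full configuration, anyone with comparable hardware can rerun it and check their numbers against the leaderboard entry.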