Test Suites & Leaderboard

The Test Suites module allows users to configure, manage, execute, and compare evaluation scenarios for LLM-powered systems. It provides a structured way to test performance across multiple categories and maintain versioned evaluation records.

LLM Endpoints

Before creating or running a Test Suite, at least one LLM endpoint must be configured. Test Suites are executed against these configured LLMs.

Configure an LLM Endpoint

Navigate to Settings → LLM Endpoints.
Click Add and provide the required details:
- Endpoint URL
- SDK configuration
- Secret / API Key
Assign the systems that are permitted to use this LLM.
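The endpoint settings above can be pictured as a small record plus a permission check. The sketch below is a hypothetical model, not the product's actual schema or API. The class and field names (`LLMEndpoint`, `permitted_systems`, `endpoints_for_system`) are illustrative assumptions, and it assumes that a Test Suite in a given system can only select endpoints assigned to that system.

```python
from dataclasses import dataclass, field
from typing import List, Set


# Hypothetical model of an entry under Settings → LLM Endpoints.
# Field names are illustrative assumptions, not the product's schema.
@dataclass
class LLMEndpoint:
    endpoint_url: str
    sdk_config: dict
    api_key: str = field(repr=False)  # secret: excluded from repr so it is not printed
    permitted_systems: Set[str] = field(default_factory=set)

    def is_permitted(self, system: str) -> bool:
        """True if `system` was assigned to this endpoint."""
        return system in self.permitted_systems


def endpoints_for_system(endpoints: List[LLMEndpoint], system: str) -> List[LLMEndpoint]:
    """Endpoints a Test Suite in `system` could select (assumed visibility rule)."""
    return [e for e in endpoints if e.is_permitted(system)]
```

Keeping the secret out of `repr` is a small design choice so that logging an endpoint object does not leak the API key.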

Create, Update, Clone, and Delete Test Suites

The Test Suites section (under the System parent node) is where all existing test configurations are stored and managed. Selecting this section opens the Test Suite page, which lists all test suites accessible to the user.

Create a Test Suite

Navigate to the Test Suite page.
Click the Create button to open the Create Test Config form.
Provide the following details:
- Name (mandatory)
- Description (optional)
- LLM (one of the configured endpoints)
- Number of instances
Select the required test categories using the checkboxes on the right side of the screen.
Click Save to complete the configuration.

Once saved, the Test Suite becomes available for execution.
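As a rough illustration of the form's rules, the following sketch models a Test Suite configuration and its checks. Only the mandatory Name, the optional Description, and the requirement that the LLM be a configured endpoint come from this guide. The names `TestSuiteConfig` and `validate`, and the rules that the number of instances must be at least 1 and that at least one category must be selected, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TestSuiteConfig:
    name: str                          # mandatory
    llm: str                           # ID of a configured LLM endpoint (assumed)
    instances: int                     # "number of instances" from the form
    categories: List[str] = field(default_factory=list)  # checked test categories
    description: Optional[str] = None  # optional

    def validate(self, configured_llms: Set[str]) -> List[str]:
        """Return a list of problems; an empty list means the config can be saved.

        The instances and categories checks are assumptions, not documented rules.
        """
        errors = []
        if not self.name.strip():
            errors.append("Name is mandatory")
        if self.llm not in configured_llms:
            errors.append("LLM must be a configured endpoint")
        if self.instances < 1:
            errors.append("Number of instances must be at least 1")
        if not self.categories:
            errors.append("Select at least one test category")
        return errors
```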

Update a Test Suite

Go to the Test Suite page.
Click the kebab menu (three dots) next to the desired Test Suite.
Select Update Test Config.
Make the necessary changes and save.

Clone a Test Suite

Cloning allows users to duplicate an existing configuration, which is useful when creating variations of a test setup.

Click the kebab menu next to the Test Suite.
Select Clone Test Suite. This creates a new Test Suite with the same configuration as the existing one.
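Cloning can be thought of as copying the configuration into a new suite with its own identity. In this hypothetical sketch, `TestSuite`, `clone_suite`, and the "(copy)" naming are invented for illustration, and it assumes the clone starts without the original's run history.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TestSuite:
    name: str
    config: dict
    suite_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # new identity per suite
    runs: List[str] = field(default_factory=list)  # Run IDs executed for this suite


def clone_suite(original: TestSuite, new_name: Optional[str] = None) -> TestSuite:
    """Create a new suite with the same configuration but its own ID.

    Assumption: run history is not copied, so the clone starts with no runs.
    """
    return TestSuite(
        name=new_name or f"{original.name} (copy)",
        # Deep copy so editing the clone's config never changes the original.
        config=copy.deepcopy(original.config),
    )
```

The deep copy matters when the configuration holds nested lists (such as selected categories): a shallow copy would let changes to the variation silently alter the original setup.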

Delete a Test Suite

Click the kebab menu next to the Test Suite.
Select Delete.

Publish Scores to Model Card

Users can publish Test Suite scores to the Model Card:

Click the kebab menu next to the Test Suite.
Select Publish in Model Card.

Run and Compare Tests

After a Test Suite is created, users can execute evaluations against the configured LLM.

Run a Test Suite

Click the Evaluate button to start a test run.
A new Test Run ID is generated for each execution.
Test Suites can be executed multiple times.

Each run is tracked independently, enabling performance comparison over time.
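Because every click on Evaluate produces a new Test Run ID, runs can be modeled as independent records appended to the suite. The sketch assumes UUID-based IDs and the hypothetical names `TestRun`, `SuiteRuns`, and `evaluate`; the product's actual ID format is not documented here.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class TestRun:
    run_id: str
    started_at: datetime
    status: str = "running"  # assumed states, e.g. "running" then "completed"
    scores: Dict[str, float] = field(default_factory=dict)


@dataclass
class SuiteRuns:
    runs: List[TestRun] = field(default_factory=list)

    def evaluate(self) -> TestRun:
        """Start a run: each call gets a fresh Run ID and its own record."""
        run = TestRun(run_id=uuid.uuid4().hex, started_at=datetime.now(timezone.utc))
        self.runs.append(run)  # earlier runs are kept, so results can be compared over time
        return run
```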

Monitor Test Runs

The status and progress of each run are displayed on the Test Suite page.
Click the Run ID to view live progress (for running evaluations) or scores and metrics (for completed runs).

Compare Test Runs

Users can compare results across multiple runs:

Expand the relevant Test Suite.
Select the test runs to compare.
Click the Compare button at the top-right of the screen.

The comparison view highlights differences in metrics across the selected runs.
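One way to express a run comparison is as a per-metric difference against a chosen baseline run. This is a hypothetical sketch: `compare_runs` and the baseline convention are assumptions, and the actual comparison view may present differences in another form.

```python
from typing import Dict, Optional

Scores = Dict[str, float]  # metric name -> score for one run


def compare_runs(selected: Dict[str, Scores], baseline_id: str) -> Dict[str, Dict[str, Optional[float]]]:
    """For every metric, report each run's difference from the baseline run.

    A metric missing from either the run or the baseline yields None.
    """
    baseline = selected[baseline_id]
    metrics = sorted(set().union(*(s.keys() for s in selected.values())))
    table: Dict[str, Dict[str, Optional[float]]] = {}
    for metric in metrics:
        row: Dict[str, Optional[float]] = {}
        for run_id, scores in selected.items():
            if metric in scores and metric in baseline:
                # Rounded to hide floating-point noise such as 0.04999999999999993.
                row[run_id] = round(scores[metric] - baseline[metric], 4)
            else:
                row[run_id] = None
        table[metric] = row
    return table
```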

Leaderboard

The Leaderboard provides a consolidated view of performance across test subcategories.

Displays scores for each individual test subcategory.
If multiple test runs exist within a Test Suite, only the latest run is considered.
Users can apply filters to refine the displayed results.
The highest score among the displayed test runs is shown in bold green for quick identification.
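The two Leaderboard rules (only the latest run per suite, highest score highlighted) can be sketched as follows. The names `Run`, `build_leaderboard`, and `best_per_subcategory` are hypothetical, and the sketch assumes that higher scores are better, that the highlight is computed separately for each subcategory, and that ties highlight every tied suite.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Run:
    suite: str
    finished_at: datetime
    scores: Dict[str, float]  # subcategory -> score


def build_leaderboard(runs: List[Run]) -> Dict[str, Dict[str, float]]:
    """Keep only the latest run per Test Suite; return {suite: {subcategory: score}}."""
    latest: Dict[str, Run] = {}
    for run in runs:
        current = latest.get(run.suite)
        if current is None or run.finished_at > current.finished_at:
            latest[run.suite] = run
    return {suite: run.scores for suite, run in latest.items()}


def best_per_subcategory(board: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Suites holding the top score in each subcategory (the bold-green cells, assumed)."""
    best: Dict[str, List[str]] = {}
    subcategories = {sub for scores in board.values() for sub in scores}
    for sub in subcategories:
        top = max(scores[sub] for scores in board.values() if sub in scores)
        best[sub] = sorted(s for s, scores in board.items() if scores.get(sub) == top)
    return best
```

For example, if suite-a's older run scored 0.9 on toxicity but its latest run scored 0.6, only 0.6 reaches the Leaderboard, so a suite with a latest score of 0.85 would be highlighted instead.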

The Leaderboard helps stakeholders quickly assess model performance and identify top-performing configurations across evaluation categories.