
Test Suites & Leaderboard

The Test Suites module allows users to configure, manage, execute, and compare evaluation scenarios for LLM-powered systems. It provides a structured way to test performance across multiple categories and maintain versioned evaluation records.


Before you create or run a Test Suite, at least one LLM endpoint must be configured. Test Suites run against these configured LLMs.

  1. Navigate to Settings → LLM Endpoints.
  2. Click Add and provide the required details:
    • Endpoint URL
    • SDK configuration
    • Secret / API Key
  3. Assign the systems that are permitted to use this LLM.

Configure LLM Endpoint


Create, Update, Clone, and Delete Test Suites


The Test Suites section (under the System parent node) is where all existing test configurations are stored and managed. Selecting this option opens the Test Suite page, which displays all test suites accessible to the user.

  1. Navigate to the Test Suite page.
  2. Click the Create button to open the Create Test Config form.
  3. Provide the following details:
    • Name (mandatory)
    • Description (optional)
    • Select the LLM
    • Specify the number of instances
  4. Select the required test categories using the checkboxes on the right side of the screen.
  5. Click Save to complete the configuration.

Create Test Suite

Once saved, the Test Suite becomes available for execution.
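The fields collected by the Create Test Config form can be pictured as a small record. The sketch below is illustrative only and is not the product's API: the class name `SuiteConfig`, its field names, and the checks on `instances` and `categories` are assumptions. Only the mandatory Name and optional Description come from the form described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SuiteConfig:
    """Hypothetical in-memory model of the Create Test Config form."""

    name: str                          # mandatory in the form
    llm: str                           # the LLM selected in the form
    instances: int                     # number of instances
    categories: Tuple[str, ...]        # test categories ticked via checkboxes
    description: Optional[str] = None  # optional in the form

    def __post_init__(self) -> None:
        # The form marks Name as mandatory, so reject empty or blank names.
        if not self.name.strip():
            raise ValueError("Name is mandatory")
        # Assumption: a suite needs at least one instance.
        if self.instances < 1:
            raise ValueError("instances must be at least 1")
        # Assumption: a suite needs at least one selected category.
        if not self.categories:
            raise ValueError("select at least one test category")


# Example values are placeholders, not real endpoint or category names.
config = SuiteConfig(
    name="nightly-regression",
    llm="example-llm",
    instances=2,
    categories=("example-category",),
)
```

Validating at construction time mirrors the form's behaviour of refusing to save an incomplete configuration.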

To update an existing Test Suite:

  1. Go to the Test Suite page.
  2. Click the kebab menu (three dots) next to the desired Test Suite.
  3. Select Update Test Config.
  4. Make the necessary changes and save.

Cloning allows users to duplicate an existing configuration, which is useful when creating variations of a test setup.

  1. Click the kebab menu next to the Test Suite.
  2. Select Clone Test Suite. This creates a new Test Suite with the same configuration as the original.

To delete a Test Suite:

  1. Click the kebab menu next to the Test Suite.
  2. Select Delete.

Users can publish Test Suite scores to the Model Card:

  1. Click the kebab menu next to the Test Suite.
  2. Select Publish in Model Card.

After a Test Suite is created, users can execute evaluations against the configured LLM.

  • Click the Evaluate button to start a test run.
  • A new Test Run ID is generated for each execution.
  • Test Suites can be executed multiple times.

Run Evaluation

Each run is tracked independently, enabling performance comparison over time.
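As a rough mental model of this behaviour, each click on Evaluate appends a new, independently identified record to the suite's history. The sketch below is an assumption-laden illustration: `RunRecord`, `SuiteHistory`, the status strings, and the use of a UUID as the Test Run ID are all hypothetical, since the actual ID format is not specified here.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class RunRecord:
    """Hypothetical record of one execution of a Test Suite."""

    run_id: str
    started_at: datetime
    status: str = "running"  # assumed status label


@dataclass
class SuiteHistory:
    """Hypothetical container for all runs of one Test Suite."""

    suite_name: str
    runs: List[RunRecord] = field(default_factory=list)

    def evaluate(self) -> RunRecord:
        # Every execution gets a fresh ID, so earlier runs are never overwritten.
        run = RunRecord(
            run_id=uuid.uuid4().hex,  # assumption: real IDs may look different
            started_at=datetime.now(timezone.utc),
        )
        self.runs.append(run)
        return run


history = SuiteHistory(suite_name="nightly-regression")
first = history.evaluate()
second = history.evaluate()
```

Keeping every run as its own record is what makes later comparison across runs possible.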

  • The status and progress of each run are displayed on the Test Suite page.
  • Click on the Run ID to view live progress (for running evaluations) or scores and metrics (for completed runs).

Monitor Test Runs

Users can compare results across multiple runs:

  1. Expand the relevant Test Suite.
  2. Select the test runs to compare.
  3. Click the Compare button at the top-right of the screen.

Compare Test Runs

This comparison view highlights differences in metrics across selected runs.
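One way to think about such a comparison is as a filter that keeps only the metrics whose values differ between the selected runs. The following is a minimal sketch under that assumption. The function name `metric_differences`, the run IDs, and the metric names are hypothetical and do not reflect how the product computes or stores results.

```python
from typing import Dict, Optional

Metrics = Dict[str, float]


def metric_differences(
    runs: Dict[str, Metrics],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Return, per metric, each run's value, keeping only metrics that differ.

    A metric missing from a run is reported as None for that run.
    """
    all_metrics = sorted({name for scores in runs.values() for name in scores})
    differences: Dict[str, Dict[str, Optional[float]]] = {}
    for metric in all_metrics:
        per_run = {run_id: scores.get(metric) for run_id, scores in runs.items()}
        # Keep the metric only if at least two runs disagree on its value.
        if len(set(per_run.values())) > 1:
            differences[metric] = per_run
    return differences


# Placeholder run IDs and metric names, for illustration only.
selected = {
    "run-1": {"accuracy": 0.8, "toxicity": 0.1},
    "run-2": {"accuracy": 0.9, "toxicity": 0.1},
}
diff = metric_differences(selected)
```

In this example only `accuracy` is kept, because `toxicity` is identical in both runs.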


The Leaderboard provides a consolidated view of performance across test subcategories.

  • Displays scores for each individual test subcategory.
  • If multiple test runs exist within a Test Suite, only the latest run is considered.
  • Users can apply filters to refine the displayed results.
  • The highest score among test runs is displayed in bold green for quick identification.
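The selection rules above can be sketched in a few lines. This is an illustration, not the product's implementation: it assumes that "latest" is decided by a per-suite sequence number, and it reads the highlighting rule as applying to the runs that remain after the latest-run filter. The names `Run`, `build_leaderboard`, and `top_scorers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    suite: str
    sequence: int             # assumption: a higher value means a more recent run
    scores: Dict[str, float]  # subcategory -> score


def build_leaderboard(runs: List[Run]) -> Dict[str, Dict[str, float]]:
    """Keep only the latest run of each Test Suite."""
    latest: Dict[str, Run] = {}
    for run in runs:
        current = latest.get(run.suite)
        if current is None or run.sequence > current.sequence:
            latest[run.suite] = run
    return {suite: run.scores for suite, run in latest.items()}


def top_scorers(board: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each subcategory, name the suite whose displayed score is highest."""
    best: Dict[str, str] = {}
    for suite, scores in board.items():
        for subcategory, score in scores.items():
            if subcategory not in best or score > board[best[subcategory]][subcategory]:
                best[subcategory] = suite
    return best


# Placeholder suite and subcategory names.
runs = [
    Run("suite-a", 1, {"bias": 0.9}),
    Run("suite-a", 2, {"bias": 0.6}),  # latest run of suite-a
    Run("suite-b", 1, {"bias": 0.7}),
]
board = build_leaderboard(runs)
highlighted = top_scorers(board)
```

Under these assumptions, suite-a's earlier score of 0.9 does not count because only its latest run (0.6) is considered, so suite-b's 0.7 is the score that would be highlighted for `bias`.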

Leaderboard

The Leaderboard helps stakeholders quickly assess model performance and identify top-performing configurations across evaluation categories.