13. Dashboard User Guide
This comprehensive guide walks you through the LangEval Dashboard, detailing how to configure Models, set up AI Agents, design Scenarios, and execute Evaluations.
📑 Table of Contents
- Model Configuration
- AI Agent Configuration
- Scenario Management
- Evaluation Execution & Reports
- Workspace Settings
1. Model Configuration
Before creating an AI Agent or running auto-evaluations, you must configure the underlying Language Models (LLMs).
1.1 Adding a New Provider
- Navigate to Settings > Models from the left sidebar.
- Click the Enable Provider or Add Connection button.
- Select the Provider from the list (e.g., OpenAI, Anthropic, Google Gemini, Azure, Local/Custom).
1.2 Configuring Model Credentials
- API Key: Enter the API Key provided by your LLM provider. This is stored securely via Vault/KMS.
- Base URL: (Optional) If you are using a proxy or a local model (like Ollama or vLLM), enter the custom endpoint here.
- Save: Click Save Connection. The system will perform a quick health check to verify credentials.
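If your Base URL points at a local OpenAI-compatible server such as Ollama or vLLM, you can run a similar check yourself before saving. A minimal sketch, assuming the server exposes the standard `GET /v1/models` route (the port below is Ollama's default; adjust for your deployment):

```python
# Sanity-check a local OpenAI-compatible endpoint before saving it as a custom Base URL.
# Assumes an Ollama or vLLM server exposing GET /v1/models; adjust host/port as needed.
import requests

BASE_URL = "http://localhost:11434/v1"  # example: Ollama's default OpenAI-compatible API

resp = requests.get(f"{BASE_URL}/models", timeout=10)
resp.raise_for_status()
print("Available models:", [m["id"] for m in resp.json().get("data", [])])
```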
2. AI Agent Configuration
An "Agent" in LangEval represents the AI application or target bot you want to evaluate. It can be a simple chatbot, a Retrieval-Augmented Generation (RAG) pipeline, or a complex multi-agent system. Configuration accuracy is crucial because LangEval needs to know how to communicate with your agent during simulations.
2.1 Basic Agent Profile
- Go to the Agents tab in the main navigation.
- Click Create Agent.
- Name: A required, recognizable name for your agent (e.g., `Customer Support Bot v2`).
- Description: (Optional) A brief outline of the agent's purpose, scope, and expected capabilities.
- Type: Categorize your agent. Typical values are `RAG Chatbot`, `Rule-based Bot`, or `Generative Agent`.
- Version: Track your revisions (e.g., `v1.0.0`, `v1.1.0-beta`).
- Status: Set the operational state: `active` (ready to test), `maintenance` (temporarily disabled), or `deprecated`.
- Repository URL: (Optional) A link to the Git repository holding the agent's source code for cross-reference.
2.2 Endpoint & Connection Properties
This is how the LangEval Orchestrator sends test cases to your Agent.
- Endpoint URL (Required): The absolute HTTP/HTTPS URL where your agent receives requests (e.g., `https://api.my-agent.com/v1/chat`).
  - LangEval validates this URL structure upon saving. Local IPs or mock hostnames are allowed if your deployment supports them.
- API Key / Authentication: If your agent's API is protected, enter the secret token here.
  - Security Note: This key is immediately encrypted at rest in the DB (`api_key_encrypted`) and only decrypted in-memory when the test payload is delivered via the Authorization header.
- Integration Mode (Meta-data): You can pass custom JSON in the `meta_data` field to tell LangEval how to parse your Agent's HTTP response schema (see the sketch after this list).
  - Example `meta_data`: `{"payload_format": "openai_compatible", "provider": "OpenAI", "model": "gpt-4o"}`.
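The sketch below illustrates, with placeholder values, the contract these settings describe: LangEval POSTs a test case to the Endpoint URL with the API Key as a Bearer token, and with `payload_format` set to `openai_compatible` it reads the answer from the standard Chat Completions response shape. The request body shown is an assumption for illustration only, not LangEval's exact internal payload.

```python
# Illustrative only (not LangEval internals): the request/response contract implied
# by the Endpoint URL, API Key, and meta_data settings above.
import requests

ENDPOINT_URL = "https://api.my-agent.com/v1/chat"  # your Endpoint URL
AGENT_API_KEY = "replace-with-your-agent-token"    # placeholder; stored encrypted by LangEval

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {AGENT_API_KEY}"},
    # Assumed request body for illustration; your agent defines its own input schema.
    json={"messages": [{"role": "user", "content": "How do I reset my password?"}]},
    timeout=30,
)
resp.raise_for_status()

# With payload_format "openai_compatible", the answer is parsed from the standard
# Chat Completions shape: choices[0].message.content
answer = resp.json()["choices"][0]["message"]["content"]
print(answer)
```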
2.3 Observability (Langfuse Integration)
If your agent logic is complex (like multi-turn reasoning or tool calling) and you want LangEval's Trace View to show step-by-step internal executions:
- Enable the Langfuse Integration toggle.
- Provide your project's Langfuse credentials: `Project ID`, `Public Key`, `Secret Key`, and optionally a custom `Host URL`.
- During evaluation, LangEval will link the generated output directly to the Langfuse execution Trace ID, giving you an X-ray view into why your Agent responded the way it did.
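On the agent side, the same project credentials are used to initialize the Langfuse client so that traces are actually produced. A minimal sketch using the Langfuse Python SDK (keys and host are placeholders; match them to the values entered above):

```python
# Minimal sketch: initialize the Langfuse client inside your agent with the same
# project credentials entered in the LangEval dashboard (placeholder values shown).
from langfuse import Langfuse

langfuse = Langfuse(
    public_key="pk-lf-...",             # Public Key field
    secret_key="sk-lf-...",             # Secret Key field
    host="https://cloud.langfuse.com",  # or your custom Host URL
)
```

How you create traces and spans from here depends on your SDK version and framework integration; what matters for LangEval is that your agent emits traces to the same Langfuse project.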
3. Scenario Management
A Scenario is your test suite. It contains the dataset, the expected outcomes, and the metrics used to score the agent.
3.1 Creating a New Scenario
- Navigate to Scenarios and click New Scenario.
- Name & Tags: Enter a name (e.g.,
Financial FAQ Testing) and assign tags for easy filtering.
3.2 Defining Scenario Attributes & Dataset
- Import Dataset: You can upload a CSV, JSON file, or connect to an external source.
- Each row should represent a single "Test Case".
- Required columns typically include `input` (the user query).
- Optional columns: `expected_output` (for exact match or semantic similarity) and `context` (for RAG evaluation). A sample file is shown after this list.
- Data Mapping: Ensure the columns in your dataset map correctly to LangEval's internal variables.
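A small JSON dataset illustrating this layout (all values are examples); the second test case omits the optional `context` column:

```json
[
  {
    "input": "What is the annual fee for the Platinum card?",
    "expected_output": "The Platinum card has a $95 annual fee, waived in the first year.",
    "context": "Pricing sheet: Platinum card, annual fee $95, waived for the first year."
  },
  {
    "input": "How do I dispute a transaction?",
    "expected_output": "Open the app, select the transaction, and tap 'Dispute'."
  }
]
```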
3.3 Configuring Evaluation Metrics
- Within the Scenario editor, go to the Metrics tab.
- Click Add Metric.
- Choose from pre-built AI-assisted metrics:
- Faithfulness: Checks if the answer is grounded in the provided context (RAG).
- Answer Relevance: Checks if the answer addresses the user's prompt.
- Toxicity/Bias: Checks for harmful content.
- Custom Code/Deterministic: Write Python/JS scripts for exact regex matching or JSON schema validation (a sketch follows this list).
- Scoring Thresholds: Set the passing threshold for each metric (e.g., a score of at least 0.8 on a 0–1 scale).
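For the Custom Code/Deterministic option, a hypothetical Python metric is sketched below; the function name and signature are illustrative, so adapt them to whatever interface the Metrics editor expects:

```python
# Hypothetical deterministic metric: passes if the agent's answer contains a
# well-formed order ID. Name and signature are illustrative, not a fixed LangEval API.
import re

ORDER_ID_PATTERN = re.compile(r"\bORD-\d{6}\b")

def evaluate(test_case: dict) -> dict:
    """Score one test case: 1.0 if the output contains an order ID, else 0.0."""
    output = test_case.get("output", "")
    matched = bool(ORDER_ID_PATTERN.search(output))
    return {"score": 1.0 if matched else 0.0, "passed": matched}
```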
4. Evaluation Execution & Reports
4.1 Running a Scenario
- In the Scenario view, click Run Evaluation.
- Select the Target Agent you configured earlier.
- Click Start. The LangGraph Orchestrator will distribute the workload to evaluation workers.
4.2 Interpreting the Dashboard Reports
- Navigate to Reports or click on the finished Run ID.
- Overview: View aggregate scores, pass/fail rates, and latency percentiles.
- Trace View: Dive into individual test cases. See the exact prompt sent, the agent's response, and the rationales generated by the LLM-as-a-Judge.
- Export: Export reports to PDF or CSV for compliance teams.
5. Workspace Settings
- Members & Roles: Invite team members and assign roles (Admin, Evaluator, Viewer).
- API Keys: Generate LangEval API Keys to trigger evaluations from your CI/CD pipelines (e.g., GitHub Actions, GitLab CI).
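For example, a CI job can call the LangEval API with one of these keys to start a run after each deployment. A hypothetical sketch; the route, payload, and IDs below are placeholders, so consult your instance's API reference for the exact contract:

```python
# Hypothetical CI step (e.g. inside a GitHub Actions job) that triggers an
# evaluation run via the LangEval API. Route and payload are placeholders.
import os
import requests

LANGEVAL_URL = os.environ["LANGEVAL_URL"]          # e.g. https://langeval.example.com
LANGEVAL_API_KEY = os.environ["LANGEVAL_API_KEY"]  # key generated in Workspace Settings

resp = requests.post(
    f"{LANGEVAL_URL}/api/evaluations",             # placeholder route
    headers={"Authorization": f"Bearer {LANGEVAL_API_KEY}"},
    json={"scenario_id": "scn_123", "agent_id": "agt_456"},  # placeholder IDs
    timeout=30,
)
resp.raise_for_status()
print("Evaluation started:", resp.json())
```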