Core Concepts and Architecture
This page introduces the foundational principles and structural components of the Azure AI Evaluation SDK. Understanding these concepts is essential for effectively designing and running evaluation workflows.
1. Evaluation Pipeline Structure
- Stages: Data ingestion → Preprocessing → Model inference → Metric computation → Reporting
- Modular Design: Each stage is customizable and can be extended with user-defined components.
- Pipeline Orchestration: Supports sequential and parallel evaluation flows; a minimal end-to-end run is sketched after this list.
- 📘 Local Evaluation with Azure AI Evaluation SDK
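As a concrete illustration of this flow, here is a minimal sketch of a single run built around the SDK's `evaluate` entry point. The file names are assumptions for this example, and the dataset is expected to contain `response` and `ground_truth` columns.

```python
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

# Minimal sketch of one evaluation run. "qa_pairs.jsonl" and
# "eval_results.json" are assumed file names for this example.
result = evaluate(
    data="qa_pairs.jsonl",                   # ingestion: one JSON record per line
    evaluators={"f1": F1ScoreEvaluator()},   # metric computation stage
    output_path="./eval_results.json",       # reporting: aggregate and per-row scores
)
print(result["metrics"])                     # aggregate metrics across all rows
```

Each evaluator registered in the `evaluators` dict is applied to every row of the dataset, so the stages above are driven by a single call.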
2. Metrics and Evaluation Types
- Built-in Metrics:
  - Accuracy, Precision, Recall, F1 Score
  - Latency and throughput
  - Fairness and bias detection
- Custom Metrics:
  - Plug-in architecture for user-defined metrics (sketched after this list)
  - Support for domain-specific evaluation logic
- 📘 Creating Custom Evaluators
- 📘 Custom Aggregate Metrics Discussion
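To make the plug-in model concrete, the sketch below pairs a built-in metric with a hypothetical custom one. Any Python callable that accepts the dataset's columns as keyword arguments and returns a dict of scores can serve as an evaluator; `answer_length` here is an illustrative example, not part of the SDK.

```python
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

def answer_length(*, response: str, **kwargs):
    """Hypothetical custom metric: response length in characters."""
    return {"answer_length": len(response)}

result = evaluate(
    data="qa_pairs.jsonl",            # assumed file with response/ground_truth columns
    evaluators={
        "f1": F1ScoreEvaluator(),     # built-in lexical-overlap metric
        "length": answer_length,      # user-defined plug-in metric
    },
)
```

Class-based evaluators work the same way: any object whose `__call__` method follows this contract can be registered in the `evaluators` dict.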
3. Dataset Handling
- Supported Formats: JSONL, CSV, Parquet, and Azure ML datasets
- Annotation Schemas: Standardized formats for ground truth and predictions (an example record appears after this list)
- Data Splitting: Train/test/validation partitioning with configurable ratios
- 📘 Azure ML Dataset Creation Guide
- 📘 Azure Open Datasets
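For reference, a minimal JSONL dataset might be produced as follows. The field names (`query`, `response`, `ground_truth`, `context`) follow the convention the built-in evaluators expect by default; datasets with other column names can be remapped at evaluation time through an `evaluator_config` column mapping.

```python
import json

# Sketch: write a two-record JSONL evaluation dataset (one JSON object per line).
records = [
    {
        "query": "What is the capital of France?",
        "response": "Paris is the capital of France.",
        "ground_truth": "Paris",
        "context": "France's capital and largest city is Paris.",
    },
    {
        "query": "Who wrote Hamlet?",
        "response": "Hamlet was written by William Shakespeare.",
        "ground_truth": "William Shakespeare",
        "context": "Hamlet is a tragedy by William Shakespeare.",
    },
]

with open("qa_pairs.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```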
4. Integration with Azure AI Services
- Azure ML Integration:
  - Seamless connection to Azure ML pipelines and workspaces
  - Logging and tracking via Azure ML Experiments
- Model Hosting Compatibility:
  - Works with models deployed on Azure OpenAI, Azure ML endpoints, and custom REST APIs (see the sketch after this list)
- 📘 Cloud Evaluation with Azure AI Foundry SDK
- 📘 Evaluating RAG Applications with AzureML
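The sketch below shows both integration points: an AI-assisted evaluator scoring responses via an Azure OpenAI deployment, and optional logging of the run to an Azure AI project. Every endpoint, key, and project value is a placeholder, and logging to a project additionally requires valid Azure credentials.

```python
from azure.ai.evaluation import evaluate, RelevanceEvaluator

# Placeholder model configuration for an Azure OpenAI deployment.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment-name>",
}

result = evaluate(
    data="qa_pairs.jsonl",
    evaluators={"relevance": RelevanceEvaluator(model_config)},
    # Optional: track this run in an Azure AI project.
    azure_ai_project={
        "subscription_id": "<subscription-id>",
        "resource_group_name": "<resource-group>",
        "project_name": "<project-name>",
    },
)
```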
5. Configuration and Extensibility
- YAML/JSON Config Support: Define evaluation parameters and pipeline settings (one possible pattern is sketched after this list)
- Plugin System: Add new evaluators, preprocessors, and reporters
- Logging and Monitoring: Built-in telemetry hooks for observability
- 📘 Azure AI Evaluation SDK GitHub Reference
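As one possible pattern (the SDK does not mandate a particular config schema), evaluation settings can be externalized to YAML and resolved against a small plug-in registry. The schema and the `EVALUATOR_REGISTRY` lookup table below are assumptions for this sketch, not SDK features.

```python
import yaml  # requires pyyaml
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

# Illustrative config; in practice this would live in a .yaml file.
CONFIG = """
data: qa_pairs.jsonl
output_path: ./eval_results.json
evaluators:
  - f1
"""

# Hypothetical plug-in registry mapping config names to evaluator classes.
EVALUATOR_REGISTRY = {"f1": F1ScoreEvaluator}

cfg = yaml.safe_load(CONFIG)
result = evaluate(
    data=cfg["data"],
    evaluators={name: EVALUATOR_REGISTRY[name]() for name in cfg["evaluators"]},
    output_path=cfg["output_path"],
)
```

Keeping the registry as the single place where evaluator names resolve to classes lets new evaluators, preprocessors, or reporters be added without touching the run script.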