Core Concepts and Architecture
This page introduces the foundational principles and structural components of the Azure AI Evaluation SDK. Understanding these concepts is essential for effectively designing and running evaluation workflows.
1. Evaluation Pipeline Structure
- Stages: Data ingestion → Preprocessing → Model inference → Metric computation → Reporting
- Modular Design: Each stage is customizable and can be extended with user-defined components.
- Pipeline Orchestration: Supports sequential and parallel evaluation flows; a minimal end-to-end run is sketched after this list.
- 📘 Local Evaluation with Azure AI Evaluation SDK
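As a concrete illustration of this flow, here is a minimal sketch of a single run built around the SDK's `evaluate` entry point. The file names are assumptions for this example, and the dataset is expected to contain `response` and `ground_truth` columns.

```python
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

# Minimal sketch of one evaluation run. "qa_pairs.jsonl" and
# "eval_results.json" are assumed file names for this example.
result = evaluate(
    data="qa_pairs.jsonl",                   # ingestion: one JSON record per line
    evaluators={"f1": F1ScoreEvaluator()},   # metric computation stage
    output_path="./eval_results.json",       # reporting: aggregate and per-row scores
)
print(result["metrics"])                     # aggregate metrics across all rows
```

Each evaluator registered in the `evaluators` dict is applied to every row of the dataset, so the stages above are driven by a single call.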
2. Metrics and Evaluation Types
- Built-in Metrics:
  - Accuracy, Precision, Recall, F1 Score
  - Latency and throughput
  - Fairness and bias detection
- Custom Metrics:
  - Plug-in architecture for user-defined metrics (sketched after this list)
  - Support for domain-specific evaluation logic
- 📘 Creating Custom Evaluators
- 📘 Custom Aggregate Metrics Discussion
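To make the plug-in model concrete, the sketch below pairs a built-in metric with a hypothetical custom one. Any Python callable that accepts the dataset's columns as keyword arguments and returns a dict of scores can serve as an evaluator; `answer_length` here is an illustrative example, not part of the SDK.

```python
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

def answer_length(*, response: str, **kwargs):
    """Hypothetical custom metric: response length in characters."""
    return {"answer_length": len(response)}

result = evaluate(
    data="qa_pairs.jsonl",            # assumed file with response/ground_truth columns
    evaluators={
        "f1": F1ScoreEvaluator(),     # built-in lexical-overlap metric
        "length": answer_length,      # user-defined plug-in metric
    },
)
```

Class-based evaluators work the same way: any object whose `__call__` method follows this contract can be registered in the `evaluators` dict.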
3. Dataset Handling
- Supported Formats: JSONL, CSV, Parquet, and Azure ML datasets
- Annotation Schemas: Standardized formats for ground truth and predictions (an example record appears after this list)
- Data Splitting: Train/test/validation partitioning with configurable ratios
- 📘 Azure ML Dataset Creation Guide
- 📘 Azure Open Datasets
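For reference, a minimal JSONL dataset might be produced as follows. The field names (`query`, `response`, `ground_truth`, `context`) follow the convention the built-in evaluators expect by default; datasets with other column names can be remapped at evaluation time through an `evaluator_config` column mapping.

```python
import json

# Sketch: write a two-record JSONL evaluation dataset (one JSON object per line).
records = [
    {
        "query": "What is the capital of France?",
        "response": "Paris is the capital of France.",
        "ground_truth": "Paris",
        "context": "France's capital and largest city is Paris.",
    },
    {
        "query": "Who wrote Hamlet?",
        "response": "Hamlet was written by William Shakespeare.",
        "ground_truth": "William Shakespeare",
        "context": "Hamlet is a tragedy by William Shakespeare.",
    },
]

with open("qa_pairs.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```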
4. Integration with Azure AI Services
- Azure ML Integration:
  - Seamless connection to Azure ML pipelines and workspaces
  - Logging and tracking via Azure ML Experiments
- Model Hosting Compatibility:
  - Works with models deployed on Azure OpenAI, Azure ML endpoints, and custom REST APIs (see the sketch after this list)
- 📘 Cloud Evaluation with Azure AI Foundry SDK
- 📘 Evaluating RAG Applications with AzureML
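The sketch below shows both integration points: an AI-assisted evaluator scoring responses via an Azure OpenAI deployment, and optional logging of the run to an Azure AI project. Every endpoint, key, and project value is a placeholder, and logging to a project additionally requires valid Azure credentials.

```python
from azure.ai.evaluation import evaluate, RelevanceEvaluator

# Placeholder model configuration for an Azure OpenAI deployment.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment-name>",
}

result = evaluate(
    data="qa_pairs.jsonl",
    evaluators={"relevance": RelevanceEvaluator(model_config)},
    # Optional: track this run in an Azure AI project.
    azure_ai_project={
        "subscription_id": "<subscription-id>",
        "resource_group_name": "<resource-group>",
        "project_name": "<project-name>",
    },
)
```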
5. Configuration and Extensibility
- YAML/JSON Config Support: Define evaluation parameters and pipeline settings (one possible pattern is sketched after this list)
- Plugin System: Add new evaluators, preprocessors, and reporters
- Logging and Monitoring: Built-in telemetry hooks for observability
- 📘 Azure AI Evaluation SDK GitHub Reference
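As one possible pattern (the SDK does not mandate a particular config schema), evaluation settings can be externalized to YAML and resolved against a small plug-in registry. The schema and the `EVALUATOR_REGISTRY` lookup table below are assumptions for this sketch, not SDK features.

```python
import yaml  # requires pyyaml
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

# Illustrative config; in practice this would live in a .yaml file.
CONFIG = """
data: qa_pairs.jsonl
output_path: ./eval_results.json
evaluators:
  - f1
"""

# Hypothetical plug-in registry mapping config names to evaluator classes.
EVALUATOR_REGISTRY = {"f1": F1ScoreEvaluator}

cfg = yaml.safe_load(CONFIG)
result = evaluate(
    data=cfg["data"],
    evaluators={name: EVALUATOR_REGISTRY[name]() for name in cfg["evaluators"]},
    output_path=cfg["output_path"],
)
```

Keeping the registry as the single place where evaluator names resolve to classes lets new evaluators, preprocessors, or reporters be added without touching the run script.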