
08. Operations & Cost Analysis

1. Cost Model Estimation

System operational costs consist of two main components: Infrastructure and LLM API Usage (the largest portion).

1.1. LLM Token Cost (The Evaluation Cost)

Assuming GPT-4o is used as the evaluation model (the Judge):

  • Avg Input: 1000 tokens (Context + Response).
  • Avg Output: 50 tokens (Score + Reason).
  • Price: Input $5/1M, Output $15/1M.
  • Cost per Eval: $0.005 + $0.00075 = $0.00575 (approx 0.6 cents).

Estimation for 10M requests/month:

  • Total Cost = 10,000,000 * $0.00575 = $57,500 / month.

High cost! Optimization is required.
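A minimal sketch of this arithmetic (the prices and token counts are the assumptions listed above), handy for re-running the estimate when prices or volumes change:

```python
# Back-of-the-envelope cost model using the assumptions above.
INPUT_PRICE_PER_M = 5.00      # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00    # USD per 1M output tokens
AVG_INPUT_TOKENS = 1_000      # context + response
AVG_OUTPUT_TOKENS = 50        # score + reason

def cost_per_eval() -> float:
    """USD cost of a single LLM-judge evaluation."""
    return (AVG_INPUT_TOKENS / 1_000_000) * INPUT_PRICE_PER_M \
        + (AVG_OUTPUT_TOKENS / 1_000_000) * OUTPUT_PRICE_PER_M

def monthly_cost(requests_per_month: int, sampling_rate: float = 1.0) -> float:
    """Total monthly evaluation cost at a given traffic volume and sampling rate."""
    return requests_per_month * sampling_rate * cost_per_eval()

print(f"Per eval:      ${cost_per_eval():.5f}")            # $0.00575
print(f"10M req/month: ${monthly_cost(10_000_000):,.0f}")  # $57,500
```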

1.2. Infrastructure Cost (AWS/GKE)

  • K8s Cluster (3 nodes m6g.xlarge): ~$400/mo.
  • Managed DBs (RDS, Kafka, ClickHouse Cloud): ~$1500/mo.
  • Total Infra: ~$2,000 / month.

2. Cost Optimization Strategies

The following strategies reduce LLM evaluation costs to an acceptable level.

2.1. Statistical Sampling

Evaluating 100% of traffic is not required.

  • Strategy: Randomly sample 10% of traffic, or use smart sampling (evaluate only longer/complex conversations); a sampling sketch follows after this list.
  • Impact: Reduces the core evaluation cost 10x, to $5,750 / month.
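A minimal sketch of the sampling decision, assuming each request carries a trace ID; the turn-count threshold and function names are illustrative:

```python
import hashlib

SAMPLE_RATE = 0.10        # evaluate ~10% of traffic
LONG_CONVO_TURNS = 20     # "smart sample" threshold (assumed value)

def should_evaluate(trace_id: str, num_turns: int) -> bool:
    """Decide whether this request is sent to the LLM judge.

    Hashing the trace ID makes the 10% sample deterministic and
    reproducible across replays, unlike random.random().
    """
    if num_turns >= LONG_CONVO_TURNS:     # always judge long/complex conversations
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64    # uniform in [0, 1)
    return bucket < SAMPLE_RATE
```

With `SAMPLE_RATE = 0.10`, plugging `sampling_rate=0.10` into the cost sketch above reproduces the $5,750 / month figure.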

2.2. Model Cascading (The "Judge Hierarchy")

Use cheaper models for simple metrics and reserve expensive ones for complex metrics; a routing sketch follows below.

  • Regex/Keyword Metrics: Free (Python code).
  • Toxicity/Sentiment: Use Small Models (Haiku/GPT-3.5) or Specialized BERT -> Extremely low cost.
  • Hallucination/Logic: Reserved for GPT-4o.

Allocation: 80% cheap metrics, 20% expensive metrics.

  • Impact: Further reduces costs by 50%.
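A hypothetical routing table for the judge hierarchy; the metric names and model identifiers are illustrative placeholders, not the platform's actual configuration:

```python
# Metric names and model identifiers are illustrative placeholders.
METRIC_ROUTES = {
    "keyword_block": ("code", None),               # free: plain Python/regex check
    "regex_pii":     ("code", None),
    "toxicity":      ("small", "claude-3-haiku"),  # cheap small-model judge
    "sentiment":     ("small", "gpt-3.5-turbo"),
    "hallucination": ("large", "gpt-4o"),          # expensive judge, used sparingly
    "logic":         ("large", "gpt-4o"),
}

def pick_judge(metric: str) -> tuple:
    """Return (tier, model) for a metric; tier 'code' means no LLM call at all."""
    try:
        return METRIC_ROUTES[metric]
    except KeyError:
        raise ValueError(f"unknown metric: {metric}") from None
```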

2.3. Semantic Caching

If a (question, response) pair has already been evaluated (a duplicate), reuse the result from the cache; a caching sketch follows below.

  • Uses Redis to cache the hash of (input + output).
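A minimal sketch of the hash-based cache lookup using the Python redis client; `judge` is a placeholder for the actual LLM evaluation call, and the TTL is an assumption:

```python
import hashlib
import json

import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, db=0)   # connection details are placeholders
CACHE_TTL_SECONDS = 7 * 24 * 3600                    # keep cached verdicts for a week (assumed)

def _cache_key(model_input: str, model_output: str) -> str:
    """Key is a hash of (input + output), as described above."""
    digest = hashlib.sha256(f"{model_input}\x00{model_output}".encode("utf-8")).hexdigest()
    return f"eval-cache:{digest}"

def evaluate_with_cache(model_input: str, model_output: str, judge) -> dict:
    """Reuse the cached verdict when this exact (input, output) pair was judged before."""
    key = _cache_key(model_input, model_output)
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                     # cache hit: no LLM call, no cost
    verdict = judge(model_input, model_output)        # cache miss: pay for the evaluation
    r.set(key, json.dumps(verdict), ex=CACHE_TTL_SECONDS)
    return verdict
```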

3. Operational Runbooks (SOPs)

3.1. Incident: High Evaluation Latency

  • Alert Symptom: Avg Eval Time > 30s.
  • Causes:
    1. LLM Provider API lag (e.g., OpenAI outage).
    2. Kafka Consumer congestion (high lag).
  • Actions:
    1. Check Dashboard: OpenAI Status.
    2. If Kafka lag is found: manually scale the HPA or add partitions (a lag-check sketch follows below): `kubectl scale deployment scoring-worker --replicas=20`
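Consumer lag can also be checked programmatically. A hedged sketch using kafka-python; the broker address, group, and topic names are placeholders:

```python
from kafka import KafkaConsumer, TopicPartition  # pip install kafka-python

def consumer_group_lag(bootstrap: str, group_id: str, topic: str) -> dict:
    """Lag per partition = latest offset minus committed offset for the scoring group."""
    consumer = KafkaConsumer(
        bootstrap_servers=bootstrap,
        group_id=group_id,
        enable_auto_commit=False,
    )
    partitions = [TopicPartition(topic, p)
                  for p in (consumer.partitions_for_topic(topic) or set())]
    end_offsets = consumer.end_offsets(partitions)
    lag = {}
    for tp in partitions:
        committed = consumer.committed(tp) or 0
        lag[tp.partition] = end_offsets[tp] - committed
    consumer.close()
    return lag

# Example: print(consumer_group_lag("kafka.internal:9092", "scoring-workers", "eval-requests"))
```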

3.2. Incident: Cost Spike

  • Alert Symptom: Daily Spend > $500.
  • Causes: A specific project is spamming requests or an attack loop is occurring.
  • Actions:
    1. Identify the TenantID with the highest usage (a query sketch follows after this list).
    2. Temporarily disable ingestion for that tenant or enable Aggressive Sampling (1%).
    3. Contact the user for verification.
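A sketch of step 1 using clickhouse-connect; the table and column names (`eval_events`, `tenant_id`, `input_tokens`, `output_tokens`) are assumptions about the schema, not the actual one:

```python
import clickhouse_connect  # pip install clickhouse-connect

# Connection details and schema below are illustrative assumptions.
client = clickhouse_connect.get_client(host="clickhouse.internal", username="ops")

def top_spending_tenants(hours: int = 24, limit: int = 5) -> list:
    """Rank tenants by evaluation token volume over the last N hours."""
    result = client.query(
        """
        SELECT tenant_id,
               sum(input_tokens + output_tokens) AS total_tokens,
               count() AS evals
        FROM eval_events
        WHERE event_time > now() - INTERVAL %(hours)s HOUR
        GROUP BY tenant_id
        ORDER BY total_tokens DESC
        LIMIT %(limit)s
        """,
        parameters={"hours": hours, "limit": limit},
    )
    return result.result_rows
```

The offending tenant can then be switched to the aggressive 1% sampling described in section 2.1 while the investigation proceeds.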

4. Maintenance Schedule

  • Weekly: Review the metric performance report; update prompt configs if LLM model versions change.
  • Monthly: Rotate API Keys. Review Database disk usage (ClickHouse retention).
  • Quarterly: Penetration Test (Security Audit).