Limits, Monitoring, and Error Tracking

How usage limits, cost controls, and Sentry observability work in production.

Operational reliability depends on tracking speed, spend, and failure patterns together rather than in isolation.

Usage Monitoring

Core Signals

Canoma tracks request latency, token usage, and estimated model cost to support capacity planning and spend controls.
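
As a rough illustration of how an estimated model cost can be derived from token usage, the sketch below multiplies input and output token counts by per-1,000-token prices. The function name, parameter names, and the prices used in the example are assumptions made for illustration; they are not Canoma's actual pricing or API.

```python
def estimate_cost_usd(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate the cost of one request from its token counts.

    Prices are expressed per 1,000 tokens and are supplied by the caller
    (hypothetical values here, not real model pricing).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1000 * price_in_per_1k
    output_cost = output_tokens / 1000 * price_out_per_1k
    return input_cost + output_cost


# Example with made-up prices: 1,000 input tokens and 500 output tokens.
example_cost = estimate_cost_usd(1000, 500, price_in_per_1k=0.5, price_out_per_1k=1.5)
```

Summing these per-request estimates by user, org, and day gives the kind of trend that spend controls can be set against.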

Monitoring Cadence

  • Watch p95 latency and error rate during peak query windows.
  • Review token-cost trends at daily and weekly intervals.
  • Flag sudden traffic spikes before they hit hard limits.
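
The first item in the cadence above can be sketched as a pair of helpers that compute p95 latency and error rate over one window of requests. This is a minimal sketch: `RequestSample` and both function names are hypothetical, the percentile uses the nearest-rank method, and only 5xx responses are counted as errors (an assumption, since 429s may be better tracked separately).

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float
    status: int  # HTTP status code of the response


def p95_latency(samples):
    """Nearest-rank 95th percentile of latency over a window of samples."""
    if not samples:
        raise ValueError("no samples in window")
    ordered = sorted(s.latency_ms for s in samples)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def error_rate(samples):
    """Fraction of requests in the window that returned a 5xx status."""
    if not samples:
        return 0.0
    errors = sum(1 for s in samples if s.status >= 500)
    return errors / len(samples)
```

Running both over the same peak-window samples keeps the two signals aligned in time, which makes them easier to compare.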

Limits

Limit Types

  • Per-user and per-org request budgets.
  • Burst rate limits for high-cost query patterns.
  • Graceful 429 responses when limits are exceeded.
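
One common way to combine a request budget with a burst limit is a token bucket, sketched below. This is an illustrative assumption, not a description of Canoma's implementation: `TokenBucket`, `check_request`, the cost units, and the use of a `Retry-After` header on the 429 response are all hypothetical choices.

```python
import math


class TokenBucket:
    """Budget that allows short bursts up to `capacity` and refills over time."""

    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.updated_at = now

    def try_consume(self, cost, now):
        """Return 0 if the request fits, else whole seconds until it would."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity")
        # Credit the tokens earned since the last check, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if cost <= self.tokens:
            self.tokens -= cost
            return 0
        return math.ceil((cost - self.tokens) / self.refill_per_sec)


def check_request(bucket, cost, now):
    """Return (status, headers): 200 when allowed, 429 with a wait hint otherwise."""
    wait = bucket.try_consume(cost, now)
    if wait == 0:
        return 200, {}
    return 429, {"Retry-After": str(wait)}
```

Keeping one bucket per user and one per org, and charging high-cost query patterns a larger `cost`, would cover the limit types listed above under these assumptions.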

Handling Strategy

  • Back off and retry only after reducing query breadth.
  • Prefer concise mode during sustained load periods.
  • Escalate recurring 429 events with usage context.
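
The handling strategy above can be sketched as a retry loop that waits with exponential backoff and halves the query breadth before each retry. Everything named here is hypothetical: `RateLimited` stands in for however a client surfaces a 429, `send` is a caller-supplied function, and `breadth` is an illustrative integer knob for how wide the query is.

```python
import random
import time


class RateLimited(Exception):
    """Raised by `send` when the service answers with HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def run_with_backoff(send, breadth, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(breadth), narrowing the query and backing off after each 429."""
    for attempt in range(max_attempts):
        try:
            return send(breadth)
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: escalate with usage context
            # Wait at least as long as the server asked, growing exponentially.
            delay = max(exc.retry_after or 0.0, base_delay * 2 ** attempt)
            # Jitter spreads out retries from clients that were limited together.
            sleep(delay + random.uniform(0, base_delay))
            # Reduce breadth before retrying, but never below 1.
            breadth = max(1, breadth // 2)
```

Letting the final `RateLimited` propagate gives the caller a natural place to log the recurring 429 together with the breadth values that were tried.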

Sentry Integration

What Is Captured

Sentry captures exceptions and traces across user-facing and service workflows with sensitive fields redacted.
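
To make the redaction step concrete, here is a minimal sketch of a scrubbing function written in the shape of a Sentry Python SDK `before_send` hook, which receives `(event, hint)` and returns the event to send. The list of sensitive keys and the placeholder string are assumptions for illustration; the fields Canoma actually redacts are not specified here.

```python
# Hypothetical set of field names to scrub; compared case-insensitively.
SENSITIVE_KEYS = {"authorization", "password", "api_key", "token", "email"}
PLACEHOLDER = "[redacted]"


def redact(value):
    """Return a copy of `value` with sensitive dict entries replaced."""
    if isinstance(value, dict):
        return {
            key: PLACEHOLDER if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def before_send(event, hint):
    """Scrub the event payload before it leaves the process."""
    return redact(event)


# With the Sentry Python SDK installed, the hook could be registered like this:
#   sentry_sdk.init(dsn="<your DSN>", before_send=before_send)
```

Scrubbing on the client side means sensitive values never reach the error tracker, independent of any server-side filtering.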

Baseline Alerting

Operational baseline

Monitor error-rate spikes together with latency and token-cost anomalies to identify root cause faster.
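
One way to read the three signals together is to compare the current window against a baseline window and report every signal that moved past a ratio threshold. This is a sketch under stated assumptions: `WindowStats`, `anomalous_signals`, and the threshold values are hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float
    token_cost_usd: float


# Hypothetical thresholds: flag a signal when current / baseline reaches the ratio.
RATIO_THRESHOLDS = {
    "error_rate": 2.0,
    "p95_latency_ms": 1.5,
    "token_cost_usd": 1.5,
}


def anomalous_signals(baseline, current):
    """Return the names of signals that rose past their threshold."""
    flagged = []
    for name, ratio in RATIO_THRESHOLDS.items():
        before = getattr(baseline, name)
        now = getattr(current, name)
        if before > 0:
            if now / before >= ratio:
                flagged.append(name)
        elif now > 0:
            # Any activity on a previously silent signal counts as a change.
            flagged.append(name)
    return flagged
```

An alert that lists which signals were flagged together, for example error rate and latency but not token cost, gives responders a starting point for narrowing down the cause.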