Limits, Monitoring, and Error Tracking
How usage limits, cost controls, and Sentry observability work in production.
Operational reliability depends on tracking speed, spend, and failure patterns together instead of in isolation.
Usage Monitoring
Core Signals
Canoma tracks request latency, token usage, and estimated model cost to support capacity planning and spend controls.
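A minimal sketch of how the three core signals might be recorded per request and turned into a cost estimate. The record's field names and the per-1K-token prices are hypothetical placeholders, not Canoma's actual schema or pricing.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical USD prices per 1,000 tokens; real prices depend on the model and provider.
DEFAULT_PRICES: Mapping[str, float] = {"input": 0.003, "output": 0.015}


@dataclass
class RequestRecord:
    """One observed request (field names are illustrative, not Canoma's schema)."""
    latency_ms: float
    input_tokens: int
    output_tokens: int
    ok: bool = True

    def estimated_cost(self, prices: Optional[Mapping[str, float]] = None) -> float:
        """Estimate cost as tokens times per-1K price, summed over input and output."""
        p = DEFAULT_PRICES if prices is None else prices
        return (self.input_tokens * p["input"] + self.output_tokens * p["output"]) / 1000
```

Under the assumed prices, `RequestRecord(latency_ms=820, input_tokens=2000, output_tokens=500).estimated_cost()` comes to about 0.0135 USD. Summing these estimates per day or week gives the token-cost trend used for spend review.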
Monitoring Cadence
- Watch p95 latency and error rate during peak query windows.
- Review token-cost trends at daily and weekly intervals.
- Flag sudden traffic spikes before they hit hard limits.
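The cadence above relies on a few window-level calculations. The sketch below shows one way to compute them, assuming latencies and success flags are collected per time window; the spike ratio and the 80% headroom threshold are hypothetical defaults, not documented Canoma settings.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (one of several common percentile definitions)."""
    if not latencies_ms:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def error_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of failed requests in the window; True marks a success."""
    if not outcomes:
        return 0.0
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


def is_traffic_spike(current_rpm: float, baseline_rpm: float, limit_rpm: float,
                     spike_ratio: float = 2.0, headroom: float = 0.8) -> bool:
    """Flag traffic far above its baseline, or close enough to the hard limit to act early."""
    return current_rpm >= spike_ratio * baseline_rpm or current_rpm >= headroom * limit_rpm
```

Checking headroom against the hard limit, not only growth over baseline, is what lets a spike be flagged before requests start receiving 429 responses.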
Limits
Limit Types
- Per-user and per-org request budgets.
- Burst rate limits for high-cost query patterns.
- Graceful HTTP 429 (Too Many Requests) responses when limits are exceeded.
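The limit types above can be illustrated with a token bucket per user and per org. This is a minimal sketch, assuming a token-bucket algorithm (Canoma's actual limiter is not specified here); `TokenBucket`, `admit`, and the idea of charging high-cost queries a larger `cost` are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Refills `rate` units per second up to `capacity`; capacity bounds the burst size."""
    capacity: float
    rate: float
    updated: float = field(default_factory=time.monotonic)
    tokens: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full

    def available(self, now: float) -> float:
        """Apply the refill accrued since the last update and return the current balance."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        return self.tokens


def admit(user: TokenBucket, org: TokenBucket, cost: float = 1.0,
          now: Optional[float] = None) -> int:
    """Return 200 if both budgets cover `cost`, else 429. Expensive queries pass a larger cost."""
    now = time.monotonic() if now is None else now
    # Check both budgets before charging either, so a rejection drains neither.
    if user.available(now) >= cost and org.available(now) >= cost:
        user.tokens -= cost
        org.tokens -= cost
        return 200
    return 429
```

Weighting each request by `cost` lets one mechanism cover both the steady per-user and per-org budgets and the tighter burst limit for high-cost query patterns.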
Handling Strategy
- Back off and retry only after reducing query breadth.
- Prefer concise mode during sustained load periods.
- Escalate recurring 429 events with usage context.
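A client following this strategy might narrow the request on each 429 before retrying, then escalate with context once retries run out. In this sketch, `send`, the `max_results` and `mode` query fields, and `RateLimitExhausted` are hypothetical stand-ins, not a documented Canoma API.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple


class RateLimitExhausted(Exception):
    """Raised when retries are used up; `context` carries details for escalation."""
    def __init__(self, context: Dict[str, Any]):
        super().__init__(f"still rate limited: {context}")
        self.context = context


def query_with_backoff(send: Callable[[Dict[str, Any]], Tuple[int, str]],
                       query: Dict[str, Any], max_attempts: int = 4,
                       base_delay_s: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    q = dict(query)  # never mutate the caller's query
    for attempt in range(max_attempts):
        status, body = send(q)
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping before giving up
        # Narrow the request before retrying: fewer results, concise output.
        q["max_results"] = max(1, q.get("max_results", 10) // 2)
        q["mode"] = "concise"
        # Exponential backoff with jitter so clients do not retry in lockstep.
        sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise RateLimitExhausted({"attempts": max_attempts, "last_query": q})
```

Retrying the identical request would likely hit the same limit again, so each retry asks for less; the exception's `context` is what gets attached when a recurring 429 is escalated.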
Sentry Integration
What Is Captured
Sentry captures exceptions and traces across user-facing and service workflows with sensitive fields redacted.
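Client-side redaction in the Sentry Python SDK is commonly done with a `before_send` hook, which receives `(event, hint)` and returns the event to send or `None` to drop it. The scrubber below is a sketch of such a hook; the `SENSITIVE_KEYS` set is a hypothetical example, not Canoma's actual redaction list.

```python
from typing import Any, Dict, Optional

# Hypothetical set of field names treated as sensitive; adapt to your own payloads.
SENSITIVE_KEYS = {"password", "api_key", "authorization", "token", "email"}


def _scrub(value: Any) -> Any:
    """Recursively copy dicts and lists, masking values under sensitive keys."""
    if isinstance(value, dict):
        return {k: ("[Filtered]" if str(k).lower() in SENSITIVE_KEYS else _scrub(v))
                for k, v in value.items()}
    if isinstance(value, list):
        return [_scrub(v) for v in value]
    return value


def redact_event(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """before_send-style hook: return a scrubbed copy (returning None would drop the event)."""
    return _scrub(event)
```

It would be registered with something like `sentry_sdk.init(dsn=..., before_send=redact_event, send_default_pii=False, traces_sample_rate=0.2)`, where the sample rate is an arbitrary example value. Server-side data scrubbing in Sentry can serve as a second layer.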
Baseline Alerting
Operational baseline
Monitor error-rate spikes together with latency and token-cost anomalies to identify root cause faster.
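One way to apply this baseline is to compare each window against a reference window and report every signal that moved, not just the error rate. The sketch assumes a simple "at least N times baseline" rule; `WindowStats`, `correlated_anomalies`, and the default factor of 2 are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    error_rate: float       # fraction of failed requests
    p95_latency_ms: float
    token_cost_usd: float


def correlated_anomalies(current: WindowStats, baseline: WindowStats,
                         factor: float = 2.0) -> List[str]:
    """Return the names of signals at least `factor` times their baseline value."""
    flagged = []
    for name in ("error_rate", "p95_latency_ms", "token_cost_usd"):
        base = getattr(baseline, name)
        cur = getattr(current, name)
        # Zero baselines are skipped here; a real check may want an absolute floor instead.
        if base > 0 and cur >= factor * base:
            flagged.append(name)
    return flagged
```

An alert that reports an error spike together with, say, a latency jump narrows the search more than an error alert on its own.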