Rate limits
Every call to the v1 API counts against two rate-limit buckets:
- A per-token limit (the service account’s own usage).
- A per-organization limit (everything happening across all tokens issued by that workspace).
When either bucket is exhausted, the API responds with 429 Too Many Requests until the bucket refills.
Default limits
Default limits suit normal integration usage. They are deliberately conservative to protect the workspace from runaway integrations.
| Bucket | Default |
|---|---|
| Per-token, read endpoints | 600 requests per minute |
| Per-token, write endpoints | 60 requests per minute |
| Per-organization, all endpoints | 3000 requests per minute (sum across all tokens) |
Integrations doing routine sync work do not come close to these limits. Bursty bulk backfills can reach them, so coordinate large backfills with your account team in advance.
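To get a feel for when a backfill becomes "large", it can help to estimate the minimum time it needs under the default limits. The following is a rough sketch only: the function name is made up for illustration, and it assumes the backfill runs on a single token with no other traffic competing for the same bucket.

```python
import math


def min_backfill_minutes(total_requests: int, per_minute_limit: int = 60) -> int:
    """Lower bound on the minutes needed to send total_requests
    without exceeding per_minute_limit requests per minute."""
    if total_requests < 0 or per_minute_limit <= 0:
        raise ValueError("total_requests must be >= 0 and per_minute_limit > 0")
    return math.ceil(total_requests / per_minute_limit)


# 10,000 writes at the default per-token write limit (60/min)
print(min_backfill_minutes(10_000))                        # 167
# 10,000 reads at the default per-token read limit (600/min)
print(min_backfill_minutes(10_000, per_minute_limit=600))  # 17
```

A backfill that needs hours at the default write limit is a reasonable candidate for a conversation with your account team before you start.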
Headers
Every response (success or error) includes these headers so you can see where you stand:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | The bucket’s maximum size. |
| X-RateLimit-Remaining | Requests remaining in the current window. When zero, the next call may be rate-limited. |
| X-RateLimit-Reset | Unix timestamp when the bucket refills. |
| X-RateLimit-Bucket | Which bucket the limit applies to (token-read, token-write, or org). |
| Retry-After | Present on 429 responses only. Seconds to wait before retrying. |
Inspect X-RateLimit-Remaining to throttle proactively rather than waiting for 429.
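As an illustration, the headers above could be collected into a small value object after each response. This is a minimal sketch, not an official client: the class and function names are hypothetical, and it assumes the numeric headers arrive as plain integer strings.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_at: int               # X-RateLimit-Reset (Unix timestamp)
    bucket: str                 # X-RateLimit-Bucket
    retry_after: Optional[int]  # Retry-After, present on 429 only


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    return RateLimitState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
        bucket=lowered["x-ratelimit-bucket"],
        retry_after=int(retry_after) if retry_after is not None else None,
    )
```

Keeping the parsed state next to the response makes it easy to log which bucket is running low and to decide whether to slow down before the next call.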
Handling 429
When you get 429:
- Honor the Retry-After header. Sleep for at least that many seconds before the next call.
- Back off exponentially if you hit 429 repeatedly. Treat Retry-After as a minimum, not the only delay.
- Pair retries with idempotency keys on write endpoints, so a retried call doesn’t accidentally double-process.
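import random
import time

def call_with_retry(call_api, key, max_attempts=5):
    # call_api is your own request function; it must send the same
    # idempotency key on every attempt.
    for attempt in range(1, max_attempts + 1):
        response = call_api(idempotency_key=key)
        if response.status != 429:
            return response
        # Retry-After is a minimum; exponential backoff takes over on repeated 429s.
        retry_after = float(response.headers.get("Retry-After", 0))
        delay = max(retry_after, 2 ** attempt) + random.uniform(0, 1)  # jitter
        time.sleep(delay)
    return response  # still 429 after max_attempts; let the caller decide

Proactive throttling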
A well-behaved integration treats X-RateLimit-Remaining as the steering signal, not 429:
- If Remaining is below 10% of Limit, slow down or batch.
- If Remaining is zero, do not call again until X-RateLimit-Reset.
This costs nothing and avoids the noise of 429 retries in your logs.
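The two rules above can be sketched as a single function that returns how long to pause before the next call. This is one possible reading, not a prescribed algorithm: the function name is hypothetical, and "slow down" is interpreted here as spreading the remaining requests evenly over the time left until the reset.

```python
import time
from typing import Optional


def seconds_to_pause(
    limit: int,
    remaining: int,
    reset_at: float,
    now: Optional[float] = None,
    low_water: float = 0.10,
) -> float:
    """Return how many seconds to wait before the next call.

    limit, remaining and reset_at come from X-RateLimit-Limit,
    X-RateLimit-Remaining and X-RateLimit-Reset respectively.
    """
    now = time.time() if now is None else now
    until_reset = max(0.0, reset_at - now)
    if remaining <= 0:
        # Bucket is empty: wait for the refill.
        return until_reset
    if remaining < low_water * limit:
        # Assumption: pace the remaining calls evenly across the window.
        return until_reset / remaining
    return 0.0
```

For example, with a limit of 600, 30 requests remaining, and 30 seconds until the reset, the function suggests pausing one second between calls.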
Limit increases
Most integrations should not need to ask for higher limits. If your use case genuinely requires more:
- Document the integration purpose, expected steady-state RPS, and burst characteristics.
- Talk to your account team. Limit increases are negotiated per-customer with the workspace’s commercial contact, not adjusted ad hoc.
What is and isn’t rate-limited
| Operation | Rate-limited? |
|---|---|
| GET reads on the v1 surface | yes, against token-read and org |
| POST / PATCH writes on the v1 surface | yes, against token-write and org |
| Token issuance (POST /api/v1/auth/token) | yes, separately, much tighter (typically 5/min/account) |
| Outbound webhook deliveries from Novantra | not rate-limited as part of your buckets; see Webhooks for delivery semantics |
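If a client tracks its own usage per bucket, the table above can be expressed as a lookup. The sketch below is illustrative only: the function name is hypothetical, "token-issuance" is a made-up label (this page does not name that bucket), and HTTP methods the table does not mention are rejected rather than guessed.

```python
from typing import List


def buckets_for(method: str, path: str) -> List[str]:
    """Return the rate-limit buckets a request counts against."""
    method = method.upper()
    if method == "POST" and path.rstrip("/") == "/api/v1/auth/token":
        # Limited separately and much more tightly; label is hypothetical.
        return ["token-issuance"]
    if method == "GET":
        return ["token-read", "org"]
    if method in {"POST", "PATCH"}:
        return ["token-write", "org"]
    raise ValueError(f"no documented bucket for method {method!r}")
```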
Heavy backfills
For one-time backfills:
- Time-spread by inserting deliberate sleeps between batches so you don’t burst against the per-minute limit.
- Chunk by resource so each request stays small.
- Use idempotency keys so a partial backfill can resume from the last successful item.
- Talk to your account team beforehand if the backfill is large enough to be visible in workspace metrics; we’d rather hear about it than throttle blindly.
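The time-spreading and chunking advice above can be sketched as a paced loop. Treat this as a starting point under stated assumptions: paced_backfill and chunked are hypothetical helpers, send_one stands in for your own function that performs one write with an idempotency key derived from the item, and the default target of 30 requests per minute is an arbitrary choice that stays below the default per-token write limit.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def paced_backfill(
    items: Sequence[T],
    send_one: Callable[[T], None],
    per_minute: int = 30,
    batch_size: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send items in small batches, pausing between batches so the
    average rate stays at or below per_minute. Returns the count sent."""
    pause = 60.0 * batch_size / per_minute
    sent = 0
    for batch in chunked(items, batch_size):
        for item in batch:
            send_one(item)
            sent += 1
        if sent < len(items):
            # Deliberate sleep between batches; skipped after the last one.
            sleep(pause)
    return sent
```

The sleep function is injectable so the pacing can be tested without waiting. If a run fails partway, restarting from the first unsent item is safe as long as send_one reuses the same idempotency key for the same item.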
Rate limits exist to protect the workspace as much as to protect Novantra. A misconfigured integration stuck in an uncontrolled loop can exhaust its own per-token bucket within seconds; the per-org cap ensures one integration cannot starve the others.
Next
- Pagination
- Webhooks - for push-style integration that avoids polling rate limits entirely.