
Rate limits

Every call to the v1 API counts against two rate-limit buckets:

  • A per-token limit (the service account’s own usage).
  • A per-organization limit (the combined usage of every token issued by the workspace).

When either bucket is exhausted, the API responds with 429 Too Many Requests until the bucket refills.

Default limits

Default limits suit normal integration usage. They are deliberately conservative to protect the workspace from runaway integrations.

Bucket | Default
--- | ---
Per-token, read endpoints | 600 requests per minute
Per-token, write endpoints | 60 requests per minute
Per-organization, all endpoints | 3000 requests per minute (sum across all tokens)

Integrations doing routine sync work do not come close to these limits. Bursty bulk backfills can reach them, so coordinate large backfills with your account team in advance.
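As a rough worked example of what the defaults mean in practice, the sketch below converts them to average per-second rates and checks when the per-organization limit becomes the tighter constraint. The helper names are hypothetical, and the per-second figures assume calls are spread evenly across the minute, which the limits themselves do not require.

```python
# Default limits from the table above, in requests per minute.
PER_TOKEN_READ = 600
PER_TOKEN_WRITE = 60
PER_ORG = 3000


def per_second(requests_per_minute: int) -> float:
    """Average requests per second if calls are spread evenly over the minute."""
    return requests_per_minute / 60


def org_cap_binds(active_tokens: int, per_token_rate: int) -> bool:
    """True if that many tokens, each at per_token_rate, would exceed the org limit."""
    return active_tokens * per_token_rate > PER_ORG


print(per_second(PER_TOKEN_READ))        # 10.0
print(per_second(PER_TOKEN_WRITE))       # 1.0
print(org_cap_binds(5, PER_TOKEN_READ))  # False: 5 * 600 == 3000
print(org_cap_binds(6, PER_TOKEN_READ))  # True: 6 * 600 > 3000
```

In other words, five tokens reading at their full per-token rate exactly fill the organization bucket; a sixth would push the workspace over it.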

Headers

Every response (success or error) includes these headers so you can see where you stand:

Header | Meaning
--- | ---
X-RateLimit-Limit | The bucket’s maximum size.
X-RateLimit-Remaining | Requests remaining in the current window. When zero, the next call may be rate-limited.
X-RateLimit-Reset | Unix timestamp when the bucket refills.
X-RateLimit-Bucket | Which bucket the limit applies to (token-read, token-write, or org).
Retry-After | Present on 429 responses only. Seconds to wait before retrying.

Inspect X-RateLimit-Remaining to throttle proactively rather than waiting for 429.
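One way to make these headers easy to act on is to parse them into a small structure after each response. This is a minimal sketch, not an official client: RateLimitState and parse_rate_limit are hypothetical names, and the code assumes the numeric headers carry plain integer values and that the header mapping is keyed exactly as documented.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_at: int               # X-RateLimit-Reset, Unix timestamp
    bucket: str                 # "token-read", "token-write", or "org"
    retry_after: Optional[int]  # Retry-After, only present on 429 responses


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Read the documented rate-limit headers from one response."""
    # Assumption: values are integer strings; adjust if your client differs.
    retry_after = headers.get("Retry-After")
    return RateLimitState(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_at=int(headers["X-RateLimit-Reset"]),
        bucket=headers["X-RateLimit-Bucket"],
        retry_after=int(retry_after) if retry_after is not None else None,
    )
```

Many HTTP libraries expose headers through a case-insensitive mapping, which can be passed in directly.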

Handling 429

When you get 429:

  1. Honor the Retry-After header. Sleep for at least that many seconds before the next call.
  2. Back off exponentially if you hit 429 repeatedly. Treat Retry-After as a minimum, not the only delay.
  3. Pair retries with idempotency keys on write endpoints, so a retried call is not processed twice.
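
    import random
    import time

    def call_with_retries(call_api, key, max_attempts=5):
        # call_api is your own request function; it returns a response
        # exposing .status and .headers.
        for attempt in range(1, max_attempts + 1):
            response = call_api(idempotency_key=key)
            if response.status != 429:
                return response
            # Retry-After is the minimum wait; exponential backoff may exceed it.
            retry_after = int(response.headers.get("Retry-After", 0))
            delay = max(retry_after, 2 ** attempt) + random.uniform(0, 1)  # jitter
            time.sleep(delay)
        raise RuntimeError("still rate-limited after %d attempts" % max_attempts)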

Proactive throttling

A well-behaved integration treats X-RateLimit-Remaining as the steering signal, not 429:

  • If Remaining is below 10% of Limit, slow down or batch.
  • If Remaining is zero, do not call again until X-RateLimit-Reset.

This costs nothing and avoids the noise of 429 retries in your logs.
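The two rules above can be sketched as a single function that returns how long to pause before the next call. throttle_delay is a hypothetical helper. Spreading the remaining calls evenly over the rest of the window is only one possible reading of "slow down", and the sketch assumes X-RateLimit-Reset is expressed in seconds.

```python
def throttle_delay(limit: int, remaining: int, reset_at: float, now: float) -> float:
    """Seconds to wait before the next call, based on the rate-limit headers."""
    seconds_to_reset = max(0.0, reset_at - now)
    if remaining <= 0:
        # Bucket is empty: do not call again until it refills.
        return seconds_to_reset
    if remaining * 10 < limit:
        # Below 10% of Limit: pace the remaining calls across the window.
        return seconds_to_reset / remaining
    # Plenty of headroom: no pause needed.
    return 0.0
```

In use, you would pass time.time() as now and sleep for the returned number of seconds before issuing the next request.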

Limit increases

Most integrations should not need to ask for higher limits. If your use case genuinely requires more:

  • Document the integration purpose, expected steady-state RPS, and burst characteristics.
  • Talk to your account team. Limit increases are negotiated per-customer with the workspace’s commercial contact, not adjusted ad hoc.

What is and isn’t rate-limited

Request type | Rate-limited
--- | ---
GET reads on the v1 surface | Yes, against token-read and org
POST / PATCH writes on the v1 surface | Yes, against token-write and org
Token issuance (POST /api/v1/auth/token) | Yes, separately and much more tightly (typically 5 requests per minute per account)
Outbound webhook deliveries from Novantra | Not rate-limited as part of your buckets; see Webhooks for delivery semantics
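The table can also be read as a lookup from a request to the buckets it draws from, which is handy when a client tracks its own budget. This is an illustrative sketch only: buckets_for is a hypothetical helper, and the "token-issuance" label is invented here because the documentation does not name the separate token-issuance limit. Methods the table does not list raise an error rather than guessing.

```python
from typing import Tuple

TOKEN_ISSUANCE_PATH = "/api/v1/auth/token"


def buckets_for(method: str, path: str) -> Tuple[str, ...]:
    """Return the rate-limit buckets a v1 request counts against."""
    method = method.upper()
    if method == "POST" and path == TOKEN_ISSUANCE_PATH:
        # Limited separately; "token-issuance" is a made-up label.
        return ("token-issuance",)
    if method == "GET":
        return ("token-read", "org")
    if method in ("POST", "PATCH"):
        return ("token-write", "org")
    raise ValueError("method not covered by the rate-limit table: " + method)
```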

Heavy backfills

For one-time backfills:

  • Spread requests over time by inserting deliberate sleeps between batches so you don’t burst against the per-minute limit.
  • Chunk by resource so each request stays small.
  • Use idempotency keys so a partial backfill can resume from the last successful item.
  • Talk to your account team beforehand if the backfill is large enough to be visible in workspace metrics; we’d rather hear about it than throttle blindly.
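A paced backfill along these lines might look like the sketch below. It assumes each batch is sent as one write request through a send_batch function you supply, and it targets 30 writes per minute, an arbitrary figure chosen here to leave headroom under the default per-token write limit of 60. chunked and run_backfill are hypothetical names.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_backfill(
    items: Sequence[T],
    send_batch: Callable[[Sequence[T]], object],
    batch_size: int = 10,
    writes_per_minute: int = 30,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Send items in small batches, pausing so writes stay under the target rate."""
    pause = 60.0 / writes_per_minute  # seconds between write requests
    sent = 0
    for batch in chunked(items, batch_size):
        send_batch(batch)  # assumed to issue exactly one write request
        sent += 1
        sleep(pause)       # deliberate sleep between batches
    return sent
```

The sleep parameter exists so the pacing can be tested without waiting. In a real run, send_batch would also attach an idempotency key per item or batch so that an interrupted backfill can resume safely.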

Rate limits exist to protect the workspace as much as to protect Novantra. A misconfigured integration stuck in an uncontrolled loop can exhaust its own per-token bucket within seconds; the per-organization cap ensures that one integration cannot starve the others.

Next

  • Pagination
  • Webhooks - for push-style integration that avoids polling rate limits entirely.