Rate limits

Authenticated calls to the v1 API count against two kinds of rate-limit bucket:

A per-token limit (the service account’s own usage).
A per-organization limit (combined usage across all tokens issued by that workspace).

When either bucket is exhausted, the API responds with 429 Too Many Requests until the bucket refills.

Default limits

Default limits suit normal integration usage. They are deliberately conservative to protect the workspace from runaway integrations.

Bucket	Default
Per-token, read endpoints	600 requests per minute
Per-token, write endpoints	60 requests per minute
Per-organization, all endpoints	3000 requests per minute (sum across all tokens)

Integrations doing routine sync work do not come close to these limits. Bursty bulk backfills can hit them; coordinate large backfills with your account team in advance.
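
As a quick worked example of how the per-token and per-organization defaults interact, the sketch below uses only the figures from the table above. The function name is illustrative and not part of any Novantra SDK.

```python
# Default limits from the table above, in requests per minute.
PER_TOKEN_READ = 600
PER_TOKEN_WRITE = 60
PER_ORG = 3000


def tokens_to_saturate_org(per_token_rate: int, org_rate: int = PER_ORG) -> int:
    """Number of tokens running at per_token_rate needed to fill the org bucket."""
    # Ceiling division: a partially used token still counts as one more token.
    return -(-org_rate // per_token_rate)


print(tokens_to_saturate_org(PER_TOKEN_READ))   # 5
print(tokens_to_saturate_org(PER_TOKEN_WRITE))  # 50
```

Under the defaults, five tokens reading at their full per-token rate already account for the whole organization bucket, so a workspace with more busy tokens than that can hit the `org` limit while every individual token still has headroom.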

Headers

Responses attributed to a service account or token-issuance bucket include these headers so you can see where you stand:

Header	Meaning
`X-RateLimit-Limit`	The bucket’s maximum size.
`X-RateLimit-Remaining`	Requests remaining in the current window. When zero, the next call may be rate-limited.
`X-RateLimit-Reset`	Unix timestamp when the bucket refills.
`X-RateLimit-Bucket`	Which bucket the limit applies to (`token-read`, `token-write`, or `org`).
`Retry-After`	Present on `429` responses only. Seconds to wait before retrying.

Inspect X-RateLimit-Remaining to throttle proactively rather than waiting for 429.
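
One way to make these headers easy to act on is to parse them into a small structure after every response. The sketch below assumes the header values arrive as plain integer strings and that `Retry-After` is a number of seconds, as described in the table; `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, not part of the API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int                            # X-RateLimit-Limit
    remaining: int                        # X-RateLimit-Remaining
    reset_at: int                         # X-RateLimit-Reset (Unix timestamp)
    bucket: str                           # "token-read", "token-write", or "org"
    retry_after: Optional[float] = None   # only present on 429 responses


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the parsed rate-limit status, or None if the headers are missing."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    try:
        status = RateLimitStatus(
            limit=int(h["x-ratelimit-limit"]),
            remaining=int(h["x-ratelimit-remaining"]),
            reset_at=int(h["x-ratelimit-reset"]),
            bucket=h["x-ratelimit-bucket"],
        )
        if "retry-after" in h:
            status.retry_after = float(h["retry-after"])
    except (KeyError, ValueError):
        # Missing or malformed headers: report "unknown" rather than guessing.
        return None
    return status
```

Returning `None` for missing or malformed headers keeps the caller from throttling on values it cannot trust.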

Handling 429

When you get 429:

Honor the Retry-After header. Sleep for at least that many seconds before the next call.
Back off exponentially if you hit 429 repeatedly. Treat Retry-After as a minimum, not the only delay.
Pair retries with idempotency keys on write endpoints, so a retried call doesn’t accidentally double-process.


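import random
import time

def call_with_retry(call_api, key, max_attempts=5):
    """Retry on 429, reusing the same idempotency key on every attempt."""
    for attempt in range(1, max_attempts + 1):
        response = call_api(idempotency_key=key)
        if response.status != 429:
            return response
        # Retry-After is the minimum wait; exponential backoff may exceed it.
        retry_after = float(response.headers.get("Retry-After", 0))
        delay = max(retry_after, 2 ** attempt) + random.uniform(0, 1)  # jitter
        time.sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")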

Proactive throttling

A well-behaved integration treats X-RateLimit-Remaining as the steering signal, not 429:

If Remaining is below 10% of Limit, slow down or batch.
If Remaining is zero, do not call again until X-RateLimit-Reset.

This costs nothing and avoids the noise of 429 retries in your logs.
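
The two rules above can be sketched as a single function that turns the header values into a pause before the next call. This is only one reading of "slow down": below the 10% mark it spreads the remaining calls evenly over the time left until the reset. The function name and the `low_water` parameter are assumptions made for illustration.

```python
import time
from typing import Optional


def throttle_delay(limit: int, remaining: int, reset_at: float,
                   now: Optional[float] = None, low_water: float = 0.10) -> float:
    """Seconds to pause before the next call, based on the rate-limit headers."""
    now = time.time() if now is None else now
    until_reset = max(0.0, reset_at - now)
    if remaining <= 0:
        # Bucket is empty: do not call again until X-RateLimit-Reset.
        return until_reset
    if remaining < low_water * limit:
        # Running low: spread the remaining calls evenly over the window.
        return until_reset / remaining
    # Plenty of headroom: no pause needed.
    return 0.0


# Example: 30 of 600 requests left with 60 seconds until the reset.
print(throttle_delay(limit=600, remaining=30, reset_at=160, now=100))  # 2.0
```

Passing `now` explicitly keeps the function deterministic in tests; in production code it falls back to the current time.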

Limit increases

Most integrations should not need to ask for higher limits. If your use case genuinely requires more:

Document the integration purpose, expected steady-state RPS, and burst characteristics.
Talk to your account team. Limit increases are negotiated per-customer with the workspace’s commercial contact, not adjusted ad hoc.

What is and isn’t rate-limited

Traffic	Rate-limited
`GET` reads on the v1 surface	yes, against `token-read` and `org`
`POST` / `PATCH` writes on the v1 surface	yes, against `token-write` and `org`
Token issuance (`POST /api/v1/auth/token`)	yes, separately and much more tightly (typically 5 requests per minute per account)
Outbound webhook deliveries from Novantra	not counted against your buckets; see Webhooks for delivery semantics
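
For client-side accounting, the table above can be expressed as a small lookup. This is a sketch under stated assumptions: the `token-read`, `token-write`, and `org` names come from the `X-RateLimit-Bucket` header, while `token-issuance` is a hypothetical label for the separate issuance limit, and methods the table does not mention are rejected rather than guessed.

```python
from typing import Tuple

TOKEN_ISSUANCE_PATH = "/api/v1/auth/token"


def buckets_for(method: str, path: str) -> Tuple[str, ...]:
    """Return the buckets a v1 request counts against, per the table above."""
    method = method.upper()
    if method == "POST" and path == TOKEN_ISSUANCE_PATH:
        # Limited separately; "token-issuance" is an illustrative label only.
        return ("token-issuance",)
    if method == "GET":
        return ("token-read", "org")
    if method in ("POST", "PATCH"):
        return ("token-write", "org")
    # The table does not cover other methods, so do not assume a bucket.
    raise ValueError(f"no documented bucket for method {method!r}")


print(buckets_for("get", "/api/v1/items"))  # ('token-read', 'org')
```

The `/api/v1/items` path in the example is a placeholder, not a documented endpoint.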

Heavy backfills

For one-time backfills:

Spread requests over time by inserting deliberate sleeps between batches so you don’t burst against the per-minute limit.
Chunk by resource so each request stays small.
Use idempotency keys so a partial backfill can resume from the last successful item.
Talk to your account team beforehand if the backfill is large enough to be visible in workspace metrics; we’d rather hear about it than throttle blindly.
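
The pacing advice above can be sketched as follows. The batch size, the 50% headroom factor, and the `send_item` callback are all assumptions chosen for illustration; `send_item` stands in for whatever write call your integration makes and is expected to attach an idempotency key itself.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def batch_interval(batch_size: int, per_minute_limit: int, headroom: float = 0.5) -> float:
    """Seconds between batches so throughput stays at headroom * limit."""
    return batch_size * 60.0 / (per_minute_limit * headroom)


def run_backfill(items: Iterable[T], send_item: Callable[[T], object],
                 batch_size: int = 10, per_minute_limit: int = 60,
                 sleep: Callable[[float], object] = time.sleep) -> int:
    """Send every item in paced batches and return how many were sent."""
    interval = batch_interval(batch_size, per_minute_limit)
    sent = 0
    for index, batch in enumerate(chunked(items, batch_size)):
        if index > 0:
            sleep(interval)  # deliberate pause between batches
        for item in batch:
            send_item(item)
            sent += 1
    return sent
```

With the default write limit of 60 requests per minute, batches of 10 are spaced 20 seconds apart, which uses about half of the per-token write bucket and leaves room for the integration's routine traffic.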

Rate limits exist to protect the workspace as much as to protect Novantra. A misconfigured integration that loops uncontrolled can exhaust its own per-token bucket within seconds; the per-org cap ensures one integration cannot starve others.

Pagination
Webhooks - for push-style integration that avoids polling rate limits entirely.