Disaster Recovery

Design Principles

Recivr's calculation engine is designed for resilience:

  • Stateless computation — the API performs no database writes during fee calculation, eliminating write-path failures as a latency or availability risk
  • Multi-zone deployment — the service runs across geographically separated availability zones with independent infrastructure
  • Automatic failover — if the primary zone returns errors or becomes unreachable, traffic is rerouted to the secondary zone within seconds
  • Edge-level request archival — every inbound request is durably stored at the network edge before processing begins, ensuring a complete audit trail even if downstream services are temporarily unavailable

Failover Behavior

All traffic enters through a global edge layer that handles routing and failover:

  1. Requests are forwarded to the primary zone
  2. If the primary returns a server error or times out, the request is automatically retried on the secondary zone
  3. If all zones are unavailable, the API returns 503 Service Unavailable with a Retry-After header
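
The three steps above can be sketched as a small routing function. This only illustrates the described behavior and is not Recivr's edge implementation: the `route` function, the zone handler signature, and the 5-second `Retry-After` value are all assumptions made for the example.

```python
from typing import Callable, Dict, Sequence, Tuple

Response = Tuple[int, Dict[str, str], bytes]
# Hypothetical: a callable that forwards one request to one zone.
ZoneHandler = Callable[[bytes], Response]


def route(request: bytes, zones: Sequence[ZoneHandler], retry_after: int = 5) -> Response:
    """Try each zone in order (primary first); return 503 if none can serve."""
    for forward in zones:
        try:
            status, headers, body = forward(request)
        except TimeoutError:
            continue  # zone timed out or is unreachable: try the next one
        if status >= 500:
            continue  # server error: retry on the next zone
        return status, headers, body
    # Every zone failed, so tell the client when to try again.
    return 503, {"Retry-After": str(retry_after)}, b"Service Unavailable"
```

Passing the zones as an ordered sequence keeps the primary/secondary preference explicit and lets the same loop cover more than two zones.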

Failover is transparent to the client — no configuration change is needed. The response includes headers indicating which zone served the request.
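
Because failover happens at the edge, the main case a client may want to handle itself is the 503 response with its Retry-After header. The following is a minimal sketch, assuming a hypothetical `send` callable that performs one HTTP request and returns `(status, headers, body)`; the helper names and the default delay are illustrative, not part of the Recivr API.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(value: Optional[str], default: float = 1.0,
                        now: Optional[datetime] = None) -> float:
    """Convert a Retry-After value (delta-seconds or HTTP-date) to a delay."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(); on 503, wait as instructed and retry up to max_attempts."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 503:
            break
        # Assumes the headers mapping uses the canonical "Retry-After" key.
        sleep(retry_after_seconds(headers.get("Retry-After")))
        response = send()
    return response
```

Injecting `send` and `sleep` keeps the retry logic independent of any particular HTTP library and makes it easy to test without network access.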

Direct Fallback Endpoint

In the unlikely event that the global edge routing layer itself is impaired, a direct fallback endpoint is available:

https://api-do.recivr.com/v1/calculate

This endpoint uses independent DNS infrastructure and connects directly to the compute layer, bypassing the primary routing layer. It exposes the same API and uses the same authentication. Use it only as an emergency fallback.
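
One way a client could use the fallback endpoint is to try it only when the normal endpoint cannot be reached at the transport level. The sketch below rests on assumptions: `PRIMARY_URL` is a placeholder (substitute the endpoint your integration normally calls), `post` is a hypothetical callable that sends one authenticated request, and treating `OSError` (connection, DNS, and timeout failures) as the trigger for fallback is a design choice of this example rather than documented Recivr guidance.

```python
from typing import Callable, Optional, Sequence

# Placeholder only: replace with the endpoint your integration normally uses.
PRIMARY_URL = "https://primary.example.invalid/v1/calculate"
# Direct fallback endpoint from the documentation above.
FALLBACK_URL = "https://api-do.recivr.com/v1/calculate"


def post_with_fallback(post: Callable[[str, bytes], bytes], body: bytes,
                       urls: Sequence[str] = (PRIMARY_URL, FALLBACK_URL)) -> bytes:
    """Send body to each URL in order until one is reachable."""
    last_error: Optional[OSError] = None
    for url in urls:
        try:
            return post(url, body)
        except OSError as exc:  # connection refused, DNS failure, timeout
            last_error = exc
    raise ConnectionError("all endpoints unreachable") from last_error
```

HTTP-level errors such as 4xx responses are deliberately not retried here, since the fallback serves the same API and would be expected to return the same result.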

Data Durability

After fee calculation, results are published to a durable message stream. A dedicated worker consumes from the stream and persists data to the analytics database in batches. This ensures:

  • The API never blocks on database writes
  • Transaction data is preserved even if the analytics database is temporarily unavailable
  • Status lifecycle transitions are tracked with a full audit trail
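
The stream-and-worker pattern described above can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions, not the actual worker: a `deque` plays the role of the durable message stream, `write_batch` is a hypothetical function that persists one batch to the analytics database, and a `ConnectionError` represents the database being unavailable.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


def drain_in_batches(stream: Deque[dict],
                     write_batch: Callable[[List[dict]], None],
                     batch_size: int = 100) -> int:
    """Persist records in batches; keep them buffered if the write fails."""
    persisted = 0
    while stream:
        # Peek at the next batch without removing it from the stream.
        batch = list(islice(stream, batch_size))
        try:
            write_batch(batch)
        except ConnectionError:
            break  # database unavailable: records stay on the stream
        # Remove records only after the write has succeeded.
        for _ in batch:
            stream.popleft()
        persisted += len(batch)
    return persisted
```

Removing records only after a successful write is what keeps transaction data intact during a database outage: a failed batch is simply picked up again on the next run.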

Audit Trail

Every raw request is archived at the network edge with:

  • 7-year retention for regulatory compliance
  • Replay capability — historical requests can be re-processed if fee rules change retroactively
  • Zero latency impact — archival happens asynchronously at the edge
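
To make the replay idea concrete, the sketch below re-runs archived requests through a revised fee rule. Everything in it is hypothetical: the archived request shape (`id` and `amount` fields), the `replay` helper, and the 1.5% rule are invented for illustration and do not reflect Recivr's archive format or fee logic.

```python
from decimal import Decimal
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical archived request shape: {"id": "...", "amount": "..."}
ArchivedRequest = Dict[str, str]


def replay(archive: Iterable[ArchivedRequest],
           calculate_fee: Callable[[Decimal], Decimal]) -> List[Tuple[str, Decimal]]:
    """Recompute the fee for every archived request with the given rule."""
    return [(req["id"], calculate_fee(Decimal(req["amount"]))) for req in archive]


def revised_rule(amount: Decimal) -> Decimal:
    """Hypothetical retroactive rule: 1.5% of the amount, rounded to cents."""
    return (amount * Decimal("0.015")).quantize(Decimal("0.01"))
```

Replay works cleanly here because the calculation is a pure function of the archived request, which matches the stateless computation principle described earlier.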

Preventive Measures

  • Automated health monitoring with alerting at multiple severity levels
  • Proactive scaling: capacity is provisioned ahead of projected load
  • Hardened infrastructure: all systems follow security best practices, including least-privilege access, encryption in transit, and regular patching
  • Incident response: on-call engineering with defined escalation procedures and a detection-to-response time of under 5 minutes

Recovery Objectives

Metric                         | Target
RTO (Recovery Time Objective)  | < 30 seconds (automatic failover)
RPO (Recovery Point Objective) | 0, no data loss (edge archival + stream buffering)
