Priority Queue for Microsoft 365 API Requests

Use Cases

Handling Strict Rate Limits – Microsoft Graph enforces a global limit of 130,000 requests per 10 seconds across all tenants. Prioritizing requests prevents API rejections.

Managing Service-Specific Limits – Different Microsoft 365 services have individual API limits, such as:
- Outlook: 10,000 API requests per 10 minutes per mailbox.
- Assignments: 500 requests per 10 seconds per app per tenant.
Application-Level Rate Management – Microsoft enforces quotas per Application ID, meaning API consumers must manage multiple independent limits.

Preventing API Throttling with Prioritization – High-priority requests (e.g., user actions) should be processed first, while lower-priority requests (e.g., background jobs) can be queued.

Retrying Throttled Requests – When a 429 Too Many Requests response occurs, the system should delay retries based on the Retry-After header.

Step-by-Step Breakdown

Request Categorization per Application ID – Classify API calls into high-priority (e.g., real-time user actions) and low-priority (e.g., background syncs, batch processing).
Priority Queue Implementation – Assign priority levels to different API calls based on urgency and impact. High-priority requests get processed first.
Rate Limit Monitoring – Continuously track rate limits for each Application ID using response headers like RateLimit-Remaining.
Queue-Based Rate Management – When requests exceed a service’s limit, lower-priority requests are queued instead of being rejected.
Retry on Throttling (429) – If an API call is throttled, it should be retried using an exponential backoff strategy, leveraging the Retry-After header.

How It Works Together

Instead of rejecting excess API calls outright, the priority queue flow ensures that:

Urgent API calls (e.g., user-driven actions) are processed immediately.
Less critical calls (e.g., batch data sync) are queued and delayed if rate limits are close to being exceeded.
Throttled requests (429 errors) are automatically retried after the cooldown period, preventing unnecessary failures.

A Concrete Example

The Problem: A cloud document management system integrates with Microsoft 365 APIs to fetch user emails, calendar events, and OneDrive files.

Challenge 1: The API has strict rate limits, and exceeding them results in rejected requests.
Challenge 2: Application IDs are assigned independent rate quotas, meaning some apps may exceed limits while others remain underutilized.
Challenge 3: Background processes compete with user-driven requests, causing performance bottlenecks.

The Solution: Implement a priority queue system that enforces quotas per Application ID.

High-priority requests (e.g., user actions) bypass the queue when possible.
Low-priority requests (e.g., scheduled syncs) are queued and processed when rate limits allow.
Retry logic ensures that 429 responses do not cause failures but instead get retried based on the API’s Retry-After header.

Priority Queue Flow YAML Configuration

yaml
quotas:
  - name: microsoft_graph_quota
    provider: microsoft
    limits:
      - type: global
        value: 130000  # Microsoft Graph global limit: 130,000 requests per 10 sec
      - type: service_specific
        services:
          - name: outlook
            value: 10000  # 10,000 requests per 10 min per mailbox
          - name: assignments
            value: 500  # 500 requests per 10 sec per app per tenant
          - name: general_tenant
            value: 1000  # 1,000 requests per 10 sec per tenant for all apps

endpoints:
  - url: "https://graph.microsoft.com/*"
    method: ANY
    remedies:
      - name: Priority Queue
        enabled: true
        config:
          application_id_routing: true  # Ensure each Application ID is handled separately
          priority_levels:
            - high: ["user-actions", "real-time-requests"]
            - medium: ["background-processing"]
            - low: ["bulk-imports", "non-critical-jobs"]
          queue_capacity: 10000  # Max requests queued per Application ID
          max_wait_time_seconds: 30  # Requests wait up to 30s before being processed or dropped

      - name: Retry
        enabled: true
        config:
          max_retries: 3
          retry_on_status_codes: [429]
          retry_after_header: "Retry-After"
          backoff_strategy: exponential  # Increases delay between retries

      - name: API Usage Optimization
        enabled: true
        config:
          fetch_only_required_data: true  # Reduces unnecessary API calls
          enable_batching: true  # Uses Microsoft Graph batch requests
          size

Final Thoughts

Microsoft 365 API enforces strict, distributed throttling that varies across services and Application IDs. By implementing a priority queue system, developers can:
✅ Ensure critical requests are never delayed.
✅ Queue non-essential API calls instead of exceeding rate limits.
✅ Use dynamic retry strategies to handle 429 errors effectively.

This ensures stable API performance, prevents rate limit violations, and optimizes API usage while remaining compliant with Microsoft’s consumption rules.

About Lunar.dev:

Lunar.dev is your go to solution for Egress API controls and API consumption management at scale.
With Lunar.dev, engineering teams of any size gain instant unified controls to effortlessly manage, orchestrate, and scale API egress traffic across environments— all without the need for code changes.
Lunar.dev is agnostic to any API provider and enables full egress traffic observability, real-time controls for cost spikes or issues in production, all through an egress proxy, an SDK installation, and a user-friendly UI management layer.
Lunar.dev offers solutions for quota management across environments, prioritizing API calls, centralizing API credentials management, and mitigating rate limit issues.

Use Now