Escalation Policies

Escalation Policies define how Pingward notifies your team when issues occur. They provide a structured, multi-tier notification chain that ensures the right people are alerted at the right time -- and that no critical issue goes unnoticed because a single notification was missed.

Overview

When a routing rule fires and an issue is linked to an escalation policy, the policy determines who gets notified and when. Notifications start at Tier 1 (typically the on-call engineer or a Slack channel). If no one acknowledges the issue within the tier's timeout window, the policy automatically escalates to the next tier (a senior engineer, a team lead, or a broader channel). This continues through all configured tiers until the issue is acknowledged or resolved.

The escalation flow looks like this:

Issue Detected
  -> Routing Rule Matches
    -> Issue linked to Escalation Policy
      -> Tier 1 Notified (e.g., on-call engineer via Slack + SMS)
        -> No acknowledgment within 5 minutes
          -> Tier 2 Notified (e.g., team lead via phone call)
            -> No acknowledgment within 15 minutes
              -> Tier 3 Notified (e.g., VP Engineering via phone + email)
                -> Repeat behavior applies (stop, repeat last, or restart)

How Escalation Works

Step-by-Step Flow

  1. Issue detected -- A test fails; the issue manager classifies the error and assigns a severity.
  2. Routing rule evaluates -- The issue matches a routing rule's conditions (severity, error category, tags, etc.).
  3. Escalation started -- If the routing rule has an escalation policy linked, the issue is linked to that policy and escalation begins.
  4. Tier 1 notified -- All targets in the first escalation tier receive notifications immediately. This could be an integration (Slack channel, email list, webhook) or an on-call schedule (which resolves to the currently on-call user and their contact methods).
  5. Timeout countdown -- The system waits for the tier's configured timeout (e.g., 5 minutes).
  6. Acknowledgment check -- If someone acknowledges the issue during the timeout, escalation stops. If not, escalation continues.
  7. Next tier notified -- The next tier's targets are notified, and the timeout countdown resets for that tier.
  8. All tiers exhausted -- Once all tiers have been notified without acknowledgment, the repeat behavior determines what happens next.
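
The flow above can be sketched as a small simulation. This is an illustrative model, not Pingward's actual implementation: it computes at which minute each tier would be notified, stopping early if the issue is acknowledged.

```python
# Hypothetical sketch of the escalation flow above. Tier shapes mirror the
# API examples later in this page; timings are illustrative.
def run_escalation(tiers, acknowledged_after_minutes=None):
    """Return (minute, tier_level) notification events for one pass through
    the chain, stopping as soon as an acknowledgment lands."""
    events = []
    clock = 0
    for tier in tiers:
        # Acknowledgment check: if someone acknowledged during the previous
        # tier's timeout window, escalation halts here.
        if acknowledged_after_minutes is not None and clock >= acknowledged_after_minutes:
            break
        events.append((clock, tier["level"]))  # all targets in the tier fire at once
        clock += tier["timeoutMinutes"]        # timeout countdown for this tier
    return events

tiers = [
    {"level": 1, "timeoutMinutes": 5},
    {"level": 2, "timeoutMinutes": 15},
    {"level": 3, "timeoutMinutes": 30},
]

# Unacknowledged: Tier 1 at t=0, Tier 2 at t=5, Tier 3 at t=20.
print(run_escalation(tiers))
# Acknowledged at minute 7: Tier 2 has already fired, Tier 3 never does.
print(run_escalation(tiers, acknowledged_after_minutes=7))
```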

Acknowledgment Stops Escalation

When any team member acknowledges an issue, escalation halts immediately. The issue moves to "Acknowledged" status and no further tiers are notified. This is the primary mechanism for preventing unnecessary alert noise.

Resolution Clears Everything

Resolving an issue clears all escalation state. If the underlying problem recovers on its own and the issue auto-resolves, no further escalation occurs.

Creating an Escalation Policy

Via Dashboard

  1. Navigate to Escalation Policies in the sidebar.
  2. Click + New Policy.
  3. Fill in the form:

Basic Information:

Policy Name: Critical Issue Escalation
Description: Multi-tier escalation for production-critical alerts
Repeat Behavior: Repeat all tiers

Tier 1 (immediate response):

Timeout: 5 minutes
Targets:
  - Integration: #ops-alerts (Slack)
  - On-Call: Primary On-Call Schedule

Tier 2 (secondary response):

Timeout: 15 minutes
Targets:
  - Integration: Engineering Leads (Email)
  - On-Call: Management On-Call Schedule

Tier 3 (executive escalation):

Timeout: 30 minutes
Targets:
  - Integration: Critical Alerts (SMS)
  - Integration: Incident Webhook (Webhook)
  4. Click Create.

Via API

http
POST /api/escalation-policies
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Critical Issue Escalation",
  "description": "Multi-tier escalation for production-critical alerts",
  "repeatBehavior": "RepeatAll",
  "tiers": [
    {
      "level": 0,
      "timeoutMinutes": 5,
      "targets": [
        { "type": "integration", "id": "<slack-integration-id>" },
        { "type": "oncall_schedule", "id": "<primary-schedule-id>" }
      ]
    },
    {
      "level": 1,
      "timeoutMinutes": 15,
      "targets": [
        { "type": "integration", "id": "<email-integration-id>" },
        { "type": "oncall_schedule", "id": "<management-schedule-id>" }
      ]
    },
    {
      "level": 2,
      "timeoutMinutes": 30,
      "targets": [
        { "type": "integration", "id": "<sms-integration-id>" },
        { "type": "integration", "id": "<webhook-integration-id>" }
      ]
    }
  ]
}
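
The same request can be issued from Python using only the standard library. This is a minimal sketch: the base URL is a placeholder, and the token and target IDs must be replaced with your own values.

```python
# Build and send the create-policy request shown above. The base URL
# (api.example.com) is a placeholder -- substitute your Pingward host.
import json
import urllib.request

def build_policy_payload(name, description, repeat_behavior, tiers):
    """Assemble the request body in the shape the endpoint expects.
    Levels are assigned 0-indexed, matching the example payload."""
    return {
        "name": name,
        "description": description,
        "repeatBehavior": repeat_behavior,
        "tiers": [
            {"level": i, "timeoutMinutes": t["timeoutMinutes"], "targets": t["targets"]}
            for i, t in enumerate(tiers)
        ],
    }

payload = build_policy_payload(
    "Critical Issue Escalation",
    "Multi-tier escalation for production-critical alerts",
    "RepeatAll",
    [
        {"timeoutMinutes": 5,
         "targets": [{"type": "integration", "id": "<slack-integration-id>"}]},
        {"timeoutMinutes": 15,
         "targets": [{"type": "oncall_schedule", "id": "<management-schedule-id>"}]},
    ],
)

req = urllib.request.Request(
    "https://api.example.com/api/escalation-policies",
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer <your-jwt-token>",
             "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```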

Configuration

Policy Settings

Each escalation policy has the following top-level settings:

| Setting | Required | Description |
|---------|----------|-------------|
| Name | Yes | A descriptive name for the policy (max 255 characters). Referenced in routing rules to link alerts to escalation workflows. |
| Description | No | Optional notes about when this policy should be used or what services it covers (max 1000 characters). |
| Repeat Behavior | No | Controls what happens after all tiers have been exhausted. Defaults to "Stop after last tier". |

Escalation Tiers

Tiers define the escalation chain. Each tier specifies:

  • Level -- The tier's position in the chain (0-indexed internally, displayed as Level 1, Level 2, etc.). Tier 1 is always notified first.
  • Timeout (minutes) -- How long to wait for acknowledgment before escalating to the next tier. Minimum 1 minute, maximum 1440 minutes (24 hours).
  • Targets -- One or more notification targets (integrations or on-call schedules) to notify at this tier.

Key rules:

  • At least one tier is required.
  • At least one target must be configured across all tiers.
  • Tiers are processed in order (Level 1 first, then Level 2, etc.).
  • Multiple targets within the same tier are notified simultaneously.

Default values:

  • First tier: 5-minute timeout
  • Additional tiers: 15-minute timeout
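
The rules and defaults above can be expressed as a small client-side check. This is an illustrative sketch, not Pingward's server-side validation; it applies the documented defaults and enforces the 1-1440 minute bounds before a policy is submitted.

```python
# Illustrative client-side validation of the tier rules documented above.
DEFAULT_FIRST_TIMEOUT = 5   # minutes, first tier
DEFAULT_LATER_TIMEOUT = 15  # minutes, additional tiers

def apply_default_timeouts(tiers):
    """Fill in the documented default timeouts for tiers that omit one."""
    for i, tier in enumerate(tiers):
        tier.setdefault("timeoutMinutes",
                        DEFAULT_FIRST_TIMEOUT if i == 0 else DEFAULT_LATER_TIMEOUT)
    return tiers

def validate_tiers(tiers):
    """Raise ValueError if the tier list breaks a documented rule."""
    if not tiers:
        raise ValueError("at least one tier is required")
    if not any(tier.get("targets") for tier in tiers):
        raise ValueError("at least one target must be configured across all tiers")
    for tier in tiers:
        timeout = tier.get("timeoutMinutes")
        if timeout is not None and not 1 <= timeout <= 1440:
            raise ValueError("timeout must be between 1 and 1440 minutes")

tiers = [{"targets": [{"type": "integration", "id": "abc"}]}, {}]
apply_default_timeouts(tiers)
validate_tiers(tiers)
print(tiers[0]["timeoutMinutes"], tiers[1]["timeoutMinutes"])  # 5 15
```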

Targets

Each tier can contain two types of targets:

Integrations

Integrations are pre-configured notification channels (Slack, Email, SMS, Webhook). When a tier fires, the integration sends a notification through its configured channel.

Use when:

  • You want to notify a team channel (Slack)
  • You want to email a distribution list
  • You want to trigger an external webhook (PagerDuty, OpsGenie, custom)
  • You want to send SMS to a fixed set of numbers

Example targets:

Integration: #ops-alerts (Slack)
Integration: engineering-leads@company.com (Email)
Integration: Critical SMS Group (SMS)

On-Call Schedules

On-call schedules resolve to the currently on-call user at the time the tier fires. The on-call user is then notified via the integrations configured in the same escalation tier (e.g., Slack, Email, SMS).

Use when:

  • You want to notify whoever is currently on-call
  • You have rotating on-call schedules
  • You want notifications to follow the on-call rotation automatically

Example targets:

On-Call: Primary On-Call Schedule -> resolves to Jane (on-call this week)
On-Call: Backend Team Schedule   -> resolves to Bob (on-call today)

How on-call target notification works:

  1. The tier fires and the on-call schedule target is evaluated.
  2. The system looks up the currently on-call user for that schedule.
  3. The on-call user is notified via the integration targets configured in the same tier (e.g., Slack, Email, SMS).

Pair on-call schedule targets with integration targets in the same tier to control how on-call users are reached.
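
The lookup in step 2 amounts to finding whose rotation window covers the current time. The sketch below is a hypothetical stand-in for Pingward's internals; the schedule shape and resolver are illustrative only.

```python
# Hypothetical on-call resolution: find the rotation entry whose window
# covers the evaluation time. Schedule data is illustrative.
from datetime import datetime, timezone

def resolve_on_call(schedule, at):
    """Return the user on call at the given time, or None if no window matches."""
    for entry in schedule["rotations"]:
        if entry["start"] <= at < entry["end"]:
            return entry["user"]
    return None  # no one on call: the tier's integration targets still fire

schedule = {
    "name": "Primary On-Call Schedule",
    "rotations": [
        {"user": "jane",
         "start": datetime(2024, 1, 1, tzinfo=timezone.utc),
         "end": datetime(2024, 1, 8, tzinfo=timezone.utc)},
        {"user": "bob",
         "start": datetime(2024, 1, 8, tzinfo=timezone.utc),
         "end": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    ],
}

print(resolve_on_call(schedule, datetime(2024, 1, 3, tzinfo=timezone.utc)))  # jane
```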

Repeat Behavior

Repeat behavior controls what happens after all tiers have been exhausted without acknowledgment. There are three options:

Stop after last tier

Tier 1 -> Tier 2 -> Tier 3 -> (done)

Escalation stops entirely. No further notifications are sent. The issue remains in "Open" status until someone manually acknowledges or resolves it.

Use when:

  • You have a final-tier escalation that always gets attention (e.g., VP-level phone call)
  • You don't want notification fatigue from repeated alerts
  • You have other monitoring systems as a backstop

Repeat last tier

Tier 1 -> Tier 2 -> Tier 3 -> Tier 3 -> Tier 3 -> ...

After all tiers have been exhausted, the last tier is re-notified on its timeout interval. This continues indefinitely until the issue is acknowledged or resolved.

Use when:

  • The last tier represents the most senior responders
  • You want persistent notification until someone responds
  • The final tier has a longer timeout (e.g., 30 minutes) to avoid excessive noise

Repeat all tiers

Tier 1 -> Tier 2 -> Tier 3 -> Tier 1 -> Tier 2 -> Tier 3 -> ...

After all tiers have been exhausted, the entire escalation chain restarts from Tier 1. This cycles through all tiers again indefinitely until the issue is acknowledged or resolved.

Use when:

  • On-call rotations may have changed since the first cycle
  • You want maximum coverage across all response levels
  • You're dealing with critical incidents that absolutely must be addressed
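
The three behaviors differ only in what happens after the first pass through the chain. The generator below sketches the resulting notification order (tier levels only), using the API values `Stop`, `RepeatLast`, and `RepeatAll` shown earlier; the cap parameter exists purely so the demonstration terminates.

```python
# Sketch of the three repeat behaviors as a notification-order generator.
# max_notifications caps the output for demonstration; in the real system
# repetition continues until acknowledgment or resolution.
def escalation_sequence(levels, repeat_behavior, max_notifications):
    """Yield tier levels in notification order."""
    sent = 0
    # The first pass always walks every tier in order.
    for level in levels:
        if sent >= max_notifications:
            return
        yield level
        sent += 1
    while sent < max_notifications:
        if repeat_behavior == "RepeatLast":
            yield levels[-1]          # re-notify the last tier on its timeout
            sent += 1
        elif repeat_behavior == "RepeatAll":
            for level in levels:      # restart the whole chain from Tier 1
                if sent >= max_notifications:
                    return
                yield level
                sent += 1
        else:                         # "Stop": no further notifications
            return

print(list(escalation_sequence([1, 2, 3], "Stop", 6)))        # [1, 2, 3]
print(list(escalation_sequence([1, 2, 3], "RepeatLast", 6)))  # [1, 2, 3, 3, 3, 3]
print(list(escalation_sequence([1, 2, 3], "RepeatAll", 6)))   # [1, 2, 3, 1, 2, 3]
```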

Linking to Routing Rules

Escalation policies are connected to routing rules. A routing rule defines when to escalate (based on conditions like severity, error category, tags), and the escalation policy defines who to notify and how to escalate.

Connecting a Policy to a Routing Rule

  1. Navigate to Routing Rules in the sidebar.
  2. Create or edit a routing rule.
  3. In the Escalation section, select an escalation policy from the dropdown.
  4. Save the routing rule.

When the routing rule fires:

  • The rule's direct integration actions execute immediately (e.g., post to Slack).
  • The issue is linked to the selected escalation policy.
  • The escalation policy's Tier 1 is notified.
  • Escalation proceeds through tiers until the issue is acknowledged or resolved.

Rules Without Escalation Policies

Routing rules without an escalation policy still fire their integration actions (Slack, email, etc.) but do not trigger multi-tier escalation. This is appropriate for low-severity alerts that don't need escalation tracking.

Multiple Rules, Same Policy

Multiple routing rules can reference the same escalation policy. For example, you might have separate routing rules for "Critical API Errors" and "Critical Database Errors" that both use the "Critical Issue Escalation" policy.

Examples

Example 1: Critical Production Alerts

Scenario: Your production API is monitored with high-frequency checks. Any critical failure needs immediate response with aggressive escalation.

Policy: Critical Production Issue Escalation
Repeat: Repeat all tiers

Tier 1 (immediate, 5 min timeout):
  - Integration: #prod-alerts (Slack)
  - On-Call: Primary SRE On-Call

Tier 2 (secondary, 10 min timeout):
  - Integration: SRE Team Lead (Email)
  - Integration: Critical SMS Alert (SMS)

Tier 3 (executive, 30 min timeout):
  - Integration: Engineering VP (Email)
  - Integration: Alert Webhook -> PagerDuty

Linked routing rule conditions:

Severity: Critical
Tags: production

Result:

  • Critical production issues immediately alert the Slack channel and on-call SRE.
  • If unacknowledged after 5 minutes, the SRE team lead gets an email and SMS.
  • If still unacknowledged after 10 more minutes, the VP of Engineering is emailed and the webhook fires.
  • After all tiers, the cycle restarts to ensure coverage.

Example 2: Business Hours Support

Scenario: Your internal tools are monitored during business hours. Failures need attention but not urgent after-hours escalation.

Policy: Business Hours Escalation
Repeat: Stop after last tier

Tier 1 (initial, 15 min timeout):
  - Integration: #internal-tools-alerts (Slack)

Tier 2 (follow-up, 30 min timeout):
  - On-Call: Internal Tools Team On-Call
  - Integration: tools-team@company.com (Email)

Linked routing rule conditions:

Tags: internal-tools
Importance: Internal, Development

Result:

  • Internal tool failures post to the Slack channel first.
  • If no one responds in 15 minutes, the on-call tools team member and email list are notified.
  • Escalation stops after Tier 2 -- these are not production-critical.

Example 3: Multi-Service with Separate Escalation Paths

Scenario: Your platform has separate teams for payments and user services. Each needs its own escalation chain.

Payment Service Policy:

Policy: Payment Service Escalation
Repeat: Repeat last tier

Tier 1 (5 min timeout):
  - On-Call: Payment Team On-Call
  - Integration: #payment-alerts (Slack)

Tier 2 (10 min timeout):
  - Integration: Payment Team Lead (SMS)
  - Integration: Finance Team (Email)

User Service Policy:

Policy: User Service Escalation
Repeat: Repeat last tier

Tier 1 (10 min timeout):
  - On-Call: Platform Team On-Call
  - Integration: #platform-alerts (Slack)

Tier 2 (20 min timeout):
  - Integration: Platform Engineering Lead (Email)

Routing rules:

  • Rule 1: Tags: payment-service -> Payment Service Escalation
  • Rule 2: Tags: user-service -> User Service Escalation

Best Practices

Tier Count

Recommendation: 2-3 tiers for most policies.

  • 1 tier -- Suitable for non-critical alerts where a single notification is sufficient.
  • 2 tiers -- Good default for most services. First responder + backup.
  • 3 tiers -- Appropriate for production-critical systems. First responder + team lead + executive.
  • 4+ tiers -- Rarely needed. Consider whether the extra tiers add value or just complexity.

Timeout Values

Recommendation: Start short, get longer.

| Tier | Recommended Timeout | Rationale |
|------|---------------------|-----------|
| Tier 1 | 3-5 minutes | First responders should be monitoring actively |
| Tier 2 | 10-15 minutes | Secondary responders may need time to context-switch |
| Tier 3 | 20-30 minutes | Executive escalation gives a broader response window |

Warning: Avoid very short timeouts (1-2 minutes) on multiple tiers. This can cause alert fatigue when the on-call engineer is already investigating but hasn't acknowledged yet. Give people reasonable time to assess before escalating.
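
A quick sanity check when picking timeouts is to compute when each tier actually hears about an unacknowledged issue, since timeouts accumulate down the chain:

```python
# Cumulative escalation timeline for the recommended timeouts above.
timeouts = [5, 15, 30]  # minutes per tier, Tier 1 first

notify_at = []
elapsed = 0
for timeout in timeouts:
    notify_at.append(elapsed)  # this tier fires, then its timeout starts
    elapsed += timeout

print(notify_at)  # [0, 5, 20]: Tier 3 is first reached 20 minutes in
```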

Target Redundancy

Recommendation: Use both integrations and on-call schedules in early tiers.

Good: Tier 1 has Slack channel + On-Call Schedule
Bad:  Tier 1 has only Slack channel (what if they're away from Slack?)

Having both a channel notification (for team visibility) and an on-call target (for personal notification) increases the likelihood of a fast response.

Naming Conventions

Use descriptive names that include the scope and severity level:

Good examples:

  • "Critical Production Escalation"
  • "Payment Service - High Priority"
  • "Internal Tools - Business Hours"
  • "Database Alerts - All Severity"

Bad examples:

  • "Escalation 1"
  • "Policy"
  • "Team notifications"

Descriptions

Use the description field to document:

  • Which services this policy covers
  • Which routing rules reference it
  • Any special considerations (e.g., "Do not modify without notifying the SRE team")

Review Regularly

Escalation policies should be reviewed periodically:

  • Monthly: Verify on-call schedules referenced by policies still exist and have participants.
  • Quarterly: Review timeout values based on actual acknowledgment times from incident history.
  • After escalations: Update policies if post-mortems reveal gaps in the escalation chain.

Deactivation vs. Deletion

Policies can be deactivated (set to inactive) without deleting them. This is useful when:

  • Temporarily disabling escalation during a known outage
  • Testing a new policy before removing the old one
  • Preserving configuration for seasonal services

Inactive policies are not available for selection in routing rules.

API Reference

List Escalation Policies

http
GET /api/escalation-policies
Authorization: Bearer <your-jwt-token>

Returns all escalation policies for the current workspace.

Get Escalation Policy

http
GET /api/escalation-policies/{id}
Authorization: Bearer <your-jwt-token>

Create Escalation Policy

http
POST /api/escalation-policies
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "string (required, max 255 chars)",
  "description": "string (optional, max 1000 chars)",
  "repeatBehavior": "Stop | RepeatLast | RepeatAll",
  "tiers": [
    {
      "level": 0,
      "timeoutMinutes": 5,
      "targets": [
        {
          "type": "integration | oncall_schedule",
          "id": "target-guid"
        }
      ]
    }
  ]
}
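
A client-side model of this schema can catch shape errors before a request is sent. The dataclasses below are a hypothetical mirror of the documented fields and limits, not an official SDK:

```python
# Hypothetical client-side mirror of the create-policy schema above,
# enforcing the documented name/description limits.
from dataclasses import dataclass, field

@dataclass
class Target:
    type: str  # "integration" or "oncall_schedule"
    id: str    # target GUID

@dataclass
class Tier:
    level: int
    timeoutMinutes: int
    targets: list

@dataclass
class EscalationPolicy:
    name: str
    description: str = ""
    repeatBehavior: str = "Stop"  # "Stop" | "RepeatLast" | "RepeatAll"
    tiers: list = field(default_factory=list)

    def __post_init__(self):
        if not self.name or len(self.name) > 255:
            raise ValueError("name is required, max 255 characters")
        if len(self.description) > 1000:
            raise ValueError("description max 1000 characters")

policy = EscalationPolicy(
    name="Critical Issue Escalation",
    repeatBehavior="RepeatAll",
    tiers=[Tier(level=0, timeoutMinutes=5,
                targets=[Target(type="integration", id="<slack-integration-id>")])],
)
```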

Update Escalation Policy

http
PUT /api/escalation-policies/{id}
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Updated name",
  "description": "Updated description",
  "isActive": true,
  "repeatBehavior": "RepeatAll",
  "tiers": [...]
}

All fields are optional. Only provided fields are updated.

Delete Escalation Policy

http
DELETE /api/escalation-policies/{id}
Authorization: Bearer <your-jwt-token>

Returns 204 No Content on success.

Troubleshooting

Escalation Not Triggering

Symptoms: Routing rule fires but no escalation occurs on the issue.

Possible causes:

  1. No escalation policy linked -- The routing rule does not have an escalation policy selected.
  2. Policy is inactive -- The linked escalation policy has been deactivated.
  3. No targets configured -- The policy's tiers have no targets.

Resolution:

  1. Edit the routing rule and verify an active escalation policy is selected in the "Escalation" section.
  2. Check the escalation policy's status badge (Active vs. Inactive).
  3. Verify at least one tier has at least one target.

On-Call User Not Receiving Notifications

Symptoms: Escalation triggers but the on-call person doesn't get notified.

Possible causes:

  1. No integration targets in the tier -- The escalation tier has an on-call schedule target but no integration targets to deliver the notification.
  2. Integration misconfigured -- The integration in the tier is inactive or has invalid configuration.
  3. Schedule misconfigured -- The on-call schedule does not have the expected user on-call.

Resolution:

  1. Verify the escalation tier has both an on-call schedule target and at least one integration target (e.g., Email, Slack, SMS).
  2. Check that the integration targets are active and properly configured.
  3. Check the on-call schedule to confirm who is currently on-call.
  4. Review the issue's activity log for escalation events and any delivery errors.

Escalation Repeating Unexpectedly

Symptoms: Notifications keep coming after all tiers have fired.

Cause: The policy's repeat behavior is set to "Repeat last tier" or "Repeat all tiers".

Resolution:

  1. Acknowledge the issue to stop escalation immediately.
  2. If the repeat behavior is not desired, edit the policy and change it to "Stop after last tier".

Pingward - API Monitoring Made Simple