Heartbeat Monitors

Heartbeat Monitors provide inbound monitoring for scheduled jobs, cron tasks, background workers, and any process that runs on a schedule. Unlike HTTP tests that actively probe your endpoints, heartbeat monitors passively wait for your services to check in. If a service misses its expected check-in, Pingward alerts you.

Overview

Pingward supports two complementary monitoring approaches:

| Approach | Direction | How it works | Best for |
| --- | --- | --- | --- |
| HTTP Tests | Outbound | Pingward sends requests to your endpoints | APIs, websites, health checks |
| Heartbeat Monitors | Inbound | Your services send pings to Pingward | Cron jobs, workers, pipelines |

With HTTP tests, Pingward is the initiator — it actively checks whether your service responds. With heartbeat monitors, the relationship is reversed: your service is the initiator, and Pingward watches for missed check-ins.

This inbound approach is essential for monitoring processes that are not directly accessible via HTTP, or that run on a schedule and need to report completion.

Use Cases

Cron Jobs and Scheduled Tasks

Scenario: Nightly database backup runs at 2:00 AM
Expected interval: 24 hours
Problem solved: Know immediately if the backup didn't run

Without heartbeat monitoring, a failed cron job may go unnoticed for days or weeks — until someone needs a backup that doesn't exist.

CI/CD Pipelines

Scenario: Deployment pipeline runs on every merge to main
Expected interval: 60 minutes (during business hours)
Problem solved: Detect stuck or failing pipelines

Background Workers and Queue Processors

Scenario: Order processing worker runs every 5 minutes
Expected interval: 5 minutes
Problem solved: Detect worker crashes or queue stalls

Data Pipelines and ETL Jobs

Scenario: ETL pipeline syncs data from warehouse every hour
Expected interval: 60 minutes
Problem solved: Catch pipeline failures before downstream systems are affected

Health Check Reporters

Scenario: Internal service reports health every minute
Expected interval: 1 minute
Problem solved: Monitor services behind firewalls that can't be reached externally

How It Works

Heartbeat monitoring follows a four-step process:

1. Create Monitor    →  Configure name, interval, and grace period
2. Get Ping URL      →  Receive a unique URL like /ping/abc123def456
3. Integrate         →  Add the ping URL to your job (curl, HTTP call, etc.)
4. Monitor           →  Pingward watches for missed heartbeats and alerts you

Step 1: You create a heartbeat monitor in the Pingward dashboard or via the API, specifying how often your job should check in.

Step 2: Pingward generates a unique ping URL containing a cryptographically random key. This URL requires no authentication — anyone with the URL can send a ping.

Step 3: You add an HTTP request to your job that hits the ping URL on each successful run. This can be as simple as curl -X POST https://your-pingward-instance/ping/abc123.

Step 4: Pingward tracks when each ping arrives. If a ping is late (beyond the expected interval plus the grace period), the monitor transitions to Overdue and then Missing, triggering alerts through your configured routing rules.

Monitor States

Heartbeat monitors move through four states based on ping activity:

Waiting

Initial state after creation. No pings have been received yet.

The monitor is created but has never received a heartbeat. This is the starting state for all new monitors. No alerts are triggered while in this state — Pingward is waiting for the first ping to establish a baseline.

Healthy

Last ping received within the expected interval. Everything is working normally.

The most recent ping arrived on time. The monitor transitions to Healthy on every successful ping, regardless of the previous state. If the monitor was previously Overdue or Missing, the event log records a recovery.

Overdue

Expected ping time has passed, including the grace period. The job may be delayed.

The monitor expected a ping by a certain time (last ping + interval + grace period), and that deadline has passed. This is a warning state — the job is late but may still arrive. An alert is triggered when transitioning to Overdue.

Missing

Significantly past the expected time. The job has likely failed.

The monitor has been overdue for an extended period. This is a critical state indicating the monitored process has probably failed. A higher-severity alert is triggered.

State transitions:

                    ┌─── ping received ───┐
                    v                     │
  [Waiting] ──ping──> [Healthy] ──timeout──> [Overdue] ──timeout──> [Missing]
                    ^                                                   │
                    └──────────── ping received ────────────────────────┘

Any state transitions back to Healthy when a ping is received. The monitor starts in Waiting and moves to Healthy on the first ping.
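The transitions above can be sketched as two small pure functions. This is an illustration, not Pingward's implementation: the names `on_ping` and `on_tick` are invented here, and `missing_after` is a hypothetical parameter, since this page does not say how long a monitor stays Overdue before it becomes Missing.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    WAITING = "Waiting"
    HEALTHY = "Healthy"
    OVERDUE = "Overdue"
    MISSING = "Missing"


def on_ping(current: Status) -> Status:
    """Any state moves to Healthy when a ping is received."""
    return Status.HEALTHY


def on_tick(
    current: Status,
    now: datetime,
    deadline: Optional[datetime],
    missing_after: timedelta,  # hypothetical threshold, not documented
) -> Status:
    """Re-evaluate a monitor at time `now` when no new ping has arrived."""
    # A monitor that has never been pinged has no deadline and stays Waiting.
    if current is Status.WAITING or deadline is None:
        return current
    if now >= deadline + missing_after:
        return Status.MISSING
    if now >= deadline:
        return Status.OVERDUE
    return Status.HEALTHY
```

Keeping the evaluation free of side effects makes each transition easy to test in isolation. A real scheduler would also need to handle paused monitors, which this sketch ignores.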

Creating a Heartbeat Monitor

Via Dashboard

  1. Navigate to Heartbeat Monitors in the sidebar
  2. Click + New Monitor
  3. Fill in the form:

Name (required):

Nightly Database Backup

Use a descriptive name that identifies the job or process being monitored.

Expected Interval (required):

Every 60 minutes

Select how often your job should send a ping. Available options:

  • Every 1 minute
  • Every 5 minutes
  • Every 15 minutes
  • Every 30 minutes
  • Every 60 minutes

Grace Period (minutes):

5

Extra buffer time after the expected interval before the monitor is marked as overdue. Default is 5 minutes. Minimum is 0, maximum is 60, matching the range given in the Configuration and API Reference sections.

Tags (optional):

production, backup, critical

Comma-separated tags for organizing monitors.

  4. Click Create Monitor
  5. You are redirected to the monitor detail page, which displays the Ping URL

Via API

http
POST /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Nightly Database Backup",
  "expectedIntervalMinutes": 60,
  "gracePeriodMinutes": 10,
  "tags": "production, backup"
}

Response (201 Created):

json
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Nightly Database Backup",
  "pingKey": "R4nD0mUrl-S4f3_K3y12345678",
  "pingUrl": "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678",
  "status": "Waiting",
  "expectedIntervalMinutes": 60,
  "gracePeriodMinutes": 10,
  "lastPingAt": null,
  "nextExpectedAt": null,
  "isPaused": false,
  "tags": "production, backup",
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z"
}
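The same request could be issued from Python with only the standard library. This is a sketch under assumptions: `build_create_request` and `create_monitor` are hypothetical helper names, and the base URL and token are placeholders you would supply.

```python
import json
import urllib.request


def build_create_request(base_url: str, token: str, name: str,
                         interval_minutes: int, grace_minutes: int = 5,
                         tags: str = "") -> urllib.request.Request:
    """Build (but do not send) the POST /api/heartbeat-monitors request."""
    body = {
        "name": name,
        "expectedIntervalMinutes": interval_minutes,
        "gracePeriodMinutes": grace_minutes,
    }
    if tags:
        body["tags"] = tags
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/heartbeat-monitors",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_monitor(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Separating request construction from sending lets you inspect or log the request before it goes out. The `pingUrl` field of the returned dictionary is the value to embed in your job.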

Configuration

Expected Interval

The expected interval defines how often your job should send a heartbeat ping. This is specified in minutes.

| Value | Use case |
| --- | --- |
| 1 minute | High-frequency health checks, real-time processors |
| 5 minutes | Queue workers, frequent batch jobs |
| 15 minutes | Periodic sync tasks, data fetchers |
| 30 minutes | Less frequent jobs, report generators |
| 60 minutes | Hourly jobs, ETL pipelines, backups |

API range: 1 to 1440 minutes (1 minute to 24 hours). The dashboard provides preset options, but any value within this range can be set via the API.

Choosing the right interval: Set this to match your job's actual schedule. If your cron job runs every 15 minutes, set the expected interval to 15 minutes. If it runs once per hour, use 60 minutes.

Grace Period

The grace period is extra buffer time added after the expected interval before the monitor transitions to Overdue. It accounts for normal variation in job execution time.

Default: 5 minutes
Range: 0 to 60 minutes

How it works:

Deadline = Last Ping Time + Expected Interval + Grace Period

Example:
  Last ping:          14:00
  Expected interval:  60 minutes
  Grace period:       10 minutes
  Deadline:           15:10 (overdue if no ping by this time)
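The deadline formula can be checked with a few lines of Python. The function names below are illustrative only and are not part of any Pingward client.

```python
from datetime import datetime, timedelta


def deadline(last_ping: datetime, interval_minutes: int, grace_minutes: int) -> datetime:
    """Deadline = last ping time + expected interval + grace period."""
    return last_ping + timedelta(minutes=interval_minutes + grace_minutes)


def is_overdue(now: datetime, last_ping: datetime,
               interval_minutes: int, grace_minutes: int) -> bool:
    """True once the deadline has been reached without a new ping."""
    return now >= deadline(last_ping, interval_minutes, grace_minutes)


# The worked example above: last ping 14:00, 60-minute interval, 10-minute grace.
example = deadline(datetime(2024, 1, 15, 14, 0), 60, 10)
print(example.strftime("%H:%M"))  # 15:10
```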

When to increase the grace period:

  • Jobs with variable execution times (e.g., backup duration depends on data volume)
  • Jobs running on shared infrastructure where scheduling may be delayed
  • Jobs with dependencies that can cause cascading delays

When to decrease the grace period (or set to 0):

  • Critical jobs where even minor delays need immediate attention
  • High-frequency jobs where you want tight monitoring
  • Jobs with very predictable execution times

Tags

Tags are optional comma-separated labels for organizing your heartbeat monitors. Tags enable:

  • Grouping: View related monitors together (e.g., all "production" monitors)
  • Routing rules: Route alerts from tagged monitors to specific channels or teams
  • Maintenance windows: Apply maintenance windows to all monitors with a given tag

Example tags:

production, backup, critical
staging, etl, data-team
payment-service, queue-worker

Tags are stored as a comma-separated string. When configuring routing rules or maintenance windows, you can target monitors by their tags.
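Because the API returns tags as one string, client code usually needs to split it before filtering. The sketch below is one way to do that. Trimming whitespace and matching case-sensitively are assumptions made here, not documented Pingward behavior.

```python
from typing import List


def parse_tags(tags: str) -> List[str]:
    """Split a comma-separated tag string into a list of trimmed, non-empty tags."""
    return [tag.strip() for tag in tags.split(",") if tag.strip()]


def has_tag(tags: str, wanted: str) -> bool:
    """Check whether a monitor's tag string contains the given tag."""
    return wanted in parse_tags(tags)


print(parse_tags("production, backup, critical"))
# ['production', 'backup', 'critical']
```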

Sending Heartbeats

After creating a monitor, you receive a unique Ping URL. Send an HTTP request (GET or POST) to this URL each time your job completes successfully.

Ping URL format:

https://your-pingward-instance/ping/{pingKey}

The ping endpoint requires no authentication — the cryptographically generated pingKey in the URL serves as the identifier. This makes integration simple and avoids the need to manage API tokens in cron jobs.

cURL

The simplest integration. Add to the end of your script or cron job:

bash
# POST request (recommended)
curl -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678

# GET request (also works)
curl https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678

Cron Job

Append the ping to your cron command:

bash
# Run backup, then ping on success
0 2 * * * /usr/local/bin/backup.sh && curl -fsS -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678 > /dev/null

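The && operator ensures the ping is sent only if the backup script exits with status 0 (success). In -fsS, -f makes curl exit with a non-zero status on HTTP error responses (4xx/5xx), -s hides the progress meter, and -S still prints an error message if the request fails. Redirecting stdout to /dev/null discards the response body.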

Python

python
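import requests

PING_URL = "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678"

def run_etl_pipeline():
    # ... your ETL logic here ...
    pass

if __name__ == "__main__":
    # If the pipeline raises, the script stops here and no ping is sent
    run_etl_pipeline()

    # Report success to Pingward; the timeout keeps a network stall from hanging the job
    response = requests.post(PING_URL, timeout=10)
    response.raise_for_status()  # surface HTTP errors such as 404 for a wrong ping key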

Node.js

javascript
async function processQueue() {
  // ... your queue processing logic ...
}

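// Top-level await requires an ES module; fetch is built into Node.js 18+
await processQueue();

// Report success to Pingward
const response = await fetch("https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678", {
  method: "POST",
});
if (!response.ok) {
  throw new Error(`Ping failed with HTTP ${response.status}`);
}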

With a Payload

You can optionally include a JSON payload with the ping to record additional context (e.g., job metrics, processed count):

bash
curl -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678 \
  -H "Content-Type: application/json" \
  -d '{"payload": "Processed 1,523 records in 45s"}'

The payload is stored with the ping record and visible in the Ping History table on the monitor's detail page.
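A Python job could attach the same kind of summary. The helpers below are hypothetical names; the only thing taken from this page is the `{"payload": "..."}` body shape shown in the curl example.

```python
import json
import urllib.request


def summarize(records: int, seconds: float) -> str:
    """Format a short human-readable run summary."""
    return f"Processed {records:,} records in {seconds:.0f}s"


def build_ping_request(ping_url: str, summary: str) -> urllib.request.Request:
    """Build (but do not send) a POST ping carrying a JSON payload."""
    body = json.dumps({"payload": summary}).encode("utf-8")
    return urllib.request.Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(request, timeout=10)` at the end of the job.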

Ping Response

A successful ping returns:

json
{
  "status": "ok",
  "receivedAt": "2024-01-15T14:00:00Z",
  "monitorName": "Nightly Database Backup"
}

If the monitor is paused, the status is "paused" instead of "ok". Pings are still recorded when the monitor is paused.
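A job that wants to log when its pings are being recorded but not monitored could inspect the `status` field. This is a minimal sketch: the function name is invented, and raising an error on any other status value is a choice made here, since this page documents only "ok" and "paused".

```python
import json


def ping_counts_as_monitored(response_body: str) -> bool:
    """Return True for an active monitor, False if the monitor is paused."""
    status = json.loads(response_body).get("status")
    if status == "ok":
        return True
    if status == "paused":
        return False
    raise ValueError(f"Unexpected ping status: {status!r}")
```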

Pausing and Resuming

You can pause a heartbeat monitor to temporarily stop monitoring without deleting it. While paused:

  • Pings are still accepted and recorded
  • The monitor does not transition to Overdue or Missing
  • No alerts are triggered
  • The dashboard shows a "Paused" badge

When to pause:

  • During planned maintenance of the monitored service
  • When temporarily disabling a scheduled job
  • While debugging a job's ping integration

Resuming a monitor recalculates the next expected deadline from the current time (not from the last ping), preventing an immediate Overdue state after resuming.
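The effect of anchoring on the resume time can be seen by computing both candidate deadlines. This sketch assumes the post-resume deadline is the resume time plus interval plus grace period; this page states only that the calculation starts from the current time. The dates are made up for illustration.

```python
from datetime import datetime, timedelta


def deadline_from(anchor: datetime, interval_minutes: int, grace_minutes: int) -> datetime:
    """Deadline = anchor + expected interval + grace period."""
    return anchor + timedelta(minutes=interval_minutes + grace_minutes)


last_ping = datetime(2024, 1, 15, 14, 0)
resumed_at = datetime(2024, 1, 17, 9, 30)  # resumed after roughly two days paused

# Anchoring on the last ping puts the deadline far in the past,
# so the monitor would go Overdue the moment it resumed.
stale_deadline = deadline_from(last_ping, 60, 10)

# Anchoring on the resume time gives the job a full interval to check in.
fresh_deadline = deadline_from(resumed_at, 60, 10)
```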

Via Dashboard

On the heartbeat monitor list or detail page, click Pause or Resume.

Via API

http
# Pause
POST /api/heartbeat-monitors/{id}/pause
Authorization: Bearer <your-jwt-token>

# Resume
POST /api/heartbeat-monitors/{id}/resume
Authorization: Bearer <your-jwt-token>

Alerts

Heartbeat monitors integrate with Pingward's issue and alert system. When a monitor misses its expected ping, the system creates an issue and routes it through your configured alert integrations.

When Alerts Trigger

| Transition | Severity | Description |
| --- | --- | --- |
| Healthy to Overdue | Warning | Job is late; may still arrive |
| Overdue to Missing | Critical | Job has failed; requires attention |
| Any to Healthy | Recovery | Job recovered; informational |

Issue Integration

Missed heartbeats create issues in the same system as HTTP test failures. This means:

  • Issues appear on the Issues page alongside test failures
  • Routing rules apply — route heartbeat alerts to specific Slack channels, email groups, or PagerDuty services
  • Maintenance windows apply — suppress heartbeat alerts during planned downtime using tag-based scoping
  • Escalation policies apply — escalate unresolved heartbeat issues through your on-call rotation

Event Log

Every state transition is recorded in the monitor's event log, accessible from the monitor detail page. Event types include:

| Event Type | Description |
| --- | --- |
| Created | Monitor was created |
| StatusChanged | Status transitioned (e.g., Waiting to Healthy, Healthy to Overdue) |
| Paused | Monitoring was paused |
| Resumed | Monitoring was resumed |

Each event records the previous status, new status, timestamp, and a human-readable description.

Best Practices

Naming Conventions

Use descriptive names that identify both the job and its environment:

Good examples:

  • "Nightly Database Backup"
  • "Order Processing Queue Worker"
  • "ETL Pipeline - Customer Data Sync"
  • "CI/CD Deploy Pipeline (Production)"

Bad examples:

  • "Monitor 1"
  • "Cron"
  • "Heartbeat"
  • "Test"

Grace Period Sizing

Choose a grace period that accounts for normal variation without masking real failures:

| Job type | Recommended grace period |
| --- | --- |
| 1-minute health checks | 1-2 minutes |
| 5-minute queue workers | 2-5 minutes |
| 15-minute sync tasks | 5 minutes |
| Hourly batch jobs | 10-15 minutes |
| Jobs with highly variable duration | 15-30 minutes |

Rule of thumb: Set the grace period to 10-25% of the expected interval, with a minimum of 1-2 minutes.
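The rule of thumb can be turned into a small helper for scripts that create monitors in bulk. The function name, the 15% default, and the 2-minute floor are choices made for this sketch. The 60-minute cap follows the grace period range given in the Configuration section.

```python
def suggested_grace_minutes(interval_minutes: int, percent: int = 15,
                            floor_minutes: int = 2, cap_minutes: int = 60) -> int:
    """Suggest a grace period as a percentage of the interval, within bounds."""
    if not 10 <= percent <= 25:
        raise ValueError("percent should stay within the 10-25 rule of thumb")
    # Integer ceiling division avoids floating-point rounding surprises.
    raw = (interval_minutes * percent + 99) // 100
    return max(floor_minutes, min(raw, cap_minutes))


print(suggested_grace_minutes(60))    # 9
print(suggested_grace_minutes(5))     # 2
print(suggested_grace_minutes(1440))  # 60
```

Treat the result as a starting point and widen it for jobs whose duration varies a lot.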

Only Ping on Success

Send the heartbeat ping only when the job completes successfully. If your job fails, the absence of a ping is the signal:

bash
# Correct: ping only on success
/usr/local/bin/backup.sh && curl -X POST https://your-pingward-instance/ping/...

# Wrong: ping regardless of outcome
/usr/local/bin/backup.sh; curl -X POST https://your-pingward-instance/ping/...

Using && ensures the ping is sent only if the preceding command exits with code 0. Using ; would send the ping even if the job failed, defeating the purpose of heartbeat monitoring.
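The same pattern can be written in Python for jobs launched from a wrapper script. This is a sketch with hypothetical names (`send_ping`, `run_then_ping`). The `ping` argument is injectable so that the success-only logic can be exercised without network access.

```python
import subprocess
import sys
import urllib.request
from typing import Callable, List


def send_ping(ping_url: str) -> None:
    """POST to the ping URL; network and HTTP errors propagate to the caller."""
    request = urllib.request.Request(ping_url, method="POST")
    with urllib.request.urlopen(request, timeout=10):
        pass


def run_then_ping(command: List[str], ping_url: str,
                  ping: Callable[[str], None] = send_ping) -> int:
    """Run the command and call ping(ping_url) only if it exits with status 0."""
    result = subprocess.run(command)
    if result.returncode == 0:
        ping(ping_url)
    return result.returncode
```

A wrapper would typically end with `sys.exit(run_then_ping([...], ping_url))` so that the job's own exit status is preserved for cron or the scheduler.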

Use Tags to Organize

Tag your monitors consistently to enable effective routing and maintenance:

# By environment
production, staging, development

# By team
backend-team, data-team, platform-team

# By criticality
critical, high, low

# By service
payment-service, user-service, analytics

Monitor Critical Jobs First

Prioritize heartbeat monitors for jobs where failure has the highest impact:

  1. Data backups — undetected backup failures are catastrophic
  2. Financial processing — payment batch jobs, reconciliation
  3. Compliance jobs — regulatory reporting, audit log exports
  4. Queue workers — stuck queues cause cascading failures
  5. Data pipelines — downstream systems depend on fresh data

Troubleshooting

Monitor Stuck in "Waiting"

Symptoms: Monitor was created but never transitions to Healthy.

Possible causes:

  1. Job has not run yet — ping has never been sent
  2. Ping URL is incorrect — verify the URL in the monitor detail page
  3. Network issue — job cannot reach the Pingward API
  4. Firewall blocking — outbound HTTP from the job host is blocked

Resolution:

  1. Copy the ping URL from the monitor detail page
  2. Test manually: curl -v -X POST <ping-url>
  3. Verify the job is configured to call the correct URL
  4. Check that the job host can reach Pingward (DNS, firewall, proxy)

False Overdue Alerts

Symptoms: Monitor goes Overdue even though the job is running.

Possible causes:

  1. Expected interval is too short for the job's actual schedule
  2. Grace period is too small for the job's execution time variance
  3. Job is pinging before completion (ping sent at start instead of end)
  4. Clock skew between job host and Pingward

Resolution:

  1. Review the job's actual execution schedule and adjust the interval
  2. Increase the grace period to account for normal variance
  3. Move the ping call to the very end of the job, after all work is done
  4. Verify the job host's system clock is synchronized (NTP)

Ping Returns 404

Symptoms: curl or HTTP call to the ping URL returns 404 Not Found.

Possible causes:

  1. Ping key is incorrect or truncated
  2. Monitor was deleted
  3. Wrong base URL

Resolution:

  1. Copy the ping URL directly from the dashboard (use the "Copy" button)
  2. Verify the monitor exists in the Heartbeat Monitors list
  3. Check that the base URL matches your Pingward instance

Monitor Not Alerting

Symptoms: Monitor shows Overdue or Missing but no alerts are received.

Possible causes:

  1. No alert integrations configured
  2. Routing rules exclude heartbeat monitors
  3. Monitor is paused
  4. Active maintenance window suppressing alerts

Resolution:

  1. Check Integrations page for configured alert channels
  2. Review Routing Rules to ensure they include heartbeat monitor tags
  3. Verify the monitor is not paused (check the status badge)
  4. Check Maintenance Windows for active windows affecting this monitor's tags

API Reference

Create Heartbeat Monitor

http
POST /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "string (required, 1-255 characters)",
  "expectedIntervalMinutes": "integer (1-1440, default: 5)",
  "gracePeriodMinutes": "integer (0-60, default: 5)",
  "tags": "string (optional, comma-separated)"
}

List Heartbeat Monitors

http
GET /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>

Query parameters:
  ?status=Waiting | Healthy | Overdue | Missing
  ?search=<name search>
  ?limit=50&offset=0

When limit or offset is provided, the response is paginated:

json
{
  "items": [...],
  "total": 42,
  "limit": 50,
  "offset": 0
}

Without pagination parameters, the response is a flat array of monitors.
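Since the response shape depends on whether pagination parameters were sent, client code may want to normalize it. The helper below is a hypothetical sketch that handles only the two shapes described above.

```python
from typing import Any, Dict, List, Union

Monitor = Dict[str, Any]


def extract_monitors(body: Union[List[Monitor], Dict[str, Any]]) -> List[Monitor]:
    """Return the list of monitors from either response shape."""
    if isinstance(body, list):
        # Unpaginated: the body is already a flat array of monitors.
        return body
    # Paginated: monitors are under "items", next to total/limit/offset.
    return body["items"]
```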

Get Heartbeat Monitor

http
GET /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>

Update Heartbeat Monitor

http
PUT /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Updated name",
  "expectedIntervalMinutes": 30,
  "gracePeriodMinutes": 10,
  "tags": "updated, tags"
}

All fields are optional — only provided fields are updated. If the expected interval changes and the monitor has been pinged, the next expected deadline is recalculated.
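Because only the provided fields are updated, a client can build the request body from just the values that changed. The sketch below uses an invented helper name and filters on `None` rather than truthiness, so that a grace period of 0 is still sent.

```python
from typing import Any, Dict, Optional


def build_update_body(name: Optional[str] = None,
                      expected_interval_minutes: Optional[int] = None,
                      grace_period_minutes: Optional[int] = None,
                      tags: Optional[str] = None) -> Dict[str, Any]:
    """Return a PUT body containing only the fields that were supplied."""
    candidates = {
        "name": name,
        "expectedIntervalMinutes": expected_interval_minutes,
        "gracePeriodMinutes": grace_period_minutes,
        "tags": tags,
    }
    return {key: value for key, value in candidates.items() if value is not None}


print(build_update_body(grace_period_minutes=0))
# {'gracePeriodMinutes': 0}
```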

Delete Heartbeat Monitor

http
DELETE /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>

Returns 204 No Content on success.

Pause / Resume

http
POST /api/heartbeat-monitors/{id}/pause
Authorization: Bearer <your-jwt-token>

POST /api/heartbeat-monitors/{id}/resume
Authorization: Bearer <your-jwt-token>

Get Ping History

http
GET /api/heartbeat-monitors/{id}/pings
Authorization: Bearer <your-jwt-token>

Query parameters:
  ?limit=50&offset=0

Get Event Log

http
GET /api/heartbeat-monitors/{id}/events
Authorization: Bearer <your-jwt-token>

Query parameters:
  ?limit=50&offset=0

Send a Ping (Public)

http
POST /ping/{pingKey}
GET /ping/{pingKey}

Optional body (POST only):
{
  "payload": "string (optional context)"
}

No authentication required. Rate limited under the "public" rate limit policy.
