Heartbeat Monitors

Heartbeat Monitors provide inbound monitoring for scheduled jobs, cron tasks, background workers, and any process that runs on a schedule. Unlike HTTP tests that actively probe your endpoints, heartbeat monitors passively wait for your services to check in. If a service misses its expected check-in, Pingward alerts you.

Overview

Pingward supports two complementary monitoring approaches:

Approach	Direction	How it works	Best for
HTTP Tests	Outbound	Pingward sends requests to your endpoints	APIs, websites, health checks
Heartbeat Monitors	Inbound	Your services send pings to Pingward	Cron jobs, workers, pipelines

With HTTP tests, Pingward is the initiator — it actively checks whether your service responds. With heartbeat monitors, the relationship is reversed: your service is the initiator, and Pingward watches for missed check-ins.

This inbound approach is essential for monitoring processes that are not directly accessible via HTTP, or that run on a schedule and need to report completion.

Use Cases

Cron Jobs and Scheduled Tasks

Scenario: Nightly database backup runs at 2:00 AM
Expected interval: 24 hours
Problem solved: Know immediately if the backup didn't run

Without heartbeat monitoring, a failed cron job may go unnoticed for days or weeks — until someone needs a backup that doesn't exist.

CI/CD Pipelines

Scenario: Deployment pipeline runs on every merge to main
Expected interval: 60 minutes (during business hours)
Problem solved: Detect stuck or failing pipelines

Background Workers and Queue Processors

Scenario: Order processing worker runs every 5 minutes
Expected interval: 5 minutes
Problem solved: Detect worker crashes or queue stalls

Data Pipelines and ETL Jobs

Scenario: ETL pipeline syncs data from warehouse every hour
Expected interval: 60 minutes
Problem solved: Catch pipeline failures before downstream systems are affected

Health Check Reporters

Scenario: Internal service reports health every minute
Expected interval: 1 minute
Problem solved: Monitor services behind firewalls that can't be reached externally

How It Works

Heartbeat monitoring follows a four-step process:

1. Create Monitor    →  Configure name, interval, and grace period
2. Get Ping URL      →  Receive a unique URL like /ping/abc123def456
3. Integrate         →  Add the ping URL to your job (curl, HTTP call, etc.)
4. Monitor           →  Pingward watches for missed heartbeats and alerts you

Step 1: You create a heartbeat monitor in the Pingward dashboard or via the API, specifying how often your job should check in.

Step 2: Pingward generates a unique ping URL containing a cryptographically random key. This URL requires no authentication — anyone with the URL can send a ping.

Step 3: You add an HTTP request to your job that hits the ping URL on each successful run. This can be as simple as curl -X POST https://your-pingward-instance/ping/abc123.

Step 4: Pingward tracks when each ping arrives. If no ping arrives by the deadline (last ping time plus the expected interval plus the grace period), the monitor transitions to Overdue and, if it stays silent, to Missing. Each transition triggers alerts through your configured routing rules.

Monitor States

Heartbeat monitors move through four states based on ping activity:

Waiting

Initial state after creation. No pings have been received yet.

The monitor is created but has never received a heartbeat. This is the starting state for all new monitors. No alerts are triggered while in this state — Pingward is waiting for the first ping to establish a baseline.

Healthy

Last ping received within the expected interval. Everything is working normally.

The most recent ping arrived on time. The monitor transitions to Healthy on every successful ping, regardless of the previous state. If the monitor was previously Overdue or Missing, the event log records a recovery.

Overdue

Expected ping time has passed, including the grace period. The job may be delayed.

The monitor expected a ping by a certain time (last ping + interval + grace period), and that deadline has passed. This is a warning state — the job is late but may still arrive. An alert is triggered when transitioning to Overdue.

Missing

Significantly past the expected time. The job has likely failed.

The monitor has been overdue for an extended period. This is a critical state indicating the monitored process has probably failed. A higher-severity alert is triggered.

State transitions:

                    ┌─── ping received ───┐
                    v                     │
  [Waiting] ──ping──> [Healthy] ──timeout──> [Overdue] ──timeout──> [Missing]
                    ^                                                   │
                    └──────────── ping received ────────────────────────┘

Any state transitions back to Healthy when a ping is received. The monitor starts in Waiting and moves to Healthy on the first ping.
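The transitions above can be sketched as a small function. This is an illustrative model only, not Pingward's implementation: `monitor_state` and the `missing_after` threshold are hypothetical, since the text says only that Missing follows "an extended period" in Overdue. Pausing is not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional


def monitor_state(
    last_ping: Optional[datetime],
    now: datetime,
    interval: timedelta,
    grace: timedelta,
    missing_after: timedelta,  # hypothetical: time spent Overdue before Missing
) -> str:
    """Return the state a monitor would be in at `now` under this simplified model."""
    if last_ping is None:
        return "Waiting"  # no ping has ever been received
    deadline = last_ping + interval + grace
    if now <= deadline:
        return "Healthy"
    if now <= deadline + missing_after:
        return "Overdue"
    return "Missing"


last = datetime(2024, 1, 15, 14, 0)
hour, ten_min = timedelta(minutes=60), timedelta(minutes=10)
for minutes in (30, 75, 200):
    at = last + timedelta(minutes=minutes)
    print(minutes, monitor_state(last, at, hour, ten_min, missing_after=hour))
```

Because the state is derived from the last ping time alone, a new ping resets the deadline and returns the monitor to Healthy from any state, matching the diagram.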

Creating a Heartbeat Monitor

Via Dashboard

Navigate to Heartbeat Monitors in the sidebar
Click + New Monitor
Fill in the form:

Name (required):

Nightly Database Backup

Use a descriptive name that identifies the job or process being monitored.

Expected Interval (required):

Every 60 minutes

Select how often your job should send a ping. Available options:

Every 1 minute
Every 5 minutes
Every 15 minutes
Every 30 minutes
Every 60 minutes

Grace Period (minutes):

Extra buffer time after the expected interval before the monitor is marked as overdue. Default is 5 minutes. Minimum is 0, maximum is 60 minutes.

Tags (optional):

production, backup, critical

Comma-separated tags for organizing monitors.

Click Create Monitor
You are redirected to the monitor detail page, which displays the Ping URL

Via API

http

POST /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Nightly Database Backup",
  "expectedIntervalMinutes": 60,
  "gracePeriodMinutes": 10,
  "tags": "production, backup"
}

Response (201 Created):

json

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Nightly Database Backup",
  "pingKey": "R4nD0mUrl-S4f3_K3y12345678",
  "pingUrl": "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678",
  "status": "Waiting",
  "expectedIntervalMinutes": 60,
  "gracePeriodMinutes": 10,
  "lastPingAt": null,
  "nextExpectedAt": null,
  "isPaused": false,
  "tags": "production, backup",
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z"
}
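The same request could be built from Python. The sketch below uses only the standard library and constructs the request without sending it; `build_create_request` is a hypothetical helper name, and the base URL and token are the placeholders used above.

```python
import json
import urllib.request


def build_create_request(base_url: str, token: str, name: str,
                         interval_minutes: int, grace_minutes: int = 5,
                         tags: str = "") -> urllib.request.Request:
    """Build (but do not send) the POST /api/heartbeat-monitors request."""
    body = {
        "name": name,
        "expectedIntervalMinutes": interval_minutes,
        "gracePeriodMinutes": grace_minutes,
    }
    if tags:
        body["tags"] = tags
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/heartbeat-monitors",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_request(
    "https://your-pingward-instance", "<your-jwt-token>",
    "Nightly Database Backup", interval_minutes=60, grace_minutes=10,
    tags="production, backup",
)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen(req, timeout=10)` and decode the JSON body; the `pingUrl` field of the response is what your job should call.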

Configuration

Expected Interval

The expected interval defines how often your job should send a heartbeat ping. This is specified in minutes.

Value	Use case
1 minute	High-frequency health checks, real-time processors
5 minutes	Queue workers, frequent batch jobs
15 minutes	Periodic sync tasks, data fetchers
30 minutes	Less frequent jobs, report generators
60 minutes	Hourly jobs, ETL pipelines, backups

API range: 1 to 1440 minutes (1 minute to 24 hours). The dashboard provides preset options, but any value within this range can be set via the API.

Choosing the right interval: Set this to match your job's actual schedule. If your cron job runs every 15 minutes, set the expected interval to 15 minutes. If it runs once per hour, use 60 minutes.

Grace Period

The grace period is extra buffer time added after the expected interval before the monitor transitions to Overdue. It accounts for normal variation in job execution time.

Default: 5 minutes
Range: 0 to 60 minutes

How it works:

Deadline = Last Ping Time + Expected Interval + Grace Period

Example:
  Last ping:          14:00
  Expected interval:  60 minutes
  Grace period:       10 minutes
  Deadline:           15:10 (overdue if no ping by this time)
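The same arithmetic in Python, as a minimal sketch (the helper name `deadline` is illustrative, not part of any Pingward client):

```python
from datetime import datetime, timedelta


def deadline(last_ping: datetime, interval_minutes: int, grace_minutes: int) -> datetime:
    """Deadline = last ping time + expected interval + grace period."""
    return last_ping + timedelta(minutes=interval_minutes + grace_minutes)


last_ping = datetime(2024, 1, 15, 14, 0)
print(deadline(last_ping, 60, 10).strftime("%H:%M"))  # 15:10
```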

When to increase the grace period:

Jobs with variable execution times (e.g., backup duration depends on data volume)
Jobs running on shared infrastructure where scheduling may be delayed
Jobs with dependencies that can cause cascading delays

When to decrease the grace period (or set to 0):

Critical jobs where even minor delays need immediate attention
High-frequency jobs where you want tight monitoring
Jobs with very predictable execution times

Sending Heartbeats

After creating a monitor, you receive a unique Ping URL. Send an HTTP request (GET or POST) to this URL each time your job completes successfully.

Ping URL format:

https://your-pingward-instance/ping/{pingKey}

The ping endpoint requires no authentication — the cryptographically generated pingKey in the URL serves as the identifier. This makes integration simple and avoids the need to manage API tokens in cron jobs.

cURL

The simplest integration. Add to the end of your script or cron job:

bash

# POST request (recommended)
curl -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678

# GET request (also works)
curl https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678

Cron Job

Append the ping to your cron command:

bash

# Run backup, then ping on success
0 2 * * * /usr/local/bin/backup.sh && curl -fsS -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678 > /dev/null

The && operator ensures the ping is only sent if the backup script exits with status 0 (success). In the -fsS flags, -f makes curl exit with a non-zero status on HTTP error responses, -s suppresses the progress meter, and -S still prints an error message if the request fails.

Python

python

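import requests

PING_URL = "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678"

def run_etl_pipeline():
    # ... your ETL logic here ...
    pass

if __name__ == "__main__":
    # If this raises, the script exits and the ping below is never sent
    run_etl_pipeline()

    # Report success to Pingward; the timeout keeps a slow network from hanging the job
    requests.post(PING_URL, timeout=10)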

Node.js

javascript

async function processQueue() {
  // ... your queue processing logic ...
}

await processQueue();

// Report success to Pingward
await fetch("https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678", {
  method: "POST",
});

With a Payload

You can optionally include a JSON payload with the ping to record additional context (e.g., job metrics, processed count):

bash

curl -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678 \
  -H "Content-Type: application/json" \
  -d '{"payload": "Processed 1,523 records in 45s"}'

The payload is stored with the ping record and visible in the Ping History table on the monitor's detail page.
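A Python equivalent of the payload ping might look like the sketch below. It uses only the standard library; `build_ping` and `send_ping` are hypothetical helper names, and the body shape follows the `{"payload": ...}` form shown in the curl example.

```python
import json
import urllib.request
from typing import Optional


def build_ping(ping_url: str, payload: Optional[str] = None) -> urllib.request.Request:
    """Build a POST ping; attach the optional context string as {"payload": ...}."""
    if payload is None:
        return urllib.request.Request(ping_url, data=b"", method="POST")
    body = json.dumps({"payload": payload}).encode("utf-8")
    return urllib.request.Request(
        ping_url, data=body,
        headers={"Content-Type": "application/json"}, method="POST",
    )


def send_ping(ping_url: str, payload: Optional[str] = None, timeout: float = 10.0) -> dict:
    """Send the ping and return the decoded JSON response."""
    with urllib.request.urlopen(build_ping(ping_url, payload), timeout=timeout) as resp:
        return json.load(resp)


records, seconds = 1523, 45
req = build_ping(
    "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678",
    f"Processed {records:,} records in {seconds}s",
)
print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without network access.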

Ping Response

A successful ping returns:

json

{
  "status": "ok",
  "receivedAt": "2024-01-15T14:00:00Z",
  "monitorName": "Nightly Database Backup"
}

If the monitor is paused, the status is "paused" instead of "ok". Pings are still recorded when the monitor is paused.
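A job that wants to notice a paused monitor could inspect the `status` field. The following is a small sketch; `monitor_is_active` is a hypothetical helper, and treating any other status value as an error is an assumption rather than documented behavior.

```python
import json


def monitor_is_active(response_body: str) -> bool:
    """True if the monitor is actively tracking pings ("ok"), False if it is paused."""
    status = json.loads(response_body).get("status")
    if status == "ok":
        return True
    if status == "paused":
        return False  # the ping was still recorded, but no deadlines are enforced
    raise ValueError(f"unexpected ping status: {status!r}")


body = '{"status": "ok", "receivedAt": "2024-01-15T14:00:00Z", "monitorName": "Nightly Database Backup"}'
print(monitor_is_active(body))  # True
```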

Pausing and Resuming

You can pause a heartbeat monitor to temporarily stop monitoring without deleting it. While paused:

Pings are still accepted and recorded
The monitor does not transition to Overdue or Missing
No alerts are triggered
The dashboard shows a "Paused" badge

When to pause:

During planned maintenance of the monitored service
When temporarily disabling a scheduled job
While debugging a job's ping integration

Resuming a monitor recalculates the next expected deadline from the current time (not from the last ping), preventing an immediate Overdue state after resuming.

Via Dashboard

On the heartbeat monitor list or detail page, click Pause or Resume.

Via API

http

# Pause
POST /api/heartbeat-monitors/{id}/pause
Authorization: Bearer <your-jwt-token>

# Resume
POST /api/heartbeat-monitors/{id}/resume
Authorization: Bearer <your-jwt-token>

Alerts

Heartbeat monitors integrate with Pingward's issue and alert system. When a monitor misses its expected ping, the system creates an issue and routes it through your configured alert integrations.

When Alerts Trigger

Transition	Severity	Description
Healthy to Overdue	Warning	Job is late — may still arrive
Overdue to Missing	Critical	Job has failed — requires attention
Overdue or Missing to Healthy	Recovery	Job recovered (informational)
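The table can be read as a lookup from (previous state, new state) to severity. In the sketch below, `ALERT_SEVERITY` and `severity_for` are illustrative names. Treating Waiting to Healthy as producing no alert is an assumption, based on the Healthy state description, which records a recovery only after Overdue or Missing.

```python
from typing import Optional

# (previous state, new state) -> alert severity
ALERT_SEVERITY = {
    ("Healthy", "Overdue"): "Warning",
    ("Overdue", "Missing"): "Critical",
    ("Overdue", "Healthy"): "Recovery",
    ("Missing", "Healthy"): "Recovery",
}


def severity_for(previous: str, new: str) -> Optional[str]:
    """Return the severity for a transition, or None if it raises no alert."""
    return ALERT_SEVERITY.get((previous, new))


print(severity_for("Healthy", "Overdue"))  # Warning
print(severity_for("Waiting", "Healthy"))  # None
```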

Issue Integration

Missed heartbeats create issues in the same system as HTTP test failures. This means:

Issues appear on the Issues page alongside test failures
Routing rules apply — route heartbeat alerts to specific Slack channels, email groups, or PagerDuty services
Maintenance windows apply — suppress heartbeat alerts during planned downtime using tag-based scoping
Escalation policies apply — escalate unresolved heartbeat issues through your on-call rotation

Event Log

Every state transition is recorded in the monitor's event log, accessible from the monitor detail page. Event types include:

Event Type	Description
Created	Monitor was created
StatusChanged	Status transitioned (e.g., Waiting to Healthy, Healthy to Overdue)
Paused	Monitoring was paused
Resumed	Monitoring was resumed

Each event records the previous status, new status, timestamp, and a human-readable description.

Best Practices

Naming Conventions

Use descriptive names that identify the job and, where it matters, its environment:

Good examples:

"Nightly Database Backup"
"Order Processing Queue Worker"
"ETL Pipeline - Customer Data Sync"
"CI/CD Deploy Pipeline (Production)"

Bad examples:

"Monitor 1"
"Cron"
"Heartbeat"
"Test"

Grace Period Sizing

Choose a grace period that accounts for normal variation without masking real failures:

Job type	Recommended grace period
1-minute health checks	1-2 minutes
5-minute queue workers	2-5 minutes
15-minute sync tasks	5 minutes
Hourly batch jobs	10-15 minutes
Jobs with highly variable duration	15-30 minutes

Rule of thumb: Set the grace period to 10-25% of the expected interval, with a minimum of 1-2 minutes.
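The rule of thumb can be turned into a quick calculation. This is a sketch under stated assumptions: `suggested_grace_range` is a hypothetical helper, percentages are rounded up to whole minutes, and results are capped at the 60-minute maximum given under Grace Period.

```python
import math
from typing import Tuple

MAX_GRACE_MINUTES = 60  # upper bound of the 0-60 range given for gracePeriodMinutes


def suggested_grace_range(interval_minutes: int) -> Tuple[int, int]:
    """Return (low, high) in minutes: 10-25% of the interval, floored at 1-2 minutes."""
    low = max(1, math.ceil(interval_minutes * 10 / 100))
    high = max(2, math.ceil(interval_minutes * 25 / 100))
    return min(low, MAX_GRACE_MINUTES), min(high, MAX_GRACE_MINUTES)


for interval in (1, 5, 15, 60):
    print(interval, suggested_grace_range(interval))
```

The result is a starting point; jobs with highly variable duration may need more than the percentage rule suggests, as the table above shows.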

Only Ping on Success

Send the heartbeat ping only when the job completes successfully. If your job fails, the absence of a ping is the signal:

bash

# Correct: ping only on success
/usr/local/bin/backup.sh && curl -X POST https://your-pingward-instance/ping/...

# Wrong: ping regardless of outcome
/usr/local/bin/backup.sh; curl -X POST https://your-pingward-instance/ping/...

Using && ensures the ping is sent only if the preceding command exits with code 0. Using ; would send the ping even if the job failed, defeating the purpose of heartbeat monitoring.
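The same success-only rule can be applied when a job is launched from Python instead of a shell. In this sketch, `run_then_ping` is a hypothetical wrapper and the `ping` callable stands in for the HTTP request to your ping URL; here it only appends to a list so the behavior is visible without network access.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def run_then_ping(command: Sequence[str], ping: Callable[[], None]) -> int:
    """Run `command`; call `ping` only if it exits with status 0."""
    exit_code = subprocess.run(command).returncode
    if exit_code == 0:
        ping()
    return exit_code


pings: List[str] = []
succeeding = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(3)"]
run_then_ping(succeeding, lambda: pings.append("sent"))
run_then_ping(failing, lambda: pings.append("sent"))
print(pings)  # ['sent']
```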

Use Tags to Organize

Tag your monitors consistently to enable effective routing and maintenance:

# By environment
production, staging, development

# By team
backend-team, data-team, platform-team

# By criticality
critical, high, low

# By service
payment-service, user-service, analytics

Monitor Critical Jobs First

Prioritize heartbeat monitors for jobs where failure has the highest impact:

Data backups — undetected backup failures are catastrophic
Financial processing — payment batch jobs, reconciliation
Compliance jobs — regulatory reporting, audit log exports
Queue workers — stuck queues cause cascading failures
Data pipelines — downstream systems depend on fresh data

Troubleshooting

Monitor Stuck in "Waiting"

Symptoms: Monitor was created but never transitions to Healthy.

Possible causes:

Job has not run yet — ping has never been sent
Ping URL is incorrect — verify the URL in the monitor detail page
Network issue — job cannot reach the Pingward API
Firewall blocking — outbound HTTP from the job host is blocked

Resolution:

Copy the ping URL from the monitor detail page
Test manually: curl -v -X POST <ping-url>
Verify the job is configured to call the correct URL
Check that the job host can reach Pingward (DNS, firewall, proxy)

False Overdue Alerts

Symptoms: Monitor goes Overdue even though the job is running.

Possible causes:

Expected interval is too short for the job's actual schedule
Grace period is too small for the job's execution time variance
Job is pinging before completion (ping sent at start instead of end)
Clock skew between job host and Pingward

Resolution:

Review the job's actual execution schedule and adjust the interval
Increase the grace period to account for normal variance
Move the ping call to the very end of the job, after all work is done
Verify the job host's system clock is synchronized (NTP)
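When reviewing a job's actual schedule, it can help to measure the gaps between recorded pings against the configured interval and grace period. The sketch below assumes you have already collected ping timestamps as `datetime` values (for example from the Ping History table); `late_gaps` is a hypothetical helper, not a Pingward API.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def late_gaps(ping_times: Sequence[datetime], interval_minutes: int,
              grace_minutes: int) -> List[timedelta]:
    """Return the gaps between consecutive pings that exceed interval + grace."""
    allowed = timedelta(minutes=interval_minutes + grace_minutes)
    ordered = sorted(ping_times)
    return [later - earlier
            for earlier, later in zip(ordered, ordered[1:])
            if later - earlier > allowed]


history = [
    datetime(2024, 1, 15, 14, 0),
    datetime(2024, 1, 15, 15, 2),
    datetime(2024, 1, 15, 16, 20),  # 78 minutes after the previous ping
    datetime(2024, 1, 15, 17, 21),
]
print([str(gap) for gap in late_gaps(history, 60, 10)])  # ['1:18:00']
```

If late gaps cluster just above the allowed window, a larger grace period is probably the right fix; if they are multiples of the interval, the job is likely skipping runs.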

Ping Returns 404

Symptoms: curl or HTTP call to the ping URL returns 404 Not Found.

Possible causes:

Ping key is incorrect or truncated
Monitor was deleted
Wrong base URL

Resolution:

Copy the ping URL directly from the dashboard (use the "Copy" button)
Verify the monitor exists in the Heartbeat Monitors list
Check that the base URL matches your Pingward instance

Monitor Not Alerting

Symptoms: Monitor shows Overdue or Missing but no alerts are received.

Possible causes:

No alert integrations configured
Routing rules exclude heartbeat monitors
Monitor is paused
Active maintenance window suppressing alerts

Resolution:

Check Integrations page for configured alert channels
Review Routing Rules to ensure they include heartbeat monitor tags
Verify the monitor is not paused (check the status badge)
Check Maintenance Windows for active windows affecting this monitor's tags

API Reference

Create Heartbeat Monitor

http

POST /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "string (required, 1-255 characters)",
  "expectedIntervalMinutes": "integer (1-1440, default: 5)",
  "gracePeriodMinutes": "integer (0-60, default: 5)",
  "tags": "string (optional, comma-separated)"
}

List Heartbeat Monitors

http

GET /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>

Query parameters:
  ?status=Waiting | Healthy | Overdue | Missing
  ?search=<name search>
  ?limit=50&offset=0

When limit or offset is provided, the response is paginated:

json

{
  "items": [...],
  "total": 42,
  "limit": 50,
  "offset": 0
}

Without pagination parameters, the response is a flat array of monitors.
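Client code that may call the endpoint with or without pagination parameters needs to handle both response shapes. A minimal sketch, where `monitors_from_response` is a hypothetical helper operating on the already-decoded JSON:

```python
from typing import Any, Dict, List, Union


def monitors_from_response(data: Union[List[Dict[str, Any]], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the monitor list from either the flat-array or the paginated response."""
    if isinstance(data, list):
        return data  # no limit/offset was sent: flat array
    return data["items"]  # paginated: {"items": [...], "total": ..., ...}


flat = [{"name": "Nightly Database Backup"}]
paged = {"items": flat, "total": 1, "limit": 50, "offset": 0}
print(monitors_from_response(flat) == monitors_from_response(paged))  # True
```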

Get Heartbeat Monitor

http

GET /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>

Update Heartbeat Monitor

http

PUT /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Updated name",
  "expectedIntervalMinutes": 30,
  "gracePeriodMinutes": 10,
  "tags": "updated, tags"
}

All fields are optional — only provided fields are updated. If the expected interval changes and the monitor has been pinged, the next expected deadline is recalculated.

Delete Heartbeat Monitor

http

DELETE /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>

Returns 204 No Content on success.

Pause / Resume

http

POST /api/heartbeat-monitors/{id}/pause
Authorization: Bearer <your-jwt-token>

POST /api/heartbeat-monitors/{id}/resume
Authorization: Bearer <your-jwt-token>

Get Ping History

http

GET /api/heartbeat-monitors/{id}/pings
Authorization: Bearer <your-jwt-token>

Query parameters:
  ?limit=50&offset=0

Get Event Log

http

GET /api/heartbeat-monitors/{id}/events
Authorization: Bearer <your-jwt-token>

Query parameters:
  ?limit=50&offset=0

Send a Ping (Public)

http

POST /ping/{pingKey}
GET /ping/{pingKey}

Optional body (POST only):
{
  "payload": "string (optional context)"
}

No authentication required. Rate limited under the "public" rate limit policy.

Related Documentation

Issue Management - How issues are created and resolved when heartbeats are missed
Alert Routing - Configure routing rules to direct heartbeat alerts to specific channels
Maintenance Windows - Suppress heartbeat alerts during planned downtime
Test Configuration - Set up HTTP tests and tags for organizing monitors
