Heartbeat Monitors
Heartbeat Monitors provide inbound monitoring for scheduled jobs, cron tasks, background workers, and any process that runs on a schedule. Unlike HTTP tests that actively probe your endpoints, heartbeat monitors passively wait for your services to check in. If a service misses its expected check-in, Pingward alerts you.
Overview
Pingward supports two complementary monitoring approaches:
| Approach | Direction | How it works | Best for |
|---|---|---|---|
| HTTP Tests | Outbound | Pingward sends requests to your endpoints | APIs, websites, health checks |
| Heartbeat Monitors | Inbound | Your services send pings to Pingward | Cron jobs, workers, pipelines |
With HTTP tests, Pingward is the initiator — it actively checks whether your service responds. With heartbeat monitors, the relationship is reversed: your service is the initiator, and Pingward watches for missed check-ins.
This inbound approach is essential for monitoring processes that are not directly accessible via HTTP, or that run on a schedule and need to report completion.
Use Cases
Cron Jobs and Scheduled Tasks
Scenario: Nightly database backup runs at 2:00 AM
Expected interval: 24 hours
Problem solved: Know immediately if the backup didn't run
Without heartbeat monitoring, a failed cron job may go unnoticed for days or weeks, until someone needs a backup that doesn't exist.
CI/CD Pipelines
Scenario: Deployment pipeline runs on every merge to main
Expected interval: 60 minutes (during business hours)
Problem solved: Detect stuck or failing pipelines
Background Workers and Queue Processors
Scenario: Order processing worker runs every 5 minutes
Expected interval: 5 minutes
Problem solved: Detect worker crashes or queue stalls
Data Pipelines and ETL Jobs
Scenario: ETL pipeline syncs data from the warehouse every hour
Expected interval: 60 minutes
Problem solved: Catch pipeline failures before downstream systems are affected
Health Check Reporters
Scenario: Internal service reports health every minute
Expected interval: 1 minute
Problem solved: Monitor services behind firewalls that can't be reached externally
How It Works
Heartbeat monitoring follows a four-step process:
1. Create Monitor → Configure name, interval, and grace period
2. Get Ping URL → Receive a unique URL like /ping/abc123def456
3. Integrate → Add the ping URL to your job (curl, HTTP call, etc.)
4. Monitor → Pingward watches for missed heartbeats and alerts you
Step 1: You create a heartbeat monitor in the Pingward dashboard or via the API, specifying how often your job should check in.
Step 2: Pingward generates a unique ping URL containing a cryptographically random key. This URL requires no authentication — anyone with the URL can send a ping.
Step 3: You add an HTTP request to your job that hits the ping URL on each successful run. This can be as simple as curl -X POST https://your-pingward-instance/ping/abc123.
Step 4: Pingward tracks when each ping arrives. If a ping is late (beyond the expected interval plus the grace period), the monitor transitions to Overdue and then Missing, triggering alerts through your configured routing rules.
Monitor States
Heartbeat monitors move through four states based on ping activity:
Waiting
Initial state after creation. No pings have been received yet.
The monitor is created but has never received a heartbeat. This is the starting state for all new monitors. No alerts are triggered while in this state: Pingward is waiting for the first ping to establish a baseline.
Healthy
Last ping received within the expected interval. Everything is working normally.
The most recent ping arrived on time. The monitor transitions to Healthy on every successful ping, regardless of the previous state. If the monitor was previously Overdue or Missing, the event log records a recovery.
Overdue
Expected ping time has passed, including the grace period. The job may be delayed.
The monitor expected a ping by a certain time (last ping + interval + grace period), and that deadline has passed. This is a warning state: the job is late but may still arrive. An alert is triggered when transitioning to Overdue.
Missing
Significantly past the expected time. The job has likely failed.
The monitor has been overdue for an extended period. This is a critical state indicating the monitored process has probably failed. A higher-severity alert is triggered.
State transitions:
```text
                        ┌─── ping received ────┐
                        v                      │
[Waiting] ──ping──> [Healthy] ──timeout──> [Overdue] ──timeout──> [Missing]
                        ^                                             │
                        └─────────────── ping received ───────────────┘
```
Any state transitions back to Healthy when a ping is received. The monitor starts in Waiting and moves to Healthy on the first ping.
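The transition rules above can be expressed as a small state machine. This is an illustrative sketch, not Pingward's implementation: the `State` enum, the `"ping"`/`"timeout"` event names, and the `next_state`/`is_recovery` helpers are hypothetical. It assumes that a timeout leaves Waiting unchanged (there is no baseline ping yet) and that Missing stays Missing until a ping arrives.

```python
from enum import Enum


class State(Enum):
    WAITING = "Waiting"
    HEALTHY = "Healthy"
    OVERDUE = "Overdue"
    MISSING = "Missing"


# What a missed deadline ("timeout") does in each state.
# Assumptions: Waiting has no baseline yet, so it stays put; Missing is terminal until a ping.
ON_TIMEOUT = {
    State.WAITING: State.WAITING,
    State.HEALTHY: State.OVERDUE,
    State.OVERDUE: State.MISSING,
    State.MISSING: State.MISSING,
}


def next_state(current: State, event: str) -> State:
    """Apply a "ping" or "timeout" event and return the resulting state."""
    if event == "ping":
        return State.HEALTHY  # a ping returns every state to Healthy
    if event == "timeout":
        return ON_TIMEOUT[current]
    raise ValueError(f"unknown event: {event!r}")


def is_recovery(previous: State, new: State) -> bool:
    """True when a ping brings the monitor back from Overdue or Missing."""
    return new is State.HEALTHY and previous in (State.OVERDUE, State.MISSING)


state = State.WAITING
for event in ["ping", "timeout", "timeout", "ping"]:
    previous, state = state, next_state(state, event)
    print(f"{previous.value} -> {state.value}")
```

The loop prints Waiting -> Healthy, Healthy -> Overdue, Overdue -> Missing, and Missing -> Healthy; only the last of these counts as a recovery.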
Creating a Heartbeat Monitor
Via Dashboard
- Navigate to Heartbeat Monitors in the sidebar
- Click + New Monitor
- Fill in the form:
Name (required):
Nightly Database Backup
Use a descriptive name that identifies the job or process being monitored.
Expected Interval (required):
Every 60 minutes
Select how often your job should send a ping. Available options:
- Every 1 minute
- Every 5 minutes
- Every 15 minutes
- Every 30 minutes
- Every 60 minutes
Grace Period (minutes):
5
Extra buffer time after the expected interval before the monitor is marked as overdue. Default is 5 minutes. Minimum is 0, maximum is 60.
Tags (optional):
production, backup, critical
Comma-separated tags for organizing monitors.
- Click Create Monitor
- You are redirected to the monitor detail page, which displays the Ping URL
Via API
```http
POST /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Nightly Database Backup",
  "expectedIntervalMinutes": 60,
  "gracePeriodMinutes": 10,
  "tags": "production, backup"
}
```
Response (201 Created):
```json
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Nightly Database Backup",
  "pingKey": "R4nD0mUrl-S4f3_K3y12345678",
  "pingUrl": "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678",
  "status": "Waiting",
  "expectedIntervalMinutes": 60,
  "gracePeriodMinutes": 10,
  "lastPingAt": null,
  "nextExpectedAt": null,
  "isPaused": false,
  "tags": "production, backup",
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z"
}
```
Configuration
Expected Interval
The expected interval defines how often your job should send a heartbeat ping. This is specified in minutes.
| Value | Use case |
|---|---|
| 1 minute | High-frequency health checks, real-time processors |
| 5 minutes | Queue workers, frequent batch jobs |
| 15 minutes | Periodic sync tasks, data fetchers |
| 30 minutes | Less frequent jobs, report generators |
| 60 minutes | Hourly jobs, ETL pipelines, backups |
API range: 1 to 1440 minutes (1 minute to 24 hours). The dashboard provides preset options, but any value within this range can be set via the API.
Choosing the right interval: Set this to match your job's actual schedule. If your cron job runs every 15 minutes, set the expected interval to 15 minutes. If it runs once per hour, use 60 minutes.
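A once-a-day job such as the nightly backup from the use cases needs an interval of 1440 minutes, which is beyond the dashboard presets but within the API range. The sketch below creates such a monitor from Python using only the standard library. The `PINGWARD_TOKEN` environment variable, the `BASE_URL` value, and the `build_create_request`/`create_monitor` helpers are assumptions for illustration; the request and response fields are the ones shown in the Via API example.

```python
import json
import os
import urllib.request

BASE_URL = "https://your-pingward-instance"  # assumption: replace with your instance


def build_create_request(name: str, interval_minutes: int, grace_minutes: int,
                         tags: str, token: str) -> urllib.request.Request:
    """Build the POST /api/heartbeat-monitors request without sending it."""
    body = {
        "name": name,
        "expectedIntervalMinutes": interval_minutes,
        "gracePeriodMinutes": grace_minutes,
        "tags": tags,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/heartbeat-monitors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def create_monitor(name: str, interval_minutes: int, grace_minutes: int, tags: str = "") -> str:
    """Create a monitor and return the pingUrl from the 201 Created response."""
    token = os.environ["PINGWARD_TOKEN"]  # hypothetical variable holding your JWT
    request = build_create_request(name, interval_minutes, grace_minutes, tags, token)
    # urlopen raises HTTPError for 4xx/5xx, so reaching json.load means success
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["pingUrl"]


# Example: a once-a-day job with a 30-minute grace period
# ping_url = create_monitor("Nightly Database Backup", 1440, 30, "production, backup")
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything touches the network.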
Grace Period
The grace period is extra buffer time added after the expected interval before the monitor transitions to Overdue. It accounts for normal variation in job execution time.
Default: 5 minutes
Range: 0 to 60 minutes
How it works:
```text
Deadline = Last Ping Time + Expected Interval + Grace Period
```
Example:
```text
Last ping:          14:00
Expected interval:  60 minutes
Grace period:       10 minutes
Deadline:           15:10 (overdue if no ping by this time)
```
When to increase the grace period:
- Jobs with variable execution times (e.g., backup duration depends on data volume)
- Jobs running on shared infrastructure where scheduling may be delayed
- Jobs with dependencies that can cause cascading delays
When to decrease the grace period (or set to 0):
- Critical jobs where even minor delays need immediate attention
- High-frequency jobs where you want tight monitoring
- Jobs with very predictable execution times
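The deadline formula and the 14:00 example above can be checked with a few lines of Python. `overdue_deadline` and `is_overdue` are illustrative helpers, not Pingward functions, and the sketch covers only the Healthy-to-Overdue deadline; how long a monitor stays Overdue before becoming Missing is not modeled here.

```python
from datetime import datetime, timedelta


def overdue_deadline(last_ping: datetime, interval_minutes: int, grace_minutes: int) -> datetime:
    """Deadline = Last Ping Time + Expected Interval + Grace Period."""
    return last_ping + timedelta(minutes=interval_minutes + grace_minutes)


def is_overdue(now: datetime, last_ping: datetime, interval_minutes: int, grace_minutes: int) -> bool:
    """True once the deadline has passed without a newer ping."""
    return now > overdue_deadline(last_ping, interval_minutes, grace_minutes)


last_ping = datetime(2024, 1, 15, 14, 0)
deadline = overdue_deadline(last_ping, interval_minutes=60, grace_minutes=10)
print(deadline.strftime("%H:%M"))  # 15:10
```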
Tags
Tags are optional comma-separated labels for organizing your heartbeat monitors. Tags enable:
- Grouping: View related monitors together (e.g., all "production" monitors)
- Routing rules: Route alerts from tagged monitors to specific channels or teams
- Maintenance windows: Apply maintenance windows to all monitors with a given tag
Example tags:
```text
production, backup, critical
staging, etl, data-team
payment-service, queue-worker
```
Tags are stored as a comma-separated string. When configuring routing rules or maintenance windows, you can target monitors by their tags.
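Because tags are a single comma-separated string, anything that consumes them has to split and trim it first. The sketch below shows one way to do that and to test whether a monitor matches a tag-scoped rule. `parse_tags` and `matches_any` are hypothetical helpers, and both the case-insensitive comparison and the "any tag matches" semantics are assumptions, not documented Pingward behavior.

```python
from typing import Iterable, List, Optional


def parse_tags(raw: Optional[str]) -> List[str]:
    """Split a comma-separated tag string into trimmed, lowercased, de-duplicated tags."""
    if not raw:
        return []
    tags: List[str] = []
    for part in raw.split(","):
        tag = part.strip().lower()  # assumption: tags compare case-insensitively
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def matches_any(monitor_tags: Optional[str], rule_tags: Iterable[str]) -> bool:
    """True if the monitor carries at least one of the rule's tags (assumed semantics)."""
    wanted = {t.strip().lower() for t in rule_tags}
    return any(tag in wanted for tag in parse_tags(monitor_tags))


print(parse_tags("production, backup, critical"))  # ['production', 'backup', 'critical']
print(matches_any("staging, etl, data-team", ["data-team"]))  # True
```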
Sending Heartbeats
After creating a monitor, you receive a unique Ping URL. Send an HTTP request (GET or POST) to this URL each time your job completes successfully.
Ping URL format:
```text
https://your-pingward-instance/ping/{pingKey}
```
The ping endpoint requires no authentication: the cryptographically generated pingKey in the URL serves as the identifier. This makes integration simple and avoids the need to manage API tokens in cron jobs.
cURL
The simplest integration. Add to the end of your script or cron job:
```bash
# POST request (recommended)
curl -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678

# GET request (also works)
curl https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678
```
Cron Job
Append the ping to your cron command:
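```bash
# Run backup, then ping on success
0 2 * * * /usr/local/bin/backup.sh && curl -fsS -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678 > /dev/null
```
The && operator ensures the ping is only sent if the backup script exits with status 0 (success). The -f flag makes curl exit with a non-zero status on HTTP errors instead of printing the error page, -s suppresses the progress meter, and -S still prints an error message if the request fails. Redirecting stdout to /dev/null discards the JSON response body while leaving curl's error messages on stderr.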
Python
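```python
import requests

PING_URL = "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678"


def run_etl_pipeline():
    # ... your ETL logic here ...
    pass


if __name__ == "__main__":
    # If the pipeline raises, the script exits before the ping below is sent
    run_etl_pipeline()
    # Report success to Pingward; the timeout keeps a network problem from hanging the job
    requests.post(PING_URL, timeout=10)
```
Node.js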
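```javascript
// Top-level await requires an ES module (.mjs or "type": "module");
// the built-in fetch API requires Node.js 18 or later.
async function processQueue() {
  // ... your queue processing logic ...
}

await processQueue();

// Report success to Pingward
await fetch("https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678", {
  method: "POST",
});
```
With a Payload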
You can optionally include a JSON payload with the ping to record additional context (e.g., job metrics, processed count):
```bash
curl -X POST https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678 \
  -H "Content-Type: application/json" \
  -d '{"payload": "Processed 1,523 records in 45s"}'
```
The payload is stored with the ping record and visible in the Ping History table on the monitor's detail page.
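The same payload ping can be sent from Python without third-party packages. This sketch assumes the `{"payload": "..."}` body shape from the curl example above; `build_ping_request` and `send_ping` are illustrative names, and the 10-second timeout is an arbitrary choice so that a network problem cannot hang the job.

```python
import json
import urllib.request

PING_URL = "https://your-pingward-instance/ping/R4nD0mUrl-S4f3_K3y12345678"


def build_ping_request(url: str, payload: str) -> urllib.request.Request:
    """Build a POST ping whose JSON body carries a short context string."""
    body = json.dumps({"payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_ping(url: str, payload: str) -> dict:
    """Send the ping and return the parsed JSON response."""
    with urllib.request.urlopen(build_ping_request(url, payload), timeout=10) as resp:
        return json.load(resp)


# At the end of a successful run:
# send_ping(PING_URL, "Processed 1,523 records in 45s")
```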
Ping Response
A successful ping returns:
```json
{
  "status": "ok",
  "receivedAt": "2024-01-15T14:00:00Z",
  "monitorName": "Nightly Database Backup"
}
```
If the monitor is paused, the status is "paused" instead of "ok". Pings are still recorded when the monitor is paused.
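A job that wants to notice it is pinging a paused monitor can inspect the status field. `check_ping_response` is a hypothetical helper that parses the JSON body shown above; treating any status other than "ok" or "paused" as an error is an assumption, since only those two values are documented here.

```python
import json
import logging


def check_ping_response(body: str) -> bool:
    """Return True if the ping counts toward monitoring, False if the monitor is paused."""
    data = json.loads(body)
    status = data.get("status")
    if status == "ok":
        return True
    if status == "paused":
        # The ping was recorded, but no Overdue/Missing tracking happens while paused
        logging.warning("Monitor %r is paused; ping recorded only", data.get("monitorName"))
        return False
    raise ValueError(f"unexpected ping status: {status!r}")
```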
Pausing and Resuming
You can pause a heartbeat monitor to temporarily stop monitoring without deleting it. While paused:
- Pings are still accepted and recorded
- The monitor does not transition to Overdue or Missing
- No alerts are triggered
- The dashboard shows a "Paused" badge
When to pause:
- During planned maintenance of the monitored service
- When temporarily disabling a scheduled job
- While debugging a job's ping integration
Resuming a monitor recalculates the next expected deadline from the current time (not from the last ping), preventing an immediate Overdue state after resuming.
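To see why the deadline is recalculated from the resume time, compare the two anchors for a monitor whose last ping was hours ago. This sketch assumes the resumed deadline uses the same interval-plus-grace formula as elsewhere, just anchored at the resume time; `deadline_after_resume` is a hypothetical name, not a Pingward function.

```python
from datetime import datetime, timedelta


def deadline_after_resume(resumed_at: datetime, interval_minutes: int, grace_minutes: int) -> datetime:
    """Anchor the next deadline at the resume time (assumed: interval + grace)."""
    return resumed_at + timedelta(minutes=interval_minutes + grace_minutes)


last_ping = datetime(2024, 1, 15, 9, 0)
resumed_at = datetime(2024, 1, 15, 14, 0)  # paused for maintenance in between

from_last_ping = last_ping + timedelta(minutes=60 + 5)  # 10:05, already in the past
from_resume = deadline_after_resume(resumed_at, 60, 5)  # 15:05, a full interval of slack

print(from_last_ping < resumed_at)  # True: anchoring at the last ping would be Overdue at once
print(from_resume > resumed_at)     # True
```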
Via Dashboard
On the heartbeat monitor list or detail page, click Pause or Resume.
Via API
```http
# Pause
POST /api/heartbeat-monitors/{id}/pause
Authorization: Bearer <your-jwt-token>

# Resume
POST /api/heartbeat-monitors/{id}/resume
Authorization: Bearer <your-jwt-token>
```
Alerts
Heartbeat monitors integrate with Pingward's issue and alert system. When a monitor misses its expected ping, the system creates an issue and routes it through your configured alert integrations.
When Alerts Trigger
| Transition | Severity | Description |
|---|---|---|
| Healthy to Overdue | Warning | Job is late — may still arrive |
| Overdue to Missing | Critical | Job has failed — requires attention |
| Overdue or Missing to Healthy | Recovery | Job recovered; informational only |
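The alert table reduces to a lookup from (previous, new) status to severity. This is an illustrative sketch: `alert_severity` is a hypothetical helper, and it follows the note under Healthy that a recovery is recorded only when the monitor was previously Overdue or Missing. Returning None for every other transition (such as Waiting to Healthy) is an assumption.

```python
from typing import Optional

# Severity per (previous, new) status, following the alert table.
# Assumption: a return to Healthy is a recovery only from Overdue or Missing.
SEVERITY = {
    ("Healthy", "Overdue"): "Warning",
    ("Overdue", "Missing"): "Critical",
    ("Overdue", "Healthy"): "Recovery",
    ("Missing", "Healthy"): "Recovery",
}


def alert_severity(previous: str, new: str) -> Optional[str]:
    """Return the alert severity for a status transition, or None if no alert fires."""
    return SEVERITY.get((previous, new))
```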
Issue Integration
Missed heartbeats create issues in the same system as HTTP test failures. This means:
- Issues appear on the Issues page alongside test failures
- Routing rules apply — route heartbeat alerts to specific Slack channels, email groups, or PagerDuty services
- Maintenance windows apply — suppress heartbeat alerts during planned downtime using tag-based scoping
- Escalation policies apply — escalate unresolved heartbeat issues through your on-call rotation
Event Log
Every state transition is recorded in the monitor's event log, accessible from the monitor detail page. Event types include:
| Event Type | Description |
|---|---|
| Created | Monitor was created |
| StatusChanged | Status transitioned (e.g., Waiting to Healthy, Healthy to Overdue) |
| Paused | Monitoring was paused |
| Resumed | Monitoring was resumed |
Each event records the previous status, new status, timestamp, and a human-readable description.
Best Practices
Naming Conventions
Use descriptive names that identify both the job and its environment:
Good examples:
- "Nightly Database Backup"
- "Order Processing Queue Worker"
- "ETL Pipeline - Customer Data Sync"
- "CI/CD Deploy Pipeline (Production)"
Bad examples:
- "Monitor 1"
- "Cron"
- "Heartbeat"
- "Test"
Grace Period Sizing
Choose a grace period that accounts for normal variation without masking real failures:
| Job type | Recommended grace period |
|---|---|
| 1-minute health checks | 1-2 minutes |
| 5-minute queue workers | 2-5 minutes |
| 15-minute sync tasks | 5 minutes |
| Hourly batch jobs | 10-15 minutes |
| Jobs with highly variable duration | 15-30 minutes |
Rule of thumb: Set the grace period to 10-25% of the expected interval, with a minimum of 1-2 minutes.
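The rule of thumb can be turned into a starting value. This is a sketch, not a Pingward feature: `suggest_grace_minutes` takes a percentage within the 10-25% band (20% by default, an arbitrary choice), rounds up, enforces a 2-minute floor, and clamps to the documented 60-minute maximum. It uses integer arithmetic to avoid floating-point rounding surprises.

```python
def suggest_grace_minutes(interval_minutes: int, percent: int = 20, floor: int = 2) -> int:
    """Suggest a grace period: `percent` of the interval, rounded up, at least `floor`, at most 60."""
    if not 10 <= percent <= 25:
        raise ValueError("percent should stay within the 10-25% rule of thumb")
    rounded_up = (interval_minutes * percent + 99) // 100  # integer ceiling division
    return min(max(floor, rounded_up), 60)  # 60 = documented gracePeriodMinutes maximum
```

The result is only a starting point; for example, it suggests 3 minutes for a 15-minute task where the table above recommends 5, so adjust for the job's observed variance.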
Only Ping on Success
Send the heartbeat ping only when the job completes successfully. If your job fails, the absence of a ping is the signal:
```bash
# Correct: ping only on success
/usr/local/bin/backup.sh && curl -X POST https://your-pingward-instance/ping/...

# Wrong: ping regardless of outcome
/usr/local/bin/backup.sh; curl -X POST https://your-pingward-instance/ping/...
```
Using && ensures the ping is sent only if the preceding command exits with code 0. Using ; would send the ping even if the job failed, defeating the purpose of heartbeat monitoring.
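When a job is launched from Python rather than a shell, the same only-on-success rule applies: check the exit code before pinging. `run_and_ping` is a hypothetical wrapper; the `ping` callable is injected so the sketch stays independent of any HTTP library (for instance, it could wrap the `requests.post(PING_URL, timeout=10)` call from the Python example).

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_and_ping(command: Sequence[str], ping: Callable[[], None]) -> int:
    """Run `command` and call `ping` only if it exits with status 0, like `cmd && curl ...`."""
    result = subprocess.run(command)
    if result.returncode == 0:
        ping()
    # On a non-zero exit no ping is sent, so the missed heartbeat raises the alert
    return result.returncode


if __name__ == "__main__":
    # Demo: a command that succeeds, with a stand-in ping that just prints
    run_and_ping([sys.executable, "-c", "print('job done')"], lambda: print("ping sent"))
```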
Use Tags to Organize
Tag your monitors consistently to enable effective routing and maintenance:
```text
# By environment
production, staging, development

# By team
backend-team, data-team, platform-team

# By criticality
critical, high, low

# By service
payment-service, user-service, analytics
```
Monitor Critical Jobs First
Prioritize heartbeat monitors for jobs where failure has the highest impact:
- Data backups — undetected backup failures are catastrophic
- Financial processing — payment batch jobs, reconciliation
- Compliance jobs — regulatory reporting, audit log exports
- Queue workers — stuck queues cause cascading failures
- Data pipelines — downstream systems depend on fresh data
Troubleshooting
Monitor Stuck in "Waiting"
Symptoms: Monitor was created but never transitions to Healthy.
Possible causes:
- Job has not run yet — ping has never been sent
- Ping URL is incorrect — verify the URL in the monitor detail page
- Network issue — job cannot reach the Pingward API
- Firewall blocking — outbound HTTP from the job host is blocked
Resolution:
- Copy the ping URL from the monitor detail page
- Test manually: curl -v -X POST <ping-url>
- Verify the job is configured to call the correct URL
- Check that the job host can reach Pingward (DNS, firewall, proxy)
False Overdue Alerts
Symptoms: Monitor goes Overdue even though the job is running.
Possible causes:
- Expected interval is too short for the job's actual schedule
- Grace period is too small for the job's execution time variance
- Job is pinging before completion (ping sent at start instead of end)
- Clock skew between job host and Pingward
Resolution:
- Review the job's actual execution schedule and adjust the interval
- Increase the grace period to account for normal variance
- Move the ping call to the very end of the job, after all work is done
- Verify the job host's system clock is synchronized (NTP)
Ping Returns 404
Symptoms: curl or HTTP call to the ping URL returns 404 Not Found.
Possible causes:
- Ping key is incorrect or truncated
- Monitor was deleted
- Wrong base URL
Resolution:
- Copy the ping URL directly from the dashboard (use the "Copy" button)
- Verify the monitor exists in the Heartbeat Monitors list
- Check that the base URL matches your Pingward instance
Monitor Not Alerting
Symptoms: Monitor shows Overdue or Missing but no alerts are received.
Possible causes:
- No alert integrations configured
- Routing rules exclude heartbeat monitors
- Monitor is paused
- Active maintenance window suppressing alerts
Resolution:
- Check Integrations page for configured alert channels
- Review Routing Rules to ensure they include heartbeat monitor tags
- Verify the monitor is not paused (check the status badge)
- Check Maintenance Windows for active windows affecting this monitor's tags
API Reference
Create Heartbeat Monitor
```http
POST /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "string (required, 1-255 characters)",
  "expectedIntervalMinutes": "integer (1-1440, default: 5)",
  "gracePeriodMinutes": "integer (0-60, default: 5)",
  "tags": "string (optional, comma-separated)"
}
```
List Heartbeat Monitors
```http
GET /api/heartbeat-monitors
Authorization: Bearer <your-jwt-token>

Query parameters:
?status=Waiting | Healthy | Overdue | Missing
?search=<name search>
?limit=50&offset=0
```
When limit or offset is provided, the response is paginated:
```json
{
  "items": [...],
  "total": 42,
  "limit": 50,
  "offset": 0
}
```
Without pagination parameters, the response is a flat array of monitors.
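Because the list endpoint returns either a paginated envelope or a flat array, a client that walks every monitor should handle both shapes. In this sketch, `fetch_page(limit, offset)` is a hypothetical callable that performs GET /api/heartbeat-monitors with those query parameters and returns the decoded JSON; injecting it keeps the paging logic testable without a server. `iter_monitors` is likewise an illustrative name.

```python
from typing import Any, Callable, Iterator


def iter_monitors(fetch_page: Callable[[int, int], Any], limit: int = 50) -> Iterator[dict]:
    """Yield every monitor, following limit/offset until `total` is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        if isinstance(page, list):
            # Flat array: the server returned everything in one response
            yield from page
            return
        items = page["items"]
        yield from items
        offset += len(items)
        if not items or offset >= page["total"]:
            return
```

Stopping on an empty page as well as on `total` guards against looping forever if monitors are deleted while you are paging.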
Get Heartbeat Monitor
```http
GET /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>
```
Update Heartbeat Monitor
```http
PUT /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>
Content-Type: application/json

{
  "name": "Updated name",
  "expectedIntervalMinutes": 30,
  "gracePeriodMinutes": 10,
  "tags": "updated, tags"
}
```
All fields are optional: only provided fields are updated. If the expected interval changes and the monitor has been pinged, the next expected deadline is recalculated.
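Since every update field is optional, a client only needs to validate the fields it actually sends. `build_update_body` is a hypothetical helper that checks each provided field against the ranges documented for creation (name 1-255 characters, interval 1-1440 minutes, grace period 0-60 minutes; applying them to updates is an assumption) and omits the rest. The server remains the final authority on validation.

```python
from typing import Any, Dict, Optional


def build_update_body(name: Optional[str] = None,
                      expected_interval_minutes: Optional[int] = None,
                      grace_period_minutes: Optional[int] = None,
                      tags: Optional[str] = None) -> Dict[str, Any]:
    """Return a PUT body containing only the fields that were provided."""
    body: Dict[str, Any] = {}
    if name is not None:
        if not 1 <= len(name) <= 255:
            raise ValueError("name must be 1-255 characters")
        body["name"] = name
    if expected_interval_minutes is not None:
        if not 1 <= expected_interval_minutes <= 1440:
            raise ValueError("expectedIntervalMinutes must be between 1 and 1440")
        body["expectedIntervalMinutes"] = expected_interval_minutes
    if grace_period_minutes is not None:  # 0 is valid, so test against None
        if not 0 <= grace_period_minutes <= 60:
            raise ValueError("gracePeriodMinutes must be between 0 and 60")
        body["gracePeriodMinutes"] = grace_period_minutes
    if tags is not None:
        body["tags"] = tags  # comma-separated string, sent as-is
    return body
```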
Delete Heartbeat Monitor
```http
DELETE /api/heartbeat-monitors/{id}
Authorization: Bearer <your-jwt-token>
```
Returns 204 No Content on success.
Pause / Resume
```http
POST /api/heartbeat-monitors/{id}/pause
Authorization: Bearer <your-jwt-token>

POST /api/heartbeat-monitors/{id}/resume
Authorization: Bearer <your-jwt-token>
```
Get Ping History
```http
GET /api/heartbeat-monitors/{id}/pings
Authorization: Bearer <your-jwt-token>

Query parameters:
?limit=50&offset=0
```
Get Event Log
```http
GET /api/heartbeat-monitors/{id}/events
Authorization: Bearer <your-jwt-token>

Query parameters:
?limit=50&offset=0
```
Send a Ping (Public)
```http
POST /ping/{pingKey}
GET /ping/{pingKey}

Optional body (POST only):
{
  "payload": "string (optional context)"
}
```
No authentication required. Rate limited under the "public" rate limit policy.
Related Documentation
- Issue Management - How issues are created and resolved when heartbeats are missed
- Alert Routing - Configure routing rules to direct heartbeat alerts to specific channels
- Maintenance Windows - Suppress heartbeat alerts during planned downtime
- Test Configuration - Set up HTTP tests and tags for organizing monitors