PostgreSQL – how to detect and solve Checkpoints Ratio issue

This topic describes how you can detect and solve PostgreSQL checkpoints ratio issues.

What happened?

To detect checkpoints ratio issues and review their historical metrics, you need a monitoring solution. Many are available today, so you can use one you already have or install another. The examples here are based on the Awide management and monitoring solution.

This alert description uses the following terms: WAL and Checkpoint. You can click the links to familiarize yourself with these terms.

The checkpoint operation forces a transaction log checkpoint, flushing all dirty data pages to disk and writing a special checkpoint record to the WAL file. This minimizes the amount of WAL that has to be replayed (REDO) during crash recovery.

There are two types of checkpoints:

Automatic (or scheduled) – It’s a desirable event type
Required – It’s a problematic event type

A checkpoint of the required type can often cause significant I/O load, so the system measures the share of required checkpoints among all checkpoints.

If 50% of the checkpoints in the measured period were of the required type, the system raises a warning alert. If this value exceeds 75%, the system raises a problem alert.
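
The thresholds above can be illustrated with a minimal Python sketch. It is not Awide's actual implementation: the names CheckpointSample, required_ratio and alert_level are hypothetical, and it assumes the warning level starts at exactly 50% while the problem level starts strictly above 75%. The two counters are assumed to be cumulative totals sampled at the start and end of the measured period; in PostgreSQL 16 and earlier, such totals are exposed as checkpoints_timed and checkpoints_req in the pg_stat_bgwriter view.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CheckpointSample:
    """Cumulative checkpoint counters at one point in time (hypothetical type)."""
    timed: int      # automatic (scheduled) checkpoints so far
    required: int   # required checkpoints so far


def required_ratio(prev: CheckpointSample, curr: CheckpointSample) -> Optional[float]:
    """Percentage of required checkpoints between two samples.

    Returns None when no checkpoints happened in the period or when the
    counters went backwards (for example after a statistics reset).
    """
    timed = curr.timed - prev.timed
    required = curr.required - prev.required
    total = timed + required
    if timed < 0 or required < 0 or total == 0:
        return None
    return 100.0 * required / total


def alert_level(ratio: Optional[float]) -> str:
    """Map a ratio to an alert level using the assumed thresholds."""
    if ratio is None:
        return "ok"
    if ratio > 75.0:       # assumption: strictly above 75% is a problem
        return "problem"
    if ratio >= 50.0:      # assumption: 50% and above is a warning
        return "warning"
    return "ok"


if __name__ == "__main__":
    start = CheckpointSample(timed=100, required=10)
    end = CheckpointSample(timed=104, required=16)
    ratio = required_ratio(start, end)  # 6 required out of 10 checkpoints
    print(ratio, alert_level(ratio))
```

Working on the difference between two samples, rather than on the raw totals, keeps the ratio tied to the measured period instead of to the whole lifetime of the statistics.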

Why did it happen?

The Checkpoint tile is located on the right side of the instance overview.