← Build logs
BackendJune 5, 2026

Resolving PostgreSQL Scheduler Hangs: Linking pg_locks and Connection Management

PostgreSQL Scheduler Hangs and Errors, Solved: Interfacing with pg_locks and Connection Management

If you've been spending time debugging a scheduler that suddenly stops or throws errors, this post might help. I encountered an issue where a previously working scheduler would permanently hang under specific circumstances. Ultimately, the root cause was found in lock management and connection handling.

Attempts and Pitfalls

Initially, to improve the scheduler's stability, I changed the existing advisory lock mechanism to a leader election method using a lease table. I also introduced a singleton pattern to add a self-healing feature, allowing the scheduler to detect and recover from issues on its own.

However, lock-related problems still occurred intermittently, and the scheduler hangs didn't completely disappear. For deeper debugging, I isolated the connections used by the scheduler into a dedicated asyncpg connection pool to dive deeper into lock-related issues.

# Old advisory lock method (example)
import psycopg2

conn = psycopg2.connect(...)
cur = conn.cursor()
cur.execute("SELECT pg_try_advisory_lock(123);")
locked = cur.fetchone()[0]
if locked:
    # Perform work
    cur.execute("SELECT pg_advisory_unlock(123);")
conn.close()
# Unexpected error message (similar to actual error)
ERROR: deadlock detected
DETAIL: Process 12345 waits for Process 67890 on lock, which is held by Process 12345.
HINT: See server log for statement that blocked by other processes.

During this process, I came up with the idea of interfacing the scheduler's maintainer logic with PostgreSQL's pg_locks view. By monitoring which locks were held by whom in real-time through pg_locks, I attempted to maintain or release locks by applying strong references (to prevent garbage collection) and heartbeats when necessary.

The Cause

In the end, the problem was multifaceted. In addition to the limitations of the existing advisory lock method, the scheduler's maintainer logic was not properly handling lock-related exceptions. Specifically, the scheduler often entered a permanent hang state and couldn't recover when locks were abnormally released or unexpected deadlocks occurred. I also discovered that excessive diagnostic logs, which were unnecessary, were accumulating during debugging, increasing system load.

The Solution

To fundamentally resolve the lock-related issues, I actively utilized the pg_locks view. I modified the maintainer logic to interface with pg_locks to grasp the current lock status in real-time. If a lock was unexpectedly released, the scheduler was adjusted to immediately reacquire it or take appropriate action.

# Interfacing with pg_locks and applying strong references/heartbeats (conceptual code)
import asyncio
import asyncpg

async def monitor_locks(pool):
    while True:
        conn = await pool.acquire()
        try:
            # Query lock information related to the specific scheduler from pg_locks
            locks = await conn.fetch(
                "SELECT pid, granted, mode FROM pg_locks WHERE application_name = 'my_scheduler';"
            )
            for lock in locks:
                if not lock['granted'] and lock['mode'] == 'ExclusiveLock':
                    # If lock is not granted, retry or notify
                    print(f"Scheduler lock not granted for PID {lock['pid']}. Retrying...")
                    # Add lock retry logic here
                    pass
            # Heartbeat logic: Query to indicate the scheduler is alive (e.g., pg_advisory_unlock)
            await conn.execute("SELECT pg_advisory_unlock(456);") # Example lock ID
        except Exception as e:
            print(f"Error monitoring locks: {e}")
        finally:
            await pool.release(conn)
        await asyncio.sleep(10) # Check lock status every 10 seconds

async def run_scheduler():
    # Create a dedicated asyncpg connection pool
    pool = await asyncpg.create_pool(user='user', password='password', database='db', min_size=1, max_size=1)
    # Start the asynchronous task for lock monitoring
    asyncio.create_task(monitor_locks(pool))

    # Scheduler main logic
    while True:
        # ... Perform scheduler tasks ...
        await asyncio.sleep(1)

# asyncio.run(run_scheduler())

Additionally, I separated the database connections used by the scheduler into a dedicated asyncpg connection pool. This reduced connection contention with other tasks and provided clearer logs and status for diagnosing lock-related issues. I also aggressively removed unnecessary detailed diagnostic logs to reduce system load.

Results

  • The scheduler's permanent hangs and error occurrences have been completely resolved.
  • Overall system stability has significantly improved.
  • Intermittent lock-related issues have disappeared, ensuring data consistency.
  • System load has decreased due to the removal of unnecessary logs.

Takeaways — To Avoid the Same Pitfalls

  • [ ] If you encounter lock-related issues in your scheduler or background tasks, actively use PostgreSQL's pg_locks view to monitor the current lock status in real-time.
  • [ ] To prepare for scenarios where locks are abnormally released or deadlocks occur, consider applying strong references (to prevent GC) and heartbeat mechanisms to your lock maintenance or release logic.
  • [ ] For complex asynchronous operations or systems with many database connections, use dedicated asynchronous connection pools to isolate issues and facilitate diagnostics.
  • [ ] Periodically review whether the detailed logs added for debugging are causing system load, and remove any unnecessary logs.