Skip to main content

Undo-based Rewind

OrioleDB provides an undo-based rewind capability that allows a database cluster to be reverted to a consistent previous state. Unlike Point-in-Time Recovery (PITR), which relies on Write-Ahead Log (WAL) replay, OrioleDB rewind utilizes undo logs to roll back changes. This mechanism is generally faster than WAL-based recovery for reverting to recent database states, because it directly reverses recent changes using the undo chain.

For orioledb tables, rewind uses the engine's native undo logs. For standard PostgreSQL heap tables, rewind functionality is supported by delaying vacuuming; older tuple versions are retained in the heap until they fall outside the configured rewind retention window.

Experimental Feature

This feature is experimental and imposes a significant performance penalty. It is disabled by default and should be used with caution.

Configuration

The following parameters control the rewind subsystem. These must be set in postgresql.conf or via ALTER SYSTEM.

ParameterTypeDefaultRangeDescription
orioledb.enable_rewindbooleanoffon/offEnables the collection of rewind data and starts the Rewind Worker.
orioledb.rewind_max_timeinteger36001 - 86400Maximum age (in seconds) a transaction record is retained for rewind.
orioledb.rewind_max_transactionsinteger1000001 - INT_MAXMaximum number of transactions to retain in the rewind queue.
orioledb.rewind_buffersinteger10246 - INT_MAXNumber of shared memory buffers for rewind metadata.
note

Enabling rewind increases the number of background processes. Ensure that max_worker_processes is configured with sufficient overhead to accommodate the Rewind Worker.

The Rewind Worker

When orioledb.enable_rewind is set to on, OrioleDB launches a background Rewind Worker. This process is responsible for managing the lifecycle of transaction history and undo logs.

Key Responsibilities

  • The worker monitors the rewind queue. Once a transaction is older than the retention threshold, the worker marks the item as completed, allowing the system to safely delete old undo files and vacuum dead heap tuples.
  • It maintains the "Rewind Horizon" — the furthest point in time to which the database can be safely reverted.
  • The worker continues to process the queue at a regular interval even during periods of database inactivity to prevent storage bloat.
Rewind Worker

While the Rewind Worker manages its "to-do list" in a fixed-size circular buffer, its failure to progress has two major side effects on system health:

  • If the worker lags, OrioleDB cannot reclaim space in the physical Undo Logs. These logs will spill from memory to the orioledb_undo directory on disk, causing it to grow indefinitely until the worker catches up.

  • The worker is responsible for advancing the global xmin horizon. If the worker is stuck on an old transaction, PostgreSQL and OrioleDB will consider all subsequent row versions as "potentially needed." This prevents VACUUM from removing dead tuples, leading to significant table bloat and degraded query performance.

Rewind Functions

orioledb_rewind_by_time(seconds int4[, attempt_restart bool])

Rewinds the cluster state by the specified number of seconds from the current time.

  • Parameters:
    • seconds: The number of seconds to rewind the cluster state.
    • attempt_restart: If true, the database system will automatically restart after the rewind. If false, the database will only be shut down. Default is false.
  • Examples:
    • SELECT orioledb_rewind_by_time(600); — Rewinds by 10 minutes and shuts down the database without restarting.
    • SELECT orioledb_rewind_by_time(600, true); — Rewinds by 10 minutes and restarts the database.

orioledb_rewind_to_timestamp(target_time timestamptz[, attempt_restart bool])

Rewinds the cluster to a specific point in time.

  • Parameters:
    • target_time: The exact timestamp to which the cluster state will be rewound.
    • attempt_restart: If true, the database system will automatically restart after the rewind. If false, the database will only be shut down. Default is false.
  • Examples:
    • SELECT orioledb_rewind_to_timestamp('2025-01-01 12:00:00 UTC', true); — Rewinds to January 1, 2025, at 12:00:00 UTC, and restarts the database.
    • SELECT orioledb_rewind_to_timestamp('2025-01-01 12:00:00 UTC'); — Rewinds to the specified timestamp and shuts down the database without restarting.

orioledb_rewind_to_transaction(xid int4, oxid int8[, attempt_restart bool])

Rewinds the cluster to a state before a specific transaction pair identified by the PostgreSQL Transaction ID (xid) and the OrioleDB Transaction ID (oxid).

  • Parameters:
    • xid: The PostgreSQL Transaction ID.
    • oxid: The OrioleDB Transaction ID.
    • attempt_restart: If true, the database system will automatically restart after the rewind. If false, the database will only be shut down. Default is false.
  • Examples:
    • SELECT orioledb_rewind_to_transaction(1750, 555, true); — Rewinds the cluster state to just before transaction pair 1750/555 and restarts the database.
    • SELECT orioledb_rewind_to_transaction(1750, 555); — Rewinds the cluster state to just before the specified transaction pair and shuts down the database without restarting.

Examining the Rewind Horizon

To check the available rewind range, use:

  • orioledb_get_current_oxid(): Returns the current OrioleDB transaction's ID. It will assign a new one if the current transaction does not have one already.
  • orioledb_get_complete_xid(): Returns the oldest PostgreSQL xid available for rewind.
  • orioledb_get_complete_oxid(): Returns the oldest OrioleDB oxid available for rewind.

Operational Workflow

When a rewind function is invoked, the system executes the following steps:

  1. Validates that the requested target is within the retention window (rewind_max_time) and that orioledb.enable_rewind is active.
  2. Signals the Rewind Worker to stop adding new transactions to the buffer.
  3. Signals all other active backends to terminate. The process waits up to 100 seconds for backends to exit.
  4. Reverts the data pages to the requested state.
  5. Once the rewind is complete, depending on the value of attempt_restart argument, the function either shuts down the database or attempts to restart the PostgreSQL instance to finalize the state change.
Restart Reliability

The automatic restart is a single, best-effort attempt. If the restart fails (e.g., due to configuration errors), manual intervention is required to bring the server back online. Ensure you have system-level access to the server to manually start the service, as the SQL connection will be terminated immediately upon completion of the rewind.

Examples

To rewind the database to a specific transaction state, you must first record the transaction identifiers at your desired "recovery point."

  1. Identify the recovery point:
-- Record these values
SELECT pg_current_xact_id(), orioledb_get_current_oxid();

pg_current_xact_id | orioledb_get_current_oxid
--------------------+---------------------------
1750 | 555
  1. Perform modifications:
-- Accidental data loss or undesired changes occur here
DROP TABLE important_data;
  1. Perform the rewind:
-- Revert to the IDs recorded in step 1
SELECT orioledb_rewind_to_transaction(1750, 555);

The server will log "Rewind complete" and shut down.

Data consistency

For applications requiring strong consistency guarantees, it is recommended to explicitly acquire locks on relevant tables within the transaction intended as the rewind target. (Step 1 in the example above)

Limitations and Caveats

  • The rewind buffer is stored in shared memory and is not persistent across restarts. You cannot rewind to a point in time prior to the current cluster start time.
  • Rewind is destructive to all data modifications occurring after the target point. It is recommended to take a backup before initiating a rewind.
  • Because rewind requires retaining old versions in heap tables, standard vacuuming is inhibited for data within the rewind window. High write volume to heap tables may lead to significant bloat.
  • Rewind is currently incompatible with physical replication. Standby servers do not reflect the rewind operation and will become inconsistent with the primary.
  • If the system crashes or is interrupted during the rewind phase, the database may be left in an inconsistent state.