Skip to main content

4 posts tagged with "benchmarks"

View All Tags

OrioleDB fastpath search

· 4 min read
Alexander Korotkov
Creator of OrioleDB

When you optimize the CPU time of a transactional database management system, it comes down to one question: how fast can you read a page without breaking consistency? In this post, we explore how OrioleDB avoids locks, trims memory copies, and — starting with beta12 — even bypasses both copying and tuple deforming altogether for fixed-length types during intra-page search. This means that not only are memory copies skipped, but the overhead of reconstructing tuples is also eliminated. The result: an even faster read path, with no manual tuning required.

Why page copies matter … and why they hurt

Every time a PostgreSQL backend descends an OrioleDB B-tree, it needs a consistent view of the target page. Instead of locking the whole page, OrioleDB keeps a 32-bit state word in the page header.

The low bits represent a change count that increments with every data modification; the high bits hold lightweight flags for "exclusive" and "read-blocked" states. A reader copies the necessary bytes, then re-reads the state, and retries if the counter has changed without using locks, yet achieving perfect consistency.

The following pseudo-code illustrates how to copy a consistent page image using a state variable for synchronization.

def read_page(page, image):
while true:
state = read_state(page)
if state_is_blocked(state):
continue

copy_data(image, page)

newState = read_state(page)
if state_is_blocked(state) or get_change_count(newState) != get_change_count(state):
continue

return

OrioleDB beta12: features and benchmarks

· 5 min read
Alexander Korotkov
Creator of OrioleDB

Since our last public update, OrioleDB has continued to evolve with a series of new releases. These updates refine the core engine, extend functionality, and improve performance across a range of workloads. Together, they move us closer to a beta release and lay the groundwork for broader adoption.

What is OrioleDB?

OrioleDB is a PostgreSQL storage extension that implements a custom Table Access Method as a drop‑in replacement for the default Heap storage engine. It is designed to address scalability bottlenecks in PostgreSQL’s buffer manager and reduces the WAL, enabling better utilization of modern multi-core CPUs and high‑performance storage systems.

By rethinking core components such as MVCC, page caching, and checkpoints, OrioleDB improves throughput and predictability in transactional workloads without altering PostgreSQL’s user-facing behavior.

What’s New in OrioleDB?

Building on this foundation, recent releases have introduced several user-facing enhancements:

  • Support for non‑B-tree index types on OrioleDB tables.
  • Support for rewinding recent changes in the database.
  • Support for tablespaces.
  • fillfactor support for OrioleDB tables and indexes.
  • orioledb_tree_stat() SQL function for space utilization statistics.
  • Support for tables with more than 32 columns.

These additions improve OrioleDB/PostgreSQL compatibility and provide more flexibility for workloads with diverse schema and indexing requirements.

Why PostgreSQL needs a better API for alternative table engines?

· 9 min read
Alexander Korotkov
Creator of OrioleDB

For a long time now, PostgreSQL has had an extensible Index Access Method API (called AM), which has stood the test of time and enabled numerous robust extensions to provide their own index types. For example: rum, pgvector, bloom, zombodb and others. PostgreSQL 12 introduced the Table AM API, promising equivalent flexibility for table access methods.

Despite PostgreSQL's Table AM API being available since version 12 and ongoing criticisms of its built-in storage engine — particularly the MVCC model ([1], [2], [3]) — it remains surprising that no fully featured transactional storage engine has yet emerged purely as an extension.

Since the table AM and index AM APIs are tightly coupled, this is an issue for both implementations.

The features most in demand for alternative PostgreSQL table engines are:

  1. Alternative MVCC implementations, e.g.., UNDO-log-based storages. The motivation to provide this is well-discussed in Uber blog post, Andy Pavlo blog post, and many other sources.
  2. Non-heap-like storages. For example, in index-organized tables, the index is not an optional addition to the table that speeds up requests but a necessary layer the table storage uses internally. Consequently, the table tuple is a part of a complex data structure and can’t be addressed by a fixed-length address like page number and offset number. It requires to be addressed by variable length identifier like index key.

The API extension to provide #2 is more or less understandable. This could be done by replacing ctid with an arbitrary sequence of bytes in all the APIs. However, #1 seems rather complex and requires much clarification.

As an example that illustrates the motivation for the changes to the table and index AM API comes OrioleDB. It is an extension providing a table access method we developed to address many well-known shortcomings of the built-in storage engine. However, OrioleDB is not yet a drop-in extension; it requires several patches to the PostgreSQL Core.

Besides these two things, which will be discussed below, there are numerous needs for further API improvements, like pointer squizzling and alternative WAL-logging, that are outside the scope of this post.

OrioleDB beta7: Benchmarks

· 7 min read
Alexander Korotkov
Creator of OrioleDB
Pavel Borisov
PostgreSQL contributor

OrioleDB is a storage extension for PostgreSQL which uses PostgreSQL's pluggable storage system. Designed as a drop-in replacement for PostgreSQL's existing Heap storage, OrioleDB aims to overcome scalability bottlenecks and fully utilize modern hardware capabilities. By integrating seamlessly with PostgreSQL, it offers improved performance, efficiency, and scalability without sacrificing the robustness and reliability that PostgreSQL is known for.

Today we’re releasing OrioleDB version beta7. This marks a significant step in delivering a high-performance, next-generation storage engine for Postgres users. OrioleDB is designed to extract the full potential of modern hardware, offering superior performance, efficiency, and scalability.

OrioleDB design choices

OrioleDB is built from the ground up to leverage modern hardware, reduce maintenance needs, and enhance distributed capabilities. The key technical decisions forming the foundation of OrioleDB are:

  1. Elimination of Buffer Mapping and Lock-less Page Reading: In OrioleDB, in-memory pages are directly linked to storage pages, eliminating the need for buffer mapping and its associated bottlenecks. Additionally, in-memory page reading is performed without atomic operations, enabling lock-less access. Together, these design choices significantly elevate vertical scalability for PostgreSQL.
  2. MVCC Based on the UNDO Log Concept: OrioleDB employs a Multi-Version Concurrency Control (MVCC) mechanism based on an undo log. Old versions of tuples are evicted into undo logs—forming undo chains—rather than causing bloat in the main storage system. Page-level undo records allow the system to promptly reclaim space occupied by deleted tuples. Combined with page merging, these mechanisms eliminate bloat in most cases. As a result, dedicated vacuuming of tables is unnecessary, removing a common cause of system performance degradation and database outages.
  3. Copy-on-Write Checkpoints and Row-Level WAL: OrioleDB utilizes copy-on-write checkpoints to provide structurally consistent snapshots of data at all times. This approach is friendly to modern SSDs and enables row-level Write-Ahead Logging (WAL). Row-level WAL is easy to parallelize (already implemented), compact, and suitable for active-active multi-master configurations (planned).

Benchmarking OrioleDB vs PostgreSQL Heap

To illustrate the performance characteristics of OrioleDB we used TPC-C benchmarking, a complex test simulating real database workloads that is considered a modern standard in database applications.