Data Ingestion Engine

Data ingestion & write path

The QuestDB Data Ingestion Engine supports bulk and streaming ingestion. It writes data to a row-based write-ahead log (WAL) and then converts it into a columnar format. In QuestDB Enterprise, the WAL segments are shipped to object storage for replication.

Bulk ingestion

  • CSV ingestion: QuestDB offers a CSV ingestion endpoint via the REST API and the web console. A specialized COPY command uses io_uring on fast drives to speed up ingestion.
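
As a rough illustration of the two bulk routes, the sketch below only builds the request URLs and the SQL text. It sends nothing. The `/imp` and `/exec` endpoints are part of the REST API. The base URL, table name, file name, timestamp column, and the exact set of COPY options shown are assumptions chosen for the example, so check them against the documentation for your QuestDB version.

```python
from urllib.parse import urlencode


def build_copy_statement(table, csv_file, timestamp_column, partition_by="DAY"):
    """Return a COPY statement for a server-side CSV import.

    The option list (HEADER, TIMESTAMP, PARTITION BY) is an assumed subset
    of what COPY accepts; identifiers are not validated or quoted here.
    """
    return (
        f"COPY {table} FROM '{csv_file}' "
        f"WITH HEADER true TIMESTAMP '{timestamp_column}' "
        f"PARTITION BY {partition_by};"
    )


def build_exec_url(base_url, sql):
    """Return the REST URL that submits one SQL statement via /exec."""
    return f"{base_url}/exec?{urlencode({'query': sql})}"


def build_import_url(base_url, table):
    """Return the REST URL for uploading a CSV file into `table` via /imp.

    The CSV itself would be sent as a multipart form field in a POST request.
    """
    return f"{base_url}/imp?{urlencode({'name': table})}"


if __name__ == "__main__":
    base = "http://localhost:9000"  # assumed local instance
    statement = build_copy_statement("trades", "trades.csv", "ts")
    print(statement)
    print(build_exec_url(base, statement))
    print(build_import_url(base, "trades"))
```

The REST upload suits small and medium files sent from a client machine, while COPY reads a file that is already on the server.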

Real-time streaming

  • High-frequency writes: The streaming ingestion path handles millions of rows per second with non-blocking I/O.

  • Durability: The engine first writes data to a row-based write-ahead log (WAL), then converts it into column-based files for efficient reads. For extra durability, sync mode can be enabled on commit, at the cost of write performance.

  • Concurrent writes: Multiple connections writing to the same table create parallel WAL files that the engine later consolidates into columnar storage.
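
The write path described in this list can be pictured with a toy in-memory model. It is a conceptual sketch only, not QuestDB's implementation: the class name `ToyWalTable` and its methods are hypothetical, and real concerns such as on-disk formats, partitioning, out-of-order data, and crash recovery are left out.

```python
from collections import defaultdict


class ToyWalTable:
    """Toy model: per-writer row-based WALs consolidated into column lists."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.wals = defaultdict(list)   # writer id -> list of committed row batches
        self.txn_log = []               # global commit order: (writer id, batch index)
        self.applied = 0                # number of transactions already applied
        self.column_store = {name: [] for name in self.columns}

    def commit(self, writer_id, rows):
        """Append a batch of rows to the writer's own WAL and record the commit."""
        batch = [tuple(row[name] for name in self.columns) for row in rows]
        self.wals[writer_id].append(batch)
        self.txn_log.append((writer_id, len(self.wals[writer_id]) - 1))
        return len(self.txn_log)  # transaction number, starting at 1

    def apply_pending(self):
        """Convert committed but unapplied row batches into columnar storage."""
        pending = self.txn_log[self.applied:]
        for writer_id, batch_index in pending:
            for row in self.wals[writer_id][batch_index]:
                for name, value in zip(self.columns, row):
                    self.column_store[name].append(value)
        self.applied += len(pending)
        return len(pending)


if __name__ == "__main__":
    table = ToyWalTable(["ts", "price"])
    table.commit("wal1", [{"ts": 1, "price": 10.0}])
    table.commit("wal2", [{"ts": 2, "price": 11.0}, {"ts": 3, "price": 12.0}])
    table.apply_pending()
    print(table.column_store)
```

Each writer appends only to its own log, so writers do not contend with each other. The shared transaction log is what fixes the order in which batches reach the columnar files.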

Contents of the `db` folder, showing multiple pending WAL files and the binary columnar data:

├── db
│   ├── Table
│   │   │
│   │   ├── Partition 1
│   │   │   ├── _archive
│   │   │   ├── column1.d
│   │   │   ├── column2.d
│   │   │   ├── column2.k
│   │   │   └── ...
│   │   ├── Partition 2
│   │   │   ├── _archive
│   │   │   ├── column1.d
│   │   │   ├── column2.d
│   │   │   ├── column2.k
│   │   │   └── ...
│   │   ├── txn_seq
│   │   │   ├── _meta
│   │   │   ├── _txnlog
│   │   │   └── _wal_index.d
│   │   ├── wal1
│   │   │   └── 0
│   │   │       ├── _meta
│   │   │       ├── _event
│   │   │       ├── column1.d
│   │   │       ├── column2.d
│   │   │       └── ...
│   │   ├── wal2
│   │   │   ├── 0
│   │   │   │   ├── _meta
│   │   │   │   ├── _event
│   │   │   │   ├── column1.d
│   │   │   │   ├── column2.d
│   │   │   │   └── ...
│   │   │   └── 1
│   │   │       ├── _meta
│   │   │       ├── _event
│   │   │       ├── column1.d
│   │   │       ├── column2.d
│   │   │       └── ...
│   │   │
│   │   ├── _meta
│   │   ├── _txn
│   │   └── _cv
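
To inspect a layout like the one shown, a short script can list the WAL directories and their numbered segments for one table. This is a sketch that assumes only the naming visible in the listing (directories named `wal<N>` that contain numbered segment directories). The function name `list_wal_segments` and the example path are hypothetical, and the script reads directory names only.

```python
from pathlib import Path


def list_wal_segments(table_dir):
    """Return {wal directory name: sorted segment numbers} for one table directory."""
    segments = {}
    for wal_dir in sorted(Path(table_dir).glob("wal*")):
        if not wal_dir.is_dir():
            continue
        # Segment directories are assumed to have purely numeric names (0, 1, ...).
        numbers = sorted(
            int(entry.name)
            for entry in wal_dir.iterdir()
            if entry.is_dir() and entry.name.isdigit()
        )
        segments[wal_dir.name] = numbers
    return segments


if __name__ == "__main__":
    # Hypothetical path; point it at a table directory inside the db folder.
    print(list_wal_segments("db/Table"))
```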

Ingestion via ILP

  • Native ILP integration: QuestDB supports the Influx Line Protocol (ILP) for high-speed data ingestion.

  • Extensions to ILP: QuestDB extends ILP to support different timestamp units and the array data type.

  • Binary ILP: Since Float64 is one of the most common data types used when streaming data, QuestDB extends ILP to support binary encoding of Float64 numbers. This extension is available by default in all official QuestDB ILP clients and can be enforced by passing version v2 when sending raw ILP data with other clients.

  • Minimal parsing overhead: The ILP parser quickly maps incoming data to internal structures.

  • Parallel ingestion: The ILP path uses off-heap buffers and direct memory management to bypass JVM heap allocation.

  • Protocol versatility: In addition to ILP, QuestDB also supports ingestion via the REST and PostgreSQL wire protocols.

Next Steps