Internals

This section provides a high-level overview of how Concourse stores and organizes data internally. Understanding these details is not required for using Concourse, but can help with capacity planning, performance tuning, and troubleshooting.

Storage Architecture

Concourse uses a two-tier storage architecture: a write-optimized Buffer and a read-optimized Database. Data flows from the Buffer to the Database through an automatic background process called transport.

Write ──> Buffer (append-only, fast writes)
              │
              ├── Transport (background)
              ▼
          Database (indexed, fast reads)

Buffer

The Buffer is an append-only log that accepts writes in constant time. Every write — whether an add, remove, or any other modification — is appended to the Buffer as a revision and immediately acknowledged to the client.

The Buffer is organized into fixed-size pages (default 8KB, configurable via buffer_page_size). Pages are memory-mapped for durability, so data survives server restarts even before it reaches the Database.
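The append-and-roll-over behavior can be sketched with a toy model. This is illustrative Python, not Concourse's actual (Java) implementation: the BufferSketch class and its methods are invented for this example, and durability via memory-mapped pages is omitted.

```python
PAGE_SIZE = 8192  # mirrors the 8KB default of buffer_page_size

class BufferSketch:
    """Toy append-only buffer made of fixed-size pages."""

    def __init__(self):
        self.pages = [[]]      # each page holds raw revision bytes
        self.page_bytes = 0    # bytes used in the current (last) page

    def append(self, revision: bytes) -> None:
        # O(1): either the current page has room, or we roll over.
        if self.page_bytes + len(revision) > PAGE_SIZE:
            self.pages.append([])
            self.page_bytes = 0
        self.pages[-1].append(revision)
        self.page_bytes += len(revision)

buf = BufferSketch()
for i in range(5000):
    buf.append(f"ADD name=value-{i} @record-{i}".encode())
print(len(buf.pages) > 1)  # True: writes spilled across many 8KB pages
```

Because every write is a constant-time append to the tail page, write latency stays flat regardless of how much data the server already holds.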

Database

The Database stores data in immutable Segments that are fully indexed for fast reads. Each Segment contains three types of indexes:

  • Table: Maps (record, key) to a set of values. This is the primary storage for record data.
  • Index: An inverted index mapping (key, value) to a set of records. This powers query evaluation.
  • Corpus: A full-text search index mapping (key, term) to a set of records. This powers search and CONTAINS queries.
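The fan-out of a single write into these three views can be sketched as follows. This is a hypothetical Python model (the names table, index, and corpus are just borrowed from the descriptions above), and it tokenizes whole words for brevity, whereas the real Corpus indexes substrings.

```python
from collections import defaultdict

table = defaultdict(set)    # (record, key)  -> values   (primary storage)
index = defaultdict(set)    # (key, value)   -> records  (query evaluation)
corpus = defaultdict(set)   # (key, term)    -> records  (full-text search)

def ingest(record: int, key: str, value: str) -> None:
    """Fan one write out into all three index views."""
    table[(record, key)].add(value)
    index[(key, value)].add(record)
    for term in value.lower().split():  # simplified: real Corpus uses substrings
        corpus[(key, term)].add(record)

ingest(1, "name", "Jeff Nelson")
ingest(2, "name", "Jeff Bezos")

print(table[(1, "name")])               # {'Jeff Nelson'}
print(index[("name", "Jeff Nelson")])   # {1}
print(corpus[("name", "jeff")])         # {1, 2}
```

Each view answers a different access pattern: the Table serves "what does record 1 hold for name?", the Index serves "which records have name = Jeff Nelson?", and the Corpus serves "which records mention jeff anywhere in name?".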

Transport

The transporter automatically moves data from the Buffer to the Database in the background. Concourse supports two transport strategies:

  • Batch (default): Processes large batches per pass without blocking operations until merge time. Provides higher throughput.
  • Streaming: Processes smaller batches more frequently. Provides consistent throughput but may suffer under high concurrency.

See Configuration for transport settings.

Automatic Indexing

When data is transported to the Database, it is automatically indexed three ways:

  1. Primary index (Table): Enables fast lookups by record ID and key.
  2. Secondary index (Index): Enables queries like find "age > 30" by providing value-to-record mappings.
  3. Search index (Corpus): Breaks string values into substrings and indexes them for full-text search.

This triple indexing happens transparently. You never need to create, manage, or think about indexes. Every value is indexed for equality, range, and search queries the moment it reaches the Database.

Version Control

Concourse never overwrites or deletes data. Every change is recorded as a revision — an immutable record of what happened:

Revision: [action, key, value, record, timestamp]

Where action is either ADD (value was stored) or REMOVE (value was unstored).

To determine the current state of a field, Concourse replays all revisions for that field in order and computes the net result. This append-only design is what enables time travel — because every historical state is preserved, you can query any point in the past.
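A minimal sketch of that replay, in Python. The tuple layout follows the Revision shown above; the replay function itself is hypothetical, not Concourse's storage code.

```python
# Revision tuples: (action, key, value, record, timestamp)
revisions = [
    ("ADD",    "age", 30, 1, 100),
    ("ADD",    "age", 31, 1, 200),
    ("REMOVE", "age", 30, 1, 200),
]

def replay(revs, key, record):
    """Compute the current values of a field by replaying its revisions."""
    values = set()
    for action, k, v, r, ts in revs:
        if (k, r) == (key, record):
            if action == "ADD":
                values.add(v)
            else:  # REMOVE
                values.discard(v)
    return values

print(replay(revisions, "age", 1))  # {31}: 30 was added, then removed
```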

Atomic Commit Timestamps

When multiple writes are committed as part of an atomic operation or transaction, all revisions share the same timestamp. This ensures that historical reads see atomic operations as a single point-in-time event — either all changes are visible at a given timestamp or none are.
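Adding a timestamp cutoff to a replay sketch shows why shared commit timestamps matter: a historical read at any instant sees either every revision of an atomic commit or none of them. (This is illustrative Python, not Concourse's storage code.)

```python
log = [
    ("ADD", "debit",  50, 1, 200),   # committed atomically, so both
    ("ADD", "credit", 50, 2, 200),   # revisions share timestamp 200
]

def read_at(log, key, record, at):
    """Replay only the revisions visible at the given timestamp."""
    values = set()
    for action, k, v, r, ts in log:
        if ts <= at and (k, r) == (key, record):
            if action == "ADD":
                values.add(v)
            else:
                values.discard(v)
    return values

print(read_at(log, "debit", 1, at=199), read_at(log, "credit", 2, at=199))  # set() set()
print(read_at(log, "debit", 1, at=200), read_at(log, "credit", 2, at=200))  # {50} {50}
```

Had the two revisions carried different timestamps, a read between them would observe a debit without the matching credit.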

Segments

The Database organizes data into immutable Segments using the v3 storage format (introduced in version 0.11). Each Segment is stored as a single .seg file containing all three index types (Table, Index, Corpus) plus metadata. The v3 format consolidates all data for a segment into a single file, replacing the older v2 format that used multiple .blk files per view.

Segment Lifecycle

  1. Data accumulates in the Buffer.
  2. The transporter batches the data and creates a new Segment.
  3. The Segment is written to disk as an immutable .seg file.
  4. The Segment’s indexes are loaded into memory for serving reads.
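The steps above can be condensed into a toy transport pass. Everything here is a simplification: the real transporter writes a fully indexed .seg file to disk, while this sketch just freezes the batch as an immutable tuple, sorted by timestamp (an assumed ordering for illustration, not a documented on-disk layout).

```python
def transport_pass(buffer, batch_size):
    """Drain one batch from the buffer and freeze it as a 'segment'."""
    batch, remaining = buffer[:batch_size], buffer[batch_size:]
    segment = tuple(sorted(batch, key=lambda rev: rev[-1]))  # immutable once built
    return segment, remaining

buffer = [
    ("ADD", "name", "b", 2, 300),
    ("ADD", "name", "a", 1, 100),
]
segment, buffer = transport_pass(buffer, batch_size=2)
print(segment[0][-1], len(buffer))  # 100 0: oldest revision first, buffer drained
```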

Caching

Concourse caches frequently accessed index records in memory using a memory-aware eviction policy. The cache automatically adapts to available heap space, evicting entries when memory pressure increases.

Search indexes are not cached by default due to their larger size. Set enable_search_cache to true if your workload is search-heavy and you have sufficient memory.

Compaction

When enable_compaction is set to true, Concourse runs a background compaction process that merges Segments to:

  • Eliminate redundant revisions: If a value was added and then removed, both revisions can be dropped.
  • Reduce file count: Fewer, larger Segments mean fewer files to open and fewer index records to search.
  • Improve read performance: Consolidated Segments have better data locality.

Compaction runs continuously in the background without blocking reads or writes.
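A rough model of the first bullet: compute the net effect of each (key, value, record) triple and keep only the revisions that still matter. This sketch is hypothetical and ignores how compaction must also preserve history needed for time-travel reads.

```python
def compact(revs):
    """Drop revision pairs that cancel out (an ADD followed by a REMOVE)."""
    net = {}
    for action, key, value, record, ts in revs:
        triple = (key, value, record)
        net[triple] = net.get(triple, 0) + (1 if action == "ADD" else -1)
    kept = {}
    for rev in revs:
        triple = (rev[1], rev[2], rev[3])
        if net[triple] != 0:
            kept[triple] = rev   # later revisions win for a live triple
    return list(kept.values())

log = [
    ("ADD",    "name", "temp",  1, 100),
    ("REMOVE", "name", "temp",  1, 200),  # cancels the ADD above
    ("ADD",    "name", "final", 1, 300),
]
print(compact(log))  # [('ADD', 'name', 'final', 1, 300)]
```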

Record Inventory

Concourse maintains a compact, in-memory inventory of all record IDs that have ever been created. This allows inventory() and holds() to respond instantly without scanning storage.

Concurrency

Just-in-Time Locking

Concourse uses just-in-time (JIT) locking for transactions. Instead of acquiring all locks upfront (which would reduce concurrency), locks are acquired as each operation executes:

  • Token locks: Protect specific (key, record) fields.
  • Range locks: Protect range queries from seeing inconsistent data during concurrent writes.
  • Shared locks: Coordinate record-level operations.
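The lazy-acquisition idea behind token locks can be sketched with standard Python locks. The class and method names here are invented for illustration; Concourse's actual lock machinery is internal to the server.

```python
import threading

class JITTransaction:
    """Toy transaction that acquires each token lock on first touch."""

    def __init__(self, lock_table):
        self.lock_table = lock_table   # shared (key, record) -> Lock map
        self.held = {}

    def touch(self, key, record):
        token = (key, record)
        if token not in self.held:     # just-in-time: lock only when used
            lock = self.lock_table.setdefault(token, threading.Lock())
            lock.acquire()
            self.held[token] = lock

    def release_all(self):
        for lock in self.held.values():
            lock.release()
        self.held.clear()

locks = {}
tx = JITTransaction(locks)
tx.touch("age", 1)
tx.touch("age", 1)          # second touch reuses the already-held lock
print(len(locks))           # 1
tx.release_all()
```

Because locks are taken only for fields a transaction actually touches, two transactions operating on disjoint fields never block each other.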

Conflict Detection

At commit time, Concourse checks for conflicts between concurrent transactions. If two transactions modified the same data, one succeeds and the other fails. The failed transaction must be retried by the application.
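This is the classic optimistic-concurrency pattern; a minimal sketch follows. The Store class and its per-field version counters are invented for illustration and are not Concourse's actual mechanism.

```python
class Store:
    """Tracks a version counter per (key, record) field."""

    def __init__(self):
        self.versions = {}

    def version(self, key, record):
        return self.versions.get((key, record), 0)

    def commit(self, read_versions, writes):
        # Fail if any field changed since the transaction observed it.
        for field, seen in read_versions.items():
            if self.versions.get(field, 0) != seen:
                return False           # conflict: caller must retry
        for field in writes:
            self.versions[field] = self.versions.get(field, 0) + 1
        return True

store = Store()
t1_reads = {("age", 1): store.version("age", 1)}
t2_reads = {("age", 1): store.version("age", 1)}   # concurrent snapshot
print(store.commit(t1_reads, [("age", 1)]))        # True
print(store.commit(t2_reads, [("age", 1)]))        # False: t1 won the race
```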

Data Directories

Concourse stores data in two configurable directories:

Directory            Default              Contents
buffer_directory     ~/concourse/buffer   Buffer pages, inventory, transaction backups
database_directory   ~/concourse/db       Segment files (.seg)

For optimal performance, place these directories on separate physical disks.
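For example, a configuration fragment might point each tier at a different mount (the paths below are placeholders, not defaults):

```properties
# Put the write-optimized Buffer on a low-latency disk...
buffer_directory = /mnt/nvme/concourse/buffer

# ...and the read-optimized Database on a larger disk.
database_directory = /mnt/data/concourse/db
```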