Search

Concourse automatically indexes every String value for full-text search. This means you can search for records containing specific words or phrases without any additional configuration or index management.

The search method performs a full-text search against a specific key and returns the IDs of all records that contain a matching value.

// Java
Set<Long> records = concourse.search("name", "Jeff");

// CaSH
search "name", "Jeff"

The search query is matched against the indexed text of every String value stored for the given key. Matching is case-insensitive and supports partial word matches (substring matching).

How Indexing Works

When a String value is written to Concourse, the storage engine automatically breaks it into substrings and indexes each one. This enables efficient substring searches without requiring the query to match the entire value.

For example, writing the value "Jeff Nelson" generates index entries for substrings such as "jeff", "jef", "eff", "je", "nelson", "nels", and "nel". A search for "eff" or "nels" therefore matches this value.
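The substring expansion described above can be modeled with a short sketch. This is a simplified illustration, not Concourse's actual indexer: the class and method names (`SubstringIndexSketch`, `substrings`, `matches`) are hypothetical, and it assumes each whitespace-delimited word is lowercased and expanded independently, so it only handles single-word queries.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of substring indexing; not Concourse's real implementation.
class SubstringIndexSketch {

    // Returns every lowercase substring of every whitespace-delimited word.
    static Set<String> substrings(String value) {
        Set<String> entries = new TreeSet<>();
        String[] words = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (String word : words) {
            for (int start = 0; start < word.length(); start++) {
                for (int end = start + 1; end <= word.length(); end++) {
                    entries.add(word.substring(start, end));
                }
            }
        }
        return entries;
    }

    // A query matches when its lowercase form is one of the index entries.
    static boolean matches(Set<String> entries, String query) {
        return entries.contains(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> entries = substrings("Jeff Nelson");
        System.out.println(matches(entries, "eff"));     // true
        System.out.println(matches(entries, "NELS"));    // true
        System.out.println(matches(entries, "jeffrey")); // false
    }
}
```

Lowercasing both the stored value and the query is one simple way to get the case-insensitive matching described earlier. The number of entries grows roughly with the square of the word length, which is why the index carries a storage cost.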

Tags Are Not Indexed

Values stored as Tags are not indexed for full-text search. Use Tags when you want to store string data without the overhead of search indexing (e.g., identifiers, codes, or structured data that will only be queried with exact-match operators).

Search in Queries

You can embed full-text search directly within CCL queries using the CONTAINS and NOT_CONTAINS operators. This allows you to combine search with other conditions in a single query.

CONTAINS

The CONTAINS operator finds records where a key’s value matches a search query:

name contains "Jeff"

// Java
Set<Long> records = concourse.find(
    "name", Operator.CONTAINS, "Jeff");

NOT_CONTAINS

The NOT_CONTAINS operator finds records where a key’s value does not match a search query:

name not_contains "admin"

// Java
Set<Long> records = concourse.find(
    "name", Operator.NOT_CONTAINS, "admin");

Combining Search with Other Conditions

Because CONTAINS and NOT_CONTAINS are standard query operators, you can combine them with any other operators in a CCL expression:

name contains "Jeff" and age > 30
department = "Engineering" and bio contains "distributed"

// Java
Set<Long> records = concourse.find(
    "name contains \"Jeff\" and age > 30");

This eliminates the need to perform a separate search call and then intersect the results with a find call.
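For comparison, the two-step approach that the combined query replaces amounts to a set intersection on the client. The sketch below models only that intersection step with plain Java sets. The record IDs are made-up stand-ins for the results of a separate search call and find call, and `IntersectSketch` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the client-side work a combined CCL query avoids.
class IntersectSketch {

    // Keeps only the record IDs present in both result sets.
    static Set<Long> intersect(Set<Long> searchHits, Set<Long> findHits) {
        Set<Long> both = new TreeSet<>(searchHits);
        both.retainAll(findHits);
        return both;
    }

    public static void main(String[] args) {
        // Made-up IDs standing in for the two separate result sets.
        Set<Long> nameContainsJeff = Set.of(1L, 2L, 5L);
        Set<Long> ageOver30 = Set.of(2L, 3L, 5L, 8L);
        System.out.println(intersect(nameContainsJeff, ageOver30)); // [2, 5]
    }
}
```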

Compiled Search Queries

Concourse internally compiles search queries for optimal performance. When a search query consists of a single token, Concourse uses an optimized algorithm (Boyer-Moore) for direct string matching rather than the general substring index. This compilation happens transparently and requires no configuration.

For repeated searches with the same query pattern, the compiled form is cached, further reducing overhead.
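To give a sense of what direct string matching in the Boyer-Moore family looks like, here is a minimal sketch of the Horspool simplification, which keeps only the bad-character shift table. It is an illustration under stated assumptions, not Concourse's internal code: the class and method names are hypothetical, and the full Boyer-Moore algorithm adds a good-suffix rule that is omitted here.

```java
import java.util.Arrays;
import java.util.Locale;

// Boyer-Moore-Horspool sketch (bad-character rule only); names are hypothetical.
class HorspoolSketch {

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        // shift[c]: how far to slide the window when the text character
        // aligned with the pattern's last position is c.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i)] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) return pos;
            pos += shift[text.charAt(pos + m - 1)];
        }
        return -1;
    }

    // Case-insensitive containment check for a single-token query.
    static boolean containsIgnoreCase(String value, String query) {
        return indexOf(value.toLowerCase(Locale.ROOT),
                       query.toLowerCase(Locale.ROOT)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("jeff nelson", "nels"));            // 5
        System.out.println(containsIgnoreCase("Jeff Nelson", "EFF"));  // true
    }
}
```

The shift table lets the scan skip several characters at a time when the last character of the window cannot line up with the pattern, which is where the speed-up over a naive scan comes from.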

Search Configuration

Maximum Substring Length

The max_search_substring_length configuration option controls the maximum length of substrings that are indexed for full-text search. The default is 40 characters.

Reducing this value decreases the storage overhead of search indexes but limits the length of search queries that can match. Increasing it allows longer search queries at the cost of larger indexes.

# concourse.yaml
max_search_substring_length: 40
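The trade-off can be seen in a small model of a length-capped substring index. This is a hedged sketch rather than Concourse's implementation: `SubstringLimitSketch` and its `maxLength` parameter are hypothetical stand-ins for the effect of `max_search_substring_length`, and a cap of 3 is used only to keep the numbers small.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of a length-capped substring index.
class SubstringLimitSketch {

    // Returns the distinct lowercase substrings of word, none longer than maxLength.
    static Set<String> substrings(String word, int maxLength) {
        Set<String> entries = new HashSet<>();
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start < lower.length(); start++) {
            int limit = Math.min(lower.length(), start + maxLength);
            for (int end = start + 1; end <= limit; end++) {
                entries.add(lower.substring(start, end));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        Set<String> capped = substrings("nelson", 3);
        Set<String> uncapped = substrings("nelson", 40);
        System.out.println(capped.contains("nel"));    // true
        System.out.println(capped.contains("nels"));   // false: longer than the cap
        System.out.println(capped.size());             // 14
        System.out.println(uncapped.size());           // 20
    }
}
```

In this model, lowering the cap shrinks the index from 20 entries to 14, but a four-character query such as "nels" no longer has an entry to match.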

Stopwords

Concourse indexes all words, including common stopwords (e.g., “the”, “is”, “and”). This ensures that searches for phrases containing stopwords return accurate results.

Stopword Policy Change

Prior to version 0.12, Concourse excluded common English stopwords from the search index. Starting in version 0.12, all words are indexed to ensure search accuracy. Searches involving stopwords therefore return accurate results for values written after the upgrade. Values written before the upgrade retain the old indexing behavior, so stopwords in those values may still be missing from the index.