Skip to content

Filter-Expression Hybrid Search

Since 0.5.40, BM25 text scoring and vector similarity are callable as ordinary expressions inside WHERE and RETURN clauses. The planner rewrites the matching predicate shapes into dedicated TextScan and VectorScan operators, so a text or vector index is used whenever one applies, and a brute-force per-row fallback kicks in when no index exists.

This is the unified-query companion to hybrid_search(): instead of calling a fusion API, you express the filter directly in GQL and let the planner pick the execution strategy.

When to use each

Need Use
Simple top-K fusion across text + vector hybrid_search()
Text or vector predicate combined with a MATCH pattern Filter expressions (this page)
AND/OR composition with other WHERE predicates Filter expressions
Score column in the result set Filter expressions (put the same call in RETURN)
Works without an index Both: filter expressions fall back to per-row eval

Functions

Function Returns Shape
text_score(n.prop, "query") Float64 (BM25 score, higher = more relevant) Use in WHERE with a threshold, or project in RETURN
text_match(n.prop, "query") Boolean (true if the document matches) Use directly as a WHERE predicate
cosine_similarity(n.vec, $q) Float64 (higher = more similar) WHERE threshold or RETURN projection
euclidean_distance(n.vec, $q) Float64 (lower = more similar) WHERE threshold (use <) or RETURN projection

The same names work in Cypher. SPARQL and SQL/PGQ follow the same shape where supported.

text_score with a threshold

MATCH (doc:Article)
WHERE text_score(doc.body, 'attention mechanisms') > 0.0
RETURN doc.title

With a text index on Article.body, the planner rewrites this into a TextScanOperator in threshold mode, pulling only matching documents from the inverted index. Without an index, the same query falls through to per-row BM25 evaluation (slow but correct).

text_match as a boolean

MATCH (doc:Article)
WHERE text_match(doc.body, 'rust')
RETURN doc.title

text_match is the index-friendly way to ask "does this document match the query at all?" and maps to the same TextScan operator without needing a threshold.

Top-K by score

Pair ORDER BY ... DESC LIMIT k with a score function and the planner recognizes it as top-K, pushing k into the underlying scan:

MATCH (doc:Article)
RETURN doc.title, text_score(doc.body, 'attention mechanisms') AS score
ORDER BY text_score(doc.body, 'attention mechanisms') DESC
LIMIT 10

The same pattern works for vector similarity:

MATCH (doc:Article)
RETURN doc.title
ORDER BY cosine_similarity(doc.embedding, [0.85, 0.15, 0.05]) DESC
LIMIT 5

Vector similarity thresholds

MATCH (doc:Article)
WHERE cosine_similarity(doc.embedding, [0.85, 0.15, 0.05]) > 0.5
RETURN doc.title

With a vector index, this pushes down into a VectorScanOperator. Without one, the planner falls back to brute-force per-row evaluation so the query still returns the correct rows.

Use euclidean_distance(...) < threshold for the distance formulation:

MATCH (doc:Article)
WHERE euclidean_distance(doc.embedding, [0.9, 0.1, 0.0]) < 0.5
RETURN doc.title

Operator direction matters for pushdown

Natural directions push down to an index scan: cosine_similarity(prop, q) > t, euclidean_distance(prop, q) < t, manhattan_distance(prop, q) < t, and text_score(prop, q) > t / text_score(prop, q) >= t. Inverted comparisons (e.g. cosine_similarity < t), dot_product (not currently pushdown-supported), and queries whose vector is not resolvable at plan time (property reference, unresolved parameter) also fall through to brute-force per-row evaluation.

Compound predicates (AND / OR)

Filter expressions compose with other WHERE predicates. AND narrows, OR unions:

MATCH (doc:Article)
WHERE cosine_similarity(doc.embedding, [0.85, 0.15, 0.05]) > 0.3
  AND text_match(doc.body, 'attention mechanisms')
RETURN doc.title
MATCH (doc:Article)
WHERE cosine_similarity(doc.embedding, [0.1, 0.9, 0.0]) > 0.9
   OR text_match(doc.body, 'attention mechanisms')
RETURN doc.title

Combining with graph patterns

Filter expressions run after pattern matching, so you can gate a traversal on similarity:

MATCH (u:User {name: 'Alix'})-[:FOLLOWS]->(friend)-[:WROTE]->(doc:Article)
WHERE cosine_similarity(doc.embedding, [0.85, 0.15, 0.05]) > 0.3
RETURN doc.title

Here the user → friend → article traversal produces candidate rows, and the vector similarity predicate filters them per-row.

Projecting the score

Reusing the same call in WHERE and RETURN does not recompute: the planner keeps the score column from the scan and projects it through.

MATCH (doc:Article)
WHERE text_score(doc.body, 'attention mechanisms') > 0.0
RETURN doc.title, text_score(doc.body, 'attention mechanisms') AS score

If you only need the score (no threshold), put it in RETURN without a WHERE clause. The planner falls back to per-row scoring, returning one row per matched node with a Float64 (or 0.0 for non-matches):

MATCH (doc:Article)
RETURN doc.title, text_score(doc.body, 'rust database') AS score

Graceful degradation without indexes

Missing index Behavior
No text index text_score returns 0.0 per row, text_match returns false per row, query still runs
No vector index cosine_similarity / euclidean_distance evaluate per-row over all candidates

Queries still return correct results in every case, but with an index the planner executes them through dedicated scan operators instead of a full scan.