Skip to content

Roadmap

Grafeo is a high-performance, embeddable graph database written in Rust. This roadmap shows where the project has been, where it is now and where it's going. Priorities may shift based on community feedback and real-world usage.

For detailed release notes, see the CHANGELOG.


Completed Releases

0.1: Foundation

Established the core graph engine: labeled property graph (LPG) storage with MVCC transactions, WAL persistence and multiple index types (hash, B-tree, trie, adjacency). Shipped the GQL (ISO standard) parser as the primary query language, with experimental support for Cypher, SPARQL, Gremlin and GraphQL. Python bindings via PyO3 from day one.

0.2: Performance

Made the engine competitive on query throughput. Factorized query processing eliminates Cartesian products in multi-hop traversals. Worst-case optimal joins (Leapfrog TrieJoin) handle cyclic patterns efficiently. Lock-free concurrent reads, query plan caching and direct lookup APIs brought large speedups on common access patterns. First graph algorithms (community detection, clustering coefficient, BFS shortest path). Added RDF Ring Index, Block-STM parallel transactions, tiered storage and succinct data structures as opt-in features.

0.3: AI Compatibility

Added first-class vector support: Value::Vector type, HNSW approximate nearest neighbor index, four distance metrics (cosine, euclidean, dot product, manhattan) with SIMD acceleration (AVX2, SSE, NEON). Vector quantization (scalar, binary, product) for memory-constrained deployments. Hybrid graph + vector queries across all supported query languages. Serializable snapshot isolation for write-heavy workloads.

0.4: Developer Accessibility

Expanded the binding surface: Node.js/TypeScript (napi-rs), Go (C FFI), WebAssembly (wasm-bindgen, 660 KB gzipped), SQL/PGQ (SQL:2023 GRAPH_TABLE). Shipped grafeo-cli with interactive shell, filtered/MMR vector search with incremental indexing. All public API items documented.


Current: 0.5, Beta

Preparing for production readiness.

The beta series focuses on correctness, completeness and real-world durability. Key areas:

Search and retrieval: BM25 text search, hybrid search (BM25 + vector via RRF/weighted fusion), optional in-process ONNX embeddings, MMR for diverse RAG results.

Graph algorithms: CALL procedure interface exposing all 22 algorithms (PageRank, Dijkstra, Louvain, SSSP, etc.) from GQL, Cypher and SQL/PGQ. Algorithms themselves were introduced in 0.2, the query-callable interface is new in 0.5.

Data management: Temporal types (Date, Time, Duration, DateTime), graph type enforcement with schema validation and constraints, LOAD DATA for CSV/JSONL/Parquet, named graph persistence, cross-graph transactions.

Transaction correctness: MVCC dirty-read prevention, DELETE rollback with full undo log, write-write conflict detection, session auto-rollback, savepoints.

Persistence: Single-file .grafeo format with dual-header crash safety, index metadata persistence (snapshot v4), read-only open mode with shared file lock for concurrent reader processes.

Bindings: C#/.NET 8, Dart/Flutter (community contribution), C FFI layer shared by Go, and C#.

Ecosystem: grafeo-server, grafeo-web, grafeo-mcp, grafeo-memory, grafeo-langchain, grafeo-llamaindex.

Delivered in 0.5.30-0.5.32

  • Async storage backend (0.5.30): AsyncStorageBackend trait, AsyncTypedWal with async WAL operations, AsyncLocalBackend filesystem implementation
  • CompactStore (0.5.31): read-only columnar store with per-label tables, double-indexed CSR adjacency, zone-map skip optimization, CompactStoreBuilder API. Integrates via GrafeoDB::with_read_store()
  • compact() method (0.5.32): one-call conversion from live database to CompactStore. Available in Python, Node.js, WASM, C, Rust. ~60x memory reduction, 100x+ traversal speedup
  • Hybrid Logical Clock (0.5.32): monotonic HLC timestamps in CDC events for causal ordering
  • Session CDC (0.5.32): mutations via query sessions (INSERT, SET, DELETE) now generate CDC events, buffered per-transaction
  • Correctness hardening (0.5.32): epoch monotonicity guarantees, concurrent stress tests, write-write conflict detection improvements
  • grafeo-server replication (0.5.32): atomic sync apply (all-or-nothing transaction visibility on replicas), persistent replica epoch tracking (replicas resume from last position after restart), CDC auto-activation on replication primaries, relaxed replica guard for sync endpoints

Delivered in 0.5.33

  • GraphChallenge benchmark suite: k-truss decomposition, parallel triangle counting, subgraph isomorphism (VF2), stochastic block partitioning
  • TSV/MMIO bulk import: import_tsv(), import_mmio(), import_tsv_rdf() for GraphChallenge datasets
  • RDF streaming Turtle: TripleSink-based streaming parser for large RDF datasets
  • RdfGraphStoreAdapter: bridges RdfStore to GraphStore, giving RDF graphs access to all 25+ graph algorithms

Delivered in 0.5.34

  • GQL schema hierarchy (ISO/IEC 39075 Section 4.2.5): CREATE SCHEMA/DROP SCHEMA, SESSION SET SCHEMA, full data isolation
  • Streaming RDF triple sink: TripleSink trait with BatchInsertSink and CountSink
  • Golden fixture tests: snapshot v4, .grafeo file format and WAL frame byte-equality checks
  • Feature matrix CI: per-profile build+test jobs (gql-only, gql+vector, gql+rdf, embedded, browser)

Delivered in 0.5.35

  • Persona-based feature profiles: lpg, rdf, analytics, ai, edge, enterprise replace deployment-target profiles. Old names (embedded, browser, server, full) kept as deprecated aliases, scheduled for removal in 0.7.0
  • Section-based container format: .grafeo files use a section directory with checksummed, independently addressable sections
  • grafeo-storage crate: persistence I/O extracted from grafeo-adapters as a sibling to grafeo-core
  • Block-based LPG/RDF section format (v2): structured layout with string tables, packed arrays, columnar property blocks, per-block CRC
  • Arrow IPC export: zero-copy export for DuckDB, Polars, pandas, DataFusion interop
  • GEXF + GraphML export: graph interchange for Gephi, Cytoscape, NetworkX, yEd, igraph
  • Incremental backup: backup_full(), backup_incremental(), restore_to_epoch() with CLI commands
  • CDC retention and eviction: CdcRetentionConfig with epoch-based and count-based limits
  • Python named graph management: create_graph(), drop_graph(), list_graphs(), set_graph(), set_schema()
  • Python per-transaction CDC override: begin_transaction_with_cdc(True|False)
  • Breaking changes: QueryResult.rows now private (use rows()/into_rows()), 95 public enums are #[non_exhaustive], storage format changed (databases from 0.5.34 or earlier must be re-created)

Delivered in 0.5.36

  • Role-based access control (Auth M1): Identity, Role (Admin, ReadWrite, ReadOnly), StatementKind for session-level permission scoping
  • Per-graph access grants: Grant type restricts identity access to specific named graphs
  • Graph projections: CREATE PROJECTION, DROP PROJECTION, SHOW PROJECTIONS in GQL, plus API in Python, Node.js, WASM and C
  • CSV/JSON Lines import: CLI grafeo import csv/grafeo import jsonl, Python import_csv()/import_jsonl(), Node.js importCsv()/importJsonl()
  • Gremlin repeat().times()/.emit(): fixed-depth and all-depths traversal
  • Unified aggregate accumulator: push-based aggregate operator gains all 30+ aggregate functions

Delivered in 0.5.37

  • SPARQL compliance pass: 18 spec gaps closed: CONSTRUCT, BIND, OPTIONAL, MINUS, UNION, FILTER, EXISTS/NOT EXISTS, named graph CRUD, SPARQL UPDATE. Composite indexes (SP, PO, OS) for O(1) multi-bound lookups. 109 new W3C tests
  • Ring Index planner: wavelet-tree compact triple index wired into the SPARQL planner with Leapfrog WCOJ for multi-way star joins and hash join fallback for LANG/DATATYPE columns. Ring Index persistence via bincode serialization to .grafeo container
  • SHACL validation: W3C Shapes Constraint Language with all 28 Core constraint types, SHACL-SPARQL (sh:sparql), 7 property path types with cycle detection, ValidationReport with to_triples() RDF materialization. Python: db.validate_shacl("shapes_graph")
  • EXPLAIN ANALYZE: physical plan tree without executing (EXPLAIN), or profiled execution with per-operator timing (EXPLAIN ANALYZE). Python explain_sparql() binding
  • Arrow bulk export: nodes_to_arrow()/edges_to_arrow() (pyarrow Table), nodes_to_polars()/edges_to_polars() (Polars DataFrame), nodes_to_pandas()/edges_to_pandas() (pandas DataFrame via Arrow). ~10-100x faster than per-element nodes_df()/edges_df() at scale
  • RDF query optimizer: per-predicate cardinality estimates, cached statistics, cost-based join reordering
  • Dictionary encoding: TermDictionary maps RDF terms to u32 IDs with lazy construction, DictResolveOperator resolves at result boundaries
  • COUNT(*) fast paths: O(1) for unbound scans via store.len(), O(log sigma) for bound patterns via Ring Index

Delivered in 0.5.38

  • Incremental backup fix (#267): backup_incremental always failed after a full backup because the WAL cursor was not rotated, making post-backup writes invisible
  • Edge variable resolution (#268): multi-hop queries returned edge variables as raw IDs instead of maps; centralized edge column registration in the query planner

Planned Releases

Version Focus
0.5.39 Performance and parallelism: Block-STM batch execution, WASM size optimization, cache-line aligned chunks, observability (metrics, spans, telemetry), graph analytics in Python/Node.js/WASM bindings
0.5.40 API stability and developer experience: stable/beta/experimental tier annotations, cursor-based streaming results, memory introspection, contributor documentation
0.5.41 Improved temporal queries: temporal indexes, GQL temporal syntax extensions, async storage server integration
0.5.42 Offline-first sync protocol, cross-language query translation, final 0.6.x blocker audit
0.5.43 Flutter/mobile builds (Android NDK, iOS xcframework), final feature profile audit and doc sweep

Next: 0.6, Release Candidate

No new major features. Bug fixes, community integrations, and quality of life.

The scope is intentionally narrow:

  • Bug fixes from real-world 0.⅚ usage
  • Performance tuning informed by actual workloads, not synthetic benchmarks
  • API ergonomics and documentation polish
  • Binary size and compile time optimization
  • C FFI parity refactor: expand grafeo-c to match Python/Node.js API surface, update downstream bindings

The goal is confidence: if something works in 0.6, it works in 1.0.


0.7: Pluggable Auth + Profile Cleanup (Auth M2)

  • Pluggable auth providers: AuthProvider trait with JWT and OIDC backends
  • Audit logging: structured request-level audit trail per identity
  • Deprecated feature profile removal: old names (embedded, browser, server, full) removed

1.0: Stable

Semantic versioning commitment. Public API frozen. No breaking changes without a major version bump.

  • Enterprise auth (Auth M3): row-level security, property masking, GRANT/REVOKE, LDAP/SAML, per-tenant encryption, compliance reporting

Future Considerations

Not scheduled, but on the radar:

  • Distributed/clustered deployment
  • Additional language bindings (Java/Kotlin, Swift)
  • Cloud-native integrations

Contributing

Interested in contributing? Check the GitHub Issues, join the Discussions or hop into the Discord server.


Last updated: April 2026