Compression¶
Grafeo uses type-specific compression for efficient storage.
Compression Strategies¶
| Type | Strategy | Description |
|---|---|---|
| Bool | Bit-packing | 8 bools per byte |
| Int64 | Delta + BitPack | Store differences, pack bits |
| Float64 | None / Gorilla | Raw or XOR-based |
| String | Dictionary | Common values stored once |
Dictionary Encoding¶
Graph properties like labels, types and categorical values repeat heavily. Dictionary encoding stores each unique string once and replaces occurrences with small integer codes, reducing memory use and enabling fast equality comparisons (integer compare instead of string compare).
Original: ["apple", "banana", "apple", "apple", "banana"]
Dictionary: {0: "apple", 1: "banana"}
Encoded: [0, 1, 0, 0, 1]
Delta Encoding¶
For sorted or sequential integers:
Bit-Packing¶
Pack small integers into minimal bits:
Compression Selection¶
Grafeo automatically selects compression based on data characteristics:
- Analyze sample of data
- Estimate compression ratio for each strategy
- Select best strategy per column