Compression¶
Grafeo uses type-specific compression for efficient storage.
Compression Strategies¶
| Type | Strategy | Description |
|---|---|---|
| Bool | Bit-packing | 8 bools per byte |
| Int64 | Delta + BitPack | Store differences, pack bits |
| Float64 | None / Gorilla | Raw or XOR-based |
| String | Dictionary | Common values stored once |
Dictionary Encoding¶
For string columns with repeated values:
Original: ["apple", "banana", "apple", "apple", "banana"]
Dictionary: {0: "apple", 1: "banana"}
Encoded: [0, 1, 0, 0, 1]
Delta Encoding¶
For sorted or sequential integers:
Bit-Packing¶
Pack small integers into minimal bits:
Compression Selection¶
Grafeo automatically selects compression based on data characteristics:
- Analyze sample of data
- Estimate compression ratio for each strategy
- Select best strategy per column