Component Deep Dive: src/compressor.rs

The compressor is a stateless helper that converts between the uncompressed Page representation and its compressed byte form. It serializes pages with bincode, compresses and decompresses the resulting bytes with lz4_flex, and returns the result wrapped in the corresponding page-cache entry type.

Source Highlights

src/compressor.rs
pub struct Compressor {}
impl Compressor {
    pub fn new() -> Self { … }
    pub fn compress(&self, Arc<PageCacheEntryUncompressed>) -> PageCacheEntryCompressed
    pub fn decompress(&self, Arc<PageCacheEntryCompressed>) -> PageCacheEntryUncompressed
}

Pipeline Overview

Arc<PageCacheEntryUncompressed>
       │
       │ 1) Extract Page struct
       ▼
bincode::serialize(Page) → Vec<u8>
       │
       │ 2) Compress with lz4_flex::compress_prepend_size
       ▼
Vec<u8> (compressed blob)
       │
       └─ wrapped as PageCacheEntryCompressed { page: Vec<u8> }

Decompression Path

Arc<PageCacheEntryCompressed>
       │
       │ 1) Extract Vec<u8> (LZ4 block with size prefix)
       ▼
lz4_flex::decompress_size_prepended(&[u8]) → Vec<u8>
       │
       │ 2) Deserialize with bincode → Page
       ▼
PageCacheEntryUncompressed { page: Page }

ASCII Visualization

                 ┌───────────────────────┐
                 │  Page (struct)        │
                 │  - page_metadata      │
                 │  - entries: Vec<Entry>│
                 └──────────┬────────────┘
                            │
        bincode serialize   │   lz4 compress (size prepended)
 Arc<PageCacheEntryUncompressed> ────────────────────────────► Vec<u8>
                            │
                            ▼
                 PageCacheEntryCompressed

Design Choices

  • bincode Serialization
    Lightweight binary serializer with low overhead for Serde-compatible types. Keeps serialization/deserialization cost minimal when pages churn between caches.

  • lz4_flex Compression
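    compress_prepend_size writes the uncompressed length at the front of the blob as a little-endian u32, and decompress_size_prepended reads it back to size the output buffer. The IO layer can therefore learn the uncompressed length of the serialized page from the first four bytes without decompressing the payload.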

  • Arc-Based API
    Accepting Arc<PageCacheEntry*> avoids cloning large vectors. Compressor accesses the underlying data via Arc::as_ref, maintaining shared ownership semantics consistent with the caches.
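
The size prefix and the Arc-based access can be illustrated with the standard library alone. This is a minimal sketch, not the project's code: CompressedEntry is a hypothetical stand-in for PageCacheEntryCompressed, declared_len is an illustrative helper, and the blob is fabricated by hand rather than produced by lz4_flex. It assumes only the documented lz4_flex layout, in which the uncompressed length is prepended as a little-endian u32.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for PageCacheEntryCompressed.
struct CompressedEntry {
    page: Vec<u8>,
}

/// Reads the uncompressed length stored in the first four bytes of a
/// size-prepended blob (little-endian u32). Returns None if the blob is
/// too short to contain the prefix.
fn declared_len(blob: &[u8]) -> Option<usize> {
    if blob.len() < 4 {
        return None;
    }
    let prefix = [blob[0], blob[1], blob[2], blob[3]];
    Some(u32::from_le_bytes(prefix) as usize)
}

fn main() {
    // Fabricated blob: a prefix declaring 4096 bytes, then opaque payload bytes.
    let mut page = 4096u32.to_le_bytes().to_vec();
    page.extend_from_slice(&[0xAA, 0xBB, 0xCC]);
    let entry = Arc::new(CompressedEntry { page });

    // A second handle shares the same allocation; the Vec<u8> is not copied.
    let shared = Arc::clone(&entry);
    let borrowed: &CompressedEntry = shared.as_ref();

    println!("declared uncompressed length: {:?}", declared_len(&borrowed.page));
}
```

Because the helper only inspects four bytes, a caller could use it to pre-size buffers or reject implausibly large blobs before paying for decompression.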

Error Handling

Currently, the serialization and deserialization calls use .unwrap(), so any failure panics. This reflects a prototype assumption that the in-memory structures are always valid. Production code should return a Result and handle corrupt blobs gracefully, especially when the blobs are read from disk.
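
One possible shape for a Result-returning decompression path is sketched below. Everything here is hypothetical: CompressorError, decompress_page, and the toy stages are illustrative names, and the inflate and decode parameters merely stand in for the real lz4_flex and bincode calls so that the example runs without external crates.

```rust
use std::fmt;

/// Hypothetical error type with one variant per pipeline stage.
#[derive(Debug, PartialEq)]
enum CompressorError {
    Decompress(String),
    Deserialize(String),
}

impl fmt::Display for CompressorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompressorError::Decompress(msg) => write!(f, "decompression failed: {}", msg),
            CompressorError::Deserialize(msg) => write!(f, "deserialization failed: {}", msg),
        }
    }
}

impl std::error::Error for CompressorError {}

/// Runs both stages and propagates failures with `?` instead of `.unwrap()`.
/// `inflate` and `decode` are stand-ins for the real decompress/deserialize calls.
fn decompress_page<P>(
    blob: &[u8],
    inflate: impl Fn(&[u8]) -> Result<Vec<u8>, String>,
    decode: impl Fn(&[u8]) -> Result<P, String>,
) -> Result<P, CompressorError> {
    let raw = inflate(blob).map_err(CompressorError::Decompress)?;
    decode(&raw).map_err(CompressorError::Deserialize)
}

// Toy stage for demonstration only: strips a one-byte 0x01 marker.
fn toy_inflate(blob: &[u8]) -> Result<Vec<u8>, String> {
    match blob.split_first() {
        Some((&0x01, rest)) => Ok(rest.to_vec()),
        Some(_) => Err("unknown marker byte".to_string()),
        None => Err("empty blob".to_string()),
    }
}

// Toy stage for demonstration only: interprets the bytes as UTF-8.
fn toy_decode(raw: &[u8]) -> Result<String, String> {
    String::from_utf8(raw.to_vec()).map_err(|e| e.to_string())
}

fn main() {
    match decompress_page(b"\x01page", toy_inflate, toy_decode) {
        Ok(page) => println!("decoded: {}", page),
        Err(err) => eprintln!("error: {}", err),
    }
    // A corrupt blob becomes an Err the caller can handle, not a panic.
    if let Err(err) = decompress_page(&[0xFF, 0x00], toy_inflate, toy_decode) {
        eprintln!("corrupt blob rejected: {}", err);
    }
}
```

Tagging each error with its stage lets a caller distinguish a damaged blob from a schema mismatch, which matters when deciding whether to retry a read or discard the entry.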

Integration Points

  • PageHandler::decompress_from_cpc calls Compressor::decompress to materialize pages from the compressed page cache (CPC) into the uncompressed page cache (UPC).
  • Future UPC eviction will call Compressor::compress before inserting into the CPC.
  • WAL and snapshot routines can reuse the same API to convert between formats without duplicating logic.

Extensibility Ideas

  • Support alternative compression strategies (e.g., ZSTD) via feature flags.
  • Embed checksum or schema version in the compressed envelope for forward compatibility.
  • Track compression ratios to inform caching heuristics or page splitting.
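
The checksum and version idea could look roughly like the sketch below. The envelope layout (one version byte, a four-byte little-endian checksum, then the payload), the seal and open helpers, and the choice of FNV-1a are all assumptions made for illustration; FNV-1a is used only because it needs no external crate, and a real design might prefer CRC32 or a similar checksum.

```rust
/// Hypothetical envelope layout: [version: u8][checksum: u32 LE][payload...].
const ENVELOPE_VERSION: u8 = 1;
const HEADER_LEN: usize = 5;

/// 32-bit FNV-1a hash over the payload bytes.
fn fnv1a32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Wraps an already-compressed payload in a versioned, checksummed envelope.
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.push(ENVELOPE_VERSION);
    out.extend_from_slice(&fnv1a32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Validates the header and returns a borrowed view of the payload.
fn open(envelope: &[u8]) -> Result<&[u8], &'static str> {
    if envelope.len() < HEADER_LEN {
        return Err("envelope too short");
    }
    if envelope[0] != ENVELOPE_VERSION {
        return Err("unsupported envelope version");
    }
    let stored = u32::from_le_bytes([envelope[1], envelope[2], envelope[3], envelope[4]]);
    let payload = &envelope[HEADER_LEN..];
    if fnv1a32(payload) != stored {
        return Err("checksum mismatch");
    }
    Ok(payload)
}

fn main() {
    let sealed = seal(b"compressed page bytes");
    println!("intact:    {:?}", open(&sealed).map(|p| p.len()));

    // Flip one bit in the payload to simulate on-disk corruption.
    let mut corrupted = sealed.clone();
    let last = corrupted.len() - 1;
    corrupted[last] ^= 0x01;
    println!("corrupted: {:?}", open(&corrupted).map(|p| p.len()));
}
```

Checking the version before the checksum means a reader can reject an envelope from a newer format without misreporting it as corruption.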