Component Deep Dive: src/metadata_store.rs
The metadata store is the in-memory catalog that maps logical column ranges to physical page locations and their MVCC (multi-version concurrency control) history. It is the authoritative guide for locating and versioning pages during reads and writes.
Source Map
src/metadata_store.rs
47 #[derive(Clone)] pub struct PageMetadata { id, disk_path, offset }
53 pub struct MVCCKeeperEntry { page_id, locked_by, commit_time }
59 pub struct TableMetaStoreEntry { start_idx, end_idx, page_metas }
65 pub struct RangeScanMetaResponse { page_metas: Vec<Arc<PageMetadata>> }
102 pub struct TableMetaStore { col_data, page_data }
145 impl TableMetaStore { pub fn new() -> Self { … } }
168 pub fn get_latest_page_meta(&self, …)
184 fn add_new_page_meta(&mut self, …)
192 fn add_new_page_to_col(&mut self, …)
214 pub fn get_ranged_pages_meta(&self, …)
Data Structures
High-Level Layout
┌────────────────────────────────────────────────────────────────────┐
│ TableMetaStore │
│ │
│ col_data : HashMap<Column, Arc<RwLock<Vec<TableMetaStoreEntry>>>> │
│ page_data: HashMap<PageId, Arc<PageMetadata>> │
└────────────────────────────────────────────────────────────────────┘
Column Catalog (col_data)
Column "temperature"
│
▼
Arc<RwLock<Vec<TableMetaStoreEntry>>>
│
├─ Entry[0]: covers rows [0, 1024)
│ page_metas:
│ ┌─────────────────────────────────────────┐
│ │ MVCCKeeperEntry │
│ │ page_id = "pA" │
│ │ commit_time= 1692200000000 │
│ │ locked_by = 0 (placeholder) │
│ └─────────────────────────────────────────┘
│ ┌─────────────────────────────────────────┐
│ │ MVCCKeeperEntry │
│ │ page_id = "pB" │
│ │ commit_time= 1692200100000 │
│ └─────────────────────────────────────────┘
│
└─ Entry[1]: covers rows [1024, 2048)
page_metas: [ MVCCKeeperEntry(page_id="pC", …) ]
- Each TableMetaStoreEntry spans a contiguous [start_idx, end_idx) range.
- page_metas holds MVCC versions sorted by commit_time (newest last).
- Arc<RwLock<_>> allows many readers to examine range metadata concurrently while enabling writers to append new versions.
Page Catalog (page_data)
page_data
"pA" -> Arc<PageMetadata { id="pA", disk_path="/data/t0.bin", offset=4096 }>
"pB" -> Arc<PageMetadata { id="pB", disk_path="/data/t0.bin", offset=8192 }>
"pC" -> Arc<PageMetadata { id="pC", disk_path="/data/t1.bin", offset=0 }>
PageMetadata owns the disk coordinates for each page. By centralizing the actual metadata objects here, col_data can store only page IDs (cheap clones) while page_data maintains shared ownership via Arc.
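To make the two catalogs concrete, here is a minimal sketch of the types and the "temperature" example from the diagrams above. The field names come from the source map; the concrete field types (String keys, u64 indices, offsets and timestamps) and the example_store helper are assumptions made for illustration, not the definitions in src/metadata_store.rs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Assumed aliases: the source map does not show the concrete key types.
pub type Column = String;
pub type PageId = String;

#[derive(Clone, Debug, PartialEq)]
pub struct PageMetadata {
    pub id: PageId,
    pub disk_path: String,
    pub offset: u64,
}

#[derive(Clone, Debug)]
pub struct MVCCKeeperEntry {
    pub page_id: PageId,
    pub locked_by: u8,
    pub commit_time: u64,
}

#[derive(Debug)]
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64,                     // exclusive upper bound
    pub page_metas: Vec<MVCCKeeperEntry>, // sorted by commit_time, newest last
}

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<Column, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<PageId, Arc<PageMetadata>>,
}

/// Hypothetical helper: rebuilds the "temperature" example from the diagrams.
pub fn example_store() -> TableMetaStore {
    let mut store = TableMetaStore::default();
    let pages = [
        ("pA", "/data/t0.bin", 4096u64),
        ("pB", "/data/t0.bin", 8192),
        ("pC", "/data/t1.bin", 0),
    ];
    for (id, path, offset) in pages {
        let meta = PageMetadata { id: id.to_string(), disk_path: path.to_string(), offset };
        store.page_data.insert(id.to_string(), Arc::new(meta));
    }
    let version = |id: &str, commit_time: u64| MVCCKeeperEntry {
        page_id: id.to_string(),
        locked_by: 0,
        commit_time,
    };
    let entries = vec![
        TableMetaStoreEntry {
            start_idx: 0,
            end_idx: 1024,
            page_metas: vec![version("pA", 1_692_200_000_000), version("pB", 1_692_200_100_000)],
        },
        // The diagram elides pC's commit_time; this value is only a placeholder.
        TableMetaStoreEntry {
            start_idx: 1024,
            end_idx: 2048,
            page_metas: vec![version("pC", 1_692_200_200_000)],
        },
    ];
    store.col_data.insert("temperature".to_string(), Arc::new(RwLock::new(entries)));
    store
}
```

Note that col_data entries carry only a PageId per version, so resolving disk coordinates always goes through page_data.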
Core Operations
Initialization
TableMetaStore::new()
│
├─ col_data := {}
└─ page_data := {}
Registering a New Page
fn add_new_page_to_col(col, disk_path, offset)
│
├─ new_page_id := add_new_page_meta(disk_path, offset)
│ └─ PageMetadata::new(disk_path, offset)
│ └─ page_data[page_id] = Arc<PageMetadata>
│
├─ ensure col_data[col] exists (Arc<RwLock<Vec<_>>>)
│
└─ write-lock Vec<TableMetaStoreEntry>
├─ if empty → push entry with range [0,1) and MVCCKeeperEntry(page_id, commit_time=now)
└─ else → extend last entry.end_idx += 1
last_entry.page_metas.push(MVCCKeeperEntry(page_id, commit_time=now))
Note: Page IDs are currently hard-coded ("1111111"). Real ID generation is a documented TODO.
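The registration flow above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/metadata_store.rs: the parameter types are guessed, the next_id counter and now_millis helper are hypothetical, and IDs are drawn from the counter (instead of the hard-coded value) only so that successive pages get distinct keys in page_data.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::{SystemTime, UNIX_EPOCH};

pub struct PageMetadata { pub id: String, pub disk_path: String, pub offset: u64 }
pub struct MVCCKeeperEntry { pub page_id: String, pub locked_by: u8, pub commit_time: u64 }
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64,
    pub page_metas: Vec<MVCCKeeperEntry>,
}

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<String, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<String, Arc<PageMetadata>>,
    pub next_id: u64, // hypothetical counter, stands in for real ID generation
}

/// Hypothetical helper: wall-clock milliseconds used as commit_time.
fn now_millis() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

impl TableMetaStore {
    /// Creates the PageMetadata and stores it in page_data.
    fn add_new_page_meta(&mut self, disk_path: &str, offset: u64) -> String {
        let id = format!("p{}", self.next_id);
        self.next_id += 1;
        let meta = PageMetadata { id: id.clone(), disk_path: disk_path.to_string(), offset };
        self.page_data.insert(id.clone(), Arc::new(meta));
        id
    }

    /// Registers a page for `col`, creating the column vector on first use.
    pub fn add_new_page_to_col(&mut self, col: &str, disk_path: &str, offset: u64) -> String {
        let page_id = self.add_new_page_meta(disk_path, offset);
        let version = MVCCKeeperEntry {
            page_id: page_id.clone(),
            locked_by: 0,
            commit_time: now_millis(),
        };
        // Panics if the lock is poisoned; a real implementation may handle this.
        let mut ranges = self.col_data.entry(col.to_string()).or_default().write().unwrap();
        if let Some(last) = ranges.last_mut() {
            // Non-empty column: widen the last range and append the version.
            last.end_idx += 1;
            last.page_metas.push(version);
        } else {
            // Empty column: start with the range [0, 1).
            ranges.push(TableMetaStoreEntry { start_idx: 0, end_idx: 1, page_metas: vec![version] });
        }
        page_id
    }
}
```

Calling add_new_page_to_col twice on the same column leaves a single entry covering [0, 2) with two versions, which matches the "extend last entry" branch of the flow.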
Fetch Latest Page for a Column
get_latest_page_meta(column)
│
├─ read-lock Arc<RwLock<Vec<TableMetaStoreEntry>>>
├─ take last TableMetaStoreEntry
├─ take its last MVCCKeeperEntry (most recent commit)
└─ look up page_data[page_id] → Option<&Arc<PageMetadata>>
The function returns Option<&Arc<PageMetadata>>: a reference borrowed from page_data rather than from the column's lock guard, so the caller can clone the Arc without prolonging the read lock.
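A minimal sketch of this lookup, assuming String keys and u64 fields (neither is shown in the source map). The explicit drop marks the point where the read lock is released, which happens before page_data is consulted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

pub struct PageMetadata { pub id: String, pub disk_path: String, pub offset: u64 }
pub struct MVCCKeeperEntry { pub page_id: String, pub locked_by: u8, pub commit_time: u64 }
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64,
    pub page_metas: Vec<MVCCKeeperEntry>,
}

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<String, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<String, Arc<PageMetadata>>,
}

impl TableMetaStore {
    /// Returns the newest page of the last range in `column`, or None if the
    /// column is unknown, empty, or its lock is poisoned (assumed behaviour).
    pub fn get_latest_page_meta(&self, column: &str) -> Option<&Arc<PageMetadata>> {
        let ranges = self.col_data.get(column)?.read().ok()?;
        // Last range, then its last (most recent) MVCC version.
        let page_id = ranges.last()?.page_metas.last()?.page_id.clone();
        drop(ranges); // release the read lock before touching page_data
        self.page_data.get(&page_id)
    }
}
```

A caller that needs ownership can write something like store.get_latest_page_meta("temperature").cloned(), which only bumps the Arc reference count.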
Range Scan Metadata (get_ranged_pages_meta)
Inputs: column, l_bound, r_bound, commit_time_upper_bound
1) Acquire read lock on column vector.
2) Binary-search first range whose end_idx > l_bound.
3) Iterate forward until start_idx >= r_bound:
For each TableMetaStoreEntry:
- Binary search MVCC versions to find the newest commit_time ≤ upper bound.
- Collect page_id clones.
4) Drop read lock.
5) Map collected page_ids -> Arc<PageMetadata> via page_data.
6) Return RangeScanMetaResponse { page_metas: Vec<Arc<PageMetadata>> }.
ASCII Flow Diagram
Range request: column="temperature", [1500, 2600), commit_time ≤ T (here T ≥ 200, so pD is visible)
┌─────────────┐ read-lock ┌────────────────────────────────┐
│ TableMeta… │──────────►│ Vec<TableMetaStoreEntry> │
└─────────────┘ │ [0,1024) → versions:[pA(T=90),│
│ pB(T=110)] │
│ [1024,2048) → versions:[pC(T=150)]│
│ [2048,3072) → versions:[pD(T=200)]│
└────────────────────────────────┘
│ select commits ≤ T
▼
page_ids = ["pC","pD"]
│ drop lock
▼
lookup page_data for coordinates
│
▼
RangeScanMetaResponse { page_metas: [Arc(PageMetadata{pC}), Arc(PageMetadata{pD})] }
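The steps and the flow above can be sketched with slice::partition_point, which performs the binary search in both places. This is a sketch under assumptions rather than the actual implementation: the parameter and field types are guessed, an unknown column yields an empty response, and a range with no version at or below the bound is skipped. The diagram_store helper and pD's disk coordinates are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

pub struct PageMetadata { pub id: String, pub disk_path: String, pub offset: u64 }
pub struct MVCCKeeperEntry { pub page_id: String, pub locked_by: u8, pub commit_time: u64 }
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64,
    pub page_metas: Vec<MVCCKeeperEntry>,
}
pub struct RangeScanMetaResponse { pub page_metas: Vec<Arc<PageMetadata>> }

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<String, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<String, Arc<PageMetadata>>,
}

impl TableMetaStore {
    pub fn get_ranged_pages_meta(
        &self,
        column: &str,
        l_bound: u64,
        r_bound: u64,
        commit_time_upper_bound: u64,
    ) -> RangeScanMetaResponse {
        let mut page_ids: Vec<String> = Vec::new();
        if let Some(col) = self.col_data.get(column) {
            // Step 1: read lock (panics if poisoned; real code may differ).
            let ranges = col.read().unwrap();
            // Step 2: first range whose end_idx > l_bound.
            let first = ranges.partition_point(|e| e.end_idx <= l_bound);
            // Step 3: walk forward until start_idx >= r_bound.
            for entry in ranges[first..].iter().take_while(|e| e.start_idx < r_bound) {
                // Number of versions with commit_time <= bound; the last is the newest.
                let visible = entry
                    .page_metas
                    .partition_point(|v| v.commit_time <= commit_time_upper_bound);
                if visible > 0 {
                    page_ids.push(entry.page_metas[visible - 1].page_id.clone());
                }
            }
        } // Step 4: the read guard is dropped here.
        // Step 5: resolve IDs to shared metadata handles.
        let page_metas = page_ids
            .iter()
            .filter_map(|id| self.page_data.get(id).cloned())
            .collect();
        RangeScanMetaResponse { page_metas }
    }
}

/// Hypothetical helper: the store from the flow diagram (pD's path and offset are made up).
pub fn diagram_store() -> TableMetaStore {
    let mut store = TableMetaStore::default();
    let pages = [("pA", "/data/t0.bin", 4096u64), ("pB", "/data/t0.bin", 8192),
                 ("pC", "/data/t1.bin", 0), ("pD", "/data/t1.bin", 4096)];
    for (id, path, offset) in pages {
        let meta = PageMetadata { id: id.to_string(), disk_path: path.to_string(), offset };
        store.page_data.insert(id.to_string(), Arc::new(meta));
    }
    let v = |id: &str, t: u64| MVCCKeeperEntry { page_id: id.to_string(), locked_by: 0, commit_time: t };
    let entries = vec![
        TableMetaStoreEntry { start_idx: 0, end_idx: 1024, page_metas: vec![v("pA", 90), v("pB", 110)] },
        TableMetaStoreEntry { start_idx: 1024, end_idx: 2048, page_metas: vec![v("pC", 150)] },
        TableMetaStoreEntry { start_idx: 2048, end_idx: 3072, page_metas: vec![v("pD", 200)] },
    ];
    store.col_data.insert("temperature".to_string(), Arc::new(RwLock::new(entries)));
    store
}
```

With this data, the request [1500, 2600) at an upper bound of 200 resolves to pC and pD, while lowering the bound to 150 leaves only pC because pD has no version that old.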
Concurrency Characteristics
- Column-level contention: Readers obtain RwLock::read guards and immediately clone Arc<PageMetadata> handles, minimising lock duration.
- MVCC version ordering: The append-only pattern avoids re-sorting by always pushing newer commits at the end. The binary search relies on commit_time values being non-decreasing within each entry.
- Thread-safety TODOs: MVCCKeeperEntry.locked_by is a plain u8; future work will replace it with an atomic counter for concurrent write coordination.
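As one possible direction for the locked_by TODO, the sketch below swaps the plain u8 for an AtomicU8 and claims it with compare_exchange. It is an assumption-heavy illustration: it treats locked_by as an owner tag (0 meaning unlocked, following the placeholder value in the diagram) rather than a counter, and the try_lock and unlock methods are hypothetical names, not part of the current API.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

pub struct MVCCKeeperEntry {
    pub page_id: String,
    pub locked_by: AtomicU8, // 0 = unlocked (assumed convention)
    pub commit_time: u64,
}

impl MVCCKeeperEntry {
    /// Tries to claim the entry for `writer_id`; returns false if another
    /// writer holds it. A writer_id of 0 is rejected because 0 means "free".
    pub fn try_lock(&self, writer_id: u8) -> bool {
        writer_id != 0
            && self
                .locked_by
                .compare_exchange(0, writer_id, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
    }

    /// Releases the entry, but only if `writer_id` is the current holder.
    pub fn unlock(&self, writer_id: u8) -> bool {
        writer_id != 0
            && self
                .locked_by
                .compare_exchange(writer_id, 0, Ordering::Release, Ordering::Relaxed)
                .is_ok()
    }
}
```

Because both methods take &self, writers could coordinate on an entry through shared references; whether that fits the existing column-level RwLock is a design question the sketch does not settle.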
Future Enhancements
- Real page ID generation (UUIDs or incremental IDs) within PageMetadata::new.
- Splitting and merging ranges when page boundaries are rebalanced.
- Persisting the metadata store to disk or a WAL to survive restarts.
- Tracking per-page reference counts to coordinate cache eviction with metadata removal.
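For the first item, incremental IDs could look roughly like the sketch below. PageIdGenerator and the "p" plus zero-padded hex format are hypothetical choices for illustration; nothing in the source fixes the ID format, and a UUID-based scheme would be an equally valid option.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical generator handing out process-unique, increasing page IDs.
pub struct PageIdGenerator {
    next: AtomicU64,
}

impl PageIdGenerator {
    pub const fn new() -> Self {
        Self { next: AtomicU64::new(0) }
    }

    /// Returns a fresh ID. fetch_add is atomic, so concurrent callers
    /// never receive the same value.
    pub fn next_id(&self) -> String {
        let n = self.next.fetch_add(1, Ordering::Relaxed);
        format!("p{:016x}", n)
    }
}
```

Such IDs are unique only within one process lifetime, so this approach would need to be combined with the persistence item to survive restarts.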