ContentId and ContentStore
This document describes the content-addressed identity and storage layer introduced by the storage-agnostic commits design. For the full design rationale, see Storage-agnostic commits and sync.
Overview
Fluree's storage-agnostic architecture separates identity (what something is) from location (where its bytes live). Every immutable artifact (commit, transaction payload, index root, index leaf, or dictionary blob) is identified by a ContentId (a CIDv1 value) and is stored and retrieved through the ContentStore trait.
Identity is a content ID; location is a local configuration detail.
ContentId
ContentId is a CIDv1 (multiformats) value that encodes three things:
- Version: CIDv1
- Multicodec: identifies what kind of artifact the bytes represent (e.g., Fluree commit, index root)
- Multihash: identifies the hash function + digest (SHA-256)
Multicodec assignments (private-use range)
Fluree uses the multicodec private-use range for type-tagged CIDs:
| Codec value | ContentKind | Description |
|---|---|---|
| 0x300001 | Commit | Commit payload |
| 0x300002 | Txn | Original transaction payload |
| 0x300003 | IndexRoot | Binary index root descriptor |
| 0x300004 | IndexBranch | Index branch manifest |
| 0x300005 | IndexLeaf | Index leaf file |
| 0x300006 | DictBlob | Dictionary artifact |
| 0x300007 | DefaultContext | Default JSON-LD @context |
String representation
The canonical string form is base32-lower multibase, the same encoding that produces the familiar bafy… / bafk… strings in IPFS/IPLD (the exact leading characters depend on the codec value). This is the form used in JSON APIs, logs, nameservice records, and CLI output.
bafybeigdyr... (commit CID)
bafkreihdwd... (index root CID)
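To make the string form concrete, here is a minimal sketch of base32-lower multibase encoding: the RFC 4648 base32 alphabet in lowercase, without padding, prefixed with the multibase character `b`. The function name `multibase_base32_lower` is hypothetical and not part of `fluree_db_core`; a real implementation would more likely rely on an existing multiformats crate.

```rust
/// RFC 4648 base32 alphabet, lowercased.
const ALPHABET: &[u8; 32] = b"abcdefghijklmnopqrstuvwxyz234567";

/// Encode `bytes` as multibase base32-lower: a leading 'b' followed by
/// unpadded lowercase base32. Hypothetical helper for illustration only.
pub fn multibase_base32_lower(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(1 + (bytes.len() * 8 + 4) / 5);
    out.push('b'); // multibase prefix for base32-lower

    let mut buffer: u32 = 0; // holds bits not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for &byte in bytes {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        // Emit every complete 5-bit group, most significant first.
        while bits >= 5 {
            bits -= 5;
            let index = ((buffer >> bits) & 0x1f) as usize;
            out.push(ALPHABET[index] as char);
        }
        // Keep only the leftover (fewer than 5) bits.
        buffer &= (1 << bits) - 1;
    }
    // Pad the final partial group with zero bits on the right.
    if bits > 0 {
        let index = ((buffer << (5 - bits)) & 0x1f) as usize;
        out.push(ALPHABET[index] as char);
    }
    out
}
```

Passing the binary CID bytes to a function like this would yield the string form; parsing reverses the process after checking the `b` prefix.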
Binary representation
The compact binary form (varint version + varint codec + multihash bytes) is used for:
- On-wire pack streams
- Internal caches and indexes
- Embedded references inside commit payloads
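The binary layout described above can be sketched as follows. This is an illustration under stated assumptions, not the actual `fluree_db_core` encoder: the helper names `write_varint` and `cid_v1_bytes` are hypothetical, and the SHA-256 digest is taken as an input rather than computed. The multihash is written as the hash function code (0x12 for SHA-256) and the digest length, followed by the digest itself.

```rust
const CID_VERSION_1: u64 = 0x01;
const SHA2_256_CODE: u64 = 0x12; // multihash code for SHA-256

/// Append `value` as an unsigned LEB128 varint (the multiformats varint form).
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        // Low 7 bits with the continuation bit set.
        out.push(((value & 0x7f) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Build the compact binary form: varint version + varint codec + multihash.
/// `sha256_digest` is assumed to be the SHA-256 digest of the artifact bytes.
pub fn cid_v1_bytes(codec: u64, sha256_digest: &[u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(40);
    write_varint(CID_VERSION_1, &mut out);
    write_varint(codec, &mut out);
    // Multihash: function code, digest length, digest.
    write_varint(SHA2_256_CODE, &mut out);
    write_varint(sha256_digest.len() as u64, &mut out);
    out.extend_from_slice(sha256_digest);
    out
}
```

Under these assumptions a commit CID (codec 0x300001) occupies 39 bytes: 1 for the version, 4 for the codec varint, 2 for the multihash header, and 32 for the digest.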
Creating a ContentId
A ContentId is derived by hashing the canonical bytes of an artifact with SHA-256, then wrapping the digest as a CIDv1 with the appropriate multicodec:
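use std::str::FromStr; // needed if from_str is provided by the FromStr trait

use fluree_db_core::content_id::{ContentId, ContentKind};

let bytes: &[u8] = /* canonical commit bytes */;
// Hash the bytes with SHA-256 and wrap the digest as a commit-tagged CIDv1
let cid = ContentId::from_bytes(ContentKind::Commit, bytes);
// String form for JSON/logs
let s = cid.to_string(); // "bafybeig..."
// Parse back (the ? assumes an enclosing function that returns a Result)
let parsed = ContentId::from_str(&s)?;
assert_eq!(cid, parsed);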
ContentId in commit references
Commits reference parents and related artifacts by ContentId only—never by storage addresses:
{
  "t": 42,
  "previous": "bafybeigdyr...commitParent",
  "txn": "bafkreihdwd...txnBlob",
  "index": "bafybeigdyr...indexRoot"
}
ContentKind
ContentKind is an enum that maps 1:1 to multicodec values. It serves two purposes:
- Embedded in CIDs: the multicodec tag lets stores, caches, and validators identify what an object is without parsing its bytes.
- Routing: the ContentStore uses ContentKind to route objects to the appropriate storage tier (commit store vs index store).
pub enum ContentKind {
    Commit,
    Txn,
    IndexRoot,
    IndexBranch,
    IndexLeaf,
    DictBlob,
    DefaultContext,
}
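One way the 1:1 mapping between ContentKind and the codec values in the multicodec assignments table might be expressed is sketched below. The enum here is a standalone copy for illustration, and the method names `codec` and `from_codec` are assumptions rather than the confirmed `fluree_db_core` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ContentKind {
    Commit,
    Txn,
    IndexRoot,
    IndexBranch,
    IndexLeaf,
    DictBlob,
    DefaultContext,
}

impl ContentKind {
    /// Private-use multicodec value for this kind (values from the table above).
    pub const fn codec(self) -> u64 {
        match self {
            ContentKind::Commit => 0x300001,
            ContentKind::Txn => 0x300002,
            ContentKind::IndexRoot => 0x300003,
            ContentKind::IndexBranch => 0x300004,
            ContentKind::IndexLeaf => 0x300005,
            ContentKind::DictBlob => 0x300006,
            ContentKind::DefaultContext => 0x300007,
        }
    }

    /// Inverse lookup; returns None for codecs outside the assigned set.
    pub const fn from_codec(codec: u64) -> Option<Self> {
        match codec {
            0x300001 => Some(ContentKind::Commit),
            0x300002 => Some(ContentKind::Txn),
            0x300003 => Some(ContentKind::IndexRoot),
            0x300004 => Some(ContentKind::IndexBranch),
            0x300005 => Some(ContentKind::IndexLeaf),
            0x300006 => Some(ContentKind::DictBlob),
            0x300007 => Some(ContentKind::DefaultContext),
            _ => None,
        }
    }
}
```

Returning an Option from the inverse lookup lets a parser reject CIDs whose codec is not one of the assigned values instead of guessing a kind.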
Routing by kind (replaces URL parsing)
Previously, storage routing parsed URL path segments (e.g., looking for "/commit/" in an address string). With ContentId, routing is explicit:
- Commit + Txn → commit-tier store(s)
- IndexRoot + IndexBranch + IndexLeaf + DictBlob → index-tier store(s)
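The routing rule above amounts to a single match on the kind, with no string inspection. The following sketch assumes a hypothetical `StorageTier` enum and `tier_for` function; neither name comes from the Fluree codebase. DefaultContext is not covered by the routing list, so the sketch returns None for it rather than guessing a tier.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentKind {
    Commit,
    Txn,
    IndexRoot,
    IndexBranch,
    IndexLeaf,
    DictBlob,
    DefaultContext,
}

/// Hypothetical storage tiers, named after the two tiers in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageTier {
    Commit,
    Index,
}

/// Pick a tier from the kind alone; no address or URL parsing is involved.
pub fn tier_for(kind: ContentKind) -> Option<StorageTier> {
    match kind {
        ContentKind::Commit | ContentKind::Txn => Some(StorageTier::Commit),
        ContentKind::IndexRoot
        | ContentKind::IndexBranch
        | ContentKind::IndexLeaf
        | ContentKind::DictBlob => Some(StorageTier::Index),
        // Assumption: the routing list does not say where this kind goes.
        ContentKind::DefaultContext => None,
    }
}
```

Because the match is exhaustive, adding a new ContentKind variant forces a compile-time decision about its tier.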
ContentStore trait
ContentStore provides content-addressed get/put operations keyed by ContentId:
#[async_trait]
pub trait ContentStore: Debug + Send + Sync {
    /// Retrieve bytes by content ID
    async fn get(&self, id: &ContentId) -> Result<Vec<u8>>;
    /// Store bytes, returning the computed ContentId
    async fn put(&self, kind: ContentKind, bytes: &[u8]) -> Result<ContentId>;
    /// Check whether an object exists
    async fn has(&self, id: &ContentId) -> Result<bool>;
}
Relationship to Storage trait
ContentStore is the primary abstraction for immutable object access. The Storage / StorageRead / ContentAddressedWrite traits handle address-routed I/O for the underlying storage backends (filesystem, S3, etc.), while ContentStore provides the content-addressed layer on top.
Implementations
- MemoryContentStore: In-memory HashMap<ContentId, Vec<u8>> for testing.
- BridgeContentStore: Adapter that wraps a Storage implementation, mapping ContentIds to physical storage addresses.
- Filesystem / S3 / IPFS: Direct implementations that store objects keyed by CID.
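The shape of an in-memory content-addressed store can be sketched with the standard library alone. This is a simplified, synchronous illustration and not the real MemoryContentStore: the names `Kind`, `SketchId` and `MemoryStoreSketch` are hypothetical, and the identifier uses the non-cryptographic 64-bit `DefaultHasher` as a stand-in for the SHA-256 multihash purely to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Kind {
    Commit,
    IndexRoot,
}

/// Stand-in identifier: kind tag plus a 64-bit digest (NOT SHA-256, NOT a CID).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SketchId {
    kind: Kind,
    digest: u64,
}

impl SketchId {
    /// Derive the ID from the bytes, so equal bytes give equal IDs.
    pub fn from_bytes(kind: Kind, bytes: &[u8]) -> Self {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        SketchId { kind, digest: hasher.finish() }
    }
}

#[derive(Debug, Default)]
pub struct MemoryStoreSketch {
    objects: HashMap<SketchId, Vec<u8>>,
}

impl MemoryStoreSketch {
    /// Store bytes and return the computed ID; repeated puts are idempotent.
    pub fn put(&mut self, kind: Kind, bytes: &[u8]) -> SketchId {
        let id = SketchId::from_bytes(kind, bytes);
        self.objects.entry(id).or_insert_with(|| bytes.to_vec());
        id
    }

    /// Retrieve bytes by ID, or None if the object is absent.
    pub fn get(&self, id: &SketchId) -> Option<&[u8]> {
        self.objects.get(id).map(Vec::as_slice)
    }

    /// Check whether an object exists.
    pub fn has(&self, id: &SketchId) -> bool {
        self.objects.contains_key(id)
    }
}
```

Because the key is derived from the bytes, callers never choose an address, and storing the same artifact twice leaves a single entry.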
Layered composition
ContentStore implementations can be layered:
Local cache (filesystem)
↓ miss
Shared store (S3 / IPFS / shared filesystem)
Reads fall through from cache to shared store. Writes go to both (policy-configurable).
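A read-through, write-to-both composition along these lines might look like the sketch below. It uses a simplified synchronous trait keyed by string IDs instead of the async ContentStore trait, and the names `BlobStore`, `MapStore` and `Layered` are hypothetical. Populating the cache after a miss is an assumption of this sketch, as is the fixed write-to-both policy.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a content store (synchronous, string-keyed).
pub trait BlobStore {
    fn get(&mut self, id: &str) -> Option<Vec<u8>>;
    fn put(&mut self, id: &str, bytes: &[u8]);
}

/// In-memory store that counts reads so the fall-through is observable.
#[derive(Default)]
pub struct MapStore {
    pub objects: HashMap<String, Vec<u8>>,
    pub reads: usize,
}

impl BlobStore for MapStore {
    fn get(&mut self, id: &str) -> Option<Vec<u8>> {
        self.reads += 1;
        self.objects.get(id).cloned()
    }

    fn put(&mut self, id: &str, bytes: &[u8]) {
        self.objects.insert(id.to_string(), bytes.to_vec());
    }
}

/// Cache in front of a shared store.
pub struct Layered<C, S> {
    pub cache: C,
    pub shared: S,
}

impl<C: BlobStore, S: BlobStore> BlobStore for Layered<C, S> {
    fn get(&mut self, id: &str) -> Option<Vec<u8>> {
        // Serve from the cache when possible.
        if let Some(bytes) = self.cache.get(id) {
            return Some(bytes);
        }
        // On a miss, fall through to the shared store and fill the cache.
        let bytes = self.shared.get(id)?;
        self.cache.put(id, &bytes);
        Some(bytes)
    }

    fn put(&mut self, id: &str, bytes: &[u8]) {
        // Write-to-both policy; other policies would change only this method.
        self.cache.put(id, bytes);
        self.shared.put(id, bytes);
    }
}
```

Since objects are immutable and keyed by content, a cached entry never goes stale, which is what makes this kind of layering safe without invalidation logic.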
How ContentId flows through the system
Transaction path
- Transactor produces commit bytes
- ContentId::from_bytes(ContentKind::Commit, &bytes) computes the CID
- content_store.put(Commit, &bytes) stores the blob
- Nameservice head is updated: commit_head_id = cid, commit_t = t
Index path
- Indexer builds binary index, producing root descriptor bytes
- ContentId::from_bytes(ContentKind::IndexRoot, &root_bytes) computes the CID
- All artifacts (branches, leaves, dicts) are stored via content_store.put()
- Nameservice index head is updated: index_head_id = cid, index_t = t
Query path
- Query engine reads nameservice to get index_head_id
- content_store.get(&index_head_id) fetches the index root
- Index root references branches/leaves/dicts by their ContentIds
- Each artifact is fetched via content_store.get() (with caching)
Replication path (clone/pull/push)
- Client fetches remote nameservice heads (ContentIds + watermarks)
- Client sends have[] / want[] roots to server
- Server walks commit chain and (optionally) index graph to compute missing objects
- Missing objects streamed as (ContentId, bytes) pairs
- Client stores objects in local ContentStore and advances local nameservice heads
No address rewriting is needed because commits contain no storage addresses.
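The server-side walk in the replication path can be illustrated with a small sketch. It assumes a linear chain in which each commit has at most one parent (the previous field), represents IDs as plain strings, and covers only commits, not the index graph. The names `ParentLinks` and `missing_commits` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Maps a commit ID to its parent ID (None for the first commit).
pub type ParentLinks = HashMap<String, Option<String>>;

/// Walk back from `want` through parent links until reaching a commit the
/// client already has, returning the missing commits oldest first.
pub fn missing_commits(
    links: &ParentLinks,
    want: &str,
    have: &HashSet<String>,
) -> Vec<String> {
    let mut missing = Vec::new();
    let mut cursor = Some(want.to_string());
    while let Some(id) = cursor {
        if have.contains(&id) {
            break; // the client has this commit and everything before it
        }
        // Step to the parent; the walk ends at the root or at an unknown ID.
        cursor = links.get(&id).cloned().flatten();
        missing.push(id);
    }
    missing.reverse(); // oldest first, so parents are stored before children
    missing
}
```

Each returned ID would then be resolved to bytes and streamed as a (ContentId, bytes) pair; the client can verify every object by rehashing it before storing it.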
Implementation status
- ContentId type and ContentKind enum: fluree-db-core/src/content_id.rs
- ContentStore trait + MemoryContentStore + bridge adapter: fluree-db-core/src/storage.rs
- Commit and CommitRef use ContentId for all references (index pointers are tracked exclusively via nameservice, not embedded in commits)
- Nameservice records use head_commit_id / index_head_id as ContentId values
- IndexRoot (FIR6) references all artifacts by ContentId
- Transact and indexer paths use ContentStore for all object I/O
Related documentation
- Storage-agnostic commits and sync — full design rationale
- Storage traits — existing storage trait hierarchy
- Index format — binary index format (IndexRoot / FIR6)
- Nameservice schema v2 — nameservice record schema