Namespace allocation and fallback modes
Fluree encodes IRIs as compact SIDs: a (ns_code, local) pair where:
- `ns_code` is a `u16` namespace code that identifies an IRI prefix
- `local` is the remaining suffix (bytes) after removing the matched prefix
The database maintains a namespace table (LedgerSnapshot.namespace_codes: ns_code -> prefix string).
That table is embedded in the published index root and is loaded whenever a LedgerSnapshot is opened.
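To make the encoding concrete, here is a minimal sketch of a SID and of expanding it back into an IRI through the namespace table. The `Sid` struct and the `expand` helper are hypothetical names for illustration, not Fluree's actual types, and `local` is modeled as a `String` for brevity even though the real suffix is stored as bytes.

```rust
use std::collections::HashMap;

/// Hypothetical compact SID: a namespace code plus the local suffix.
/// (Assumption: `local` is a String here; the real suffix is bytes.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Sid {
    pub ns_code: u16,
    pub local: String,
}

/// Rebuild the full IRI by looking up the prefix registered for
/// `sid.ns_code` in a table shaped like `ns_code -> prefix string`.
/// Returns `None` when the code is not present in the table.
pub fn expand(table: &HashMap<u16, String>, sid: &Sid) -> Option<String> {
    table
        .get(&sid.ns_code)
        .map(|prefix| format!("{}{}", prefix, sid.local))
}
```

With a table entry `7 -> http://example.org/`, the SID `(7, "alice")` expands to `http://example.org/alice`.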
This document describes how Fluree chooses a namespace prefix for an IRI, and how it mitigates datasets that would otherwise allocate an excessive number of distinct namespace prefixes.
Goals
- Keep declared namespaces intact: if a dataset declares `@prefix foo: <...>`, we want IRIs in that namespace to use that exact prefix, not a derived/split prefix.
- Stable behavior across writes: after importing an “outlier” dataset, subsequent transactions should continue using the same fallback rules for previously unseen IRIs (e.g. new hosts), avoiding regression back to finer-grained splitting.
- Contain namespace explosion: avoid allocating one namespace code per highly-specific leaf (e.g. splitting on the last `/` for IRIs whose paths are effectively unique).
Core rule: declared-prefix trie match wins
Namespace resolution is trie-first:
- Load all known prefixes (predefined defaults + DB namespace table) into a byte-level trie.
- For each IRI, perform a longest-prefix match.
- If a match is found, emit `Sid(ns_code, iri[prefix_len..])` and do not run fallback logic.
Only IRIs with no matching prefix fall through to the fallback splitter.
Implementation: fluree-db-transact/src/namespace.rs
- `NamespaceRegistry::sid_for_iri` (transactions, serial paths)
- `SharedNamespaceAllocator::sid_for_iri` (parallel bulk import)
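The trie-first rule can be sketched as a byte-level trie with a longest-prefix lookup. This is an illustrative sketch only: `PrefixTrie`, `TrieNode`, `insert`, and `longest_match` are hypothetical names, and the node layout (a `HashMap` per node) is an assumption chosen for readability rather than a description of the code in `namespace.rs`.

```rust
use std::collections::HashMap;

/// One node of a byte-level prefix trie (hypothetical layout).
#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    /// Set when a registered prefix ends exactly at this node.
    ns_code: Option<u16>,
}

/// Trie holding every known prefix (defaults plus the DB namespace table).
#[derive(Default)]
pub struct PrefixTrie {
    root: TrieNode,
}

impl PrefixTrie {
    /// Register `prefix` under `ns_code`, one trie level per byte.
    pub fn insert(&mut self, prefix: &str, ns_code: u16) {
        let mut node = &mut self.root;
        for &b in prefix.as_bytes() {
            node = node.children.entry(b).or_default();
        }
        node.ns_code = Some(ns_code);
    }

    /// Return `(ns_code, prefix_len)` for the longest registered prefix
    /// of `iri`, or `None` if no registered prefix matches.
    pub fn longest_match(&self, iri: &str) -> Option<(u16, usize)> {
        let mut node = &self.root;
        let mut best = node.ns_code.map(|code| (code, 0));
        for (i, b) in iri.as_bytes().iter().enumerate() {
            match node.children.get(b) {
                Some(next) => {
                    node = next;
                    // Remember the deepest node that ends a prefix.
                    if let Some(code) = node.ns_code {
                        best = Some((code, i + 1));
                    }
                }
                None => break,
            }
        }
        best
    }
}
```

A caller would emit the SID from `(ns_code, &iri[prefix_len..])` on a match and only invoke the fallback splitter when `longest_match` returns `None`.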
Fallback split modes (only for unmatched IRIs)
Fluree uses a small set of fallback “splitters” that derive (prefix, local) for IRIs that do not
match any known prefix.
The active fallback behavior is represented by NsFallbackMode:
- `LastSlashOrHash` (default): split on the last `/` or `#` (prefix is inclusive)
- `CoarseHeuristic` (outlier mitigation):
  - http(s): usually `scheme://host/<seg1>/`
  - special-case: DBLP-style `.../pid/<digits>/` buckets may keep 2 segments
  - non-http(s) with `:` but no `/` or `#`: split at the 2nd `:` when present (e.g. `urn:isbn:`), else the 1st `:`
- `HostOnly` (“fallback to the fallback”):
  - http(s): `scheme://host/`
  - non-http(s) with `:` but no `/` or `#`: split at the 1st `:`
  - else: last-slash-or-hash
Implementation: fluree-db-transact/src/namespace.rs
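The three modes can be approximated with a few string helpers. The sketch below is a simplified reading of the rules listed above, not the actual splitter: `split`, `host_end`, `colon_end`, and `last_slash_or_hash` are hypothetical names, the DBLP-style special case is deliberately omitted, and edge cases such as an http(s) IRI with no path are assumed to fall back to last-slash-or-hash.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NsFallbackMode {
    LastSlashOrHash,
    CoarseHeuristic,
    HostOnly,
}

/// Split after the last '/' or '#', keeping the delimiter in the prefix.
fn last_slash_or_hash(iri: &str) -> (&str, &str) {
    match iri.rfind(|c: char| c == '/' || c == '#') {
        Some(i) => iri.split_at(i + 1),
        None => ("", iri),
    }
}

/// For http(s) IRIs, the byte offset just past `scheme://host/`.
fn host_end(iri: &str) -> Option<usize> {
    let scheme_len = if iri.starts_with("http://") {
        "http://".len()
    } else if iri.starts_with("https://") {
        "https://".len()
    } else {
        return None;
    };
    iri[scheme_len..].find('/').map(|i| scheme_len + i + 1)
}

/// For IRIs with ':' but no '/' or '#', the offset just past the chosen ':'.
fn colon_end(iri: &str, prefer_second: bool) -> Option<usize> {
    if iri.contains('/') || iri.contains('#') {
        return None;
    }
    let first = iri.find(':')?;
    let second = iri[first + 1..].find(':').map(|j| first + 1 + j);
    match (prefer_second, second) {
        (true, Some(s)) => Some(s + 1),
        _ => Some(first + 1),
    }
}

/// Derive `(prefix, local)` for an IRI that matched no known prefix.
/// Assumption: the DBLP-style two-segment case is not modeled here.
pub fn split(mode: NsFallbackMode, iri: &str) -> (&str, &str) {
    let cut = match mode {
        NsFallbackMode::LastSlashOrHash => None,
        NsFallbackMode::CoarseHeuristic => match host_end(iri) {
            // Keep one path segment after the host when there is one.
            Some(h) => Some(iri[h..].find('/').map_or(h, |i| h + i + 1)),
            None => colon_end(iri, true),
        },
        NsFallbackMode::HostOnly => host_end(iri).or_else(|| colon_end(iri, false)),
    };
    match cut {
        Some(i) => iri.split_at(i),
        None => last_slash_or_hash(iri),
    }
}
```

Under these assumptions, `http://example.org/a/b/c` splits at `http://example.org/a/b/` in the default mode, at `http://example.org/a/` under `CoarseHeuristic`, and at `http://example.org/` under `HostOnly`.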
Bulk import: streaming preflight + dynamic mitigation
For large Turtle streaming imports, Fluree attempts to detect “namespace explosion” early without an extra I/O pass:
- `StreamingTurtleReader` samples bounded byte windows within the first chunk region and counts distinct prefixes under `LastSlashOrHash`.
- If the sample exceeds a budget (`NS_PREFLIGHT_BUDGET`, currently 255), the reader publishes a preflight result recommending mitigation.
- The import forwarder enables `CoarseHeuristic` on the shared allocator before parsing begins (so the earliest allocations are already coarse).
- If allocations under `CoarseHeuristic` still grow beyond the u8-ish threshold (>255), the shared allocator switches to `HostOnly` so new, unseen hosts do not allocate deeper-than-host namespaces.
Implementation:
- Preflight detector: `fluree-graph-turtle/src/splitter.rs`
- Policy application: `fluree-db-api/src/import.rs`
- Runtime switch: `SharedNamespaceAllocator::get_or_allocate` in `fluree-db-transact/src/namespace.rs`
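The preflight check and the runtime escalation can be sketched together as follows. Everything here is a simplified, single-threaded assumption: `preflight_recommends_mitigation` and `Allocator` are hypothetical names, the sample is passed in as already-extracted IRIs rather than byte windows, the threshold is compared against the total number of allocated prefixes, and `u16` exhaustion is not handled.

```rust
use std::collections::{HashMap, HashSet};

/// Budget named in the text; the value 255 comes from the document.
pub const NS_PREFLIGHT_BUDGET: usize = 255;
/// The "u8-ish" threshold after which the allocator escalates.
const U8_THRESHOLD: usize = 255;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NsFallbackMode {
    LastSlashOrHash,
    CoarseHeuristic,
    HostOnly,
}

/// Count distinct last-slash-or-hash prefixes in a sample and report
/// whether the count exceeds the preflight budget.
pub fn preflight_recommends_mitigation<'a>(
    sample: impl IntoIterator<Item = &'a str>,
) -> bool {
    let mut seen = HashSet::new();
    for iri in sample {
        let cut = iri
            .rfind(|c: char| c == '/' || c == '#')
            .map_or(0, |i| i + 1);
        seen.insert(&iri[..cut]);
        if seen.len() > NS_PREFLIGHT_BUDGET {
            return true; // stop early once the budget is blown
        }
    }
    false
}

/// Hypothetical allocator that tracks the active fallback mode.
pub struct Allocator {
    codes: HashMap<String, u16>,
    mode: NsFallbackMode,
}

impl Allocator {
    /// Start coarse when the preflight recommended mitigation.
    pub fn new(mitigate: bool) -> Self {
        let mode = if mitigate {
            NsFallbackMode::CoarseHeuristic
        } else {
            NsFallbackMode::LastSlashOrHash
        };
        Allocator { codes: HashMap::new(), mode }
    }

    pub fn mode(&self) -> NsFallbackMode {
        self.mode
    }

    /// Return the code for `prefix`, allocating a new one if needed, and
    /// escalate CoarseHeuristic to HostOnly once past the threshold.
    pub fn get_or_allocate(&mut self, prefix: &str) -> u16 {
        if let Some(&code) = self.codes.get(prefix) {
            return code;
        }
        let code = self.codes.len() as u16; // sketch: no overflow handling
        self.codes.insert(prefix.to_string(), code);
        if self.mode == NsFallbackMode::CoarseHeuristic && self.codes.len() > U8_THRESHOLD {
            self.mode = NsFallbackMode::HostOnly;
        }
        code
    }
}
```

The escalation is one-way in this sketch: once the allocator reaches `HostOnly`, it never returns to a finer mode, which matches the stated goal of avoiding regression.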
Transactions after import: preventing regression for unseen IRIs
Bulk import can upgrade fallback behavior at runtime (shared allocator). For subsequent normal
transactions, we also need “outlier mode” to persist so new IRIs do not regress to LastSlashOrHash.
Fluree derives this from the DB’s namespace table at open time:
- When a `LedgerSnapshot` is opened, `NamespaceRegistry::from_db(db)` loads `db.namespace_codes`.
- If the DB has already allocated namespace codes beyond the u8-ish threshold (>255), the registry sets its fallback mode to `HostOnly`.
That means a new IRI like:
http://some-unseen-host/blah/123/456
will allocate (if needed) at:
http://some-unseen-host/
instead of falling back to a finer last-slash split.
Implementation: NamespaceRegistry::from_db and NamespaceRegistry::sid_for_iri in
fluree-db-transact/src/namespace.rs
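As a rough illustration of deriving the mode at open time, the sketch below inspects a namespace table and picks a fallback mode. The function name `fallback_mode_for` is hypothetical, and it assumes that "beyond the threshold" means some allocated code is numerically greater than 255; the real check in `from_db` may be defined differently (for example, by table size).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NsFallbackMode {
    LastSlashOrHash,
    CoarseHeuristic,
    HostOnly,
}

/// Choose the fallback mode from a table shaped like `ns_code -> prefix`.
/// Assumption: any code above 255 marks the ledger as an "outlier".
pub fn fallback_mode_for(namespace_codes: &HashMap<u16, String>) -> NsFallbackMode {
    if namespace_codes.keys().any(|&code| code > 255) {
        NsFallbackMode::HostOnly
    } else {
        NsFallbackMode::LastSlashOrHash
    }
}
```

Because the decision depends only on the persisted table, it needs no extra state in the index root: reopening the same ledger always yields the same mode.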
Notes and trade-offs
- `HostOnly` can still result in many namespaces if a dataset genuinely contains many distinct hosts (one per host), but it prevents deeper fragmentation that is common in path-heavy IRIs.
- The `OVERFLOW` namespace code is a sentinel used when `u16` codes are exhausted; it is not a fallback mode. Overflow SIDs store the full IRI as the SID name.