Nameservice Schema v2 Design
Schema Version: 2
Overview
This document describes the design for a unified nameservice schema that supports:
- Ledgers with named graphs and independent indexing
- Non-ledger graph sources (indexes/mappings like BM25, Iceberg/R2RML, Vector/HNSW, JDBC, etc.) with varying versioning semantics
- Four independent atomic concerns that can be updated without contention
- Watermarked updates for client subscription and push notifications
- Pluggable backends (DynamoDB, S3, filesystem) with consistent semantics
Terminology:
- Prefer graph source in docs and user-facing API descriptions.
- Non-ledger data sources (BM25, vector, Iceberg, R2RML) are called graph sources.
Design Goals
- Stable schema: Minimize attribute changes as features evolve
- Flexible payloads: Use JSON Maps for evolving/variable content
- Reduced conflict probability: Logically independent concerns minimize contention
- Client subscriptions: Watermarks enable efficient change detection
- Coordination via status: Soft locks/leases for distributed process coordination
The Four Concerns Model
Each nameservice record has four independent concerns, each with its own watermark and payload:
| # | Concern | Watermark | Payload | Updated By |
|---|---|---|---|---|
| 1 | Head | commit_t | commit | Transactor (on commit) |
| 2 | Index | index_t | index | Indexer (on index publish) |
| 3 | Status | status_v | status | Various (state changes, metrics, locks) |
| 4 | Config | config_v | config | Admin (settings changes) |
Each concern can be pushed independently without affecting or contending with the others.
DynamoDB Schema
Table Name
fluree-nameservice (configurable)
Physical layout: item-per-concern (PK+SK)
DynamoDB serializes writes per item, not per attribute. To achieve true per-concern independence (transactor vs indexer vs admin), represent each concern as a separate item under the same address partition:
- pk (partition key): record address in the name:branch form (e.g., "mydb:main", "products-search:main")
- sk (sort key): concern discriminator
Recommended sk values:
- meta
- head (ledgers only)
- index (ledgers + graph sources)
- config (ledgers + graph sources)
- status (ledgers + graph sources)
This layout aligns with the file-backed v2 pattern (.index.json separate) while also eliminating DynamoDB physical contention between writers.
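The key layout above can be sketched in Rust as follows. This is a minimal illustration, not the repository's implementation: `ItemKind`, `RecordKind`, `item_key`, and `expected_items` are hypothetical names introduced here, and the sketch assumes that the meta item exists for both ledgers and graph sources.

```rust
/// Sort-key discriminators for the item-per-concern layout (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ItemKind {
    Meta,
    Head,
    Index,
    Config,
    Status,
}

impl ItemKind {
    /// The `sk` string stored on the item.
    pub fn sk(self) -> &'static str {
        match self {
            ItemKind::Meta => "meta",
            ItemKind::Head => "head",
            ItemKind::Index => "index",
            ItemKind::Config => "config",
            ItemKind::Status => "status",
        }
    }
}

/// Record discriminator carried on the meta item (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RecordKind {
    Ledger,
    GraphSource,
}

/// Build the (pk, sk) pair for one item of a record.
pub fn item_key(name: &str, branch: &str, item: ItemKind) -> (String, &'static str) {
    (format!("{}:{}", name, branch), item.sk())
}

/// Items expected under one pk: `head` is present for ledgers only.
pub fn expected_items(kind: RecordKind) -> Vec<ItemKind> {
    let mut items = vec![ItemKind::Meta];
    if kind == RecordKind::Ledger {
        items.push(ItemKind::Head);
    }
    items.extend_from_slice(&[ItemKind::Index, ItemKind::Config, ItemKind::Status]);
    items
}
```

Because every writer addresses a different `sk`, a transactor writing `("mydb:main", "head")` and an indexer writing `("mydb:main", "index")` never target the same item.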
Design Note: Per-Concern Independence
Each concern is logically independent:
- No shared updated_at: Each concern's watermark (commit_t, index_t, etc.) serves as its timestamp/version marker
- Disjoint items: Updating one concern does not touch any attributes of another concern
- Reduced conflict probability: Independent concerns minimize logical contention
With the item-per-concern layout, DynamoDB contention is limited to writers of the same concern.
Entity kinds and graph source types
The meta item carries the record discriminator:
- kind: ledger | graph_source
- source_type (graph sources only): a type string (e.g., f:Bm25Index, f:HnswIndex, f:IcebergSource, f:R2rmlSource, f:JdbcSource)
Use graph_source naming consistently in pk values and type strings.
Watermark Semantics
Watermarks are strictly monotonic per concern. This ensures:
- Clients can detect changes by comparing watermarks.
- No change is ever "invisible" to subscribers.
- Simple comparison logic: if remote_watermark > local_watermark then changed.
commit_t (Ledger commit watermark)
- Value: Equals the commit t (transaction time).
- Update rule: Strict monotonic (new_t > current_t).
- Rationale: Commits are already strictly ordered by t, so t is the version.
index_t (Index watermark)
- Value: Transaction time t that the published index covers.
- Update rule: Strict monotonic (new_t > current_t).
- Admin reindex: allow idempotent overwrite at the same t (new_t >= current_t) when rebuilding an index to the same watermark with a new address.
status_v (Status Watermark)
- Value: Atomic incrementing integer
- Update rule: Strict monotonic (new_v > current_v)
- Rationale: Status has no t relation; version is just a change counter
config_v (Config Watermark)
- Value: Atomic incrementing integer
- Update rule: Strict monotonic (new_v > current_v)
- Rationale: Config has no t relation; version is just a change counter
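The four update rules can be summarized as one predicate. The sketch below is illustrative only: `Watermark` and `update_allowed` are hypothetical names, and the `admin_reindex` flag is an assumed way of expressing the equal-t exception for index rebuilds.

```rust
/// The four watermarks described above (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Watermark {
    CommitT,
    IndexT,
    StatusV,
    ConfigV,
}

/// Returns true when `new` may replace `current`.
///
/// All watermarks require `new > current`. The single exception is an
/// admin reindex of `index_t`, which may overwrite at an equal value.
pub fn update_allowed(w: Watermark, current: i64, new: i64, admin_reindex: bool) -> bool {
    match w {
        Watermark::IndexT if admin_reindex => new >= current,
        _ => new > current,
    }
}
```

Note that the `admin_reindex` flag has no effect on the other three watermarks, which matches the rules listed per watermark.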
Unborn State Semantics
When a record is initialized but has no data yet for a concern:
| Concern | Unborn Watermark | Unborn Payload | Meaning |
|---|---|---|---|
| head | commit_t = 0 | commit = null | Ledger initialized, no commits yet |
| index | index_t = 0 | index = null | No index published yet |
| status | status_v = 1 | status = {state: "ready"} | Always has initial status |
| config | config_v = 0 | config = null | No config set yet |
Key distinction:
- *_v = 0 with payload = null: Initialized but unborn (record exists)
- Record not found (GetItem returns nothing): Unknown/never created
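A reader can make this distinction explicit when interpreting a lookup result. The following is a sketch under stated assumptions: `ConcernValue` mirrors the type shown in the Rust Types section of this document, while `ConcernState` and `classify` are hypothetical helpers. Any value that is neither missing nor exactly (0, null) is treated as having data here; a real implementation might want to flag inconsistent combinations separately.

```rust
/// Watermark plus optional payload, mirroring the document's `ConcernValue`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConcernValue<T> {
    pub v: i64,
    pub payload: Option<T>,
}

/// Three-way interpretation of a lookup result (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConcernState {
    /// The item was not found: unknown or never created.
    NotFound,
    /// The item exists with watermark 0 and a null payload.
    Unborn,
    /// The item exists and carries data.
    Born,
}

/// Classify the result of reading one concern.
pub fn classify<T>(value: Option<&ConcernValue<T>>) -> ConcernState {
    match value {
        None => ConcernState::NotFound,
        Some(ConcernValue { v: 0, payload: None }) => ConcernState::Unborn,
        Some(_) => ConcernState::Born,
    }
}
```

Under this sketch the initial status item (status_v = 1 with a payload) classifies as `Born`, consistent with the table above.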
Payload Schemas
commit (Ledger)
{
"id": "bafybeigdyr...commitCid",
"t": 42
}
| Field | Type | Description |
|---|---|---|
| id | String | ContentId (CIDv1) of the commit |
| t | Number | Transaction time (redundant with commit_t but explicit) |
See ContentId and ContentStore for details on the CID format.
index (Ledger with Named Graphs)
{
"default": {
"id": "bafybeig...indexRootDefault",
"t": 42,
"rev": 0
},
"txn-metadata": {
"id": "bafybeig...indexRootTxnMeta",
"t": 42,
"rev": 1
},
"audit-log": null
}
| Field | Type | Description |
|---|---|---|
| {named-graph} | Object or null | Index state per named graph |
| .id | String | ContentId (CIDv1) of the index root |
| .t | Number | Transaction time the index covers |
| .rev | Number | Revision at that t (0, 1, 2... for reindex operations) |
Named graph = null means that graph exists but hasn't been indexed yet.
index (Graph Source)
For graph sources with index state (e.g., BM25, vector, spatial, Iceberg, etc.), the nameservice stores a head pointer to the graph source's latest index root/manifest. The payload is intentionally opaque to nameservice: the graph source implementation defines what the ContentId points to and how (or whether) it supports time travel.
{
"id": "bafybeig...graphSourceIndexRoot",
"index_t": 42
}
For graph sources with no index concept (e.g., JDBC mappings): null.
Design note: Snapshot history (if any) is stored in graph-source-owned manifests in storage, not in nameservice. See docs/design/graph-source-index-manifests.md.
status
{
"state": "ready",
"queue_depth": 3,
"last_commit_ms": 45
}
| Field | Type | Description |
|---|---|---|
| state | String | Current state (see State Values below) |
| * | Any | Additional metadata varies by state and entity type |
State Values
| State | Description | Typical Metadata |
|---|---|---|
| ready | Normal operating state (default initial state) | queue_depth, last_commit_ms |
| indexing | Background indexing in progress | index_lock |
| reindexing | Full reindex in progress | reindex_lock, progress |
| syncing | Graph source syncing from source | progress, source_t, synced_t |
| maintenance | Administrative maintenance in progress | maintenance_lock |
| retracted | Soft-deleted | retracted_at, reason |
| error | Error state | error, error_at |
status with Locks (Coordination)
{
"state": "indexing",
"index_lock": {
"holder": "indexer-7f3a",
"target_t": 45,
"acquired_at": 1705312200,
"expires_at": 1705316100
}
}
| Field | Type | Description |
|---|---|---|
| index_lock | Object or null | Soft lock for indexing coordination |
| .holder | String | Identifier of the process holding the lock |
| .target_t | Number | The t being indexed |
| .acquired_at | Number | Unix epoch (seconds) when lock was acquired |
| .expires_at | Number | Unix epoch (seconds) when lock expires (lease timeout) |
config
{
"default_context_id": "bafkreih...contextCid",
"index_threshold": 1000,
"replication": {
"factor": 3,
"regions": ["us-east-1", "us-west-2"]
}
}
Config is fully flexible JSON. Common fields:
| Field | Type | Description |
|---|---|---|
| default_context_id | String | ContentId (CIDv1) of default JSON-LD context |
| index_threshold | Number | Commits before auto-index |
| replication | Object | Replication settings |
For graph sources, config contains type-specific settings:
BM25:
{
"k1": 1.2,
"b": 0.75,
"fields": ["title", "body", "description"]
}
JDBC:
{
"connection_string": "jdbc:postgresql://host:5432/db",
"schema": "public",
"pool_size": 10
}
DynamoDB Operations
CAS Semantics (Git-like Push)
All push operations support compare-and-set (CAS) semantics with expected old values. This enables Git-like divergence detection:
- Caller provides expected (the last-known state) and new (the desired state)
- Backend rejects if current state doesn't match expected
- On rejection, backend returns actual current state for caller to reconcile
This is stronger than simple watermark monotonicity: it detects divergence, not just staleness.
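To make the contract concrete, here is a minimal in-memory sketch of the same semantics. `ConcernValue` and `CasResult` mirror the types in the Rust Types section of this document; `MemorySlot` is a hypothetical stand-in for a single concern item and is not part of any backend. The sketch assumes that "match" means equality of both watermark and payload.

```rust
/// Watermark plus optional payload, mirroring the document's `ConcernValue`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConcernValue<T> {
    pub v: i64,
    pub payload: Option<T>,
}

/// Outcome of a push, mirroring the document's `CasResult`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CasResult<T> {
    Updated,
    Conflict { actual: Option<ConcernValue<T>> },
}

/// Hypothetical single-concern store used only for illustration.
pub struct MemorySlot<T> {
    current: Option<ConcernValue<T>>,
}

impl<T: Clone + PartialEq> MemorySlot<T> {
    pub fn new(initial: Option<ConcernValue<T>>) -> Self {
        Self { current: initial }
    }

    /// Accept `new` only if the stored value equals `expected` and the
    /// watermark advances; otherwise report the actual stored value.
    pub fn push(
        &mut self,
        expected: Option<&ConcernValue<T>>,
        new: ConcernValue<T>,
    ) -> CasResult<T> {
        let matches = self.current.as_ref() == expected;
        let advances = match expected {
            Some(e) => new.v > e.v,
            None => true,
        };
        if matches && advances {
            self.current = Some(new);
            CasResult::Updated
        } else {
            CasResult::Conflict { actual: self.current.clone() }
        }
    }
}
```

A second writer that still holds the old `expected` value gets `Conflict` even when its new watermark is higher than the stored one, which is the divergence case that a monotonic-only check would let through.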
Create (Initialize)
Operation: PutItem
ConditionExpression: attribute_not_exists(#pk)
ExpressionAttributeNames: { "#pk": "pk" }
Item: {
  pk: "mydb:main",
  sk: "meta",
  schema: 2,
  kind: "ledger",
  name: "mydb",
  branch: "main",
  dependencies: null,
  created_at: <now>, // optional, epoch seconds
  updated_at_ms: <now_ms>,
  retracted: false
}
push_commit (Publish Commit)
Option A: Monotonic only (simpler, allows fast-forward by any newer commit)
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "head" }
ConditionExpression: attribute_not_exists(#ct) OR #ct < :new_t
UpdateExpression: SET #ct = :new_t, #c = :commit
ExpressionAttributeNames: {
"#ct": "commit_t",
"#c": "commit"
}
ExpressionAttributeValues: {
":new_t": 42,
":commit": { "id": "bafybeig...commitT42", "t": 42 }
}
Option B: CAS with expected value (Git-like, detects divergence)
CAS checks both watermark equality AND payload equality. The condition is a single OR'd expression handling both existing and unborn cases:
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "head" }
// Single condition: existing case OR unborn case
ConditionExpression:
(#ct = :expected_t AND #c = :expected_commit AND :new_t > :expected_t)
OR
(#ct = :zero AND attribute_type(#c, :null_type) AND :new_t > :zero)
UpdateExpression: SET #ct = :new_t, #c = :commit
ExpressionAttributeNames: {
"#ct": "commit_t",
"#c": "commit"
}
ExpressionAttributeValues: {
":expected_t": 41, // caller's last-known watermark
":expected_commit": { "id": "bafybeig...commitT41", "t": 41 }, // caller's last-known payload
":new_t": 42,
":commit": { "id": "bafybeig...commitT42", "t": 42 },
":zero": 0,
":null_type": "NULL"
}
Caller logic: Set :expected_t and :expected_commit based on last-known state:
- If unborn: :expected_t = 0; :expected_commit can be any value (the unborn clause matches on #ct = :zero)
- If existing: :expected_t = last_t, :expected_commit = last_payload
Note: DynamoDB does support nested paths like #c.#id (with #c = commit, #id = id). However, comparing the entire map (#c = :expected_commit) is simpler and avoids partial-match edge cases.
Recommendation: Use Option B (CAS) for transactors to detect divergence. Use Option A for distributed sync where fast-forward is acceptable.
push_index (Publish Index)
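Monotonic watermark enforcement (the condition below only checks that the watermark advances; it does not compare against a caller-supplied expected value):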
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "index" }
ConditionExpression: (attribute_not_exists(#it) OR #it < :new_t)
UpdateExpression: SET #it = :new_t, #i = :index
ExpressionAttributeNames: {
"#it": "index_t",
"#i": "index"
}
ExpressionAttributeValues: {
":new_t": 42,
":index": {
"default": { "id": "bafybeig...indexDefault", "t": 42, "rev": 0 },
"txn-metadata": { "id": "bafybeig...indexTxnMeta", "t": 42, "rev": 1 }
}
}
Note: For admin rebuilds at the same watermark, allow #it <= :new_t as the condition (idempotent overwrite at equal t).
push_status (Update Status)
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "status" }
ConditionExpression: (#sv = :expected_v AND :new_v > :expected_v)
OR
(attribute_not_exists(#sv) AND :expected_v = :zero)
UpdateExpression: SET #sv = :new_v, #s = :status
ExpressionAttributeNames: {
"#sv": "status_v",
"#s": "status"
}
ExpressionAttributeValues: {
":expected_v": 89,
":zero": 0,
":new_v": 90,
":status": { "state": "ready", "queue_depth": 0 }
}
Note: status_v starts at 1 (not 0) on creation, so attribute_not_exists(#sv) handles cases where the attribute is missing (e.g., partially-written or manually-created items). Normal updates use the first clause.
push_config (Update Config)
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "config" }
ConditionExpression: (#cv = :expected_v AND :new_v > :expected_v)
OR
(#cv = :zero AND attribute_type(#c, :null_type) AND :expected_v = :zero)
UpdateExpression: SET #cv = :new_v, #c = :config
ExpressionAttributeNames: {
"#cv": "config_v",
"#c": "config"
}
ExpressionAttributeValues: {
":expected_v": 2,
":zero": 0,
":new_v": 3,
":config": { "default_context_id": "bafkreih...", "index_threshold": 500 },
":null_type": "NULL"
}
Note: Unborn clause checks both #cv = :zero AND attribute_type(#c, NULL) to prevent accepting writes against inconsistent states.
Retract
Operation: UpdateItem
Key: { pk: "mydb:main", sk: "meta" }
UpdateExpression: SET #r = :true, #sv = :new_sv, #s = :status
ExpressionAttributeNames: {
"#r": "retracted",
"#sv": "status_v",
"#s": "status"
}
ExpressionAttributeValues: {
":true": true,
":new_sv": 91,
":status": { "state": "retracted", "retracted_at": 1705315800 }
}
Lookup (Read)
Operation: GetItem
Key: { pk: "mydb:main", sk: "meta" }
ConsistentRead: true
To read full state, query all items for the record address: pk = "mydb:main" and assemble meta + head + index + status + config as present.
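The assembly step can be sketched as grouping the returned items by their sort key. This is an illustration only: `RawItem`, `Record`, and `assemble` are hypothetical names, and each item body is kept as an opaque string rather than a parsed DynamoDB attribute map.

```rust
/// Hypothetical simplified view of one item: its sort key plus an opaque body.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RawItem {
    pub sk: String,
    pub body: String,
}

/// Full logical record assembled from the items under one pk.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Record {
    pub meta: Option<String>,
    pub head: Option<String>,
    pub index: Option<String>,
    pub status: Option<String>,
    pub config: Option<String>,
}

/// Slot each item into the record by sort key; missing items stay `None`.
pub fn assemble(items: Vec<RawItem>) -> Record {
    let mut rec = Record::default();
    for item in items {
        match item.sk.as_str() {
            "meta" => rec.meta = Some(item.body),
            "head" => rec.head = Some(item.body),
            "index" => rec.index = Some(item.body),
            "status" => rec.status = Some(item.body),
            "config" => rec.config = Some(item.body),
            _ => {} // unknown sort keys are ignored in this sketch
        }
    }
    rec
}
```

A graph source record assembled this way simply has `head: None`, since graph sources carry no head item.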
List by Kind
Operation: Query (requires GSI on kind)
KeyConditionExpression: #kind = :kind
ExpressionAttributeNames: { "#kind": "kind" }
ExpressionAttributeValues: { ":kind": "ledger" }
To list graph sources, query kind = graph_source.
To list graph sources of a specific type (optional GSI), query source_type = f:Bm25Index, etc.
Push Result Handling
Each push operation returns one of:
| Result | Meaning | Action |
|---|---|---|
| Updated | Update accepted | Proceed |
| Conflict | Expected didn't match current | Reconcile using actual |
Rust Types (aligned with existing RefKind/CasResult vocabulary)
/// Which concern is being read or updated.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ConcernKind {
    /// The commit head pointer (`commit_t` + `commit` payload)
    Head,
    /// The index state (`index_t` + `index` payload)
    Index,
    /// The status state (`status_v` + `status` payload)
    Status,
    /// The config state (`config_v` + `config` payload)
    Config,
}

/// Value of a concern: watermark + optional payload.
///
/// - `Some(ConcernValue { v: 0, payload: None })`: unborn (initialized, no data)
/// - `Some(ConcernValue { v: N, payload: Some(...) })`: has data
/// - `None` (at Option level): record doesn't exist
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConcernValue<T> {
    pub v: i64,
    pub payload: Option<T>,
}

/// Outcome of a compare-and-set push operation.
///
/// Conflicts are NOT errors: they are expected outcomes of concurrent
/// writes and must be handled by the caller (retry, report, etc.).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CasResult<T> {
    /// CAS succeeded: the concern was updated to the new value.
    Updated,
    /// CAS failed: `expected` did not match the current value.
    /// `actual` carries the current concern value so the caller can decide
    /// what to do next (retry, diverge, etc.).
    Conflict { actual: Option<ConcernValue<T>> },
}
Conflict Handling
On Conflict, the caller receives the actual current state and can:
- Fast-forward: If actual.v < new.v, retry with expected = actual
- Divergence: If actual.v >= new.v or addresses differ unexpectedly, handle merge/error
- Retry loop: For distributed systems, implement bounded retry with backoff
async fn push_with_retry<T>(
    ns: &impl ConcernPublisher<T>,
    address: &str,
    kind: ConcernKind,
    new: ConcernValue<T>,
    max_retries: usize,
) -> Result<CasResult<T>> {
    let mut expected = ns.get_concern(address, kind).await?;
    for _ in 0..max_retries {
        match ns.push_concern(address, kind, expected.as_ref(), &new).await? {
            CasResult::Updated => return Ok(CasResult::Updated),
            CasResult::Conflict { actual } => {
                // Check if fast-forward is still possible
                if let Some(ref act) = actual {
                    if new.v <= act.v {
                        // Diverged - can't fast-forward
                        return Ok(CasResult::Conflict { actual });
                    }
                }
                // Retry with new expected
                expected = actual;
            }
        }
    }
    // Exhausted retries
    let actual = ns.get_concern(address, kind).await?;
    Ok(CasResult::Conflict { actual })
}
Example Records
DynamoDB (item-per-concern) examples
This section shows the DynamoDB physical layout (multiple items per address partition). Other backends serialize the same logical concerns differently.
Ledger (typical items)
Ledger records are represented as multiple items under the same pk:
{
"pk": "mydb:main",
"sk": "meta",
"schema": 2,
"kind": "ledger",
"name": "mydb",
"branch": "main",
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false
}
{
"pk": "mydb:main",
"sk": "head",
"schema": 2,
"commit_t": 42,
"commit": { "id": "bafybeig...commitT42", "t": 42 }
}
{
"pk": "mydb:main",
"sk": "index",
"schema": 2,
"index_t": 42,
"index": {
"default": { "id": "bafybeig...indexDefaultT42", "t": 42, "rev": 0 }
}
}
{
"pk": "mydb:main",
"sk": "config",
"schema": 2,
"config_v": 2,
"config": { "default_context_id": "bafkreih...contextCid", "index_threshold": 1000 }
}
{
"pk": "mydb:main",
"sk": "status",
"schema": 2,
"status_v": 89,
"status": { "state": "ready", "queue_depth": 3, "last_commit_ms": 45 }
}
Ledger (unborn)
An "unborn" ledger has all five items (meta plus the four concern items) created atomically at initialization. The head and index items have watermarks set to 0 with null payloads. The status item starts at status_v=1 with state="ready". The config item starts at config_v=0 (unborn).
Graph Source (BM25)
{
"pk": "search:main",
"sk": "meta",
"schema": 2,
"kind": "graph_source",
"source_type": "f:Bm25Index",
"name": "search",
"branch": "main",
"dependencies": ["mydb:main"],
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false
}
Additional concern items for the same pk (examples):
{
"pk": "search:main",
"sk": "config",
"schema": 2,
"config_v": 1,
"config": { "k1": 1.2, "b": 0.75, "fields": ["title", "body"] }
}
{
"pk": "search:main",
"sk": "index",
"schema": 2,
"index_t": 42,
"index": { "id": "bafybeig...bm25IndexRoot" }
}
Graph Source (Iceberg)
{
"pk": "analytics:main",
"sk": "meta",
"schema": 2,
"kind": "graph_source",
"source_type": "f:IcebergSource",
"name": "analytics",
"branch": "main",
"dependencies": ["mydb:main"],
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false,
"...": "see config/index items"
}
Graph Source (JDBC - No Index)
{
"pk": "erp:main",
"sk": "meta",
"schema": 2,
"kind": "graph_source",
"source_type": "f:JdbcSource",
"name": "erp",
"branch": "main",
"dependencies": null,
"created_at": 1705312200,
"updated_at_ms": 1705312200123,
"retracted": false,
"...": "see config item; index item may be absent or have index_t=0"
}
Git-like Push Model
The nameservice follows a git-like model where:
- Local nameservice: Each node has a local NS for reads and local writes
- Upstream nameservice: The "source of truth" that accepts or rejects pushes
- Push operations: Local changes are pushed upstream
- Forward operations: Requests can be forwarded upstream without local write
┌─────────────────┐         push_head         ┌─────────────────────┐
│   Transactor    │ ────────────────────────▶ │                     │
│   (local NS)    │                           │     Upstream NS     │
└─────────────────┘                           │                     │
                                              │  - DynamoDB, or     │
┌─────────────────┐        push_index         │  - S3 + ETags, or   │
│     Indexer     │ ────────────────────────▶ │  - FS + locks, or   │
│   (local NS)    │                           │  - Service          │
└─────────────────┘                           │                     │
         ▲                                    │  Enforces:          │
         │             pull/sync              │  - Watermark rules  │
         └────────────────────────────────────│  - Serialization    │
                                              └─────────────────────┘
Upstream NS Backend Options
| Backend | How It Enforces Rules |
|---|---|
| DynamoDB | Conditional expressions on watermarks |
| S3 | ETags for CAS + application logic |
| Filesystem | File locks or single-writer process |
| Service | Queue + application logic |
The push interface is the same regardless of backend.
Status-based Coordination (Soft Locks)
Status can carry soft locks for coordinating distributed processes:
Lock Acquisition Flow
1. Indexer starts up
2. Read current status
3. If index_lock exists and not expired:
→ Another indexer is working, wait or skip
4. If no lock or lock expired:
→ Push status with our lock claim (status_v + 1)
→ If accepted: we own the lock, proceed
→ If rejected: someone else claimed it, back off
5. Do indexing work (periodically refresh lock by pushing status)
6. Push index update
7. Push status: clear lock, set state to ready
Lock Expiry (Crash Recovery)
If a process crashes while holding a lock:
- The expires_at timestamp allows other processes to take over
- No manual intervention needed
- Typical lease duration: 5-15 minutes depending on operation
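The expiry check in the acquisition flow can be sketched as a pure decision function. This is a hedged illustration: `IndexLock` mirrors the index_lock fields shown earlier, while `LockDecision`, `decide`, and the `lease_secs` parameter are hypothetical. The sketch treats a lock as expired once `now` reaches `expires_at`, and it does not special-case a holder refreshing its own lock.

```rust
/// Soft lock fields, mirroring the `index_lock` payload in this document.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct IndexLock {
    pub holder: String,
    pub target_t: i64,
    pub acquired_at: u64,
    pub expires_at: u64,
}

/// What a process should do after reading status (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum LockDecision {
    /// Push status with this lock claim (still subject to CAS on status_v).
    Claim(IndexLock),
    /// A live lock exists; wait or skip.
    Busy { holder: String },
}

/// Decide whether to claim the lock, given the current lock (if any).
/// Times are Unix epoch seconds.
pub fn decide(
    current: Option<&IndexLock>,
    holder: &str,
    target_t: i64,
    now: u64,
    lease_secs: u64,
) -> LockDecision {
    match current {
        // An unexpired lock held by someone: back off.
        Some(lock) if lock.expires_at > now => LockDecision::Busy {
            holder: lock.holder.clone(),
        },
        // No lock, or the lease has expired: propose a new claim.
        _ => LockDecision::Claim(IndexLock {
            holder: holder.to_string(),
            target_t,
            acquired_at: now,
            expires_at: now + lease_secs,
        }),
    }
}
```

A `Claim` result is only a proposal. Ownership is established when the subsequent status push is accepted, because two processes can both observe an expired lock and only one of their pushes will win.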
Lock Refresh
Long-running operations should periodically refresh their lock:
{
"state": "indexing",
"index_lock": {
"holder": "indexer-7f3a",
"target_t": 45,
"acquired_at": 1705312200,
"expires_at": 1705316100,
"refreshed_at": 1705314000
},
"progress": 0.67
}
Client Subscription Model
Clients track watermarks to detect changes:
{
"subscriptions": {
"mydb:main": {
"kind": "ledger",
"commit_t": 42,
"index_t": 42,
"status_v": 89,
"config_v": 2
},
"search:main": {
"kind": "graph_source",
"source_type": "f:Bm25Index",
"index_t": 42,
"status_v": 12,
"config_v": 1
}
}
}
Change Detection
- Client polls or receives notification
- Compare watermarks: if remote.commit_t > local.commit_t
- Fetch only the changed concern(s)
- Update local cache
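The comparison step generalizes to all four watermarks. The sketch below is an assumption-laden illustration: `Watermarks`, `advanced`, and `changed_concerns` are hypothetical names, and each watermark is optional because graph source subscriptions carry no commit_t. A watermark that appears remotely but is unknown locally is treated as changed.

```rust
/// The four concerns, mirroring the document's `ConcernKind`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConcernKind {
    Head,
    Index,
    Status,
    Config,
}

/// Watermarks tracked for one address (hypothetical type).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Watermarks {
    pub commit_t: Option<i64>,
    pub index_t: Option<i64>,
    pub status_v: Option<i64>,
    pub config_v: Option<i64>,
}

/// True when the remote watermark is strictly ahead of the local one.
fn advanced(local: Option<i64>, remote: Option<i64>) -> bool {
    match (local, remote) {
        (Some(l), Some(r)) => r > l,
        (None, Some(_)) => true,
        (_, None) => false,
    }
}

/// List the concerns whose remote watermark is ahead of the local cache.
pub fn changed_concerns(local: &Watermarks, remote: &Watermarks) -> Vec<ConcernKind> {
    let pairs = [
        (ConcernKind::Head, local.commit_t, remote.commit_t),
        (ConcernKind::Index, local.index_t, remote.index_t),
        (ConcernKind::Status, local.status_v, remote.status_v),
        (ConcernKind::Config, local.config_v, remote.config_v),
    ];
    let mut changed = Vec::new();
    for (kind, l, r) in pairs.iter().copied() {
        if advanced(l, r) {
            changed.push(kind);
        }
    }
    changed
}
```

The client then fetches only the items named in the returned list and stores the new watermarks alongside the refreshed payloads.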
Subscription Granularity
Clients can subscribe to:
- All concerns for an address
- Specific concerns (e.g., only commit_t for a query client)
- All addresses of a kind (e.g., all ledgers)
File-backed Nameservice Considerations
The logical concerns (head/index/status/config) can be stored in different physical layouts depending on the backend.
The file-backed and storage-backed implementations in this repo use the ns@v2 JSON-LD format (see fluree-db-nameservice/src/file.rs and fluree-db-nameservice/src/storage_ns.rs):
- Main record: ns@v2/{name}/{branch}.json (commit/head + status + config-ish fields)
- Index record: ns@v2/{name}/{branch}.index.json (index head pointer only)
Field names differ from the DynamoDB layout, but the semantics match:
- logical commit_t is stored as f:t
- logical commit.id is stored as f:ledgerCommit.@id (a CID string)
- logical index_t is stored as f:ledgerIndex.f:t (or f:indexT for graph source index files)
- logical index.id is stored as f:ledgerIndex.@id (a CID string, or f:indexId for graph source index files)
Layout Options
Option A: Single File (Unified)
ns@v2/{name}/{branch}.json
- Contains all four concerns in one file
- Simplest for reads (one fetch)
- Requires single-writer discipline or file-level CAS
Option B: Separate Head and Index Files (Current Implementation)
ns@v2/{name}/{branch}.json # head + status + config
ns@v2/{name}/{branch}.index.json # index only
- Matches current implementation
- Allows transactor and indexer to write independently
- 2 files to read per entity for full state
- Trade-off: Status and config updates contend with head updates at file-lock level. Acceptable if status updates are low-frequency (state changes only, not high-frequency metrics).
Option C: Fully Separate Files (Maximum Independence)
ns@v2/{name}/{branch}.head.json
ns@v2/{name}/{branch}.index.json
ns@v2/{name}/{branch}.status.json
ns@v2/{name}/{branch}.config.json
- Each concern in its own file
- Maximum write independence
- 4 files to read per entity
Recommended Approach
Use Option B (separate head/index) as the default:
- Proven in current implementation
- Solves the main contention issue (transactor vs indexer)
- Reasonable read overhead (2 files)
- Constraint: Status updates should be coarse-grained (state transitions, not per-transaction metrics). If high-frequency status updates are needed, consider Option C.
Use Option C (fully separate files) when:
- Status updates are frequent (e.g., real-time queue depth reporting)
- Multiple independent processes update different concerns
- Write independence is more important than read efficiency
For queryable nameservice with many entities:
- Read files in parallel
- Consider in-memory caching with file-change notification
- The 2-file layout is acceptable; 4-file layout may add too much I/O
Atomicity Mechanisms
| Backend | Mechanism | Notes |
|---|---|---|
| Filesystem | Atomic rename (write to temp, rename) | POSIX guarantees |
| S3 | ETags for CAS | If-Match header |
| GCS | Generation numbers | Similar to ETags |
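For the filesystem row, the write-to-temp-then-rename pattern can be sketched with the Rust standard library. This is not the repository's implementation: `write_atomic` is a hypothetical helper, and the temp-file naming (replacing the final extension with `tmp`) is an assumption that only suits a single writer per file.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Replace `path` with `contents` so that readers see either the old file
/// or the new one, never a partial write.
pub fn write_atomic(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    // Temp file lives in the same directory so the rename stays on one filesystem.
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents)?;
        // Flush file data to disk before it becomes visible under the final name.
        file.sync_all()?;
    }
    // Swap the new file into place.
    fs::rename(&tmp, path)
}
```

Atomic rename only protects readers from torn writes. It does not enforce the watermark rules by itself, so the file-backed layout still relies on file locks or a single-writer process, as noted in the backend table earlier.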
File Content Format
Each file contains JSON matching the concern's payload plus metadata:
head file ({name}/{branch}.json):
{
"@context": { "f": "https://ns.flur.ee/db#" },
"@id": "mydb:main",
"@type": ["f:Database", "f:LedgerSource"],
"f:ledger": { "@id": "mydb" },
"f:branch": "main",
"f:ledgerCommit": { "@id": "bafybeig...commitT42" },
"f:t": 42,
"f:ledgerIndex": { "@id": "bafybeig...indexRootT42", "f:t": 42 },
"f:status": "ready"
}
index file ({name}/{branch}.index.json):
{
"@context": { "f": "https://ns.flur.ee/db#" },
"f:ledgerIndex": { "@id": "bafybeig...indexRootT42", "f:t": 42 }
}
Global Secondary Indexes (GSIs)
GSI1: gsi1-kind (Implemented)
| GSI Name | Partition Key | Sort Key | Use Case |
|---|---|---|---|
| gsi1-kind | kind | pk | List all entities of a kind (ledger, graph_source) |
- Only
- Only meta items carry the kind attribute and project into the GSI
- Projection: INCLUDE with name, branch, source_type, dependencies, retracted
- Used by all_records() (kind=ledger) and all_vg_records() (kind=graph_source)
- After GSI query returns meta items, BatchGetItem fetches remaining concern items (config, index) to assemble full records
Future GSIs
| GSI Name | Partition Key | Sort Key | Use Case |
|---|---|---|---|
| source-type-index | source_type | pk | List all graph sources of a given type |
| state-index | status_state | pk | Find entities in specific state |
Note on state-index: DynamoDB GSIs cannot use nested map attributes as keys. To enable this GSI:
- Add an optional denormalized attribute status_state (String) on the status item
- Update status_state whenever status.state changes
- Only add it if you need GSI-based queries by state
Alternative: Use Scan with FilterExpression on status.state (less efficient but no schema extension needed)
Future Considerations
Streams and Events
DynamoDB Streams can be enabled to:
- Trigger Lambda on changes
- Build event sourcing
- Replicate to other regions
Multi-region
For global deployments:
- Use DynamoDB Global Tables
- Or regional nameservices with cross-region sync
Appendix: Attribute Reference
All items share:
| Attribute | Type | Description |
|---|---|---|
| pk | String | Record address (name:branch) |
| sk | String | Concern discriminator (meta, head, index, status, config) |
| schema | Number | Schema version (always 2) |
meta item
| Attribute | Type | Description |
|---|---|---|
| kind | String | ledger or graph_source |
| name | String | Base name |
| branch | String | Branch name |
| retracted | Boolean | Soft-delete flag |
| branches | Number | Child branch reference count (0 for leaf branches, omitted when 0 in JSON-LD) |
| dependencies | List<String> or null | Graph-source dependencies (optional) |
| source_type | String or null | Graph-source type (e.g., f:Bm25Index) |
| created_at | Number | Creation timestamp (epoch seconds, optional) |
| updated_at_ms | Number | Last update time (epoch millis, optional) |
meta item: Branch Attributes
For branches created via create_branch, the meta item carries an additional attribute recording the source branch:
| Attribute | Type | Description |
|---|---|---|
| bp_source | String or null | Source branch name (e.g., "main") |
This attribute is null/absent for the original main branch. The JSON-LD format uses f:sourceBranch. The divergence point between a branch and its source is computed on demand by walking the commit chains rather than being stored.
head item (ledgers only)
| Attribute | Type | Description |
|---|---|---|
| commit_t | Number | Commit watermark (t) |
| commit | Map or null | { id, t } (id is a ContentId CID string) |
index item (ledgers + graph sources)
| Attribute | Type | Description |
|---|---|---|
| index_t | Number | Index watermark (t) |
| index | Map or null | Ledger index map or graph-source head pointer payload |
status item (ledgers + graph sources)
| Attribute | Type | Description |
|---|---|---|
| status_v | Number | Status change counter |
| status | Map | Status payload |
| status_state | String or null | Optional denormalized status.state for a GSI |
config item (ledgers + graph sources)
| Attribute | Type | Description |
|---|---|---|
| config_v | Number | Config change counter |
| config | Map or null | Config payload |
Watermark Semantics Summary
| Watermark | Semantics | Initial Value | Update Rule |
|---|---|---|---|
| commit_t | = commit t | 0 (unborn) | Strict: new > current |
| index_t | = index t | 0 (unborn) | Strict: new > current (admin may allow equal) |
| status_v | Counter | 1 (ready) | Strict: new > current |
| config_v | Counter | 0 (unborn) | Strict: new > current |