Indexing Side-Effects

Transactions in Fluree trigger background indexing processes that build query-optimized data structures. Understanding these side-effects is crucial for performance tuning and capacity planning.

What is Indexing?

Indexing is the process of building query-optimized data structures from transaction data. Fluree maintains four index permutations (SPOT, POST, OPST, PSOT) that enable efficient query execution.

Commit vs Index

Commit (immediate):

Transaction written to log
Small, append-only files
Published to nameservice immediately
Available for time travel queries

Index (asynchronous):

Query-optimized structures built
Background process
Published to nameservice when complete
May lag behind commits

Index Structure

Fluree maintains four index permutations:

SPOT (Subject-Predicate-Object-Time)

ex:alice → schema:name → "Alice" → [t=1, t=5, t=10]
ex:alice → schema:age → 30 → [t=1]
ex:alice → schema:age → 31 → [t=10]

Optimized for: "What are all properties of this subject?"

POST (Predicate-Object-Subject-Time)

schema:name → "Alice" → ex:alice → [t=1, t=5, t=10]
schema:age → 30 → ex:alice → [t=1]
schema:age → 31 → ex:alice → [t=10]

Optimized for: "What subjects have this property/value?"

OPST (Object-Predicate-Subject-Time)

"Alice" → schema:name → ex:alice → [t=1, t=5, t=10]
30 → schema:age → ex:alice → [t=1]
31 → schema:age → ex:alice → [t=10]

Optimized for: "What subjects have this value?"

PSOT (Predicate-Subject-Object-Time)

schema:name → ex:alice → "Alice" → [t=1, t=5, t=10]
schema:age → ex:alice → 30 → [t=1]
schema:age → ex:alice → 31 → [t=10]

Optimized for: "What are all values for this predicate?"
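One way to picture the four permutations is as the same flakes sorted in a different component order, so that each question above becomes a prefix scan. The sketch below is a minimal in-memory model. The flake shape (`{ s, p, o, t }`, one `t` per flake) and the `buildIndex`/`scanPrefix` helpers are illustrative assumptions. They are not Fluree's storage format or API.

```javascript
// Illustrative sketch only: each permutation is the same set of flakes sorted
// in a different component order. Real indexes are persisted tree structures.
const ORDERS = {
  SPOT: ["s", "p", "o", "t"],
  POST: ["p", "o", "s", "t"],
  OPST: ["o", "p", "s", "t"],
  PSOT: ["p", "s", "o", "t"],
};

// Compare values that may differ in type (numbers sort before strings here).
function compareValues(a, b) {
  if (typeof a !== typeof b) return typeof a < typeof b ? -1 : 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Return a copy of the flakes sorted in the named permutation's order.
function buildIndex(flakes, name) {
  const order = ORDERS[name];
  return [...flakes].sort((x, y) => {
    for (const key of order) {
      const c = compareValues(x[key], y[key]);
      if (c !== 0) return c;
    }
    return 0;
  });
}

// All flakes whose leading components match `prefix`. This is a linear scan
// for clarity; a real index would locate the sorted range directly.
function scanPrefix(index, name, prefix) {
  const order = ORDERS[name];
  return index.filter((f) => prefix.every((v, i) => f[order[i]] === v));
}

const flakes = [
  { s: "ex:alice", p: "schema:name", o: "Alice", t: 1 },
  { s: "ex:alice", p: "schema:age", o: 30, t: 1 },
  { s: "ex:alice", p: "schema:age", o: 31, t: 10 },
  { s: "ex:bob", p: "schema:age", o: 30, t: 2 },
];

const spot = buildIndex(flakes, "SPOT");
const post = buildIndex(flakes, "POST");

// "What are all properties of ex:alice?" is a prefix scan on SPOT
console.log(scanPrefix(spot, "SPOT", ["ex:alice"]).length); // → 3
// "Which subjects have schema:age = 30?" is a prefix scan on POST
console.log(scanPrefix(post, "POST", ["schema:age", 30]).map((f) => f.s)); // → [ 'ex:alice', 'ex:bob' ]
```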

Indexing Pipeline

1. Transaction Commit

t=42: Transaction committed
  - Flakes written to transaction log
  - Commit published to nameservice
  - commit_t updated to 42

2. Index Trigger

Background indexing process detects new commits:

Indexer: commit_t=42, index_t=40
Indexer: Need to index t=41, t=42

3. Index Building

Process transactions to build indexes:

For each flake in t=41, t=42:
  - Update SPOT index
  - Update POST index
  - Update OPST index
  - Update PSOT index

4. Index Publication

When complete, publish new index:

  - Write index snapshot to storage
  - Publish index_id to nameservice
  - Update index_t to 42
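The four pipeline steps can be modeled in a few lines. This is a hypothetical in-memory sketch: the `ledger` object, `commit()`, and `runIndexer()` are invented for illustration. A real indexer writes sorted, persisted index nodes instead of pushing onto arrays.

```javascript
// Hypothetical model of the pipeline; names and shapes are illustrative only.
const PERMUTATIONS = ["SPOT", "POST", "OPST", "PSOT"];

// Step 1: a commit appends flakes to the log and advances commit_t at once.
function commit(ledger, flakes) {
  const t = ledger.commitT + 1;
  ledger.commits.set(t, flakes.map((f) => ({ ...f, t })));
  ledger.commitT = t;
  return t;
}

// Steps 2-4: find unindexed commits, apply their flakes to every
// permutation, then "publish" by advancing index_t.
function runIndexer(ledger) {
  const { index } = ledger;
  const target = ledger.commitT; // snapshot; later commits wait for the next run
  const pending = [];
  for (let t = index.t + 1; t <= target; t++) pending.push(t);

  let updates = 0;
  for (const t of pending) {
    for (const flake of ledger.commits.get(t)) {
      for (const name of PERMUTATIONS) {
        index[name].push(flake); // a real index would insert in sort order
        updates++;
      }
    }
  }
  index.t = target; // publish: index_t catches up to the snapshot
  return { pending, updates };
}

const ledger = {
  commits: new Map(),
  commitT: 0,
  index: { t: 0, SPOT: [], POST: [], OPST: [], PSOT: [] },
};

commit(ledger, [{ s: "ex:alice", p: "schema:name", o: "Alice" }]);
commit(ledger, [
  { s: "ex:alice", p: "schema:age", o: 30 },
  { s: "ex:bob", p: "schema:age", o: 25 },
]);

const result = runIndexer(ledger);
// 3 flakes × 4 permutations = 12 index updates
console.log(result.pending, result.updates, ledger.index.t); // → [ 1, 2 ] 12 2
```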

Novelty Layer

The novelty layer is the data that has been committed but not yet indexed, i.e., every transaction t with index_t < t ≤ commit_t:

commit_t = 45
index_t = 40
novelty layer = [t=41, t=42, t=43, t=44, t=45]

Query Execution with Novelty

Queries combine index + novelty:

Query Result = Indexed Data (t ≤ 40) + Novelty Layer (41 ≤ t ≤ 45)
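The combination rule can be sketched as follows. The sketch assumes, hypothetically, that each flake carries an `op` flag (`true` for an assertion, `false` for a retraction), so a value changed in novelty overrides the indexed value. The function name and data shapes are illustrative. This is not Fluree's query engine.

```javascript
// Hypothetical sketch: answer "current values of (s, p)" from index + novelty.
function currentValues(indexed, novelty, indexT, commitT, s, p) {
  // Indexed data covers t <= index_t; novelty covers index_t < t <= commit_t.
  const fromIndex = indexed.filter((f) => f.t <= indexT);
  const fromNovelty = novelty.filter((f) => f.t > indexT && f.t <= commitT);

  // Replay matching flakes in t order; a retraction removes an earlier assertion.
  const live = new Set();
  [...fromIndex, ...fromNovelty]
    .filter((f) => f.s === s && f.p === p)
    .sort((a, b) => a.t - b.t)
    .forEach((f) => (f.op ? live.add(f.o) : live.delete(f.o)));
  return [...live];
}

// index_t = 40: age 30 is indexed; at t=42 (novelty) it was changed to 31.
const indexed = [{ s: "ex:alice", p: "schema:age", o: 30, t: 1, op: true }];
const novelty = [
  { s: "ex:alice", p: "schema:age", o: 30, t: 42, op: false },
  { s: "ex:alice", p: "schema:age", o: 31, t: 42, op: true },
];

console.log(currentValues(indexed, novelty, 40, 45, "ex:alice", "schema:age")); // → [ 31 ]
```

Passing `commitT = 40` instead returns `[ 30 ]`, which is what the index alone would answer.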

Performance Impact:

Small novelty: Fast queries (mostly indexed)
Large novelty: Slower queries (more transaction replay)

Indexing Performance

Transaction Size Impact

Larger transactions take longer to index:

Transaction with 10 flakes:
  - 10 flakes × 4 indexes = 40 index updates
  - Indexing time: ~1ms

Transaction with 10,000 flakes:
  - 10,000 flakes × 4 indexes = 40,000 index updates
  - Indexing time: ~100ms

Indexing Rate

Typical indexing rates (a "moderate" transaction here is roughly 100 flakes):

Light load:
  - 1,000 flakes/second
  - ~10 moderate transactions/second

Heavy load:
  - 10,000 flakes/second
  - ~100 moderate transactions/second

Actual rates depend on:

Hardware (CPU, disk I/O)
Storage backend (memory, file, AWS)
Transaction patterns
System load
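Whatever the actual rate, the time needed to drain an existing backlog follows from simple arithmetic. The sketch below assumes a steady state in which the backlog shrinks at the indexing rate minus the write rate. `catchUpSeconds` is a hypothetical helper, and the inputs reuse the light-load figure and the 150-transaction lag example from this page.

```javascript
// Steady-state planning model (an assumption, not a Fluree formula).
function catchUpSeconds(backlogFlakes, indexFlakesPerSec, writeFlakesPerSec) {
  const net = indexFlakesPerSec - writeFlakesPerSec;
  // If writes arrive as fast as (or faster than) the indexer works, lag never clears.
  return net > 0 ? backlogFlakes / net : Infinity;
}

// 150 unindexed transactions of ~100 flakes each, indexing at 1,000 flakes/s:
console.log(catchUpSeconds(150 * 100, 1000, 500));  // → 30 (writes continue at 500 flakes/s)
console.log(catchUpSeconds(150 * 100, 1000, 0));    // → 15 (writes paused)
console.log(catchUpSeconds(150 * 100, 1000, 1200)); // → Infinity (lag keeps growing)
```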

Monitoring Indexing

Check Indexing Status

curl http://localhost:8090/v1/fluree/info/mydb:main

Response:

{
  "ledger_id": "mydb:main",
  "commit_t": 150,
  "index_t": 140
}

Indexing lag (in transactions) = commit_t - index_t, the number of committed transactions not yet reflected in the published index.

Healthy vs Unhealthy

Healthy:

commit_t = 1000
index_t = 998
novelty = 2 transactions (good!)

Unhealthy:

commit_t = 1000
index_t = 850
novelty = 150 transactions (indexing lag!)
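A monitoring check can turn the two fields from the info response into the same healthy/unhealthy judgment. This is a minimal sketch. `indexingHealth` is a hypothetical helper, and its 100/200 thresholds reuse the warning and critical values from Best Practices rather than any Fluree default.

```javascript
// Classify indexing lag from an info response ({ commit_t, index_t }).
function indexingHealth(status, warnAt = 100, criticalAt = 200) {
  const lag = status.commit_t - status.index_t;
  let level = "healthy";
  if (lag > criticalAt) level = "critical";
  else if (lag > warnAt) level = "warning";
  return { lag, level };
}

console.log(indexingHealth({ commit_t: 1000, index_t: 998 })); // → { lag: 2, level: 'healthy' }
console.log(indexingHealth({ commit_t: 1000, index_t: 850 })); // → { lag: 150, level: 'warning' }
```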

Indexing Lag

Indexing lag occurs when indexing can't keep up with the transaction rate.

Causes

High Transaction Rate
- More transactions than indexing can handle
- Sustained write load
Large Transactions
- Individual transactions with many flakes
- Bulk imports
Resource Constraints
- CPU bottleneck
- Disk I/O bottleneck
- Memory pressure
Storage Backend Latency
- Slow storage (network attached)
- AWS S3 latency

Impact

Large indexing lag affects:

Query Performance:

More novelty to replay
Slower query execution
Higher CPU usage for queries

Memory Usage:

Novelty layer held in memory
Larger memory footprint

Backup/Recovery:

Larger gap to replay
Longer recovery times

Tuning Indexing

Background indexing is controlled primarily by:

Enabling/disabling background indexing (--indexing-enabled / FLUREE_INDEXING_ENABLED)
Novelty thresholds that trigger indexing / apply backpressure (--reindex-min-bytes, --reindex-max-bytes)

See Operations: Configuration and Background Indexing for the canonical settings and tuning guidance.
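The interaction between the two novelty thresholds can be sketched as a three-way decision. The sketch assumes, as the list above describes, that reaching --reindex-min-bytes starts background indexing and reaching --reindex-max-bytes applies backpressure to writers. `noveltyAction` and the byte values are hypothetical, so consult the configuration docs for real defaults and exact semantics.

```javascript
// Hypothetical decision logic; byte values are arbitrary examples, not defaults.
function noveltyAction(noveltyBytes, reindexMinBytes, reindexMaxBytes) {
  if (noveltyBytes >= reindexMaxBytes) return "backpressure"; // slow or block new transactions
  if (noveltyBytes >= reindexMinBytes) return "index";        // start a background index run
  return "idle";                                              // novelty stays small, keep serving it
}

const MIN = 1000000;  // 1 MB (example)
const MAX = 50000000; // 50 MB (example)
console.log(noveltyAction(200000, MIN, MAX));   // → idle
console.log(noveltyAction(5000000, MIN, MAX));  // → index
console.log(noveltyAction(80000000, MIN, MAX)); // → backpressure
```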

Dedicated Indexing Process

For high-load deployments, run a dedicated indexer process:

# Main server (transact only; background indexing disabled)
fluree-server --indexing-enabled=false

# Indexing server
./fluree-db-indexer --ledgers mydb:main,mydb:dev

Transaction Patterns and Indexing

Batch Transactions

Good pattern:

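// Batch into reasonable sizes.
// transact() stands in for your Fluree client's transaction call.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const batchSize = 1000;

for (let i = 0; i < entities.length; i += batchSize) {
  const batch = entities.slice(i, i + batchSize);
  await transact({ "@graph": batch });

  // Pause after every 10th batch to give the indexer time to catch up
  const batchNumber = i / batchSize + 1;
  if (batchNumber % 10 === 0) {
    await sleep(1000);
  }
}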

Bad pattern:

// Single giant transaction
await transact({ "@graph": allEntities });  // 1 million entities!

Continuous Transactions

For continuous transaction load:

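// checkIndexingStatus() is assumed to return { commit_t, index_t } from the
// ledger info endpoint; transact() stands in for your client's transaction call.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function writeWithBackpressure(data, maxLag = 100) {
  // Keep waiting while the indexer is too far behind the commit log
  let status = await checkIndexingStatus();
  while (status.commit_t - status.index_t > maxLag) {
    await sleep(1000);
    status = await checkIndexingStatus();
  }

  await transact(data);
}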

Bulk Imports

For large imports:

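// checkIndexingStatus() is assumed to return { commit_t, index_t } from the
// ledger info endpoint; transact() stands in for your client's transaction call.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function bulkImport(entities) {
  const batchSize = 1000;

  for (let i = 0; i < entities.length; i += batchSize) {
    const batch = entities.slice(i, i + batchSize);
    await transact({ "@graph": batch });

    // Wait for indexing to catch up after every 10th batch
    if ((i / batchSize + 1) % 10 === 0) {
      await waitForIndexing();
    }

    console.log(`Imported ${i + batch.length} / ${entities.length}`);
  }

  // Let the indexer absorb the tail of the import
  await waitForIndexing();
}

// Poll until lag drops below maxLag; give up after timeoutMs (an arbitrary default)
async function waitForIndexing(maxLag = 5, timeoutMs = 10 * 60 * 1000) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await checkIndexingStatus();
    if (status.commit_t - status.index_t < maxLag) return;
    if (Date.now() > deadline) throw new Error("Timed out waiting for indexing");
    await sleep(1000);
  }
}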

Graph Source Indexing

Graph sources have their own indexing processes:

BM25 Indexing

Full-text search indexes built asynchronously:

t=100: Transaction with new documents
  - Main index updated
  - BM25 indexer triggered
  - Documents added to BM25 index

Vector Search Indexing

Vector embeddings can be indexed separately for approximate nearest-neighbor (ANN) search via HNSW vector indexes (implemented with usearch, feature-gated behind the vector feature).

Inline similarity functions (dotProduct, cosineSimilarity, euclideanDistance) do not require a separate graph-source index; they compute scores directly during query execution.

t=100: Transaction with embeddings
  - Main index updated
  - Vector indexer triggered
  - Vectors added to vector index

See Vector Search for details on HNSW vector indexes and query syntax.
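For reference, the inline similarity functions correspond to the standard vector formulas sketched below. This shows the math only, assuming plain numeric arrays of equal length. It is not Fluree's implementation or query syntax, and having `cosineSimilarity` return 0 for a zero vector is a convention chosen here.

```javascript
// Standard vector similarity formulas (math only, not Fluree's implementation).
function dotProduct(a, b) {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a, b) {
  // dot(a, b) / (|a| * |b|); returns 0 if either vector has zero length
  const norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  return norms === 0 ? 0 : dotProduct(a, b) / norms;
}

function euclideanDistance(a, b) {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

console.log(dotProduct([1, 2, 3], [4, 5, 6]));  // → 32
console.log(cosineSimilarity([1, 0], [0, 1]));  // → 0
console.log(euclideanDistance([0, 0], [3, 4])); // → 5
```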

Best Practices

1. Monitor Novelty Layer

Track indexing lag:

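// checkIndexingStatus(), metrics and logger stand in for your own status
// client, metrics library and logger.
setInterval(async () => {
  try {
    const status = await checkIndexingStatus();
    const lag = status.commit_t - status.index_t;
    metrics.gauge('index_lag_txns', lag);

    if (lag > 100) {
      logger.warn(`High indexing lag: ${lag} transactions`);
    }
  } catch (err) {
    // A failed status check should not become an unhandled rejection
    logger.error(`Indexing status check failed: ${err.message}`);
  }
}, 10000);  // Check every 10 seconds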

2. Batch Appropriately

Keep transactions to a reasonable size:

Recommended: 100-1000 entities per transaction
Maximum: 10,000 entities per transaction

3. Rate Limiting

Implement rate limiting for heavy write loads:

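// RateLimiter from the `limiter` npm package
import { RateLimiter } from "limiter";

const rateLimiter = new RateLimiter({
  tokensPerInterval: 100,
  interval: "minute"
});

// Wait for a token before each transaction (at most 100 per minute)
await rateLimiter.removeTokens(1);
await transact(data);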

4. Scheduled Imports

Run large imports during off-hours:

if (isOffPeakHours()) {
  await runBulkImport();
} else {
  logger.info('Deferring bulk import to off-peak hours');
}
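`isOffPeakHours()` above is left to the application. One hypothetical implementation checks the local hour against a window that may wrap past midnight. The 22:00 to 06:00 default is an arbitrary example.

```javascript
// Hypothetical helper: is `date` inside the [startHour, endHour) local-time window?
function isOffPeakHours(date = new Date(), startHour = 22, endHour = 6) {
  const hour = date.getHours();
  // A window that crosses midnight (start > end) wraps around.
  return startHour <= endHour
    ? hour >= startHour && hour < endHour
    : hour >= startHour || hour < endHour;
}

console.log(isOffPeakHours(new Date(2024, 0, 1, 23, 0))); // → true
console.log(isOffPeakHours(new Date(2024, 0, 1, 14, 0))); // → false
```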

5. Alert on Lag

Set up alerts for indexing lag:

const lag = status.commit_t - status.index_t;
if (lag > 200) {
  alert('Critical: Indexing lag > 200 transactions');
}

6. Capacity Planning

Plan capacity based on write load:

Expected load: 10,000 transactions/day
Average size: 100 flakes/transaction
Total: 1,000,000 flakes/day

Indexing capacity needed: ~12 flakes/second
With 4× safety margin: ~50 flakes/second
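The same arithmetic can be written as a small function, which makes it easy to rerun the estimate with your own numbers. `requiredIndexingRate` is a hypothetical helper. It assumes load is spread evenly across the day, which is why the safety margin matters.

```javascript
// Capacity-planning arithmetic from the example above.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function requiredIndexingRate(txnsPerDay, flakesPerTxn, safetyMargin = 4) {
  const flakesPerDay = txnsPerDay * flakesPerTxn;
  const average = flakesPerDay / SECONDS_PER_DAY; // flakes/second under uniform load
  return { flakesPerDay, average, withMargin: average * safetyMargin };
}

const plan = requiredIndexingRate(10000, 100);
console.log(plan.flakesPerDay);          // → 1000000
console.log(plan.average.toFixed(1));    // → 11.6
console.log(plan.withMargin.toFixed(1)); // → 46.3
```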

Troubleshooting

High indexing lag

Symptom: commit_t - index_t growing continuously

Causes:

Transaction rate exceeds indexing capacity
Large transactions
Resource constraints

Solutions:

Reduce transaction rate
Split large transactions
Increase indexing resources
Tune indexing parameters
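To tell a continuously growing lag apart from normal fluctuation, you can look at a short window of samples, such as those collected by the monitoring loop in Best Practices. `isLagGrowing` and its five-sample window are hypothetical choices for illustration.

```javascript
// Hypothetical check: has lag (commit_t - index_t) risen across the last N samples?
function isLagGrowing(samples, minSamples = 5) {
  if (samples.length < minSamples) return false; // not enough data yet
  const recent = samples.slice(-minSamples);
  // Growing = no sample drops below the previous one, and there is a net increase.
  const nonDecreasing = recent.every((v, i) => i === 0 || v >= recent[i - 1]);
  return nonDecreasing && recent[recent.length - 1] > recent[0];
}

console.log(isLagGrowing([2, 3, 1, 2, 2]));      // → false (indexer keeps catching up)
console.log(isLagGrowing([10, 25, 40, 60, 90])); // → true (lag climbs every interval)
```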

Slow Queries

Symptom: Queries slower than expected

Possible Cause: Large novelty layer

Check:

curl -s http://localhost:8090/v1/fluree/info/mydb:main | jq '.commit_t - .index_t'

Solution: Wait for indexing to catch up, or reduce the write rate

Index Memory Usage

Symptom: High memory usage

Cause: Large indexes or large novelty layer