Performance Investigation with Distributed Tracing
Fluree includes deep instrumentation that decomposes every query, transaction, and indexing operation into a span waterfall visible in Jaeger, Grafana Tempo, AWS X-Ray, or any OpenTelemetry-compatible backend. This guide explains how to use that instrumentation to find and fix performance bottlenecks.
When to Use Deep Tracing
| Symptom | Start with | Escalate to |
|---|---|---|
| Single slow query | /v1/fluree/explain plan | Deep tracing at debug level |
| Slow queries in general, unclear which phase | Deep tracing at debug level | trace level for operator detail |
| Slow transactions / commits | Deep tracing at debug level | Check txn_commit sub-spans |
| Indexing taking too long | Deep tracing at debug level | Check build_index per-order timing |
| Intermittent latency spikes | Sustained tracing + Jaeger search by duration | Correlate with indexing traces |
| Production regression | Compare Jaeger traces before/after deploy | Filter by tracker_time span attribute |
Deep tracing is complementary to explain plans, not a replacement. Explain plans show the shape of a query plan; tracing shows where wall-clock time actually went.
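If you have not pulled an explain plan before, here is a minimal sketch against the local harness described in the next section. The path is an assumption: it presumes the explain endpoint takes the same ledger path as the query endpoint shown below, so adjust it to your deployment:
curl -s -X POST http://localhost:8090/v1/fluree/explain/otel-test:main \
  -H 'Content-Type: application/json' \
  -d '{"select": ["?name"], "where": [{"@id": "?p", "ex:name": "?name"}]}'
Compare the plan's shape against the trace waterfall for the same query: the plan tells you what the engine intends to do, the trace tells you how long each step actually took.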
Quick Start: Local Investigation
The otel/ directory at the repository root provides a self-contained Makefile-driven harness for local trace investigation.
Prerequisites
- Docker (for Jaeger)
- Rust toolchain
- curl, bash
One-liner setup
cd otel/
make all # starts Jaeger, builds with --features otel, starts server, runs smoke tests
make ui # opens Jaeger UI in browser
This gives you a running Fluree server exporting traces to a local Jaeger instance with pre-loaded test data.
Investigate a specific query
Once the server is running (via make server or make all):
# Run your problematic query against the server
curl -s -X POST http://localhost:8090/v1/fluree/query/otel-test:main \
-H 'Content-Type: application/json' \
-d '{
"select": ["?name", "?price"],
"where": [
{"@id": "?p", "@type": "ex:Product"},
{"@id": "?p", "ex:name": "?name"},
{"@id": "?p", "ex:price": "?price"}
],
"orderBy": [{"desc": "?price"}],
"limit": 100
}'
Then open Jaeger (make ui or http://localhost:16686), select service fluree-server, and find the trace. The waterfall shows exactly where time was spent.
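If you prefer the command line, the Jaeger query service on the same port also answers HTTP requests. This is the UI's internal API rather than a stable contract, so treat the following as a convenience sketch:
# List the most recent fluree-server traces and their end-to-end duration (requires jq)
curl -s 'http://localhost:16686/api/traces?service=fluree-server&limit=20&lookback=1h' \
  | jq -r '.data[] | "\(.traceID)  \(([.spans[].duration] | max) / 1000 | floor) ms"'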
Teardown
make clean-all # stops server, stops Jaeger, removes data
Writing Custom Scenario Scripts
The otel/scripts/ directory contains scenario scripts you can use as templates. To investigate a specific performance issue:
1. Create a scenario script
#!/usr/bin/env bash
# otel/scripts/my-investigation.sh
set -euo pipefail
PORT="${PORT:-8090}"
LEDGER="${LEDGER:-otel-test:main}"
BASE="http://localhost:${PORT}"
echo "=== My investigation scenario ==="
# Step 1: Insert data that triggers the problem
curl -sf -X POST "${BASE}/${LEDGER}/insert" \
-H 'Content-Type: application/json' \
-d '{
"@context": {"ex": "http://example.org/ns/"},
"@graph": [
... your test data ...
]
}' > /dev/null
sleep 0.5 # let OTEL batch exporter flush
# Step 2: Run the problematic query multiple times
for i in $(seq 1 5); do
echo " Query iteration $i..."
curl -sf -X POST "${BASE}/${LEDGER}/query" \
-H 'Content-Type: application/json' \
-d '{ ... your query ... }' > /dev/null
sleep 0.3
done
echo "=== Done. Check Jaeger for traces. ==="
2. Run it
cd otel/
make up build server # ensure infrastructure is running
bash scripts/my-investigation.sh
make ui # inspect traces
3. Add a Makefile target (optional)
# In otel/Makefile
my-investigation: _data/storage
bash scripts/my-investigation.sh
Tips for effective scenario scripts
- Pause between requests (sleep 0.3-0.5) to let the OTEL batch exporter flush. Without this, spans from adjacent requests may interleave in Jaeger, making waterfall analysis harder.
- Run the query multiple times to see variance. Sort by duration in Jaeger to find the worst case.
- Use different RUST_LOG levels for different investigations. Override when starting the server: make server RUST_LOG=info,fluree_db_query=trace
- Isolate variables: test with and without indexing (INDEXING=false), with different data volumes, or with different query patterns (see the sketch after this list).
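As a rough example of isolating the indexing variable, the sketch below runs the same scenario with and without background indexing. It assumes the harness forwards INDEXING to the server the same way it forwards RUST_LOG; check otel/README.md for the exact variable handling:
cd otel/
# Run 1: defaults (background indexing enabled)
make server
bash scripts/my-investigation.sh
# Run 2: restart the server with indexing disabled, then repeat the scenario
make server INDEXING=false
bash scripts/my-investigation.sh
make ui   # sort by duration in Jaeger and compare the two runs' waterfalls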
Reading Jaeger Waterfalls
Anatomy of a query trace
request (info) ─────────────────────────── 834ms
query_execute (debug) ─────────────────────────── 832ms
query_prepare (debug) ──── 12ms
reasoning_prep (debug) ── 3ms
pattern_rewrite (debug) ── 2ms
plan (debug) ── 5ms
query_run (debug) ──────────────────────── 818ms
scan (debug) ── 4ms
join (debug) ─────────────────── 780ms
join_flush_scan_spot (debug) ────────────────── 775ms
filter (debug) ── 2ms
sort (debug) ── 15ms
sort_blocking (debug) ── 14ms
project (debug) ── 1ms
format (debug) ── 2ms
In this example, the bottleneck is immediately visible: the join_flush_scan_spot span accounts for 775ms of the 834ms total. This tells you the query is doing a large range scan during the join phase.
Key span attributes to check
| Span | Attribute | What it tells you |
|---|---|---|
| query_execute | tracker_time, tracker_fuel | Total tracked time and fuel consumption |
| pattern_rewrite | patterns_before, patterns_after | Whether pattern rewriting is effective |
| plan | pattern_count | Complexity of the query plan |
| scan | (trace level) | How long individual scans take |
| join_flush_scan_spot | unique_subjects, total_leaves | Join scan size — large values indicate broad scans |
| sort_blocking | input_rows, sort_ms | Sort cost — are you sorting a huge result set? |
| txn_stage | insert_count, delete_count | Transaction size |
| txn_commit | flake_count, delta_bytes | Commit I/O volume |
Note: Span attributes like tracker_time, tracker_fuel, patterns_before, assertion_count, template_count, and pattern_count are verified by acceptance tests (it_tracing_spans.rs). Other attributes in this table (unique_subjects, total_leaves, sort_ms, flake_count, delta_bytes) are documented from code inspection but not programmatically verified; they may drift if span instrumentation is refactored.
Anatomy of a transaction trace
request (info) ─────────────── 245ms
transact_execute (debug) ─────────────── 243ms
txn_stage (debug) ────── 45ms
where_exec (debug) ── 8ms
delete_gen (debug) ── 3ms
insert_gen (debug) ── 12ms
cancellation (debug) ── 5ms
policy_enforce (debug) ── 2ms
txn_commit (debug) ──────────── 195ms
commit_nameservice_lookup (debug) ── 2ms
commit_verify_sequencing (debug) ── 1ms
commit_write_raw_txn (debug) ── 5ms (await of spawned upload)
commit_build_record (debug) ── 3ms
commit_write_commit_blob (debug) ────── 65ms
commit_publish_nameservice (debug) ────── 35ms
When store_raw_txn is opted in, the raw-transaction bytes are uploaded on a Tokio task spawned at the top of the pipeline (see PendingRawTxnUpload). commit_write_raw_txn therefore measures only the await of that task, usually a few ms even on S3, because the upload overlaps staging CPU work. If commit_write_raw_txn approaches the upload's intrinsic latency (50-100ms on S3), staging finished before the upload did; otherwise the overlap absorbs the upload entirely. On S3 the remaining bottleneck is typically commit_write_commit_blob alone.
Anatomy of an indexing trace
Indexing runs as a separate trace (not nested under an HTTP request). Search Jaeger for the operation name index_build.
If the index build was queued by an HTTP transaction request, use logs to bridge the two views: the background worker copies the triggering request_id and trace_id onto its log lines, but the OTEL/Jaeger indexing trace remains separate by design.
index_build (debug) ─────────────────── 12.5s
commit_chain_walk (debug) ── 50ms
commit_resolve (debug, commits=2) ── 27ms
build_all_indexes (debug) ─────────────────── 12.4s
build_index (debug, order=SPOT) ──────── 3.1s
build_index (debug, order=PSOT) ──────── 3.2s
build_index (debug, order=POST) ──────── 3.0s
build_index (debug, order=OPST) ──────── 3.1s
Common Bottleneck Patterns
1. Join scan too broad
Symptom: join_flush_scan_spot has unique_subjects in the thousands and dominates the waterfall.
Cause: A join pattern that matches too many subjects, forcing a large range scan.
Fix: Add more selective patterns or filters to narrow the join. Check the explain plan for join order.
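As a sketch of what "more selective" means in practice, here is the quick-start product query with one extra constant-valued pattern placed first; ex:sku is a hypothetical predicate, so substitute whatever attribute actually narrows the join in your data:
curl -s -X POST http://localhost:8090/v1/fluree/query/otel-test:main \
  -H 'Content-Type: application/json' \
  -d '{
    "select": ["?name", "?price"],
    "where": [
      {"@id": "?p", "ex:sku": "SKU-12345"},
      {"@id": "?p", "@type": "ex:Product"},
      {"@id": "?p", "ex:name": "?name"},
      {"@id": "?p", "ex:price": "?price"}
    ],
    "limit": 100
  }'
Re-run the trace afterward: if the fix worked, join_flush_scan_spot should shrink and its unique_subjects attribute should drop accordingly.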
2. Sort on large result set
Symptom: sort_blocking shows input_rows > 10,000 and sort_ms dominates.
Cause: Sorting happens after all joins/filters, on the full result set.
Fix: Add LIMIT if possible, or ensure filters run before the sort by placing restrictive patterns first.
3. Commit I/O on S3
Symptom: commit_write_commit_blob takes 50-200ms. commit_write_raw_txn may also show time if staging completed before the parallel upload finished.
Cause: S3 PutObject latency (~50-100ms per call). The raw-txn upload is parallelized with staging, so its cost is usually absorbed, but the commit-blob write is serial on the critical path.
Fix: S3 latency is inherent. Batch multiple small transactions into fewer larger ones. Consider file storage for latency-sensitive workloads. If commit_write_raw_txn is non-trivial, it indicates staging finished faster than the raw-txn upload — the overlap helped but couldn't fully hide it.
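For batching, the shape of the fix is to carry many entities in one insert rather than issuing one HTTP transaction per entity, so each S3 round-trip is amortized across more flakes. A sketch reusing the BASE/LEDGER variables from the scenario-script template above (the product entities are illustrative):
# One commit carrying several entities instead of one commit per entity
curl -sf -X POST "${BASE}/${LEDGER}/insert" \
  -H 'Content-Type: application/json' \
  -d '{
    "@context": {"ex": "http://example.org/ns/"},
    "@graph": [
      {"@id": "ex:product-1", "@type": "ex:Product", "ex:name": "Widget 1", "ex:price": 10},
      {"@id": "ex:product-2", "@type": "ex:Product", "ex:name": "Widget 2", "ex:price": 20},
      {"@id": "ex:product-3", "@type": "ex:Product", "ex:name": "Widget 3", "ex:price": 30}
    ]
  }' > /dev/null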
4. Indexing backlog
Symptom: Multiple index_build traces in quick succession, each taking 10+ seconds.
Cause: Transaction volume exceeds indexing throughput, building up novelty.
Fix: Increase the novelty reindex threshold, or reduce transaction frequency. Check build_index sub-spans to see which index order is slowest.
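To confirm a backlog rather than a single slow build, you can count recent index_build traces from the shell via Jaeger's internal query API (same caveat as earlier: this is the UI's API, not a stable contract):
# Start time and duration of every index_build trace in the last hour
curl -s 'http://localhost:16686/api/traces?service=fluree-server&operation=index_build&lookback=1h&limit=50' \
  | jq -r '.data[] | "\(([.spans[].startTime] | min))  \(([.spans[].duration] | max) / 1000000) s"' \
  | sort -n
A steadily growing list here, with each build taking longer than the gap between builds, is the signature of a backlog.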
5. Policy evaluation overhead
Symptom: policy_eval or policy_enforce takes a significant fraction of query/transaction time.
Cause: Complex policy rules that require additional queries to evaluate.
Fix: Simplify policy rules, or pre-compute policy decisions where possible.
Controlling Trace Verbosity
RUST_LOG patterns
| Goal | Pattern | Visible spans |
|---|---|---|
| Production default | info | HTTP request spans only (zero operation spans) |
| Query investigation | info,fluree_db_query=debug | + query_execute, query_prepare, query_run, operators |
| Transaction investigation | info,fluree_db_transact=debug | + txn_stage, txn_commit, commit sub-spans |
| Full debug | info,fluree_db_query=debug,fluree_db_transact=debug,fluree_db_indexer=debug | All debug spans |
| Operator-level detail | info,fluree_db_query=trace | + per-leaf: binary_cursor_next_leaf, etc. |
| Everything | debug | Console firehose (OTEL layer still filters to fluree_* only) |
Note: When OTEL is enabled, all fluree_* debug spans flow to the OTEL collector regardless of RUST_LOG. The table above describes console output only.
With the otel/ harness
# Override RUST_LOG when starting the server
make server RUST_LOG='info,fluree_db_query=trace'
# Then run your scenario
make query
In production
Set RUST_LOG via your container orchestrator's environment variables. Start at info and increase selectively:
# ECS task definition (environment section)
RUST_LOG=info,fluree_db_query=debug,fluree_db_transact=debug
Production Tracing: AWS Deployments
Architecture: fluree-db-server on ECS/Fargate
┌─────────────┐ OTLP/gRPC (4317) ┌───────────────────┐
│ ECS Task │ ─────────────────────────▶│ OTEL Collector │
│ fluree-srv │ │ (sidecar or │
│ --features │ │ Daemon/Service) │
│ otel │ └────────┬──────────┘
└─────────────┘ │
┌─────────▼──────────┐
│ Grafana Tempo / │
│ AWS X-Ray / │
│ Jaeger │
└─────────────────────┘
ECS task definition snippet:
{
"containerDefinitions": [
{
"name": "fluree-server",
"image": "your-ecr-repo/fluree-db-server:latest",
"environment": [
{"name": "RUST_LOG", "value": "info,fluree_db_query=debug,fluree_db_transact=debug"},
{"name": "OTEL_SERVICE_NAME", "value": "fluree-server"},
{"name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": "http://localhost:4317"},
{"name": "OTEL_EXPORTER_OTLP_PROTOCOL", "value": "grpc"}
]
},
{
"name": "otel-collector",
"image": "amazon/aws-otel-collector:latest",
"essential": true,
"command": ["--config=/etc/otel-config.yaml"]
}
]
}
OTEL Collector config (for X-Ray export):
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
exporters:
awsxray:
region: us-east-1
# Or for Grafana Tempo:
# otlp:
# endpoint: tempo.internal:4317
# tls:
# insecure: true
service:
pipelines:
traces:
receivers: [otlp]
exporters: [awsxray]
Architecture: fluree-db-api as a Rust crate in AWS Lambda
When using fluree-db-api directly (not through fluree-db-server), you initialize OTEL yourself. The key pattern is the same dual-layer subscriber with a Targets filter on the OTEL layer.
use opentelemetry::trace::TracerProvider as _; // brings `tracer()` into scope
use opentelemetry_otlp::{SpanExporter, WithExportConfig};
use opentelemetry_sdk::trace::SdkTracerProvider;
use tracing_subscriber::{filter::Targets, layer::SubscriberExt, Layer as _, Registry};
fn init_tracing() {
// OTEL exporter — Lambda uses HTTP/protobuf to the collector sidecar
let exporter = SpanExporter::builder()
.with_http()
.with_endpoint("http://localhost:4318") // collector sidecar
.build()
.expect("Failed to create OTLP exporter");
let resource = opentelemetry_sdk::Resource::builder()
.with_service_name("my-lambda-fn")
.build();
let provider = SdkTracerProvider::builder()
.with_batch_exporter(exporter)
.with_resource(resource)
.build();
let tracer = provider.tracer("fluree-db");
let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);
// Critical: filter OTEL layer to fluree_* crates only
let otel_filter = Targets::new()
.with_target("fluree_db_api", tracing::Level::DEBUG)
.with_target("fluree_db_query", tracing::Level::DEBUG)
.with_target("fluree_db_transact", tracing::Level::DEBUG)
.with_target("fluree_db_indexer", tracing::Level::DEBUG)
.with_target("fluree_db_core", tracing::Level::DEBUG);
let subscriber = Registry::default()
.with(tracing_subscriber::fmt::layer())
.with(otel_layer.with_filter(otel_filter));
tracing::subscriber::set_global_default(subscriber).ok();
}
Lambda deployment with ADOT (AWS Distro for OpenTelemetry):
Add the ADOT Lambda layer and set:
AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler
OTEL_SERVICE_NAME=my-fluree-lambda
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
The ADOT layer runs a collector sidecar that receives OTLP spans and exports them to X-Ray, Tempo, or any configured backend.
Grafana Tempo + Grafana UI
For production trace exploration, Grafana Tempo with the Grafana UI provides the best experience:
- Search by attributes: Find all queries with tracker_time > 500ms
- Service graph: Visualize call patterns between services
- Trace-to-logs: Jump from a slow span to the corresponding log lines
- Trace-to-metrics: Correlate latency spikes with metric dashboards
Tempo query examples (TraceQL):
# Find slow queries
{ resource.service.name = "fluree-server" && name = "query_execute" && duration > 500ms }
# Find large commits
{ name = "txn_commit" && span.flake_count > 1000 }
# Find indexing operations
{ name = "index_build" }
AWS X-Ray
X-Ray works with OTEL traces exported via the AWS OTEL Collector. Key differences from Jaeger/Tempo:
- X-Ray automatically creates a service map showing request flow
- Subsegment annotations map to OTEL span attributes
- X-Ray sampling rules can be configured server-side (no code changes)
- Use X-Ray Insights for anomaly detection on latency patterns
Using the otel/ Harness for Regression Testing
The otel/ directory is designed for reproducible trace validation. Use it to verify that tracing instrumentation works correctly after code changes:
cd otel/
# Clean slate
make fresh
# After all scenarios complete, check Jaeger:
# 1. Transaction traces should show txn_stage > txn_commit with sub-spans
# 2. Query traces should show query_prepare > query_run with operator spans
# 3. Index traces should appear as separate traces (not under a request)
Specific test scenarios
| Scenario | Command | What to verify in Jaeger |
|---|---|---|
| All transaction types | make transact | 5 traces, each with txn_stage + txn_commit |
| All query types | make query | 7 traces with query_prepare + query_run |
| Background indexing | make index | Separate index_build trace (not under a request) |
| Bulk import | make import | Many commit traces, possibly indexing traces |
| Full end-to-end | make smoke | All of the above |
| Multi-cycle stress | make cycle | 3 full cycles, multiple index_build traces |
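The "what to verify" column can also be checked from a script instead of the UI. A rough sketch using Jaeger's internal query API, with span names taken from the tables above:
# After `make transact`, assert that recent traces contain the expected commit spans
resp=$(curl -s 'http://localhost:16686/api/traces?service=fluree-server&limit=20&lookback=15m')
for span in txn_stage txn_commit commit_write_commit_blob; do
  count=$(echo "$resp" | jq --arg s "$span" '[.data[].spans[] | select(.operationName == $s)] | length')
  echo "$span: $count spans"
done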
Analyzing Exported Traces
Jaeger allows exporting traces as JSON files (click the download icon on any trace or search result). These exports are useful for offline analysis, sharing with teammates, and archiving evidence of performance issues.
Exporting from Jaeger
- Open the Jaeger UI (http://localhost:16686 or your deployment)
- Search for traces (by service, operation, duration, tags)
- Click the download/export icon to save as JSON
What's in the export
The JSON file contains data[].spans[] with full span details: operation names, tags (key-value attributes), parent-child references, durations (in microseconds), and timestamps. Files range from ~100KB for a few traces to 50MB+ for large search exports.
Analyzing with Claude Code
The repository includes Claude Code skills for trace analysis:
/trace-inspect /path/to/traces.json # Drill into a single trace's span tree and timing
/trace-overview /path/to/traces.json # Aggregate stats and anomaly detection across all traces
These skills analyze the export file using targeted Python scripts (to avoid loading the full JSON into context) and cross-reference the results against the expected span hierarchy to produce a diagnosis with concrete code-level fix recommendations.
Manual analysis with Python
For quick one-off checks without Claude Code, the Jaeger JSON structure is straightforward:
import json
with open("traces.json") as f:
data = json.load(f)
for trace in data["data"]:
for span in trace["spans"]:
tags = {t["key"]: t["value"] for t in span.get("tags", [])}
print(f"{span['operationName']} dur={span['duration']/1000:.1f}ms {tags}")
Key fields: operationName (span name), duration (microseconds), tags (span attributes), references (parent-child links with refType: "CHILD_OF").
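If you would rather stay in the shell than Python, the same structure slices easily with jq and awk; for example, average duration per operation name across the whole export (durations are microseconds, as above):
# Average duration per operation, slowest first
jq -r '.data[].spans[] | "\(.operationName) \(.duration)"' traces.json \
  | awk '{ sum[$1] += $2; n[$1]++ }
         END { for (op in sum) printf "%-30s %10.1f ms avg  (%d spans)\n", op, sum[op]/n[op]/1000, n[op] }' \
  | sort -k2 -rn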
Related Documentation
- Telemetry and Logging -- OTEL configuration reference
- Adding Tracing Spans -- How to instrument new code paths
- Debugging Queries -- Query-specific debugging (explain plans, etc.)
- otel/README.md -- Full otel/ harness reference