## How it works
Fluree maps Iceberg tables as read-only graph sources. Once mapped, they participate in SPARQL and JSON-LD queries alongside ledger data. Fluree reads Parquet files through Iceberg's metadata layer, pushing filters down for partition pruning and column projection.
The `iceberg` feature flag must be compiled in.
## Catalog modes
### REST catalog

Connects to an Iceberg REST catalog API (Apache Polaris, Tabular, AWS Glue, etc.):

```bash
fluree iceberg map warehouse-orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --warehouse my-warehouse \
  --auth-bearer $POLARIS_TOKEN
```
Supports vended credentials (enabled by default) and warehouse selection.
### Direct S3

Reads table metadata directly from S3 — no catalog server:

```bash
fluree iceberg map execution-log \
  --mode direct \
  --table-location s3://my-bucket/warehouse/logs/execution_log \
  --s3-region us-east-1
```
Fluree reads `{table_location}/metadata/version-hint.text` to find the current metadata file, then loads it. The file may contain a bare filename (`00001-abc.metadata.json`) or a full S3 path.
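The resolution step above can be sketched as plain string logic. This is an illustrative sketch, not Fluree's internals; the function name and the "full path starts with `s3://`" heuristic are assumptions:

```rust
/// Resolve the value found in metadata/version-hint.text to a full
/// metadata-file location: a bare filename is joined under
/// {table_location}/metadata/, while a full S3 path is used as-is.
fn resolve_metadata_path(table_location: &str, hint: &str) -> String {
    let hint = hint.trim();
    if hint.starts_with("s3://") {
        hint.to_string()
    } else {
        format!("{}/metadata/{}", table_location.trim_end_matches('/'), hint)
    }
}

fn main() {
    let loc = "s3://my-bucket/warehouse/logs/execution_log";
    // Bare filename -> joined under the table's metadata/ directory.
    println!("{}", resolve_metadata_path(loc, "00001-abc.metadata.json"));
}
```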
Requirements:

- The table must be a valid Iceberg layout (created by Spark, iceberg-rust, Flink, etc.)
- `metadata/version-hint.text` must exist and be non-empty
- Uses ambient AWS credentials (IAM roles, env vars, `~/.aws/credentials`) — vended credentials are not supported in direct mode
### When to use which
| Scenario | Mode |
|---|---|
| Shared catalog, multiple consumers | REST |
| Writer and reader are the same system | Direct |
| Need catalog-managed credentials | REST |
| Minimizing infrastructure (no catalog server) | Direct |
## R2RML mapping
Without a mapping, Fluree exposes raw column data. An R2RML mapping (Turtle format) transforms rows into RDF triples with typed predicates:
```bash
fluree iceberg map airlines \
  --catalog-uri https://polaris.example.com/api/catalog \
  --r2rml mappings/airlines.ttl \
  --auth-bearer $POLARIS_TOKEN
```
Example mapping:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#OrderMapping>
  a rr:TriplesMap ;
  rr:logicalTable [ rr:tableName "orders" ] ;
  rr:subjectMap [
    rr:template "http://example.org/order/{order_id}" ;
    rr:class ex:Order
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:orderId ;
    rr:objectMap [ rr:column "order_id" ]
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:total ;
    rr:objectMap [ rr:column "total" ; rr:datatype xsd:decimal ]
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:orderDate ;
    rr:objectMap [ rr:column "order_date" ; rr:datatype xsd:date ]
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:customer ;
    rr:objectMap [ rr:template "http://example.org/customer/{customer_id}" ]
  ] .
```
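The heart of an `rr:template` is placeholder substitution: each `{column}` is replaced with that column's value from the row to mint an IRI. A minimal sketch of that substitution (simplified — real R2RML also IRI-escapes the substituted values):

```rust
use std::collections::HashMap;

/// Expand an R2RML rr:template by substituting each {column}
/// placeholder with the corresponding row value.
fn expand_template(template: &str, row: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        let end = rest[start..].find('}').map(|e| start + e).expect("unclosed placeholder");
        // Look up the column named between the braces; empty if absent.
        out.push_str(row.get(&rest[start + 1..end]).unwrap_or(&""));
        rest = &rest[end + 1..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut row = HashMap::new();
    row.insert("order_id", "42");
    // Mints the subject IRI for this row per <#OrderMapping>.
    println!("{}", expand_template("http://example.org/order/{order_id}", &row));
}
```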
### Type mapping
| Iceberg type | XSD type |
|---|---|
| int, long | xsd:integer |
| float, double | xsd:decimal |
| string | xsd:string |
| boolean | xsd:boolean |
| date | xsd:date |
| timestamp | xsd:dateTime |
| uuid | xsd:string |
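The table above is a straightforward lookup from Iceberg primitive type names to XSD datatype IRIs. A sketch of that default mapping (the fallback-to-`xsd:string` for unrecognized types is an assumption, not documented behavior):

```rust
/// Default XSD datatype for an Iceberg primitive type, per the
/// mapping table above, used when no explicit rr:datatype is given.
fn xsd_for_iceberg(ty: &str) -> &'static str {
    match ty {
        "int" | "long" => "http://www.w3.org/2001/XMLSchema#integer",
        "float" | "double" => "http://www.w3.org/2001/XMLSchema#decimal",
        "boolean" => "http://www.w3.org/2001/XMLSchema#boolean",
        "date" => "http://www.w3.org/2001/XMLSchema#date",
        "timestamp" => "http://www.w3.org/2001/XMLSchema#dateTime",
        // string, uuid (and, by assumption, anything else) map to xsd:string.
        _ => "http://www.w3.org/2001/XMLSchema#string",
    }
}

fn main() {
    println!("{}", xsd_for_iceberg("timestamp"));
}
```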
## Querying
Once mapped, query with standard SPARQL or JSON-LD Query.
### SPARQL

```sparql
PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?orderId ?total ?date
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderId ?orderId .
  ?order ex:total ?total .
  ?order ex:orderDate ?date .
  FILTER (?date >= "2024-01-01"^^xsd:date)
}
ORDER BY DESC(?date)
LIMIT 100
```
### JSON-LD Query

```json
{
  "@context": {"ex": "http://example.org/ns/"},
  "from": "warehouse-orders:main",
  "select": ["?orderId", "?total", "?date"],
  "where": [
    {"@id": "?order", "ex:orderId": "?orderId"},
    {"@id": "?order", "ex:total": "?total"},
    {"@id": "?order", "ex:orderDate": "?date"}
  ],
  "filter": "?date >= '2024-01-01'",
  "orderBy": ["-?date"],
  "limit": 100
}
```
### Aggregation

```sparql
PREFIX ex: <http://example.org/ns/>

SELECT ?date (SUM(?total) AS ?dailyRevenue) (COUNT(?order) AS ?orderCount)
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderDate ?date .
  ?order ex:total ?total .
}
GROUP BY ?date
ORDER BY ?date
```
## Joining with Fluree ledger data
Multiple sources in a single query — graph data from a Fluree ledger and tabular data from Iceberg:
```json
{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/ns/"},
  "from": ["customers:main", "warehouse-orders:main"],
  "select": ["?customerName", "?orderTotal", "?orderDate"],
  "where": [
    {"@id": "?customer", "schema:name": "?customerName"},
    {"@id": "?customer", "ex:customerId": "?customerId"},
    {"@id": "?order", "ex:customerId": "?customerId"},
    {"@id": "?order", "ex:total": "?orderTotal"},
    {"@id": "?order", "ex:orderDate": "?orderDate"}
  ],
  "filter": "?orderDate >= '2024-01-01'",
  "orderBy": ["-?orderDate"]
}
```
Using GRAPH patterns in SPARQL:

```sparql
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>

SELECT ?customerName ?productName ?quantity
FROM <customers:main>
WHERE {
  ?customer schema:name ?customerName .
  GRAPH <warehouse-orders:main> {
    ?order ex:customer ?customer ;
           ex:product ?product ;
           ex:quantity ?quantity .
  }
  GRAPH <warehouse-products:main> {
    ?product ex:name ?productName .
  }
}
```
## Filter pushdown
Fluree pushes filters to Iceberg:
- Partition pruning: only reads partitions matching the filter
- File skipping: skips data files whose statistics don't match
- Column pruning: only reads columns referenced by the query and mapping
For best performance, partition Iceberg tables by commonly filtered columns (date, region, etc.).
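File skipping works because Iceberg records per-file column statistics: a data file whose min/max range cannot satisfy the filter is never read. A toy illustration of the principle (not Fluree's code), using ISO-format date strings, whose lexicographic order matches chronological order:

```rust
/// Can a file whose column maximum is `file_max` contain any row
/// satisfying `column >= lower_bound`? If not, the file is skipped.
fn may_contain_ge(file_max: &str, lower_bound: &str) -> bool {
    file_max >= lower_bound
}

fn main() {
    // A file covering 2023-06-01..2023-12-31 is skipped entirely
    // for the filter order_date >= 2024-01-01.
    println!("{}", may_contain_ge("2023-12-31", "2024-01-01")); // false: skipped
    println!("{}", may_contain_ge("2024-03-15", "2024-01-01")); // true: must be read
}
```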
## Iceberg time travel

Query historical Iceberg snapshots:

```json
{"from": "warehouse-orders:main@snapshot:12345", ...}
```

Or by timestamp:

```json
{"from": "warehouse-orders:main@timestamp:2024-01-01T00:00:00Z", ...}
```
This uses Iceberg's own snapshot model, independent of Fluree's time travel for ledger data.
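The reference syntax splits at the `@` into a base graph-source name and a snapshot or timestamp selector. A sketch of that split (illustrative only — Fluree's actual parser is not shown here):

```rust
/// Split a reference like "warehouse-orders:main@snapshot:12345" into
/// the base graph-source name and an optional time-travel selector.
fn split_time_travel(reference: &str) -> (&str, Option<&str>) {
    match reference.find('@') {
        Some(i) => (&reference[..i], Some(&reference[i + 1..])),
        None => (reference, None),
    }
}

fn main() {
    let (base, selector) = split_time_travel("warehouse-orders:main@snapshot:12345");
    println!("{base} -> {selector:?}");
}
```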
## Managing graph sources

```bash
# List all ledgers and graph sources
fluree list

# Inspect configuration
fluree info warehouse-orders

# Query from CLI
fluree query warehouse-orders 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'

# Remove
fluree drop warehouse-orders --force
```
## Authentication
### Bearer token

```bash
fluree iceberg map orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --auth-bearer $POLARIS_TOKEN
```
### OAuth2 client credentials

```bash
fluree iceberg map orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --oauth2-token-url https://auth.example.com/token \
  --oauth2-client-id my-client \
  --oauth2-client-secret $CLIENT_SECRET
```
### Direct S3 (ambient credentials)

```bash
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1

fluree iceberg map logs \
  --mode direct \
  --table-location s3://bucket/warehouse/logs/execution_log
```
## Rust API

```rust
use fluree_db_api::IcebergCreateConfig;

// REST catalog
let config = IcebergCreateConfig::new(
    "warehouse-orders",
    "https://polaris.example.com/api/catalog",
    "sales.orders",
)
.with_warehouse("my-warehouse")
.with_auth_bearer("my-token")
.with_vended_credentials(true);

fluree.create_iceberg_graph_source(config).await?;

// Direct S3
let config = IcebergCreateConfig::new_direct(
    "execution-log",
    "s3://bucket/warehouse/logs/execution_log",
)
.with_s3_region("us-east-1");

fluree.create_iceberg_graph_source(config).await?;
```
## Limitations
- Iceberg graph sources are read-only. Writes go through your existing Iceberg writer (Spark, Flink, iceberg-rust, etc.).
- Large joins across Fluree ledgers and Iceberg tables may be slow. Push filters and limits to reduce result sets.
- Requires the `iceberg` feature flag at compile time.