## How it works
Fluree maps Iceberg tables as read-only graph sources. Once mapped, they participate in SPARQL and JSON-LD queries alongside ledger data. Fluree reads Parquet files through Iceberg's metadata layer, pushing filters down for partition pruning and column projection.
The `iceberg` feature flag must be compiled in.
## Catalog modes
### REST catalog

Connects to an Iceberg REST catalog API (Apache Polaris, Tabular, AWS Glue, etc.):

```bash
fluree iceberg map warehouse-orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --warehouse my-warehouse \
  --auth-bearer $POLARIS_TOKEN
```
Supports vended credentials (enabled by default) and warehouse selection.
### Direct S3

Reads table metadata directly from S3 — no catalog server:

```bash
fluree iceberg map execution-log \
  --mode direct \
  --table-location s3://my-bucket/warehouse/logs/execution_log \
  --s3-region us-east-1
```
Fluree reads `{table_location}/metadata/version-hint.text` to find the current metadata file, then loads it. The file may contain a bare filename (`00001-abc.metadata.json`) or a full S3 path.
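The resolution step above can be sketched as plain string logic. This is an illustrative sketch, not Fluree's internals; the function name and the "full path starts with `s3://`" heuristic are assumptions:

```rust
/// Resolve the value found in metadata/version-hint.text to a full
/// metadata-file location: a bare filename is joined under
/// {table_location}/metadata/, while a full S3 path is used as-is.
fn resolve_metadata_path(table_location: &str, hint: &str) -> String {
    let hint = hint.trim();
    if hint.starts_with("s3://") {
        hint.to_string()
    } else {
        format!("{}/metadata/{}", table_location.trim_end_matches('/'), hint)
    }
}

fn main() {
    let loc = "s3://my-bucket/warehouse/logs/execution_log";
    // Bare filename -> joined under the table's metadata/ directory.
    println!("{}", resolve_metadata_path(loc, "00001-abc.metadata.json"));
}
```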
Requirements:

- The table must be a valid Iceberg layout (created by Spark, iceberg-rust, Flink, etc.)
- `metadata/version-hint.text` must exist and be non-empty
- Uses ambient AWS credentials (IAM roles, env vars, `~/.aws/credentials`) — vended credentials are not supported in direct mode
### When to use which
| Scenario | Mode |
|---|---|
| Shared catalog, multiple consumers | REST |
| Writer and reader are the same system | Direct |
| Need catalog-managed credentials | REST |
| Minimizing infrastructure (no catalog server) | Direct |
## R2RML mapping
Without a mapping, Fluree exposes raw column data. An R2RML mapping (Turtle format) transforms rows into RDF triples with typed predicates:
```bash
fluree iceberg map airlines \
  --catalog-uri https://polaris.example.com/api/catalog \
  --r2rml mappings/airlines.ttl \
  --auth-bearer $POLARIS_TOKEN
```
Example mapping:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#OrderMapping>
  a rr:TriplesMap ;
  rr:logicalTable [ rr:tableName "orders" ] ;
  rr:subjectMap [
    rr:template "http://example.org/order/{order_id}" ;
    rr:class ex:Order
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:orderId ;
    rr:objectMap [ rr:column "order_id" ]
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:total ;
    rr:objectMap [ rr:column "total" ; rr:datatype xsd:decimal ]
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:orderDate ;
    rr:objectMap [ rr:column "order_date" ; rr:datatype xsd:date ]
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:customer ;
    rr:objectMap [ rr:template "http://example.org/customer/{customer_id}" ]
  ] .
```
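The heart of an `rr:template` is placeholder substitution: each `{column}` is replaced with that column's value from the row to mint an IRI. A minimal sketch of that substitution (simplified — real R2RML also IRI-escapes the substituted values):

```rust
use std::collections::HashMap;

/// Expand an R2RML rr:template by substituting each {column}
/// placeholder with the corresponding row value.
fn expand_template(template: &str, row: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        let end = rest[start..].find('}').map(|e| start + e).expect("unclosed placeholder");
        // Look up the column named between the braces; empty if absent.
        out.push_str(row.get(&rest[start + 1..end]).unwrap_or(&""));
        rest = &rest[end + 1..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut row = HashMap::new();
    row.insert("order_id", "42");
    // Mints the subject IRI for this row per <#OrderMapping>.
    println!("{}", expand_template("http://example.org/order/{order_id}", &row));
}
```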
### Type mapping
| Iceberg type | XSD type |
|---|---|
| int, long | xsd:integer |
| float, double | xsd:decimal |
| string | xsd:string |
| boolean | xsd:boolean |
| date | xsd:date |
| timestamp | xsd:dateTime |
| uuid | xsd:string |
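The table above is a straightforward lookup from Iceberg primitive type names to XSD datatype IRIs. A sketch of that default mapping (the fallback-to-`xsd:string` for unrecognized types is an assumption, not documented behavior):

```rust
/// Default XSD datatype for an Iceberg primitive type, per the
/// mapping table above, used when no explicit rr:datatype is given.
fn xsd_for_iceberg(ty: &str) -> &'static str {
    match ty {
        "int" | "long" => "http://www.w3.org/2001/XMLSchema#integer",
        "float" | "double" => "http://www.w3.org/2001/XMLSchema#decimal",
        "boolean" => "http://www.w3.org/2001/XMLSchema#boolean",
        "date" => "http://www.w3.org/2001/XMLSchema#date",
        "timestamp" => "http://www.w3.org/2001/XMLSchema#dateTime",
        // string, uuid (and, by assumption, anything else) map to xsd:string.
        _ => "http://www.w3.org/2001/XMLSchema#string",
    }
}

fn main() {
    println!("{}", xsd_for_iceberg("timestamp"));
}
```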
## Querying
Once mapped, query with standard SPARQL or JSON-LD Query.
### SPARQL

```sparql
PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?orderId ?total ?date
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderId ?orderId .
  ?order ex:total ?total .
  ?order ex:orderDate ?date .
  FILTER (?date >= "2024-01-01"^^xsd:date)
}
ORDER BY DESC(?date)
LIMIT 100
```
### JSON-LD Query

```json
{
  "@context": {"ex": "http://example.org/ns/"},
  "from": "warehouse-orders:main",
  "select": ["?orderId", "?total", "?date"],
  "where": [
    {"@id": "?order", "ex:orderId": "?orderId"},
    {"@id": "?order", "ex:total": "?total"},
    {"@id": "?order", "ex:orderDate": "?date"}
  ],
  "filter": "?date >= '2024-01-01'",
  "orderBy": ["-?date"],
  "limit": 100
}
```
### Aggregation

```sparql
PREFIX ex: <http://example.org/ns/>

SELECT ?date (SUM(?total) AS ?dailyRevenue) (COUNT(?order) AS ?orderCount)
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderDate ?date .
  ?order ex:total ?total .
}
GROUP BY ?date
ORDER BY ?date
```
## Joining with Fluree ledger data
Multiple sources in a single query — graph data from a Fluree ledger and tabular data from Iceberg:
```json
{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/ns/"},
  "from": ["customers:main", "warehouse-orders:main"],
  "select": ["?customerName", "?orderTotal", "?orderDate"],
  "where": [
    {"@id": "?customer", "schema:name": "?customerName"},
    {"@id": "?customer", "ex:customerId": "?customerId"},
    {"@id": "?order", "ex:customerId": "?customerId"},
    {"@id": "?order", "ex:total": "?orderTotal"},
    {"@id": "?order", "ex:orderDate": "?orderDate"}
  ],
  "filter": "?orderDate >= '2024-01-01'",
  "orderBy": ["-?orderDate"]
}
```
Using GRAPH patterns in SPARQL:

```sparql
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>

SELECT ?customerName ?productName ?quantity
FROM <customers:main>
WHERE {
  ?customer schema:name ?customerName .
  GRAPH <warehouse-orders:main> {
    ?order ex:customer ?customer ;
           ex:product ?product ;
           ex:quantity ?quantity .
  }
  GRAPH <warehouse-products:main> {
    ?product ex:name ?productName .
  }
}
```
## Filter pushdown
Fluree pushes filters to Iceberg:
- Partition pruning: only reads partitions matching the filter
- File skipping: skips data files whose statistics don't match
- Column pruning: only reads columns referenced by the query and mapping
For best performance, partition Iceberg tables by commonly filtered columns (date, region, etc.).
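File skipping works because Iceberg records per-file column statistics: a data file whose min/max range cannot satisfy the filter is never read. A toy illustration of the principle (not Fluree's code), using ISO-format date strings, whose lexicographic order matches chronological order:

```rust
/// Can a file whose column maximum is `file_max` contain any row
/// satisfying `column >= lower_bound`? If not, the file is skipped.
fn may_contain_ge(file_max: &str, lower_bound: &str) -> bool {
    file_max >= lower_bound
}

fn main() {
    // A file covering 2023-06-01..2023-12-31 is skipped entirely
    // for the filter order_date >= 2024-01-01.
    println!("{}", may_contain_ge("2023-12-31", "2024-01-01")); // false: skipped
    println!("{}", may_contain_ge("2024-03-15", "2024-01-01")); // true: must be read
}
```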
## Iceberg time travel

Query historical Iceberg snapshots:

```json
{"from": "warehouse-orders:main@snapshot:12345", ...}
```

Or by timestamp:

```json
{"from": "warehouse-orders:main@timestamp:2024-01-01T00:00:00Z", ...}
```
This uses Iceberg's own snapshot model, independent of Fluree's time travel for ledger data.
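The reference syntax splits at the `@` into a base graph-source name and a snapshot or timestamp selector. A sketch of that split (illustrative only — Fluree's actual parser is not shown here):

```rust
/// Split a reference like "warehouse-orders:main@snapshot:12345" into
/// the base graph-source name and an optional time-travel selector.
fn split_time_travel(reference: &str) -> (&str, Option<&str>) {
    match reference.find('@') {
        Some(i) => (&reference[..i], Some(&reference[i + 1..])),
        None => (reference, None),
    }
}

fn main() {
    let (base, selector) = split_time_travel("warehouse-orders:main@snapshot:12345");
    println!("{base} -> {selector:?}");
}
```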
## Managing graph sources

```bash
# List all ledgers and graph sources
fluree list

# Inspect configuration
fluree info warehouse-orders

# Query from CLI
fluree query warehouse-orders 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'

# Remove
fluree drop warehouse-orders --force
```
## Authentication
### Bearer token

```bash
fluree iceberg map orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --auth-bearer $POLARIS_TOKEN
```
### OAuth2 client credentials

```bash
fluree iceberg map orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --oauth2-token-url https://auth.example.com/token \
  --oauth2-client-id my-client \
  --oauth2-client-secret $CLIENT_SECRET
```
### Direct S3 (ambient credentials)

```bash
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1

fluree iceberg map logs \
  --mode direct \
  --table-location s3://bucket/warehouse/logs/execution_log
```
## Rust API

```rust
use fluree_db_api::IcebergCreateConfig;

// REST catalog
let config = IcebergCreateConfig::new(
    "warehouse-orders",
    "https://polaris.example.com/api/catalog",
    "sales.orders",
)
.with_warehouse("my-warehouse")
.with_auth_bearer("my-token")
.with_vended_credentials(true);

fluree.create_iceberg_graph_source(config).await?;

// Direct S3
let config = IcebergCreateConfig::new_direct(
    "execution-log",
    "s3://bucket/warehouse/logs/execution_log",
)
.with_s3_region("us-east-1");

fluree.create_iceberg_graph_source(config).await?;
```
## Limitations
- Iceberg graph sources are read-only. Writes go through your existing Iceberg writer (Spark, Flink, iceberg-rust, etc.).
- Large joins across Fluree ledgers and Iceberg tables may be slow. Push filters and limits to reduce result sets.
- Requires the `iceberg` feature flag at compile time.