FlureeLabs
Guide · March 30, 2026

Coming from Data Lakes / Apache Iceberg

How to query Iceberg tables through Fluree as graph sources. Covers catalog modes, R2RML mapping, querying, and joining with Fluree ledger data.

Tags: Migration · Iceberg · Data Lakes

How it works

Fluree exposes Iceberg tables as read-only graph sources. Once mapped, they participate in SPARQL and JSON-LD queries alongside ledger data. Fluree reads Parquet files through Iceberg's metadata layer, pushing filters down for partition pruning and column projection.

Fluree must be built with the iceberg feature flag enabled.


Catalog modes

REST catalog

Connects to an Iceberg REST catalog API (Apache Polaris, Tabular, AWS Glue, etc.):

fluree iceberg map warehouse-orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --warehouse my-warehouse \
  --auth-bearer $POLARIS_TOKEN

Supports vended credentials (enabled by default) and warehouse selection.

Direct S3

Reads table metadata directly from S3 — no catalog server:

fluree iceberg map execution-log \
  --mode direct \
  --table-location s3://my-bucket/warehouse/logs/execution_log \
  --s3-region us-east-1

Fluree reads {table_location}/metadata/version-hint.text to find the current metadata file, then loads it. The file may contain a bare filename (00001-abc.metadata.json) or a full S3 path.

Requirements:

  • The table must be a valid Iceberg layout (created by Spark, iceberg-rust, Flink, etc.)
  • metadata/version-hint.text must exist and be non-empty
  • Uses ambient AWS credentials (IAM roles, env vars, ~/.aws/credentials) — vended credentials are not supported in direct mode
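
The resolution step described above can be sketched like this (illustrative only, not Fluree's actual code; `resolve_metadata_path` is a made-up helper):

```rust
/// Resolve the current metadata file from a version hint, which may hold
/// either a bare filename or a full S3 path (per the behavior described above).
fn resolve_metadata_path(table_location: &str, hint: &str) -> String {
    let hint = hint.trim();
    if hint.starts_with("s3://") {
        // Hint is already a full path
        hint.to_string()
    } else {
        // Hint is a bare filename relative to {table_location}/metadata/
        format!("{}/metadata/{}", table_location.trim_end_matches('/'), hint)
    }
}

fn main() {
    let loc = "s3://my-bucket/warehouse/logs/execution_log";
    println!("{}", resolve_metadata_path(loc, "00001-abc.metadata.json"));
}
```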

When to use which

Scenario                                       Mode
Shared catalog, multiple consumers             REST
Writer and reader are the same system          Direct
Need catalog-managed credentials               REST
Minimizing infrastructure (no catalog server)  Direct

R2RML mapping

Without a mapping, Fluree exposes raw column data. An R2RML mapping (Turtle format) transforms rows into RDF triples with typed predicates:

fluree iceberg map airlines \
  --catalog-uri https://polaris.example.com/api/catalog \
  --r2rml mappings/airlines.ttl \
  --auth-bearer $POLARIS_TOKEN

Example mapping:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#OrderMapping>
  a rr:TriplesMap ;

  rr:logicalTable [ rr:tableName "orders" ] ;

  rr:subjectMap [
    rr:template "http://example.org/order/{order_id}" ;
    rr:class ex:Order
  ] ;

  rr:predicateObjectMap [
    rr:predicate ex:orderId ;
    rr:objectMap [ rr:column "order_id" ]
  ] ;

  rr:predicateObjectMap [
    rr:predicate ex:total ;
    rr:objectMap [ rr:column "total" ; rr:datatype xsd:decimal ]
  ] ;

  rr:predicateObjectMap [
    rr:predicate ex:orderDate ;
    rr:objectMap [ rr:column "order_date" ; rr:datatype xsd:date ]
  ] ;

  rr:predicateObjectMap [
    rr:predicate ex:customer ;
    rr:objectMap [ rr:template "http://example.org/customer/{customer_id}" ]
  ] .
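
Conceptually, applying the mapping above to one row is template substitution: each `{column}` placeholder in an rr:template is replaced with that row's value. A minimal sketch (assumed semantics; `expand_template` is a made-up helper, not part of Fluree's API):

```rust
use std::collections::HashMap;

/// Expand an rr:template against one row by substituting {column} placeholders.
fn expand_template(template: &str, row: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (column, value) in row {
        out = out.replace(&format!("{{{}}}", column), value);
    }
    out
}

fn main() {
    let row: HashMap<&str, &str> =
        [("order_id", "1001"), ("customer_id", "42")].into_iter().collect();

    // Subject IRI from the rr:subjectMap template
    let subject = expand_template("http://example.org/order/{order_id}", &row);
    assert_eq!(subject, "http://example.org/order/1001");

    // Object IRI from the ex:customer predicateObjectMap template
    let customer = expand_template("http://example.org/customer/{customer_id}", &row);
    assert_eq!(customer, "http://example.org/customer/42");
    println!("ok");
}
```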

Type mapping

Iceberg type     XSD type
int, long        xsd:integer
float, double    xsd:decimal
string           xsd:string
boolean          xsd:boolean
date             xsd:date
timestamp        xsd:dateTime
uuid             xsd:string
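
The table reads as a simple lookup; as a sketch (illustrative only, not Fluree's code; the fallback for unlisted types is an assumption):

```rust
/// Map an Iceberg primitive type name to the XSD datatype from the table above.
fn xsd_type(iceberg_type: &str) -> &'static str {
    match iceberg_type {
        "int" | "long" => "xsd:integer",
        "float" | "double" => "xsd:decimal",
        "boolean" => "xsd:boolean",
        "date" => "xsd:date",
        "timestamp" => "xsd:dateTime",
        // string, uuid (and, in this sketch, anything unrecognized) -> xsd:string
        _ => "xsd:string",
    }
}

fn main() {
    assert_eq!(xsd_type("long"), "xsd:integer");
    assert_eq!(xsd_type("uuid"), "xsd:string");
    println!("ok");
}
```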

Querying

Once mapped, query with standard SPARQL or JSON-LD Query.

SPARQL

PREFIX ex: <http://example.org/ns/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?orderId ?total ?date
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderId ?orderId .
  ?order ex:total ?total .
  ?order ex:orderDate ?date .
  FILTER (?date >= "2024-01-01"^^xsd:date)
}
ORDER BY DESC(?date)
LIMIT 100

JSON-LD Query

{
  "@context": {"ex": "http://example.org/ns/"},
  "from": "warehouse-orders:main",
  "select": ["?orderId", "?total", "?date"],
  "where": [
    {"@id": "?order", "ex:orderId": "?orderId"},
    {"@id": "?order", "ex:total": "?total"},
    {"@id": "?order", "ex:orderDate": "?date"}
  ],
  "filter": "?date >= '2024-01-01'",
  "orderBy": ["-?date"],
  "limit": 100
}

Aggregation

PREFIX ex: <http://example.org/ns/>

SELECT ?date (SUM(?total) AS ?dailyRevenue) (COUNT(?order) AS ?orderCount)
FROM <warehouse-orders:main>
WHERE {
  ?order ex:orderDate ?date .
  ?order ex:total ?total .
}
GROUP BY ?date
ORDER BY ?date

Joining with Fluree ledger data

A single query can span multiple sources, combining graph data from a Fluree ledger with tabular data from Iceberg:

{
  "@context": {"schema": "http://schema.org/", "ex": "http://example.org/ns/"},
  "from": ["customers:main", "warehouse-orders:main"],
  "select": ["?customerName", "?orderTotal", "?orderDate"],
  "where": [
    {"@id": "?customer", "schema:name": "?customerName"},
    {"@id": "?customer", "ex:customerId": "?customerId"},
    {"@id": "?order", "ex:customerId": "?customerId"},
    {"@id": "?order", "ex:total": "?orderTotal"},
    {"@id": "?order", "ex:orderDate": "?orderDate"}
  ],
  "filter": "?orderDate >= '2024-01-01'",
  "orderBy": ["-?orderDate"]
}

Using GRAPH patterns in SPARQL:

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>

SELECT ?customerName ?productName ?quantity
FROM <customers:main>
WHERE {
  ?customer schema:name ?customerName .

  GRAPH <warehouse-orders:main> {
    ?order ex:customer ?customer ;
           ex:product ?product ;
           ex:quantity ?quantity .
  }

  GRAPH <warehouse-products:main> {
    ?product ex:name ?productName .
  }
}

Filter pushdown

Fluree pushes filters to Iceberg:

  • Partition pruning: only reads partitions matching the filter
  • File skipping: skips data files whose statistics don't match
  • Column pruning: only reads columns referenced by the query and mapping

For best performance, partition Iceberg tables by commonly filtered columns (date, region, etc.).
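
File skipping can be illustrated with a toy version of the min/max check (assumed logic; the struct and field names are invented): a data file whose column statistics cannot satisfy the filter is never read.

```rust
/// Per-file column statistics, as Iceberg metadata tracks them (toy model).
struct FileStats {
    min_order_date: &'static str,
    max_order_date: &'static str,
}

/// For a filter like `order_date >= cutoff`, a file whose maximum value is
/// below the cutoff can be skipped. ISO-8601 dates compare correctly as strings.
fn file_may_match(stats: &FileStats, cutoff: &str) -> bool {
    stats.max_order_date >= cutoff
}

fn main() {
    let old = FileStats { min_order_date: "2023-01-01", max_order_date: "2023-12-31" };
    let new = FileStats { min_order_date: "2024-01-01", max_order_date: "2024-06-30" };
    assert!(!file_may_match(&old, "2024-01-01")); // skipped without being read
    assert!(file_may_match(&new, "2024-01-01"));  // must be scanned
    println!("ok");
}
```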


Iceberg time travel

Query historical Iceberg snapshots:

{"from": "warehouse-orders:main@snapshot:12345", ...}

Or by timestamp:

{"from": "warehouse-orders:main@timestamp:2024-01-01T00:00:00Z", ...}

This uses Iceberg's own snapshot model, independent of Fluree's time travel for ledger data.
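
The `@snapshot:` / `@timestamp:` suffixes follow a simple addressing grammar. A parsing sketch under that assumption (type and function names invented, not Fluree's API):

```rust
/// Assumed grammar: "<source>:<branch>[@snapshot:<id> | @timestamp:<rfc3339>]"
#[derive(Debug, PartialEq)]
enum TimeTravel {
    Current,
    Snapshot(u64),
    Timestamp(String),
}

fn parse_from(from: &str) -> (String, TimeTravel) {
    if let Some((base, rest)) = from.split_once('@') {
        if let Some(id) = rest.strip_prefix("snapshot:") {
            if let Ok(n) = id.parse() {
                return (base.to_string(), TimeTravel::Snapshot(n));
            }
        } else if let Some(ts) = rest.strip_prefix("timestamp:") {
            return (base.to_string(), TimeTravel::Timestamp(ts.to_string()));
        }
    }
    (from.to_string(), TimeTravel::Current)
}

fn main() {
    let (base, tt) = parse_from("warehouse-orders:main@snapshot:12345");
    assert_eq!(base, "warehouse-orders:main");
    assert_eq!(tt, TimeTravel::Snapshot(12345));
    println!("ok");
}
```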


Managing graph sources

# List all ledgers and graph sources
fluree list

# Inspect configuration
fluree info warehouse-orders

# Query from CLI
fluree query warehouse-orders 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'

# Remove
fluree drop warehouse-orders --force

Authentication

Bearer token

fluree iceberg map orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --auth-bearer $POLARIS_TOKEN

OAuth2 client credentials

fluree iceberg map orders \
  --catalog-uri https://polaris.example.com/api/catalog \
  --table sales.orders \
  --oauth2-token-url https://auth.example.com/token \
  --oauth2-client-id my-client \
  --oauth2-client-secret $CLIENT_SECRET

Direct S3 (ambient credentials)

export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1

fluree iceberg map logs \
  --mode direct \
  --table-location s3://bucket/warehouse/logs/execution_log

Rust API

use fluree_db_api::IcebergCreateConfig;

// REST catalog
let config = IcebergCreateConfig::new(
    "warehouse-orders",
    "https://polaris.example.com/api/catalog",
    "sales.orders",
)
.with_warehouse("my-warehouse")
.with_auth_bearer("my-token")
.with_vended_credentials(true);

fluree.create_iceberg_graph_source(config).await?;

// Direct S3
let config = IcebergCreateConfig::new_direct(
    "execution-log",
    "s3://bucket/warehouse/logs/execution_log",
)
.with_s3_region("us-east-1");

fluree.create_iceberg_graph_source(config).await?;

Limitations

  • Iceberg graph sources are read-only. Writes go through your existing Iceberg writer (Spark, Flink, iceberg-rust, etc.).
  • Large joins across Fluree ledgers and Iceberg tables may be slow. Push filters and limits to reduce result sets.
  • Requires the iceberg feature flag at compile time.