R2RML (Relational to RDF Mapping)
R2RML (RDB to RDF Mapping Language) is a W3C Recommendation for mapping relational (tabular) data to RDF triples. In Fluree, R2RML mappings expose Iceberg tables as RDF graph sources, so you can query data lake tables using SPARQL or JSON-LD Query.
What is R2RML?
R2RML defines how to map:
- Database tables to RDF classes
- Table columns to RDF properties
- Rows to RDF resources
- Foreign keys to RDF relationships
In Fluree, this enables querying Iceberg tables as if they were RDF graphs.
Configuration
Create R2RML Graph Source (Iceberg-backed)
Use R2rmlCreateConfig to register a graph source that combines:
- an Iceberg table (REST catalog or Direct S3), and
- an R2RML mapping (Turtle) that materializes table rows into RDF triples.
If you use Direct S3 mode, Fluree resolves the current Iceberg metadata by reading metadata/version-hint.text under the configured table_location, then loading the metadata file referenced by the hint. The Iceberg table layout must already exist at that location.
use fluree_db_api::{FlureeBuilder, R2rmlCreateConfig};
let fluree = FlureeBuilder::default().build().await?;
let config = R2rmlCreateConfig::new_direct(
"airlines-rdf",
"s3://bucket/warehouse/openflights/airlines",
"fluree:file://mappings/airlines.ttl",
)
.with_s3_region("us-east-1")
.with_s3_path_style(true)
.with_mapping_media_type("text/turtle");
fluree.create_r2rml_graph_source(config).await?;
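The Direct S3 resolution step described above can be sketched as a small standalone function. This is only an illustration, not Fluree's implementation: the function name `metadata_location` is hypothetical, and it assumes the Hadoop-style Iceberg layout in which `version-hint.text` holds an integer N and the current metadata file is `metadata/vN.metadata.json`.

```rust
/// Hypothetical helper: derive the metadata file location from the contents
/// of `metadata/version-hint.text`.
/// Assumption: the hint holds an integer N and the metadata file is named
/// `metadata/vN.metadata.json` (Hadoop-style Iceberg table layout).
fn metadata_location(table_location: &str, hint_contents: &str) -> Option<String> {
    // The hint file typically contains only the version number, possibly
    // followed by whitespace or a newline.
    let version: u64 = hint_contents.trim().parse().ok()?;
    // Avoid a double slash if the configured location ends with '/'.
    let base = table_location.trim_end_matches('/');
    Some(format!("{}/metadata/v{}.metadata.json", base, version))
}
```

If the hint cannot be parsed as a version number, the function returns `None`, which corresponds to the case where the table layout does not yet exist at the configured location.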
R2RML Mapping
Basic Mapping
Map a table to an RDF class:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns/> .
@prefix schema: <http://schema.org/> .
<#CustomerMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:tableName "customers"
] ;
rr:subjectMap [
rr:template "http://example.org/customer/{id}" ;
rr:class schema:Person
] ;
rr:predicateObjectMap [
rr:predicate schema:name ;
rr:objectMap [ rr:column "name" ]
] ;
rr:predicateObjectMap [
rr:predicate schema:email ;
rr:objectMap [ rr:column "email" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:customerId ;
rr:objectMap [ rr:column "id" ]
] .
This maps the customers table:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
To RDF triples:
<http://example.org/customer/1>
a schema:Person ;
schema:name "Alice" ;
schema:email "alice@example.org" ;
ex:customerId "1" .
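To make the subject template step concrete, here is a minimal sketch of how an rr:template string could be expanded for one row. The function names `iri_safe` and `expand_template` are hypothetical and not part of any Fluree API. The percent-encoding is a simplified version of R2RML's IRI-safe encoding (it also encodes non-ASCII bytes), and backslash-escaped braces in templates are not handled.

```rust
use std::collections::HashMap;

/// Simplified IRI-safe encoding: keep RFC 3986 unreserved ASCII characters,
/// percent-encode every other byte.
fn iri_safe(value: &str) -> String {
    let mut out = String::new();
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Replace each `{column}` placeholder with the row's encoded value.
/// Returns None if a placeholder is unterminated or the column has no value,
/// in which case no term is generated for the row.
fn expand_template(template: &str, row: &HashMap<&str, &str>) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        let close = open + rest[open..].find('}')?;
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..open]);
        let column = &rest[open + 1..close];
        out.push_str(&iri_safe(row.get(column)?));
        rest = &rest[close + 1..];
    }
    // Copy any literal text after the last placeholder.
    out.push_str(rest);
    Some(out)
}
```

For a row with id = 1, the template "http://example.org/customer/{id}" expands to the subject IRI shown in the triples above.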
Foreign Key Mapping
Map relationships:
<#OrderMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:tableName "orders"
] ;
rr:subjectMap [
rr:template "http://example.org/order/{id}" ;
rr:class ex:Order
] ;
rr:predicateObjectMap [
rr:predicate ex:orderId ;
rr:objectMap [ rr:column "id" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:customer ;
rr:objectMap [
rr:parentTriplesMap <#CustomerMapping> ;
rr:joinCondition [
rr:child "customer_id" ;
rr:parent "id"
]
]
] ;
rr:predicateObjectMap [
rr:predicate ex:total ;
rr:objectMap [ rr:column "total" ]
] .
This maps the foreign key customer_id to an RDF object property (ex:customer) that links each order to its customer resource.
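The effect of rr:parentTriplesMap with an rr:joinCondition can be sketched as an in-memory join. This is an illustration under stated assumptions, not how Fluree executes the mapping: the `Customer` and `Order` structs and the function `order_customer_links` are hypothetical, and the IRI templates are copied from the two mappings above.

```rust
use std::collections::HashSet;

/// Hypothetical rows, reduced to the columns used by the join condition.
struct Customer {
    id: i64,
}

struct Order {
    id: i64,
    customer_id: i64,
}

/// For every order whose `customer_id` (rr:child) matches a customer `id`
/// (rr:parent), produce the (subject, object) pair of an ex:customer triple.
/// The object is the subject IRI that the customer mapping generates for the
/// matching parent row.
fn order_customer_links(orders: &[Order], customers: &[Customer]) -> Vec<(String, String)> {
    let known: HashSet<i64> = customers.iter().map(|c| c.id).collect();
    orders
        .iter()
        // Orders without a matching parent row produce no triple.
        .filter(|o| known.contains(&o.customer_id))
        .map(|o| {
            (
                format!("http://example.org/order/{}", o.id),
                format!("http://example.org/customer/{}", o.customer_id),
            )
        })
        .collect()
}
```

Because the object is an IRI rather than a literal, queries can traverse from an order to the properties of its customer, as in the join queries later in this page.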
Complex Queries
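Use an R2RML view (rr:sqlQuery) for complex mappings. The example also uses the xsd: prefix, which must be declared alongside the prefixes from the basic mapping (@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .):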
<#SalesReportMapping>
a rr:TriplesMap ;
rr:logicalTable [
rr:sqlQuery """
SELECT
c.id as customer_id,
c.name as customer_name,
SUM(o.total) as total_spent,
COUNT(o.id) as order_count
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
"""
] ;
rr:subjectMap [
rr:template "http://example.org/customer/{customer_id}" ;
rr:class ex:Customer
] ;
rr:predicateObjectMap [
rr:predicate schema:name ;
rr:objectMap [ rr:column "customer_name" ]
] ;
rr:predicateObjectMap [
rr:predicate ex:totalSpent ;
rr:objectMap [ rr:column "total_spent" ; rr:datatype xsd:decimal ]
] ;
rr:predicateObjectMap [
rr:predicate ex:orderCount ;
rr:objectMap [ rr:column "order_count" ; rr:datatype xsd:integer ]
] .
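The rr:datatype annotations above attach an XSD datatype to each generated literal. As a rough sketch of what that means for the output, the hypothetical helper below renders a column value as a typed literal in N-Triples syntax. It does not validate that the lexical form is legal for the datatype.

```rust
/// Namespace IRI for XML Schema datatypes.
const XSD: &str = "http://www.w3.org/2001/XMLSchema#";

/// Hypothetical helper: render a value as an N-Triples typed literal,
/// e.g. for `rr:datatype xsd:integer` pass "integer" as the local name.
fn typed_literal(lexical: &str, xsd_local_name: &str) -> String {
    // Escape backslashes first, then double quotes, as N-Triples requires.
    let escaped = lexical.replace('\\', "\\\\").replace('"', "\\\"");
    format!("\"{}\"^^<{}{}>", escaped, XSD, xsd_local_name)
}
```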
Querying R2RML Graph Sources
R2RML graph sources are queried using standard SPARQL and JSON-LD query syntax — no special query language is needed. In the Rust API, graph source resolution is wired into the lazy query builders:
- fluree.graph("my-gs:main").query() for a single target that may be either a native ledger or a mapped graph source
- fluree.query_from() when the query body specifies the dataset ("from"/FROM) or combines multiple sources
The raw materialized snapshot path (fluree.db(&alias) → fluree.query(&view, ...)) is still the wrong abstraction for graph source aliases because it assumes a native ledger snapshot has already been loaded.
Graph sources can be:
- Queried directly as the target: fluree query my-gs 'SELECT * WHERE { ?s ?p ?o }'
- Referenced in FROM clauses: SELECT * FROM <my-gs:main> WHERE { ... }
- Referenced in GRAPH patterns: SELECT * WHERE { GRAPH <my-gs:main> { ... } } (useful for joining with ledger data)
Basic Query
{
"@context": {
"schema": "http://schema.org/",
"ex": "http://example.org/ns/"
},
"from": "warehouse-customers:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?customer", "@type": "schema:Person" },
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "schema:email": "?email" }
]
}
The mapping controls how subjects and predicate/object values are produced from the scanned table columns.
SPARQL Query
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/ns/>
SELECT ?name ?email
FROM <warehouse-customers:main>
WHERE {
?customer a schema:Person .
?customer schema:name ?name .
?customer schema:email ?email .
}
Filters
{
"from": "warehouse-customers:main",
"select": ["?name", "?email"],
"where": [
{ "@id": "?customer", "schema:name": "?name" },
{ "@id": "?customer", "schema:email": "?email" },
{ "@id": "?customer", "ex:status": "?status" }
],
"filter": "?status == 'active'"
}
Joins
{
"from": "warehouse-orders:main",
"select": ["?customerName", "?orderTotal"],
"where": [
{ "@id": "?customer", "schema:name": "?customerName" },
{ "@id": "?order", "ex:customer": "?customer" },
{ "@id": "?order", "ex:total": "?orderTotal" }
]
}
Combining with Fluree Data
Join Iceberg data with Fluree ledgers:
{
"from": ["products:main", "warehouse-inventory:main"],
"select": ["?productName", "?stockLevel"],
"where": [
{ "@id": "?product", "schema:name": "?productName" },
{ "@id": "?product", "ex:sku": "?sku" },
{ "@id": "?inventory", "ex:sku": "?sku" },
{ "@id": "?inventory", "ex:stockLevel": "?stockLevel" }
]
}
This query combines product data from a Fluree ledger with inventory data from an Iceberg-backed R2RML graph source, joined on the shared ex:sku value.
Performance
R2RML graph sources execute by scanning the underlying Iceberg table and materializing RDF terms according to the mapping.
Best Practices
- Filter Early: Filters are pushed down to Iceberg for partition pruning.
{ "where": [...], "filter": "?date >= '2024-01-01'" }
- Limit Results:
{ "where": [...], "limit": 100 }
- Project Only Needed Columns: Only columns referenced in the query and mapping are read from Parquet files.
- Partition by Common Filters: Partition your Iceberg tables by columns frequently used in filters (e.g., date).
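To illustrate why filtering on a partition column helps, the sketch below shows the idea behind partition pruning: files whose value range cannot satisfy the filter are skipped without being read. The `DataFile` struct and `prune` function are hypothetical. Iceberg's real scan planning works from partition summaries and column statistics stored in its manifests, so treat this only as a conceptual model.

```rust
/// Hypothetical summary of one data file: the min and max of the `date`
/// partition column, stored as ISO-8601 strings (which sort lexicographically).
struct DataFile {
    path: &'static str,
    min_date: &'static str,
    max_date: &'static str,
}

/// Keep only the files that may contain rows with `date >= lower_bound`.
/// A file is skipped when even its largest date is below the bound.
fn prune<'a>(files: &'a [DataFile], lower_bound: &str) -> Vec<&'a str> {
    files
        .iter()
        .filter(|f| f.max_date >= lower_bound)
        .map(|f| f.path)
        .collect()
}
```

With a filter such as ?date >= '2024-01-01', a table partitioned by date lets most older files be excluded up front, while an unpartitioned table would have to be scanned in full.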
Use Cases
Data Lake Analytics
Query Iceberg tables containing large-scale analytical data alongside Fluree ledgers:
{
"from": ["products:main", "warehouse-sales:main"],
"select": ["?productName", "?totalSold"],
"where": [
{ "@id": "?product", "schema:name": "?productName" },
{ "@id": "?product", "ex:productId": "?pid" },
{ "@id": "?sale", "ex:productId": "?pid" },
{ "@id": "?sale", "ex:quantity": "?totalSold" }
]
}
Multi-Table Mapping
A single R2RML mapping file can define multiple TriplesMap entries, each targeting a different Iceberg table or logical view. This enables querying across related tables through a single graph source.
Limitations
- Read-Only: R2RML graph sources are read-only (no writes via Fluree)
- Performance: Complex joins across Fluree + Iceberg may be slow
- Schema Changes: Requires mapping updates when referenced columns change
Troubleshooting
Connection Errors
{
"error": "IcebergConnectionError",
"message": "Cannot load table metadata"
}
Solutions:
- Check catalog configuration (REST vs Direct)
- Verify AWS credentials and S3 access
- Verify version-hint.text is present for Direct mode
Mapping Errors
{
"error": "R2RMLMappingError",
"message": "Invalid R2RML mapping: table 'customers' not found"
}
Solutions:
- Verify table name / location
- Check referenced column names in the mapping
- Validate R2RML syntax (Turtle)
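A quick way to check the column references in a mapping is to compare them against the table's actual column names before registering the graph source. The sketch below is a hypothetical pre-flight check, not a Fluree feature: `template_columns` extracts the `{column}` placeholders from an rr:template string, and `missing_columns` reports any referenced names the table does not have.

```rust
use std::collections::BTreeSet;

/// Collect the column names used as `{column}` placeholders in a template.
/// Backslash-escaped braces are not handled in this sketch.
fn template_columns(template: &str) -> Vec<&str> {
    let mut cols = Vec::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        let close = match rest[open..].find('}') {
            Some(i) => open + i,
            None => break, // unterminated placeholder: stop scanning
        };
        cols.push(&rest[open + 1..close]);
        rest = &rest[close + 1..];
    }
    cols
}

/// Return the referenced column names that are absent from the table,
/// in the order they were referenced.
fn missing_columns<'a>(referenced: &[&'a str], table_columns: &[&str]) -> Vec<&'a str> {
    let available: BTreeSet<&str> = table_columns.iter().copied().collect();
    referenced
        .iter()
        .copied()
        .filter(|c| !available.contains(*c))
        .collect()
}
```

The same check is useful after a schema change: any name it reports is a column the mapping still references but the table no longer provides.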
Slow Queries
Causes:
- Large result sets (many Parquet files scanned)
- No partition pruning
- Complex joins across Fluree + Iceberg
Solutions:
- Add date/partition filters to enable Iceberg partition pruning
- Use LIMIT clause
- Optimize R2RML mapping to project only needed columns
- Partition Iceberg tables by common filter columns
Related Documentation
- Graph Sources Overview - Graph source concepts
- Iceberg - Data lake integration
- Query Datasets - Multi-graph queries