Graph Sources
Differentiator: Graph sources are one of Fluree's most powerful features, enabling seamless integration of specialized indexes and external data sources directly into graph queries. Unlike traditional databases that require separate systems for full-text search, vector similarity, or data lake access, Fluree makes these capabilities first-class citizens in the query language.
What Are Graph Sources?
A graph source is anything you can address by a graph name/IRI in Fluree query execution. Graph sources may be backed by:
- Ledger graphs (default graph and named graphs stored as RDF triples)
- Index graph sources (BM25 and vector/HNSW indexes)
- Mapped graph sources (R2RML and Iceberg-backed mappings)
Key Characteristics
- Query integration: Graph sources can be queried using the same SPARQL and JSON-LD Query interfaces
- Transparent access: Applications don't need to know whether data comes from a ledger graph source or a non-ledger graph source
- Specialization: Each graph source type is optimized for specific query patterns
- Time travel (type-specific): Some graph sources support time-travel queries, but support is not uniform across all types. Time-travel is implemented by each graph source type (not by the nameservice).
Graph Source Types
BM25 Full-Text Search
Differentiator: Fluree includes built-in BM25 full-text search indexing, eliminating the need for separate search systems like Elasticsearch.
Use Cases:
- Product search with relevance ranking
- Document search with keyword matching
- Content discovery with fuzzy matching
Example:
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"from": "products:main",
"select": ["?product", "?score"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?score" }
}
],
"orderBy": [["desc", "?score"]],
"limit": 10
}
Key Features:
- Relevance scoring (BM25 algorithm)
- Configurable parameters (k1, b)
- Language-aware search
- Optional time-travel support (BM25-owned manifest; see “Time Travel” below)
See the BM25 documentation for details.
Vector Similarity Search (ANN)
Differentiator: Native support for approximate nearest neighbor (ANN) queries via embedded HNSW indexes, enabling semantic search and similarity queries. Can run embedded (in-process) or via a dedicated remote search service.
Use Cases:
- Semantic search (find similar documents)
- Recommendation systems
- Image similarity search
- Embedding-based queries
Key Features:
- Approximate nearest neighbor search (HNSW algorithm)
- Configurable distance metrics (cosine, euclidean, dot product)
- Embedded indexes (no external service required) or remote mode via
fluree-search-httpd - Support for high-dimensional vectors
- Snapshot-based persistence with watermarks (head-only in v1; time-travel not supported)
See the Vector Search documentation for details.
Apache Iceberg Integration
Differentiator: Query Apache Iceberg tables and Parquet files directly as graph sources, enabling seamless integration with data lake architectures.
Use Cases:
- Query data lake formats without ETL
- Combine graph data with tabular data
- Analytics queries over large datasets
- Integration with existing data pipelines
Example:
# Query Iceberg table as graph source
SELECT ?customer ?order ?amount
FROM <iceberg:sales:main>
WHERE {
?order ex:customer ?customer .
?order ex:amount ?amount .
FILTER(?amount > 1000)
}
Key Features:
- Direct querying of Iceberg tables
- Parquet file support
- R2RML mapping for tabular data (Iceberg-backed)
- Time-travel via Iceberg snapshots
- Direct S3 mode: bypass REST catalog servers for
iceberg-rust/ self-managed tables — readsversion-hint.textfor automatic version discovery
See the Iceberg documentation for details.
R2RML Relational Mapping
Differentiator: Map relational databases to RDF using R2RML (R2RML Mapping Language), enabling graph queries over SQL databases.
Use Cases:
- Adopt graph queries alongside SQL data sources
- Query SQL databases using SPARQL
- Integrate existing systems
- Unified query interface across data sources
Example:
# Query relational database via R2RML mapping
SELECT ?customer ?order
FROM <r2rml:orders:main>
WHERE {
?customer ex:hasOrder ?order .
?order ex:status "pending" .
}
Key Features:
- R2RML standard compliance
- Automatic RDF mapping from SQL schemas
- Read-only access to source databases
- Support for complex joins and transformations
See the R2RML documentation for details.
Graph Source Lifecycle
Creation
Graph sources are created through administrative operations, specifying:
- Type: BM25, Vector, Iceberg, or R2RML
- Configuration: Type-specific settings
- Dependencies: Source ledgers or data sources
- Branch: Graph sources support branching like ledgers
Example BM25 Graph Source Creation:
{
"@type": "f:Bm25Index",
"f:name": "products-search",
"f:branch": "main",
"f:sourceLedger": "products:main",
"f:config": {
"k1": 1.2,
"b": 0.75,
"fields": ["name", "description"]
}
}
Indexing
Graph sources maintain their own indexes:
- BM25: Full-text indexes are built from source ledger data
- Vector: Embeddings stored in HNSW indexes (embedded or remote)
- Iceberg: Metadata is cached for efficient querying
- R2RML: Mapping rules are applied to generate RDF
Querying
Graph sources are queried like regular ledgers:
# Query any graph source
SELECT ?result
FROM <graph-source-name:branch>
WHERE {
# Query patterns specific to graph source type
}
Time Travel
Some graph sources support historical queries using the @t: syntax in the ledger reference, but the behavior is graph-source-type specific:
{
"@context": { "f": "https://ns.flur.ee/db#" },
"from": "products:main@t:1000",
"select": ["?product"],
"where": [
{
"f:graphSource": "products-search:main",
"f:searchText": "laptop",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product" }
}
]
}
BM25
BM25 can support time travel by maintaining a BM25-owned manifest in storage that maps transaction watermarks (t) to index snapshot addresses. The nameservice stores only a head pointer (an opaque address to the latest BM25 manifest/root) and does not store snapshot history.
Vector
Vector search is head-only in v1. If a query requests an @t: (or otherwise requests an historical view), vector search rejects the request with a clear “time-travel not supported” error.
Iceberg
Iceberg time travel (when used) is handled by Iceberg’s own snapshot/metadata model, not by nameservice-managed snapshot history.
Graph Source Architecture
Nameservice Integration
Graph sources are tracked in the nameservice alongside ledgers:
- Discovery: List all graph sources via nameservice
- Metadata: Configuration and status stored in nameservice
- Coordination: Index state tracked separately from source ledgers
Important: for graph sources, the nameservice stores only configuration and a head pointer (as a ContentId) to the graph source's latest index root/manifest. Snapshot history (if any) lives in graph-source-owned manifests in the content store.
Query Execution
When querying a graph source:
- Resolution: Query engine resolves graph source from nameservice
- Type Detection: Determines graph source type (BM25, Vector, etc.)
- Specialized Execution: Routes to type-specific query handler
- Result Integration: Results integrated with regular graph queries
Performance Characteristics
Each graph source type has different performance characteristics:
- BM25: Fast keyword search, relevance scoring
- Vector: Approximate similarity search, configurable accuracy/speed tradeoff
- Iceberg: Columnar storage, efficient for analytical queries
- R2RML: Depends on source database performance
Use Cases
Multi-Modal Search
Combine full-text search, vector similarity, and graph queries:
{
"@context": {
"ex": "http://example.org/",
"f": "https://ns.flur.ee/db#"
},
"from": "products:main",
"select": ["?product", "?textScore", "?vectorScore"],
"values": [
["?queryVec"],
[{"@value": [0.1, 0.2, 0.3], "@type": "https://ns.flur.ee/db#embeddingVector"}]
],
"where": [
{ "@id": "?product", "ex:category": "electronics" },
{
"f:graphSource": "products-search:main",
"f:searchText": "wireless",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?textScore" }
},
{
"f:graphSource": "products-vector:main",
"f:queryVector": "?queryVec",
"f:searchLimit": 10,
"f:searchResult": { "f:resultId": "?product", "f:resultScore": "?vectorScore" }
}
],
"orderBy": [["desc", "(?textScore + ?vectorScore)"]]
}
Vector/HNSW graph sources are currently queried via JSON-LD Query using f:* patterns (e.g. f:graphSource, f:queryVector, f:searchResult). SPARQL query syntax for HNSW vector indexes is not currently available.
Data Lake Integration
Query both graph and tabular data:
SELECT ?customer ?graphData ?lakeData
FROM <customers:main> # Graph ledger
FROM <iceberg:sales:main> # Iceberg graph source
WHERE {
# Graph data
?customer ex:preferences ?graphData .
# Data lake data
GRAPH <iceberg:sales:main> {
?sale ex:customer ?customer .
?sale ex:total ?lakeData .
}
}
Hybrid Search
Combine semantic and keyword search:
{
"@context": {
"f": "https://ns.flur.ee/db#"
},
"from": "documents:main",
"select": ["?document"],
"where": [
{
"f:graphSource": "documents-search:main",
"f:searchText": "machine learning",
"f:searchLimit": 20,
"f:searchResult": { "f:resultId": "?document" }
}
]
}
Semantic similarity via HNSW vector indexes is also queried via JSON-LD Query using f:* patterns. SPARQL syntax for BM25 and vector index search is not currently available.
Best Practices
Graph Source Design
-
Choose Appropriate Type: Match graph source type to query patterns
- Keyword search → BM25
- Semantic search → Vector
- Analytics → Iceberg
- SQL integration → R2RML
-
Configuration Tuning: Optimize graph source parameters
- BM25: Tune k1 and b for relevance
- Vector: Choose appropriate distance metric
- Iceberg: Optimize partition strategy
-
Dependency Management: Understand source data dependencies
- BM25/Vector: Keep in sync with source ledger
- Iceberg: Handle schema evolution
- R2RML: Map schema changes
Performance Optimization
-
Index Maintenance: Keep graph source indexes up-to-date
- Monitor indexing lag
- Tune indexing frequency
- Handle large data volumes
-
Query Planning: Optimize queries using graph sources
- Use graph sources for appropriate query patterns
- Combine with graph queries efficiently
- Consider cost of graph source queries
-
Caching: Cache frequently accessed graph source results
- Cache query results when appropriate
- Consider graph source snapshot caching
- Balance freshness vs performance
Operational Considerations
-
Monitoring: Track graph source health
- Index build status
- Query performance
- Storage usage
-
Backup: Include graph sources in backup strategy
- BM25 indexes can be rebuilt (or restored from stored snapshots/manifests, depending on configuration)
- Vector indexes are stored as head snapshots (time-travel not supported in v1)
- Iceberg metadata in nameservice
-
Scaling: Plan for graph source scaling
- BM25: Scale with source ledger size
- Vector: Scale with embedding count
- Iceberg: Leverage Iceberg partitioning
Comparison with Traditional Approaches
Traditional Architecture
Application
├── Graph Database (Neo4j, etc.)
├── Search Engine (Elasticsearch)
├── Vector DB (Pinecone, etc.)
└── Data Lake (Spark, Presto)
Challenges:
- Multiple systems to manage
- Data synchronization complexity
- Different query languages
- Separate authentication/authorization
Fluree Graph Source Architecture
Application
└── Fluree
├── Graph Ledgers
├── BM25 Graph Sources (built-in)
├── Vector Graph Sources
└── Iceberg Graph Sources
Benefits:
- Single query interface (SPARQL/JSON-LD Query)
- Unified access control (policy enforcement)
- Consistent time-travel across all data
- Simplified operations and deployment
Graph sources make Fluree a unified platform for graph, search, vector, and data lake queries, eliminating the complexity of managing multiple specialized systems.