FlureeLabs

IPFS Storage

Fluree can use IPFS as a storage backend via the Kubo HTTP RPC API. Storage is decentralized and content-addressed: every piece of data is identified by its cryptographic hash.

Feature flag: Requires the ipfs feature to be enabled at compile time. Build with: cargo build --features ipfs

Overview

IPFS storage maps naturally onto Fluree's content-addressed architecture. Fluree already identifies every blob (commits, transactions, index nodes) with a CIDv1 content identifier built from a SHA-256 hash and a Fluree-specific multicodec value. When IPFS is the storage backend, Fluree writes each blob to a local Kubo node under that same CID.

Key properties:

  • Content-addressed: data is identified by its SHA-256 hash, providing built-in integrity verification
  • Immutable: once written, data cannot be modified or deleted (only unpinned for garbage collection)
  • Decentralized: data can be replicated across IPFS nodes without centralized coordination
  • Compatible: Fluree's native CIDs work directly with IPFS (no translation layer needed)
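To make the integrity property concrete, here is a minimal Python sketch. The helper name verify_block is hypothetical and not part of Fluree's API; it only shows that a reader can re-hash the bytes it received, from any node, and compare them with the SHA-256 digest the blob is addressed by.

```python
import hashlib


def verify_block(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True if `data` hashes to the expected SHA-256 digest (hex).

    Hypothetical helper: because blobs are addressed by their hash, any
    received copy can be checked locally without trusting the sender.
    """
    return hashlib.sha256(data).hexdigest() == expected_sha256_hex.lower()


if __name__ == "__main__":
    blob = b'{"example": "commit"}'
    digest = hashlib.sha256(blob).hexdigest()
    print(verify_block(blob, digest))         # True
    print(verify_block(blob + b" ", digest))  # False: any change alters the hash
```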

Kubo Setup

Kubo (formerly go-ipfs) is the reference IPFS implementation. Fluree communicates with Kubo via its HTTP RPC API (default port 5001).

Install Kubo

macOS (Homebrew):

brew install ipfs

Linux (official binary):

wget https://dist.ipfs.tech/kubo/v0.32.1/kubo_v0.32.1_linux-amd64.tar.gz
tar xvfz kubo_v0.32.1_linux-amd64.tar.gz
cd kubo
sudo ./install.sh

Docker:

docker run -d \
  --name ipfs \
  -p 4001:4001 \
  -p 127.0.0.1:5001:5001 \
  -p 8080:8080 \
  -v ipfs_data:/data/ipfs \
  ipfs/kubo:latest

Initialize and Start

# Initialize IPFS (first time only)
ipfs init

# Start the daemon
ipfs daemon

Verify the node is running:

# Check node identity
curl -s -X POST http://127.0.0.1:5001/api/v0/id | jq .ID
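The same check can be scripted from code, for example as a startup probe. This is a minimal Python sketch using only the standard library; the function name kubo_node_id is hypothetical. It calls the /api/v0/id endpoint that the curl command above uses and returns None instead of raising when the node is unreachable.

```python
import json
import urllib.error
import urllib.request


def kubo_node_id(api_url: str = "http://127.0.0.1:5001", timeout: float = 2.0):
    """Return the Kubo node's peer ID, or None if the RPC API is unreachable."""
    # Kubo's RPC API only accepts POST, hence the empty request body
    req = urllib.request.Request(f"{api_url.rstrip('/')}/api/v0/id", data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("ID")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body
        return None


if __name__ == "__main__":
    peer_id = kubo_node_id()
    print(peer_id or "Kubo RPC API is not reachable")
```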

Security Note

The Kubo HTTP RPC API (port 5001) provides full administrative access to the IPFS node. By default, it listens only on 127.0.0.1. Do not expose port 5001 to the public internet. If Fluree and Kubo run on different hosts, use SSH tunneling, a VPN, or a reverse proxy with authentication.

The IPFS gateway (port 8080) is read-only and can be exposed publicly if desired.
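A deployment script could warn when ipfsApiUrl points somewhere other than the local host. The sketch below is hypothetical: nothing here states that Fluree performs such a check, and the function name is illustrative.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_api_url(api_url: str) -> bool:
    """Hypothetical pre-flight check: True if the RPC URL targets this host only."""
    host = urlsplit(api_url).hostname  # brackets are stripped from IPv6 literals
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Some other DNS name: cannot tell without resolving it, so report False
        return False
```

A False result is not necessarily a misconfiguration (an SSH tunnel or VPN endpoint may be intentional), so it is better treated as a warning than as a hard failure.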

Configuration

JSON-LD Configuration

{
  "@context": {
    "@base": "https://ns.flur.ee/config/connection/",
    "@vocab": "https://ns.flur.ee/system#"
  },
  "@graph": [
    {
      "@id": "ipfsStorage",
      "@type": "Storage",
      "ipfsApiUrl": "http://127.0.0.1:5001",
      "ipfsPinOnPut": true
    },
    {
      "@id": "connection",
      "@type": "Connection",
      "indexStorage": { "@id": "ipfsStorage" }
    }
  ]
}

Flat JSON Configuration

{
  "indexStorage": {
    "@type": "IpfsStorage",
    "ipfsApiUrl": "http://127.0.0.1:5001",
    "ipfsPinOnPut": true
  }
}

Configuration Fields

Field          Type      Default                  Description
ipfsApiUrl     string    http://127.0.0.1:5001    Kubo HTTP RPC API base URL
ipfsPinOnPut   boolean   true                     Pin blocks after writing (prevents garbage collection)

Both fields support ConfigurationValue indirection (env vars):

{
  "ipfsApiUrl": { "envVar": "FLUREE_IPFS_API_URL", "defaultVal": "http://127.0.0.1:5001" },
  "ipfsPinOnPut": true
}
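The following Python sketch shows how such an indirection might be resolved together with the defaults from the Configuration Fields table. The precedence (a set environment variable wins, otherwise defaultVal, otherwise the field default) and the string-to-boolean parsing are assumptions, not documented Fluree behavior; all function names are hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Any

# Defaults taken from the Configuration Fields table
DEFAULTS = {"ipfsApiUrl": "http://127.0.0.1:5001", "ipfsPinOnPut": True}


def resolve_value(raw: Any, env: Mapping[str, str]) -> Any:
    """Resolve a literal or an {"envVar": ..., "defaultVal": ...} object.

    Assumed semantics: a set environment variable wins, else defaultVal.
    """
    if isinstance(raw, dict) and "envVar" in raw:
        return env.get(raw["envVar"], raw.get("defaultVal"))
    return raw


def to_bool(value: Any) -> bool:
    # Environment variables arrive as strings, so "false" must not become True
    if isinstance(value, str):
        return value.strip().lower() in ("1", "true", "yes")
    return bool(value)


def resolve_ipfs_storage(cfg: Mapping[str, Any], env: Mapping[str, str] = os.environ) -> dict:
    """Return the effective ipfsApiUrl and ipfsPinOnPut for one storage entry."""
    url = resolve_value(cfg.get("ipfsApiUrl"), env)
    pin = resolve_value(cfg.get("ipfsPinOnPut"), env)
    return {
        "ipfsApiUrl": url if url is not None else DEFAULTS["ipfsApiUrl"],
        "ipfsPinOnPut": to_bool(pin) if pin is not None else DEFAULTS["ipfsPinOnPut"],
    }


if __name__ == "__main__":
    cfg = {
        "ipfsApiUrl": {"envVar": "FLUREE_IPFS_API_URL", "defaultVal": "http://127.0.0.1:5001"},
        "ipfsPinOnPut": True,
    }
    print(resolve_ipfs_storage(cfg))
```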

Architecture

┌──────────────────────┐
│   Fluree Process     │
│  ┌────────────────┐  │
│  │  IpfsStorage   │  │
│  │  (HTTP client) │  │
│  └────────┬───────┘  │
└───────────┼──────────┘
            │ HTTP RPC
     ┌──────▼──────┐
     │  Kubo Node  │
     │  (port 5001)│
     └──────┬──────┘
            │ libp2p
     ┌──────▼──────┐
     │  IPFS P2P   │
     │  Network    │
     └─────────────┘

Fluree communicates with a local Kubo node via the HTTP RPC API. The Kubo node handles peer-to-peer networking, block storage, and replication with the broader IPFS network.

API Endpoints Used

Kubo Endpoint              Purpose
POST /api/v0/block/put     Store a block with optional codec and hash type
POST /api/v0/block/get     Retrieve a block by CID
POST /api/v0/block/stat    Check if a block exists (metadata only)
POST /api/v0/pin/add       Pin a block to prevent garbage collection
POST /api/v0/id            Health check (verify node is reachable)
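These endpoints can be called from any HTTP client. The Python sketch below builds (but does not send) the corresponding requests with the standard library; every helper name is hypothetical. Several details are assumptions to check against your Kubo version: the multipart field name "file" for block/put, how a Fluree private-use codec should be spelled in the cid-codec parameter, and recursive=false on pin/add, which is our own choice so that Kubo is not asked to traverse links inside blocks whose codec it may not understand.

```python
import urllib.parse
import urllib.request
import uuid


def rpc_request(api_url: str, endpoint: str, params: dict, body: bytes = b"",
                content_type: str = "application/x-www-form-urlencoded") -> urllib.request.Request:
    """Build a POST request for a Kubo RPC endpoint such as 'block/get'."""
    query = urllib.parse.urlencode(params)
    url = f"{api_url.rstrip('/')}/api/v0/{endpoint}?{query}"
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": content_type})


def block_put_request(api_url: str, data: bytes, cid_codec: str = "raw") -> urllib.request.Request:
    """block/put receives the block bytes as a multipart/form-data file part."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="block"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    # Assumption: the caller passes whatever cid-codec spelling its Kubo accepts
    params = {"cid-codec": cid_codec, "mhtype": "sha2-256"}
    return rpc_request(api_url, "block/put", params, body,
                       f"multipart/form-data; boundary={boundary}")


def block_get_request(api_url: str, cid: str) -> urllib.request.Request:
    return rpc_request(api_url, "block/get", {"arg": cid})


def block_stat_request(api_url: str, cid: str) -> urllib.request.Request:
    return rpc_request(api_url, "block/stat", {"arg": cid})


def pin_add_request(api_url: str, cid: str) -> urllib.request.Request:
    # Direct (non-recursive) pin: an assumption, see the note above
    return rpc_request(api_url, "pin/add", {"arg": cid, "recursive": "false"})
```

Each request can then be sent with urllib.request.urlopen(req). With ipfsPinOnPut enabled, a write would plausibly be a block/put followed by pin/add for the returned CID, but the exact call sequence Fluree uses is not specified here.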

Content Addressing

How Fluree CIDs Map to IPFS

Fluree uses CIDv1 with SHA-256 multihash and private-use multicodec values:

Content Kind            Multicodec                      Hex        Example
Commit                  fluree-commit                   0x300001   bagaybqab...
Transaction             fluree-txn                      0x300002   bagbibqab...
Index Root              fluree-index-root               0x300003   bagbybqab...
Index Branch            fluree-index-branch             0x300004   bagcibqab...
Index Leaf              fluree-index-leaf               0x300005   bagcybqab...
Dict Blob               fluree-dict-blob                0x300006   bagdibqab...
Garbage Record          fluree-garbage                  0x300007   bagdybqab...
Ledger Config           fluree-ledger-config            0x300008   bageibqab...
Stats Sketch            fluree-stats-sketch             0x300009   bageybqab...
Graph Source Snapshot   fluree-graph-source-snapshot    0x30000A   bagfibqab...
Spatial Index           fluree-spatial-index            0x30000B   bagfybqab...

These are in the multicodec private-use range (0x300000+). Kubo accepts them via the cid-codec parameter and resolves blocks by multihash regardless of codec. This means Fluree's native CIDs work directly with IPFS without any translation layer.
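The CID layout itself is simple enough to sketch. The following Python example assembles a CIDv1 string from a codec value and a blob: varint version, varint codec, then a SHA-256 multihash, encoded as lowercase unpadded base32 with the multibase prefix b. This illustrates the CIDv1 format, not Fluree's actual code; the constant names are ours.

```python
import base64
import hashlib

SHA2_256 = 0x12        # multihash code for SHA-256
RAW = 0x55             # multicodec "raw"
FLUREE_COMMIT = 0x300001
FLUREE_TXN = 0x300002


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more 7-bit groups follow
        else:
            out.append(byte)
            return bytes(out)


def cid_v1(codec: int, data: bytes) -> str:
    """CIDv1 string: multibase 'b' + base32(version, codec, multihash)."""
    digest = hashlib.sha256(data).digest()
    multihash = varint(SHA2_256) + varint(len(digest)) + digest
    cid_bytes = varint(1) + varint(codec) + multihash
    # Multibase base32: RFC 4648 alphabet, lowercase, no padding
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


if __name__ == "__main__":
    blob = b"example commit bytes"
    print(cid_v1(FLUREE_COMMIT, blob))  # starts with "bagaybqab"
    print(cid_v1(RAW, blob))            # starts with "bafkrei"
```

The leading characters are fixed by the version, codec, and multihash header, which is why every commit CID starts with bagaybqab and every raw SHA-256 CID starts with bafkrei.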

Cross-Codec Retrieval

IPFS block storage is keyed by multihash internally. A block stored with codec 0x300001 (Fluree commit) can be retrieved using a CID with codec 0x55 (raw) as long as the SHA-256 digest is the same. This simplifies the address-based StorageRead implementation: given a Fluree address containing a hash, we can construct any CID with that hash to fetch the block.
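The cross-codec trick amounts to keeping the multihash and swapping the codec. This Python sketch (helper names hypothetical) decodes a base32 CIDv1, drops its codec, and re-wraps the same multihash as a raw (0x55) CID. Given the multihash-keyed lookup described above, Kubo should return the same block for either CID.

```python
import base64


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 varint; return (value, next_position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_cid_v1(cid: str) -> tuple[int, bytes]:
    """Split a base32 ('b'-prefixed) CIDv1 into (codec, multihash bytes)."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase base32 (prefix 'b')")
    body = cid[1:].upper()
    data = base64.b32decode(body + "=" * (-len(body) % 8))  # restore padding
    version, pos = read_varint(data, 0)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, pos = read_varint(data, pos)
    return codec, data[pos:]


def to_raw_cid(cid: str) -> str:
    """Re-wrap the same multihash under the raw codec (0x55)."""
    _, multihash = decode_cid_v1(cid)
    data = bytes([0x01, 0x55]) + multihash  # version 1, codec raw
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")
```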

Pinning

What is Pinning?

Kubo can garbage-collect unpinned blocks to free disk space, either manually with ipfs repo gc or automatically when the daemon runs with --enable-gc. Pinning tells the node to keep specific blocks until they are unpinned. Without pinning, blocks may be removed from the local node, though they remain available from other nodes that still have them.

Default Behavior

Fluree pins every block on write when ipfsPinOnPut is true (the default). This ensures that:

  • All committed data survives Kubo garbage collection
  • The local node serves as a reliable storage backend
  • Blocks remain available even if no other node has them

When to Disable Pinning

Set ipfsPinOnPut: false when:

  • Running integration tests (faster, less disk usage)
  • Using a separate pinning service (Pinata, web3.storage, etc.)
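  • Kubo runs without automatic garbage collection (the daemon is started without --enable-gc, which is Kubo's default) and ipfs repo gc is never run manually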

Pinning Services

For production deployments, consider using a remote pinning service for redundancy:

# Add a remote pinning service
ipfs pin remote service add pinata https://api.pinata.cloud/psa YOUR_JWT

# Pin a CID to the remote service
ipfs pin remote add --service=pinata bafybeig...

Limitations

No Prefix Listing

IPFS is a content-addressed store with no concept of directory listing or prefix enumeration. The list_prefix() operation returns an error. Operations that require listing (e.g., ledger discovery, GC scans) must use an alternative strategy such as manifest-based tracking.
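One way to implement manifest-based tracking is to record every address as it is written and answer prefix queries from that record rather than from IPFS. The Python class below is a hypothetical sketch; its names are illustrative, and how and where the manifest itself would be persisted is outside its scope.

```python
from bisect import bisect_left


class AddressManifest:
    """Hypothetical in-memory manifest of every address written to IPFS.

    IPFS cannot enumerate keys, so prefix queries are answered from this
    sorted list of addresses instead.
    """

    def __init__(self) -> None:
        self._addresses: list[str] = []

    def record(self, address: str) -> None:
        # Insert in sorted position, ignoring duplicates (writes are idempotent)
        i = bisect_left(self._addresses, address)
        if i == len(self._addresses) or self._addresses[i] != address:
            self._addresses.insert(i, address)

    def list_prefix(self, prefix: str) -> list[str]:
        # In sorted order, all matches are contiguous and start at bisect_left(prefix)
        out = []
        for addr in self._addresses[bisect_left(self._addresses, prefix):]:
            if not addr.startswith(prefix):
                break
            out.append(addr)
        return out
```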

No Deletion

IPFS content is immutable, so the delete() operation is a no-op. Removing data from the local node takes two steps:

  1. Unpin the block on the local node (ipfs pin rm <cid>)
  2. Let Kubo's garbage collector reclaim the space (ipfs repo gc, or automatic GC if enabled)

Even after local GC, the block may still exist on other IPFS nodes that fetched or pinned it.

Nameservice

IPFS storage currently requires a separate nameservice (file-based or DynamoDB) for ledger metadata. A future phase will add IPNS and/or ENS-based decentralized nameservices.

Latency

Every read and write goes through the Kubo HTTP RPC API, which adds HTTP overhead compared to direct file I/O. For latency-sensitive workloads, run Kubo on the same host as Fluree so that traffic stays on localhost.

No Encryption

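The IPFS storage backend does not currently support Fluree's AES256Key encryption, so blocks are stored unencrypted and any peer that can reach your Kubo node can fetch them by CID. If confidentiality is required, encrypt data in a separate layer before it reaches IPFS, or restrict access with a private IPFS network (which limits which peers can connect but does not encrypt blocks at rest).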

Storage Addresses

Fluree addresses for IPFS storage follow the standard format:

fluree:ipfs://{ledger_id}/{kind_dir}/{hash_hex}.{ext}

Examples:

fluree:ipfs://mydb/main/commit/a1b2c3...f6a1b2.fcv2
fluree:ipfs://mydb/main/index/roots/d4e5f6...c3d4e5.json
fluree:ipfs://mydb/main/index/spot/abc123...def456.fli

The hash hex in the filename is extracted and used to construct a CID for retrieval from IPFS.
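A minimal Python sketch of that extraction follows. It assumes the filename carries the full 64-hex-digit SHA-256 digest (the examples above abbreviate it) and builds a raw (0x55) CID, relying on the cross-codec retrieval described earlier. The function name is hypothetical.

```python
import base64
import re

# Last path segment of fluree:ipfs://{ledger_id}/{kind_dir}/{hash_hex}.{ext}
_FILENAME = re.compile(r"/([0-9a-fA-F]{64})\.[A-Za-z0-9]+$")


def address_to_raw_cid(address: str) -> str:
    """Extract the SHA-256 hex from a fluree:ipfs:// address and build a raw CIDv1."""
    if not address.startswith("fluree:ipfs://"):
        raise ValueError(f"not an IPFS address: {address}")
    match = _FILENAME.search(address)
    if match is None:
        raise ValueError(f"no 64-hex-digit SHA-256 filename in: {address}")
    digest = bytes.fromhex(match.group(1))
    # version 1, raw codec 0x55, multihash sha2-256 (0x12) with 32-byte length (0x20)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
```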

Operational Considerations

Disk Usage

Kubo stores blocks in a local datastore (by default, a flatfs directory at ~/.ipfs/blocks/, with other node data in a LevelDB datastore at ~/.ipfs/datastore/). Monitor disk usage:

# Check IPFS repo size
ipfs repo stat

# Run garbage collection (removes unpinned blocks)
ipfs repo gc

Network Bandwidth

By default, Kubo participates in the IPFS DHT and may serve blocks to other nodes. For a private deployment:

# Disable DHT (private node)
ipfs config Routing.Type none

# Or use a private IPFS network with a swarm key
# See: https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#private-networks

Performance Tuning

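# Raise the connection manager's high-water mark (numeric values need --json)
ipfs config --json Swarm.ConnMgr.HighWater 300

# Enable a blockstore Bloom filter of the given size in bytes
ipfs config --json Datastore.BloomFilterSize 1048576

# Disable periodic GC, e.g. when relying on external pinning (only applies when the daemon runs with --enable-gc)
ipfs config --json Datastore.GCPeriod '"0"'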

Monitoring

Check Kubo node health:

# Node identity and version
ipfs id

# Connected peers
ipfs swarm peers | wc -l

# Repo statistics
ipfs repo stat

# Bandwidth usage
ipfs stats bw

Troubleshooting

Connection Refused

IPFS node connection failed: http://127.0.0.1:5001

Causes:

  • Kubo daemon is not running
  • Kubo is listening on a different address/port
  • Firewall blocking the connection

Fix:

# Start the daemon
ipfs daemon

# Or check what address it's listening on
ipfs config Addresses.API

Block Not Found

IPFS block not found: bafybeig...

Causes:

  • Block was never stored on this node
  • Block was unpinned and garbage collected
  • CID built from a wrong or truncated hash (the codec may differ, but the SHA-256 digest must match)

Fix:

# Check if block exists locally
ipfs block stat bafybeig...

# Try fetching from the network
ipfs block get bafybeig... > /dev/null

Slow Writes

Causes:

  • Kubo node under heavy load
  • Network latency (if Kubo is remote)
  • Disk I/O bottleneck

Fix:

  • Run Kubo on the same host as Fluree
  • Use SSD storage for the IPFS datastore
  • Consider disabling DHT for private deployments

Future Roadmap

Phase 2: Decentralized Nameservice

The IPFS storage backend is designed as the foundation for decentralized Fluree deployments. Planned additions:

  • IPNS: Publish mutable pointers to ledger state (commit head, index root)
  • ENS / L2 chain: On-chain CID pointers for trustless ledger discovery
  • Two-tier nameservice: Local nameservice for fast reads with async push to decentralized upstream (similar to git push)

Content Pinning Strategy

Future versions may support:

  • Automatic pinning profiles (pin commits only, pin everything, pin nothing)
  • Integration with remote pinning services (Pinata, web3.storage)
  • Manifest-based tracking for GC and prefix listing

Related Documentation