FlureeLabs

Fluree LLM Accuracy Benchmark

Comparing GraphRAG vs traditional vector RAG approaches — measuring answer accuracy, hallucination rates, and provenance quality across LLM-powered question answering.

We conducted a comprehensive benchmark comparing GraphRAG (knowledge graph-enhanced retrieval) against traditional vector RAG for LLM-powered question answering.

Methodology

Dataset

A curated corpus of 5,000 questions across three domains:

  • Healthcare: Drug interactions, treatment protocols, clinical guidelines
  • Finance: Regulatory requirements, risk assessment, compliance rules
  • Supply Chain: Vendor relationships, logistics dependencies, certification tracking

Approaches Tested

  1. Vector RAG: Standard chunk-and-embed pipeline with cosine similarity retrieval
  2. GraphRAG (Fluree): Semantic data layer with subgraph retrieval and entity-aware context assembly
  3. Hybrid: Vector retrieval augmented with graph-based relationship expansion
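To make approach 1 concrete, here is a minimal sketch of the chunk-and-embed retrieval step with cosine similarity. The chunks, embeddings, and function names are illustrative, not part of the benchmark harness; real pipelines use learned embeddings rather than these hand-written vectors.

```python
# Illustrative vector RAG retrieval: rank pre-embedded chunks by cosine
# similarity to the query embedding. Data and names are hypothetical.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding) pairs; returns the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("Drug X dosage is 10mg daily.", [0.9, 0.1, 0.0]),
    ("Supplier A ships Component X.", [0.1, 0.8, 0.2]),
    ("Standard Y certification rules.", [0.0, 0.3, 0.9]),
]
print(top_k([0.85, 0.15, 0.05], chunks, k=1))
# ['Drug X dosage is 10mg daily.']
```

The retrieved texts are then concatenated into the LLM prompt; note that nothing in this step knows how the entities in different chunks relate, which is the gap the graph approaches address.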

LLM

All approaches used GPT-4o as the generation model, with identical system prompts.

Results

Metric                Vector RAG    GraphRAG    Hybrid
Answer Accuracy       67.2%         89.4%       91.1%
Hallucination Rate    18.3%         4.1%        3.8%
Multi-hop Accuracy    41.5%         82.3%       85.7%
Provenance Score      52.0%         96.8%       94.2%
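For clarity on what the table reports, here is a sketch of how such per-approach rates are aggregated from per-question judgments. The field names and judgment records are our own illustration; the actual evaluation harness is in the repository linked below.

```python
# Hypothetical per-question judgments -> aggregate metrics. "correct",
# "hallucinated", and "cited_source" are illustrative field names, assumed
# to come from human or automated grading of each answer.
def score(results):
    n = len(results)
    return {
        "answer_accuracy": sum(r["correct"] for r in results) / n,
        "hallucination_rate": sum(r["hallucinated"] for r in results) / n,
        "provenance_score": sum(r["cited_source"] for r in results) / n,
    }

results = [
    {"correct": True,  "hallucinated": False, "cited_source": True},
    {"correct": False, "hallucinated": True,  "cited_source": False},
    {"correct": True,  "hallucinated": False, "cited_source": True},
    {"correct": True,  "hallucinated": False, "cited_source": False},
]
print(score(results))
# {'answer_accuracy': 0.75, 'hallucination_rate': 0.25, 'provenance_score': 0.5}
```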

Analysis

Where GraphRAG Excels

GraphRAG dramatically outperforms vector RAG on questions that require connecting information across multiple entities. When asked "Which suppliers of Component X are also certified for Standard Y?", vector RAG retrieves relevant text chunks but fails to connect the supplier-component-certification chain. GraphRAG traverses these relationships directly.
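The supplier/certification question above reduces to a two-step traversal and an intersection. A toy sketch over a triple store illustrates the idea; the triples and entity names are invented for this example, and a production system would run a declarative graph query rather than Python set operations.

```python
# Toy graph as (subject, predicate, object) triples. Data is illustrative.
triples = [
    ("SupplierA", "supplies",     "ComponentX"),
    ("SupplierB", "supplies",     "ComponentX"),
    ("SupplierA", "certifiedFor", "StandardY"),
    ("SupplierC", "certifiedFor", "StandardY"),
]

def subjects(predicate, obj):
    """All subjects linked to `obj` via `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}

# "Which suppliers of Component X are also certified for Standard Y?"
suppliers = subjects("supplies", "ComponentX")
certified = subjects("certifiedFor", "StandardY")
print(suppliers & certified)
# {'SupplierA'}
```

Vector retrieval would need the supplier, component, and certification facts to co-occur in a retrieved chunk; the traversal answers the question even when each fact lives in a different record.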

Where Vector RAG Holds Up

For simple factual lookups ("What is the dosage for Drug X?"), both approaches perform comparably. The overhead of graph construction is not justified for these cases.

The Hybrid Sweet Spot

The hybrid approach combines the strengths of both: vector similarity handles initial retrieval, and graph relationships then expand the context around the retrieved entities. It yielded the highest overall accuracy with only marginally more complexity than pure GraphRAG.
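The expansion step can be sketched as a one-hop neighborhood walk from the entities surfaced by vector retrieval. This is a simplified illustration under our own naming, not the benchmark's actual implementation.

```python
# Hybrid step 2 (sketch): given seed entities from vector retrieval, pull in
# every entity one hop away in the graph. Triples and names are illustrative.
def expand_one_hop(seed_entities, triples):
    context = set(seed_entities)
    for s, p, o in triples:
        if s in seed_entities:
            context.add(o)   # follow edge forward
        if o in seed_entities:
            context.add(s)   # follow edge backward
    return context

triples = [
    ("SupplierA", "supplies", "ComponentX"),
    ("ComponentX", "usedIn",  "ProductZ"),
]
print(sorted(expand_one_hop({"ComponentX"}, triples)))
# ['ComponentX', 'ProductZ', 'SupplierA']
```

Facts about every entity in the expanded set are then assembled into the prompt, which is why the hybrid recovers most of GraphRAG's multi-hop accuracy while keeping vector search's cheap first stage.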

Reproduce These Results

The full benchmark suite, including dataset generation scripts, pipeline configurations, and evaluation harness, is available in our GitHub repository.