← Back to Projects

Knowledge Graph QASP Generation

Master's Thesis — Automated Question-Answer-SPARQL triplet generation from Knowledge Graphs using open-source LLMs

Leuphana University · 2025

QASP pipeline architecture
Full pipeline: DBLP KG → triple extraction → prompt construction → LLM generation → validation → DBLP-QASP dataset

Overview

This thesis investigates whether open-source Large Language Models can generate coherent Question-Answer-SPARQL (QASP) triplets directly from Knowledge Graph triples in a single prompt — without fine-tuning or external supervision.

The task combines semantic parsing and natural language generation simultaneously: the model must infer entities, relations and constraints from structured KG triples, and produce a natural language question, a corresponding answer, and a syntactically valid SPARQL query that retrieves that answer from the DBLP knowledge graph. The research explores how design choices — model size, triple representation format, structural metadata, and prompting strategy — affect generation accuracy.

Key Result

87.38% accuracy

Best configuration: Qwen2.5-Coder-32B · hop-info+-linearised-str format · zero-shot · no structural metadata · 10,704 correct QASP triplets generated from 12,250 total

Pipeline

The automated pipeline takes raw KG triples from the DBLP Knowledge Graph (accessed via a Virtuoso SPARQL endpoint) and transforms them through several stages before prompting the LLM.

1 — KG Data Extraction

Raw 1-hop and 2-hop triple chains are extracted from the DBLP KG via SPARQL queries targeting the Publication entity class. Structural metadata (node types, predicates, relation schema) is extracted separately for use in higher-level SMI experiments.

2 — Triple Preprocessing

Triples are cleaned and URI-prefixed to reduce token length without losing semantic information. Semantically irrelevant triples (e.g. hasIdentifier) are filtered out. 6 triple format types (TFT) are generated: array, linearised-array, linearised-str, hop-repeat, hop-info+, and special-tokens.

3 — Prompt Construction

Prompts are assembled from the formatted triple chain, optional structural metadata (SMI levels −1, 0, 1), a primitive question task composition (single-fact), and a 0-shot or 1-shot example. The LLM is run in chat mode with temperature 0 and a configurable repeat parameter to generate multiple QASPs per prompt.

4 — Extraction & Validation

Question, answer and SPARQL are extracted from LLM output via regex matching **Question:**, **Answer:**, **SPARQL:**. Each SPARQL is executed against the DBLP KG API — accuracy is measured by whether the queried answer exactly matches the LLM-generated answer.

Experiments

Experiment Finding
Triple Format Type (TFT)Linearised string formats outperform array formats across most models. hop-info+-linearised-str is the most consistent top performer.
Structural Metadata (SMI)Counterintuitively, no metadata (SMI = −1) gives the highest accuracy. Adding node class and relation schema context confuses models more than it helps.
N-shot promptingZero-shot outperforms one-shot for larger models (Gemma, Mistral). Smaller models benefit from examples. One example can introduce bias towards that model's style.
Model comparisonQwen2.5-Coder-32B achieves 87.38% avg. in SMI experiment. Gemma 3 27B and Qwen3 32B are close. Mistral 123B is not proportionally better than 32B models.
QASP pipeline instance — raw triples to final QASP triplet
Pipeline instance: raw KG triples → prefixed & formatted triples → prompt → final QASP triplet

Models Evaluated

Model Parameters Best Avg. Accuracy
Qwen2.5-Coder-32B-Instruct32B87.38%
Gemma 3 27B Instruct27B72.2%
Qwen3 32B32B71.4%
Mistral Large Instruct 123B123B63.3%
Codestral 22B22B32.2%
Llama 3.1 8B Instruct8B14.4%

Stack

Layer Technology
LanguagePython 3 · Poetry
LLM APIGWDG HPC interface (open-source models via API)
Knowledge GraphDBLP KG · Virtuoso SPARQL endpoint
DataDBLP-QASP dataset — 10,704 validated QASP triplets