SDUs Daisy: A Benchmark for Danish Culture

SDU Daisy is the first version of a dataset designed to evaluate large language models’ understanding of Danish culture, as defined by the official Danish Culture Canon (Kulturkanon, 2006). It consists of 746 closed question-answer pairs.

The Canon highlights 108 works across literature, music, visual arts, architecture, design, film, and performing arts. These works form a curated benchmark of what is often considered Denmark’s cultural heritage. By using them as anchors, this dataset enables systematic investigation of how well LLMs can reason about, contextualize, and generate insights into Danish culture.

SDU Daisy Evaluations

Model	BLEU Score	F1 Score	Dataset Version	Prompt Template Version
openai/gpt-oss-20b	0.062	0.112	1.0	1.0
openai/gpt-oss-120b	0.126	0.211	1.0	1.0
google/gemma-3-27b-it	0.123	0.193	1.0	1.0
meta-llama/Llama-3.3-70B-Instruct	0.166	0.268	1.0	1.0
mistralai/Mistral-Small-3.1-24B-Instruct-2503	0.124	0.202	1.0	1.0
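
This README does not say how the F1 score is computed. A common choice for closed question answering is token-overlap F1 between the model answer and the reference answer. The following is a minimal sketch under that assumption: the function name `token_f1`, the lowercase whitespace tokenization, and the example strings are illustrative and are not taken from the repository's `evaluation` code.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer."""
    # Assumption: lowercase + whitespace tokenization; the benchmark may normalize differently.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match, a single empty answer as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example pair, not drawn from the dataset.
score = token_f1("Carl Nielsen composed it", "Carl Nielsen")
```

Under this definition a verbose but correct answer is penalized on precision, which may partly explain why absolute scores for free-form model output tend to be low.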

Why this dataset?

Cultural Relevance Test – The Canon provides a well-defined cultural benchmark for evaluation.
Knowledge Probing – Randomized prompts (Danish: "stikprøvekontrol", i.e. spot checks) test both relevant and less relevant associations with Canon works.
Human Validation – Every generated question/response pair is annotated for validity and relevance, while deliberately keeping both mainstream and non-mainstream knowledge.

Methodology

Sampling (Stikprøvekontrol)
For each Canon title, random questions are generated, ranging from directly relevant inquiries (e.g., about historical context) to more peripheral or unexpected ones.
Response Collection
LLMs provide answers to these questions, creating a structured dataset of outputs.
Human Evaluation
Each question/response pair is annotated on three criteria:
- Relevance (on-topic vs. off-topic)
- Accuracy (correct vs. incorrect)
- Cultural Insight (does it capture nuance and meaning, including small or even niche facts?)
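
The three steps above can be sketched roughly as follows. This is an illustration only: the question templates, the `Annotation` field names, the placeholder titles, and the acceptance rule in `accept` are assumptions made for the example and do not describe the repository's actual pipeline.

```python
import random
from dataclasses import dataclass

# Hypothetical question templates; the actual prompts are not given in this README.
TEMPLATES = {
    "relevant": "What is the historical context of {title}?",
    "peripheral": "What lesser-known fact is associated with {title}?",
}


@dataclass
class Annotation:
    """One human judgment on a question/response pair (field names are assumptions)."""
    relevant: bool          # on-topic vs. off-topic
    accurate: bool          # correct vs. incorrect
    cultural_insight: bool  # captures nuance, including niche facts


def sample_questions(titles, per_title=2, seed=0):
    """Draw random question kinds for each Canon title (spot-check style)."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    kinds = list(TEMPLATES)
    questions = []
    for title in titles:
        for _ in range(per_title):
            kind = rng.choice(kinds)
            questions.append((title, kind, TEMPLATES[kind].format(title=title)))
    return questions


def accept(annotation: Annotation) -> bool:
    """Keep a pair only if it is on-topic and correct (an assumed acceptance rule)."""
    return annotation.relevant and annotation.accurate


# Placeholder titles stand in for the 108 Canon works.
questions = sample_questions(["<Canon title A>", "<Canon title B>"])
```

Seeding the random generator is a design choice for the sketch: it makes a spot-check sample repeatable, so the same questions can be sent to every model being compared.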

Applications

Benchmarking LLM performance on Danish cultural sub-domains
Supporting digital humanities research on how AI engages with cultural canons
Encouraging critical reflection on the boundaries of cultural knowledge encoded in AI systems

schneiderkamplab/SDU-Daisy
