The Raspa Data Quality Score: A Cumulative Scale for Cultural Heritage Data Assessment

To ensure the effective documentation, processing, and long-term preservation of cultural heritage information, the Dédalo project proposes the Raspa Data Quality Score, a cumulative metric that evaluates data across ten progressive levels of computational readiness, semantic richness, and ethical transparency. This scale reflects the project's commitment to structured, interoperable, and ethically managed data built entirely on Free and Open Source Software (FOSS).

The Raspa Data Quality Score has two dimensions: a technical dimension and a community / social dimension.

Technical dimension

The Raspa score begins at Level 0, assigned to unstructured data such as books, Word documents, and PDFs. While these formats may contain valuable knowledge, they remain machine-opaque and unsuitable for computational processing without prior transformation. Their human readability is not sufficient for automated reasoning or integration into structured knowledge systems like Dédalo, which can store such files but treats them only as media.

At Level 1, structured data formats such as spreadsheets (CSV, Excel) and relational databases (SQL) become machine-readable and receive 1 Raspa point. However, they exhibit significant limitations: structural rigidity, limited support for semantic relationships, and insufficient compatibility with the conceptual models required in cultural heritage documentation.

Level 2 introduces ontologically modeled data, where information is structured using formal ontologies. These representations allow entities and relationships to be defined explicitly, support domain-specific modeling, and enable inferencing, all critical capabilities for managing complex heritage knowledge.

Advancing to Level 3, computable data employs standardized computational primitives (e.g., TC39 Temporal for time representation), eliminating syntactic ambiguities and enabling precise temporal and spatial reasoning.
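
As a rough illustration of what "computable" means here, the sketch below contrasts an ambiguous date string with typed values. It uses Python's standard `datetime` module as a stand-in for TC39 Temporal (which is a JavaScript proposal), and a GeoJSON `Point` for the coordinates. The date, the coordinates, and the variable names are hypothetical examples, not taken from Dédalo.

```python
import json
from datetime import date

# An ambiguous free-text date: 3 April or 4 March? A machine cannot tell.
ambiguous = "03/04/1920"

# The same information as a typed ISO 8601 calendar date (hypothetical value).
excavated_on = date.fromisoformat("1920-04-03")

# Typed dates support arithmetic and comparison directly.
days_elapsed = (date(1921, 4, 3) - excavated_on).days

# A GeoJSON Point: coordinates are ordered [longitude, latitude].
# The location is a made-up example.
find_spot = {"type": "Point", "coordinates": [-0.3763, 39.4699]}

# GeoJSON is plain JSON, so it survives serialization unchanged.
round_tripped = json.loads(json.dumps(find_spot))
```

Once values are typed this way, comparisons such as "was this object found before that one?" no longer depend on guessing a string format.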

At Level 4, data becomes traceable, incorporating robust provenance tracking and version control. These systems record the full history of modifications, including who made each change, when it occurred, and the rationale behind it, ensuring transparency, accountability, and scholarly reproducibility. Traceable data enables users to assess the origin, evolution, and reliability of the information, supporting responsible reuse and long-term stewardship.
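
A minimal sketch of such a change history, assuming an append-only list of change records, might look like the following. The class and field names (`Change`, `TracedField`, `rationale`) are hypothetical and do not describe Dédalo's actual storage model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """One immutable entry: who changed the value, when, why, and to what."""
    author: str
    timestamp: datetime
    rationale: str
    new_value: str


@dataclass
class TracedField:
    """A value whose earlier states are never overwritten, only appended to."""
    history: List[Change] = field(default_factory=list)

    def update(self, new_value: str, author: str, rationale: str) -> None:
        # Record the change with a UTC timestamp instead of replacing the value.
        change = Change(author, datetime.now(timezone.utc), rationale, new_value)
        self.history.append(change)

    @property
    def current(self) -> Optional[str]:
        # The latest entry is the current value; None if nothing was recorded.
        return self.history[-1].new_value if self.history else None


# Hypothetical usage: a dating is revised after new evidence.
dating = TracedField()
dating.update("2nd century BCE", author="cataloguer-a", rationale="Initial entry")
dating.update("1st century BCE", author="cataloguer-b", rationale="Revised after die study")
```

Because earlier entries are kept, a reader can see both the current value and how it was reached.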

Level 5 assesses the epistemological flexibility of the model, awarding points to data structures capable of representing multiple perspectives, evolving knowledge, and supporting non-destructive schema changes. This capacity is particularly relevant for heritage knowledge, which is inherently interpretive and temporally dynamic.

At Level 6, contextualization becomes paramount. Data is enriched with metadata that articulates certainty levels, establishes source attribution chains, and embeds domain-specific framing, allowing users to assess the reliability and interpretive lens of the information.

Community / Social dimension

Level 7 recognizes translatable data, where linguistic content is decoupled from core data structures. Systems at this level support internationalization and localization, preserving semantic meaning across multiple languages, an essential requirement for global cultural heritage platforms.
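
One way to picture this decoupling is a record whose identifier and structure are language-neutral while each label is stored per language tag. The sketch below assumes that layout. The record contents, the `localized` helper, and the fallback rule are hypothetical illustrations rather than Dédalo's implementation.

```python
from typing import Optional

# Hypothetical record: the id and structure carry no language,
# while every human-readable label is keyed by language tag.
record = {
    "id": "object-001",
    "title": {"en": "Bronze coin", "es": "Moneda de bronce"},
}


def localized(values: dict, lang: str, fallback: str = "en") -> Optional[str]:
    """Return the label for `lang`, else the `fallback` label, else None."""
    if lang in values:
        return values[lang]
    return values.get(fallback)


title_es = localized(record["title"], "es")
title_ca = localized(record["title"], "ca")  # no Catalan label, falls back to "en"
```

Adding a new language then means adding a key to `title`, without touching the record's structure.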

Level 8 concerns openness, recognizing data that is made publicly accessible under clear licensing terms (e.g., Creative Commons) and distributed via REST APIs or equivalent open standards. This level ensures that data is not only reusable but also free from proprietary constraints that hinder scholarly and public access.

At Level 9, data demonstrates multi-standard interoperability. Such datasets maintain compliance with multiple domain standards (e.g., CIDOC CRM, Dublin Core, Nomisma) and support lossless transformation between schemas, ensuring semantic alignment across institutions, disciplines, and technologies.

Finally, Level 10 is reserved for data that is processed entirely through Free and Open Source Software. This level guarantees end-to-end transparency, reproducibility, and ethical integrity by eliminating dependencies on proprietary tools. It reflects the Dédalo project's core philosophy: that cultural heritage data should be freely accessible, verifiably processed, and ethically managed within open infrastructures.

Extra point

Sustainable data. Data that is sustainable over time receives one extra Raspa point, reflecting its resilience, long-term accessibility, and preservation-readiness. Sustainable data is not only well-structured and ethically processed, but also designed to withstand technological, organizational, and epistemological change.

Key characteristics of sustainable data include:

  • Format durability: Use of open, standardized, and non-obsolete formats (e.g., JSON, XML, TIFF).
  • Long-term storage strategy: Integration with digital preservation infrastructures.
  • Documentation continuity: Thorough metadata, contextual notes, and technical documentation that support future interpretation and migration.
  • Community stewardship and participation: Maintained by active institutions or open communities that ensure updates, backups, and governance.

Sustainable data ensures that cultural heritage remains accessible, intelligible, and reusable not just today, but decades into the future, even as technologies evolve.

The Raspa Score table

Technical dimension

| Level | Data Quality Tier | Key Characteristics | Technical Requirements | Raspas |
|---|---|---|---|---|
| 0 | Unstructured Data | Human-readable only (books, PDFs) | No computational structure | 0 |
| 1 | Basic Structured Data | Machine-readable tables (CSV, Excel, SQL) | Relational schemas | 1 |
| 2 | Ontologically Modeled | Formal semantic relationships | Domain ontologies | 2 |
| 3 | Computable Data | Native machine processing (TC39 Temporal dates, geocoordinates as GeoJSON) | Data type standardization | 1 |
| 4 | Traceable Data | Full provenance tracking | Immutable logs, version control (W3C PROV) | 1 |
| 5 | Reinterpretable Data | Epistemic flexibility for future revisions | Non-destructive schema evolution | 2 |
| 6 | Contextualized Data | Embedded certainty levels and source attribution | Data-frame metadata structures | 1 |

Community and social dimension

| Level | Data Quality Tier | Key Characteristics | Technical Requirements | Raspas |
|---|---|---|---|---|
| 7 | Translatable Data | Language-agnostic representation | Internationalization frameworks, Unicode compliance | 1 |
| 8 | Open Data | Standards-compliant public access | Open APIs (REST, GraphQL, etc.), CC licensing | 1 |
| 9 | Multi-Standard Interoperable | Crosswalk capability across schemas | Mappings to CIDOC CRM, Dublin Core, schema.org, etc. | 2 |
| 10 | Free Software Processed | End-to-end open toolchain | FOSS stack verification | 2 |
| +1 | Sustainable Data | Long-term preservation and community evolution | Standardized formats, checksums and backups, network working data | 1 |

Each level whose requirements your data meets earns the number of Raspa points assigned to that level, up to a maximum of 15 points in total.
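
The arithmetic behind the 15-point maximum can be sketched as follows. The point values are transcribed from the Raspa column of the score table, and the level numbers follow the prose description (Levels 0 to 10 plus the extra point). The sketch assumes each level is awarded independently; the text does not say whether a level also requires all lower ones. The names `Level`, `LEVELS`, and `raspa_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    key: str      # level label as used in the text ("0".."10", "+1")
    tier: str     # data quality tier name
    points: int   # Raspa points awarded for meeting this level


# Point values transcribed from the Raspa Score table.
LEVELS = [
    Level("0", "Unstructured Data", 0),
    Level("1", "Basic Structured Data", 1),
    Level("2", "Ontologically Modeled", 2),
    Level("3", "Computable Data", 1),
    Level("4", "Traceable Data", 1),
    Level("5", "Reinterpretable Data", 2),
    Level("6", "Contextualized Data", 1),
    Level("7", "Translatable Data", 1),
    Level("8", "Open Data", 1),
    Level("9", "Multi-Standard Interoperable", 2),
    Level("10", "Free Software Processed", 2),
    Level("+1", "Sustainable Data", 1),
]


def raspa_score(levels_met: set) -> int:
    """Sum the points of every level whose key appears in `levels_met`.

    Assumption: levels are scored independently of one another.
    """
    return sum(level.points for level in LEVELS if level.key in levels_met)


# Meeting every level, including the extra point, gives the maximum.
MAX_SCORE = raspa_score({level.key for level in LEVELS})

# Hypothetical dataset: a spreadsheet published openly under a CC license.
example_score = raspa_score({"1", "8"})
```

Under these assumptions the technical dimension contributes at most 8 points and the community and social dimension, with the extra point, at most 7.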

Raspa Acronym Definition

R – Reliable & Reproducible

A – Adaptable & Aligned

S – Structured & Sustainable

P – Public & Participatory

A – Actionable & Archivable