AI systems as digital public goods - a dive into what this means from a slightly more technical perspective

May 21, 2025

Author: Ricardo Mirón Torres, DPGA Secretariat's Chief Technology Officer

The DPGA is now accepting submissions for AI systems! This post provides a practical overview and detailed description of the requirements outlined in the DPG Standard that an AI system must meet to be recognized as a digital public good and listed on the DPG Registry.

How we got here?

A digital public good (DPG) entails much more than simply being open software, open data, an open content collection, or an open AI system. DPGs are open-source solutions that must also be accessible, adaptable, and designed to do no harm. Therefore, to be recognised as a DPG, a solution must demonstrate adherence to the DPG Standard, which ensures these important elements are embedded into the design of a digital solution and, in doing so, facilitates more impactful and safe technology deployment.

In 2023, recognising both the immense potential of AI for development as well as the risks associated with it, the DPGA Secretariat, in collaboration with UNICEF, convened a dedicated Community of Practice (CoP) on AI systems. This group was brought together to specifically examine how the DPG Standard may need to adapt to better determine what constitutes AI systems as a type of DPG and to explore the intersection between open and responsible AI.

Alongside a set of recommendations from the CoP on AI systems, the DPGA Secretariat closely monitored and participated in relevant conversations that were also taking place. This included the OSI's Open Source AI Definition (OSAID), the Linux Foundation’s Model Openness Framework (MOF), and consultations with DPGA members actively working on AI, such as OpenFuture, Creative Commons, and the Open Knowledge Foundation.

Following an open commenting period on GitHub, the DPG Standard Council carefully considered inputs surfaced throughout this process and, as part of a set of updates to the DPG Standard, introduced changes to strengthen the transparency and accountability of AI system DPGs while ensuring that they meet consistent requirements across all DPG categories.

What's an AI system, anyway?

Before diving into the specific updates, it’s valuable to provide an understanding of what is meant by “AI system,” as it has implications for the components that must be DPG Standard compliant. We recognise AI systems as machine-based systems designed to operate with varying levels of autonomy that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments, or generate outputs, such as text, images, or sounds.

This understanding aligns with the OECD's guidelines on AI (Recommendation of the Council on Artificial Intelligence, OECD Publishing, Paris, 2019).

To be recognised as a DPG, an AI system must provide the following components under the following licensing requirements:

| Component | Description | Accepted Licenses |
| --- | --- | --- |
| Data | [1] The dataset(s) used to train, validate, and test the system. | [1] Licenses conformant to the Open Definition |
| Code | [2] The code used for data pre-processing, training, validation, and testing. [3] Other code elements, such as inference, supporting libraries, and tools. | [2] [3] OSI-approved licenses |
| Model | [4] The model architecture, type of model, layers, and structure. [5] The model parameters, such as weights, optimizers, coefficients, and other applicable hyperparameters. | [4] OSI-approved licenses [5] OSD-conformant terms |
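The licensing matrix above lends itself to a simple machine-readable check. The following is a minimal, hypothetical sketch in Python; the SPDX-style license identifiers and the accepted-license sets are illustrative examples only, not an official DPGA tool or an exhaustive list of conformant licenses.

```python
# Hypothetical sketch of a component-by-component license check.
# The accepted-license sets below are illustrative examples, not the
# complete lists of Open Definition-conformant or OSI-approved licenses.

ACCEPTED = {
    "data": {"CC-BY-4.0", "CC0-1.0", "ODbL-1.0"},       # Open Definition-conformant (examples)
    "code": {"Apache-2.0", "MIT", "GPL-3.0-only"},      # OSI-approved (examples)
    "model_architecture": {"Apache-2.0", "MIT"},        # OSI-approved (examples)
    "model_parameters": {"Apache-2.0", "CC-BY-4.0"},    # OSD-conformant terms (examples)
}

def check_components(components: dict) -> list:
    """Return the components whose declared license is not in the accepted set."""
    return [
        name for name, license_id in components.items()
        if license_id not in ACCEPTED.get(name, set())
    ]

system = {
    "data": "CC-BY-4.0",
    "code": "Apache-2.0",
    "model_architecture": "MIT",
    "model_parameters": "CC-BY-NC-4.0",  # non-commercial terms are not open
}
print(check_components(system))  # -> ['model_parameters']
```

A check like this could flag, for instance, model weights released under non-commercial terms, which would not meet the requirements in the table.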

These requirements mean that all data used in an AI system must be open and available. We acknowledge that this is a high bar to set, and it may restrict the number of AI systems that meet the DPG Standard. However, we consider this important, as it reinforces the DPGA's core beliefs. Access to the data further contributes to advancing the public good, while underscoring our firm commitment to openness that supports accountability, safety, and meaningful societal benefits. You can read more about how we arrived at this position in our blogs on the role of open data in AI and its relevance for public interest AI.

What does this mean for documentation?

One of the most valued benefits of DPGs is their potential for reuse. To enable this, clear documentation is needed so that a technically skilled person—who may be unfamiliar with the specific AI system—can understand, use, and adapt it. This is also why, alongside the existing requirements for general software documentation, we now include specific requirements related to documentation of data and models.

While multiple templates for model cards and datasheets exist, we aim to be flexible. Documentation can be submitted in different formats, as long as it contains the following information at a minimum.

| | Information | Description |
| --- | --- | --- |
| Model | Model Overview | Name, version, date, developer, description, and contact information. |
| | Intended Use | Primary intended uses, intended users, and out-of-scope applications. |
| | Performance Metrics | Key quantitative evaluation metrics, such as accuracy, precision, recall, and other relevant performance indicators. |
| | Limitations | Known weaknesses, failure modes, and potential biases. |
| Data | Data Overview | Dataset name/identifier, version, date, creator/maintainer, use cases, and other general details. |
| | Technical Details | Data provenance, data dictionary, data schema, unique identifiers, crosswalks to ontologies or vocabularies, data quality, and limitations. |
| | Dataset Composition and Characteristics | Data instances, number of instances, data format, data fields/features, labels/target variables (if applicable), and data splits (if applicable). |
| | Data Collection and Preprocessing | Data sources, collection process, data cleaning and preprocessing steps, and data labeling. |
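The minimum fields above can be expressed as a simple completeness check. This is a hypothetical sketch: the field names mirror the table, but the dictionary layout is purely illustrative, since documentation may be submitted in different formats.

```python
# Hypothetical sketch: report which of the minimum documentation fields
# are missing from a submission. Field names mirror the tables above;
# the dictionary structure is an illustrative choice, not a required format.

REQUIRED_MODEL_FIELDS = {
    "model_overview", "intended_use", "performance_metrics", "limitations",
}
REQUIRED_DATA_FIELDS = {
    "data_overview", "technical_details",
    "composition_and_characteristics", "collection_and_preprocessing",
}

def missing_fields(doc: dict) -> dict:
    """Return the minimum documentation fields absent from a submission."""
    return {
        "model": REQUIRED_MODEL_FIELDS - doc.get("model", {}).keys(),
        "data": REQUIRED_DATA_FIELDS - doc.get("data", {}).keys(),
    }

submission = {
    "model": {
        "model_overview": "...", "intended_use": "...",
        "performance_metrics": "...", "limitations": "...",
    },
    "data": {"data_overview": "...", "technical_details": "..."},
}
print(missing_fields(submission))
```

In this example the model documentation is complete, but the data documentation is still missing its composition/characteristics and collection/preprocessing sections.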

Responsible AI considerations

Following best practices and building on UNESCO's Recommendation on the Ethics of Artificial Intelligence, we have also introduced specific requirements for how AI systems anticipate and prevent harm, and are designed to do no harm.

Developers must provide information on how risk was considered during the development of the AI system, specifically how it was tested for bias, fairness, transparency, and security, and whether appropriate mitigation measures were implemented if potential harm was identified. As a result, the following minimum requirements must be met:

| Information | Description |
| --- | --- |
| Proportionality | Impact on people and vulnerable groups, engagement with stakeholders, principles followed, etc. |
| Bias and Fairness | Steps to monitor, mitigate, and address biases, fairness assessment, model thresholds, etc. |
| Risks and Harms | Validation tests, misuse or unintended use, ethical considerations, guardrails, etc. |
| Mitigations | Accuracy evaluation, model validation and quality assurance, robustness and security, oversight and control, etc. |
| Transparency | Model explainability, logic, and decision-making, user information, tagging AI-generated content, etc. |

Using more comprehensive frameworks for AI risk and responsibility assessments is encouraged, but not required for DPG recognition.

It's a Starting Point

While we try to keep requirements and definitions as broadly applicable as possible, we acknowledge that AI systems are not a single technology, but rather a broad field encompassing various techniques and levels of complexity. From more traditional machine learning models to generative AI, we will continuously evaluate and evolve our DPG Standard as necessary.

These definitions and updates to the DPG Standard are compatible with OSI’s requirements to exercise the basic freedoms of Open Source AI as well as the Model Openness Framework Class I – Open Science Model, making the DPG nomination an operational process for evaluating compliance against multiple specifications of openness.

[Chart: DPG Standard compared with other initiatives]
This chart provides a quick look at how the DPG Standard is similar to, yet distinct from, other initiatives defining open-source AI.

We would like to thank all the individuals and organisations who have helped shape how we are evolving the DPG Standard to better assess AI system DPGs. Like everything we do at the DPGA, this was far from an independent endeavour.

We are excited to begin a new chapter of accepting AI systems as DPGs, as we know they have an important role to play in the future of digital transformation worldwide. We believe that open, transparent, and safe access to AI systems is not only possible, but essential. We encourage any developer of an AI system who believes their solution can make an impact and may be eligible for DPG recognition to join us by submitting it to the DPG Registry!