Data Collection Bias Assessment

Introduction

Prevention is better than cure. That is why we offer the Data Collection Bias Assessment form. With this form, you record key choices from the very start of data collection, so that possible biases can be discovered at an early stage.

Biases and prejudices

There are different kinds of prejudice or bias. Friedman and Nissenbaum described three types of bias that often recur in algorithms. These types apply to algorithms in general, and therefore also to artificial intelligence.

  1. Pre-existing bias: bias that stems from social institutions, practices, and opinions.
  2. Technical bias: bias derived from technical limits and requirements.
  3. Emergent bias: bias that arises from the use of an algorithm in practice.

The Data Collection Bias Assessment form can help you make the first two types of bias visible. It lets you discuss technical limits without having to share the data on which your AI system is based. The form also prompts you to reflect on your team and the biases that may be present within it. Finally, you can use the completed form as a kind of package leaflet for the outside world, so that others can judge whether the AI system was trained on data suitable for their purpose, for example in a new project.

The tool

The Data Collection Bias Assessment is simple to use: the tool consists of a series of questions, each accompanied by a sample answer. Below, we briefly describe each part of the form.

  1. Introduction. This page can be completed quickly. It frames the rest of the form by introducing the main research question, the team, and the objective of the original project.
  2. Algorithm goal. The introduction page covers the overall goal of the project. However, the artificial intelligence system is often only a small, though integral, part of that project. This section therefore asks for the goal of the algorithms used, the underlying assumptions, and the data collection instruments.
  3. Design of the algorithm. This section gives insight into the choices made in the study design and explains specific decisions. This information may lead a prospective user of the artificial intelligence system to decide against using the algorithm, for example if it requires purchasing specific equipment such as smartwatches.
  4. Methods and materials. This section covers the materials and methods used for data collection. It documents how these choices were validated and invites reflection on the strengths and weaknesses of the instruments.
  5. Sampling parameters. Data collection and bias prevention are the focus of this form, and this section in particular addresses the sampling parameters and how they were selected. The questions let you describe both your ideal sample and your actual sample. Comparing the two makes it possible to detect imperfections in the actual sample that could influence the outcomes of your AI system.
  6. Reflection on prejudices and biases. This part of the form focuses on biases that may be present in your dataset from the start. It prompts you to ask whether a pre-existing bias in your dataset could become a problem in your overall model.
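The comparison between ideal and actual sample in the sampling parameters section can also be checked numerically once data has been collected. The form itself contains no code; the sketch below is a hypothetical illustration, assuming you have recorded an intended share per group (here, made-up age groups) and one group label per collected record. The function name sample_gaps and the default tolerance of 5 percentage points are our own choices, not part of the form.

```python
from collections import Counter


def sample_gaps(ideal_shares, actual_values, tolerance=0.05):
    """Hypothetical helper: compare actual group shares with ideal shares.

    ideal_shares: dict mapping group -> intended proportion (should sum to 1).
    actual_values: iterable of group labels, one per collected record.
    Returns {group: (ideal, actual, difference)} for every group whose actual
    share deviates from its ideal share by more than `tolerance`.
    """
    counts = Counter(actual_values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("actual sample is empty")

    gaps = {}
    # Check groups that were intended as well as groups that showed up unexpectedly,
    # in sorted order so the result is deterministic.
    for group in sorted(set(ideal_shares) | set(counts)):
        ideal = ideal_shares.get(group, 0.0)
        actual = counts.get(group, 0) / total
        diff = actual - ideal
        if abs(diff) > tolerance:
            gaps[group] = (ideal, round(actual, 3), round(diff, 3))
    return gaps


# Example with made-up numbers: older participants are under-represented.
ideal = {"18-39": 0.4, "40-64": 0.4, "65+": 0.2}
actual = ["18-39"] * 55 + ["40-64"] * 40 + ["65+"] * 5
print(sample_gaps(ideal, actual))
# {'18-39': (0.4, 0.55, 0.15), '65+': (0.2, 0.05, -0.15)}
```

A flagged group does not prove that the model will be biased, but it marks exactly the kind of imperfection in the actual sample that this section of the form asks you to discuss.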

About

Cover image by Alan Warburton / © BBC / Better Images of AI / Quantified Human / CC-BY 4.0

Downloads

Download the form and get started. 

DANDA form (pdf, 2049KB)