Tool

Data Collection Bias Assessment

Prevention is better than cure. That is why we offer you the Data Collection Bias Assessment form. By completing this form, you reflect on a few choices right from the start of data collection, so that you can discover possible biases at an early stage.

What kind of biases do we mean by this?

There are different kinds of prejudices or biases. Friedman and Nissenbaum presented three biases that often recur in algorithms. These biases apply not only to algorithms, but also to artificial intelligence.

  1. Pre-existing bias: bias that stems from social institutions, practices and opinions.
  2. Technical bias: bias derived from technical limits and requirements.
  3. Emergent bias: bias resulting from the use of an algorithm.

The Data Collection Bias Assessment form can help you make the first two biases visible. It allows you to discuss the technical limits without having to share the data that serves as the basis for your AI system. It also prompts you to reflect on your team and the biases that may be present in it. Finally, you can use the form as a kind of information leaflet for the outside world: it lets outsiders judge whether the AI system has been trained on data that is suitable for use in, for example, a new project.

Want to know more about the form? Read the manual below and get started.

Manual

The Data Collection Bias Assessment form is easy to use. For every question after the first page, you can view an example answer. This manual briefly discusses the separate parts of the form and their goals.

Example

The example used in the form is a research project on the stress levels of white-collar workers. The data were collected via watches, patches and surveys. We use this example to show how the form might be completed.

Introduction page

This page takes little time to complete. It functions as a cover page for the rest of the form: it introduces the main research question, the team and the objective of the original project.

Algorithm goal

The overall goal of the project has been introduced on the introduction page. However, the artificial intelligence system is often only a small but integral part of the overall project. Therefore, it is necessary to provide information on the goal of the algorithms used, the assumptions and the collection instruments.

Study design

The information requested here is meant to provide insight into the choices made with regard to the study design and to explain certain decisions. Based on this information, a new user of the artificial intelligence system might decide not to use the algorithm if, for example, it requires the acquisition of specific equipment such as smart watches.

Materials and methods

This section focuses on the materials and methods that will be used for the data collection. It provides information on how the choices of materials and methods were validated and enables reflection on the strengths and weaknesses of the instruments.

Sampling parameters

The collection of data and the prevention of bias are the focus of this form. This section deals specifically with the sampling parameters and how they were selected. The questions let you describe your ideal sample as well as your actual sample. Comparing the two helps you detect imperfections in your actual sample that could influence the outcomes of your AI system.
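
For teams that want to check the difference between ideal and actual sample numerically, the comparison can be sketched in a few lines of Python. This is only an illustration and not part of the form: the function name sample_gaps, the age groups, the target shares and the 5 percentage point tolerance are all hypothetical assumptions chosen for the example.

```python
from collections import Counter


def sample_gaps(ideal_shares, actual_records, tolerance=0.05):
    """Return the groups whose share in the actual sample differs from
    the ideal share by more than `tolerance` (hypothetical helper).

    ideal_shares   -- dict mapping group name to its intended share (0..1)
    actual_records -- iterable with one group label per collected participant
    """
    counts = Counter(actual_records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("actual sample is empty")

    gaps = {}
    # Look at every group that was planned or that actually occurs.
    for group in set(ideal_shares) | set(counts):
        ideal = ideal_shares.get(group, 0.0)
        actual = counts.get(group, 0) / total
        diff = actual - ideal  # positive: over-represented, negative: under-represented
        if abs(diff) > tolerance:
            gaps[group] = round(diff, 3)
    return gaps


# Hypothetical age distribution, invented purely for illustration.
ideal = {"18-34": 0.35, "35-49": 0.35, "50-67": 0.30}
actual = ["18-34"] * 55 + ["35-49"] * 35 + ["50-67"] * 10

gaps = sample_gaps(ideal, actual)
for group, diff in sorted(gaps.items()):
    print(f"{group}: {diff:+.1%} compared with the ideal sample")
```

In this invented example, the youngest group is over-represented and the oldest group is under-represented, each by 20 percentage points, while the middle group matches the ideal sample. A gap like this is exactly the kind of imperfection the form asks you to write down and reflect on.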

Reflection on bias risk

This part of the form focuses on biases that might be present in your data set from the start. It prompts you to ask yourself whether a bias that already exists in your data set could become a problem in your overall model.

Three appendices at the end of the form help you answer some specific questions. The form tells you when to use them.