As data scientists, we often treat the data given to us as raw input, seldom questioning where it came from or how it was collected. Where do these pre-existing, inherited biases come from, and can they be eliminated?
Emergent biases, by contrast, are not inherited from the training data. That means they can still surface even when we have trained the machine-learning (ML) algorithm on a cleansed data set with no known biases.
Today, most AI practitioners in the industry treat AI bias as a data problem, and to a large extent, if we can fix the biased data, we address most AI biases. However, fixing the biased data is not the only way to tackle the problem, and not all biases are bad. In fact, biases are often introduced into training data deliberately to improve the performance of the trained model.
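A common example of deliberately introduced bias is class reweighting on imbalanced data: upweighting a rare class skews training toward it on purpose, which often improves recall on that class. Below is a minimal sketch; the helper `inverse_frequency_weights` is a hypothetical name, and the formula mirrors the "balanced" class-weight heuristic popularized by scikit-learn.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Rarer classes receive larger weights, deliberately biasing the
    training objective toward them (hypothetical helper; formula follows
    the common total / (n_classes * count) "balanced" heuristic).
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * count) for cls, count in counts.items()}

# Example: a 90/10 imbalanced binary data set
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
# The rare positive class (1) gets a much larger weight than the majority class (0)
```

These weights would then be passed to a loss function or a library's `class_weight`-style parameter, so misclassifying a rare example costs more during training.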