The Trouble with Training Sets for Machine Learning

Although the title, “Defining what’s ethical in artificial intelligence needs input from Africans”, is accurate, the author is more generally raising the issue of how poor-quality training data sets for Machine Learning lead to bias in the results.

The most common type of Machine Learning uses input data called “training sets” to teach the machine what it needs to recognize. The resulting Machine Learning model can only be as accurate as the training data set it learned from. Current training sets are generally biased, which leads to Facial Recognition being more accurate for white males than for women or people of color. The article attributes this to the under-representation of these groups in the training data sets.
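One simple way to surface this kind of bias is to break a model's results down by group instead of looking only at a single overall accuracy figure. The sketch below is a minimal illustration, assuming each labelled example is tagged with a group; the function name per_group_report, the group names, and the counts are all hypothetical and invented purely for illustration, not taken from the article.

```python
from collections import defaultdict


def per_group_report(samples):
    """Return {group: (share_of_samples, accuracy)} for (group, label, prediction) tuples."""
    counts = defaultdict(int)   # how many examples belong to each group
    correct = defaultdict(int)  # how many of those the model got right
    for group, label, prediction in samples:
        counts[group] += 1
        if label == prediction:
            correct[group] += 1
    total = sum(counts.values())
    return {g: (counts[g] / total, correct[g] / counts[g]) for g in counts}


# Hypothetical toy results: group_a dominates the data, group_b is under-represented.
samples = (
    [("group_a", 1, 1)] * 8 + [("group_a", 1, 0)] * 1
    + [("group_b", 1, 1)] * 1 + [("group_b", 1, 0)] * 2
)

report = per_group_report(samples)
overall = sum(1 for _, label, pred in samples if label == pred) / len(samples)
print(f"overall accuracy: {overall:.0%}")
for group, (share, acc) in sorted(report.items()):
    print(f"{group}: {share:.0%} of samples, accuracy {acc:.0%}")
```

In this toy data the overall accuracy of 75% looks respectable, yet it hides the fact that the under-represented group is classified correctly only a third of the time. That is the pattern the article describes: an aggregate number can look fine while one group is poorly served.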

This is not a new issue, nor is it exclusively the domain of AI and ML. Five years ago, in 2016, Cathy O’Neil wrote Weapons of Math Destruction, a frightening book that points out the many ways inequality is already built into algorithms and ML models.

It’s encouraging that a UNESCO draft recommendation on the ethics of AI states:

We need international and national policies and regulatory frameworks to ensure that these emerging technologies benefit humanity as a whole.

In recent years many frameworks and guidelines have been created that identify objectives and priorities for ethical AI.

This is certainly a step in the right direction. But it’s also critical to look beyond technical solutions when addressing issues of bias or inclusivity. Biases can enter at the level of who frames the objectives and balances the priorities.

But then the article finishes on the same note of concern that I’m trying to raise:

Challenges like these – or even acknowledgement that there could be such challenges – are largely absent from the discussions and frameworks for ethical AI.