What are some of the ethical issues associated with Machine Learning and Computer Vision?

Artificial intelligence vs. humanity: how “honest” and “ethical” are machine learning algorithms, and what consequences can they lead to? Lead Software Engineer Ihar Nestsiarenia shares his thoughts.

Ethics and fairness — what is the difference?

— Fairness in machine learning (ML) addresses the fact that ML models can produce results that are biased toward or against certain groups of people. As a rule, this happens because of poor-quality or unrepresentative data. Work on fairness in this context concentrates on how to detect, correct, and eliminate such biases.
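
As a rough illustration of what a fairness check can look like in practice, the sketch below compares how often a model makes a positive decision for each group (a demographic parity check). The decisions and group labels are made up for the example; real audits use a wider range of metrics.

```python
# A minimal sketch of one common fairness check: comparing the rate of
# positive model decisions across groups. All numbers are invented.
import numpy as np

# Hypothetical model decisions (1 = approved) and a sensitive attribute.
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group     = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

rates = {g: decisions[group == g].mean() for g in np.unique(group)}
gap = max(rates.values()) - min(rates.values())

print(rates)                                   # positive-decision rate per group
print(f"demographic parity gap: {gap:.2f}")    # a large gap hints at possible bias
```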

Ethics, on the other hand, is a philosophical question about how technologies are used. Are the “bubbles” that social network recommendations force us into ethical? Driverless cars are already on the roads; we have already had the first accidents involving them, and there will be more. What are the ethical ramifications? How should the laws work in this context? How do ethics and the law overlap?

One burning issue is facial recognition. It has been estimated that cities around the world operate more than a billion cameras used to track citizens’ movements. States hold a lot of information that they promise to use to fight crime, but that same information can serve other purposes. A recent example is footage taken during protests in several countries being used to track and identify protestors.

Where is the line between ethical and unethical?

— A lot depends on the country in question. In China, for example, social reputation, surveillance cameras, and even a system for tracking people of certain nationalities are all considered the norm. There are many technologies to experiment with, and the idea of social ranking is interesting in itself. I think the main question, however, is whether these systems are used for the good of society as a whole or simply to control those who don’t toe the party line.

ML and biases

— There are frequent misunderstandings about how ML algorithms work. Some people, for example, feel that their rights are being infringed when they are asked to indicate their race and the computer then turns on the camera to check. In fact, ML models are trained on an enormous number of face photographs, and the algorithm determines what is in the data. The model learns to identify the important features in those photographs. If there is a dearth of photographs of people of a certain race, the algorithm can’t learn to recognize them and will make mistakes as a result. Data are the foundation of any machine learning system. To build a good system, you need a sufficient amount of good data.
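
A minimal sketch of this effect on synthetic data: a classifier trained on a set where one group makes up only 5% of the examples ends up far less accurate for that group. The two “groups” and their decision rules are invented purely for illustration; real face datasets fail in the same qualitative way.

```python
# How under-representation in the training set can show up as higher error
# rates for the under-represented group (synthetic, illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, rule):
    X = rng.normal(size=(n, 2))
    y = rule(X).astype(int)
    return X, y

# The two groups happen to need different features to be classified correctly.
rule_a = lambda X: X[:, 0] > 0
rule_b = lambda X: X[:, 1] > 0

# Training data: 95% group A, only 5% group B.
Xa, ya = make_group(950, rule_a)
Xb, yb = make_group(50, rule_b)
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluation on fresh, equally sized samples from each group.
Xa_test, ya_test = make_group(1000, rule_a)
Xb_test, yb_test = make_group(1000, rule_b)
print("accuracy, well-represented group: ", model.score(Xa_test, ya_test))
print("accuracy, under-represented group:", model.score(Xb_test, yb_test))
```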

At the peak of the COVID-19 pandemic, articles were published almost daily about how scientists had learned to identify the coronavirus from chest X-rays or even from the way people’s voices sounded. The problem with most of these algorithms was that the ML models learned to identify not the virus but the data sources. In one study, for example, the chest X-rays of COVID patients came from one hospital and the X-rays of healthy people from a different hospital. As a result, the model learned only where the X-rays were taken.
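
The same failure can be reproduced in a few lines: when the data source is perfectly confounded with the label during training, a model will happily learn the source instead of the disease. Everything below, including the “hospital marker” feature, is synthetic and illustrative only.

```python
# A sketch of shortcut learning: the training data confounds the label
# (sick / healthy) with the data source (hospital).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

def make_data(labels, hospitals):
    disease_signal = labels + rng.normal(scale=3.0, size=len(labels))  # weak, noisy signal
    hospital_marker = hospitals.astype(float)                          # scanner model, image style, etc.
    return np.column_stack([disease_signal, hospital_marker])

# Training study: all COVID scans from hospital 1, all healthy scans from hospital 0.
y_train = rng.integers(0, 2, n)
h_train = y_train.copy()                      # hospital == label: a perfect confound
X_train = make_data(y_train, h_train)

# Deployment: both hospitals see both sick and healthy patients.
y_test = rng.integers(0, 2, n)
h_test = rng.integers(0, 2, n)                # hospital no longer predicts the label
X_test = make_data(y_test, h_test)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy on the training study:", model.score(X_train, y_train))  # looks excellent
print("accuracy at deployment:        ", model.score(X_test, y_test))    # far worse than it looked
```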

It can be challenging to eliminate biases from models, and in some cases companies compensate by taking steps to avoid the potential risks. For example, Google and Apple removed the "gorilla" label from their photo search applications after a scandal in which certain groups of people were classified incorrectly. Many companies are developing sets of rules for building fair models; some of these can be found at ai.google and microsoft.com. This practice of developing and operating such models is known as "Responsible AI." Rules are also emerging at the governmental level: the European Union, for instance, has made attempts to describe how AI systems should be developed to ensure fairness.

The advancement of AI has greatly accelerated with the introduction of generative models such as GPT-3.5 and GPT-4. As a result, Europe is already considering the adoption of legislation to regulate and license such models. Despite the good intentions behind them, regulations imposed by governments can significantly hinder the development of the industry.

Human biases

— Still, the biggest problem with data may be people’s cognitive biases. An example would be looking at one dirty and disheveled person and deciding that all individuals of their race are dirty and disheveled. But that’s a cognitive error. Most problems with fairness in ML algorithms come from such cognitive biases.

And these mistakes can appear at different stages. The data we use for ML may already contain them. When we take those data, filter them, and load them into the model, the errors end up in the model. And finally, what do we do with the results ML gives us? In the coronavirus example above, the error was in how we interpreted the information the model gave us.

How do we fight this? We have to know what cognitive biases exist and how to catch them before making important decisions. We need to put ML models to work with facts — not with our biases.

Internet search and biased results

— The problem in this situation may be an external factor, such as a government that can demand that inconvenient news be taken down. The resulting distortion can then be baked into the ML model a search engine uses, for example by selecting the training data and filtering it against a blacklist of media outlets.
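
A sketch of how such filtering could look at the data-collection stage; the outlets and documents are invented placeholders. The point is that a model trained on the filtered corpus carries no trace of what was removed.

```python
# A blocklist applied before training quietly shapes whatever model is built
# downstream. The outlets and documents here are fictitious placeholders.
BLOCKED_OUTLETS = {"inconvenient-news.example", "opposition-daily.example"}

corpus = [
    {"url": "https://state-wire.example/story1", "text": "..."},
    {"url": "https://inconvenient-news.example/story2", "text": "..."},
    {"url": "https://opposition-daily.example/story3", "text": "..."},
]

def outlet(url: str) -> str:
    # Crude domain extraction, good enough for this illustration.
    return url.split("/")[2]

# Filtering happens before training, so the finished model never sees the removed outlets.
training_corpus = [doc for doc in corpus if outlet(doc["url"]) not in BLOCKED_OUTLETS]
print(len(corpus), "documents collected,", len(training_corpus), "kept for training")
```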

This goes beyond ethics — it’s about human freedom. Overregulation, blocking certain opinions, shifting the news cycle in a specific direction in a specific country… We justify all the blocking by saying we’re fighting the “bad guys,” but who knows who the “bad guys” will be next.

Driverless cars: who is at fault in a road accident?

— This is more a problem for lawyers than for algorithms. There are people who remain skeptical of self-driving cars, even in IT. But you can argue that it’s really the same situation as with human-driven cars. When the first automobiles appeared on the roads, there was a lot of discussion about accountability. The US and UK even had red flag laws that required a person to walk in front of a car waving a red flag to warn of the danger. Of course, that couldn’t last long, and soon enough we transitioned to the traffic rules we know now.

With driverless cars, it’s logical enough to say that if the algorithm is at fault, then the company that created the algorithm is liable. If it’s a technical problem, then the person or people who approved the car during inspection are liable. In any case, it’s a question for the lawyers.

However, a lot depends on the country in question. In the US, for example, case law can play a significant role. If a judge in a particular jurisdiction makes a decision regarding an accident involving a Tesla, that decision may be used as precedent by other judges in similar cases. In Europe, it’s a matter of legislation, and nobody will put driverless cars on the road without first developing the regulatory framework to govern them.

How does society react to facial recognition systems?

— Grigory Bakunov, a former VP at Yandex, created special camouflage makeup to defend against facial recognition. Computer vision (CV) can make ridiculous mistakes that a person never would; for example, if you put a small sticker next to an apple on a table, CV can confidently identify the apple as a banana. Bakunov’s team found a way to make the algorithms decide that a face on video belongs to someone else — indeed, to a specific someone else.

Asia has stores that sell clothing specifically designed to confuse CV algorithms.
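
Both the sticker and the camouflage tricks exploit adversarial examples: tiny, deliberately chosen changes to the input that flip a model’s prediction. The sketch below shows the idea in the spirit of the fast gradient sign method (FGSM) on a toy logistic-regression “image classifier”; it illustrates the principle only, not Bakunov’s actual technique.

```python
# A minimal FGSM-style adversarial perturbation on a toy linear classifier.
# The "images" are random 64-dimensional vectors, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy "images": the label depends only on the first 8 "pixels".
X = rng.normal(size=(500, 64))
y = (X[:, :8].sum(axis=1) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

# Pick an example the model currently classifies correctly (and not too confidently).
scores = clf.decision_function(X)
correct = np.where(clf.predict(X) == y)[0]
i = correct[np.abs(scores[correct]).argmin()]
x, label = X[i], y[i]

# Gradient of the logistic loss with respect to the input "pixels".
w, b = clf.coef_[0], clf.intercept_[0]
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
grad = (p - label) * w

# FGSM step: nudge every pixel by a tiny amount in the most damaging direction.
eps = 0.05
x_adv = x + eps * np.sign(grad)

print("original prediction:   ", clf.predict(x.reshape(1, -1))[0], "(true label:", label, ")")
print("adversarial prediction:", clf.predict(x_adv.reshape(1, -1))[0])
print("largest pixel change:  ", np.abs(x_adv - x).max())   # just eps = 0.05
```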

Here, I’d offer a word of caution: you might be able to hide now, but data can live for a very long time — and algorithms are likely to be improved in the future.

Deepfakes: who’s at fault, and what can we do about them?

— Technology can create fakes that are extremely difficult to differentiate from reality. Facebook has an entire department responsible for creating content moderation algorithms. It’s a constant struggle: you create a new model to identify fakes, and soon enough there’s a new voice distortion algorithm.

It becomes a war between different ML systems. And if a video is of poor quality, and people are watching it on their phones, it’s really difficult to say what’s real and what’s not. This is an entirely new form of fraud.

Deepfakes pose a new challenge for humanity, and there are already companies trying to address this issue. Much of the problem lies in how people consume content. Technical solutions can provide additional information and warn about dangers, but they are not perfect. It's akin to a race between viruses and antivirus software. Therefore, other factors such as education must be taken into account. The world is becoming more complex, with new technologies emerging every day that can improve the lives of billions of people. At the same time, these very technologies can take a nightmarish turn.