IsiXhosa is an interesting language that has over 9 million speakers. It is a language often associated with clicks. Our famous musician, the late Mama Africa, Miriam Makeba, made isiXhosa famous by introducing the Click Song, also called Qongqothwane, to the world.
Despite the stereotype, isiXhosa is not a clicking language but a Bantu language. (Bantu is a linguistic classification of a group of languages and should not be confused with the apartheid definition of the word.)
Joseph Greenberg, the US linguist, classified African languages into four stocks. One of these is Niger-Congo, the group to which the Bantu languages, spoken from Tanzania to South Africa, belong.
For those of us who speak a Bantu language, in my case Venda, isiXhosa is completely comprehensible.
Malume in Venda is malume in isiXhosa meaning “uncle”, makazi in isiXhosa is makhadzi in Venda meaning “aunt”, while iza apha in isiXhosa is ida hafha in Venda meaning “come here”.
Even though isiXhosa is not that difficult for Bantu language speakers, it is very difficult for artificial intelligence machines.
Artificial intelligence (AI) is taking over the world. President Vladimir Putin of Russia has stated that AI is the new arms race. There are many types of AI and one of these is called machine learning. One approach in machine learning is called the neural network.
A neural network is inspired by how the brain works. A large neural network with many layers is what is called deep learning.
Deep learning is able to do complex tasks such as taking spoken words and translating them into another language.
Deep learning can take words spoken in Chinese and translate them into English, and vice versa, and it can do this well. IsiXhosa, however, is difficult for these machines to understand.
For those of us who have been working in AI for over 20 years, this is an interesting challenge that needs to be tackled.
Why is isiXhosa such a difficult language for machines? One needs to understand the classification of the Xhosa language.
The people who are today called Xhosas are a mixture of a Bantu tribe that is generally called Nguni and the Khoisan. The Khoisan are the people who spoke the clicking language that uses the four-click Khoisan system.
A study conducted by the Human Sciences Research Council (HSRC) found that, in terms of genealogy, the Xhosas have a higher proportion of Khoisan genes than any other African ethnic group in South Africa. It turns out that the Xhosa language also has the highest incidence of clicks among African languages in South Africa.
This correlation means that, among Africans in South Africa, the Xhosas had more cross-pollination with the first nation, the Khoisan, both linguistically and genetically, than any other ethnic group.
Despite this maximum interaction between the Khoisan and the Xhosas, the Xhosa language is not generally a clicking language. When AI translates spoken words from one language to another, it takes the spoken words, which are in the form of signals, and decodes them.
So the EFF Commander in Chief, Julius Malema, was right at Mama Winnie Madikizela-Mandela’s funeral, as all spoken words are signals!
These signals need to be deconstructed so that the machine can understand them.
The French mathematician Joseph Fourier in 1822 was the first person to come up with a method of understanding signals. The University of Johannesburg has a course called “Signals and Systems” that does the trick.
Fourier understood that all signals can be represented as a combination of cycles, which in the language of maths are called sinusoidal functions.
The idea of a signal being represented as a combination of cycles was also observed by Karl Marx in his critical work Capital. A word spoken in isiXhosa is deconstructed using Fourier’s method so that it is broken down into cycles that the AI machine can understand.
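To make the idea of cycles concrete, here is a minimal sketch in Python. It is only an illustration: the signal is an invented mixture of two cycles, not a recording of speech, and the function names `dft` and `dominant_cycles` are hypothetical. The sketch applies the discrete version of Fourier's method to recover which cycles the signal was built from.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: one complex coefficient per cycle count."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dominant_cycles(samples, count):
    """Return the `count` strongest cycle counts (below n/2), in ascending order."""
    n = len(samples)
    spectrum = dft(samples)
    magnitudes = [(abs(spectrum[k]), k) for k in range(1, n // 2)]
    strongest = sorted(magnitudes, reverse=True)[:count]
    return sorted(k for _, k in strongest)

# Invented test signal: 3 cycles plus a weaker 7 cycles across 64 samples.
N = 64
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]
print(dominant_cycles(signal, 2))  # prints [3, 7]
```

The machine never sees the two cycles separately. It sees only their sum, yet the method recovers both of them.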
To convert the words into cycles, the signals are first put through a window, which makes sure that out-of-the-ordinary characteristics are eliminated. This is where the difficulty of the Xhosa language is encountered!
Actually, Xhosa is only 15% clicks and 85% Bantu language. So the window technique treats the clicks not as part of the language but as background noise, and thus eliminates them.
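A small sketch shows how easily this can happen. It assumes a median filter, a common noise-removal step, as a stand-in for the pre-processing described above; real speech recognition systems differ in their details. The "vowel" is a slow sine wave and the "click" is a single sharp burst, both invented for illustration, and `median_filter` is a hypothetical helper.

```python
import math
from statistics import median

def median_filter(samples, width):
    """Replace each sample by the median of its neighbours (shorter span at the edges)."""
    half = width // 2
    return [
        median(samples[max(0, i - half): i + half + 1])
        for i in range(len(samples))
    ]

N = 200
vowel = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]  # slow, smooth wave
with_click = list(vowel)
with_click[100] += 5.0  # one-sample burst standing in for a click

cleaned = median_filter(with_click, 5)

# Size of the burst before and after the noise-removal step.
print(round(abs(with_click[100] - vowel[100]), 2))
print(round(abs(cleaned[100] - vowel[100]), 2))
```

The burst is large in the input and almost gone in the output, while the smooth wave around it is left nearly untouched. A step designed to remove noise cannot tell that the burst was part of the word.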
But isiXhosa is too important a language to be locked out of the fourth industrial revolution. We ought to develop a new window that treats these clicks not as noise but as an integral part of the language.
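One hypothetical direction, sketched here under simple assumptions, is to measure how sharp each short frame of the signal is and to flag the sharp frames as possible clicks to be kept, instead of smoothing them away. The function names, the frame length of 10 samples and the threshold ratio of 10 are illustrative choices, not an established method, and the signal is again an invented sine wave with one burst.

```python
import math
from statistics import median

def frame_sharpness(samples, frame_len):
    """Largest sample-to-sample jump inside each non-overlapping frame."""
    scores = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        scores.append(max(abs(b - a) for a, b in zip(frame, frame[1:])))
    return scores

def click_frames(samples, frame_len=10, ratio=10.0):
    """Indices of frames whose sharpness is far above that of the typical (median) frame."""
    scores = frame_sharpness(samples, frame_len)
    typical = median(scores)
    return [i for i, s in enumerate(scores) if s > ratio * typical]

N = 200
speech = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]  # smooth "vowel"
speech[105] += 5.0  # one-sample burst standing in for a click

print(click_frames(speech))  # prints [10], the frame covering samples 100 to 109
```

A system built this way could pass the flagged frames on untouched and clean only the rest. Whether such a rule works on real isiXhosa speech is an open question that would need testing on recordings.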
Another way to handle this situation is to discover a new version of Fourier’s method that can take the signals directly and does not disregard clicks as noise.
Yet another technique is to discover new types of AI machines that take the spoken words raw, without any pre-processing, and so do not treat the clicks as noise. We will have to discover new forms of algorithms that are decolonised and that take into account the uniqueness of our languages, such as isiXhosa.
What are some of the ideas that need to be explored? In psychology there is a problem called the cocktail party problem which was first described by Colin Cherry in 1953.
Cherry observed that when one is in the middle of a noisy room, one is still able to hear the words of the person one is talking to. So the human ear is able to filter out noise.
With the Xhosa language one should turn the cocktail party problem on its head and not filter out the clicks, which are conventionally deemed as noise by AI machines, but be able to take them into account so that the artificial intelligence machine can hear them, understand them and transmit them.
A great deal of work has been done to make artificial intelligence machines understand the identity of the African people.
In one example at the University of Johannesburg, Gugulethu Mabuza-Hocquet has just completed her doctorate on designing algorithms that take into account the fact that the contrast between the pupil and the iris of the eye is sharper among people of European descent than among people of African descent.
These algorithms, therefore, allow biometric systems based on the iris of the eye not to implicitly discriminate against Africans in favour of Europeans.
The next step should be to develop better algorithms that understand the Xhosa language. Taking our languages into the digital and the fourth industrial ages is our responsibility.
We cannot just import technology, such as speech recognition machines, but we should adapt them to our particular environments.
If adaptation is not an option, we should discover our own versions of the Fourier theory. This will require our funding agencies, such as the National Research Foundation, to sponsor projects that are rich in local content rather than us solving the problems of other nations and thus subsidising them.
This would require a new sense of confidence and a realisation that the African market is big enough to define its own technological problems and solutions. Any other way will reinforce colonial economic, political, social and technological systems.
* Marwala is the vice-chancellor and principal of the University of Johannesburg and the author of the book Causality, Correlation and Artificial Intelligence for Rational Decision Making. He writes in his personal capacity.