Thanks to advances in speech and natural language processing, there is hope that one day you will be able to ask your virtual assistant what the best salad ingredients are. Today, you can already ask your home gadget to play music, or to switch on by voice command, a feature found in many devices.
It is a different story if you speak Moroccan, Algerian, Egyptian, Sudanese, or any of the other Arabic dialects, which vary widely from region to region and some of which are mutually unintelligible. If your native language is Arabic, Finnish, Mongolian, Navajo, or any other language with high morphological complexity, you may feel left out.
These complex structures spurred Ahmed Ali to look for a solution. He is the chief engineer of the Arabic Language Technology Group at the Qatar Computing Research Institute (QCRI), part of the Qatar Foundation's Hamad bin Khalifa University, and the founder of ArabSpeech, a community that exists for the benefit of Arabic speech science and speech technology.
Many years ago, when Ali was at IBM, he was captivated by the idea of talking with cars, appliances, and gadgets. “Can we build a machine that understands different dialects: an Egyptian pediatrician dictating a prescription, a Syrian teacher helping children grasp the core of a lesson, or a Moroccan chef describing the best couscous recipe?” he said. However, the algorithms that power these machines cannot sift through the roughly 30 varieties of Arabic, let alone make sense of them. Today, most speech recognition tools support only English and a handful of other languages.
The coronavirus pandemic has further increased the reliance on speech technology, as natural language processing has helped people comply with stay-at-home guidelines and physical distancing measures. While we have already been using voice commands to help with e-commerce purchases and manage our homes, the future holds many more applications.
Millions of people around the world use Massive Open Online Courses (MOOCs) for their open access and unlimited participation. Speech recognition is one of the main features of MOOCs: it lets students search within specific parts of a course's spoken content and enables translation through subtitles. Speech technology also makes it possible to digitize lectures and display spoken words as text in university classrooms.
According to a recent article in Speech Technology magazine, the voice and speech recognition market is expected to reach 26.8 billion U.S. dollars by 2025, as millions of consumers and companies around the world come to rely on voice bots not only to interact with their appliances or cars but also to improve customer service, drive healthcare innovation, and increase accessibility and inclusion for people with hearing, speech, or motor impairments.
In a 2019 survey, Capgemini forecast that by 2022 more than two out of three consumers would opt for voice assistants rather than visits to stores or bank branches. That share could reasonably climb, given that the pandemic has forced the world into home-based, physically distanced life and commerce for more than a year and a half.
Nevertheless, these devices fail to serve vast parts of the world. For those roughly 30 varieties of Arabic and the millions of people who speak them, this is a seriously missed opportunity.
Arabic for machines
Voice robots that speak English or French are far from perfect. However, teaching machines to understand Arabic is particularly tricky for several reasons. These are three generally recognized challenges:
- Missing diacritics. Arabic dialects are vernacular, that is, primarily spoken. Most of the available text is non-diacritized, meaning it lacks the marks, comparable to the acute (´) or grave (`) accents, that indicate the sound values of letters. It is therefore difficult to determine where the vowels go.
- Lack of resources. There is a shortage of labeled data for the different Arabic dialects. Collectively, they lack standardized orthographic rules that dictate how to write a language, including norms for spelling, ligatures, word segmentation, and emphasis. These resources are essential for training computer models, and their scarcity has hindered the development of Arabic speech recognition.
- Morphological complexity. Arabic speakers do a lot of code switching. For example, in the parts of North Africa colonized by the French (Morocco, Algeria, and Tunisia), the dialects contain many borrowed French words. As a result, there are a large number of so-called out-of-vocabulary words, which speech recognition technology cannot understand because they are not Arabic.
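The first challenge, missing diacritics, can be illustrated with a minimal Python sketch. It assumes that the short-vowel marks occupy the Unicode range U+064B to U+0652; the function name `strip_harakat` and the two example words are illustrative choices and are not taken from any system described in this article.

```python
# Arabic short-vowel marks (harakat), fathatan through sukun.
# Assumption: this range is enough for the illustration; real text
# normalization usually handles more marks than these.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Return the text with Arabic short-vowel marks removed."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# "kataba" (he wrote): kaf + fatha, ta + fatha, ba + fatha
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
# "kutub" (books): kaf + damma, ta + damma, ba
kutub = "\u0643\u064F\u062A\u064F\u0628"

# Once the marks are gone, both words collapse to the same three letters,
# so a model trained on unmarked text must infer the vowels from context.
bare_kataba = strip_harakat(kataba)
bare_kutub = strip_harakat(kutub)
print(bare_kataba == bare_kutub)
```

Two different words become indistinguishable in writing, which is why unmarked text gives a speech model so little information about where the vowels go.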
“But the field is developing at lightning speed,” Ali said. Many researchers are working together to make it move even faster. Ali's Arabic Language Technology Laboratory is leading the ArabSpeech project, which brings together Arabic translation and the dialects native to each region. For example, Arabic dialects can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, because dialects do not follow borders, the division can be as fine-grained as one dialect per city; for example, a native Egyptian can tell the Alexandrian dialect of one speaker apart from that of a fellow citizen from Aswan (a distance of 1,000 kilometers on the map).
Building a tech-savvy future for everyone
At this point, machines are about as accurate as human transcribers. This is largely due to advances in deep neural networks, a subfield of machine learning in artificial intelligence that relies on algorithms inspired by how the human brain works, both biologically and functionally. Until recently, however, speech recognition was somewhat hacked together. The technology has a history of relying on separate modules for acoustic modeling, building pronunciation dictionaries, and language modeling, each of which had to be trained separately. More recently, researchers have been training models that convert acoustic features directly into text transcriptions, potentially optimizing all parts for the final task.
Even with these improvements, Ali is still unable to issue voice commands to most devices in Arabic, his native language. “It’s 2021, and I still can’t talk to many machines in my dialect,” he commented. “I mean, now I have a device that can understand my English, but machine recognition of multi-dialect Arabic speech has not happened yet.”
Achieving this goal is the focus of Ali's work. It has culminated in the first transformer for Arabic speech recognition and its dialects, one that has achieved performance unmatched so far. The technology is called QCRI Advanced Transcription System and is currently being used by Al Jazeera, DW, and BBC to transcribe online content.
There are several reasons why Ali and his team have been able to build these speech engines now. First of all, he said, “we need resources across all dialects. We need to accumulate resources before we can train models.” Advances in computer processing mean that computationally intensive machine learning now takes place on graphics processing units, which can rapidly process and display complex graphics. As Ali said, “We have excellent architecture, excellent modules, and have data that represents reality.”
Researchers from QCRI and Kanari AI recently built a model that can achieve human parity in Arabic broadcast news. The system demonstrates the impact of subtitling Al Jazeera's daily reports. While the human error rate (HER) for English is about 5.6%, studies have shown that the HER for Arabic is significantly higher, reaching 10%, because of the complexity of the language and the lack of standard spelling rules in dialectal Arabic. Thanks to the latest developments in deep learning and end-to-end architecture, the Arabic speech recognition engine outperforms native speakers in broadcast news.
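The article does not say how these error rates are computed. A common convention in speech recognition, assumed here, is the word error rate: the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of words in the reference. The sketch below is a minimal Python illustration of that convention; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# against a six-word reference.
rate = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{rate:.1%}")
```

Under this assumption, a figure of 10% means roughly one word in ten differs from the reference transcript.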
Although Modern Standard Arabic speech recognition seems to work well, researchers from QCRI and Kanari AI are focused on testing the boundaries of dialect processing and have achieved great results. Since no one speaks Modern Standard Arabic at home, we need to pay attention to dialects so that our voice assistants can understand us.
This content was written by the Qatar Computing Research Institute at Hamad bin Khalifa University, a member of the Qatar Foundation. It was not written by the editors of MIT Technology Review.