World Kiswahili Language Day: A Call To Expand Africa’s Digital Linguistic Presence?

By Soko Directory Team / Published July 8, 2024 | 12:48 pm




Over the past few weeks, Kenya has seen Civic Tech develop at a pace never experienced before. As the beloved Gen Z took to the streets to express their displeasure with the Kenya Finance Bill 2024, a digital movement was coming together on the sidelines to help even more Kenyans understand the why behind the protests.

In a matter of days, the Kenya Finance Bill 2024 was chunked into bite-sized pieces, with explainers hurriedly produced in short video format on TikTok, X, and Instagram. These explainer videos then made their way to WhatsApp groups. While the trend started in English and was mainly targeted at Gen Z, an urgent need arose to speak to parents and to Gen Z in rural areas, and we saw videos being made in over 10 local languages and spread across all social media platforms, mostly by volunteers. In contrast, a Large Language Model (LLM) for the Finance Bill was released and made available only in English and Kiswahili.

The growth of digital activism highlights the stark absence of an all-inclusive AI ecosystem for Africa. Our observations were twofold. First, African languages are still very oral and critically underrepresented in the realm of AI, as seen in the LLMs that could only handle English and Kiswahili. Second, there is a need for a different approach to the creation of language datasets, as seen in the viral explainer videos.

As fate would have it, Kenya hosted the World Kiswahili Language Day celebrations for the first time from July 5 to 7, featuring a series of events, key among them Usiku wa Mswahili. World Kiswahili Language Day, observed annually on July 7, was established by the General Conference of the United Nations Educational, Scientific and Cultural Organization (UNESCO) in November 2021, during its 41st session.

The day was proclaimed in recognition of the role the language plays in cultural preservation, awareness creation, expression, and social participation. The proclamation was not without merit, given the prominence of the language. Kiswahili is the most widely spoken language in Sub-Saharan Africa, with more than 200 million speakers spanning more than 14 countries. Moreover, it is one of the 10 most spoken languages globally. It is an official working language of several organizations: the East African Community, the Southern African Development Community, and the African Union.

Even with this reach, Kiswahili's digital presence is limited. It is considered a low-resource language, meaning that little of its text and speech data is available in digitized form. Many African languages suffer the same fate, with either a limited digital presence or none at all. Africa is considered one of the most linguistically diverse continents in the world, with the number of spoken languages estimated to exceed 2000. The limited availability, or outright lack, of digitized data is exclusionary in many ways, especially in this era of the artificial intelligence (AI) revolution, which has seen a proliferation of language applications (text-to-speech, machine translation), virtual assistants such as Alexa and Siri, and tools (ChatGPT, Llama 2 and Mistral AI). The development of such tools and applications essentially entails the following: 1) collecting language data; 2) developing the tool, i.e. training models with the collected data; and 3) deploying the developed tools. Given the dearth of African language datasets, the ability of AI researchers and Natural Language Processing (NLP) practitioners to leverage AI techniques to build African language tools is severely limited. This scarcity deepens the digital divide, mutes the digital presence of millions of Africans, and limits economic opportunities.

Given the centrality of datasets to the innovation of bespoke African models and AI tools, the creation and expansion of language datasets remains imperative. Within the context of African languages, this is a mammoth task due to the diversity of languages on the continent, and it grows even larger when dialects within languages are taken into account. With limited resources (time, talent, and finances), an efficient pathway for creating and expanding data becomes essential. Indexes are a useful barometer for directing such efforts efficiently. Already, indexes such as the Government AI Readiness Index 2023 are proving to be useful tools in assessing the extent to which countries are prepared to integrate AI within the public sector. A similar index geared towards languages, such as a Global Language Readiness Index, would prove extremely useful for priority setting.

Through collaborative efforts across the spectrum of AI and Natural Language Processing practitioners, such an index would outline critical pillars and indicators for gauging language readiness. Indicators measuring the extent to which a given African language is machine-ready with respect to text and speech data would prove useful in several ways: identifying gaps in getting African languages AI-ready, identifying priority areas for data collection and expansion efforts, and designing efficient language data collection and expansion strategies. An index affords a systematic, efficient approach to the creation and expansion of African language datasets, which in turn will accelerate the innovation of African language tools and applications. Such advancements will ensure that speakers of African languages can access language tools in their own tongue, leading to a host of benefits, for instance a narrower digital divide, the integration of African voices, and accelerated economic growth.
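To make the idea of pillars and indicators concrete, here is a minimal sketch of how a composite readiness score could be computed and used to set priorities. Everything in it is an assumption for illustration only: the pillar names, the weights, the 0-to-1 scoring scale and the placeholder language names are hypothetical, and no such index has been defined in this article or, to our knowledge, published in this form.

```python
# Hypothetical pillars and weights (must sum to 1). These are illustrative
# assumptions, not part of any published index.
PILLAR_WEIGHTS = {"text_data": 0.4, "speech_data": 0.4, "tooling": 0.2}


def readiness_score(indicators: dict[str, float]) -> float:
    """Weighted average of pillar scores, each expected in the range 0 to 1.

    Pillars missing from `indicators` are treated as 0 (no data available).
    """
    for pillar, value in indicators.items():
        if pillar not in PILLAR_WEIGHTS:
            raise KeyError(f"unknown pillar: {pillar}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{pillar} score must be between 0 and 1")
    return sum(
        weight * indicators.get(pillar, 0.0)
        for pillar, weight in PILLAR_WEIGHTS.items()
    )


def weakest_pillar(indicators: dict[str, float]) -> str:
    """Return the pillar with the lowest score, i.e. the biggest gap."""
    return min(PILLAR_WEIGHTS, key=lambda pillar: indicators.get(pillar, 0.0))


def rank_by_gap(languages: dict[str, dict[str, float]]) -> list[str]:
    """Order language names from least ready to most ready."""
    return sorted(languages, key=lambda name: readiness_score(languages[name]))


if __name__ == "__main__":
    # Made-up scores for placeholder languages, purely for demonstration.
    sample = {
        "language_a": {"text_data": 0.5, "speech_data": 0.25, "tooling": 0.5},
        "language_b": {"text_data": 0.0, "speech_data": 0.0, "tooling": 0.5},
    }
    for name in rank_by_gap(sample):
        score = readiness_score(sample[name])
        print(f"{name}: score={score:.2f}, biggest gap={weakest_pillar(sample[name])}")
```

A weighted average is only one possible design. A real index would need agreed definitions for each indicator and a transparent way of choosing weights, which is precisely where collaboration among practitioners would come in.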

As we continue to witness digital civic engagement unfolding across Africa, it is important to ensure that the tools being created are inclusive of all Africans.


By Kavengi Kitonga and Dr. Shikoh Gitau



