Balochi in the digital age: Inside the struggle to preserve a language
In the center of Quetta, a decades-long struggle continues inside the Balochi Academy. Here, amidst shelves lined with manuscripts, brittle journals, and newly printed books, a race against time is underway, one that seeks not only to preserve a language but to prepare it for survival in a world dominated by artificial intelligence. The Balochi language, spoken in Pakistan, Iran, Afghanistan, and Oman and among global diaspora communities, stands at a critical turning point. The question is no longer whether the language will survive, but whether it will evolve fast enough to remain relevant in a digital future.
Seventy Years of Cultural Stewardship
For Haibtan Omar, Chairman of the Balochi Academy, the mission his institution carries is both cultural duty and historical responsibility. Since its establishment in 1958, the Academy has worked relentlessly for the preservation and development of the Balochi language, its literature, and its history. Over the course of nearly seventy years, it has published more than seven hundred books, organized seminars and conferences, and supported academic research by publishing M.Phil and PhD theses. It has translated major literary works from around the world into Balochi, produced comprehensive dictionaries, and built a growing archive of linguistic and literary materials. Yet today, its most monumental challenge lies not in preserving the past, but in integrating Balochi into the technologies that shape the present and future.
The Digital Disadvantage: Where Balochi Stands in the AI Era
As artificial intelligence reshapes global communication, only a handful of languages, among them English, French, Chinese, Hindi, and Urdu, have been successfully embedded into the digital ecosystem. These languages have met the requirements of machine learning models that depend on massive amounts of text data, structured grammar, and technological investment. Balochi, Haibtan explains, remains at a “basic level” of this evolution. AI language models like ChatGPT, Grok, or Copilot can process and generate text in dozens of languages, but Balochi is hardly recognized by such systems. Even Google Translate, where Balochi has made a modest appearance, offers translations with extremely low accuracy. The issue, Haibtan stresses, is not simply technological; it is infrastructural. “AI models rely on the internet,” he says, “and the internet has very little Balochi.”
Digitizing a Language From Scratch
It is this scarcity of digital data that threatens to push Balochi further into the margins of the modern world. In an era where schools, courts, media, and businesses move rapidly toward digital platforms, having a language that AI systems cannot process means becoming invisible. But invisibility is not an option for the Academy. Instead, it has embarked on one of the most ambitious linguistic projects ever attempted in Balochistan—the development of a comprehensive Natural Language Processing (NLP) system for Balochi.
The Academy’s NLP project aims to make the language machine-readable and AI-compatible. This involves building an enormous digital corpus of Balochi text, something that has never existed at this scale. According to Haibtan, billions of data points are required: written material, audio and video recordings, structured grammar rules, and expression patterns. This is the raw material that allows AI systems to learn, analyze, and interact in a language. Without it, no digital tool can recognize or generate Balochi. But with it, the language could one day have its own text-to-speech systems, translation engines, digital libraries, and educational apps.
Inside the Lab: The Linguists Building Balochi for Machines
Inside this project, linguists like Sharaf Shad perform the painstaking labor required to shape a language for the digital world. “In the beginning,” he recalls, “we simply created sentences—two hundred a day—from Balochi to English.” Over time, the work expanded into multilingual modules where every word was categorized by its part of speech, translated into English and Urdu, and embedded into example sentences across all three languages. This process forms the backbone of linguistic databases used in machine learning. “This is how a language becomes readable for AI,” Sharaf explains. “It’s how the system understands grammar, meaning, sentiment, and structure.”
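To make the shape of that work concrete, the kind of record Sharaf describes could be sketched roughly as follows. This is an illustration only, not the Academy's actual data format: the class name LexiconEntry, its field names, the helper functions, and the placeholder values are all assumptions made for the example. The language codes bal, eng, and urd are the ISO 639 codes for Balochi, English, and Urdu.

```python
import json
from dataclasses import dataclass, asdict

# ISO 639 codes for the three languages mentioned in the article.
LANGUAGES = ("bal", "eng", "urd")


@dataclass
class LexiconEntry:
    """Hypothetical record for one Balochi word (field names are assumptions)."""
    headword: str          # the Balochi word itself
    part_of_speech: str    # e.g. "noun", "verb"
    translations: dict     # language code -> translation of the headword
    examples: dict         # language code -> example sentence

    def missing_fields(self):
        """List the translations and example sentences still to be written."""
        gaps = []
        for lang in ("eng", "urd"):
            if not self.translations.get(lang):
                gaps.append(f"translation:{lang}")
        for lang in LANGUAGES:
            if not self.examples.get(lang):
                gaps.append(f"example:{lang}")
        return gaps


def to_jsonl(entries):
    """Serialize entries as JSON Lines, keeping non-ASCII script readable."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in entries)


# Placeholder content only; no real lexicon data is reproduced here.
entry = LexiconEntry(
    headword="<Balochi word>",
    part_of_speech="noun",
    translations={"eng": "<English gloss>", "urd": "<Urdu gloss>"},
    examples={"bal": "<Balochi sentence>", "eng": "<English sentence>"},
)
print(entry.missing_fields())
print(to_jsonl([entry]))
```

A check like missing_fields would let an annotator see at a glance which entries still lack an Urdu example sentence, say, before the records are handed to a training pipeline.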
The Academy has also partnered with Thaheer Productions for a two-year global data collection initiative in which Baloch communities worldwide will contribute texts, audio files, and video recordings. The aim is to gather enough content to train AI systems effectively and give Balochi a foothold in the digital domain. Complementary tools such as Optical Character Recognition (OCR), which converts scanned images of Balochi texts into machine-readable formats, are also under development. Alongside this, the Academy has built Balochi-language websites and a mobile app containing texts for students and researchers. But these efforts, though transformative, face severe financial constraints.
Ambition Without Funding: The Cost of Progress
The estimated budget for the NLP project is between three and four crore rupees (30 to 40 million), a relatively modest sum compared to global language-tech projects, yet far beyond the Academy’s means. “We began in 2021,” Haibtan says, “but limited resources keep delaying our progress.” Despite submitting proposals to the Balochistan government, the federal government, and literary institutions like the Pakistan Academy of Letters, sustained funding has not materialized. For now, the Academy has carved a portion out of its already small annual grant to keep the project alive. But the pace remains slow.
Writing a Nation Into Existence: The Balochi Encyclopedia
Parallel to its technological ambitions, the Academy is working on another monumental undertaking: a multi-volume Balochi Encyclopedia. This project, expected to take seven years and run to fifteen volumes, aims to gather authentic knowledge about the Baloch people, their history, geography, literature, and culture. In the absence of such centralized knowledge, Balochi has long lacked the academic infrastructure enjoyed by larger languages. Haibtan emphasizes that major encyclopedias such as Britannica, Americana, and Iranica serve as identity anchors for nations. Without comparable work, a language remains academically orphaned. Yet this project, too, requires substantial funding, estimated at four to five billion rupees. Proposals have been sent to government agencies, foreign embassies, and prominent Baloch figures, but official support remains limited. Still, the Academy has begun its work independently, refusing to let scarce resources halt progress.
Dialect Diversity: A Strength, Not a Problem
One of the persistent internal challenges for Balochi has been the question of script and dialect. With major dialects like Rakhshani, Makrani, and Koh-e-Sulemani spoken across vast regions, the language lacks a standardized script taught uniformly in schools. But Haibtan insists that this diversity is an asset, not a weakness. “A script should represent all dialects,” he says. “Dialect is not a bad thing. It only becomes harmful when we ignore it—because that causes words to go extinct.” For the Academy, the mission is not to homogenize Balochi but to preserve its full linguistic richness and ensure that all dialects contribute to the future digital corpus.
The Educational Void: A Crisis Ignored
Yet the most painful obstacle lies not in technology or dialects, but in education policy. UNESCO recommends that children be taught in their mother tongues in early education, a principle widely practiced around the world. But in Balochistan, Balochi is not taught at the primary level. This disconnect is not new. When the Academy was founded in 1958, its first resolution demanded the introduction of Balochi as a medium of instruction. For decades since then, advocates for Balochi have repeatedly submitted resolutions and letters urging the government to implement these changes. In 2013, the provincial government introduced Balochi as a school subject, similar to Urdu or Arabic. But even that modest step has quietly been reversed. Today, Balochi has vanished from government classrooms once again.
A Vanishing Media Landscape
The decline extends to media as well. There was a time when Pakistan Television (PTV) and Radio Pakistan aired high-quality Balochi programs—dramas, feature stories, and youth-driven content. These platforms inspired young talent, strengthened cultural narratives, and nurtured writers and actors. But much of that has faded. Haibtan laments how Balochi programming slots are now filled by shows in other languages, and how young Balochi creators no longer receive opportunities. Even the Culture Department, he adds, has shown minimal interest in collaborating on academic activities, seminars, or conferences. “They focus on optics,” he says. “Festivals and showpieces. Not sustainable development.”
Political Will: The Missing Ingredient
This frustration is echoed by Sangat Rafeeq, former chairman of the Academy and lecturer in the Balochi Department at the University of Balochistan. He believes that both nationalist political parties in the province and federal authorities have contributed to the neglect of the language. “They talk about identity in speeches,” he says, “but did nothing practically to develop Balochi.” At the federal level, he argues, there is a misplaced fear that promoting regional languages might weaken the state narrative. “It is a misunderstanding,” he insists. “The world shows that countries flourish when they strengthen their diverse identities.”
Rafeeq dismisses the provincial government’s argument that language departments lack job opportunities and therefore should be closed or downsized. The University of Balochistan’s Balochi and Brahui departments, he points out, are run by scholars who have produced significant literary work—an accomplishment not often matched across other departments. He argues that governments around the world invest heavily in preserving and promoting languages through funded institutions. Sindh, for instance, has multiple well-funded language and cultural institutes. In contrast, the Balochi Academy operates with a fraction of the resources allocated to Urdu institutions in Islamabad. “If they gave even ten percent of that funding to Balochi departments,” Rafeeq says, “the progress would be extraordinary.”
According to him, if the government genuinely wishes to develop Balochi as a medium of instruction, all the expertise required already exists within the province. “The Balochi Department has produced enough graduates and experts,” he explains. “They can develop syllabi for every grade level. This is how languages like Urdu were developed. But it depends entirely on political will. If they can establish Chinese centers because of CPEC, why not strengthen Balochi?”
A Community That Refuses to Give Up
The financial strain on the Academy exacerbates the problem. For the past seven or eight years, the institution has been distributing books free of cost, simply to encourage readership. Sales, once a modest revenue source, have dwindled due to limited readership. Today, the Academy is almost entirely dependent on government grants—grants that are sometimes reduced instead of increased. Proposals for international funding have been submitted, but the responses remain uncertain.
Despite all these challenges, what emerges from the Academy’s corridors is not despair but endurance. The staff and scholars continue their work quietly, often without recognition, determined to ensure that the language they inherited remains alive for future generations. Their efforts reflect a deeper truth: that languages do not die from lack of speakers alone—they die from lack of systems, institutions, and political will. Balochi has millions of speakers, but without integration into education, media, and technology, it risks fading from formal, intellectual, and digital spaces.
A Future Worth Fighting For
The struggle to digitize Balochi is therefore more than a linguistic project—it is a fight for identity and continuity. If successful, the NLP initiative will provide tools that future generations can use to read, write, study, and even program in Balochi. The encyclopedia will give Baloch children a documented record of their heritage. And if the language finds its way back into classrooms and screens, it will secure its place not just in history, but in the modern world.
Inside his office, surrounded by manuscripts waiting to be digitized and dictionaries waiting to be updated, Haibtan Omar remains quietly hopeful. “The language belongs to the people,” he says. “If the people contribute, if the government supports, if our institutions understand the importance—Balochi will not only survive, it will grow.”
For now, the revolution continues—not in the streets, but in the careful digitization of old manuscripts, the creation of corpora, the slow assembling of sentences, and the conviction that a language as old and rich as Balochi deserves not just preservation, but progress. The outcome of this effort will determine not just how Balochi is spoken, but how it is remembered—and whether it will be heard in the world of tomorrow.