Big data can shame grand corruption
Future of anti-corruption efforts is linked to the collection, consolidation and analysis of big data
“You are a fraud, this state is hypocritical, this is not real money, you think we are fools,” she was shouting as she walked towards me, when the security guards took hold of her. I approached her, freed her from the guards and sat at her feet. She was a woman of my mother’s age, carrying so much distrust, anger and despair in her eyes. She opened her envelope containing payment from the Benazir Income Support Programme, took out a bank note of Rs5,000 and a few Rs1,000 notes and asked me why the government was handing out fake money to the people. Smart cards for digital transfers were not yet ready and Balochistan’s beneficiaries had not been paid for the last six months. As an interim solution, the government at the time had decided to pay a lump sum for the six months once biometric verifications had been done by mobile banks. But the woman interrupted the proceedings by returning the payment, claiming that the money was fake! I asked her what real money was, then. She opened the careful knots in the left corner of her dupatta, showing me bank notes of Rs10, 50, 100 and 500, and shouted, “This is real money.” It appeared she had never seen a Rs5,000 bank note before. Government officials burst into laughter, but the sombre reality left me sad and speechless.
A few days ago, when I saw boxes of stashed cash, overflowing with Rs5,000 bank notes, being recovered from the residence of the Balochistan finance secretary, I had a flashback to this incident with the poor Balochi woman. She was right: the real money was indeed the money held by the rent-seeking mafia, both within and outside the government. Money carries a different meaning for the poor who barely survive from one day to the next. Was this the hypocrisy of the state that the Balochi woman was referring to?
The Panama leaks, one of the biggest data leaks of our time, have laid such hypocrisy bare from top to bottom. Around 2.6 terabytes of data, consisting of 11 million sensitive files and emails, reveal patterns that, if individual countries were to investigate, would provide enough evidence to implicate the elite in corruption and state capture of resources. While the Panama leaks provide a rare glimpse into how big data can expose big corruption, there is another leak closer to home that routinely goes unnoticed and primarily serves to protect the powerful. This is the leakage of information that results from a fragmented bureaucratic structure, which regularly collects information on all aspects of private life, ranging from taxes and property to security and travel, but refuses to consolidate and deploy this information in aid of public governance.
Let me explain how and why this is important. Consider the key repositories for collection and storage of private data: the Federal Board of Revenue (FBR), the Federal Bureau of Statistics, the National Accountability Bureau (NAB), the State Bank of Pakistan (SBP) and NADRA. Individually, they possess vast amounts of information on individuals, including their ownership of assets and transactions. Collectively, they represent a big resource that can be used for accountability. The integration of these largely complementary data sources can unmask important patterns that routinely slip under the state’s radar. Such integration, however, requires a greater role for NADRA, which possesses the technical capacity through its expertise in data analytics. A couple of years ago, it was precisely through the consolidation of diverse government databases that NADRA was able to identify about 3.5 million tax evaders. It was estimated at the time that if a basic minimum tax rate were applied to these individuals, Pakistan would collect more resources than its annual bilateral aid flows. If the names of prominent tax evaders were to be released from this database, we would forget the exposé from the Panama leaks. In a similar vein, it is useful to recall that, prior to the 2013 elections, NADRA had developed an integrated scrutiny system for the Election Commission of Pakistan to inspect the nomination papers of election candidates. The system had consolidated the prime data sources of NADRA, the FBR, the SBP and NAB’s information repository. This allowed us to identify candidates who were loan defaulters, tax defaulters or NAB convicts. Why election officials, empowered with a wealth of data on candidates, looked the other way, and who gave orders to circumvent the system, is another story of our dysfunctional electoral process.
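For readers who want to see what such consolidation means in practice, here is a minimal sketch of the idea and nothing more. It assumes every register is keyed by the same national identity number, and it checks each candidate against three lists. The function name `scrutinise`, the register labels and all the identity numbers are invented for illustration; this is not the system NADRA built.

```python
def scrutinise(candidates, tax_defaulters, loan_defaulters, convicts):
    """Return a dict mapping each candidate ID to the registers it appears in.

    All inputs are iterables of ID strings (hypothetical data layout).
    """
    # Sets give fast membership checks; dict order fixes the order of flags.
    registers = {
        "tax defaulter": set(tax_defaulters),
        "loan defaulter": set(loan_defaulters),
        "NAB convict": set(convicts),
    }
    report = {}
    for candidate_id in candidates:
        report[candidate_id] = [
            label for label, ids in registers.items() if candidate_id in ids
        ]
    return report


# Entirely made-up identifiers, for illustration only.
report = scrutinise(
    candidates=["ID-001", "ID-002", "ID-003"],
    tax_defaulters=["ID-002"],
    loan_defaulters=["ID-002", "ID-003"],
    convicts=[],
)
for candidate_id, flags in report.items():
    print(candidate_id, ", ".join(flags) if flags else "clear")
```

The point of the sketch is that once the registers share a common key, the cross-check itself is trivial; the hard part is institutional, not technical.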
But the point is that while the debate on accountability rages on, we would do well to remember that if there were sufficient political will, we do have the technical capacity to support such a process. Many eminent politicians, including, most recently, Mahmud Khan Achakzai, have called for the creation of a new accountability institution. We can create as many permutations of anti-corruption agencies as we want, but this will not solve our problem as long as we fail to enhance their capacity to consolidate and analyse data.
This is where the role of a totally autonomous NADRA is both pertinent and critical. Big data analytics offer the promise of arming smart forensic investigators with the ability to identify patterns of fraud, using both structured and unstructured data. Through analytical models, these data-driven insights can trigger meaningful forensic investigations, allowing the state to prosecute culprits. The integration of multiple data sources provides much richer visibility into financial transactions and reveals the connections between government agencies, individuals and companies. Other information from unstructured data, such as emails and interactions on social and electronic media, can be integrated to connect the dots. Additionally, text and contextual analytics can be deployed to identify significant transactions with related parties and prominent conflict-of-interest cases. The resulting predictive and prescriptive models can be used to successfully investigate potentially corrupt practices.
In the future, embedding these models within banking and financial applications will provide operational intelligence that could help users anticipate tax evasion, fraud and abuse, while improving the service levels provided to citizens. More importantly, this can allow tax collectors to act on these opportunities within a real-time window. These models are currently being used in the US, allowing the state of Michigan, for example, to save $1 million daily. Missouri has saved $627 million in delinquent payments since 2004, while Texas has collected $600 million in additional taxes by applying data analytics.
To conclude, the future of anti-corruption efforts is linked to the collection, consolidation and analysis of big data. Through NADRA’s expertise, Pakistan possesses state-of-the-art technical capacity to do this. But to initiate any anti-corruption programme, we would need to empower NADRA by making it truly independent of the clutches of political incumbents and government diktats. The million-dollar question is whether we have the necessary political resolve to deploy this capacity in support of good governance.
Published in The Express Tribune, May 29th, 2016.