The authors have declared that no competing interests exist.
The goal of this paper is to analyze the registered cases of people infected with Covid-19 throughout the world, using a digital forensic analysis technique based on Benford's Law. Twenty-three countries were randomly chosen for this analysis: China, India, Germany, Brazil, Venezuela, Netherlands, Italy, Colombia, Russia, Norway, South Africa, Portugal, Singapore, United Kingdom, Chile, Ecuador, Egypt, Denmark, Ireland, France, Belgium, Australia and Croatia. We calculate the p-values based on the Pearson χ² test and the Mantissa Arc Test according to the results obtained with the first digit. If a country fails these two tests, a third test is carried out based on the Freedman-Watson test. The results indicate that the data from Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile are suspected of manipulation because the numbers fail Benford's Law according to the results obtained until April 30, 2020. However, further studies in these countries are necessary to determine whether they manipulated or altered the information.
In December 2019, the first cases of a new coronavirus (2019-nCoV) responsible for atypical pneumonia began to be registered in Wuhan (China). As of April 30, 2020, there are more than three million infected individuals and there have been almost 230,000 deaths in 180 countries throughout the world. The disease had already been declared a pandemic by the World Health Organization on March 11.
There is currently no vaccine against this disease, and social distancing measures have been the main recommendation of the World Health Organization to prevent its spread. Recently, a study (written in Spanish) based on differential equations that simulate the transmission dynamics of the disease was presented, using the reported cases of infection in four different countries according to data recorded at Johns Hopkins University.
For this reason, it is necessary to validate the data on Covid-19 infections, so that we can establish whether the data have been altered, manipulated, or even poorly transcribed for unknown reasons. Benford's Law has been used in various scenarios to detect, for example, fraud in campaign finances.
In the scientific literature, we found only one paper, published in a repository (arXiv), in which the author used Benford's Law to study the first contagion outbreaks that occurred in China until February 13, 2020.
For this reason, we carry out a more complete study to determine whether the data on people infected with Covid-19 can be validated using Benford's Law, based on the Pearson χ² test and the Mantissa Arc Test and, if necessary, the Freedman-Watson test, in order to verify that the data have not been manipulated.
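According to Benford's Law, the probability that the first digit of a number is i is

$$P(i) = \log_{10}\left(1 + \frac{1}{i}\right)$$

where i takes the values from 1 to 9 (see details in ref. 9). With this distribution, we calculate the Pearson χ² value, a goodness-of-fit statistic, according to this equation:

$$\chi^2 = \sum_{i=1}^{9} \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed number of values whose first digit is i and E_i = N P(i) is the number expected under Benford's Law for a sample of size N.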
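The first-digit comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper: the function names are hypothetical, the input is assumed to be a list of positive integer case counts, and the p-value is assumed to come from a χ² distribution with eight degrees of freedom (nine digits minus one). That assumption appears consistent with the tabulated results, since χ² = 3.450 gives p ≈ 0.903, as reported for China.

```python
import math
from collections import Counter


def benford_expected():
    """First-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Leading digit of a positive integer count (assumption: value >= 1)."""
    return int(str(int(value))[0])


def pearson_chi2(values):
    """Pearson chi-square statistic of observed first digits against Benford's Law."""
    digits = [first_digit(v) for v in values if v >= 1]
    n = len(digits)
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


def chi2_sf_even(x, dof=8):
    """Upper-tail probability of a chi-square variable with an even number
    of degrees of freedom, using the closed-form finite sum."""
    half = x / 2
    terms = sum(half ** j / math.factorial(j) for j in range(dof // 2))
    return math.exp(-half) * terms


# Synthetic check only (not Covid-19 data): powers of 2 are a classic
# sequence whose first digits follow Benford's Law closely.
sample = [2 ** k for k in range(1, 101)]
stat = pearson_chi2(sample)
print(f"chi2 = {stat:.3f}, p-value = {chi2_sf_even(stat):.3f}")
```

The closed-form tail probability avoids any dependency outside the standard library; a statistics package would give the same value for eight degrees of freedom.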
In the Mantissa Arc Test, it is necessary to calculate the center of mass of the set of mantissa values, considering that the data are distributed on a unit circle. The center of mass is given by:

$$x_{CM} = \frac{1}{N}\sum_{i=1}^{N} \cos\left(2\pi \log_{10} x_i\right), \qquad y_{CM} = \frac{1}{N}\sum_{i=1}^{N} \sin\left(2\pi \log_{10} x_i\right)$$

where x1, x2, …, xN are the data values.
The next step is to determine the squared length of the mean vector, L², which is given as

$$L^2 = x_{CM}^2 + y_{CM}^2$$
And finally, the p-value is simply.
Note that a p-value greater than 0.05 indicates that the data have not been altered or manipulated.
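The Mantissa Arc Test steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the function name is hypothetical, and the p-value uses the Rayleigh-type approximation exp(-N·L²), which is an assumption here and may differ from the expression used in the paper.

```python
import math


def mantissa_arc_test(values):
    """Return (L2, p_value) for a list of positive data values."""
    # Map each value to an angle on the unit circle using the
    # fractional part (mantissa) of its base-10 logarithm.
    angles = [2 * math.pi * (math.log10(v) % 1.0) for v in values if v > 0]
    n = len(angles)
    if n == 0:
        raise ValueError("need at least one positive value")

    # Center of mass of the points on the unit circle.
    x_cm = sum(math.cos(a) for a in angles) / n
    y_cm = sum(math.sin(a) for a in angles) / n

    # Squared distance of the center of mass from the origin.
    l2 = x_cm ** 2 + y_cm ** 2

    # Assumption: under uniformly distributed mantissas, N * L2 is
    # approximately exponential with unit mean, so the upper tail
    # probability is exp(-N * L2).
    p_value = math.exp(-n * l2)
    return l2, p_value


# Synthetic checks only: evenly spread mantissas versus a constant series.
spread = [10 ** (k / 50) for k in range(50)]
constant = [3] * 20
print(mantissa_arc_test(spread))
print(mantissa_arc_test(constant))
```

Evenly spread mantissas place the center of mass near the origin, so L² is close to zero and the p-value is close to one. A constant series puts every point at the same angle, so L² is close to one and the p-value is very small.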
In
Country | China | Italy | Brazil | Colombia | Venezuela | India | Russia
χ² | 3.450 | 33.383 | 6.785 | 16.974 | 8.557 | 12.560 | 22.709
Sample size | 109 | 71 | 58 | 52 | 34 | 62 | 54
p-value (χ²) | 0.903 | 10⁻⁵ | 0.560 | 0.030 | 0.381 | 0.128 | 0.004
p-value (Mantissa) | 0.522 | 10⁻⁶ | 0.354 | 0.061 | 0.868 | 0.002 | 0.118

Country | Germany | Norway | S. Africa | Portugal | Singapore | Netherlands | UK | Chile
χ² | 12.425 | 7.952 | 6.619 | 16.623 | 4.373 | 22.725 | 55.074 | 26.363
Sample size | 75 | 63 | 54 | 60 | 91 | 64 | 70 | 58
p-value (χ²) | 0.133 | 0.438 | 0.578 |  | 0.822 |  | 10⁻⁶ | 10⁻⁴
p-value (Mantissa) | 0.386 | 0.331 | 0.372 |  | 0.935 | 10⁻⁸ | 10⁻⁶ |

Country | Ecuador | Egypt | Denmark | Ireland | France | Belgium | Australia | Croatia
χ² | 9.408 | 10.194 | 25.535 | 9.174 | 14.025 | 24.605 | 5.011 | 7.868
Sample size | 55 | 54 | 64 | 59 | 72 | 62 | 77 | 62
p-value (χ²) | 0.309 | 0.252 |  | 0.328 | 0.081 |  | 0.756 | 0.447
p-value (Mantissa) | 0.557 | 0.142 | 10⁻⁴ | 0.167 | 0.139 |  | 0.445 | 0.001
The countries that pass both tests, meaning that both p-values are greater than 0.05, are China, Germany, Brazil, Venezuela, Norway, South Africa, Singapore, Ecuador, Egypt, Ireland, France and Australia. This means that the information from these countries is valid. In fact, the data from China, Singapore and Australia agree particularly well with Benford's Law. On the other hand, Colombia, India, Russia and Croatia pass at least one of the two tests, as shown in the table.
However, Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile do not pass either of the two tests (their values are highlighted in red in the table).
However, it is necessary to wait until the end of the pandemic to be able to analyze all the data and to determine whether these countries manipulated the data, or whether the failures are due to the omission of registered cases.
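The pass/fail reading of the two p-values used in this section can be sketched as a small classification. This is only an illustration: the helper name `classify` is hypothetical, the values are copied from the first block of the table, and entries printed as powers of ten (for example 10-5) are assumed to mean 1e-5.

```python
ALPHA = 0.05  # significance threshold stated in the text

# (p-value from Pearson chi-square, p-value from Mantissa Arc Test)
# Values taken from the first block of the table.
p_values = {
    "China": (0.903, 0.522),
    "Italy": (1e-5, 1e-6),
    "Brazil": (0.560, 0.354),
    "Colombia": (0.030, 0.061),
    "Venezuela": (0.381, 0.868),
    "India": (0.128, 0.002),
    "Russia": (0.004, 0.118),
}


def classify(p_chi2, p_mantissa, alpha=ALPHA):
    """Count how many of the two tests a country passes at level alpha."""
    passed = int(p_chi2 > alpha) + int(p_mantissa > alpha)
    return {2: "passes both", 1: "passes one", 0: "fails both"}[passed]


for country, (p_chi2, p_mantissa) in p_values.items():
    print(f"{country}: {classify(p_chi2, p_mantissa)}")
```

Under this rule, countries that fail both tests are the ones for which the third test would be considered.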
The analysis of Covid-19 infection cases based on Benford's Law indicates that China, Germany, Brazil, Venezuela, Norway, South Africa, Singapore, Ecuador, Egypt, Ireland, France, Australia, Colombia, India, Russia and Croatia did not manipulate the information registered in the Johns Hopkins dataset. However, Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile do not pass the three tests carried out in the paper, and therefore further studies are necessary in these countries to determine whether they manipulated or altered the information.
In fact, we consider that we must wait until the end of the pandemic, when all cases have been registered in all countries, before the credibility of the data provided by any given country can be properly assessed.
I would like to thank Karl E. Longreen for comments on this manuscript.