Can Benford’s law help the IRS detect tax evasion just by looking at the first digit of figures entered on tax returns?
Benford’s law — also called the Newcomb–Benford law, the law of anomalous numbers, and the first-digit law — is an observation about the distribution of first digits in unmanipulated numerical data sets. Benford’s law states that the first digit in naturally occurring collections of numbers is more likely to be small than large.
A Bit of History
In 1881, American astronomer Simon Newcomb was reviewing books of logarithm tables when he noticed that pages beginning with the numeral 1 were more worn and dirty than later pages. He concluded they were used more.
Decades later, General Electric physicist Frank Benford noticed the same pattern.
Benford concluded that, in a population of naturally occurring multidigit numbers, those beginning with 1, 2, or 3 must appear more frequently than those beginning with 4 through 9. The numbers’ first digits are distributed in a predictable manner, and patterns also occur in the second and third digits.
Benford’s Law Explained
In the number set 0 to 99, 11% of the numbers start with 1, and likewise 11% start with each of the other digits from 2 to 9. In the number set 0 to 199, more than 50% of the numbers start with 1 and less than 6% start with each of the digits 2 to 9. In the number set 0 to 299, 37% start with 1, 37% start with 2, and 3.7% start with each of the digits 3 through 9. The share of numbers beginning with a given digit therefore depends on where the range ends. Over a large enough data set spanning several orders of magnitude, the distribution of leading digits follows a pattern.
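The percentages above can be checked by brute-force counting. The following Python sketch does so; the helper names `leading_digit` and `leading_digit_shares` are illustrative only, and it assumes that 0 counts toward the size of the set but has no leading digit from 1 to 9.

```python
from collections import Counter

def leading_digit(n: int) -> int:
    """Return the first digit of a positive integer."""
    return int(str(n)[0])

def leading_digit_shares(upper: int) -> dict[int, float]:
    """Share of the integers 0..upper that begin with each digit 1-9.

    Assumption: 0 is part of the set (so the total is upper + 1)
    but is not assigned to any leading digit.
    """
    counts = Counter(leading_digit(n) for n in range(1, upper + 1))
    total = upper + 1
    return {d: counts[d] / total for d in range(1, 10)}

for upper in (99, 199, 299):
    shares = leading_digit_shares(upper)
    # Print each share as a percentage rounded to one decimal place.
    print(upper, {d: round(100 * s, 1) for d, s in shares.items()})
```

For the set 0 to 199, for example, the count gives 111 of 200 numbers (55.5%) starting with 1 and 11 of 200 (5.5%) starting with each other digit.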
When number sets obey Benford’s law, the digit 1 is the first digit about 30% of the time, while 9 is the leading digit less than 5% of the time. The distribution of first digits in a series of numbers is not uniform. If the digits were distributed uniformly, every digit would occur as the leading digit about 11% of the time (a 1-in-9 chance that each digit 1 through 9 occupies the first spot), but Benford observed that the first digits of naturally occurring multidigit numbers follow a different pattern.
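In its standard form, Benford’s law gives the probability that the leading digit is d as log10(1 + 1/d). The short Python sketch below tabulates those expected shares; the function name `benford_expected` is an illustrative choice, not part of any library.

```python
import math

def benford_expected() -> dict[int, float]:
    """Expected share of each leading digit d (1-9) under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, share in benford_expected().items():
    print(f"{digit}: {share:.1%}")
```

The table starts at about 30.1% for the digit 1 and falls to about 4.6% for the digit 9, compared with roughly 11.1% for every digit under a uniform distribution.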
In 1938, Benford tested his hypothesis with data across 20 domains, totaling 20,229 observations. His diverse data sets included the surface areas of 335 rivers, 3,259 U.S. population figures, and 1,800 molecular weights. The data analysis supported Benford’s theory that the first digits in a data set follow a logarithmic distribution.
Using Benford’s Law to Detect Fraud
Benford’s law offers a tool for fraud detection. Offenders rarely stop to consider Benford’s law when creating false transaction documents.
Benford’s law can uncover fictitious numbers in random data sets because it detects manual intervention in otherwise automated transaction activity. Data manipulated for tax evasion purposes will likely deviate from Benford’s law.
A Benford’s law analysis presents a null hypothesis and an alternate hypothesis. The null hypothesis stands when there is no statistically significant difference between the observed and expected frequencies of the first digit (that is, the set is consistent with Benford’s law), which suggests that the data are not compromised.
The alternate hypothesis is the opposite: It prevails when there is a statistically significant difference between expected and observed frequencies.
The null hypothesis is what is assumed to be true absent evidence to the contrary. The starting assumption is that the data are not compromised. The expected frequency reflects what the sample data would be if the data were not compromised.
If there is a statistically significant difference between the observed and the expected frequencies, the data may have been manipulated. In that case, the null hypothesis is rejected in favor of the alternate hypothesis.
The difference between the expected and observed frequencies is determined through use of a chi-square test. A chi-square statistic measures how a model of expectations compares with actual observed data. The chi-square statistic compares the size of any discrepancies between the expected results and the actual or observed results, given the size of the sample and the number of variables in the relationship.
The chi-square test has the following steps:
- determine the expected frequency of each first digit, 1 through 9;
- determine the observed frequency of each first digit;
- calculate the difference between the observed and expected frequencies for each digit;
- square each difference (to eliminate distortion from negative numbers);
- divide each squared difference by the expected frequency for that digit; and
- add up the nine results to obtain the chi-square statistic.
When the value computed by the chi-square test exceeds a predetermined critical value, it is appropriate to reject the null hypothesis and accept the alternate one. On the other hand, if the chi-square calculated value is less than the critical value, then the null hypothesis is not rejected.
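The test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any tax authority’s method. The function names are hypothetical. Frequencies are treated as counts rather than percentages. The critical value of 15.507 assumes a 5% significance level with 8 degrees of freedom (nine digit categories minus one). Both sample data sets are synthetic.

```python
import math
from collections import Counter
from decimal import Decimal

# Assumption: 5% significance level, 8 degrees of freedom.
CRITICAL_VALUE = 15.507

def first_digit(amount) -> int:
    """Return the first nonzero digit of a nonzero amount (sign ignored)."""
    # Decimal stores significant digits without leading zeros,
    # so the first stored digit is the leading digit.
    return Decimal(str(abs(amount))).as_tuple().digits[0]

def benford_chi_square(amounts) -> float:
    """Chi-square statistic comparing observed first digits with Benford's law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    observed = Counter(digits)
    n = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # expected count for digit d
        statistic += (observed[d] - expected) ** 2 / expected
    return statistic

def rejects_null(amounts) -> bool:
    """True when the statistic exceeds the assumed critical value."""
    return benford_chi_square(amounts) > CRITICAL_VALUE

# Synthetic illustrations only, not real transaction data.
powers_of_two = [2 ** n for n in range(100)]  # first digits close to Benford
evenly_spread = list(range(100, 1000))        # first digits perfectly uniform

print(rejects_null(powers_of_two))  # False: null hypothesis not rejected
print(rejects_null(evenly_spread))  # True: null hypothesis rejected
```

The powers of two track the expected distribution closely, so their statistic falls well under the critical value. The evenly spread numbers give every leading digit the same count, which produces a statistic far above it.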
Benford’s Law and Tax Fraud
Benford’s law is a statistical tool that can flag anomalies and discrepancies in a transaction data set to detect interventions or compromises that undermine a data set’s integrity.
Tax authorities can apply a Benford’s law analysis to determine whether data on tax returns have been manipulated. It can also assist the IRS with resource allocation decisions: Deviations from Benford’s law in particular data sets could warrant special scrutiny of particular industries or types of transactions.