One theory for why underdeveloped countries stay poor is the wide dispersion in productivity between firms. This is misallocation: small, unproductive firms take market share away from efficient ones, which drags down aggregate output. Hsieh and Klenow (2009) look at manufacturing plants in India and China and find a long tail of unproductive firms. If those firms were chased out of the market by competition, and the dispersion of firm productivity (not the average, merely the variance) were like that of the US, total factor productivity would increase 30-50% in China and 40-60% in India. What India and China need in order to grow is hard, but simple: any increase in competition will have enormous benefits. Allocative efficiency is so poor that improving it is simple. So it is to my chagrin that along comes this paper to kick it right where it hurts: in the measurements.
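To make the "variance, not the average" point concrete, here is a toy calculation. It is not Hsieh and Klenow's model. It assumes two hypothetical firms that each produce `A * k**alpha`, a fixed stock of capital to divide between them, and made-up numbers throughout. The only thing that changes between the two scenarios is who holds the capital; neither firm's productivity moves.

```python
def total_output(productivities, capital, alpha=0.5):
    """Aggregate output when firm i produces A_i * k_i**alpha."""
    return sum(a * k ** alpha for a, k in zip(productivities, capital))


def efficient_allocation(productivities, total_capital, alpha=0.5):
    """Split capital so that marginal products are equal across firms.

    With output A_i * k_i**alpha, equal marginal products imply
    k_i proportional to A_i**(1 / (1 - alpha)).
    """
    weights = [a ** (1.0 / (1.0 - alpha)) for a in productivities]
    scale = total_capital / sum(weights)
    return [w * scale for w in weights]


# Hypothetical numbers: one firm is twice as productive as the other.
A = [1.0, 2.0]
K = 2.0

# "Misallocated" case: capital is split evenly regardless of productivity.
equal_split = [K / len(A)] * len(A)
misallocated = total_output(A, equal_split)

# Efficient case: same firms, same total capital, reallocated.
efficient = total_output(A, efficient_allocation(A, K))

gain = efficient / misallocated - 1.0
print(f"Output gain from reallocation alone: {gain:.1%}")
```

In this sketch the gain is a little over 5%, and it comes entirely from moving capital toward the better firm. The empirical estimates are far larger because the measured dispersion across real plants is far wider than a factor of two.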
I think it worthwhile to digress into how we measure productivity. It is not something one wishes to see (never can you regain the easy confidence in your statistics afterwards), but it is important. A statistical bureau, like the Census Bureau here in the United States, sends out a survey to all the firms it can find, asking them to provide a lot of information. The big one in the US is the Census of Manufactures, which is done every five years: 2002, 2007, 2012, and so on. It covers 300,000 manufacturing plants. The bureau needs to know how much you sell; how much you spend, broken down by input; how many people you employ; and so on. This requires quite a lot of “shoe leather”, because firms screw up. They enter the wrong values and leave things blank. The first check comes from asking questions whose answers overlap: perhaps the survey asks for the total value of all products bought, and also for the value of each product bought. If these sum to different numbers, the response can be flagged and adjusted. If a firm leaves some sections blank, the missing values can be imputed from the responses of firms with very similar characteristics. Analysts need to go out and check the outliers, and indeed they physically check in on some of the firms and plants at random. Finally, the bureau can check responses against tax data. Some of the discrepancy might be tax avoidance, but it seems more likely that some of it is error. Either way, it highlights which values are suspicious and need to be checked. The cleaning is far-reaching and substantial: 80% of manufacturing plants have a value in their cleaned data that differs from the raw data.
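Two of the cleaning steps described above, the overlapping-question check and imputation from similar firms, can be sketched in a few lines. This is an illustration only: the field names, the 1% tolerance, and the use of a within-industry median are all assumptions of mine, not the Census Bureau's actual procedure, and the three plants are invented.

```python
from statistics import median


def total_mismatch(record, tolerance=0.01):
    """True when itemized input costs do not add up to the reported total."""
    reported = record["total_input_cost"]
    if reported is None:
        return False  # nothing to compare against
    itemized = sum(record["input_costs"].values())
    return abs(itemized - reported) > tolerance * max(abs(reported), 1.0)


def impute_by_group(records, field, group_field):
    """Fill missing `field` values with the median of respondents in the same group.

    Returns new records; the originals are left untouched so that raw and
    cleaned data can still be compared.
    """
    observed = {}
    for r in records:
        if r[field] is not None:
            observed.setdefault(r[group_field], []).append(r[field])

    cleaned = []
    for r in records:
        filled = dict(r)
        if r[field] is None and r[group_field] in observed:
            filled[field] = median(observed[r[group_field]])
            filled["imputed"] = True  # keep a record of what was filled in
        cleaned.append(filled)
    return cleaned


# Invented survey responses.
plants = [
    {"id": 1, "industry": "textiles", "employees": 40,
     "total_input_cost": 100.0, "input_costs": {"materials": 70.0, "energy": 30.0}},
    {"id": 2, "industry": "textiles", "employees": None,
     "total_input_cost": 90.0, "input_costs": {"materials": 50.0, "energy": 20.0}},
    {"id": 3, "industry": "textiles", "employees": 60,
     "total_input_cost": 120.0, "input_costs": {"materials": 80.0, "energy": 40.0}},
]

flagged = [p["id"] for p in plants if total_mismatch(p)]
cleaned = impute_by_group(plants, "employees", "industry")
```

Here plant 2 is flagged because its itemized costs sum to 70 against a reported total of 90, and its blank employment figure is filled with the median of the other textile plants. Even in this tiny example, one plant in three ends up with a cleaned value that differs from what it submitted.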