Alternatively, check out Figure 1 of Jean et al. (2016) for a picture of censuses on the African continent. For further reading, Morten Jerven has published an entire book about the incomplete and unreliable statistics in the African context.
That means that researchers and policymakers have had to rely on economic modelling to predict poverty, so accuracy has depended heavily on the models’ assumptions. This is where machine learning promises to be the next big thing, thanks to two recent developments. First, the amount of data from developing countries has increased dramatically thanks to cheap high-resolution satellite imagery and growing mobile phone penetration. Second, computing power has increased massively, making it possible to mine very large datasets for patterns. Building on these advances, researchers have started to use big data and machine learning to predict income levels in developing countries.
The algorithm by Jean et al. (2016), for instance, learned to recognize features such as metal roofs or paved roads and associate these with higher incomes. As another example, Burke and Lobell’s (2017) algorithm managed to recognize and assess the yields of maize fields in western Kenya. Blumenstock et al. (2015) used mobile phone records to accurately predict wealth and asset distributions in Rwanda.
But how exactly did these algorithms learn that certain features in the data are good predictors of poverty or agricultural productivity?
The answer lies in a second dataset, the training data, which the algorithm treats as the ground truth it is trying to reproduce using only satellite images or mobile phone records. Jean et al. (2016) fed the algorithm night-time satellite images (nocturnal light is known to be a good predictor of income) so it could look for features in daytime images that correlated with more light at night, and thereby with relative affluence. Burke and Lobell (2017) and Blumenstock et al. (2015) had a more tedious task: their studies administered on-the-ground surveys asking households about their farm yields and wealth, respectively. The survey answers were then passed to the algorithm as the information to be predicted using only the satellite photos of maize fields and the mobile phone records, respectively. All three papers managed to predict poverty or agricultural yields pretty well, and further applications of machine learning to measuring socioeconomic outcomes are springing up everywhere.
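To make the idea of learning from ground-truth labels concrete, here is a minimal sketch in Python. It is not the method of any of the three papers: the data are synthetic, the five "features" stand in for whatever might be extracted from images or phone records, and the names `fit_ridge`, `r_squared`, `features` and `wealth` are made up for illustration. The model is a plain ridge regression, chosen only because it fits in a few lines.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weights.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (an assumption, not real survey results):
# 500 villages, 5 features each, plus a noisy "surveyed wealth" label.
rng = np.random.default_rng(0)
n_villages, n_features = 500, 5
features = rng.normal(size=(n_villages, n_features))
true_w = np.array([1.5, 0.8, 0.0, -0.5, 0.3])
wealth = features @ true_w + rng.normal(scale=0.5, size=n_villages)

# Train on the villages that were "surveyed", hold out the rest.
train, test = slice(0, 400), slice(400, None)
w, b = fit_ridge(features[train], wealth[train])

# Predict wealth for held-out villages from their features alone.
pred = features[test] @ w + b
score = r_squared(wealth[test], pred)
print(f"held-out R^2: {score:.2f}")
```

The held-out villages play the role of places where no survey exists: the model never sees their labels during fitting, so the score on them is an honest check of how well features alone recover the surveyed values.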
But that doesn’t mean policymakers can put away the census. Three major limitations to using machine learning to predict socioeconomic outcomes arise, all of which relate to the training data.