Frequently Asked Questions

Q: Why a new database of cultural/geographic/linguistic distances?

A: There are several reasons why we decided to update these variables. First, new and better data are now available to construct these measures. Second, fundamental changes have occurred over time that warrant updates; this applies in particular to the geographic measures (e.g., changes in country borders and country codes). Finally, we developed better variable definitions and methodologies that improve both the measurement accuracy and the interpretability of the resulting variables.

Q: What are the differences between these variables and the previous ones in Spolaore and Wacziarg (2016)?

A: The new data improves on the old one along four dimensions:

Q: What is the updated citation for this data?

A: Pellegrino, Bruno, Enrico Spolaore, and Romain Wacziarg. "Barriers to Global Capital Allocation." NBER Working Paper, 2021.

Q: How big of a deal is this, really? Is it just about having a handful more observations?

A: The table below compares the number of underlying data points processed to produce each measure in the previous version of the dataset and in the new one. While each observation in the final data is a country pair, it results from aggregating data at a more granular level: for cultural distance, the count is the number of questions times country pairs times survey waves; for geographic distance, it is the number of city pairs; for linguistic distance, it is the number of language pairs (for the cognate measures, the number of language pairs times the number of concepts). Measured this way, the new database contains orders of magnitude more information than the old one.
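To illustrate why these granular counts grow so quickly, here is a minimal Python sketch of the counting rules described above. It assumes that distances are symmetric, so only unordered pairs are counted (n choose 2), and that every pair among the given items is used; the function names and all input values are hypothetical and are not taken from the actual dataset.

```python
from math import comb


def unordered_pairs(n: int) -> int:
    """Number of unordered pairs among n items (assumes symmetric distances)."""
    return comb(n, 2)


def cultural_points(n_questions: int, n_countries: int, n_waves: int) -> int:
    # Questions x country pairs x survey waves.
    return n_questions * unordered_pairs(n_countries) * n_waves


def geographic_points(n_cities: int) -> int:
    # Number of city pairs (assumption: all unordered pairs of cities are used).
    return unordered_pairs(n_cities)


def cognate_points(n_languages: int, n_concepts: int) -> int:
    # Language pairs x concepts.
    return unordered_pairs(n_languages) * n_concepts


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(cultural_points(n_questions=100, n_countries=80, n_waves=7))
    print(cognate_points(n_languages=50, n_concepts=40))
    # Doubling the number of items roughly quadruples the number of pairs.
    print(unordered_pairs(160) / unordered_pairs(80))
```

Because the pair count grows with the square of the number of items, and is then multiplied by questions, waves, or concepts, modest increases in coverage translate into much larger increases in the number of underlying data points.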

Data Comparison