Linguistic Distance

Given a family tree of languages, the (tree-based) linguistic distance between two countries is defined as the expected normalized tree distance between the languages spoken by two individuals randomly drawn from the populations of those two countries. The cognate-based linguistic proximity is defined analogously, as the expected lexical similarity between the languages spoken by two individuals randomly drawn from the populations of those two countries.
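
As an illustration of the definition above, the following minimal sketch computes an expected pairwise distance from language shares. It assumes that one individual is drawn from each country, that each country's language shares sum to 1, and that a normalized distance between any two distinct languages is available. The function name, language labels, shares, and distances are all hypothetical and are not taken from the dataset or its methodology.

```python
def expected_distance(shares_a, shares_b, dist):
    """Expected distance between the languages of two random individuals,
    one drawn from country A and one from country B (assumed reading)."""
    total = 0.0
    for lang_a, share_a in shares_a.items():
        for lang_b, share_b in shares_b.items():
            # Identical languages are at distance 0; otherwise look up the pair.
            if lang_a == lang_b:
                d = 0.0
            else:
                d = dist[frozenset((lang_a, lang_b))]
            # Weight each language pair by the probability of drawing it.
            total += share_a * share_b * d
    return total


# Hypothetical inputs: population shares by language and one pairwise distance.
shares_x = {"lang1": 0.8, "lang2": 0.2}
shares_y = {"lang2": 1.0}
dist = {frozenset(("lang1", "lang2")): 0.5}

result = expected_distance(shares_x, shares_y, dist)
print(result)
```

The same weighting would apply to the cognate-based proximity if lexical similarity were substituted for tree distance, with identical languages assigned the maximum similarity instead of 0.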

The linguistic distance data can be downloaded here:

Stata (.dta)
Comma-Separated Values (.csv)

Dataset Format:

Data type: cross-sectional.
Number of languages processed: 6,737.
Number of countries covered: 242.
Number of dyads: 58,564.
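
The dyad count equals 242 × 242, which is consistent with the data containing every ordered pair of countries, including each country paired with itself. That reading is inferred from the arithmetic and is not stated on this page. A quick check:

```python
from itertools import product

n_countries = 242

# Count all ordered country pairs, self-pairs included (assumed dyad definition).
n_dyads = sum(1 for _ in product(range(n_countries), repeat=2))
print(n_dyads)
```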

Variables included:

Linguistic Distance, tree-based (users-weighted)
Linguistic Proximity, cognates-based, CogNet (users-weighted)
Linguistic Proximity, cognates-based, GLED (users-weighted)
Common Official Language (dummy variable)

Methodology
