According to Idibon, a company that makes language processing applications,
these are the weirdest languages on different continents:
In North America: Chalcatongo Mixtec, Choctaw, Mesa Grande Diegueño,
Kutenai, and Zoque; in South America: Paumarí and Trumai; in
Australia/Oceania: Pitjantjatjara and Lavukaleve; in Africa: Harar
Oromo, Iraqw, Kongo, Mumuye, Ju|’hoan, and Khoekhoe; in Asia: Nenets,
Eastern Armenian, Abkhaz, Ladakhi, and Mandarin; and in Europe: German,
Dutch, Norwegian, Czech, and Spanish.
But is weirdness relative? Maybe the World Atlas of Language Structures
provides a source for objective evaluation. Here's what Idibon did with
it:
For each value that a language has, we
calculate the relative frequency of that value for all the other
languages that are coded for it. So if we had included
subject-object-verb order then English would’ve gotten a value of 0.355
(we actually normalized these values according to the overall entropy for
each feature, so it wasn’t exactly 0.355, but you get the idea). The
Weirdness Index is then an average across the 21 unique structural
features. But because different features have different numbers of
values and we want to reduce skewing, we actually take the harmonic mean
(and because we want bigger numbers = more weird, we actually subtract
the mean from one). In this blog post, I’ll only report languages that
have a value filled in for at least two-thirds of features (239
languages).
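
To make the recipe concrete, here is a minimal Python sketch of the calculation as I read it from the quote. It is not Idibon's code. The function name `weirdness_index` and the toy data are made up for illustration, the entropy normalization is left out because the quote doesn't say how it was done, and each frequency is counted over every language coded for the feature (including the language being scored), which is a simplification of "all the other languages".

```python
from collections import Counter


def weirdness_index(languages):
    """Score each language as 1 minus the harmonic mean of its value frequencies.

    `languages` maps a language name to a dict of {feature: value};
    features a language is not coded for are simply absent.
    Assumption: no entropy normalization is applied (unspecified in the quote).
    """
    all_features = {f for feats in languages.values() for f in feats}

    # For every feature, count how many coded languages have each value.
    counts = {f: Counter() for f in all_features}
    for feats in languages.values():
        for feature, value in feats.items():
            counts[feature][value] += 1

    scores = {}
    for name, feats in languages.items():
        # Keep only languages coded for at least two-thirds of the features
        # (integer comparison avoids floating-point rounding at the boundary).
        if 3 * len(feats) < 2 * len(all_features):
            continue
        # Relative frequency of this language's value for each feature.
        freqs = [counts[f][v] / sum(counts[f].values()) for f, v in feats.items()]
        # Harmonic mean of the frequencies, subtracted from one so that
        # rarer values give a bigger (weirder) score.
        harmonic_mean = len(freqs) / sum(1 / p for p in freqs)
        scores[name] = 1 - harmonic_mean
    return scores


# Hypothetical toy data: four invented languages, two features.
toy = {
    "A": {"word_order": "SVO", "tone": "no"},
    "B": {"word_order": "SVO", "tone": "no"},
    "C": {"word_order": "SOV", "tone": "no"},
    "D": {"word_order": "VSO", "tone": "yes"},
}
scores = weirdness_index(toy)
```

In this toy set, D has the rarest value for both features, so it comes out weirdest; A and B share the most common values and come out least weird. The harmonic mean is pulled down hard by any single rare value, which is why one unusual feature can move a language's score a lot.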