Inferring Prehistory from Language Genealogy
This chart depicts the branching of the Uralic language family into
its subfamilies. The numbers at the bottom indicate the number of
living languages in each subfamily. The expansion of this family was
a gradual process: almost every forking was into just two parts.
Each intermediate branch denotes an independent prehistoric
language whose existence can be inferred; of these only Finno-Ugric
is labeled. The chart is not ``drawn to scale'': linguists can infer
that the Finno-Ugric stage probably lasted for thousands of years.
Several thousand years ago there was a single language, Uralic,
which has since diversified into 33 living descendants.
Several explanations might be supposed for this expansion:
- Territorial expansion. (Uralic speakers migrated to new
- Cultural expansion. (Non-Uralic speakers adopted parts of
Uralic culture, including its language.)
- Chance. (Ancient Uralic may itself have been part of a family
with 30-odd languages, most of which happened not to survive.)
A complete explanation may involve a combination of these reasons.
For Uralic, the key reason is probably yet something else:
- Reduced mobility. (The early Uralic speakers were probably
nomadic hunters, who maintained homogeneity over a large
territory. As they adopted a sedentary life-style, i.e. farming,
dialectal divergence would naturally occur on a smaller
The Austronesian language family, with 1227 living languages, is
much larger than Uralic, yet its overall family structure has fewer
major branchpoints. Here the branches aren't named, but the number
of languages and the geographic range of each are shown. (In some cases,
the geographic range may be broader than indicated. The subfamily
with 144 languages, called the Borneo branch of Western
Malayo-Polynesian, includes Malagasy, the national language of
Madagascar.) Of the four major branches of Austronesian, three are
located only in Formosa; the fourth, called Malayo-Polynesian,
comprises languages all the way from Madagascar to Easter Island.
Geographic barriers (the Malayo-Polynesian languages are spoken
on many different islands) are a major reason for the rapid
diversification of the family, but not the whole explanation. The
Philippines were populated before the Austronesian speakers arrived,
but their ancient languages have completely disappeared. The
Austronesian languages expanded rapidly because the Austronesian
culture suddenly developed the advantages of agriculture and ocean
navigation, and overwhelmed less advanced cultures.
The most interesting thing about this chart is that you can see
at a glance that the Austronesian homeland was almost certainly
Formosa. Otherwise it would be very unusual that of the four primary
branches of such a vast language family, three would be isolated
among the aboriginal languages of present-day Taiwan.
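The inference can be made mechanical. The sketch below is only an
illustration, not a method taken from the chart: for each region it
counts how many primary branches of a family are spoken there, and
proposes the region hosting the most primary branches as the homeland.
The branch labels and the sample regions listed for Malayo-Polynesian
are placeholders chosen for this example (the chart does not name the
branches), and the function name candidate_homeland is hypothetical.

```python
from collections import Counter


def candidate_homeland(primary_branches):
    """Return (region, count) for the region hosting the most primary branches.

    primary_branches maps a branch label to the regions where its
    languages are spoken. Ties are broken alphabetically so the result
    is deterministic.
    """
    counts = Counter()
    for regions in primary_branches.values():
        # Count each branch at most once per region.
        for region in set(regions):
            counts[region] += 1
    region = min(counts, key=lambda r: (-counts[r], r))
    return region, counts[region]


# Illustrative data only: three primary branches confined to Formosa,
# and a fourth (Malayo-Polynesian) spread from Madagascar to Easter Island.
austronesian = {
    "Formosan branch 1": ["Formosa"],
    "Formosan branch 2": ["Formosa"],
    "Formosan branch 3": ["Formosa"],
    "Malayo-Polynesian": ["Philippines", "Borneo", "Madagascar", "Easter Island"],
}

region, count = candidate_homeland(austronesian)
print(region, count)
```

The count ignores how many languages each branch contains. That is
deliberate: the argument rests on the number of primary branches in one
place, not on where most of the 1227 languages are spoken today.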
It seems exciting that the original source of the ancestors of
the Polynesian expansion can be inferred, not from archaeology, but
from simple examination of a language tree.
The Volta-Congo language family in Africa has as many languages
as Austronesian; and its chart shows a pattern very similar to that
of Austronesian (even though there are no ocean barriers to explain
the diversity). Again the explosive form of the language tree would
be explained by the rapid expansion of a superior culture, in this
case the Iron-Age Bantu farmers. And again a glance at the tree
would reveal the original homeland of the Bantu farmers: the valley
of the upper Benue River in present-day Cameroon.
There is another major type of language family tree that needs to be
mentioned. Unlike the Malayo-Polynesian expansion, or the Iron-Age
Bantu revolution, Australia probably experienced no rapid cultural
revolution until modern times. Yet its family tree doesn't resemble
the gradual diversification model of Uralic either. Instead the
Australian languages are all related, but it is unclear which of the
relationships are genetic and which result from borrowing. The chart
is intended to depict languages migrating and borrowing traits from
It can be surmised that the linguistic situation in Australia
developed roughly in three phases:
- Shortly after the arrival of modern man, about 40,000 years
ago, there was a small number of languages. Although earlier man
(H. erectus) may have survived and interbred with modern
man, his languages (if he had advanced speech) became extinct.
For this reason, the languages of early Australia probably
formed a single genetic family, with a tree structure similar to
that of Uralic.
- The languages diversified. After 15,000 or 20,000 years, the
underlying family tree would no longer be discernible, and the
languages would appear to form several independent families.
Since languages of one family would not necessarily be
contiguous geographically, words and grammatical forms would
diffuse between languages of different families. Thus the
linguistic situation in Australia 20,000 years ago resembled the
situation in South America 500 years ago.
- With the passage of time, the distinction between genetic and
areal links became blurred. Australian linguistics entered an
equilibrium, with the natural tendency to diverge in balance
with diffusion of features between languages. Linguistics in
Africa or Asia may have resembled this equilibrium during the
Early Stone Age, before the equilibrium was punctured by events
like discovery of agriculture which led to dominance by one
language family, as we saw with Uralic, or the sudden explosion
of a single culture, as we saw with Malayo-Polynesian.
Solving Ancient Mysteries
It should be fun to solve ancient riddles by studying such family
trees. Unfortunately, the most useful evidence is often missing or
controversial. After the examples of Polynesians and Bantus, perhaps
the most cited homeland discovery is that of the Algonquians, but
this is exciting only to specialists. (In Columbus' time Algonquian
languages were prominent along the North Atlantic coast, but the
language tree suggests they probably originated much farther west.)
Yet one of the most amazing stories of pre-history is hidden in
the most-studied language tree.
In Search of the Indo-Europeans
In Search of the Indo-Europeans is the title of a book by
Mallory in support of Marija Gimbutas' theory of Indo-European
origin. The early Indo-European speakers were the horse-riders of
the East European steppes (Ukraine and southern Russia) who invaded
Central Europe between 4500 and 2500 BC. There is much evidence
supporting the Gimbutas-Mallory theory, such as religious motifs
based on cattle and horses which are seen all the way from Ireland
to India, and the fact that the center of radiation for the
Indo-European languages is near Romania, right on the boundary
between the Eurasian steppes and the fertile breadbasket of Central
Europe. Nevertheless there is much reluctance to accept Gimbutas'
theory. Recently Nature magazine published a paper (though
not written by linguists) claiming to have proven that the
Indo-European break-up occurred at least 4000 years earlier than in
the Gimbutas theory.
Two major theories competing with the Gimbutas theory are the
Anatolian hypothesis and the Balkano-Danubian hypothesis. These
competing theories, however, can be ruled out by examining the
structure of the Indo-European language tree, as I now try to
Before trying to draw an inference from the structure of the family
tree, we must agree on the structure. For brevity, the charts to the
left show only six branches (or specimens of branches) of the
Indo-European family (Hittite, Italic, Greek, Armenian, Sanskrit,
Baltic), but even when ten branches are shown the usual structure
shown is as in (a) -- Indo-European suddenly exploded, just like
Malayo-Polynesian. Obviously, Indo-European was associated with a
major cultural revolution.
Indo-European has been very extensively studied; yet the experts
have never fully agreed on a substructure other than the sudden
split into ten branches. Nevertheless, there have been attempts to
identify a substructure. These attempts are confused by areal
borrowings. Thus in (b) we indicate similarities between Greek and
Armenian, between Armenian and Sanskrit, and between Sanskrit and
Baltic, some of which may be due to areal rather than genetic
One universally recognized distinction among Indo-European
languages is the split between Centum and Satem
languages. In the charts the Satem branches are shown in yellow. It
is generally agreed that the ancient Proto-Indo-European was a
Centum language; Satem arose from a K-->S sound change.
Professor Ringe and others have used computer software to
determine the detailed structure of the I-E family; their result is
shown in (c). The branches are seen splitting off from a single
core. The time between branchings must be small: if there were more
than a few centuries between the first branching and the last, the
structure would be clear and well-known, rather than controversial
and clarified only after special statistical analysis.
Finally in (d) we pretend that I-E has a normal gradual fanout,
like Uralic. We show Centum subfamilies in one branch and Satem
subfamilies in another. While the Centum-Satem split (along with an
initial split of Hittite from Indo-European proper) may be the
closest thing I-E has to a major branching, even it is not given
that role by most theorists. Some linguists accept an ``Armeno-Greek
hypothesis,'' that Armenian and Greek are particularly close
subfamilies, yet Armenian is Satem and Greek is Centum.
We will argue that the Anatolian and Balkano-Danubian hypotheses
require a definite branching structure, like (d), and are
incompatible with either a tree structure like (a), or one like (c).
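The contrast between an exploded tree like (a) and a branching tree
like (d) can be stated numerically. The sketch below is an
illustration under assumptions, not a reconstruction of the charts: a
tree is written as a nested list whose leaves are branch names, and two
simple measures are computed, the largest number of children at any
node and the number of branching levels. The exact shape given to
chart (d) here (Hittite splitting first, then a Centum group of Italic
and Greek and a Satem group of Armenian, Sanskrit and Baltic) is an
assumed reading of the description above, and the function names are
hypothetical.

```python
def max_fanout(tree):
    """Largest number of children at any node; a leaf is a plain string."""
    if isinstance(tree, str):
        return 0
    return max(len(tree), max(max_fanout(child) for child in tree))


def depth(tree):
    """Number of branching levels between the root and the deepest leaf."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree)


# Chart (a): one sudden split into all six branches shown.
chart_a = ["Hittite", "Italic", "Greek", "Armenian", "Sanskrit", "Baltic"]

# Chart (d), assumed layout: Hittite first, then Centum versus Satem.
chart_d = [
    "Hittite",
    [
        ["Italic", "Greek"],                # Centum
        ["Armenian", "Sanskrit", "Baltic"],  # Satem
    ],
]

for name, tree in (("a", chart_a), ("d", chart_d)):
    print(name, "max fanout:", max_fanout(tree), "depth:", depth(tree))
```

A high fanout with a depth of one is the signature of an explosion; a
low fanout spread over several levels is the signature of gradual
divergence.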
Gimbutas' theory involves successive waves of invasion from the
Ukrainian steppes to the Balkans or Central Europe. This is exactly
compatible with diagram (c): ``Kurgan wave 1'' (ca 4300 BC) led to
the splitoff of Hittite, ``Kurgan wave 2'' (ca 3600 BC) led to the
splitoff of Greek, and ``Kurgan wave 3'' (ca 3000 BC) led to the
final separation of the Satem subfamilies. This clearly locates the
change from Centum to Satem in time and space: it occurred in the Pit
Grave (Yamnaya) culture of the Pontic-Caspian steppes during the
middle of the 4th millennium BC.
In the Gimbutas theory the close affinity of Greek and Armenian
is no surprise: they each migrated southwest from the Kurgan
homeland in Scythia, but a few centuries apart, Greek just before
and Armenian just after the K-->S sound change.
Anatolian and Balkano-Danubian Hypotheses
Because the Indo-European family fanned out so rapidly (see chart
(a)) into language branches that show up from Ireland to India, one
thing certain is that it was associated with a major technological
or cultural change. An obvious candidate for such a change is the
arrival of agriculture and both the Anatolian and Balkano-Danubian
hypotheses are based on that. That the language explosion didn't
occur before the advent of farming is indicated by reconstructed
words for farming terms represented across many I-E branches.
(Actually there are also such reconstructed words for stockbreeding
terms which suggest the I-E explosion occurred even later.)
Agriculture was invented near the upper Euphrates River, crossed
Anatolia and arrived in Greece by 7500 BC and southern France by
6500 BC. In the Anatolian hypothesis, it was these farming colonies
that brought Indo-European language to Europe.
There was a delay, while Europe's climate warmed, before farming
moved North. In addition to different climate, the different soil
conditions required new techniques, like forest-clearing, crop
rotation and, probably, manure fertilization by herding animals.
From a gestation stage in the Balkans, an early pig/cow/cereal
culture arose in Eastern Europe by 6000 BC, and a similar farming
culture moved into the Danube Basin ca 5500 BC, reaching northern
France before 4500 BC.
In the Balkano-Danubian hypothesis, the languages of the southern
European farmers (possibly including Etruscan) have become extinct
and the language of farmers in the Balkans and Danube basin was
proto-Indo-European. We will focus on this hypothesis as less
implausible than the Anatolian hypothesis.
The Balkano-Danubian Hypothesis is similar to the Anatolian,
except that one doesn't bother to push the Homeland back past ca
5500 BC. The Danubian Linear Ware culture and affiliated Balkan
cultures like Tripolye-Cucuteni spoke Indo-European in both these
hypotheses. Eventually Indo-European was spoken by the Kurgan people
of southern Russia, but Balkano-Danubists may differ as to whether
the language was adopted in the early 6th millennium (Bug-Dniester
pig-breeders lending language to D-Donetz horse-breeders) or early
4th millennium (Tripolye-Cucuteni lending language to Sredny Stog/Pit
Gimbutist Kurgan Theory
Kurganists would agree that the demic (farming) migrations of the
8th and 7th millennia BC probably led to a major linguistic thrust,
from the eastern Mediterranean to southern and then central Europe,
but the languages (call them Old European), though they were
likely spoken by the Lengyel, Tripolye and Cucuteni cultures, have not survived,
having been overwhelmed by westward thrusts from the Eurasian
The westward thrusts include:
- Kurgan wave 1 ca 4500-4000 BC (incl. poss. Ezero)
- Kurgan wave 2 ca 3800-3400 BC (incl. Globular Amphora, Baden, Usatovo
- Kurgan wave 3 ca 3000-2500 BC
Indo-European's westward thrusts (and later those of
Phrygian/Thracians, Scythians, Huns, Magyars, Turks, Mongols) fit a
historical pattern: from the broad steppes of Central Eurasia,
mounted armies can overwhelm Europe's breadbasket. Invasions never
go the other way: it would be like trying to push water the wrong
way through a funnel.
The only pre-Indo-European language to survive in Europe is
Basque, supposed to descend from mesolithic (Solutrean) people.
The Gimbutas Theory fits the facts like a glove. For every
migration needed to explain the arrival of an I-E branch at an
appropriate place and time, there is archaeological evidence of just
such a migration. One doesn't have to guess when the early Greek
speakers left the Kurgan homeland: one sees cultures like Usatovo in
Romania that share cultural motifs with both Kurgans and the
earliest Greeks, and occur at just the right time to fit Gimbutas'
Kurgan Wave 2.
For competing theories, the appropriate migrations and
intermediate cultures are missing: this means different versions of
the Balkanist hypothesis will vary greatly in detail. Let's start by
identifying a few facts that are agreed by all serious theorists.
- The Indo-Iranians were present in southern Russia, with a
Kurgan-type culture, during the 3rd millennium BC and moved south
in two major waves: first the Indo-Aryans, later the Iranians.
(Thus the Balkanists agree that at least some Kurgans eventually
spoke Indo-European; the debate is over how many branches of I-E
descend from the Kurgans.)
- The Corded Ware culture of Northern Europe spoke Balto-Slavic
and proto-Germanic. The Bell Beaker culture in the Danube Basin
and further west also spoke Indo-European languages including
proto-Celtic. (Thus there is agreement about where major
Indo-European branches existed in the 3rd millennium BC. The
debate is about I-E travel in earlier millennia.)
- The Carpathian Mountains are a geographic feature around which
migrations pivot. To their north is the North European Plain, to
the south is the fertile Central European basin, to the east the
East European steppes. The expansion of the I-E language family
involved cultural transfers or migrations along the Black Sea to
the east of the Mountains and, to a lesser extent, across the
Plains to the north of the Mountains. (This much is agreed by
both Gimbutas and the Balkanists. The difference is that
Gimbutas sees the transfers as strictly east-to-west, while
Balkanists require language transfer in the opposite direction.)
Regarding this last point it should be noted that historic
invasions from the steppes into Central Europe are almost too
numerous to list: the Scythians, the Slavs, the Huns, the Magyars,
the Mongols, the Turks, but there is no example of an invasion in
the opposite direction. (Napoleon tried it after the invention of
artillery but was defeated.) Moreover, there is much uncontroversial
evidence of prehistoric invasions by Kurgans into the North European
Plain, and to the West of the Carpathian Mountains, but no evidence
of intrusion from the West into the Kurgan homeland.
Defenders of the Anatolian and Balkanist hypotheses base their
case on three ideas:
- A unified Indo-European language ca 4500 BC is too late to
explain the diversity among I-E languages.
- The language explosion must have accompanied a powerful
technological revolution. If that revolution was neither the
Anatolian farming revolution ca 7500 BC, nor the Danubian
farming revolution ca 5500 BC, then what was it? Not the
``Secondary Products Revolution'' (use of copper, wagons, farm
animals) ca 4500 BC since these ideas were quickly shared by
disparate cultures. Not the Bronze or Iron revolutions which
were too late to explain I-E expansion.
- The homogeneous Danubian culture dominated much of Central and
Western Europe from 5500 BC to 4000 BC. Surely its language
would survive in some form. If that language was not
Indo-European, then how did it disappear so completely?
The first point is not one most linguists would take seriously.
If, as Gimbutas maintains, I-E overwhelmed a non-IE speaking Europe,
it would undergo faster change than an I-E surrounded by I-E
speakers. Language changed faster in prehistoric times, before
liturgical and written language acted as a brake on change.
The second point underestimates the horse-riding Kurgan
culture, with its military superiority, dominating social and
religious motifs (even Balkanists have to admit that India was
overwhelmed quickly, with the caste system and Hindu religion due to
Indo-European intruders), and much greater stress on individual
initiative compared with the farming villages of the Danubian
The third point ignores that language replacement does happen.
Celtic dominated West and Central Europe at the time of Caesar, yet
disappeared from the Continent completely. (The Breton language is
not a Continental language, but the result of back migration from
Britain.) Anyway, if the Balkanists assume that Italic and Celtic
subfamilies diverged soon after the Danubian expansion, good for
them! As we will soon see, they will be ``hoist with their own petard!''
We mentioned earlier that Balkanists must assume that the Kurgans
adopted Indo-European at some point. When did this happen? Different
variations of the Balkan hypothesis can put this date as early as
6000 BC (the earliest East European farmers supplied I-E to the
earliest Kurgans) or as late as 3000 BC (Kurgans finally adopted the
European lingua franca just in time for the Indo-Iranian expansion).
However none of the possible dates will be compatible with the I-E
family tree structure (chart (a) or (c) above).
Because the Danubian farmers and Kurgan stockbreeders had
completely different cultures and were isolated from each other, any
theory that has both cultures speaking I-E before 4500 BC would
require a clear division of I-E into 3 to 6 branches (in addition to
Danubian, Kurgan and presumably Tocharian, one would postulate a few
SE European branches to explain Hittite and perhaps Greek). The I-E
tree does not have that character. If this isn't clear, figure out
where Armenian and Greek would fit, remembering their close affinity
and the close affinity of Armenian and Indo-Iranian. You may end up
with a structure where I-E is all Kurgan except Hittite, Celtic and
Italic, yet that doesn't match linguistic evidence.
Therefore the Balkanists need to assume that Kurgans adopted I-E
after 4500 BC, after the I-E breakup was in progress. A powerful
culture might adopt an alien lingua franca but the new language
would surely be transformed greatly, preserving an old Kurgan
substrate. Again there will be a clear distinction between the
Kurgan and non-Kurgan branches of I-E (that is, something like chart
(d)), and again this would not match the linguistic evidence. (Nor
the cultural evidence, as Celtic preserves Kurgan horse-riding and
horse-worship motifs, but must be a non-Kurgan language in any
variation of the Balkanist hypothesis.)
Playing with the Balkanist hypothesis to make it fit the tree,
one inevitably concludes that most of the I-E fanout occurred near
Romania and Bulgaria during the Copper Age and early Bronze Age
(during this era, the relatively homogeneous Balkano-Danubist
culture of that area was replaced with a variety of new cultures).
This is essentially the same time and place as the Gimbutas theory
(thereby forfeiting the main raison d'être of the Balkanist theory:
to give I-E an earlier more Westerly Homeland). The difference is
that in Gimbutas' theory all I-E branches are due to Kurgan
intrusion while the Balkanists contend that I-E was already spoken,
that Kurgan invaders adopted the Balkan language. This might make
sense if they didn't need to fit Indo-Iranian in. And what about
Balto-Slavic? It is very close to Indo-Iranian; was it also Kurgan?
We have not yet mentioned Tocharian, the exotic branch of I-E
located in China which is now extinct. This language occupies a
position similar to Italic or Celtic in the I-E tree, and its
culture shares motifs with Celtic. It poses no problem for Gimbutas
(part of `Kurgan Wave 2' went East instead of West) but cannot be
handled in any reasonable way by the Balkanists. Geographically it
belongs with the Kurgan branch of I-E but that doesn't fit
linguistic evidence. The Balkanists need to suppose an obscure very
early migration from Central Europe to Asia, with no linguistic
interaction with East Europe. Although Celtic and Italic were almost
adjacent until historic times, with Tocharian many thousands of
miles away, Tocharian, Celtic and Italic would be co-equal branches
of what might be called ``Western I-E'' (there is only weak support
for the so-called Italo-Celtic hypothesis).
Sherlock Holmes once said ``When you have eliminated the impossible,
whatever remains, however improbable, must be the truth.''
However bizarre it may seem, the Kurgan horse-riders are indeed the
source of the Indo-European languages now spoken all around the
by James D. Allen