Corpus of Historical Mapudungun

by Benjamin Molineaux
Prolegomena to the

Corpus of Historical Mapudungun

1606-1930

Welcome to the documentation page for the development of the Corpus of Historical Mapudungun (CHM).

This space contains information on the process, principles and progress behind generating a linguistically tagged corpus of the earliest writings in Mapudungun, spanning 1606 to 1930.

A Corpus of Historical Mapudungun is the main proposed outcome of a Leverhulme Early Career Fellowship awarded to Benjamin Molineaux at the University of Edinburgh's Angus McIntosh Centre for Historical Linguistics, which ran from April 2018 to March 2021.

About Mapudungun

Mapudungun is the ancestral language of the Mapuche people of south-central Chile and Argentina. Today Mapudungun is spoken mostly in pockets of Chile's 8th, 9th, 10th and 14th regions by an estimated 250,000 speakers. In Argentina, numbers of self-reportedly competent speakers are around 8,400. In both countries, monolingualism is vanishingly rare, with the range of interlocutors and topics for the use of the language having grown progressively smaller.

The genetic affiliation of Mapudungun is uncertain. A number of claims have been made, ranging from relation to near-neighbours to the north – such as Quechua, Aymara and Pano-Tacanan – and to the south – Kawésqar, Yaghan and Chon (Tierra del Fuego, now extinct) – as well as membership in more distant families such as Arawakan, Mayan, or Aztec and Uto-Aztecan. With strong evidence lacking for any of these theories, the language is often presumed to be an isolate. From a regional-typological perspective, however, it can be grouped with other Andean-type languages with agglutinating, strictly ordered morphology, with a special affinity towards Quechua and Aymara, as far as the tendency for suffixation goes.

Mapudungun is considered a polysynthetic, agglutinating, head-marking language. Insofar as these categories can be considered useful, their fundamental locus of instantiation in the language is the verb, which displays – aside from intricate (obligatory and optional) inflectional morphology – a wide array of derivational and compounding processes. This richness of verbal morphology is in stark contrast with the noun, where, barring compounding (which is highly productive), morphological structure is markedly sparse, displaying no gender and practically no case or obligatory number marking.

The Historical Record of Mapudungun

The first formal description of the language, by the Jesuit Luys de Valdivia, was published in 1606 and held that ‘no other language than this runs down from the city of Coquimbo and its surroundings to the island of Chiloé and beyond, and from the foot of the great snow-covered mountain-range to the sea’ (‘To the reader’). In the same text, Mapudungun is claimed to be mostly homogeneous, with some regional variation in vocabulary, though ‘the precepts and rules of this art are general for all the provinces’. Whether or not Valdivia’s assessment was correct, the past 410 years have seen drastic changes both in the language’s geographic distribution and in its range of use. The CHM aims to describe the variation within the historical record by transcribing and coding a large proportion of the written record for the language.

Source Texts

Most of the earliest material comes from Christian missionaries, though there are also texts from explorers and military men, as well as, later on, a few texts with more explicitly academic and cultural aims. Access to the source materials – both those already added and those yet to be transcribed/tagged for the CHM – can be found here (English) or here (Spanish).

Note that some of the original works are only partially transcribed, as the focus is on those parts where Mapudungun language samples are present, rather than those parts where broader descriptions of language or culture are given in Spanish or any other non-Mapudungun language.

Core CHM texts

1606 L. de Valdivia Arte y Gramatica [PDF] [Web (Doctrina – Imperial)] [Web (Doctrina - Santiago)]
1621 L. de Valdivia Sermon en Lengua de Chile [PDF] [Web]
1643 E. Herckmans Vocabula Chilensia [PDF] [Web]
1765 A. Febrés Arte de la Lengua General del Reyno de Chile [PDF]
1777 B. Havestadt Chilidúǵu [PDF] [Web (Vocabulary)] [Web (Indiculus Universalis)]
1897 R. Lenz Estudios Araucanos [PDF]
1903 F. de Augusta Gramática Araucana [PDF] [Web]
1903 F. de Augusta Compendio de historia sagrada (Nidollke dəŋu Dios ñi nùtram) [PDF] [Web]
1910 F. de Augusta Lecturas Araucanas [PDF] [Web]
1911 T. Guevara Folklore Araucano [PDF] [Web]
1911 M. Manquilef Comentarios del Pueblo Araucano [PDF]
1913 T. Guevara Últimas Familias [PDF] [Web]
1922 F. de Augusta Pismahuile [PDF] [Web]
1930 E.W. de Moesbach Pascual Coña [PDF] [Web]

Potential (bonus) CHM texts

1774 Th. Falkner A description of Patagonia
1778 S. Orbanel Doctrina Cristiana
1843 A. Hernández Calzada Confesionario por Preguntas
1863 G. Cox Viaje a la Patagonia
1876 Savino Pequeño manual del misionero para evangelizar a los indios fronterizos
1877 F. Barbará Manual o vocabulario lengua Pampa
1879 Birot/Schuller Pequeño catecismo Castellano-Indio (Araucano)
1888 L. Darapsky La Lengua Araucana
1898 R. de la Grasserie Grammaire Langue Auca
1899 I. Cañiumir Parlamento Imaginario
1899 R. Lenz Manual de Piedad
1901 Ch. Sadleir and A. Paillalef Ngünechen ñi neyüntükumuyümchi chillka kiñeke trokin
1902 F. de Augusta Dios ñi dəŋu
1906 Ch. Sadleir and A. Paillalef Maleupan antü ta tfa! ñi pelomtuam ta pichi ke che
1910 A. Cañas Estudios en Veliche
1913 M. Manquilef A la raza araucana
1914 M. Manquilef Comentarios del Pueblo Araucano II
1918 Ch. Sadleir and A. Paillalef San Juan ñi chillkantukuelchi we küme dungu
1918 Ch. Sadleir and A. Paillalef San Marcos ñi chillkantukuelchi we küme dungu
1919 Ch. Sadleir and A. Paillalef Tfeichi adniel ta puliwen ka ta nagantü ngillatun meu
1925 F. de Augusta kiñéwn amuaiyu Vade Mecum!

Corpus encoding

As the objective of the corpus is to provide a view into the synchrony and diachrony of lexical, morphological and phonological features, texts are parsed at all three of these levels. The first stage of this process, lemmatisation, identifies the key root-elements, as well as the part-of-speech (POS) category for each word, providing a single identifiable label and reducing both morphological and spelling heterogeneity (see 1).

(1)

	Form	Lemma	Transl.	POS
a.	kude-kefuingu	kuden	'to play'	V
b.	kuthe-kalape	kuden	'to play'	V

XML

a.	<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kudekefuingu</w>
b.	<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kuthekalape</w>
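
As an illustration of how the lemma tags in (1) collapse spelling and morphological variants, the following is a minimal Python sketch that groups word forms by lemma and POS. It is not the project's own tooling: the wrapping `<s>` element, the absence of a TEI namespace declaration and the function name `forms_by_lemma` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two <w> elements are copied from (1); the <s> wrapper is an
# assumption added only so that the fragment has a single root element.
SAMPLE = """<s>
<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kudekefuingu</w>
<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kuthekalape</w>
</s>"""


def forms_by_lemma(xml_text):
    """Map each (lemma, POS) pair to the attested word forms, in order."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for w in root.iter("w"):
        grouped[(w.get("lemma"), w.get("pos"))].append(w.text)
    return dict(grouped)


result = forms_by_lemma(SAMPLE)
print(result)
```

Both spellings, kude- and kuthe-, end up under the single key `("kuden", "V")`, which is the reduction of heterogeneity that lemmatisation is meant to provide.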

The second stage is morphological parsing, which identifies individual morphemes beyond the root and labels them according to function (as in 2). The result of both these processes is a TEI-standard XML text with the relevant tags embedded.

(2)

	Form	Morphemes
a.	kude-ke-fu-ingu	ROOT(play)-HABIT-BROKEN.IMPLICATURE-IND.3.DUAL
b.	kuthe-ka-la-pe	ROOT(play)-CONT-NEG-IMP.3.SG

XML

a.	<w> <m baseForm="kude" type="root" corresp="play">kude</m><m baseForm="ke" type="habit">ke</m><m baseForm="fu" type="BI">fu</m><m baseForm="ingu" type="ind.3.d">ingu</m></w>
b.	<w> <m baseForm="kude" type="root" corresp="play">kuthe</m><m baseForm="ka" type="cont">ka</m><m baseForm="la" type="neg">la</m><m baseForm="pe" type="imp.3.sg">pe</m></w>
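
The morpheme-level tags can be read back into an interlinear line of the kind shown in (2). The sketch below, in Python, does this for example (2a); the function name `interlinear` is hypothetical, and the gloss is built directly from the `type` attribute, so it shows the abbreviated labels used in the XML (e.g. `BI`) rather than the expanded ones in the table.

```python
import xml.etree.ElementTree as ET

# Word (2a), copied from the XML above.
SAMPLE_WORD = (
    '<w> <m baseForm="kude" type="root" corresp="play">kude</m>'
    '<m baseForm="ke" type="habit">ke</m>'
    '<m baseForm="fu" type="BI">fu</m>'
    '<m baseForm="ingu" type="ind.3.d">ingu</m></w>'
)


def interlinear(word_xml):
    """Return (segmented form, gloss line) for one morphologically tagged word."""
    word = ET.fromstring(word_xml)
    forms, glosses = [], []
    for m in word.iter("m"):
        forms.append(m.text)
        if m.get("type") == "root":
            # Roots carry their translation in the corresp attribute.
            glosses.append(f"ROOT({m.get('corresp')})")
        else:
            glosses.append(m.get("type").upper())
    return "-".join(forms), "-".join(glosses)


form, gloss = interlinear(SAMPLE_WORD)
print(form)
print(gloss)
```

Because `baseForm` is stored separately from the attested spelling, the same approach could return normalised morphemes instead of surface ones by joining `m.get("baseForm")` rather than `m.text`.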

The final stage of the tagging is grapho-phonological parsing (cf. Kopaczyk et al. 2018), which entails providing sound values for each word (as in 3), following a list of spelling-based rules for each text. The results should effectively reconstruct the phonic structure of each text, such that it can be compared with others from different periods and locations, helping to map phonological variation and change from the bottom up.

(3)

	Form	Sound	Lemma	Transl.	Source	Dialect
a.	<vúta>	[vɨta]	fücha	'old/big'	Valdivia 1606	North
b.	<fücha>	[fɨʧa]	fücha	'old/big'	Augusta 1916	Central/Coast

XML

a.	<m> <c ipa="v">v</c><c ipa="ɨ">ú</c><c ipa="t">t</c><c ipa="a">a</c></m>
b.	<m> <c ipa="f">f</c><c ipa="ɨ">ü</c><c ipa="ʧ">ch</c><c ipa="a">a</c></m>
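
A per-text list of spelling-based rules can be thought of as a table from graphemes to sound values, applied to each form from left to right. The following Python sketch shows one way this could work, under stated assumptions: the rule table contains only the four correspondences visible in (3a), the longest-match strategy and the names `VALDIVIA_1606_RULES`, `segment` and `to_xml` are illustrative choices, and none of it should be taken as the project's actual rule set or implementation.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical rule table: only the correspondences shown in (3a).
VALDIVIA_1606_RULES = {"v": "v", "ú": "ɨ", "t": "t", "a": "a"}


def segment(form, rules):
    """Split a written form into (grapheme, sound) pairs, longest match first."""
    # Compose accented letters so that "ú" is one character, as in the rules.
    form = unicodedata.normalize("NFC", form)
    graphemes = sorted(rules, key=len, reverse=True)
    pairs, i = [], 0
    while i < len(form):
        for g in graphemes:
            if form.startswith(g, i):
                pairs.append((g, rules[g]))
                i += len(g)
                break
        else:
            # No rule covers this character: fail loudly rather than guess.
            raise ValueError(f"no rule for {form[i]!r} in {form!r}")
    return pairs


def to_xml(form, rules):
    """Serialise the segmentation as an <m> element containing <c> children."""
    m = ET.Element("m")
    for grapheme, sound in segment(form, rules):
        c = ET.SubElement(m, "c", ipa=sound)
        c.text = grapheme
    return ET.tostring(m, encoding="unicode")


ipa = "".join(sound for _, sound in segment("vúta", VALDIVIA_1606_RULES))
xml_out = to_xml("vúta", VALDIVIA_1606_RULES)
print(ipa)
print(xml_out)
```

Trying longer graphemes first means that a digraph would be matched as one unit if a text's rule table included one, and raising an error on unmatched characters makes gaps in a rule list visible instead of silently dropping material.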

The front end of the corpus provides search options (in both English and Spanish) at all three levels of tagging (word, morpheme and sound), as well as allowing users to correlate these features across texts and with relevant non-linguistic metadata such as date, location, author, genre, etc. Tagged versions of the texts with popup linguistic coding will also be available, allowing for texts to be browsed and downloaded with parallel translations.

Towards a New World philology

The careful transcription and tagging of the historical material that is proposed for this project follows a long tradition at the Angus McIntosh Centre, working mostly on early English and Scots. The importance of these methods, however, was not lost on some of the more prominent Mapudungun scholars, as evidenced in the following passage by Dr. Rudolf Lenz, who conducted some of the earliest explicitly academic studies of the language:

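Aun el lenguaje vulgar que no tiene ninguna lengua literaria al lado puede ser una cosa mucho ménos determinada de lo que comunmente se cree, i no siempre podrá justificarse que en la edicion filolójica de un testo de siglos pasados se uniforme la ortografia del autor en todos los casos. Cuando la ortografía vacila en lenguas que se escriben poco, esto puede espresar el empleo de diferentes pronunciaciones en una misma palabra, o puede tener la causa de que ninguna de las diferentes maneras de escribir corresponda bien a la pronunciacion. Mucho mas raro será que el autor se haya simplemente equivocada al escribir lo que pronunciaba.

[Even a vernacular that has no literary language alongside it may be far less fixed than is commonly believed, and it cannot always be justified that, in the philological edition of a text from past centuries, the author's spelling is made uniform in every case. When spelling wavers in languages that are seldom written, this may reflect the use of different pronunciations of the same word, or it may be that none of the different ways of writing corresponds well to the pronunciation. It will be much rarer for the author simply to have made a mistake in writing down what he pronounced.]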

Rudolf Lenz (1897:132)
