This space will contain information on the process, principles and source texts involved in generating a linguistically tagged corpus of the earliest writings in Mapudungun, spanning 1606 to 1930.
A Corpus of Historical Mapudungun is the main proposed outcome of a Leverhulme Early Career Fellowship awarded to Benjamin Molineaux at the University of Edinburgh's Angus McIntosh Centre for Historical Linguistics and due to run between April 2018 and March 2021.
Mapudungun is the ancestral language of the Mapuche people of south-central Chile and Argentina. Today Mapudungun is spoken mostly in pockets of Chile's 8th, 9th, 10th and 14th regions by an estimated 250,000 speakers. In Argentina, numbers of self-reportedly competent speakers are around 8,400. In both countries, monolingualism is vanishingly rare, with the range of interlocutors and topics for the use of the language having grown progressively smaller.
The genetic affiliation of Mapudungun is uncertain. A number of claims have been made, ranging from relation to near-neighbours to the north – such as Quechua, Aymara and Pano-Tacanan – and to the south – Kawésqar, Yaghan and Chon (Tierra del Fuego, now extinct) – as well as membership in more distant families such as Arawakan, Mayan, or Aztec and Uto-Aztecan. With strong evidence lacking for any of these theories, the language is often presumed to be an isolate. From a regional-typological perspective, however, it can be grouped with other Andean-type languages with agglutinating, strictly ordered morphology, with a special affinity towards Quechua and Aymara, as far as the tendency for suffixation goes.
Mapudungun is considered a polysynthetic, agglutinating, head-marking language. Insofar as these categories can be considered useful, their fundamental locus of instantiation in the language is the verb, which displays – aside from intricate (obligatory and optional) inflectional morphology – a wide array of derivational and compounding processes. This richness of verbal morphology is in stark contrast with the noun, where, barring compounding (which is highly productive), morphological structure is markedly sparse, displaying no gender and practically no case or obligatory number marking.
The Historical Record of Mapudungun
The first formal description of the language, by Jesuit Luys de Valdivia, was published in 1606 and held that ‘no other language than this runs down from the city of Coquimbo and its surroundings to the island of Chiloé and beyond, and from the foot of the great snow-covered mountain-range to the sea’ (‘To the reader’). In the same text, Mapudungun is claimed to be mostly homogeneous, with some regional variation in vocabulary, though ‘the precepts and rules of this art are general for all the provinces’. Whether or not Valdivia’s assessment was correct, the past 410 years have seen drastic changes both in the language’s geographic distribution and range of use. The CHM aims to describe the variation within the historical record by transcribing and coding a large proportion of the written record for the language. Most of the earliest material comes from Christian missionaries, though there are also texts from explorers and military men, as well as, later on, a few texts with more explicitly academic and cultural aims.
This space currently contains a list of relevant documents representing data for historical Mapudungun (1606-1930). It will eventually contain brief descriptions (metadata) for the texts, as well as links to the image-based PDF files. As Optical Character Recognition (OCR) outputs are hand-checked, plain-text and web-based transcriptions of the materials are being published alongside the image-based PDFs.
Note that some of the original works are only partially transcribed, as the focus is on those parts where Mapudungun language samples are present, rather than those parts where broader descriptions of language or culture are given in Spanish or any other non-Mapudungun language.
Core CHM texts
- 1606 L. de Valdivia Arte y Gramatica [PDF] [Web (Doctrina – Imperial)] [Web (Doctrina - Santiago)]
- 1621 L. de Valdivia Sermon en Lengua de Chile [PDF] [Web]
- 1643 E. Herckmans Vocabula Chilensia [PDF] [Web]
- 1765 A. Febrés Arte de la Lengua General del Reyno de Chile [PDF]
- 1777 B. Havestadt Chilidúǵu [PDF] [Web (Vocabulary)] [Web (Indiculus Universalis)]
- 1897 R. Lenz Estudios Araucanos [PDF]
- 1903 F. de Augusta Gramática Araucana [PDF] [Web]
- 1903 F. de Augusta Compendio de historia sagrada (Nidollke dəŋu Dios ñi nùtram) [PDF] [Web]
- 1910 F. de Augusta Lecturas Araucanas [PDF] [Web]
- 1911 T. Guevara Folklore Araucano [PDF] [Web]
- 1911 M. Manquilef Comentarios del Pueblo Araucano [PDF]
- 1913 T. Guevara Últimas Familias [PDF] [Web]
- 1922 F. de Augusta Pismahuile [PDF] [Web]
- 1930 E.W. de Moesbach Pascual Coña [PDF] [Web]
Potential (bonus) CHM texts
- 1774 Th. Falkner A description of Patagonia
- 1778 S. Orbanel Doctrina Cristiana
- 1843 A. Hernández Calzada Confesionario por Preguntas
- 1863 G. Cox Viaje a la Patagonia
- 1876 Savino Pequeño manual del misionero para evangelizar a los indios fronterizos
- 1877 F. Barbará Manual o vocabulario lengua Pampa
- 1879 Birot/Schuller Pequeño catecismo Castellano-Indio (Araucano)
- 1888 L. Darapsky La Lengua Araucana
- 1898 R. de la Grasserie Grammaire Langue Auca
- 1899 I. Cañiumir Parlamento Imaginario
- 1899 R. Lenz Manual de Piedad
- 1901 Ch. Sadleir and A. Paillalef Ngünechen ñi neyüntükumuyümchi chillka kiñeke trokin
- 1902 F. de Augusta Dios ñi dəŋu
- 1906 Ch. Sadleir and A. Paillalef Maleupan antü ta tfa! ñi pelomtuam ta pichi ke che
- 1910 A. Cañas Estudios en Veliche
- 1913 M. Manquilef A la raza araucana
- 1914 M. Manquilef Comentarios del Pueblo Araucano II
- 1918 Ch. Sadleir and A. Paillalef San Juan ñi chillkantukuelchi we küme dungu
- 1918 Ch. Sadleir and A. Paillalef San Marcos ñi chillkantukuelchi we küme dungu
- 1919 Ch. Sadleir and A. Paillalef Tfeichi adniel ta puliwen ka ta nagantü ngillatun meu
- 1925 F. de Augusta kiñéwn amuaiyu Vade Mecum!
As the objective of the corpus is to provide a view into the synchrony and diachrony of lexical, morphological and phonological features, texts are being parsed at all three of these levels. The first stage of this process --- lemmatisation --- identifies the key root-elements, as well as the part-of-speech (POS) category for each word, providing a single identifiable label and reducing both morphological and spelling heterogeneity (see 1).
|a.||<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kudekefuingu</w>|
|b.||<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kuthekalape</w>|
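As a rough illustration of how lemma tags of this kind reduce spelling and morphological heterogeneity, the following Python sketch groups the two word forms from (1) under their shared lemma. It is not the project's own tooling: the `<text>` wrapper element and the function name `forms_by_lemma` are assumptions introduced only for this example, and only the attributes shown in (1) are used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two <w> elements from example (1), wrapped in a hypothetical <text> root
# so that they form a single well-formed XML document.
SAMPLE = """<text>
<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kudekefuingu</w>
<w xml:lang="arn" lemma="kuden" pos="V" corresp="play">kuthekalape</w>
</text>"""


def forms_by_lemma(xml_string):
    """Map each (lemma, POS) pair to the list of attested surface forms."""
    root = ET.fromstring(xml_string)
    index = defaultdict(list)
    for w in root.iter("w"):
        key = (w.get("lemma"), w.get("pos"))
        index[key].append((w.text or "").strip())
    return dict(index)


lemma_index = forms_by_lemma(SAMPLE)
for (lemma, pos), forms in lemma_index.items():
    print(lemma, pos, forms)
```

Both spellings, with their different suffix chains, end up under the single label `kuden` (V), which is what allows searches by lemma to ignore orthographic variation between texts.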
The second stage is morphological parsing, which identifies individual morphemes beyond the root and labels them according to function (as in 2). The result of both these processes is a TEI-standard XML text with the relevant tags embedded. A full 10% of the total word-types in the corpus texts has been tagged in this way, and a machine-learning algorithm is being developed to tag the remainder of the material both at the level of the lemma and the morpheme. Additional hand corrections will be necessary in order to complete the process.
|a.||<w> <m baseForm="kude" type="root" corresp="play">kude</m><m baseForm="ke" type="habit">ke</m><m baseForm="fu" type="BI">fu</m><m baseForm="ingu" type="ind.3.d">ingu</m></w>|
|b.||<w> <m baseForm="kude" type="root" corresp="play">kuthe</m><m baseForm="ka" type="cont">ka</m><m baseForm="la" type="neg">la</m><m baseForm="pe" type="inp.3.sg">pe</m></w>|
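To show how morpheme-level tags of this shape might be read back out, here is a minimal Python sketch that turns the `<m>` elements of (2b) into an interlinear-style gloss. The function name `gloss` and the choice to use `corresp` for roots and `type` for all other morphemes are assumptions made for illustration, not a description of the corpus software.

```python
import xml.etree.ElementTree as ET

# Example (2b), copied from the text above.
SAMPLE_WORD = (
    '<w> <m baseForm="kude" type="root" corresp="play">kuthe</m>'
    '<m baseForm="ka" type="cont">ka</m>'
    '<m baseForm="la" type="neg">la</m>'
    '<m baseForm="pe" type="inp.3.sg">pe</m></w>'
)


def gloss(word_xml):
    """Return (surface, normalised, labels) strings for one tagged word."""
    word = ET.fromstring(word_xml)
    morphs = word.findall("m")
    # Spelling as attested in the source text.
    surface = "-".join(m.text for m in morphs)
    # Normalised morpheme shapes taken from the baseForm attribute.
    normalised = "-".join(m.get("baseForm") for m in morphs)
    # Roots are glossed by meaning (corresp), other morphemes by function (type).
    labels = "-".join(
        m.get("corresp") if m.get("type") == "root" else m.get("type")
        for m in morphs
    )
    return surface, normalised, labels


surface, normalised, labels = gloss(SAMPLE_WORD)
print(surface)
print(normalised)
print(labels)
```

Keeping the attested spelling in the element text and the normalised shape in `baseForm` means that the same record can serve both orthographic and morphological queries.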
The final stage of the tagging will be grapho-phonological parsing (cf. Kopaczyk et al. 2018), which entails providing sound values for each word (as in 3), following a list of spelling-based rules for each text. The results should effectively reconstruct the phonic structure of each text, such that it can be compared with others from different periods and locations, helping to map phonological change from the bottom up.
|a.||<m> <c ipa="v">v</c><c ipa="ɨ">ú</c><c ipa="t">t</c><c ipa="a">a</c></m>|
|b.||<m> <c ipa="f">f</c><c ipa="ɨ">ü</c><c ipa="t">t</c><c ipa="a">a</c></m>|
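The comparison that this level of tagging is meant to support can be sketched in Python as follows. The snippet reads the `<c>` elements from (3a) and (3b), rebuilds the spelling and the sound-value string for each, and lists the positions where the sound values differ. The helper names are hypothetical, and the position-by-position comparison assumes that both forms have the same number of segments, which holds for this pair but would need a proper alignment step in general.

```python
import xml.etree.ElementTree as ET

# Examples (3a) and (3b), copied from the text above.
FORM_A = '<m> <c ipa="v">v</c><c ipa="ɨ">ú</c><c ipa="t">t</c><c ipa="a">a</c></m>'
FORM_B = '<m> <c ipa="f">f</c><c ipa="ɨ">ü</c><c ipa="t">t</c><c ipa="a">a</c></m>'


def read_segments(m_xml):
    """Return a list of (grapheme, sound value) pairs for one <m> element."""
    morph = ET.fromstring(m_xml)
    return [(c.text, c.get("ipa")) for c in morph.iter("c")]


def spelling(segments):
    return "".join(grapheme for grapheme, _ in segments)


def transcription(segments):
    return "".join(sound for _, sound in segments)


def sound_differences(segs_a, segs_b):
    """List (position, sound_a, sound_b) wherever the sound values differ.

    Assumes both forms are already aligned segment by segment.
    """
    return [
        (i, sound_a, sound_b)
        for i, ((_, sound_a), (_, sound_b)) in enumerate(zip(segs_a, segs_b))
        if sound_a != sound_b
    ]


segs_a = read_segments(FORM_A)
segs_b = read_segments(FORM_B)
print(spelling(segs_a), transcription(segs_a))
print(spelling(segs_b), transcription(segs_b))
print(sound_differences(segs_a, segs_b))
```

In this pair, the two spellings of the vowel map to the same sound value, so the only difference reported is the initial consonant.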
The front end of the corpus --- soon to be available in beta form --- will provide search options (in both English and Spanish) at all three levels of tagging (word, morpheme and sound), as well as allowing users to correlate these features across texts and with relevant non-linguistic metadata such as date, location, author, genre, etc. A simpler browser version will also be available for non-linguists, allowing for texts to be browsed and downloaded with parallel translations.
Towards a New World philology
The careful transcription and tagging of the historical material that is proposed for this project follows a long tradition at the Angus McIntosh Centre, working mostly on early English and Scots. The importance of these methods, however, was not lost on some of the more prominent Mapudungun scholars, as evidenced in the following passage by Dr. Rudolf Lenz, who conducted some of the earliest explicitly academic studies of the language:
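Aun el lenguaje vulgar que no tiene ninguna lengua literaria al lado puede ser una cosa mucho ménos determinada de lo que comunmente se cree, i no siempre podrá justificarse que en la edicion filolójica de un testo de siglos pasados se uniforme la ortografia del autor en todos los casos. Cuando la ortografía vacila en lenguas que se escriben poco, esto puede espresar el empleo de diferentes pronunciaciones en una misma palabra, o puede tener la causa de que ninguna de las diferentes maneras de escribir corresponda bien a la pronunciacion. Mucho mas raro será que el autor se haya simplemente equivocada al escribir lo que pronunciaba.
(In English: ‘Even a vernacular language that has no literary language alongside it may be a much less fixed thing than is commonly believed, and it cannot always be justified that, in the philological edition of a text from past centuries, the author's spelling be made uniform in all cases. When spelling wavers in languages that are seldom written, this may reflect the use of different pronunciations of one and the same word, or it may be because none of the different ways of writing corresponds well to the pronunciation. Much more rarely will the author simply have made a mistake in writing down what he pronounced.’)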
Rudolf Lenz (1897:132)