Research on typographical errors in dictionaries


Research on typographical errors in twenty-two Spanish specialised bilingual paper dictionaries (Ariel and Gestión 2000). An overview

Academic and professional background

My name is Santiago Rodríguez-Rubio. I live in Seville (southern Spain). I have a bachelor’s degree in English Philology, a three-year undergraduate degree in Tourism, and a master’s degree in Translation and Interpretation.

I work as an English-Spanish, French-Spanish translator and interpreter. My fields of expertise are: tourism and gastronomy, travel, adventure, natural world, finance and accounting, car restoration shows. I have worked as a translator and liaison interpreter (both in-house and freelance) in several Spanish engineering companies having subsidiaries abroad. Those engineering companies operate in the following fields: wastewater and drinking water treatment plants, power generation (electric power, solar power, and wind power).

I have always been greatly interested in formal correctness, both in written and spoken language. I suppose this tendency comes from my taste for reading, and from a mental framework built upon the satisfaction you get from work well done.

My research on typographical errors in specialised bilingual dictionaries (materials)

Since 2016, I have been conducting research on typographical errors in a corpus of twenty-two Spanish specialised bilingual paper dictionaries. The corpus is around 15,000 pages long, and the works are mainly bidirectional English-Spanish. The dictionaries were published by two Spanish publishing houses belonging to the “Grupo Planeta” media group, namely “Ariel” and “Gestión 2000”.

The dictionaries were chosen according to their academic standing and the prestige of the publisher. Fourteen of those twenty-two works make up the so-called “Alicante Dictionaries” (Mateo 2018, in Fuertes-Olivera, The Routledge Handbook of Lexicography), including works on international trade, international taxation, banking, insurance, stock market, advertising and marketing, tourism, footwear industry, etc. These fourteen dictionaries were published by Ariel. They are linked to the IPA (“Professional and Academic English”) research group, from the University of Alicante, as well as to IULMA (“Institute of Applied Modern Languages”), from the Community of Valencia (eastern Spain). Some of those fourteen dictionaries are landmarks in Spanish specialised bilingual p-lexicography and English for Specific Purposes academia. The first two dictionaries of the Alicante series were Diccionario de Términos Jurídicos/A Dictionary of Legal Terms (Alcaraz and Hughes, 1993), and Diccionario de Términos Económicos, Financieros y Comerciales/A Dictionary of Economic, Financial, and Commercial Terms (Alcaraz and Hughes, 1996). These two works were last re-edited in 2012 and reprinted in 2014.

Origin and development of my research

My research originated by chance. In my translation work, I used (and continue to use) the two above-mentioned dictionaries on a regular basis. It was not long before I began to spot different typographical errors. At first, I supposed that it would be a matter of just a few occasional errors, but very soon I noticed there was a sort of “model of errors”, as errors appeared pervasively in all positions of the text (lemma, definition, illustration, cross-reference, top of page, etc.), and some errors were repeated in different places within the same section or in another section of the works. As my research progressed, I found that not only were some errors repeated, but some of them were repeated in similar sentences, or even in the same sentences. I also found that some errors were repeated not only within a particular dictionary, but also across different works. For instance, a particular mistyped term (variants not included) appeared eleven times in three different dictionaries. Another mistyped term (variants not included) appeared sixteen times in three different dictionaries. A mistyped term such as accomodation (or the variants accomodations, acommodation, accomodate, accomodated) appeared more than forty times in eight different dictionaries.

The corpus of twenty-two dictionaries was analysed in its entirety, page by page, in a homogeneous way, that is, using the same error detection criteria. Apart from the bodies (“English-Spanish”/“Spanish-English”), the rest of the hyperstructure of the works was also analysed (i.e. cover pages, forewords, introductions, etc.). Given the length of the corpus, different sub-corpora were established for analysis and comparison purposes. More precisely, the establishment of sub-corpora or subgroups of dictionaries allowed us to describe the mechanisms through which errors were repeated or reproduced in related works. Related dictionaries were established according to different criteria: works having the same authors, works belonging to the same fields of study, or works belonging to the same editorial sub-collection (e.g. The Dictionary of Terms of the Natural Stone and Allied Industries and Dictionary of Terms of the Footwear and Allied Industries make up the series “Dictionaries of Industrial Terms” by Ariel, and, therefore, they form a relevant sub-corpus).

Typographical errors were classified in categories and subcategories, such as “non-word errors” (letter omission, insertion, substitution, or transposition), and “real-word errors” (word omission, insertion, substitution, or transposition). Within each category or subcategory, errors were grouped according to repetition or similarity criteria. This allowed us to establish the intratextual error repetition rate for each dictionary, from which a number of relevant conclusions were drawn.
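The four letter-level subcategories of non-word errors can be illustrated with a short Python sketch. This is not the procedure used in my research (the corpus was analysed manually, page by page); it is a minimal illustration that assumes each typo differs from its intended form by a single edit. The function name classify_single_edit is hypothetical, and the pair recieve/receive is a generic example, not an error taken from the corpus.

```python
def classify_single_edit(typo: str, intended: str) -> str:
    """Label a typo as omission, insertion, substitution or transposition.

    Assumes the typo differs from the intended word by one edit only;
    anything more complex is returned as "other".
    """
    if typo == intended:
        return "no error"

    # Skip the common prefix to find where the two forms start to differ.
    start = 0
    while (start < min(len(typo), len(intended))
           and typo[start] == intended[start]):
        start += 1

    # Skip the common suffix, without crossing the prefix boundary.
    end_t, end_i = len(typo), len(intended)
    while (end_t > start and end_i > start
           and typo[end_t - 1] == intended[end_i - 1]):
        end_t -= 1
        end_i -= 1

    diff_typo = typo[start:end_t]          # what was actually typed
    diff_intended = intended[start:end_i]  # what should have been typed

    if diff_typo == "" and len(diff_intended) == 1:
        return "omission"
    if diff_intended == "" and len(diff_typo) == 1:
        return "insertion"
    if len(diff_typo) == 1 and len(diff_intended) == 1:
        return "substitution"
    if len(diff_typo) == 2 and diff_typo == diff_intended[::-1]:
        return "transposition"
    return "other"


examples = [
    ("accomodation", "accommodation"),          # omission
    ("alucionaciones", "alucinaciones"),        # insertion
    ("hypnosys", "hypnosis"),                   # substitution
    ("recieve", "receive"),                     # transposition (generic example)
    ("spectrotophometer", "spectrophotometer"), # other (more than one edit)
]
for typo, intended in examples:
    print(typo, "->", classify_single_edit(typo, intended))
```

A term such as spectrotophometer falls outside the single-edit assumption, which is one reason why a manual classification remains necessary for part of the data.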

Roughly speaking, two types of similar errors were distinguished. On the one hand, similar errors featuring the same underlying term (e.g. terapeutics, therapeutc, therpeutically, therapeutic; or spectrotophometer, spectophotometry). On the other hand, similar errors featuring different underlying terms (e.g. objetive, subjetive). In the latter case, there is a relationship of antonymy between the terms involved, but other mistyped terms feature other relationships, such as word structure (e.g. accomplisment, establihment; or markmanship, guardianshp, censorhip, sponshorship), or the fact that they belong to the same semantic field (e.g. terminatation, liquidatation, exhaustation).

Among the most striking real-word errors are those resulting in the substitution of one word for another. Two types of word substitution were described in my research: intralingual substitution (e.g. gastronomy for gastrostomy), and interlingual substitution (e.g. dispense as griten for dispense as written). Some of those substitution errors are likely to have been generated automatically. In fact, the data I compiled may have an application in the area of machine learning for spell-checkers, as the latter could learn from the set of errors detected, provided that the correct terms are also entered alongside them, as in a parallel corpus.
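The idea of pairing each detected error with its correct form can be sketched in Python as follows. This is only an assumed, simplified illustration of how such paired data might feed a spell-checker: it is a frequency lookup table, not a machine learning model, and the names KNOWN_PAIRS, build_correction_table and suggest are hypothetical. The pairs are taken from the examples mentioned in this overview.

```python
from collections import Counter, defaultdict

# (mistyped form, intended form) pairs, as in a small parallel corpus.
KNOWN_PAIRS = [
    ("accomodation", "accommodation"),
    ("accomodation", "accommodation"),  # repeated errors add weight
    ("hypnosys", "hypnosis"),
    ("iniciatives", "initiatives"),
    ("infectiones", "infections"),
    ("inestability", "instability"),
]


def build_correction_table(pairs):
    """Count how often each mistyped form maps to each intended form."""
    table = defaultdict(Counter)
    for typo, intended in pairs:
        table[typo.lower()][intended.lower()] += 1
    return table


def suggest(word, table):
    """Return the most frequent known correction, or None if unknown."""
    candidates = table.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


table = build_correction_table(KNOWN_PAIRS)
print(suggest("Accomodation", table))
print(suggest("hypnosis", table))
```

A lookup of this kind can only flag non-word errors that have already been recorded. Real-word errors such as gastronomy for gastrostomy are valid words in themselves, so detecting them would require taking the surrounding context into account.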


Sequences of errors are worth mentioning from a qualitative perspective. Sometimes, several errors were found in a particular sentence or in a particular entry. For instance, in the entry “hypnosis (… es una estado similar al sueño… inducida… en el que pueden surgir… alucionaciones… ◊… a full range of options from aspirin hypnosys)”, four errors were found: two examples of gender disagreement (“una estado… inducida” for “un estado… inducido”), an insertion error probably caused by psychomotor anticipation of segment “cio” (“alucionaciones” for “alucinaciones”), and a substitution error probably caused by psychomotor perseveration of letter “y” (hypnosys for hypnosis).

Interlingual interference is another psycholinguistic aspect of the compiled errors that could be studied in the future. Thus, a mistyped term such as iniciatives (for initiatives) could have resulted from the interference of the Spanish counterpart “iniciativas”. Similarly, infectiones (for infections) could have derived from “infecciones”, inestability (for instability) from “inestabilidad”, fragance and fragant (for fragrance and fragrant) from “fragancia” and “fragante”, and so on.

Follow-up. Communication with the publisher and the authors of the dictionaries

I would like to clarify that there is no professional or academic relationship between the publishing houses or the authors and me. I am conducting independent research, within the framework of a doctoral programme at the Pablo de Olavide University (Seville, Spain). In any case, I contacted the managing editor of Ariel and the director of IULMA, as well as the authors. I gave all of them thorough information about my research, and, needless to say, I offered my collaboration in correcting the errors found in their dictionaries.

I would recommend that the more relevant dictionaries under study be corrected. “Zero errors” is simply unattainable in such complex works. Any lexicographer would tell you so. It is also a matter of common sense. Still, I think a reference work should feature a very high degree of correctness. Besides, what I found is not a mere accumulation of errors, but something I have referred to as a “model” or “paradigm” of errors. In A Terminological Dictionary of the Pharmaceutical Sciences (Domínguez-Gil, Alcaraz and Martínez, 2007), I found (among many other errors belonging to other categories that were also registered) nearly seven hundred non-word errors and real-word errors; in other words, one error belonging to any of those subcategories every 1.58 pages. Now, one error every 1.58 pages in a text 1.5 pages long would mean just one error; however, the same frequency of errors in a work 1,000 pages long (such as the dictionary involved) reveals a sort of “error consistency”.
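The arithmetic behind this comparison can be made explicit with a small Python sketch. It simply divides a page count by the reported rate of one error every 1.58 pages; the function name expected_errors is hypothetical, and the page counts are the round figures used in the paragraph above, so the results are approximations rather than the exact counts registered in my research.

```python
def expected_errors(pages: float, pages_per_error: float = 1.58) -> float:
    """Estimate the number of errors for a text of the given length.

    Assumes errors are spread at a constant rate of one error
    every `pages_per_error` pages.
    """
    if pages_per_error <= 0:
        raise ValueError("pages_per_error must be positive")
    return pages / pages_per_error


# A short text of 1.5 pages versus a work of about 1,000 pages.
for pages in (1.5, 1000):
    print(f"{pages} pages: about {expected_errors(pages):.1f} errors")
```

At the same rate, a 1.5-page text yields roughly one error, whereas a work of around 1,000 pages yields several hundred, which is what the notion of “error consistency” refers to.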

See details about the errors found in A Terminological Dictionary of the Pharmaceutical Sciences in my paper “A quantitative analysis of typographical errors in A Terminological Dictionary of the Pharmaceutical Sciences, English-Spanish/Español-Inglés (Ariel, 2007)”, published in Panace@, 2018.


In the corpus of twenty-two dictionaries, I found (among many other errors that were also registered) more than 4,000 non-word errors and real-word errors, including a wide range of grammatical errors (e.g. gender disagreement, number disagreement, wrong verb tense, etc.). See the titles and years of publication of those twenty-two works in the Appendix to my paper.


“The Alicante Dictionaries” make up a relevant collection of specialised bilingual paper dictionaries, not only in Spain, but also at an international level (see “The Alicante Dictionaries”, in Fuertes-Olivera 2018, The Routledge Handbook of Lexicography). Based on our findings in terms of frequency of errors and error repetition rate, we would recommend that the more prominent works of the corpus be revised, including A Terminological Dictionary of the Pharmaceutical Sciences (Ariel, 2007).

Our research is unprecedented, and we believe it provides a clear added value. It could definitely contribute to expanding knowledge on typographical errors in dictionaries, from a two-fold perspective: error generation and error detection/correction. More specifically, valuable insights are offered regarding mechanisms of repetition or reproduction of errors in related dictionaries.

As previously mentioned, our data could also be applied to Natural Language Processing (NLP), more precisely, to the field of machine learning for spell-checkers.


Written by Santiago Rodríguez-Rubio, translator (Spain)