Interview with Cristina Valentini

Cristina Valentini

Cristina Valentini is Head of the Terminology Unit, Support Section, PCT Translation Service, at the World Intellectual Property Organization (WIPO), in Geneva. She has been working at WIPO since 2010 and her main tasks involve the design and development of terminology resources and guidelines, and terminology workflow management. In 2009 she received a Ph.D. in Languages, Cultures, and Intercultural Communication from the University of Bologna, Italy.

From 2002 to 2010, Cristina Valentini also worked as a researcher at the University of Bologna in topics such as terminology and simultaneous interpreting, occupational health and safety terminology, harmonization of data categories in multilingual and multidisciplinary terminology databases, multimedia corpora design, application of corpus linguistics methods to audiovisual translation research, and child language brokering.

1. You have a PhD in Multimedia Corpora and Audiovisual Translation from the University of Bologna. Why terminology?

My first field of research was terminology. It all started in the early 2000s when I was graduating from the Advanced School of Modern Languages of the University of Bologna at Forlì (formerly SSLMIT, now DIT – Department of Interpretation and Translation). Back then, I was required to choose an innovative subject of investigation for my Master’s dissertation in conference interpreting and found that little had been written on terminology and interpreting, and particularly on the terminology needs of simultaneous interpreters in the booth.

I decided to run a world survey via AIIC on this particular aspect of the profession. As a case study, I chose to work in the field of health and safety at work, a particularly hot topic at the time as a result of the incorporation of some European Directives into Italian law. This second part of the work eventually provided the basis for the development of “EOHS Term”, a multilingual database project funded by the Italian National Institute for Occupational Safety and Prevention (ISPESL).

After my graduation, I spent 5 months working as a trainee in the Terminology Group of the Translation Service of the European Commission in Luxembourg in the pre-IATE era. Subsequently, I continued doing research in the field of terminology at the University of Bologna focusing in particular on the standardization of data categories to harmonise terminology collections compiled by students in different fields. Meanwhile, my supervisor, Prof. Marcello Soffritti, had started working on a new project, the development of a multimedia corpus for conducting empirical research in the field of audio-visual translation (FORLIXT 1), which eventually became the topic of my doctoral thesis. The study of innovative research methods in the analysis of audio-visual texts has proved very useful for my work in terminology at WIPO. I am thinking in particular of the need to define corpus building criteria and classification principles for annotating linguistic data, two topics that are closely related to methods of term extraction and classification of concepts in terminology collections.

2. The section of WIPO dealing with patents is called the Patent Cooperation Treaty (PCT) Division. The PCT system has 10 official languages – Arabic, Chinese, English, French, German, Japanese, Korean, Portuguese, Russian, and Spanish. What are the specific challenges for terminology in this multilingual setting?

WIPO administers many IP Treaties, amongst which is the Patent Cooperation Treaty (PCT). The PCT system provides for a unified procedure for filing patent applications internationally, and for improving access to patent applications worldwide, achieved by publishing international applications and translations of titles, abstracts, and drawings of such applications in PATENTSCOPE, the WIPO patent database. These translations are made available in English and French from the ten PCT publication languages, namely the six UN official languages (Arabic, Chinese, English, French, Russian and Spanish) and German, Japanese, Korean, and Portuguese. Translations of search and patentability reports are also published, in English.

Compiling multilingual terminology in this particular setting presents a twofold challenge that involves, on the one hand, language and, on the other, concept coverage. The ten official languages each need to be adequately represented from a numerical point of view, with the ideal objective being to provide a designation in each language for each concept; however, this is not easy to achieve. In addition, and more generally, whilst it is not uncommon to meet professional linguists (translators or terminologists) who have certain European PCT languages in their language combination (e.g. German, English, French, Spanish), it is more difficult to find language experts skilled in language combinations such as, for example, Chinese-Spanish, Korean-Arabic, Japanese-Russian who could assess equivalence of designations between such languages. Different academic approaches to the study of languages and translation, as well as geographical specificities, may pose additional problems: for instance, terminology is not a traditional component of translation and interpreting curricula in some Asian countries; Arabic tends to show a relatively low level of standardization of technical terms compared to other languages due to the existence of concurrent regional varieties. The situation is such that in many cases English becomes the pivot language for assessing cross-linguistic equivalence in our termbase, regardless of the fact that the concept may have originated in a different language.

The second challenge is related to the broad range of subject fields relevant for patents and the availability of literature in such fields in the different languages. Creating a database of interest for patents means including concepts from virtually any field in which human activities can develop. This is already a tremendous challenge, which is complicated further by the need to find evidence for terms in ten different languages. A correlation often exists between a language and a subject field, with repercussions on the documentation available for term extraction. Each language can be more or less associated with specific fields of activity, depending on the vitality of the businesses and economy at a certain point in time of the country or countries in which the language is spoken, and this is reflected in the numbers of patents filed nationally and internationally. In this context, English is currently the acknowledged lingua franca of science and technology (researchers are encouraged to publish their papers in English worldwide), and this can limit even more the resources available in the various languages and the development of specific terminologies.

Finally, it is worth remembering that, unlike other sectors of WIPO and other international organizations, in the PCT we mainly extract terminology from documents that we do not originally draft and which, because of their very nature, may include terms whose identification can be very demanding.

3. A patent is both a technical and legal text. How would you characterise the terminology of this text type?

Patents are complex text types. The rights that they confer are territorial and consequently patent language may be construed differently according to the jurisdiction. Moreover, a patent is composed of different sections (i.e. abstract, claims, description or specification) in which the legal and technical nature of patents is evidenced to a different extent, with the description normally being closer to technical and scientific texts, and claims conversely more similar to legal texts. In addition, there is a tension in patent language between the need for clarity and the need for protection. As such, patent drafters do use precise and accurate terminology in order for experts to understand and reproduce the invention, but they also often resort to vague and descriptive terminology as they seek to broaden the scope of protection for their invention. In fact, patent drafting manuals often recommend that authors become their own lexicographers and define their terms so that they convey exactly the meaning intended.

In this context, it is obvious that the lexicalization of concepts and the choice of determiners and qualifiers in a designation is never random or neutral. Patents are characterized by a high degree of terminological variation, both conceptual and denominative: new concepts are often established only in order for the document to comply with the specific legal function sought; established terms may be given new meanings and accepted synonymy challenged. Whilst some of these terms may be regarded as “true neologisms”, reflecting the science-in-the-making process, others are only valid and accurate within the scope of the patent application in which they appear. All these considerations can make the extraction of scientific and technical terms from the patent literature and termbase compilation a very challenging endeavour. Conversely, patents offer the added value of containing definitions which – provided they are carefully checked – can be used to enhance terminology compilation.

4. WIPO Pearl is WIPO’s multilingual terminology portal, giving access to scientific and technical terms derived from patent documents. Could you tell us a bit more about this free multilingual terminology database?

WIPO Pearl constitutes the first attempt to make freely available a multilingual terminology database of scientific and technical terms extracted from patents in 10 languages. The portal currently includes data from the PCT Termbase, the terminology database being developed in the PCT Translation Service of WIPO.

The main challenge in designing WIPO Pearl was to develop an interface that would give a wide range of users direct access to patent terminology, and to use terminology as a relay point for searching other databases such as PATENTSCOPE. Our aim was to make key information immediately visible. Thus, in the traditional Linguistic Search, the view of the hit list results includes indication of the subject field and subfield, preferred term, synonyms/abbreviations and, where relevant, term usage. Definitions in the context field are a mouse-click away. Term reliability is clearly indicated by coloured flags, and the user can filter the results upstream or downstream, as desired. If fuller terminological information is desired for the record, this can be accessed in another tab.

Moreover, searched terms can be displayed innovatively in clusters in the Concept Map View, or searched and displayed directly via the Concept Map Search Interface, which offers a graphical representation of semantic relations existing between concepts in a specific subfield.

Hence, for the first time in an institutional terminology portal, WIPO Pearl combines traditional linguistic search with an ontology browsing option. Taken together, these features allow WIPO Pearl to be regarded more as a knowledge tool than a traditional terminology portal.

Concept-orientation, multidisciplinarity and multilingualism are the features that make this resource especially interesting not only for language professionals but also for the patent community and the public at large. In particular, multilingualism represents a challenge and an opportunity in that terminology is made available in WIPO Pearl in unusual language combinations and in languages in which terminology has to date had little currency. We hope that WIPO Pearl can become a reference for terminology in such languages.

5. All content in WIPO Pearl has been validated and given a term reliability score. How does the validation process work and who are its main stakeholders?

All content in WIPO Pearl is human-generated and validated. Validation is the first step in ensuring the quality of a termbase. Every contribution made to the PCT Termbase is subject to the scrutiny of a validator. No term block is published in WIPO Pearl until it has been awarded the status “validated”, the only exception being WIPO MT results, derived from WIPO’s patent trained machine translation engine, that are provided in case an equivalent in one of the target languages has not yet been entered in the PCT Termbase.

How does it work in practice? Concepts and terms are contributed daily by our staff terminologists, translators and short-term terminology trainees. Each term is then assigned to a validator (typically another fellow terminologist and/or translator, ideally a native speaker of the language of the term in question). Terminology validation involves confirming that a record accurately reflects the expression of a single concept and that the term in a given language is indeed the most accurate designation for that concept. In addition, validation also ensures that the content of each field is formally consistent with the principles established in our terminology guidelines.

Further, when terms are switched from “candidate” to “validated”, a term reliability score is assigned on a scale of 1 to 4. The type of term and the nature of the validation are the two main criteria here. When the term is a proposed term (no reliable source is found), the score assigned can be 1 or 2, depending on whether the validation was carried out by the PCT Translation Service only or by the PCT Translation Service and external subject field experts. Conversely, if a reliable source can be found, the term is assigned a score of 3 (validation by the PCT Translation Service only) or 4, the maximum level of reliability in WIPO Pearl, if both the PCT Translation Service and external subject field experts validated the entry.
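The scoring rule just described can be summarised in a few lines of code. This is only an illustrative sketch of the two criteria mentioned in the answer; the function and parameter names are hypothetical and do not reflect how the PCT Termbase is actually implemented.

```python
def term_reliability_score(has_reliable_source: bool, expert_validated: bool) -> int:
    """Return a 1-4 reliability score from the two criteria described above.

    has_reliable_source: False for a proposed term (no reliable source found).
    expert_validated: True if external subject field experts also validated
        the entry, in addition to the PCT Translation Service.
    Both names are assumptions made for this sketch.
    """
    # Proposed terms start at 1; terms backed by a reliable source start at 3.
    base = 3 if has_reliable_source else 1
    # Validation by external subject field experts raises the score by one.
    return base + (1 if expert_validated else 0)


if __name__ == "__main__":
    # Print the score for each of the four possible combinations.
    for sourced in (False, True):
        for expert in (False, True):
            print(sourced, expert, term_reliability_score(sourced, expert))
```

Each of the four combinations of criteria maps to exactly one of the four scores, which is why the scale runs from 1 to 4.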

Maintaining the necessary levels of quality in terminology validation is paramount to delivering a reliable product that users can trust. As such, before publication, a series of semi-automated checks is also run on validated term blocks with the aim of identifying any termbase fields that may have been filled in incorrectly. This further ensures quality.

6. WIPO Pearl was launched in September 2014 and currently contains some 16,300 concepts and over 110,000 terms. How do you envision WIPO Pearl in ten years’ time?

Grown, matured, both in quantity and quality, and more and more popular among a growing number and variety of users! WIPO Pearl is still a very young resource. We plan soon to add terminology collections from other sectors of WIPO (e.g. copyright, brands and designs) and this will broaden the scope of the portal.

We hope to develop a network of partners that would help further to position WIPO Pearl as a key resource in the field of patent and IP terminology worldwide. We want to involve subject field experts in the validation process and establish partnerships with scientific and technical institutions. We also want to extend our collaboration with students in universities in which terminology is taught, and we would welcome approaches for collaboration from universities or research establishments.

Finally, we hope that WIPO Pearl will continue to offer a number of distinct advantages by favouring enhancement of document search and access to knowledge embedded in patents and IP documentation via increasingly accurate and user-friendly concept maps.

In a world in which users do not want to be overburdened by unverified information, it is important that WIPO Pearl continues to strive for quality. Reliable language resources are essential to supporting the knowledge and content industries, and our aim is for WIPO Pearl to be perceived as a reference for multilingual scientific, technical and legal terminology.

7. How do you see the future of terminology, not only in multilingual public organizations but also, in particular, in multilingual business organizations?

Terminology is a key component of the world as we know it and will know it. I can see two complementary ways in which terminology can play an important role for public institutions and private businesses alike. On the one hand, there is the third-generation Web, which is synonymous with the semantic web, natural language processing, data mining, and language-independent knowledge networks, in which concept-based terminology can provide a common framework for comparing and bringing together the different languages. On the other, there is the need to move towards the locale, where accurate terminology compilation in the most disparate languages can help to reach people in the most remote corners of the world. In both scenarios, trust and confidence should be the goals of any terminology endeavour.

About the interviewer

Ana Rita Remígio holds a PhD in Linguistics – Terminology – from the University of Aveiro, Portugal. Her academic background also includes a degree in English and German Teaching from the same university, and a postgraduate course in Computer Assisted Translation from the Institute of Accounting and Administration of Oporto. Ana Rita worked as Technology Transfer Project Manager at the University of Aveiro from 2009 to 2013, taught ‘Terminology and Translation Technologies’ at the Polytechnic School of Technology and Management of Águeda for several years, and has worked as a patent translator. Currently Ana Rita Remígio is a Portuguese Patent and Trademark Attorney and a European Trademark and Design Attorney. Both worlds – Terminology and Intellectual Property – meet daily in patent translation, revision and drafting.