Aijmer, Karin and Anne-Marie Simon-Vandenbergen “A model and a methodology for the study of pragmatic markers: the semantic field of expectation: Pragmatics of Discourse.” Journal of Pragmatics vol. 36, n. 10 (2004).  pp.:

In this article, a model and a methodology are proposed for the study of pragmatic markers. The proposal not only contributes to the theoretical discussion but also has advantages from a descriptive point of view. As to the model, it draws on a number of notions which so far have not been brought together in a way which gives a satisfactory explanation for the existence and functioning of pragmatic markers. These notions, reflexivity and indexicality, are integrated here with a view of modality and evidentiality which, based on White (2003), goes beyond truth-functional accounts in its heteroglossic and rhetorical perspectives. From a methodological point of view, the article proposes the use of translations as a heuristic device for setting up lexical fields. Following Dyvik (in press), translation corpora are used to refine the descriptions of semantic and pragmatic relations between items in the same field. To illustrate the methodology, we have opted for the semantic field of ‘expectation’ in English, and we look at equivalents of the pragmatic markers in Swedish and in Dutch.

Ajunwa, Enoch “Generating a Corpus-Based Metalanguage: The Igbo Language Example.” The Translation Journal vol. 12, n. 1 (2008).  pp.:

The Igbo language (my mother tongue) could be described today as an ‘endangered species.’ For this reason, many calls have gone out to all language stakeholders and specialists such as linguists, creative writers, translators, etc. to save the Igbo language from extinction. To this effect, this article intends to contribute towards the generation of an Igbo metalanguage in the area of computer science through the corpus-based hybrid method.

Frankenberg-Garcia, Ana “Learners’ Use of Corpus Examples.” International Journal of Lexicography vol. 25, n. 3 (2012).  pp. 273-273.

One of the distinguishing characteristics of corpus-based dictionaries is that most entries contain example sentences or phrases that have been copied or adapted from corpora. Although examples are generally regarded as positive and have high face validity among learners, the body of evidence about their actual benefits is limited and inconclusive. My aim in the present study is to revisit the idea of testing the usefulness of corpus examples. However, unlike previous research, in the present study different words are used to test language comprehension and the ability to correct typical language production errors, and a distinction is also drawn with regard to examples intended to facilitate decoding and examples meant to benefit encoding. In addition to this, because a single example might not be enough to help people understand what a word means or how it is used, in this study I also test the value of presenting learners with multiple corpus examples.

Baker, Mona “Réexplorer la langue de la traduction: une approche par corpus.” Meta vol. 43, n. 4 (1998).  pp.:

This paper discusses the need to develop a coherent corpus-based methodology for identifying the distinctive features of the language of translation. The aim of this endeavour is not merely to unveil the nature of the ‘third code’ per se, but most importantly, to understand the specific constraints, pressures, and motivations that influence the act of translating and underlie its unique language.

Barrière, Caroline “Building a concept hierarchy from corpus analysis.” Terminology vol. 10, n. 2 (2004).  pp. 241-264.

Corpus analysis is today at the heart of building Terminological Knowledge Bases (TKBs). Important terms are usually first extracted from a corpus and then related to one another via semantic relations. This research brings the discovery of semantic relations to the forefront to allow the discovery of less stable lexical units or unlabeled concepts, which are important to include in a TKB to facilitate knowledge organization. We suggest a concept hierarchy made of concept nodes defined via a representational structure emphasizing both labeling and conceptual representation.

Bendazzoli, Claudio and Annalisa Sandrelli “Estudios de interpretación basados en corpus: primeros trabajos y perspectivas futuras.” Corpus-based Interpreting Studies: Early Work and Future Prospects vol., n. 7 (2009).  pp.:

This article reviews past and present research projects in the field of Corpus-based Interpreting Studies (CIS). It briefly analyses the general obstacles that arise in creating electronic corpora for the study of interpreting. It also highlights the main reasons why the development of Corpus-based Interpreting Studies is still less advanced than that of Corpus-based Translation Studies (CTS). What can be learned from past and present experience in developing interpreting corpora may suggest ways of bridging the gap between CIS and CTS. (A.)

Benito, Daniel “Tendencias futuras en memorias de traducción.” Future Trends in Translation Memory vol., n. 7 (2009).  pp.:

This article reviews some of the most recent advances in translation memory technology and analyses how a corpus linguistics approach could be applied to extend them and make them more attractive. The article also explores how the nature of the translation industry may affect whether or not new technologies are widely adopted. (A.)

Blanco Carrión, Olga “Framenet como una herramienta de corpus para el aprendizaje de segundas lenguas y para la comprensión léxica de la lengua materna.” Framenet as a Corpus Tool for the Learning of Second Languages and for the Lexical Awareness of One’s First Language. Porta Linguarum. Revista Internacional de Didáctica de las Lenguas Extranjeras vol., n. 6 (2006).  pp. 67-76.

This article aims to show how the FrameNet database, created by the research group led by Professor Charles Fillmore and based on the principles of Frame Semantics, can be a useful tool for learning the lexicon of a second language as well as for better understanding one’s mother tongue. (A)

Borja Albi, Anabel “Organització de corpus. l’estructura d’una base de dades documental aplicada a la traducció jurídica.” Revista de Llengua i Dret vol. 34, n. (2000).  pp.:

Bosseaux, C. “Who’s Afraid of Virginia’s You: a Corpus-Based Study of the French Translations of the Waves.” Meta vol. 51, n. 3 (2006).  pp.:

The present paper discusses issues related to the translation of the English personal pronoun you in the French translations of Virginia Woolf’s The Waves (1931). There are two published French translations of The Waves. The first one, Les Vagues, was translated by Marguerite Yourcenar and published in 1937. Around fifty years later, another version was published, under the same title but this time translated by Cecile Wajsbrot (1993). The two versions differ significantly where the use of tu and vous is concerned. This paper is concerned specifically with the original’s mind-style (Fowler 1977), in other words, the way the characters’ perceptions and thoughts, as well as their speech, are presented through language and how this is rendered in the translations. The quantitative analysis was realised using corpus-based studies tools which proved to be an asset in helping to identify the novels’ mind-style.

Bowker, Lynne “Using specialized monolingual native-language corpora as a translation resource: a pilot study.” Meta vol. 43, n. 4 (1998).  pp.:

This article reports on the results of an interesting experiment comparing two translations produced by a group of translator trainees. One translation was carried out with the use of conventional resources; the other with the aid of a specialised monolingual corpus. The results reveal that the corpus-aided translations were of higher quality in respect to subject field understanding, correct term choice, and idiomatic expression. The author observes that although she did not find any improvement with regard to grammar or register, the use of the corpus was not associated with poorer performance.

Buendía Castro, Miriam and José Manuel Ureña Gómez-Moreno “¿Cómo diseñar un corpus de calidad?: parámetros de evaluación.” Sendebar: Revista de la Facultad de Traducción e Interpretación vol., n. 21 (2010).  pp. 165-180.

Campoy Cubillo, Mari Carmen “Frases verbales y verbos preposicionales en textos especializados: una técnica creativa.” Phrasal and prepositional verbs in specialised texts: a creative device vol., n. 4 (2002).  pp. 95-111.

Phrasal and prepositional verbs are one of the resources the English language uses to express new concepts. In this sense, research and innovation in the scientific and technical field can use such verbs to express new thoughts or concepts. This article analyses the use of verbs with particles in specialised texts. The final section studies the various phrasal and prepositional combinations with the particles up, down, off, over, and out in a corpus of 80 research articles in the area of Botany. (A.) Specialised languages. Technical languages; Synchronic linguistics. Morphology; Synchronic linguistics. Syntax

Castillo Rodríguez, Cristina “La elaboración de un corpus ad hoc paralelo multilingüe.” Revista Tradumatica vol., n. 7 (2009).  pp.:

This article presents a proposal for building a multilingual parallel ad hoc corpus, clearly divided into a series of phases. It also shows the advantages and drawbacks of compiling a corpus of this kind, focusing above all on the bitext alignment phase, which is essential for this type of corpus. (A.)

Charles, Maggie “The Construction of Stance in Reporting Clauses: A Cross-disciplinary Study of Theses.” Applied Linguistics vol. 27, n. 3 (2006).  pp.:

Using a corpus-based approach, this paper investigates the construction of stance in finite reporting clauses with that-clause complementation. The data are drawn from two corpora of theses in contrasting disciplines: a social science (politics) and a natural science (materials science). A network for the analysis of reporting clauses is presented which sets out the major alternatives available to academic writers and enables stance to be linked systematically to grammatical and semantic patterns of use. Quantitative and qualitative analysis of the data leads to the identification of an important, but somewhat under-researched, function of reporting clauses in academic writing: their use to report the writer’s own work. Drawing on the notions of averral and attribution, the paper shows how writers can emphasize or hide their responsibility for their own propositions and thereby construct a stance which differs according to the epistemology and ideology of the discipline concerned. These reporting clauses play a key role in the construction of major claims, with greater writer visibility seen in politics than materials. However, despite the superficial objectivity and impersonality of writing in the natural sciences, it is argued that skilled exploitation of the interplay between averral and attribution allows writers to construct a stance that is both clear and pervasive.

Hümmer, Christiane and Katerina Stathi “Polysemy and Vagueness in Idioms: A Corpus-based Analysis of Meaning.” International Journal of Lexicography vol. 19, n. 4 (2006).  pp.:

This paper presents a corpus-based approach to the meaning of verb phrase idioms and proposes a set of parameters for the systematic description of their meaning in different contexts. It also discusses polysemy and vagueness in relation to idioms and offers criteria for the operationalisation of this distinction.

Churcher, Gavin, Eric Atwell, et al. “Developing a Corpus-based Grammar Model within a Commercial Continuous Speech Recognition Package.” WEB-SLS: The European Student Journal of Language and Speech vol. 3, n. (1997).  pp.:

This paper is derived from experiments with a commercial ‘off-the-shelf’ continuous speech recognition system (PE500 – see note 1), applied to the apparently restricted domain of Air Traffic Control (ATC) for light aircraft. The system is required to transcribe key sub-phrases in a transmission by the ATC to a particular aircraft, the commercial speech recognition system providing the main recognition component. After the development of a corpus of transmissions, it was realised that key information is often interspersed with unconstrained English. Initial attempts at evaluating different types of language model focused on using a wildcard mechanism for the non-key sub-phrases. The mechanism, however, proved to be valuable only in simplistic grammars due to its overgenerative nature. The speech recognition system showed us that whilst useful mechanisms are provided, such as the wildcard mechanism, they tend to make over-simplistic assumptions about English grammar and dialogue structure. A set of experiments using the three different types of ‘grammar’ showed that a fully constrained context-free grammar provides markedly better results than the other two which make use of the built-in iterative mechanism.

Corpas Pastor, Gloria and Miriam Seghiri “Specialized Corpora for Translators: A Quantitative Method to Determine Representativeness.” The Translation Journal vol. 11, n. 3 (2007).  pp.:

Nowadays, there can be no doubt as to the importance or the necessity of using corpora in translation. Equally, given the short deadlines and speed that are now demanded in the translation industry, the virtual corpus has undeniably proved itself a most useful tool. Many authors have explored the possibilities offered by corpora for specialized language teaching and translation (cf. Bernardini and Zanettin, 2000; Corpas, 2001 and 2004, Bowker and Pearson, 2002, to name but a few). Ad-hoc, specialized corpora mined from electronic resources available on the Internet have proved to be a first-class documentary resource, as well as a valuable tool in decision-making and in revision. However, there is a surprising scarcity of studies devoted to analyzing the quality of the corpora that are being used in translation.

Curado Fuentes, Alejandro “Factores de representatividad y significación en textos de inglés para fines específicos.” Representativeness and significance factors in ESP texts vol., n. 2 (2000).  pp. 43-56.

De Vecchi, Dardo “La traduction d’un corpus atypique : les annuaires téléphoniques professionnels.” Meta vol. 56, n. 2 (2011).  pp. 301-317.

Lists are distinctive texts in both form and content. Under certain conditions, they can be treated as corpora of texts made up of sentences. The headings of telephone directories are examples that are rich in information in several respects. Translating them is an exercise with particular demands, combining linguistic competence with knowledge of economic and professional activities. Culturally, they reflect the ways in which these activities are classified in a given place or region. The treatment of this type of corpus is not without importance for the observation of economic phenomena. It is also worth noting that a diachronic examination of this type of corpus offers a reading of a society’s past and present economic and terminological practices. However, this corpus can only be examined using digitized paper versions and not online versions, because in the latter the classificatory categories are not systematically visible, since the content is held in electronic databases that are not directly accessible.

Ebeling, Jarle “Contrastive linguistics, translation, and parallel corpora.” Meta vol. 43, n. 4 (1998).  pp.:

This paper regards parallel corpora as suitable sources of data for investigating the differences and similarities between languages, and adopts the notion of translation equivalence as a methodology for contrastive analysis. It uses a bidirectional parallel corpus of Norwegian and English texts to examine the behaviour of presentative English there-constructions as well as the Norwegian equivalent det-constructions in original and translated English, and original and translated Norwegian respectively.

Edo Marzá, Nuria “The generation of active entries in a specialised, bilingual, corpus-based dictionary of the ceramics industry: what to include, why and how.” Ibérica: Revista de la Asociación Europea de Lenguas para Fines Específicos (AELFE) vol., n. 18 (2009).  pp. 43-70.

The generation of useful dictionary entries is a complex task since it is normally complicated to decide what to include, and how to include it. Accordingly, the main goal of this research is to show how “active entries” have been generated in the specific case of the elaboration of a specialised, bilingual, corpus-based dictionary in the field of industrial ceramics. Thus, this article illustrates how final entries have been designed and how decisions have been adopted depending on the prospective users of the dictionary: specialists and translators in the ceramic industrial field. It goes on to reflect on how active entries complement previous terminological creations with the inclusion of additional, pertinent information and on the intricate decision-making processes involved in the generation of this kind of entry. In the first part of the article, the theoretical considerations adopted are set out, whereas the second part deals with the active entries as such and the way their different fields have been filled in; that is, how different pieces of information regarding contexts of use, pragmatic implications, semantic classification and definitions, among others, have been included in the entries to meet the users’ needs.

Esther Monzó, Pilar Ordóñez “La investigación en traducción basada en corpus: aplicaciones profesionales y didácticas del proyecto GENTT.” Revista Tradumatica vol., n. 7 (2009).  pp.:

Fan, May and Xu Xunfeng “An evaluation of an online bilingual corpus for the self-learning of legal English.” System vol. 30, n. 1 (2002).  pp.:

Based on a relatively simple but innovative idea of inserting hyperlinks at the sentence level between parallel texts, a bilingual corpus of legal and documentary texts in English and Chinese has been created and made available online together with a web-based concordancer. In addition to introducing such a corpus, this paper reports a study which seeks to evaluate the usefulness of the corpus in the self-learning of legal English. The subjects involved were a group of Chinese students doing a degree in Translation in a university of Hong Kong, where English Common Law is still used after the handover in 1997 when the sovereignty of Hong Kong was restored from Britain to China. The instruments for data collection included two comprehension tasks, a questionnaire and a follow-up interview. Findings of the study indicate that students considered the bilingual corpus useful as they needed both language versions in the understanding of legal provisions though they were found to rely more on Chinese. Interesting data in relation to how users of the bilingual corpus switched between the two languages have also been obtained. This paper also investigates how the inherent characteristics of legal English contribute to the comprehension difficulty of L2 learners irrespective of the help obtained from the bilingual corpus.

Fellbaum, Christiane, Alexander Geyken, et al. “Corpus-based Studies of German Idioms and Light Verbs.” International Journal of Lexicography vol. 19, n. 4 (2006).  pp.:

We discuss the motivation as well as the design and development of a large lexical resource focusing on German verb phrase idioms and light verbs. Entries for a given phrasal unit permit detailed linguistic analyses coupled with the appropriate corpus data.

Ferraresi, Adriano “Google y más allá: metodologías basadas en web como corpus para traductores.” Google and beyond: web-as-corpus methodologies for translators vol., n. 7 (2009).  pp.:

This article examines current approaches to using the web as a linguistic corpus and emphasises the advantages (as well as the inevitable risks) that these can bring to the translator’s work. To illustrate the point, it offers an example of the different ways in which a web-derived corpus can be profitably applied to a specialised translation task. (A.)

Fuertes-Olivera, Pedro A. “A corpus-based view of lexical gender in written Business English.” English for Specific Purposes vol. 26, n. 2 (2007).  pp. 219-234.

This article investigates lexical gender in specialized communication. The key method of analysis is that of forms of address, professional titles, and ‘generic man’ in a 10 million word corpus of written Business English. After a brief introduction and literature review on both gender in specialized communication and similar corpus-based views of lexical gender in General English, the results obtained are explained. Mixed results were found. On the one hand, the ‘male-as-norm’ principle contributes to reinforcing typical gender stereotypes: for example, for each woman referred to in the corpus, there are more than 100 occurrences for man. On the other hand, advocates of non-sexist English have also influenced written Business English: for example, Ms is more than 9 times as frequent as Mrs. and Miss, which sustains the claim that equates Ms with professional settings. This article ends by discussing the ways in which the research findings of this study could positively impact upon the teaching of Business English.

Fung, Pascale and Kathleen McKeown “A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups.” Machine Translation vol. 12, n. 1 (1997).  pp. 53-87.

Technical-term translation represents one of the most difficult tasks for human translators since (1) most translators are not familiar with terms and domain-specific terminology and (2) such terms are not adequately covered by printed dictionaries. This paper describes an algorithm for translating technical words and terms from noisy parallel corpora across language groups. Given any word which is part of a technical term in the source language, the algorithm produces a ranked candidate match for it in the target language. Potential translations for the term are compiled from the matched words and are also ranked. We show how this ranked list helps translators in technical-term translation. Most algorithms for lexical and term translation focus on Indo-European language pairs, and most use a sentence-aligned clean parallel corpus without insertion, deletion or OCR noise. Our algorithm is language- and character-set-independent, and is robust to noise in the corpus. We show how our algorithm requires minimum preprocessing and is able to obtain technical-word translations without sentence-boundary identification or sentence alignment, from the English–Japanese awk manual corpus with noise arising from text insertions or deletions and on the English–Chinese HKUST bilingual corpus. We obtain a precision of 55.35% from the awk corpus for word translation including rare words, counting only the best candidate and direct translations. Translation precision of the best-candidate translation is 89.93% from the HKUST corpus. Potential term translations produced by the program help bilingual speakers to get a 47% improvement in translating technical terms.

Gabrielatos, Costas and Paul Baker “Fleeing, Sneaking, Flooding: A Corpus Analysis of Discursive Constructions of Refugees and Asylum Seekers in the UK Press, 1996-2005.” Journal of English Linguistics vol. 36, n. 1 (2008).  pp. 5-38.

This paper examines the discursive construction of refugees and asylum seekers (and to a lesser extent immigrants and migrants) in a 140-million-word corpus of UK press articles published between 1996 and 2005. Taking a corpus-based approach, the data were analyzed not only as a whole, but also with regard to synchronic variation, by carrying out concordance analyses of keywords which occurred within tabloid and broadsheet newspapers, and diachronic change, albeit mainly approached from an unusual angle, by investigating consistent collocates and frequencies of specific terms over time. The analyses point to a number of (mainly negative) categories of representation, the existence and development of nonsensical terms (e.g., illegal refugee), and media confusion and conflation of definitions of the four terms under examination. The paper concludes by critically discussing the extent to which a corpus-based methodological stance can inform critical discourse analysis.

García Izquierdo, Isabel and Tomás Conde Ruano “Investigating specialized translators: corpus and documentary sources.” Ibérica: Revista de la Asociación Europea de Lenguas para Fines Específicos (AELFE) vol., n. 23 (2012).  pp. 131-156.

This paper describes research carried out through electronic surveys of three groups of translators working in different areas of expertise (legal, medical and technical) that aimed to discover their socio-professional profile, their opinions both on corpora and other documentary sources, and the use they make of them. Certain characteristic features emerged from the analysis of data on the three population groups, regarding years of experience, documentary sources used and most usual clients. For example, even if legal translators seem more satisfied with the documentary sources available, medical translators never use translation memories, and technical translators often refer to thesauri. In any event, regardless of their area of activity, most subjects feel the need for a specialized corpus combining formal, terminological-lexical, macrostructural and conceptual aspects, as well as contextual information. That is the reason why the GENTT 3.0 Corpus is believed to meet the expectations and needs of professional translators.

García-Ostbye, Ingrid “Dialogical surface text features in abstracts.” Ibérica: Revista de la Asociación Europea de Lenguas para Fines Específicos (AELFE) vol., n. 15 (2008).  pp. 89-112.

A sample driven description of Research Article-Comment-Reply (RA-C-R) abstracts in terms of abstract sentence length, reference, possessive structures, modal verbs and word range was carried out to find out whether their surface text features showed some trace of a dialogical construction of knowledge within the psychology discourse community.

Gardner, Dee “Validating the Construct of Word in Applied Corpus-based Vocabulary Research: A Critical Survey.” Applied Linguistics vol. 28, n. 2 (2007).  pp.:

Corpus-based vocabulary research has had a profound impact on English language education, and there is abundant evidence that this will remain the case for the foreseeable future. Perhaps the greatest challenge of such research is the determination of what constitutes a Word for counting and analysis purposes. Decisions in this regard have important ramifications not only for the lexical findings themselves, but also for the pedagogical theories and practices that derive from them. This article surveys several fields of study in order to discuss this dilemma, with a particular focus on three problematic areas relating to computer-processed corpora: (a) morphological relationships between words, (b) homonymy and polysemy, and (c) multiword items. The article concludes with recommendations for assessing the validity of the Word construct in applied corpus-based vocabulary research.

Gehweiler, Elke “Going to the Dogs? A Contrastive Analysis of is Going to the Dogs and jmd./etw. geht vor die Hunde.” International Journal of Lexicography vol. 19, n. 4 (2006).  pp.:

The paper discusses the origins of equivalent idioms across languages, and specifically the emergence of English is going to the dogs and German jmd./etw. geht vor die Hunde. Then a contrastive analysis of the two idioms is presented, departing from the assumption that superficially equivalent idioms must exhibit semantic and pragmatic differences. It will be shown that the two idioms differ not only with respect to frequency and register but prefer different external arguments, have different variants, and stand in different relations to other forms in the lexicon.

Gibson, Edward and Carson T. Schütze “Disambiguation Preferences in Noun Phrase Conjunction Do Not Mirror Corpus Frequency.” Journal of Memory and Language vol. 40, n. 2 (1999).  pp. 263-279.

The results of two self-paced reading studies of a syntactic ambiguity involving conjoined noun phrases to three potential noun phrase sites were compared to the corpus frequencies of the resolutions of the same ambiguity. The reading times for the attachment to the first noun phrase were faster than for the attachment to the second noun phrase, but, to the extent that any differences were observed in the corpus frequencies, attachments to the second noun phrase were more frequent. We therefore argue that the sentence comprehension mechanism is not using corpus frequencies in arriving at its preference in this ambiguity, and hence the decision principles of sentence comprehension and sentence production must be partially distinct. It is proposed that there is a factor operative in sentence comprehension that is not operative in sentence production, and this factor favors attachment to the first noun phrase.

Girard, Marie-Hélène “Beeby, Allison, Rodríguez Inés, Patricia et Sánchez-Gijón, Pilar (2009) : Corpus Use and Translating. Corpus Use for Learning to Translate and Learning Corpus Use to Translate. Amsterdam/Philadelphia : John Benjamins, 149 p.” Meta: Journal des traducteurs = translators’ journal vol. 56, n. 4 (2011).  pp. 1032-1034.

Review of the book: Beeby, Allison, Rodríguez Inés, Patricia et Sánchez-Gijón, Pilar (2009) : Corpus Use and Translating. Corpus Use for Learning to Translate and Learning Corpus Use to Translate. Amsterdam/Philadelphia : John Benjamins, 149 p.

Goded Rambaud, Margarita “A descriptive algorithm for a wine tasting lexicon corpus.” Scire: Representación y organización del conocimiento vol. 15, n. 2 (2009).  pp. 39-62.

The aim is to report progress in validity testing of a procedure for computationally identifying the components that make up the meaning of expressions in the restricted subdomain of wine tasting notes. The procedure consists of a linking algorithm that includes a set of labelled components. These components range from non-linguistic ones, with labels for “perceptual input” and “world knowledge”, to properly linguistic ones, such as parsers and dictionary definitions. The Clashing Identification Procedure (CIP) methodology is used, which allows the corpus to be progressively reduced to a manageable size. The value of designing a semantic tagging system lies in its contribution to identifying the metaphorical and synaesthetic expressions frequently used in tasting notes, and also to disambiguation tasks. In short, the paper shows how to derive computationally the information relevant to the construction of the metaphors on which tasting notes are based, and how a design of this kind makes it possible to connect linguistic and encyclopedic knowledge effectively.

Gray, Bethany “On the use of demonstrative pronouns and determiners as cohesive devices: A focus on sentence-initial this/these in academic prose.” Journal of English for Academic Purposes vol. 9, n. 3 (2010).  pp. 167-183.

A key concern for writers is the creation of cohesion in a text, and writers are told by style manuals to avoid the use of demonstratives (this, that, these, those) as pronouns in order to maintain cohesion. However, previous corpus-based investigations have already revealed that authors of academic texts use demonstratives as both determiners and pronouns. Using a corpus of academic research articles in Education and Sociology, I investigate the extended linguistic environments in which the demonstratives this and these are used with the goal of understanding how expert writers employ demonstratives as pronouns and determiners to create cohesion. The results of the study indicate that pronominal uses of this/these overwhelmingly refer to antecedents that are complete clauses (but not extended discourse that spans sentence boundaries). When the demonstratives are followed by a noun, shell nouns and abstract nouns are used most of the time. Shell nouns, in contrast to other abstract nouns, most often refer to antecedents that are complete clauses or that are extended, meaning that the antecedent spans sentence boundaries. The implications of these results for teaching academic writing are discussed.

Guzmán, Josep R. and Alvar Serrano “Alineamiento de frases y traducción: AlfraCOVALT y el procesamiento de corpus.” Sendebar. Boletín de la Facultad de Traductores e Interpretes de Granada vol., n. 17 (2006).  pp. 169-186.

The arrival of corpora in translation studies has created a need for tools that can handle all this information quickly and effectively. Accordingly, this paper presents a tool for aligning the parallel texts that make up the COVALT corpus (Corpus Valencià de Literatura Traduïda). After a brief review of the various alignment methods and tools, it analyses the features of the program, AlfraCOVALT, particularly its usefulness for translation researchers and their need for accurate alignments.

Halverson, Sandra “Translation studies and representative corpora: establishing links between translation corpora, theoretical-descriptive categories and a conception of the object of study.” Meta vol. 43, n. 4 (1998).  pp.:

This paper discusses the issue of representativeness in the creation of general translation corpora. In the course of an in-depth examination of the stages involved in the selection of texts which adequately represent the target population, it demonstrates that prototype categories are better suited than the all-or-none classical ones to reconcile existing theoretical statements about what constitutes legitimate data in translation studies with the new methodology developed by the corpus-based approach.

Hanks, Patrick, Anne Urbschat, et al. “German Light Verb Constructions in Corpora and Dictionaries.” International Journal of Lexicography vol. 19, n. 4 (2006).  pp.:

In this paper we explore the collocations and semantics of light verbs in German and propose a new kind of monolingual German dictionary entry for such verbs, paying equal attention to phraseology and meaning. As a rule of thumb, we follow the semantic criterion that the meaning of a light verb is more than usually bound up with its collocations (which, typically, are few in number and high in frequency). We consider problems of terminology and criteria. A corpus-based analysis of three verbs (leisten, erteilen, and hegen) is presented.

Hartmann, R. R. K. “Contrastive textology and corpus linguistics: On the value of parallel texts.” Language Sciences vol. 18, n. 3-4 (1996).  pp. 947-957.

The paper sketches the development from contrastive lexicology to contrastive textology, distinguishes a number of different types of ‘parallel texts’, shows how computer-assisted corpus linguistics is coming to grips with text typological issues, and mentions some applications.

Harwood, Nigel “‘We Do Not Seem to Have a Theory … The Theory I Present Here Attempts to Fill This Gap’: Inclusive and Exclusive Pronouns in Academic Writing.” Applied Linguistics vol. 26, n. 3 (2005).  pp.:

This paper is a qualitative and quantitative corpus-based study of how academic writers use the personal pronouns I and inclusive and exclusive we. Using a multidisciplinary corpus comprising journal research articles (RAs) from the fields of Business and Management, Computing Science, Economics, and Physics, I present data extracts which reveal how I and we can help writers create a sense of newsworthiness and novelty about their work, showing how they are plugging disciplinary knowledge gaps. Inclusive pronouns can act as positive politeness devices by describing and/or critiquing common disciplinary practices, and elaborating arguments on behalf of the community. They can also organize the text for the reader, and highlight the current problems and subject areas which preoccupy the field. The quantitative analysis reveals that while all instances of we in the Business and Management articles and all but one of the instances of we in the Economics articles are inclusive, only a third of the instances in the Computing articles and under 10 per cent of the instances in the Physics articles are inclusive. The study ends with a brief discussion of what a few English for Academic Purposes (EAP) textbooks tell students about inclusive and exclusive pronouns, and offers some suggestions for EAP classroom activities.

Hernández Guerra, Concepción and Juan M. Hernández Guerra “Discoursive analysis and pragmatic metadiscourse in four sub-areas of Economics research articles.” Ibérica: Revista de la Asociación Europea de Lenguas para Fines Específicos (AELFE) vol., n. 16 (2008).  pp. 81-108.

English for Specific Purposes (ESP) and English for Academic Purposes (EAP) are two disciplines whose importance has been growing lately. This is due to the ever-increasing interest in the language that describes the most recent developments in varying disciplines and the need to communicate in and understand that language. One of the means for the spreading of those new developments in the different technologies is through research articles in specialised journals. These impose certain rules that must be fulfilled by all researchers who want to see their papers published. Much literature has been written about this, covering most linguistic areas (see, for instance, Bazerman, 1988; Bhatia, 1993; Dudley-Evans, 1994; Fortanet, 2002). Many articles contrast different genres but rarely distinguish different sub-areas within a genre (Bridgman & Carlson, 1984; Malcolm, 1987; Hyland, 1988; Neff Van Aertselaer, 2006). In the present case, we will consider texts within the discipline of Economics. The aim of this paper is to show a structural, grammatical and metatextual analysis of ten articles recently published in very prestigious specialised publications, covering the most important areas of study in Economics. We conclude by making a contrastive rhetorical analysis of the four sub-genres analysed here. These are: Applied Economy, Quantitative Economy, Financial Economy, and Management and Business.

Herrera Soler, Honesto “A metaphor corpus in business press headlines.” Ibérica: Revista de la Asociación Europea de Lenguas para Fines Específicos (AELFE) vol., n. 15 (2008).  pp. 51-70.

In linguistics a corpus typically involves a finite body of texts which are considered to be representative of a particular variety of language at a specific time (McEnery & Wilson, 2001). Those are the assumptions we have had in mind in this metaphor corpus based on business press headlines. Our body of texts is a finite number of headlines drawn from the business sections of three newspapers, Financial Times, El País and El Mundo, published from January to July 2003. Compiling a small corpus of non-literal instantiations, as different authors have done (Cortés de los Ríos, 2001; Kövecses, 2002; Charteris-Black, 2003; Koller, 2004; Deignan, 2005; and others), will enable us first to identify whether the contextual meaning of a word or a multiword unit in a headline contrasts with its basic meaning and whether the contextual meaning can be understood by comparison with that basic meaning, and then to categorize, both in the Spanish and in the British press, the different linguistic realizations of a headline in terms of their syntactic structure, metaphor foci and source domains.

Isaiah Wonho, Yoo “A Corpus Analysis of (The) Last/Next + Temporal Nouns.” Journal of English Linguistics vol. 36, n. 1 (2008).  pp. 39-61.

Many reference grammars cover the use of last and next, but none pays overt attention to when and why those words combine with Ø or the before temporal nouns; for example, (a) I came to Boston Ø last year and (b) I’ve been in Boston for the last year. Based on three theoretical notions of predicated time, extensivity, and the null article, and a corpus analysis of the tokens of (the) last/next from the Brown Corpus, the 1996 LA Times Corpus, and the Michigan Corpus of Academic Spoken English, this article presents a detailed account of when the determiners last and next combine with null or the; why last/next followed by singular temporal nouns occur with null, as in (a), or with the, as in (b); and why only singular temporal nouns, but not plural temporal nouns or non-temporal nouns, can combine with null + last/next.

Jensen, Bruce “The Case for Non-Intrusive Research: A Virtual Reference Librarian’s Perspective.” The Reference Librarian vol. 41, n. 85 (2004).  pp.:

Electronic reference facilitates analyses not possible in face-to-face and telephone transactions. Texts of e-mail and chat reference sessions disambiguate issues of accuracy, interview discourse, and, to a lesser extent, patron satisfaction. Authentic transcripts are here advanced as superior instruments for study of AskA services, with significant applications also in better understanding other modes of reference. Clandestine questioning by colleagues, MLIS students, and researchers afflicts online reference services; it is argued here that unobtrusive study techniques useful in traditional settings are inappropriate for online reference, generating dubious data while undermining service quality. This paper examines how research affects the work of virtual reference librarians, and suggests appropriate means of assessing virtual reference services for scholarly as well as library management purposes.

Jiménez-Crespo, Miguel A. “The future of general tendencies in translation: Explicitation in web localization.” Target vol. 23, n. 1 (2011).  pp. 3-25.

Explicitation has long been considered a tendency in translation and has been empirically investigated by a number of scholars. This paper responds to Chesterman’s (2004a: 47) call to test explicitation phenomena on different translation modalities and types, and tests the explicitation hypothesis against a comparable web corpus containing 40m words. The fast evolving field of web localization was selected given that (1) if explicitation is a potential universal or general tendency, it should be equally present in current and future translation types; (2) localization is a specific case of translation in which explicitation might not be expected due to space constraints and web usability guidelines; (3) research using comparable web corpora has produced evidence contradicting other proposed general tendencies, such as conventionalization (Jiménez-Crespo 2009a; Kenny 2001). The results of the study confirm that despite specific constraints, localized texts show explicitation if contrasted with non-translated web texts belonging to the same digital genre.

Jiménez Crespo, Miguel A. “El uso de corpus textuales en localización.” Revista Tradumatica vol., n. 7 (2009).  pp.:

Digitalization in contemporary society has brought exponential growth in the localization of hypertexts, software and video games. These processes are now highly structured within the localization industry, where translation memory systems and terminology databases play a fundamental role as support resources for the translator. However, a third resource with a broad impact on Translation Studies, the text corpus, has not been fully incorporated into this technological process (Bowker and Barlow, 2008; Shreve, 2006). This article argues for the use of corpora in the localization process and in subsequent evaluation (quality analysis), with the aim of producing higher-quality localized texts that (1) better meet the expectations of the target audience and (2) come closer to the goal of being received as if they had been originally produced in the target language region (LISA, 2003). Possible uses and advantages are illustrated through concordance analyses using the Spanish Web Corpus compiled by the author (Jiménez-Crespo, 2008a).

Jiménez Crespo, Miguel Angel and Maribel Tercedor “Applying Corpus Data to Define Needs in Web Localization Training.” Meta: Journal des traducteurs = translators’ journal vol. 56, n. 4 (2011).  pp. 998-1021.

Localization is increasingly making its way into translation training programs at university level. However, there is still little empirical research addressing issues such as defining localization in relation to translation, what localization competence entails or how to best incorporate intercultural differences between digital genres, text types and conventions, among other aspects. In this paper, we propose a foundation for the study of localization competence based upon previous research on translation competence. This project was developed following an empirical corpus-based contrastive study of student translations (learner corpus), combined with data from a comparable corpus made up of an original Spanish corpus and a Spanish localized corpus. The objective of the study is to identify differences in production between digital texts localized by students and professionals on the one hand, and original texts on the other. This contrastive study allows us to gain insight into how localization competence interrelates with the superordinate concept of translation competence, thus shedding light on which aspects need to be addressed during localization training in university translation programs.

Johansson, Stig “Why change the subject? On changes in subject selection in translation from English into Norwegian.” Target: International Journal on Translation Studies vol. 16, n. 1 (2004).  pp.:

This paper reports on a study of syntactic changes in alternative translations of a short story and a scientific article, each translated by a group of ten professional translators. The subject is kept in approximately nine cases out of ten, with a somewhat higher degree of change in the scientific article. Where changes occur, they can very often be traced to differences between the languages on the lexical or syntactic level, but absolute differences signalled by identical behaviour of a whole translator group are as good as non-existent. After more features have been studied, it may be possible to identify profiles for the individual translators (and the two translator groups) showing to what extent their choices are guided by adequacy in relation to the source text vs. acceptability in relation to the target language.

Kast-Aigner, Judith “Terms in context: a corpus-based analysis of the terminology of the European Union’s development cooperation policy.” Fachsprache: Internationale Zeitschrift für Fachsprachenforschung -didaktik und Terminologie vol. 31, n. 3 (2009).  pp. 139-152.

Terms in context: a corpus-based analysis of the terminology of the European Union’s development cooperation policy. Judith Kast-Aigner. Fachsprache: Internationale Zeitschrift für Fachsprachenforschung -didaktik und Terminologie, ISSN 0256-2510, vol. 31, n. 3-4 (2009), pp. 139-152.

Kretzschmar, William “Habeas Corpus?” Journal of English Linguistics vol. 37, n. 1 (2009).  pp. 88-92.

At one point early in my career, in the early 1980s, I sat in a basement adjunct faculty office at Loyola University in Chicago and made a 3 × 5 notecard for every time that William Caxton used the word history (or its aphetic variant, story) in his own writing, as published in Blake (1973). I wanted to know what Caxton meant by the word, what he could mean by it, because he had claimed that his edition of the King Arthur legend was a work of history. I later published what I discovered in Kretzschmar (1992), but I learned far more about words in use than what I published about Caxton’s sense of history: that the plural form of a word could mean something essentially different from its singular form; that different senses of a polysemous word might have different rates of occurrence; and that the same word might typically mean something different in different types of texts. In short, I discovered some of the basic findings of corpus linguistics.

Kwasniak, Renata “Wer Hat Nun Den Salat? – Now Who’s Got the Mess? Reflections on Phraseological Derivation: From Sentential to Verb Phrase Idiom.” International Journal of Lexicography vol. 19, n. 4 (2006).  pp.:

The paper investigates a case of phraseological derivation for the sentential idiom Da haben wir den Salat (‘there we have the mess’). Corpus data show the development over time of a new verb phrase idiom jmd. hat den Salat (‘sb. has the mess’).

Laviosa, Sara “Core patterns of lexical use in a comparable corpus of English narrative prose.” Meta vol. 43, n. 4 (1998).  pp.:

This paper investigates the linguistic nature of English translated texts. The author’s corpus consists of a sub-section of the English Comparable Corpus (ECC). It comprises two collections of narrative prose in English: one is made up of translations from a variety of source languages, the other includes original English texts produced during a similar time span. The study reveals four patterns of lexical use in translated versus original texts.

Lawick, Heike Van “El corpus paralelo bitextual en la enseñanza de traducción: identificación y soluciones para DOCH.” Congreso AIETI vol. 2, n. (2006).  pp.:

Text corpora as a tool for autonomous learning. Numerous studies have underlined the usefulness of combining contrastive linguistics, translation studies and applied linguistics on the basis of shared corpus-exploitation methodologies (Laviosa, 2003). In translation studies, the use of bilingual or multilingual comparable corpora and of bilingual or multilingual parallel corpora predominates.

Lobanova, Anna, Tom Van Der Kleij, et al. “Defining Antonymy: A Corpus-based Study of Opposites by Lexico-syntactic Patterns.” International Journal of Lexicography vol. 23, n. 1 (2010).  pp. 19-53.

Using small sets of adjectival seed antonym pairs, we automatically find patterns where these pairs co-occur in a large corpus of Dutch, and then use these patterns to extract new antonym pairs. Evaluation of extracted pairs by five human judges showed that automatic scores correlate with human evaluation and that pattern-based methods can be used to extract new antonym pairs. The majority of extracted pairs were noun-noun pairs, contrary to expectations based on previous research. Additionally, the method identifies a subgroup of co-hyponyms that frequently function antonymously, and together with more traditional antonyms makes up a wider class of incompatibles, suggesting that antonymy is a diverse relation that includes pairs of different types and categories that are not captured by any single linguistic theory. Comparison with Dutch WordNet and an online Dutch dictionary shows that only a handful of extracted pairs are currently listed in these existing resources, emphasizing the usefulness of the project.

López Sanjuán, Victoria “Integración de los corpus como herramienta de apoyo en la enseñanza de ESP.” Porta Linguarum. Revista Internacional de Didáctica de las Lenguas Extranjeras vol., n. 10 (2008).  pp. 115-136.

In recent years the application of corpora to language teaching has been studied, given that they consist of real texts. However, to assess the validity of applying corpora to language teaching, it must be shown from two different perspectives: that of the learners and that of the teachers, who are the ones who will use them in the learning process. In addition, the article proposes integrating corpora into the learning process as a support tool and shows a simple way of compiling an ESP corpus from the Internet.

Macken, Lieve, Orphée De Clercq, et al. “Dutch Parallel Corpus: A Balanced Copyright-Cleared Parallel Corpus.” Meta vol. 56, n. 2 (2011).  pp. 374-390.

This paper presents the Dutch Parallel Corpus, a high-quality parallel corpus for Dutch, French and English consisting of more than ten million words. The corpus contains five different text types and is balanced with respect to text type and translation direction. All texts included in the corpus have been cleared from copyright. We discuss the importance of parallel corpora in various research domains and contrast the Dutch Parallel Corpus with existing parallel corpora. The Dutch Parallel Corpus distinguishes itself from other parallel corpora by having a balanced composition and by its availability to the wide research community, thanks to its copyright clearance. All texts in the corpus are sentence-aligned and further enriched with basic linguistic annotations (lemmas and word class information). Approximately 25,000 words of the Dutch-English part have been manually aligned at the sub-sentential level. Rich metadata facilitates the navigability of the corpus and enables users to select the texts that satisfy their needs. The entire corpus is released as full texts in XML format and is also available via a web interface, which supports basic and complex search queries and presents the results as parallel concordances. The corpus will be distributed by the Flemish-Dutch Human Language Technology Agency (TST-Centrale).

Maia, Belinda “Word order and the first person singular in Portuguese and English.” Meta vol. 43, n. 4 (1998).  pp.:

From the perspective of contrastive linguistics, this article analyses the frequency and nature of the SVO sentence structure in English and Portuguese, particularly in those cases where the subject is realised by the first person pronoun I and eu respectively or by a name.

Malmkjaer, Kirsten “Love thy neighbour: will parallel corpora endear linguists to translators?” Meta vol. 43, n. 4 (1998).  pp.:

This paper analyses the advantages and difficulties that the study of parallel corpora presents when attempting to answer questions arising specifically from within translation studies.

Maniez, François “Text Corpora and Multilingual Lexicography.” Terminology vol. 14, n. 2 (2008).  pp. 266-272.

Book review: Wolfgang Teubert (ed.). 2007. Text Corpora and Multilingual Lexicography, Amsterdam: John Benjamins. ISBN-13: 9789027239655.

Marco, Josep “Normalisation and the Translation of Phraseology in the COVALT Corpus.” Meta vol. 54, n. 4 (2009).  pp. 842-856.

This article deals with the hypothesis that the use of phraseological units in translated texts can be regarded as an indicator of a tendency towards normalisation, since phraseological units are conventional forms of the target language that belong to its lexical repertoire. Data drawn from the English-Catalan subcorpus of COVALT (Valencian Corpus of Translated Literature) indicate that the texts translated into Catalan are less phraseological than the English source texts. The difference is small, however, which seems to reflect an effort by translators to preserve or recreate meaningful phraseology in the target texts. Further studies will be needed to identify the motivations underlying this practice.

Marco, Josep “The translation of wordplay in literary texts: Typology, techniques and factors in a corpus of English-Catalan source text and target text segments.” Target vol. 22, n. 2 (2010).  pp. 264-297.

The present study aims to analyse wordplay translation on the basis of the three aspects mentioned in the title: wordplay typology, translation techniques and relevant factors. The theoretical framework is eclectic but draws particularly on Delabastita (1996, 1997) and Lladó (2002). Empirical analysis is based on three English source texts and six Catalan translations, and focuses on two main issues: the frequency distribution of pairs of ST + TT segments across translation techniques, and the possible correlation(s) between translation techniques and factors influencing decision-making. It is observed that translators tend to use techniques implying a negative punning balance, i.e. resulting in some degree of loss of punning activity. Moreover, some factors identified in the literature are seen to correlate with the use of particular translation techniques. Finally, in the last section an attempt is made to go beyond description and explanation and to assess wordplay translation techniques in terms of their suitability as translation solutions.

Maryam Mohammadi, Dehcheshmeh “Specialized Monolingual Corpora in Translation.” The Translation Journal vol. 11, n. 2 (2007).  pp.:

In the new world of technology, the translation profession, like other disciplines, cannot be deprived of modern tools such as electronic corpora. Recently, large monolingual, comparable and parallel corpora have played a crucial role in solving various problems of linguistics, including translation. In this study we shall attempt to show the effectiveness of a specialized monolingual corpus in translating various collocations usually found in political texts from English into Persian. This experiment compares the accuracy in translating collocations using a specialized monolingual corpus to the conventional resources (e.g. monolingual as well as bilingual dictionaries). The results show how the quality of translation can be improved using corpus-based translation tools.

Melby, Alan K. “MT+TM+QA : the future is ours.” Tradumática vol. 4, n. (2006).  pp.:

This article makes predictions about the future of machine translation and translation memory systems, and about the role of translators as guarantors of quality. Computers will take on an increasingly important role in the automated processing of existing bitext corpora, while translators will be asked to perform higher-level functions.

Mendoza García, Inmaculada and Nuria Ponce Márquez “Proposal for the analysis of the source text in the comprehension phase of the translation process: contextualization, and analysis of extra-linguistic and intra-linguistic aspects.” redit: Revista electrónica de didáctica de la traducción y la interpretación vol., n. 2 (2009).  pp. 128-150.

This paper underlines the importance of textual analysis in the comprehension phase of the translation process. It proposes a teaching activity model for first year Translation students, consisting mainly of three different stages focused on detecting and classifying translation problems in a specific text: contextualization of the source text and analysis of extra-linguistic and intra-linguistic aspects related to the translation process. For this purpose, we present a table-based methodology to be applied to the teaching of Basic Concepts for Interpreter and Translator Training.

Mosavi Miangah, Tayebeh “Constructing a Large-Scale English-Persian Parallel Corpus.” Meta vol. 54, n. 1 (2009).  pp. 181-188.

In recent years the exploitation of large text corpora in solving various kinds of linguistic problems, including those of translation, has become commonplace. Yet a large-scale English-Persian corpus is still unavailable, because of certain difficulties and the amount of work required to overcome them.

Mukherjee, J. “Principles of Pattern Selection: A Corpus-Based Case Study.” Journal of English Linguistics vol. 29, n. 4 (2001).  pp.:

Analyses of linguistic corpora have revealed that natural language is to a very large extent based on (semi-)preconstructed phrases. Drawing on corpus-based approaches to the description of such lexico-grammatical patterns in language use, the present study puts into perspective the question of why one and the same lexical item occurs in different patterns. The question of pattern selection (i.e., the analysis of factors that lead the language user to prefer a specific pattern in a given context) deserves further consideration. The present corpus-based case study is intended to illuminate this aspect of authentic language behavior.

Munday, Jeremy “A computer-assisted approach to the analysis of translation shifts.” Meta vol. 43, n. 4 (1998).  pp.:

This article is an analysis of shifts in Seventeen Poisoned Englishmen, Edith Grossman’s English translation of a novel in Spanish by Gabriel García Márquez. It makes use of a variety of basic tools of corpus linguistics as aids to the inductive exploration of texts.

Nenadic, Goran, Irena Spasic, et al. “Mining term similarities from corpora.” Terminology vol. 10, n. 1 (2004).  pp. 55-81.

In this article, we present an approach to the automatic discovery of term similarities, which may serve as a basis for a number of term-oriented knowledge mining tasks. The method for term comparison combines internal (lexical similarity) and two types of external criteria (syntactic and contextual similarities). Lexical similarity is based on sharing lexical constituents (i.e. term heads and modifiers). Syntactic similarity relies on a set of specific lexico-syntactic co-occurrence patterns indicating the parallel usage of terms (e.g., within an enumeration or within a term coordination/conjunction structure), while contextual similarity is based on the usage of terms in similar contexts. Such contexts are automatically identified by a pattern mining approach, and a procedure is proposed to assess their domain-specific and terminological relevance. Although automatically collected, these patterns are domain dependent and identify contexts in which terms are used. Different types of similarities are combined into a hybrid similarity measure, which can be tuned for a specific domain by learning optimal weights for individual similarities. The suggested similarity measure has been tested in the domain of biomedicine, and some experiments are presented.

Nicaise, Laurent “On Going Beyond the Literal: Translating Metaphorical Conceptualizations in Financial Discourse.” Meta vol. 56, n. 2 (2011).  pp. 407-423.

This article, through a bilingual French-Dutch corpus, looks at how identical financial-economic concepts are articulated distinctively in the two language communities. The article shows that understanding why an author makes a particular (metaphorical) choice in the lexical repertoire of the discipline could provide language learners and translators with knowledge that will foster their understanding of financial-economic discourse and raise their awareness of ideological, pragmatic and cognitive differences between languages in the role of metaphors. The data indicate that a corpus analysis may contribute to explaining the impact of culture and other communication variables on the lexical realizations of financial-economic concepts in the press.

Pastor Enríquez, Verónica “Corpus et dictionnaires de langues de spécialité.” Terminology vol. 15, n. 2 (2009).  pp. 291-298.

Maniez, François, Pascaline Dury, Nathalie Arlin and Claire Rougement (eds.). 2008. Corpus et dictionnaires de langues de spécialité. Grenoble (France): Presses universitaires de Grenoble. ISBN: 978-2-7061-1481-6. Reviewed by Verónica Pastor Enríquez.

Pecman, Mojca “Tentativeness in term formation: A study of neology as a rhetorical device in scientific papers.” Terminology vol. 18, n. 1 (2012).  pp. 27-59.

The study on term formation presented in this paper is related to the problem of determining the function of neologisms in scientific communication and to the issue of processing the concomitant variation, typical of such new denominations. Our analysis of scientific texts shows that neologisms can have quite a different role in scientific communication than they are generally credited with in terminological studies. The well-known referential role, consisting of the creation of a new designation for naming a new concept, is overshadowed in scientific texts by a more rhetorical role. Here the scientist resorts consciously to variation, hence creating a “neology effect”, specifically for the reason of emphasising various novel aspects of his thought. This function of neology as a rhetorical device is generally glossed over in terminology studies, in much the same way as the analysis of variation used to be, due to the expected stability that neologisms should eventually gain in line with well-established terms. Consequently, in this article, we try to place the phenomenon of neology within the framework of discourse analysis.

Pérez-Paredes, Pascual “Examen de la utilización del vocabulario por estudiantes de inglés con fines académicos: Análisis de un corpus especializado y de un corpus producido por estudiantes.” Examining English for academic purposes students’ vocabulary output: corpus-aided analysis and learner corpora vol., n. monográfico (2005).  pp. 201-212.

Acquiring the vocabulary of a foreign language is of the utmost importance in the process of learning that language. Not for nothing is the vocabulary of a language one of the most substantive elements in the characterization and representation of the phenomenological (Alcaraz 2000). This article draws on the working procedures of corpus linguistics with a twofold aim: to support students' reading skills and, at the same time, to improve their capacity to learn the vocabulary of a specialized language. As part of our research, we compiled a corpus from the Journal of Psychotherapy. This corpus was made available to learners of English for specific purposes in the field of Psychology. They were also asked to write a text on the sub-specialty in question. From these essays we compiled a corpus of learner English, which we then used to compare vocabulary use with that of the corpus mentioned above. The results of our work confirm the conclusions of previous research in the field of corpus linguistics. The students overused both highly technical vocabulary and general vocabulary, thereby revealing their “presence” as authors to a greater extent than experts in the specialized language. (A)

Picton, Aurélie “Picturing short-period diachronic phenomena in specialised corpora: A textual terminology description of the dynamics of knowledge in space technologies.” Terminology vol. 17, n. 1 (2011).  pp. 134-157.

This article presents a first description and a proposal for the classification of the evolution phenomena involved in short-term diachrony in the field of space. It is based on the principles of Textual Terminology and relies on a tool-based analysis of two diachronic corpora. The linguistic methodology is briefly described but the emphasis is on the list of evolution phenomena revealed through our analysis. These results present an original description of knowledge evolution: 17 types of evolution are listed, revealing the heterogeneity and richness of terminology dynamics and offering a descriptive basis to start new subsequent research that would complete this typology.

Possamai, Viviane “Catálogo de acceso libre a corpora relacionados con la traducción.” Catalogue of Free-Access Translation-Related Corpora vol., n. 7 (2009).  pp.:

The use of corpora has brought new perspectives to the field of translation over the last 10 years. Corpora have been used in various translation-related areas, for example in translation studies and, specifically, in the teaching and practice of translation. With such varied applications and so many users, it is understandable that each corpus available on the Internet today has unique characteristics and offers different kinds of information, tools or text types. This catalogue provides hyperlinks to, and summarized data on, monolingual and parallel corpora that may be useful both in translation studies and in translation practice. (A.)

Puurtinen, Tina “Syntax, readability and ideology in children’s literature.” Meta vol. 43, n. 4 (1998).  pp.:

