Category Archives: Corpus linguistics

Plenary: From data literacy to AI literacy: examining engagement with corpora and technology in language education

Prof Pascual Pérez-Paredes will deliver a keynote at the 2nd EUt+ International Conference on Languages, EUt LC 2024 – Merging New Trends and Consolidating Good Practices in Languages for Specific Purposes – UPCT, Cartagena, Spain, June 26-28, 2024.

Abstract

In this talk, I discuss recent developments in what I have described elsewhere as Broad scope DDL (BsDDL) (Pérez-Paredes, 2024), an alternative approach that situates learners’ learning ecology (Pérez-Paredes, 2022b) at the centre of a learning process and where a variety of language data sources such as corpora, Gen AI and Large Language Models (LLMs) coexist. This ecology acknowledges the important role of new digital literacies and the symbolically mediated practices involving different types of knowledge and skills when engaged with texts in electronically mediated environments (Kern, 2021).

References

Boulton, A. (2021). Research in data-driven learning. In Pérez-Paredes, P., & Mark, G. (Eds.) Beyond concordance lines: Corpora in language education. John Benjamins, pp. 9-34.

Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta‐analysis. Language learning, 67(2), 348-393.

Boulton, A., & Vyatkina, N. (2021). Thirty years of data-driven learning: Taking stock and charting new directions over time. Language, Learning & Technology, 25(3), 66-89.

Boulton, A., & Vyatkina, N. (2023). Expanding Methodological Approaches in DDL Research. TESOL Quarterly.

British Council, The. (2023). Artificial intelligence and English language teaching: Preparing for the future. URL: https://www.teachingenglish.org.uk/publications/case-studies-insights-and-research/artificial-intelligence-and-english-language

Curry, N., Baker, P., & Brookes, G. (2024). Generative AI for corpus approaches to discourse studies: a critical evaluation of ChatGPT. Applied Corpus Linguistics, 4(1).

Kern, R. (2021). Twenty-five years of digital literacies in CALL. Language Learning & Technology, 25(3), 132–150.

Mizumoto, A. (2023). Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use. Applied Corpus Linguistics, 3(3), 100074.

Pérez-Paredes, P. (2010). Corpus Linguistics and Language Education in Perspective: Appropriation and the Possibilities Scenario. In T. Harris & M. Moreno Jaén (Eds.), Corpus Linguistics in Language Teaching (pp. 53-73). Peter Lang.

Pérez-Paredes, P. (2022a). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning, 35(1-2), 36-61.

Pérez-Paredes, P. (2022b). How learners use corpora. In R. R. Jablonkai & E. Csomay (Eds). The Routledge Handbook of Corpora and English Language Teaching and Learning (pp. 390-405). Routledge.

Pérez-Paredes, P. (2024). Data-driven learning in informal contexts? Embracing Broad Data-driven learning (BDDL) research. In Crosthwaite, P. (Ed.). Corpora for Language Learning: Bridging the Research-Practice Divide. Routledge.

Breaking ground: Discussing the present and the future of Data-driven learning – Online seminar

December 4, 2023

Click below to register. It’s free.

https://umurcia.zoom.us/webinar/register/WN_DuBwB0RnSUGzGfsxJ2i8jw#/r

This online seminar is hosted and organized by Universidad de Murcia and Université de Lorraine.

ICAME45 – Universidad de Vigo – 18-22 June 2024

International Computer Archive of Modern and Medieval English – ICAME 45 – Interlocking Corpora and Register(s): Diversity and Innovation 

Submission instructions:

SUBMISSION OF ABSTRACTS
The call for abstracts will close on 1 December 2023. Notification of acceptance will be sent out in early February 2024.

Abstracts should be between 400 and 500 words (excluding references) and should clearly state research question(s), approach, data, method, and (expected) results. Submission should be anonymous and via EasyAbs.

Authors may submit a maximum of two abstracts if at least one paper is co-authored.

We invite abstracts for the following presentation formats:
– Full paper: 20 + 10 mins discussion
– Short paper/Work-in-progress report: 10 + 5 mins discussion
– Software demonstration: 20 + 10 mins discussion
– Poster: on display for the duration of the conference and brief presentations (c.5 mins) during a special session. Maximum A0 size (portrait).

PRE-CONFERENCE WORKSHOPS
We also invite proposals for pre-conference workshops on Tuesday 18 June 2024. If you would like to convene a workshop, please send your proposal by 15 November 2023 directly to the organisers <icame45@uvigo.gal>.

The proposal should contain the title of the workshop, name and contact details of the organisers and of the proposed participants, and a short description of the topics to be discussed (between 400 and 700 words, excluding references). If the workshop is accepted, workshop conveners will be responsible for putting the individual abstracts together, including peer reviewing. This should be completed and notified to the conference organisers before 15 January 2024.

URL: https://icame45.webs.uvigo.es/

International Conference for Learner Corpus Research LCR 2024 Tartu 26-28 September 2024 

Keynote speakers: 

    Gaëtanelle Gilquin (Université Catholique de Louvain, Belgium) 
    Ilmari Ivaska (Turun Yliopisto, Finland) 
    Cristóbal Lozano (Universidad de Granada, Spain) 

Event Details: 

Date: 26-28 September 2024 
Location: Institute of Foreign Languages and Cultures and the Institute of Estonian and General Linguistics, University of Tartu, Estonia.
Topics: Areas of interest include, but are not limited to, the following:  
        Language for academic purposes  
        Language for specific purposes  
        Language teaching, assessment and testing  
        Learner corpus-based SLA studies  
        Corpora as pedagogical resources  
        Multimodal learner corpora  
        Software for learner corpus analysis  
        Corpus-based translation studies  
        English as a Medium of Instruction (EMI)  
        English as a Lingua Franca (ELF)  
        Data mining and other explorative approaches to learner corpora  
        Statistical methods in learner corpus studies  
        Discourse analysis and pragmatics  
        Studies related to lexis: semantics, metaphor, etc.  
        NLP approaches  
        Complexity, accuracy and/or fluency (CAF) analysis  

Abstracts:  

A short summary of the intended presentation, capturing the central idea along with the research questions, research methods and (possibly tentative) key conclusions, and citing any relevant previous work or theoretical background. Limited to 300 words, excluding keywords and references. Anonymous: the abstract itself should contain no reference to the author or their affiliation.

Further information is available at our webpage: https://lcr2024.ut.ee/main

Contact information: lcr2024@ut.ee

The Core Metadata Schema for L2 data

The Core Metadata Schema for L2 data: Collaborative efforts towards improved data findability, metadata quality and study comparability in L2 research

Dr Magali Paquot, UCLouvain

October 30, 18:00 (Madrid time) / 17:00 (UK time)

Registration: https://umurcia.zoom.us/webinar/register/WN_a6Wkw7llSG2HrvJ9yIGKvQ

You can check out the 2021 and 2022 talks here:

https://www.youtube.com/channel/UCKjKIIQL6u1mXD2V9ZaT-_Q/featured

Abstract

The Core Metadata Schema for L2 data consists of a comprehensive set of variables that encapsulate crucial information about L2 data. It is organized into several sections that describe specific aspects of a learner corpus. These include administrative details (e.g. authors or license), corpus design, text-related variables, learner-related variables, in-built annotation (e.g. details about manual or automatic annotation), information about annotators or transcribers (e.g. native language or language repertoire) and task-related details (e.g. instructions, time constraints) (Paquot et al., 2023). It is the result of extensive collaboration between learner corpus compilers at the Centre for English Corpus Linguistics (UCLouvain, Belgium) and EURAC Research (Bolzano, Italy), and a research data infrastructure expert and member of CLARIN’s metadata taskforce (König et al., 2022; Frey et al., 2023).

In this presentation, I will discuss the underlying rationale for the development of such a resource and present its second version. This will give me the opportunity to clarify how we have tried to involve learner corpus researchers in this initiative and to reiterate our hope that the LCR community will collaborate with us to refine the schema and align it with the evolving needs of the field.
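As a rough illustration of the sections listed in the abstract, a single corpus record could be sketched as below. This is not the published schema: the class name, the helper function and every key inside the sections are hypothetical, and only the section names are taken from the abstract.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class L2CorpusMetadata:
    """One metadata record. Section names mirror the abstract; the keys
    stored inside each section are hypothetical placeholders."""
    administrative: dict = field(default_factory=dict)   # e.g. authors, license
    corpus_design: dict = field(default_factory=dict)
    text: dict = field(default_factory=dict)             # text-related variables
    learner: dict = field(default_factory=dict)          # learner-related variables
    annotation: dict = field(default_factory=dict)       # manual or automatic annotation
    annotators_transcribers: dict = field(default_factory=dict)
    task: dict = field(default_factory=dict)             # e.g. instructions, time constraints


def missing_sections(record: L2CorpusMetadata) -> list[str]:
    """Return the names of the sections that hold no information yet."""
    return [name for name, values in asdict(record).items() if not values]


# Hypothetical record with only two sections filled in.
record = L2CorpusMetadata(
    administrative={"authors": ["A. Compiler"], "license": "CC BY 4.0"},
    task={"instructions": "Write an argumentative essay.", "time_limit_minutes": 45},
)
print(missing_sections(record))
```

A check of this kind hints at how a shared schema might support metadata quality and study comparability: empty sections become visible before a corpus is deposited or compared with another.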

References

Frey, J.-C., König, A., Stemle, E. & M. Paquot (2023). A core metadata schema for L2 data. Paper presented at the 32nd Conference of the European Second Language Association (EUROSLA), 30 August – 2 September 2023, University of Birmingham, UK.

König, A., Frey J.-C., Stemle, E., Glaznieks, A. & M. Paquot (2022). Towards standardizing LCR metadata. Paper presented at Learner Corpus Research 6, 22-24 September 2022, University of Padua, Italy.

Paquot, M., König, A., Stemle, E. & J.-C. Frey (2023). Core Metadata Schema for Learner Corpora, https://doi.org/10.14428/DVN/4CDX3P

Dr Magali Paquot is a permanent FNRS research associate at the Centre for English Corpus Linguistics, Institut Langage et Communication, UCLouvain, and an affiliate member of the Corpus Linguistics Lab, University of Florida. She holds a PhD in Linguistics (Université catholique de Louvain) and a degree in Natural Language Processing (Université de Liège). Her research interests include (but are not limited to) corpus linguistics, learner corpus research, vocabulary, phraseology (collocations, lexical bundles, …), pedagogical lexicography, electronic lexicography, terminology, EAP (English for Academic Purposes), ESP (English for Specific Purposes), EFL (English as a Foreign Language), SLA (Second Language Acquisition), linguistic complexity and L1 influence.

This online event is organized by the Universidad de Murcia and the E020-07 research group (Lenguajes de especialidad, corpus lingüísticos y lingüística inglesa aplicada a la ingeniería del conocimiento).

Coordination: Prof Pascual Pérez-Paredes & Dr Carlos Ordoñana Guillamón

Contrastive approaches in corpus linguistics research

Dr Niall Curry, Manchester Metropolitan University

October 11, 18:00 (Madrid time) / 17:00 (UK time)

This talk is part of the Corpus linguistics & applied linguistics research 2023 online event.

Registration: https://umurcia.zoom.us/webinar/register/WN_d68rw3V_TnOGNWDg6sXHnw

Abstract

Comparability is a core criterion underpinning corpus linguistics research. From using a reference corpus to determine keywords to comparing across time, space, and language, corpus linguistics often draws on different data sets to tell us what is special about the language we are studying. This view has become so naturalised within corpus linguistics methodologies that discussions of comparability in corpus research are quite uncommon. This challenge of addressing comparability is long-standing in fields like contrastive analysis, which came to prominence and fell into decline owing to advances and limitations in methodological approaches, in part related to issues of comparability. In its most recent rise, as corpus-based contrastive linguistics, research has sought to merge contrastive and corpus linguistics approaches to address the weaknesses identified in contrastive analysis methodologies and enhance perspectives on comparability in corpus linguistics research. Merging contrastive and corpus linguistics approaches, this talk presents case studies with a view to interrogating issues of comparability in corpus analysis and establishing theoretical bases from which to draw meaningful comparisons across multilingual discourses. Specifically, the talk sheds light on the methodological pitfalls we encounter in comparing corpora representing a range of contexts and variables, the impact that our methods of analysis can have on our findings, and the importance of contextually situating contrastive studies from epistemological and ontological perspectives. The findings of the talk are intended to offer points of reflection for anyone applying contrastive approaches in corpus linguistics research, both across languages and across language varieties.

Dr Niall Curry is a Senior Lecturer in TESOL and Applied Linguistics within the Department of Languages, Information and Communications, at Manchester Metropolitan University. Currently, he is researching language relating to global crises and global issues. He is particularly interested in investigating how knowledge of these issues and crises is socially and discursively constructed across contexts, times, languages, and cultures with a view to understanding better how global issues vary across local contexts, and for international and local audiences. His areas of focus include (but are not limited to) issues such as climate, health, economics, and education. In parallel, Niall is conducting research on applied linguistics and TESOL-related issues, spanning foci on register, genre, metadiscourse, materials development, and digital pedagogies.

Multiple correspondence analysis and corpus linguistics research

Dr Isobelle Clarke, Lancaster University

October 25, 17:30 (Madrid time) / 16:30 (UK time)

This talk is part of the Corpus linguistics & applied linguistics research 2023 online event.

Registration: https://umurcia.zoom.us/webinar/register/WN_s0aPEXAFTQe_App0qS7Erg

Abstract

In this talk, I will describe what Multiple Correspondence Analysis (MCA) is and how it can be used for the Multi-Dimensional Analysis of short texts, as well as for corpus-assisted discourse analysis in an approach called Keyword Co-occurrence Analysis, drawing on the results of my own research on tweets (Clarke and Grieve, 2019; Clarke, 2022) and discourses of Islam in the UK press (Clarke et al., 2021, 2022). I will then go on to demonstrate how the results can be used to track communicative functions and discourses over time in diachronic analyses. Finally, I will discuss the limitations of MCA in these tasks.

Dr Isobelle Clarke’s research interests include corpus linguistics, forensic linguistics, sociolinguistics, news discourse and discourse analysis. Her previous research covers language variation on social media, especially Twitter, and authorship analysis. Her current research examines the representation of Islam in the press, second learner language and spoken language. Dr Clarke received a Leverhulme early career researcher fellowship to investigate anti-science discourses, such as anti-vaccination discourse, climate change denial, and anti-GMO discourse.
