The Core Metadata Schema for L2 data

The Core Metadata Schema for L2 data: Collaborative efforts towards improved data findability, metadata quality and study comparability in L2 research

Dr Magali Paquot, UCLouvain

October 30, 18:00 (Madrid time) / 17:00 (UK time)

Registration: https://umurcia.zoom.us/webinar/register/WN_a6Wkw7llSG2HrvJ9yIGKvQ

You can check out the 2021 and 2022 talks here:

https://www.youtube.com/channel/UCKjKIIQL6u1mXD2V9ZaT-_Q/featured

Abstract

The Core Metadata Schema for L2 data consists of a comprehensive set of variables that encapsulate crucial information about L2 data. It is organized into several sections that describe specific aspects of a learner corpus. These include administrative details (e.g. authors or license), corpus design, text-related variables, learner-related variables, in-built annotation (e.g. details about manual or automatic annotation), information about annotators or transcribers (e.g. native language or language repertoire) and task-related details (e.g. instructions, time constraints) (Paquot et al., 2023). It is the result of extensive collaboration between learner corpus compilers at the Centre for English Corpus Linguistics (UCLouvain, Belgium) and EURAC Research (Bolzano, Italy), and a research data infrastructure expert and member of CLARIN’s metadata taskforce (König et al., 2022; Frey et al., 2023).

In this presentation, I will discuss the underlying rationale for the development of such a resource and present its second version. This will give me the opportunity to clarify how we have tried to involve learner corpus researchers in this initiative and to reiterate our hope that the learner corpus research (LCR) community will collaborate with us to refine the schema and align it with the evolving needs of the field.

References

Frey, J.-C., König, A., Stemle, E. & M. Paquot (2023). A core metadata schema for L2 data. Paper presented at the 32nd Conference of the European Second Language Association (EUROSLA), 30 August – 2 September 2023, University of Birmingham, UK.

König, A., Frey, J.-C., Stemle, E., Glaznieks, A. & M. Paquot (2022). Towards standardizing LCR metadata. Paper presented at Learner Corpus Research 6, 22–24 September 2022, University of Padua, Italy.

Paquot, M., König, A., Stemle, E. & J.-C. Frey (2023). Core Metadata Schema for Learner Corpora. https://doi.org/10.14428/DVN/4CDX3P

Dr Magali Paquot is a permanent FNRS research associate at the Centre for English Corpus Linguistics, Institut Langage et Communication, UCLouvain, and an affiliate member of the Corpus Linguistics Lab, University of Florida. She holds a PhD in Linguistics (Université catholique de Louvain) and a degree in Natural Language Processing (Université de Liège). Her research interests include (but are not limited to) corpus linguistics, learner corpus research, vocabulary, phraseology (collocations, lexical bundles, …), pedagogical lexicography, electronic lexicography, terminology, EAP (English for Academic Purposes), ESP (English for Specific Purposes), EFL (English as a Foreign Language), SLA (Second Language Acquisition), linguistic complexity and L1 influence.

This online event is organized by the Universidad de Murcia and the E020-07 research group (Lenguajes de especialidad, corpus lingüísticos y lingüística inglesa aplicada a la ingeniería del conocimiento).

Coordination: Prof Pascual Pérez-Paredes & Dr Carlos Ordoñana Guillamón