Breaking ground: Discussing the present and the future of Data-driven learning – Research seminar

December 4, 2023, 10:00 – 13:00 (Madrid/Paris/Rome time)

Click below to register. It’s free.

https://umurcia.zoom.us/webinar/register/WN_DuBwB0RnSUGzGfsxJ2i8jw#/registration

Scroll down for abstracts and speakers.

Download the webinar schedule.

Data-driven learning (DDL) is an approach to teaching and learning foreign languages in which learners are positioned as discoverers of linguistic patterns and rules (Johns, 1991: 2) by analyzing samples of authentic language, generally through the use of corpora.

DDL has gained traction in the last 30 years, as evidenced by an ever-growing body of research (Boulton & Vyatkina, 2021; Pérez-Paredes, 2022) highlighting DDL’s promotion of autonomous, personalized learning through the inductive exploration of authentic language. However, recent literature has identified issues with the consolidation of DDL as a major area of practice in CALL-related research (Pérez-Paredes, 2019), namely: (a) DDL needs to expand its toolset beyond hard-to-get, hard-to-use corpora; (b) DDL research seems too reliant on short-term, small-scale, quantitative analyses constrained to tertiary education students (Chambers, 2019; O’Keeffe, 2021a, 2021b; Meunier, 2020, 2022; Pérez-Paredes, 2019); (c) even though there have been attempts at connecting DDL with constructivism, usage-based theory, or sociocultural theory (Cobb, 1999; Flowerdew, 2015; Kirschner et al., 2007; Swain, 2006; Tomasello, 2003), there has been little direct exploration of the theoretical implications; and (d) there is still much work to be done in DDL based on LOTEs (Languages Other Than English). Addressing all of these would help DDL to reach a much-needed broader audience (Vyatkina, 2020). New perspectives are required on DDL methodology, its theoretical underpinnings and its coverage of languages other than English. A conversation is necessary to push forward new ideas: by exchanging views and discussing with experts we aim to construct a new DDL, built upon what has been done before but with an outlook to the future.


This online seminar aims to provide a stage for new voices to share their experiences of DDL research and their vision for its development. It is the result of a collaboration between ATILF/University of Lorraine and the University of Murcia to bring together innovative research from current and future specialists in DDL. The webinar will take place on 4th December 2023 and will consist of a 2-hour session in which early-career researchers will present their novel takes on DDL, followed by a 1-hour roundtable with experienced researchers (Alex Boulton, U. Lorraine; Elisa Corino, U. Torino; and Pascual Pérez-Paredes, U. Murcia) and everyone involved in the webinar.

ABSTRACTS

Using DDL with large language models to increase linguistic input at lower proficiency levels (Szilvia Szita)

This contribution presents some possibilities for using large language models in DDL.

The principal aim of DDL is to offer learners opportunities to observe language in use so that they become more competent language users. Corpora and related tools ensuring “condensed exposure to language” (Gabrielatos, 2005) reveal repetitions and variations across numerous speakers’ language use. However, learners of less commonly taught, morphologically complex languages, such as Hungarian, cannot benefit from fully authentic large datasets at lower proficiency levels.

This presentation will show that large language models such as ChatGPT, trained on huge amounts of authentic data, can significantly increase linguistic input for such languages. Since language learning usually starts with a topic (e.g., Braun, 2010), very small corpora of 5 to 10 short texts about the same topic can be generated at the learners’ proficiency level. Repetitions and variations in these natural-sounding texts draw learners’ attention to topic- and genre-related vocabulary. Since the datasets are small, they can be analysed either manually or with corpus tools.

Working with such texts right from the outset of language learning provides a good introduction to the concept of corpora and raises learners’ awareness of what they can observe in large linguistic datasets. Furthermore, several texts on the same topic provide more opportunities to encounter similar vocabulary items in context and help with their memorisation.
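As a rough illustration of how such a small topic corpus could be explored with a simple tool, the sketch below counts the word forms that recur across texts and prints keyword-in-context lines. The three English sentences and the function names `recurring_words` and `kwic` are invented for this example; they are not taken from the presentation and are not tied to any particular corpus tool.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: three short texts on one topic (invented examples).
TEXTS = [
    "I go to the market on Saturday. The market is near my house.",
    "On Saturday we buy bread at the market.",
    "My sister goes to the market to buy fruit and bread.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def recurring_words(texts, min_texts=2):
    """Map each word form to the number of texts it occurs in,
    keeping only forms found in at least `min_texts` texts."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))
    return {word: n for word, n in doc_freq.items() if n >= min_texts}

def kwic(texts, keyword, width=3):
    """Return keyword-in-context lines with up to `width` words on each side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}".strip())
    return lines

if __name__ == "__main__":
    print(recurring_words(TEXTS))
    for line in kwic(TEXTS, "market"):
        print(line)
```

With texts this short, the same comparison can of course be done by hand; a script of this kind only becomes useful as the number of generated texts grows.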

References

Braun, S. (2010). Getting past Groundhog day. Spoken multimedia corpora for student-centred corpus exploration. In Harris, T. & Moreno Jaén, M. (Eds.), Corpus linguistics in language teaching. Peter Lang, 75–98.

Gabrielatos, C. (2005). Corpora and language teaching: Just a fling or wedding bells? Teaching English as a Second Language – Electronic Journal, 8(4), 1–35.

Data-driven learning: Teaching French adjective placement based on Bayesian logistic regression (Jarvis Looi)

French adjectives can be anteposed or postposed with or without semantic changes. This paper exemplifies how adjective placement can be taught and learned using the techniques of data-driven learning (DDL), informed by a Bayesian mixed-effects binomial logistic regression model. This paper proposes DDL as an alternative, an inductive method allowing learners to explore authentic language data, extract rules (cf. Johns, 1991) and learn lexico-grammatical features (Flowerdew, 2015). Using ancien, an adjective of the “third type” (cf. Schnedecker, 2002), as an example, we employ statistical methods to study how ancien is placed and used. Concordances are extracted from the reference corpus, Corpus d’Étude pour le Français Contemporain (CEFC), and manually annotated for seventeen variables of different natures (e.g., semantic, phonetic). Then, logistic regression is performed to study the associations between the predictors and the response (Gelman et al., 2020). This quantitative method reveals both the general trends of association and the predictors with the strongest influence on the placement of ancien. Insights extracted from these analyses feed into the creation of exercises and activities in the form of paper-based DDL, covering areas such as placement, collocations and colligations.

Keywords: data-driven learning, French adjective placement, FLE, corpus, statistics.

DDL and automatic formative assessment for mathematics language learning (Cecilia Fissore)

Numerous studies in mathematics education have shown that the causes of disciplinary learning difficulties lie in the acquisition, understanding and management of one’s own language for specific purposes (LSP). DDL was born for the teaching and learning of foreign languages, but the exploration of corpora can effectively support reflection on LSP, even when it comes to the L1. I will present a research activity that combines DDL with automatic formative assessment in a digital learning environment in order to develop the language competences of Italian high school students in Mathematics. Some examples of corpus-based activities with automatic formative assessment carried out by students will be shown. These activities are adaptive questions with personalised, immediate and interactive feedback, providing information not only on how the DDL task was performed, but also on the process to be mastered, thus enabling self-regulation and self-monitoring of actions. The DDL approach provides students with the linguistic key to the content and, in the case of Mathematics, is proving effective in helping them to understand and manage an LSP.

Unveiling the processes of corpus use in learning evaluative language in scientific research articles: A case study in the EFL context in China (Jenny Lin Jiang)

To bridge the gap between numerous corpus-based studies on evaluative language in academic writing and limited empirical research into pedagogical applications of DDL to teaching it, this research explores corpus use in teaching evaluative language in scientific research articles in the EFL context in China. An empirical study was conducted with a class of Chinese EFL postgraduates during a 16-session course over 8 weeks at a university in China. The intervention employed a range of corpus tools, including a web-based concordancer, AntFileConverter, AntConc, TagAnt and AntMover, to guide students to compare evaluative language use in published research articles in their fields and in their own essays at both the lexico-grammatical and discoursal levels. In this presentation, the presenter will report the results concerning the processes of corpus use, one of the three dimensions (perceptions, processes, products) of corpus use explored in this research. The findings from the analysis of search logs and screen recordings as well as the stimulated recalls of search logs, screen recordings and essays may shed some light on the application of corpus tools to teaching rhetorical functions in academic writing and the integration of corpus materials into regular EAP courses.

Data-Driven Learning on the acquisition of Spanish lexical items: near-synonyms and multiwords (Malena Abad Castelló)

While Data-Driven Learning (DDL) has proved to be an effective approach in language learning, empirical research in the field of Spanish as a foreign language (SFL) is scarce. This paper reports a study exploring the effects of DDL on the acquisition of lexical units by older British students of Spanish as well as their perception of this methodology. The target forms were five types of lexical items: near-synonyms, collocations, discourse markers, idioms and institutionalised expressions. 

Twenty-six students (from three different groups) took part in a longitudinal case study in two phases over three terms. In the first phase, students worked over two terms on the target content with both direct and indirect use of two corpora (CORPES XXI and Corpus del Español). A pre-test and a post-test were administered at the beginning and the end, respectively, of each of these two terms. In addition, students answered a questionnaire at the beginning of the study and at the end of each term. In the second phase, in the third term, students used corpora independently. Subsequently, five students volunteered to take part in one-to-one interviews about their perception of corpora as a learning and reference tool.

Quantitative and qualitative data were analysed after the treatment with encouraging results and positive reception from the students.

Mobile assisted language learning and DDL: activities for young learners of Italian (Alessandra Cacciato)

The data-driven learning approach exploits the use of software-based tools and the affordances of new technologies, including mobile devices. The penetration of mobile devices into daily life is increasing, and mobile-assisted language learning (MALL) research is vibrant and expanding. However, despite several calls in this direction, mobile DDL remains almost unexplored and few researchers have investigated its potential (Quan, 2016; Pérez-Paredes et al., 2019).

To address this topic, a mobile DDL application for Italian in French secondary schools is currently under development and it will be piloted in 2024. The project aims to present an innovative tool to promote individual and personalized learning and to implement an active and learner-centred approach.

This presentation shows some mobile DDL activities, delving into the development of pedagogical interfaces for mobile language learning applications. The activities target high school students enrolled in the EsaBac program, who aim to obtain both French and Italian diplomas and to progress from an A2 to a B2 level of competence in Italian, according to the CEFR. Therefore, to maximize the benefits of DDL, the application is designed to be consistent with the school curriculum and to give learners significant autonomy in their progress.

References

Pérez-Paredes, P., Guillamón, C. O., Van de Vyver, J., Meurice, A., Jiménez, P. A., Conole, G., & Hernández, P. S. (2019). Mobile data-driven language learning: Affordances and learners’ perception. System, 84, 145–159.

Quan, Z. (2016). Introducing ‘mobile DDL (data-driven learning)’ for vocabulary learning: An experiment in the context of EAP (English for Academic Purposes). Journal of Computers in Education, 3(3), 273–287.

SPEAKERS

Malena Abad Castelló is a teacher and teacher trainer at Instituto Cervantes Manchester, UK, with headquarters in Spain. She holds a PhD in Applied Linguistics from the Universidad Internacional Iberoamericana, México. Her research focuses on the use of corpora in language learning and teaching. Other interests include teacher training, affective elements in language learning and vocabulary learning and teaching.

Alessandra Cacciato is a PhD candidate in Applied Linguistics at the University of Lorraine, France and ATILF (Analyse et Traitement Informatique de la Langue Française, France). She is part of the research group Corpus & didactique des langues and her research focuses on mobile assisted language learning and data-driven learning for Italian with pre-tertiary learners.

Cecilia Fissore holds a degree in Mathematics and she is a researcher in Mathematics at the Department of Molecular Biotechnology and Health Sciences of the University of Turin. She collaborates on numerous research projects and she studies innovative methodologies for STEM education: automatic formative assessment with interactive feedback and data-driven learning for the study of specialised languages; collaborative learning in a Digital Learning Environment; problem solving with an Advanced Computing Environment. She is a member of the DELTA – Digital Education for Learning and Teaching Advances – Research Group of the University of Turin.

Jenny Lin Jiang is a PhD candidate at the University of Cambridge. Her research interests include corpus linguistics, data-driven learning, English for academic purposes, and vocabulary acquisition. Her PhD research explores corpus use in teaching evaluative language in scientific research articles to postgraduates in the EFL context in China.

Jarvis Looi is a PhD candidate on a dual PhD programme between the University of Lorraine, France and the University of Malaya, Malaysia. His research focuses on data-driven learning (DDL) and corpus linguistics in French and English. His research explores the effects of DDL on the learning of epithet adjective placement among learners of French as a foreign language.

Carlos Ordoñana Guillamón holds a PhD in applied linguistics and is a part-time lecturer at the University of Murcia. His main interests cover the use of corpora for language education, data-driven learning, and the application of new technologies in the classroom. He has published in international journals such as System and Computer Assisted Language Learning.

Szilvia Szita currently works at the Faculty of Languages of the University of Strasbourg, France, where she teaches German, Hungarian and linguistics. She is a member of the LILPA (Linguistics, Language and Speech) research group, with a keen interest in new technologies and corpora for pedagogical purposes. She is also involved in the construction of several pedagogical corpora for Hungarian and German. Finally, she writes manuals and grammars for German and Hungarian and trains language teachers.

Research seminar coordinators: Dr Carlos Ordoñana Guillamón (U. Murcia) & Alessandra Cacciato (ATILF/U. Lorraine).

Organising committee:

Dr Carlos Ordoñana Guillamón (U. Murcia)

Alessandra Cacciato (U. Lorraine)

Prof Alex Boulton (U. Lorraine)

Prof Pascual Pérez-Paredes (U. Murcia)

Dr Pilar Aguado (U. Murcia)

This online seminar is hosted and organized by Universidad de Murcia and Université de Lorraine.

Free registration

https://umurcia.zoom.us/webinar/register/WN_DuBwB0RnSUGzGfsxJ2i8jw#/registration

The presentations, Q&A and roundtable will be in English.