Published in Zeitschrift für Interkulturellen Fremdsprachenunterricht, 2022
The assessment of written performance in German as a foreign language (GFL) instruction is often based on numerical scales defined by descriptive rubrics. Despite the widespread use of rubric-based human assessment, little is known about which linguistic features influence human ratings. The present study analyses a longitudinal GFL corpus of 60 texts that were rated on rubrics for cohesion and coherence. The human ratings were related to over 200 automatically computed linguistic features by means of correlation analysis and a linear regression model. The results show that texts with a high cohesion score display a high degree of semantic and conjunctive relatedness and make increased use of coreferential cohesion through overlapping stems. A number of texts, however, did not fit this model. To investigate additional explanatory factors, an in-depth analysis of several texts was conducted, targeting coreference relations at the text level. This approach yielded promising results in terms of explanatory power. The results are compared with previous research on English as a foreign language (EFL), and implications for writing pedagogy are discussed.
Recommended citation: Wedig, H. & Strobl, C. (2022). Die Bewertung von Kohärenz und Kohäsion in narrativen DaF-Texten. Eine korpusbasierte Untersuchung sprachlicher Einflussfaktoren. Zeitschrift für Interkulturellen Fremdsprachenunterricht, 27(1), 369–396. https://ojs.tujournals.ulb.tu-darmstadt.de/index.php/zif/article/view/1172/1167
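The correlation-plus-regression step described in this abstract can be illustrated with a minimal sketch. It assumes a single hypothetical feature (connective density) and invented toy values; the study itself worked with over 200 features and its own tooling, which this sketch does not reproduce. Pearson's r and an ordinary least-squares line are computed from first principles so the example runs on the standard library alone.

```python
"""Sketch: relating rubric scores to one automatically computed feature.

Illustrative only. The feature and all numbers are invented toy values,
not data from the study.
"""
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Hypothetical feature: connectives per 100 tokens, one value per text.
connective_density = [2.1, 3.4, 2.8, 4.0, 3.1]
# Hypothetical cohesion rubric scores on an assumed 1-5 scale.
cohesion_score = [2.0, 4.0, 3.0, 5.0, 3.0]

r = pearson_r(connective_density, cohesion_score)
intercept, slope = fit_line(connective_density, cohesion_score)
print(f"r = {r:.2f}; predicted score = {intercept:.2f} + {slope:.2f} * density")
```

With many candidate features, a multiple regression (for example via statsmodels or scikit-learn) would take the place of `fit_line`; the sketch keeps to one predictor for readability.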
Published in Eurac Research CLARIN Centre, 2023
The Beldeko (Belgisches Deutschkorpus) Summary Corpus is a learner corpus of summaries written by advanced L2 German learners (CEFR level B2–C1) with L1 Dutch. It was created to investigate the academic writing skills in L2 German of third-year students in two bachelor's programmes (Applied Linguistics; Linguistics and Literature). The corpus consists of 301 summaries (70,774 tokens) written by 115 students from three intact classes (convenience sampling).
Recommended citation: Strobl, C. & Wedig, H. (2023). Beldeko Summary Corpus v1.1.0, Eurac Research CLARIN Centre, http://hdl.handle.net/20.500.12124/68
Published in Bochumer Linguistische Arbeitsberichte (BLA), 2024
To date, very few German L1 corpora contain texts by young schoolchildren, and these are often not freely available or come in a wide variety of formats with annotations of varying detail, which hampers research on the acquisition of written language. The aim of this project was therefore to create a German children's reference corpus with texts by and for primary school children, drawn from three large German L1 corpora (Osnabrücker Bildergeschichtenkorpus, H1 Children’s Writing Korpus, Litkey-Korpus) and two internet resources (Grundschulwiki, Klexikon). The five subcorpora were semi-automatically enriched with numerous linguistic annotations (transcriptions, orthographic and grammatical target hypotheses, POS tags, dependency relations, sentence boundaries, direct speech, phonemes, graphemes, syllables, morphemes, spelling errors, metadata) and stored uniformly in the LearnerXML format, which was extended for this purpose. This documentation gives an overview of the various processing steps carried out in the project and their results. It also includes instructions on how to add further data to the corpus.
Recommended citation: Ortmann, K. & Wedig, H. (2024). KidRef. Ein Kinderreferenzkorpus. Bochumer Linguistische Arbeitsberichte (BLA), 26. https://www.linguistics.ruhr-uni-bochum.de/forschung/arbeitsberichte/26.pdf
Published in Eurac Research CLARIN Centre, 2024
The GerSumCo (German Summary Corpus) is a learner corpus comprising syntheses written by L2 German writers (CEFR B2/C1) and L1 German writers. It was created to enable a comparative analysis of the academic writing of L1 and L2 German students. The two subcorpora (L1 and L2) contain a total of 286 texts (178 L1 and 108 L2), written by 286 students at 14 universities and language schools in Germany (Bamberg, Bochum, Dresden, Hamburg, Hildesheim, Kiel, Leipzig, Magdeburg, Osnabrück, Potsdam, Trier, Wuppertal), Poland (Gdansk) and China (Hangzhou). The texts were collected between 2022 and 2024 as part of a PhD project that uses GerSumCo and Beldeko in a contrastive interlanguage analysis to identify L1-dependent features of cohesion in L2 and L1 German.
Recommended citation: Wedig, H. & Strobl, C. (2024). German Summary Corpus (GerSumCo) v1.0.0, Eurac Research CLARIN Centre, http://hdl.handle.net/20.500.12124/81
Published in Korpora Deutsch als Fremdsprache (KorDaF), 2024
Connectives function as a central cohesive device and serve to link propositions explicitly. Previous studies of English as a foreign language point to a task-type-specific use of such cohesive devices, in particular of connectives, which can also be expected in texts by learners of German as a foreign language (GFL learners). However, it has not yet been investigated which specific preferences in the use of connectives are found in argumentative and descriptive texts by advanced GFL learners. This article explores this question by means of a corpus-based study: a total of 212 texts by adult GFL learners (CEFR levels B2 and C1) and by writers with German as their first language were analysed, taken from the Falko essay corpus (argumentative texts) and the GerSumCo summary corpus (descriptive texts). The results of this pilot study show that individual connectives are used in a task-type-specific way and that learner-specific connective use occurs.
Recommended citation: Wedig, H., Amet, B., Goschler, J. & Strobl, C. (2024). Aufgabentyp-spezifischer Konnektivgebrauch in schriftlichen Texten von DaF-Lernenden. Eine korpusbasierte Untersuchung. Korpora Deutsch als Fremdsprache, 4(2), 126–148. https://doi.org/10.48694/kordaf.4129
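The kind of frequency comparison behind such a study can be sketched as counting connectives per 1,000 tokens in each subcorpus. This is a minimal illustration under stated assumptions, not the authors' procedure: the connective list is a tiny hypothetical sample, tokenisation is a naive regular expression, and multi-word or discontinuous connectives (e.g. *zwar ... aber*) are ignored.

```python
"""Sketch: normalised connective frequencies per subcorpus.

CONNECTIVES is a hypothetical, deliberately tiny sample; a real study
would rely on a curated connective inventory and proper tokenisation.
"""
import re
from collections import Counter

CONNECTIVES = {"aber", "denn", "deshalb", "jedoch", "weil", "außerdem", "zudem"}


def connective_rates(texts: list[str], per: int = 1000) -> dict[str, float]:
    """Frequency of each attested connective per `per` tokens in `texts`."""
    counts: Counter = Counter()
    n_tokens = 0
    for text in texts:
        # Naive tokenisation: lowercased runs of word characters
        # (Unicode-aware in Python 3, so umlauts and ß are kept).
        tokens = re.findall(r"\w+", text.lower())
        n_tokens += len(tokens)
        counts.update(t for t in tokens if t in CONNECTIVES)
    return {c: counts[c] / n_tokens * per for c in sorted(counts)}


# Invented toy sentences standing in for the two task types.
argumentative = ["Ich bin dagegen, weil es teuer ist. Außerdem fehlt die Zeit."]
descriptive = ["Der Text beschreibt zwei Varietäten. Zudem nennt er Beispiele."]
print(connective_rates(argumentative))
print(connective_rates(descriptive))
```

Differences between the two resulting profiles are where task-type-specific preferences would surface; with real data they would still need to be tested for significance.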
Published in Journal of Open Research Software, 2024
Establishing the phonological Levenshtein distance (PLD) of words and pseudowords is useful for various psycholinguistic research applications, such as generating stimuli for experiments on language processing or analysing the PLD between erroneous and intended utterances. PLD2flex is a tool for establishing the PLD of (pseudo)words in pairwise, one-to-many and many-to-many comparisons of orthographic word forms, including one-to-many comparisons of (pseudo)word forms with databases of words generated from corpora. In particular, one-to-many comparisons can be used to determine the average distance of a word to its 20 closest neighbours (PLD20), analogous to the OLD20 measure for written words. PLD2flex makes use of the BAS Web Services API and several third-party Python libraries. PLD2flex is freely available at https://github.com/FelixTheodor/PLD2flex.
Recommended citation: Wedig, H., Theodor, F., Wieler, J. & Belke, E. (2024). PLD2flex: Establishing the Phonological Levenshtein Distance for Pairs or Groups of (Pseudo)Words. Journal of Open Research Software, 12(1). https://doi.org/10.5334/jors.510
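The two measures named in this abstract can be sketched as follows: a Levenshtein distance computed over sequences of phoneme symbols rather than letters, and a PLD20-style score as the mean distance to the *n* closest lexicon entries. This is not PLD2flex's code. It assumes unit costs for insertion, deletion and substitution, and it takes phonemic transcriptions as given, whereas PLD2flex obtains them via the BAS Web Services; the toy transcriptions, the `lexicon` dictionary and the function names are hypothetical.

```python
"""Sketch: phonological Levenshtein distance (PLD) and a PLD20-style mean.

Transcriptions are hypothetical, hand-written phoneme lists; unit edit
costs are an assumption of this sketch.
"""
import heapq


def levenshtein(a: list[str], b: list[str]) -> int:
    """Unit-cost edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, sa in enumerate(a, start=1):
        curr = [i]
        for j, sb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete sa
                            curr[j - 1] + 1,           # insert sb
                            prev[j - 1] + (sa != sb)))  # substitute or match
        prev = curr
    return prev[-1]


def pld_n(word: str, phonemes: list[str],
          lexicon: dict[str, list[str]], n: int = 20) -> float:
    """Mean PLD from `phonemes` to its n closest lexicon entries (PLD20 if n=20).

    The entry for `word` itself is skipped, so a word is not its own neighbour.
    """
    distances = [levenshtein(phonemes, p) for w, p in lexicon.items() if w != word]
    if not distances:
        raise ValueError("lexicon has no candidate neighbours")
    nearest = heapq.nsmallest(n, distances)
    return sum(nearest) / len(nearest)


# Hypothetical mini-lexicon (orthographic form -> phoneme symbols).
lexicon = {
    "Haus": ["h", "aU", "s"],
    "Maus": ["m", "aU", "s"],
    "raus": ["r", "aU", "s"],
    "Hose": ["h", "o:", "z", "@"],
    "Hut": ["h", "u:", "t"],
}
print(levenshtein(lexicon["Haus"], lexicon["Maus"]))  # -> 1
print(pld_n("Gaus", ["g", "aU", "s"], lexicon, n=3))  # pseudoword -> 1.0
```

With a corpus-derived word list as `lexicon` and `n=20`, `pld_n` yields the PLD20 analogue under this sketch's assumptions; if the lexicon has fewer than `n` other entries, the mean is taken over those available.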
Published in Continuing Learner Corpus Research: Challenges and Opportunities, 2025
Recommended citation: Wedig, H., Strobl, C., Ureel, J. J. J., & Mortelmans, T. (forthcoming). The use of connectives in L2 German writing by L1 Dutch students: A learner corpus study. In Katherine Ackerley & Erik Castello (Eds.), Continuing Learner Corpus Research: Challenges and Opportunities (pp. 213–243). Presses universitaires de Louvain.
Published in How to Do Things with Corpora - Methodological Issues and Case Studies. Empirical and Theoretical Linguistics, 2025
Recommended citation: Wedig, H., Strobl, C., Ureel, J. J. J., & Mortelmans, T. (2025). The Beldeko corpus as a resource to investigate cohesion in German learner language: A preliminary analysis of corpus homogeneity. In T. Leuschner, J. Barðal, G. Delaby & A. Vajnovszki (Eds.), How to Do Things with Corpora - Methodological Issues and Case Studies. Empirical and Theoretical Linguistics. J.B. Metzler. https://link.springer.com/book/9783662696897#overview
Published in Doctoral dissertation, 2025
Recommended citation: Wedig, H. (2025). A corpus-based analysis of connectives in L2 German: insights into the effect of learners' native language on academic writing in a foreign language [Doctoral dissertation, University of Antwerp]. Repository UAntwerp. https://doi.org/10.63028/10067/2123600151162165141
Abstract: The German Summary Corpus (GerSumCo) is a new corpus for contrastive research into German as a second (L2) vs. first (L1) language. GerSumCo was created to investigate cohesion in academic L2 German writing produced by advanced learners. Several corpora for the contrastive investigation of German learner language are available, targeting diverse acquisition levels, text types and L1 backgrounds (e.g., KOLAS: Knorr & Andersen, 2017; Falko: Lüdeling et al., 2008). However, although summary writing is an interesting genre for the analysis of cohesion (as seen in Walter, 2007), the only existing corpus of summaries to date is the Falko summary subcorpus (Lüdeling et al., 2008). Preliminary analyses of the Falko L2 summary subcorpus revealed a high degree of patchwriting, i.e., students copying and pasting larger chunks of the source text. Since this biases the data, we decided to compile a new summary corpus. Our corpus is distinctive in two respects. First, students created summaries from two different source texts, so they had to construct their own coherent flow, which reduces the problem of patchwriting. Second, all summaries were written on the basis of the same source texts and under comparable conditions: all students had to summarise two popular-science texts on a topic related to language variation in contemporary German (e.g., Kiezdeutsch, the Swiss dialect debate (Mundartdebatte in der Schweiz)).
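One simple way to quantify the patchwriting problem described in this abstract is to measure how many of a summary's word n-grams also occur verbatim in the source text(s). The sketch below is a hypothetical heuristic for illustration, not the procedure used for Falko or GerSumCo; the 5-gram window, the regex tokenisation and the function names are assumptions, and the measure counts distinct n-gram types rather than tokens.

```python
"""Sketch: share of a summary's word n-grams copied verbatim from its sources.

Hypothetical heuristic for illustration; n=5 and the tokenisation are assumptions.
"""
import re


def tokens(text: str) -> list[str]:
    """Lowercased word tokens (umlauts and ß count as word characters)."""
    return re.findall(r"\w+", text.lower())


def ngrams(toks: list[str], n: int) -> set[tuple[str, ...]]:
    """Distinct contiguous n-grams of a token list."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def copied_share(summary: str, sources: list[str], n: int = 5) -> float:
    """Fraction of the summary's distinct n-grams that also occur in any source."""
    summary_grams = ngrams(tokens(summary), n)
    if not summary_grams:
        return 0.0
    source_grams = set().union(*(ngrams(tokens(s), n) for s in sources))
    return len(summary_grams & source_grams) / len(summary_grams)


# Invented toy texts: one source, one partly copied summary.
source = "eins zwei drei vier fünf sechs"
summary = "eins zwei drei vier fünf sieben"
print(copied_share(summary, [source]))  # -> 0.5
```

With two source texts, both are simply passed in `sources`; a low share would suggest that the writer largely produced their own wording.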