Pre-conference workshops

About

On Sunday 2nd July 2023, we offered a range of half-day (3h), in-person, pre-conference workshops: practical sessions teaching participants useful skills in corpus linguistics such as corpus design or data analysis.

These workshops were free to registered attendees of the conference and were offered on a first-come, first-served basis. Refreshments were not provided for this day.

Workshops were scheduled to run in parallel in either a morning (9.30-12.30) or an afternoon (13.30-16.30) slot. This means that participants had the option to attend one in the morning and/or one in the afternoon.

Morning workshops (9.30am-12.30pm)

Workshop 1: What can you do with the CLARIN research infrastructure?

Darja Fišer, Francesca Frontini, Paul Rayson and Martin Wynne
Fylde C17

This workshop will focus on practical issues for how corpus linguists can benefit from the well-developed ‘Common Language Resources and Technology Infrastructure’ (CLARIN) for language as social and cultural data (https://www.clarin.eu/). The organisers will present a brief overview of the CLARIN infrastructure including the easy-to-use language resources, the knowledge infrastructure, the participating consortia from across Europe, and a large set of resource families covering corpora, lexical resources, and NLP tools for tagging and annotation of corpora (https://www.clarin.eu/resource-families).

Participants will learn how to access and search the existing corpora and tools in the CLARIN research infrastructure and how to embed or deposit their own resources and tools in one of the centres of the infrastructure (with reference to annotation, formats and standards, metadata, licensing and documentation).

The organisers will present examples of key resources and tools such as those from the ParlaMint project (https://www.clarin.eu/parlamint).

Maximum number of participants: 40 people.

Workshop 2: #LancsBox X: A new powerful desktop tool for the analysis of millions and billions of words

Vaclav Brezina and William Platt
Hannaford Lab

In this workshop, the organisers introduce #LancsBox X: a new software package designed for very large corpora (millions and billions of words) with full support for XML. In #LancsBox X, one powerful search box combines the capabilities of simple searches, CQL and smart searches in user-defined subcorpora as well as corpora provided via #LancsBox such as the BNC2014. #LancsBox X includes a large number of innovative features such as the integration of the statistical package R, a new graphic engine (with D3 graphs) and a flexible user interface with multiple resizable windows.

#LancsBox X can be used by linguists and other social scientists as well as practitioners, lexicographers and teaching material developers. It is free to use for non-commercial purposes and works with any major operating system.

Maximum number of participants: 50 people.

Afternoon workshops (1.30pm-4.30pm)

Workshop 4: Spatial Humanities: Finding spatial and time narratives in corpus data

Ignatius Ezeani, Ian Gregory and Paul Rayson
Fylde C17

This workshop will explore practical solutions for corpus linguists, digital humanists and computing researchers to study time and spatial relationships in corpora. The workshop will be led by members of the ESRC-NSF funded Space Time Narratives project (https://spacetimenarratives.github.io/), which is developing approaches that allow us to identify, extract, visualise, and analyse qualitative and quantitative references to place and time. The organisers will demonstrate how these methods are currently being applied to analyse experiences of leisure travel and forced migration, generating complex cultural and experiential geographies.

Participants will explore a web-based, hands-on tutorial using a bespoke Python Notebook and Streamlit visualisation tool developed by the project team. All code and data will be open source and open access following the workshop. Participants will learn the basic skills needed to apply these techniques to the project data and, from there, to facilitate further application on their own corpora after the workshop.

Maximum number of participants: 40 people.

Workshop 5: Keyword Co-occurrence Analysis

Isobelle Clarke
Hannaford Lab

This workshop will introduce both keywords and a new approach to their analysis. Keywords offer analytical signposts to discourses in large volumes of text. Yet their interpretation often requires analysis of their use within their wider textual settings. Clarke et al. (2021) introduce Multiple Correspondence Analysis (MCA) as a new approach to organizing keywords statistically based on their co-occurrence across the texts of the corpus. The approach overcomes many of the issues in traditional keyword analyses and has proven to be effective for providing a more nuanced account of keywords that is sensitive to the various senses and discourses that a single keyword can exhibit. This workshop will provide learners with: (1) the understanding behind the MCA approach to keywords; (2) the tools to create a data matrix of variables and individuals; (3) the ability to run MCA in R; and (4) the skills for interpreting the results.
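As a rough illustration of point (2), the sketch below builds a small data matrix in which each row is a text (an individual) and each column is a keyword (a variable) coded as present or absent. It is not the workshop's material or the implementation from Clarke et al. (2021): the function name `keyword_matrix`, the whitespace tokenisation, the "present"/"absent" coding and the toy texts are all assumptions made for this example, and Python is used here only for illustration, whereas the workshop runs MCA in R.

```python
def keyword_matrix(texts, keywords):
    """Return one row per text; each cell records whether a keyword occurs in it.

    Hypothetical helper: tokenisation is a plain lowercase whitespace split,
    which is an assumption and far simpler than real corpus tokenisation.
    """
    rows = []
    for text in texts:
        tokens = set(text.lower().split())
        rows.append(["present" if kw in tokens else "absent" for kw in keywords])
    return rows


# Toy data, invented for this sketch only.
texts = [
    "Climate change affects migration",
    "Migration policy and border control",
    "Climate policy debate",
]
keywords = ["climate", "migration", "policy"]

matrix = keyword_matrix(texts, keywords)
for row in matrix:
    print(row)
```

A table of this shape, with categorical values per keyword, is the kind of input that MCA routines generally expect; how the workshop itself prepares the matrix may differ.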

Maximum number of participants: 50 people.

Workshop 6: Corpus querying and corpus building with Sketch Engine

Ondřej Matuška
Management School A001

Sketch Engine is a leading corpus management tool hosting hundreds of preloaded corpora and thousands of corpora built by individual users. Participants in the workshop will practise using a wide selection of corpus querying and corpus building tools (word sketch, sketch difference, thesaurus, wordlist, n-grams, keyword extraction, term extraction, concordance, WebBootCaT). Special attention will be paid to the Corpus Query Language (CQL), which will be taught through a series of offline and online exercises.

The workshop is aimed at corpus novices as well as those who have some experience in using existing corpus systems. By the end of the workshop, participants will be able to analyse existing corpora as well as build their own corpora from the data they already have or by having relevant content downloaded from the web automatically. Each participant will receive free 3-month access to Sketch Engine to further improve the skills acquired during the tutorial.

Maximum number of participants: 50 people.

Corpus Linguistics 2023
