Collecting and Sharing Corpora for Language and Speech Disorders

DELAD/CLARIN workshop at the ICL Conference, Poznan

11-12 September 2024

DELAD is an initiative that facilitates the sharing of corpora of speech of individuals with communication disorders (CSD) among researchers. We do this in a GDPR-compliant way, using secure repositories in the CLARIN infrastructure. See our website.

DELAD regularly organises workshops around the following themes:

Guidelines for collecting and sharing CSD
Ethics and legal aspects
Levels of anonymisation
Layered access to data
Integration of CSD in the CLARIN infrastructure
Formats
Relevant metadata

For the themes and reports of our previous workshops, visit our website: https://delad.ruhosting.nl/wordpress/delad-workshops-2017-2020/ (2017-2020), as well as the pages for the 2021 and 2022 workshops.

We organised this workshop in conjunction with the ICL Conference in Poznan in 2024 (https://icl2024poznan.pl/). It was a hybrid workshop, held on 11 and 12 September 2024 as a lunch-to-lunch meeting.

We invited researchers working with CSD to present their work and to describe their data-sharing methods, including any obstacles they encountered.

The programme featured presentations from DELAD representatives about sharing CSD via DELAD and about recent updates, including a new CLARIN Resource Family page for corpora of communication disorders (see https://www.clarin.eu/resource-families/corpora-disordered-speech). Other topics included the metadata needed to make such datasets findable, and a panel discussion on the role that Large Language Models (such as ChatGPT) can play in our research.

The workshop was sponsored by CLARIN ERIC.

Programme for 11 and 12 September 2024:

Date	Time (CEST)	Topic	Agenda
11 Sep	14:30-14:45	Welcome & Introduction	Introduction (Henk van den Heuvel & Katarzyna Klessa)
	14:45-15:30	Recent developments at ACE / DELAD (Chair: Katarzyna Klessa)	A CLARIN Resource Family for Corpora of Communication Disorders & Questionnaire about data sharing (Henk van den Heuvel & Satu Saalasti)
	15:30-15:45	COFFEE BREAK	
	15:45-16:55	Presentations by researchers about the current status of their CSD & the potential of ACE for CSD sharing (Chair: Satu Saalasti)	20-minute presentations & 10-minute discussion: – “Challenges in data sharing from a clinical perspective: a use case of voice data from patients with COPD” (Loes van Bemmel) – “Using a portable system for multi-channel audio data acquisition and processing” (Anita Lorenc et al.) – “Corpus-based research into intra- and interpersonal language variation in people with aphasia” (Marina Ruiter et al.)
	16:55-17:10	BREAK	
	17:10-18:20	Presentations by researchers about the current status of their CSD & the potential of ACE for CSD sharing (Chair: Henk van den Heuvel)	20-minute presentations & 10-minute discussion: – “Dysarthric speech database in Dutch and English for personalized dysarthric speech recognition” (YuanYuan Zhang et al.) – “The Icelandic Language Biobank: Data Collection through a Clinical Analysis Platform” (Iris Nowenstein et al.) – “STAR – A Speech Therapy Animation and imaging Resource” (Eleanor Lawson et al.)
12 Sep	10:00-10:15	COFFEE BREAK at ICL	
	10:15-10:30	Welcome	Welcome & wrap-up of Day 1 (Satu Saalasti)
	10:30-11:00	Presentations by researchers about the current status of their CSD & the potential of ACE for CSD sharing (Chair: Katarzyna Klessa)	20-minute presentation & 10-minute discussion: – “Sensitive Data in HPC – How secure can it be?” (Matthiesen)
	11:00-12:00	The impact of AI on research and treatment of language & speech impairments (Chair & Moderator: Henk van den Heuvel)	30-minute introduction (Zhengyun Yue) & 30-minute panel discussion
	12:00-12:15	Conclusion of Workshop	Wrap-up
	from 12:15	LUNCH	

All workshop material, including recordings of the sessions (one per day), is published on Zenodo: https://doi.org/10.5281/zenodo.13970356