Collecting and Sharing corpora for language and speech disorders
DELAD/CLARIN workshop at the ICL conference Poznan
11-12 September 2024
DELAD is an initiative that facilitates the sharing of corpora of speech of individuals with communication disorders (CSD) among researchers. We do this in a GDPR-compliant way, using secure repositories in the CLARIN infrastructure. See our website.
DELAD regularly organises workshops around the themes:
- Guidelines for collecting and sharing CSD
- Ethics and legal aspects
- Levels of anonymisation
- Layered access of data
- Integration of CSD in the CLARIN infrastructure
- Formats
- Relevant metadata
For themes and reports of our previous workshops, visit our website: https://delad.ruhosting.nl/wordpress/delad-workshops-2017-2020/ (2017-2020), and the pages for 2021 and 2022.
We organised this workshop in conjunction with the ICL Conference in Poznan in 2024: https://icl2024poznan.pl/. It was a hybrid workshop held on 11 and 12 September 2024 as a lunch-to-lunch meeting.
We invited researchers working with CSD to present their work and discuss their data-sharing methods, including any obstacles encountered.
The programme featured presentations from DELAD representatives about sharing CSD via DELAD and recent updates, including a new CLARIN Resource Family page for corpora of communication disorders (see https://www.clarin.eu/resource-families/corpora-disordered-speech). Other topics included the metadata deemed relevant for making such datasets findable, and a panel discussion about the role that Large Language Models (such as ChatGPT) can play in our research.
The workshop was sponsored by CLARIN ERIC.
Programme for 11 and 12 September 2024:
| Date | Time (CEST) | Topic | Agenda |
|---|---|---|---|
| 11 Sep | 14:30-14:45 | Welcome & Introduction | Introduction (Henk van den Heuvel & Katarzyna Klessa) |
| | 14:45-15:30 | Recent development at ACE / DELAD. Chair: Katarzyna Klessa | A CLARIN Resource Family for Corpora of Communication Disorders & Questionnaire about data sharing (Henk van den Heuvel & Satu Saalasti) |
| | | COFFEE BREAK | |
| | 15:45-16:55 | Presentations by researchers about current status of their CSD & potential of ACE for CSD sharing. Chair: Satu Saalasti | 20-minute presentations & 10-minute discussion: – “Challenges in data sharing from a clinical perspective: a use case of voice data from patients with COPD” (Loes van Bemmel) – “Using a portable system for multi-channel audio data acquisition and processing” (Anita Lorenc et al.) – “Corpus-based research into intra- and interpersonal language variation in people with aphasia” (Marina Ruiter et al.) |
| | | BREAK | |
| | 17:10-18:20 | Presentations by researchers about current status of their CSD & potential of ACE for CSD sharing. Chair: Henk van den Heuvel | 20-minute presentations & 10-minute discussion: – “Dysarthric speech database in Dutch and English for personalized dysarthric speech recognition” (YuanYuan Zhang et al.) – “The Icelandic Language Biobank: Data Collection through a Clinical Analysis Platform” (Iris Nowenstein et al.) – “STAR – A Speech Therapy Animation and imaging Resource” (Eleanor Lawson et al.) |
| 12 Sep | 10:00-10:15 | COFFEE BREAK at ICL | |
| | 10:15-10:30 | Welcome | Welcome & Wrap-up of Day 1 (Satu Saalasti) |
| | 10:30-11:00 | Presentations by researchers about current status of their CSD & potential of ACE for CSD sharing. Chair: Katarzyna Klessa | 20-minute presentation & 10-minute discussion: – “Sensitive Data in HPC – How secure can it be?” (Matthiesen) |
| | 11:00-12:00 | The impact of AI on research and treatment of language & speech impairments. Chair & Moderator: Henk van den Heuvel | 30-minute Introduction (Zhengyun Yue) & 30-minute Panel discussion |
| | 12:00-12:15 | Conclusion of Workshop | Wrap-up |
| | 12:15- | LUNCH | |
All material, including recordings of the sessions (per day), is published on Zenodo: https://doi.org/10.5281/zenodo.13970356