Progress Report 2019


Status DELAD

The DELAD community consists of researchers involved in collecting and analysing CDS, research data and infrastructure specialists, and legal experts.

We call these individuals DELAD partners. DELAD partners will be invited to our regular workshops to actively participate in presentations and discussions.

DELAD has chosen the CLARIN infrastructure as its primary space for storing and sharing CDS. This also means that we can apply annually for CLARIN funding for workshops.

DELAD is coordinated by a Steering Group, which currently consists of:

Link to CLARIN Knowledge Centre ACE

DELAD has established close connections with the newly acknowledged CLARIN Knowledge Centre for Atypical Communication Expertise (ACE) in order to make CDS available through The Language Archive (TLA) at the Max Planck Institute in Nijmegen, which is a CLARIN Data Centre, and through CMU’s Talkbank (Clinical Banks). These collaborations offer FAIR and safe opportunities to host and share CDS from DELAD partners. See more about this in the Infrastructure section below.

Workshop 28-30 January 2019, Utrecht

Our most recent workshop took place as a lunch-to-lunch workshop on 28-30 January 2019 in Utrecht. It was funded by CLARIN as a Type II workshop, which also provides a budget for 3 PM of ICT support. A full report of the workshop can be found here:

The main conclusions of the workshop were: 

  1. CLARIN was reaffirmed as the Data Trust that provides the data fence around CDS.
  2. DELAD should apply to become a Task Force within CLARIN, focusing on practical issues in sharing CDS.
  3. As a result, the DELAD website should be updated with a CLARIN flag and contain relevant guidelines for collecting, sharing and storing CDS.
  4. Talkbank is seen as a good CLARIN site to host CDS, especially if a European storage cloud and a stricter access policy can be realised.
  5. Partners should work together on contributions for the CLARIN AC 2019 in Leipzig.

Items 3 and 4 were pursued with the 3 PM of ICT developer time.

New website

The DELAD website was completely revised during November and December of this year. It now also has its own domain and logo. Comments are welcome.

Hosting and sharing data

To host data and corpora of atypical communication and make them accessible in a FAIR manner, DELAD has established a close collaboration (via the ACE centre) with The Language Archive (TLA). TLA is situated at the Max Planck Institute for Psycholinguistics (MPI) in Nijmegen. As a CLARIN B Centre, TLA aims to provide a unique record of how people around the world use language in everyday life. It focuses on collecting spoken and signed language materials in audio and video form, along with transcriptions, analyses, annotations and other relevant material such as photos and accompanying notes. TLA offers storage of sensitive data (speech, audio and transcripts) and supports the CMDI metadata framework. TLA also supports strong authentication procedures, layered access to data, and persistent identification.

DELAD has also entered into a close collaboration with CMU’s Talkbank / Clinical banks. Under this collaboration, data can be registered at Talkbank, with its metadata and landing page on the Talkbank website, whereas storage of the ‘raw’ data (typically audio and video) and authentication of access to it are handled at TLA. For DELAD this is a very attractive combination: data are stored and shared via CLARIN authentication and European storage, with the outreach that Talkbank offers.

Alternatively, our Finnish colleagues have started a cooperation on data hosting via remote access. The Language Bank of Finland (Kielipankki) already serves researchers and students who use text and speech corpora of typical speech. The Language Bank of Finland is hosted by CSC – IT Center for Science on servers in CSC data centers in Finland, and ways of resolving questions related to sharing sensitive data are under consideration. A pilot dataset from Satu Salaasti (University of Helsinki) containing sensitive information will be licensed and stored in a secure environment without a direct connection to the internet. Restricting the data to authorized access via a secured system will minimize the possibility of data misuse. An authorized user can access the data only on a virtual machine via Remote Desktop and will not be able to download the data.