Archiving and sharing speech data requires planning ahead, and consent from
the research participants is needed before data collection. Below is a list of
resources on constructing information sheets and/or consent forms, and links
to templates/samples of GDPR-compliant consent forms for studies that involve
sharing of speech data:
This section presents, through a case scenario, the key ethics issues that researchers may have to consider if they plan to share pathological speech data from their projects via platforms such as the one facilitated by DELAD. Please note that the example below is devised to assist the ongoing development of ethical consent and archiving practices. DELAD welcomes further discussion to elaborate the example. The ethics requirements may vary between organizations and/or countries, so please consult your local ethics committee.
Case scenario: A researcher is planning to carry out a research project that investigates articulatory errors in adolescents and young adults with cerebral palsy using both auditory-perceptual and acoustic analyses. The data will also be compared to that of a group of age- and gender-matched typical speakers to examine whether there is any difference in speech characteristics between the two groups of speakers. Before speech data collection, the hearing and visual abilities of the participants will be screened to ensure that all have adequate abilities for taking part in the subsequent speech production tasks. In addition, the participants with cerebral palsy will undertake a language test to document their language ability. The acoustic analysis of speech will be carried out by the research assistants of the project, whereas a group of typical individuals will be recruited as the listeners of the auditory-perceptual analysis of speech. The researcher also plans to have the speech data archived after the completion of this project to allow possible further research using the same set of data (e.g., analysis of voice quality) by other researchers and for education purposes in the future.
Key ethical issues relevant to this case scenario: The case scenario above captures a number of elements that are common in many research projects on speech disorders in general. Below is a list of ethical issues related to each of these elements.
Involvement of human participants:
An information sheet and a consent form are needed whenever human participants are involved in a research project.
An information sheet usually includes (but is not limited to) the following information: the aim(s) of the study; the inclusion and/or exclusion criteria regarding participant recruitment; the tasks or activities in which the participants will engage; voluntary participation and the option of withdrawing their participation or their data from the study; the types of data to be collected from each participant; how the data will be handled, and where and for how long the data will be stored; the process for anonymising or de-identifying the data; how the data will be used (e.g., results to be disseminated in research papers and conferences); the details of the plan for sharing the data set with other researchers; and the plan for secondary analysis. Some of these points are elaborated below. Additional issues will have to be considered when collecting data from children and vulnerable individuals; see the following two sub-sections.
Collecting data from individuals under the age of 18 years:
Consent from parent(s) or caregiver(s) regarding research participation of their children is needed.
Assent from children under 18 may be required as well. This requirement may vary between countries, so check with your ethics committee. Ensure that you use age-appropriate language in the information sheet and assent form for the children. Using a larger font size and including pictures (relevant to the project) may help as well.
Collecting data from vulnerable individuals:
Consent from caregiver(s) may be required as well; check with your ethics committee to find out whether it is needed.
Similar to collecting data from children, the information sheet and consent form for vulnerable individuals should be written at a linguistic level that can be understood by the participants (depending on the nature of their communication difficulties and cognitive abilities).
Handling of personal information of participants:
Research projects on speech disorders often also collect personal (e.g., date of birth, gender) and medical information (e.g., IQ score, neurological assessment results and diagnosis, medications) of the participants. In the case scenario above, the participant information also includes the results of hearing and visual ability screens and language test scores.
Researchers should only collect information that is necessary for answering the research questions. It is useful to make a list of the types of information to be collected and figure out the sources from which the information will be obtained (e.g., by asking the parent/caregiver? From some clinic record or medical report of the participants?).
Consider the method(s) for anonymizing or de-identifying the information. Not only direct identifiers need to be removed, but also indirect identifiers (e.g., date of birth) and descriptions of participants’ characteristics that in combination might produce a unique profile and thereby reveal who they are.
Think about where the hard copy and/or electronic copy of the above-mentioned information will be kept during the lifetime of the project. Consult your local ethics committee or the IT department of your institution for advice and/or approved storage platforms.
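As a rough illustration of the "unique profile" risk, the sketch below counts how many participants share each combination of indirect identifiers and flags those whose combination occurs only once. The records, the field names (age_band, gender, cp_type) and the participant codes are hypothetical and serve only to show the idea; a real check would use the variables actually planned for archiving.

```python
from collections import Counter

# Hypothetical participant records; all field names and values are illustrative.
participants = [
    {"id": "P01", "age_band": "15-17", "gender": "F", "cp_type": "spastic"},
    {"id": "P02", "age_band": "15-17", "gender": "F", "cp_type": "spastic"},
    {"id": "P03", "age_band": "18-20", "gender": "M", "cp_type": "dyskinetic"},
    {"id": "P04", "age_band": "18-20", "gender": "M", "cp_type": "spastic"},
    {"id": "P05", "age_band": "18-20", "gender": "M", "cp_type": "spastic"},
]


def unique_profiles(records, indirect_identifiers):
    """Return the ids of records whose combination of indirect identifiers is unique."""
    def profile(record):
        return tuple(record[key] for key in indirect_identifiers)

    counts = Counter(profile(record) for record in records)
    return [record["id"] for record in records if counts[profile(record)] == 1]


# Participants listed here could be singled out by the combination alone.
at_risk = unique_profiles(participants, ["age_band", "gender", "cp_type"])
print(at_risk)
```

If any participant is flagged, one option is to coarsen or drop one of the variables (for example, using wider age bands) and run the check again.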
Handling of research data:
In the case scenario above, the research data will include the participants’ speech samples, and the acoustic analysis results and auditory-perceptual ratings of the speech samples.
Think about the format in which the data will be stored, or the types of files that will be created during the project. For example, for this specific case scenario, the speech samples will probably be saved as audio files and/or video files (in electronic format). The acoustic analysis data and any files generated during the analysis process (e.g., files for annotating the speech signals; spreadsheets for summarizing the measurements or further calculations; data and result files of statistical tests) will probably be in electronic format as well. For auditory-perceptual analysis, the ratings may be in electronic format if they are collected using computer software, or in hard copy if paper response sheets are used. For each of these types of data, think about where they will be kept during the lifetime of the project.
Archiving and sharing research data with anonymized participant information:
This sub-section is for researchers who plan to make their research data and relevant anonymized participant information available, through platforms such as the one facilitated by DELAD, to colleagues outside of their research team who might be interested in using the materials for further research or education purposes.
Researchers are advised to inform the participants of their plan for data sharing in the information sheet and to ask for the participants’ consent to data sharing in the consent form. As with the option of withdrawing from participation, consider giving the participants a period (e.g., 2 weeks after participation) within which they can withdraw their consent to data sharing, and state this clearly in the information sheet as well.
Researchers are advised to give a succinct description of what type/format of research data and anonymized participant information will be shared and where these materials will be physically archived. The statements about DELAD may take the following form: “DELAD stands for Database Enterprise for Language and speech Disorders (website: http://delad.net/) that aims to provide a channel for researchers to share corpora of speech of individuals with communication disorders with educators and researchers. DELAD has linked up with the Knowledge Centre for Atypical Communication Expertise (website: https://ace.ruhosting.nl/), a K-centre of CLARIN (Common Language Resources and Technology Infrastructure; website: https://www.clarin.eu/) for archiving and sharing the speech corpora through The Language Archive (website: https://archive.mpi.nl/tla/) and/or TalkBank (website: https://talkbank.org).”
Regarding anonymized participant information, decide on the types of information that will be archived. For example, the researchers might have obtained the participants’ date of birth on the day of data collection to work out their age; but for archiving, one might decide to keep only the participants’ age or age interval (e.g., 17;0-17;11). Similarly, for test results and scoring sheets, one might decide to archive only the final judgements (e.g., passed the hearing screen) or final scores (e.g., total scores of a language test).
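The conversion from date of birth to an age interval can be sketched as follows. This is a minimal example, assuming the years;months notation used above and one-year intervals; the function names and the example dates are hypothetical, and only the resulting interval (not the date of birth) would be kept for archiving.

```python
from datetime import date


def age_years_months(date_of_birth, test_date):
    """Return the age on test_date as (completed years, completed months)."""
    months = (test_date.year - date_of_birth.year) * 12
    months += test_date.month - date_of_birth.month
    if test_date.day < date_of_birth.day:
        months -= 1  # the current month has not been completed yet
    if months < 0:
        raise ValueError("test_date is earlier than date_of_birth")
    return divmod(months, 12)


def age_interval(date_of_birth, test_date):
    """Return a one-year age interval in years;months notation."""
    years, _ = age_years_months(date_of_birth, test_date)
    return f"{years};0-{years};11"


# Illustrative dates only.
dob = date(2005, 3, 20)
tested_on = date(2022, 9, 5)
print(age_years_months(dob, tested_on))
print(age_interval(dob, tested_on))
```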
For the research data, there are some experimental or assessment tasks that may elicit certain personal information (e.g., asking the participants to talk about their voice problems, as a task to collect voice samples at connected speech level from the participants). It is possible that the personal information mentioned concerns the participants themselves or people whom they know. In such cases, apply an appropriate method to anonymise the speech data (e.g., removing the names mentioned or replacing that speech signal by a beep sound for audio files; blurring of faces in video recordings). Depending on the speech data collected, restricted access by other researchers might be considered.
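Replacing a stretch of speech with a beep can be sketched as below. This is an illustration only, assuming the recording is already available as a list of 16-bit mono PCM sample values and that the start and end times of the name were found by manual annotation; the function name, the 1000 Hz tone and the amplitude are arbitrary choices, and reading or writing the audio file itself is left out.

```python
import math


def beep_over(samples, sample_rate, start_s, end_s, freq_hz=1000.0, amplitude=0.3):
    """Return a copy of the samples with [start_s, end_s) replaced by a sine tone.

    samples: 16-bit mono PCM values (integers between -32768 and 32767).
    amplitude: tone level as a fraction of full scale (0.0 to 1.0).
    """
    start = max(0, int(round(start_s * sample_rate)))
    end = min(len(samples), int(round(end_s * sample_rate)))
    if start >= end:
        raise ValueError("segment is empty or outside the recording")
    peak = int(amplitude * 32767)
    masked = list(samples)  # leave the input untouched
    for i in range(start, end):
        t = (i - start) / sample_rate
        # Overwrite the original signal so that it cannot be recovered.
        masked[i] = int(peak * math.sin(2 * math.pi * freq_hz * t))
    return masked


# Placeholder "recording": one second of a constant value at 16 kHz.
sample_rate = 16000
speech = [1234] * sample_rate
masked = beep_over(speech, sample_rate, start_s=0.25, end_s=0.5)
```

Note that the tone overwrites the speech rather than being mixed with it, since a beep added on top of the signal could leave the name audible. Only the masked version would then be deposited.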
As stated above, datasets shared via DELAD will be archived with The Language Archive (TLA; website: https://archive.mpi.nl/tla/). Check the TLA deposit manual (https://archive.mpi.nl/tla/deposit-manual-tla) for further information on the acceptable formats for the speech data. Do the same when planning how the data will be archived after the completion of the project.
Data access level:
Think about which level of access is appropriate for the datasets (e.g., will users have to register with the data archiving platform in order to access the data? Will the users need to sign a license or data use agreement before they can download the datasets? Will that platform keep a record of who accessed which data set?). The TLA allows these various access levels. Make sure the information is stated in the consent form for the participants.
Consent from the participants (speakers or listeners) is needed for future secondary analysis. Think about who might access the data (e.g., your research students? A colleague in the same institution? A researcher outside of your institution?) and how the data might be used or analysed again. Ask the participants explicitly to give consent for using their data in further research projects. Consider including this as a separate opt-in, or ask for permission to contact the participants again regarding secondary analysis in future research projects. For the latter, the researchers will have to keep a record of the date on which the contact information was obtained from the participants.
‘Mockup’ information sheets and consent forms: This sub-section includes examples of information sheets and consent forms devised based on the above-mentioned case scenario. Please note that these documents have not been approved by any research ethics committee; they are just examples to give you an idea of what they might look like.
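One possible way of keeping track of these separate permissions is sketched below. The record layout, field names and helper functions are hypothetical and are not prescribed by DELAD or by any ethics committee; the point is only that each opt-in is stored separately and that permission to re-contact is always accompanied by the date on which the contact information was obtained.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical per-participant consent record with separate opt-ins."""
    participant_code: str
    consent_date: date
    participation: bool
    data_sharing: bool = False          # opt-in: share data with other researchers
    secondary_analysis: bool = False    # opt-in: reuse data in further projects
    recontact: bool = False             # permission to be contacted again
    contact_obtained_on: Optional[date] = None

    def __post_init__(self):
        # Re-contact permission requires a dated record of the contact details.
        if self.recontact and self.contact_obtained_on is None:
            raise ValueError("recontact requires contact_obtained_on")


def allows_secondary_analysis(record):
    """True if the data may be reused without asking the participant again."""
    return record.participation and record.secondary_analysis


def needs_recontact(record):
    """True if reuse is possible only after contacting the participant again."""
    return record.participation and not record.secondary_analysis and record.recontact


# Illustrative record: a listener who agreed to be contacted again only.
listener = ConsentRecord(
    participant_code="L07",
    consent_date=date(2022, 9, 5),
    participation=True,
    recontact=True,
    contact_obtained_on=date(2022, 9, 5),
)
```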
Information Sheet for Participants (age ≥18 years)
Consent Form for Participants (age ≥18 years)
Information Sheet for Parents/Guardians (of participants age <18 years)
Consent Form for Parents/Guardians (of participants age <18 years)
Information Sheet for Children Participants (age <18 years)
Assent Form for Children Participants (age <18 years)
Information Sheet for Listeners
Consent Form for Listeners
The corresponding example informed consent forms can be found here.