Research on and use of sensitive data involving AV-recordings require an infrastructure in which both the data and the research environment offer optimal safeguards for the data and the research instruments. In HoMed we will develop a methodology to disclose sensitive AV-material, that is, to make it accessible for research, through speech recognition. The methodology can immediately be employed in many other domains where sensitive AV-material needs to be transcribed and analysed. As a highly relevant and urgent use case, the HoMed project will expand CLARIAH’s infrastructure to provide such a solution for the domain of pharmaceutical humanities and social studies (PharmSSH).
In the Netherlands, more than 15,000 hospital admissions every year are due to avoidable misuse of medicines. Often this stems from patients' unintentional improper use, caused either by information that is hard to understand or by cognitive problems.
To overcome these misunderstandings, we need a better understanding of the explicit and implicit meanings that patients attribute to medicines as part of their information processing.
The consortium will develop an infrastructure in which an existing “generic” Dutch ASR system, built for the CLARIAH infrastructure, is adapted to the MedPharm domain. In a first step, the ASR is adapted on the semantic level using a domain adaptation component (language model). In a second step, it is adapted on both the semantic and the acoustic level using sensitive in-house data of Nivel, in such a way that the AV-recordings themselves never leave the Nivel building. The resulting ASR component will be made available at Nivel and in the CLARIAH infrastructure. The resulting acoustic models (AMs) and language model (LM) of the recogniser, together with the metadata of the AV consultations at Nivel, will be made available to the research community at large. Because the ASR is open source, the models can also be employed in related projects to which HoMed is linked, such as Care2Report and CAIRE-lab. The Stichting Open Spraaktechnologie will handle the distribution of the models.
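Adapting a generic recogniser to a domain on the semantic level is often done by combining a generic language model with one estimated from in-domain text. The sketch below illustrates one common technique, linear interpolation, on toy unigram models. It is an assumption for illustration only, not necessarily the method HoMed will use; the function names and example sentences are hypothetical, and real ASR language models are typically smoothed higher-order n-gram or neural models.

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens (no smoothing)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(generic, domain, lam):
    """Hypothetical helper: P(w) = lam * P_domain(w) + (1 - lam) * P_generic(w).

    Words missing from one model get probability 0 in that model, so the
    result is still a proper distribution over the union of both vocabularies.
    """
    vocab = set(generic) | set(domain)
    return {
        word: lam * domain.get(word, 0.0) + (1 - lam) * generic.get(word, 0.0)
        for word in vocab
    }


# Toy corpora (illustrative only): generic Dutch text vs. pharmaceutical text.
generic_lm = unigram_probs("de man leest de krant".split())
domain_lm = unigram_probs("de patient neemt de metformine".split())
mixed_lm = interpolate(generic_lm, domain_lm, lam=0.5)
print(mixed_lm["metformine"])
```

The weight `lam` controls how strongly the in-domain text pulls the generic model toward medical vocabulary; in practice such a weight would be tuned on held-out in-domain data.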
Once available in the Media Suite, the AV-recordings and transcriptions are ready for further analysis along the lines of the levelled approach explained in Van der Molen et al. (2018) and depicted below. This approach will be evaluated in pilot research projects in the second year of the project, with a feedback loop for improving the tools in the proposed infrastructure (notably ASR performance and output).
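ASR performance, which the feedback loop is meant to improve, is commonly measured as word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recogniser's output into a reference transcription, divided by the number of words in the reference. The minimal sketch below assumes simple whitespace tokenisation; the function name and example sentences are hypothetical and do not describe the project's actual evaluation protocol.

```python
def word_error_rate(reference, hypothesis):
    """WER via word-level Levenshtein distance (whitespace tokenisation assumed)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substituted word out of five reference words gives a WER of 0.2.
print(word_error_rate("neem twee tabletten per dag", "neem de tabletten per dag"))
```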