A high-performing, open-access generic automatic speech recognition (ASR) system for Dutch already exists; it was built by the HLT-consortium members and is included in the CLARIAH Infrastructure. CLARIAH’s Media Suite contains dedicated tools for further analysis of the resulting transcripts.
Pharmaceutical AV material requires a dedicated version of the recogniser, because the essential jargon of the MedPharm domain is not part of the vocabulary of the current generic ASR. Moreover, because the data are sensitive, the solution must also be suitable for recordings that must remain on the owner’s premises.
We propose a methodology in which, in a first step, the recogniser’s language model (LM) is adapted by additional training on:
- pharmaceutical word lists (e.g. https://www.farmacotherapeutischkompas.nl/)
- topic-specific interviews about medicine use from “Medisch Contact” and “Pharmaceutisch Weekblad”
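
The first step can be illustrated with a minimal sketch. It is not the recogniser's actual toolkit or training procedure: it assumes a toy unigram LM, made-up example sentences, and hypothetical helper names (`unigram_probs`, `interpolate`, `oov_terms`). It shows two ideas only: finding domain terms that are missing from the generic vocabulary, and mixing a generic LM with a domain LM by linear interpolation so that those terms receive non-zero probability.

```python
from collections import Counter


def unigram_probs(tokens):
    """Estimate a unigram LM (word -> relative frequency) from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(generic, domain, lam=0.5):
    """Linearly interpolate two unigram LMs; lam is the weight of the domain LM."""
    vocab = set(generic) | set(domain)
    return {
        word: (1 - lam) * generic.get(word, 0.0) + lam * domain.get(word, 0.0)
        for word in vocab
    }


def oov_terms(word_list, vocabulary):
    """Return the terms of a domain word list that the vocabulary lacks."""
    known = {v.lower() for v in vocabulary}
    return sorted({w.lower() for w in word_list} - known)


# Toy data (assumption): stand-ins for generic text and domain interviews.
generic_tokens = "de patiënt heeft pijn en de arts geeft advies".split()
domain_tokens = "de patiënt gebruikt salbutamol en de apotheker geeft salbutamol".split()

generic_lm = unigram_probs(generic_tokens)
domain_lm = unigram_probs(domain_tokens)

# Terms from a (hypothetical) pharmaceutical word list unknown to the generic LM.
missing = oov_terms(["Salbutamol", "tiotropium", "patiënt"], generic_lm)

# After interpolation, domain terms seen in the domain text get probability mass.
adapted_lm = interpolate(generic_lm, domain_lm, lam=0.5)
```

In a real system the same principle would apply to n-gram or neural language models, and the interpolation weight would be tuned on held-out domain text rather than fixed at 0.5. Terms that appear only in a word list, such as `tiotropium` here, would additionally need a pronunciation entry and some probability mass before the recogniser could output them.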
In a second step, the resulting recogniser will be installed in-house at Nivel and fine-tuned on more than 200 hours of real-life recordings: medical consultations with patients with either COPD or cancer, and patient visits to pharmacists.
These AV recordings are sensitive and cannot leave Nivel, but the resulting models can, as can the anonymised transcripts and the metadata of the recordings. These will be included in the CLARIAH Infrastructure and made available to the research community at large.