Dataset based on Swiss German parliament debates and their Standard German transcripts.
Parliament: Grosser Rat Kanton Bern
Initial version, used in GermEval 2020 Task 4, 70 hours of training data
Improved and extended version, used in SwissText 2021 Task 3, 293 hours of training data
Test set with speech from people all over German-speaking Switzerland, with a dialect distribution close to the real-world distribution.
Text data is from the German Common Voice project.
Initial version, used in SwissText 2021 Task 3, 13 hours of data
Unlabeled audio dataset containing recorded parliament debates.
Parliament: Gemeinderat Zürich
Initial version, used in SwissText 2021 Task 3, 1208 hours of data