ODSS: An Open Dataset of Synthetic Speech

Recent advances in deep learning for synthetic speech generation are driving the need for new detection technologies due to the potential for misuse. However, detection methods face a lack of diverse, high-quality datasets for training and testing.

To address this challenge, a team of vera.ai researchers from the Fraunhofer Institute for Digital Media Technology (IDMT), Germany, and the Centre for Research and Technology Hellas, Greece, has developed a new synthetic speech detection dataset.

The ODSS dataset contains fake speech examples along with their corresponding natural equivalents, based on 156 voices spanning three languages — English, German, and Spanish — with a balanced gender representation. It encompasses 18,993 audio utterances synthesized from text, representing approximately 17 hours of audio data generated with two state-of-the-art text-to-speech (TTS) methods: a two-step FastPitch+HifiGAN pipeline and the end-to-end VITS architecture. It also includes the preprocessed original utterances, with trimmed silence and mitigated noise spikes.
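The silence trimming mentioned above can be approximated with a simple energy-based gate. The following is a minimal NumPy sketch for illustration only — the frame size and threshold are assumptions, not the preprocessing actually used to build ODSS:

```python
import numpy as np

def trim_silence(y, sr, frame_ms=25, threshold_db=-40.0):
    """Trim leading/trailing frames whose RMS energy falls below
    threshold_db relative to the signal peak (illustrative values)."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(y) // frame_len
    frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = max(np.max(np.abs(y)), 1e-10)
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / peak)
    active = np.where(db > threshold_db)[0]
    if active.size == 0:
        return y  # nothing above threshold; return unchanged
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return y[start:end]

# Example: a 440 Hz tone padded with half a second of silence on each side
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
padded = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
trimmed = trim_silence(padded, sr)
```

Dedicated audio libraries offer equivalent functionality (e.g. energy-threshold trimming), but the sketch keeps the idea dependency-free.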

Figure: The ODSS Dataset (image: IDMT)

The ODSS dataset can be used for the development and benchmarking of synthetic speech detection methods. It incorporates tailored data distributions ready for training and provides multiple dimensions for evaluating and analyzing generalizability. The utterances are uncompressed and contain no background noise, so audio augmentation techniques can also be applied to improve or test robustness to various transformations.
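One common augmentation for such robustness tests is additive white noise at a controlled signal-to-noise ratio. A hedged NumPy sketch (the function name and parameters are illustrative, not part of ODSS tooling):

```python
import numpy as np

def add_noise_snr(y, snr_db, rng=None):
    """Add white Gaussian noise so the result has approximately the
    given signal-to-noise ratio in dB (illustrative augmentation)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise

# Example: degrade a clean 220 Hz tone to 20 dB SNR
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
noisy = add_noise_snr(clean, snr_db=20.0)
```

Since the ODSS utterances start clean, the applied degradation level is known exactly, which makes it possible to evaluate detector performance as a function of SNR.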

Call to action: Would you like to contribute your own speech samples and help us expose disinformation? Then do not miss the chance: contact us via email (preferred) to discuss the details, or use the IDMT contact form. We also encourage you to design additional training and evaluation splits beyond the ones we suggest as a starting point.

The dataset can be found at https://zenodo.org/records/8370668, which always points to the latest version.

Reference, including a detailed description of the dataset: 
A. Yaroshchuk et al., "An Open Dataset of Synthetic Speech" in IEEE International Workshop on Information Forensics and Security (WIFS), Nürnberg, Germany, 2023, pp. 1-6, doi: 10.1109/WIFS58808.2023.10374863.

Author: Artem Yaroshchuk (IDMT)

Editor: Jochen Spangenberg (DW)

vera.ai is co-funded by the European Commission under grant agreement ID 101070093, and the UK and Swiss authorities. This website reflects the views of the vera.ai consortium and respective contributors. The EU cannot be held responsible for any use which may be made of the information contained herein.