Here, you can find a list of datasets (including respective links and references) coming out of the project work.
You may also want to check out our presence on Zenodo, where the datasets are listed as well, or consult the Data Management Plan.
This dataset contains two distinct collections tailored for evaluating audio provenance analysis solutions within specified scenarios: Singular Composition and Multi-Source Composition. For a comprehensive understanding of these scenarios and the process behind generating the test files, please consult the referenced publication.
This dataset accompanies the following publication. If you use it, please cite: M. Gerhardt, L. Cuccovillo and P. Aichroth, "Audio Provenance Analysis in Heterogeneous Media Sets," 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 2024, pp. 4387-4396, doi: 10.1109/CVPRW63382.2024.00442.
M3Dsynth is a large dataset of manipulated Computed Tomography (CT) lung images. We created the manipulated images by injecting or removing lung cancer nodules in real CT scans, using three different methods based on Generative Adversarial Networks (GAN) or Diffusion Models (DM), for a total of 8,577 manipulated samples. Experiments show that these images easily fool automated diagnostic tools. We also tested several state-of-the-art forensic detectors and demonstrated that, once trained on the proposed dataset, they can accurately detect and localize manipulated synthetic content, even when training and test sets are not aligned, showing good generalization ability.
This is the dataset and metadata accompanying the paper submission titled "EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles".
VERITE (VERification of Image-TExt pairs) is an annotated evaluation benchmark for multimodal (image-caption) misinformation detection that accounts for unimodal biases.
Dataset of 9,000 AI-generated images, described in the paper “Synthbuster: Towards Detection of Diffusion Model Generated Images” (Quentin Bammey, 2023, Open Journal of Signal Processing).