Here, you can find a selection of datasets (including respective links and references) coming out of the project work.
If you are interested in more: we strongly encourage you to also check out our presence on Zenodo where we list ALL datasets coming out of vera.ai. The Data Management Plan is also accessible there.
This is a dataset for a novel link classification task related to the spread of disinformation content online. The dataset focuses on links appearing within fact-checking and debunking articles written by journalists to counter the spread of popular disinformation. Languages are English and Bulgarian. Each item in the dataset is a classification of a link appearing within a fact-checking article in one of three categories based on the textual context of the link within the article. The classification categories are Misinformation Appearance, Supporting Evidence and Other.
This dataset contains two distinct collections tailored for evaluating audio provenance analysis solutions within specified scenarios: Singular Composition and Multi-Source Composition. For a comprehensive understanding of these scenarios and the process behind generating the test files, please consult the referenced publication.
This dataset is accompanying the respective publication. In case you use it please cite: M. Gerhardt, L. Cuccovillo and P. Aichroth, "Audio Provenance Analysis in Heterogeneous Media Sets," 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 2024, pp. 4387-4396, doi: 10.1109/CVPRW63382.2024.00442.
M3Dsynth, a large dataset of manipulated Computed Tomography (CT) lung images. We create manipulated images by injecting or removing lung cancer nodules in real CT scans, using three different methods based on Generative Adversarial Networks (GAN) or Diffusion Models (DM), for a total of 8,577 manipulated samples. Experiments show that these images easily fool automated diagnostic tools. We also tested several state-of-the-art forensic detectors and demonstrated that, once trained on the proposed dataset, they are able to accurately detect and localize manipulated synthetic content, even when training and test sets are not aligned, showing good generalization ability.
This is the dataset and metadata accompanying the paper submission titled "EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles".
VERITE (VERification of Image-TExt pairs) is an annotated evaluation benchmark for multimodal (image-caption) misinformation detection that accounts for unimodal biases.