Last updated: March 2026 · Based on internal evaluation across 7,700+ tracked real-world analyses
ImageWhisperer runs multiple detection models in parallel and cross-validates their outputs against each other. No single model determines the verdict. This page documents what each model does well, where it struggles, and our overall system accuracy.
Our benchmarks draw on two sources: our own tracking of 7,700+ real-world analyses, and peer-reviewed published benchmarks (footnoted below). We report accuracy honestly, including categories where detection is weak.
Throughout this page, we compare our numbers to the research-reported industry average for single-model detectors tested on real-world images (not lab conditions).
¹ Averaged from peer-reviewed benchmarks: Dogoulis et al. (2023) on 16 detectors across 2.6M images; Corvi et al. (2023) cross-generator evaluation. Industry averages reflect single-model performance on out-of-distribution, real-world conditions (social media compression, unseen generators).
² Based on Guillaro et al. (2023) manipulation detection survey; Wu et al. (2022) IML benchmark. Most detectors tested on spliced/copy-move/inpainted images without cross-validation.
Primary AI-generation detector. Trained on diffusion model outputs (Midjourney, DALL-E, Stable Diffusion).
Strengths: Midjourney, DALL-E 3, Stable Diffusion.
Limitations: Flux (use the Flux Probe), illustrations/artwork.
Specializes in detecting image splicing, compositing, and copy-move manipulations.
Strengths: spliced composites, background replacement.
Limitations: heavily compressed JPEGs, professional retouching.
Forgery localization model. Produces heatmaps showing manipulated regions.
Strengths: region manipulation, inpainting.
Limitations: uniform AI-generated images (whole-image generation leaves no local region to localize).
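A localization heatmap becomes actionable by thresholding per-pixel scores into a region mask. The sketch below shows that generic idea, not ImageWhisperer's actual implementation; the threshold and minimum-area values are invented for illustration.

```python
# Illustrative sketch: turning a per-pixel manipulation heatmap into a
# region mask and a coarse verdict. Threshold and minimum-area values
# are hypothetical, not ImageWhisperer's real parameters.

def localize(heatmap, threshold=0.7, min_area_frac=0.02):
    """heatmap: 2D list of per-pixel manipulation scores in [0, 1].
    Returns (mask, flagged): mask marks suspect pixels, and flagged is
    True when the suspect region is large enough to report."""
    mask = [[score >= threshold for score in row] for row in heatmap]
    suspect = sum(cell for row in mask for cell in row)
    total = sum(len(row) for row in heatmap)
    return mask, suspect / total >= min_area_frac

# A tiny 4x4 heatmap with a manipulated patch in the top-left corner:
hm = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.1, 0.0],
]
mask, flagged = localize(hm)
# flagged is True: 4 of 16 pixels (25%) exceed the 0.7 threshold
```

This also illustrates the limitation noted above: a uniformly AI-generated image produces a flat heatmap with no localized region to flag.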
Vision Transformer for image manipulation localization at pixel level.
A DINOv2 linear probe trained specifically on Flux-generated images, which otherwise evade general detectors.
Strengths: Flux.
Limitations: non-Flux AI generators.
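For readers unfamiliar with the term, a linear probe is simply a linear classifier (such as logistic regression) trained on embeddings from a frozen backbone. The sketch below shows the general technique on synthetic feature vectors; it is not the Flux Probe's actual training code, and the `embed` function is a hypothetical stand-in for real DINOv2 features.

```python
import math
import random

# Minimal logistic-regression "linear probe" over frozen embeddings.
# In a real pipeline the vectors would come from a frozen DINOv2
# backbone; here they are synthetic 8-dim features, for illustration only.

random.seed(0)
DIM = 8

def embed(is_flux):
    # Hypothetical stand-in for backbone features: "Flux-like" images
    # get a systematic offset on the first two feature dimensions.
    base = [random.gauss(0, 1) for _ in range(DIM)]
    if is_flux:
        base[0] += 2.0
        base[1] += 2.0
    return base

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(data, labels, lr=0.1, epochs=200):
    # Plain SGD on the log-loss; only the linear head is trained,
    # which is what makes it a "probe" rather than fine-tuning.
    w = [0.0] * DIM
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

labels = [i % 2 for i in range(200)]        # alternate real / Flux-like
data = [embed(bool(y)) for y in labels]
w, b = train_probe(data, labels)

def probe_score(x):
    """AI-likelihood in [0, 1] from the trained linear head."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

Because the backbone stays frozen, a probe like this is cheap to retrain when a new generator (here, Flux) appears, without touching the rest of the ensemble.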
Third-party AI detection service used as an authority signal for cross-validation.
Sparse-ViT, Mesorch, ClipDet, CommFor, HiFi-Net++, and PerspectiveFields provide supporting votes in the multi-model ensemble. Individual accuracy varies (70–87%) but their combined signal strengthens verdict confidence.
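Combining supporting models of varying standalone accuracy can be as simple as a reliability-weighted soft vote. The sketch below shows that general technique; the weights and scores are hypothetical, not our production configuration.

```python
# Illustrative reliability-weighted soft vote: each supporting model
# contributes in proportion to its standalone accuracy. Weights and
# scores below are invented for illustration.

def weighted_vote(scores, weights):
    """scores: model name -> AI-likelihood in [0, 1].
    weights: model name -> reliability weight.
    Returns the reliability-weighted mean score in [0, 1]."""
    total_w = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_w

# Hypothetical reliabilities in the 0.70-0.87 range mentioned above:
weights = {"sparse_vit": 0.70, "clipdet": 0.87, "hifi_net": 0.80}
scores = {"sparse_vit": 0.9, "clipdet": 0.8, "hifi_net": 0.2}
combined = weighted_vote(scores, weights)
```

A weighted mean like this is deliberately conservative: one dissenting model pulls the combined signal down rather than being outvoted outright.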
ImageWhisperer vs. research-reported industry averages per category.
| Category | ImageWhisperer | Industry Avg. | Delta | Notes |
|---|---|---|---|---|
| Midjourney v5/v6 | 96% | 82% | +14 | Strongest detection category |
| DALL-E 3 | 94% | 79% | +15 | Reliable detection |
| Stable Diffusion XL | 92% | 76% | +16 | Good across most subjects |
| Flux | 93% | 21% | +72 | Flux Probe + ensemble vs. general detectors |
| Face swaps / deepfakes | 85% | 62% | +23 | B-Free + SPAI + HiFi-Net++ cross-validated |
| Spliced composites | 80% | 48% | +32 | SPAI + TruFor + IML-ViT + PerspectiveFields |
| Background replacement | 77% | 40% | +37 | Hardest manipulation category |
| Screenshots | N/A | N/A | — | Flagged as "Further Research Needed" |
| Illustrations / artwork | Limited | High FP | — | Guards suppress false AI flags on artwork |
Industry averages sourced from Dogoulis et al. (2023), Corvi et al. (2023), and Guillaro et al. (2023). Averages reflect single-model performance in cross-generator, real-world conditions. Flux average from Feb 2026 benchmark of 16 detection methods across 2.6M images.
Why the gap? Most detectors are a single model returning a single score. ImageWhisperer runs 10+ models in parallel, cross-validates their outputs, and requires corroboration before any verdict. That ensemble approach is why our real-world accuracy stays 20–40 percentage points above the single-model industry average.
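The corroboration requirement can be sketched as a simple voting gate: one confident model alone never issues an AI verdict. This is an illustrative simplification (the threshold and minimum-agreement values are invented), not the production logic.

```python
# Illustrative corroboration gate: no single model's score can produce
# an "AI-generated" verdict on its own. Threshold and minimum-agreement
# values are hypothetical, not ImageWhisperer's real configuration.

FLAG_THRESHOLD = 0.8   # a model "votes AI" above this score
MIN_CORROBORATION = 2  # at least two independent models must agree

def verdict(scores):
    """scores: dict of model name -> AI-likelihood in [0, 1]."""
    votes = [name for name, s in scores.items() if s >= FLAG_THRESHOLD]
    if len(votes) >= MIN_CORROBORATION:
        return "ai-generated", votes
    if votes:
        return "inconclusive", votes   # one flag alone never decides
    return "likely-authentic", votes

# One confident model is not enough:
#   verdict({"probe_a": 0.95, "probe_b": 0.3}) -> inconclusive
# Two corroborating models are:
#   verdict({"probe_a": 0.95, "probe_b": 0.85}) -> ai-generated
```

This same gate is what keeps the false positive rate low: a spurious high score from any one model is downgraded to "inconclusive" instead of becoming a verdict.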
Flux-generated images are the hardest for the industry to detect. A February 2026 academic benchmark tested 16 detection methods across 2.6 million images and found an average accuracy of just 21% on Flux Dev. ImageWhisperer's dedicated Flux Probe achieves 93% on this category — a purpose-built DINOv2 linear probe trained specifically on Flux outputs. That's a +72 percentage point advantage.
Many tools report 95–99% accuracy in controlled settings, but independent studies consistently show steep drops in real-world conditions. Platform re-encoding (Instagram, WhatsApp, Twitter compression), screenshots, and generators not in the training data all degrade performance. Our numbers are based on 7,700+ tracked real-world user uploads, not curated test sets — they reflect what you'll actually experience.
A detector that flags 18% of real photos as AI-generated (the industry average) creates alert fatigue and erodes editorial trust. ImageWhisperer's corroboration requirement — no single model can override the verdict alone — keeps our false positive rate at 5%, nearly four times lower than the average single-model detector.
What sets ImageWhisperer apart. We combine forensic AI detection with investigative tools — fact-checking, reverse image search, EXIF analysis, location verification, and full narrative explanations — in a single analysis. Most detection tools return a score; we explain why.
We believe transparency about limitations builds more trust than inflated accuracy claims.
Questions about our methodology? Found an image we got wrong? Let us know — every report makes the system better.