No single test can catch everything. ImageWhisperer runs 41 independent checks in parallel — each approaching the problem from a completely different angle. Here's what each one does, why it matters, and how they work together. The verdict is not a number — it's a case file.
These models answer the fundamental question: was this image generated by artificial intelligence? Each uses a different mathematical approach to distinguish AI-generated pixels from camera-captured ones. They run simultaneously on GPU and return results in under a second.
No single model is trusted alone. The verdict system requires corroboration — at least two independent signals must agree before upgrading a verdict.
An independent, commercially maintained AI detection service that acts as a second opinion. While our own models are trained and maintained in-house, this external scanner provides a completely independent assessment from an outside lab.
It returns a probability score from 0 to 1. When both our in-house models and this external scanner agree that an image is AI-generated, the combined signal is far stronger than either alone. When they disagree, the system flags the discrepancy for review rather than guessing.
Think of it like getting a second medical opinion from a different hospital. If two independent labs reach the same diagnosis, you can be more confident in the result.
Built on DINOv2, a foundation model trained on 142 million images to understand visual structure. B-Free (University of Naples, GRIP-UNINA) extracts the "visual DNA" of an image — the deep structural patterns that distinguish a photo taken by a camera sensor from one synthesized by a neural network.
Camera sensors capture light through a Bayer filter, introducing specific noise patterns, lens distortions, and color processing artifacts. AI generators don't have physical sensors — they hallucinate these properties. B-Free detects the absence of authentic sensor signatures.
One of the most reliable single signals in the system. It generalizes well across different AI generators because it looks at fundamental image properties rather than generator-specific artifacts.
Converts the image into the frequency domain using spectral analysis. Real photographs have characteristic frequency distributions created by optical systems. AI-generated images produce different frequency signatures because they're constructed mathematically rather than optically.
Particularly effective at detecting image editing and splicing. When part of an image has been replaced, the frequency characteristics of the edited region differ from the original.
Frequency analysis catches what pixel-level inspection misses. Even a perfect visual edit still disrupts the mathematical frequency fingerprint of the original capture.
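A minimal sketch of the idea using NumPy (the bin count and function shape are illustrative, not the production pipeline): collapse the 2D spectrum into a radially averaged profile, then compare how power decays toward high frequencies.

```python
import numpy as np
from PIL import Image

def radial_power_spectrum(path, bins=64):
    """Radially averaged power spectrum of the grayscale image.

    Real captures and AI renders tend to differ in how spectral power
    falls off at high frequencies; a classifier would compare this
    profile against reference curves (not shown here).
    """
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

    h, w = spectrum.shape
    y, x = np.indices(spectrum.shape)
    r = np.hypot(y - h // 2, x - w // 2)

    # Average power within concentric rings around the spectrum center.
    edges = np.linspace(0, r.max(), bins + 1)
    which = (np.digitize(r.ravel(), edges) - 1).clip(0, bins - 1)
    power = np.bincount(which, weights=spectrum.ravel(), minlength=bins)
    counts = np.bincount(which, minlength=bins)
    return power / np.maximum(counts, 1)
```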
Trained on output from 4,803 different AI image generators to recognize the computational fingerprints common across all of them. Rather than memorizing the quirks of one particular generator, it learns what AI-generated images share in common regardless of source.
The tradeoff: because it was trained broadly, it can sometimes trigger on images with certain aesthetic qualities — clean studio lighting, gallery-quality compositions, or artwork. The verdict system knows this and requires corroboration before trusting a high score.
New AI generators appear every week. A detector that only recognizes known generators becomes obsolete quickly. This model's breadth gives it a fighting chance against generators it's never seen.
Tests how an image responds to small disturbances. Real photographs are "fragile" — tiny changes create noticeable artifacts because the data represents a coherent physical scene. AI-generated images are "rigid" — they respond differently because they were constructed to look right, not to be a coherent capture of light.
The model applies controlled mathematical perturbations and measures how stable the visual features remain. The resulting stability profile reveals whether the image was optically captured or computationally generated.
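A hypothetical sketch of such a test, assuming any frozen feature extractor (the `extract_features` callable, the noise level, and the trial count are all placeholders):

```python
import torch

def stability_score(image_tensor, extract_features, trials=8, sigma=0.01):
    """Illustrative perturbation-stability probe.

    Adds small Gaussian noise several times and measures how far the
    feature embedding drifts. The threshold separating "fragile"
    captures from "rigid" renders would be learned from labeled data.
    """
    with torch.no_grad():
        base = extract_features(image_tensor)  # (1, D) embedding
        drifts = []
        for _ in range(trials):
            noisy = image_tensor + sigma * torch.randn_like(image_tensor)
            drift = 1 - torch.cosine_similarity(base, extract_features(noisy))
            drifts.append(drift.item())
    return sum(drifts) / len(drifts)
```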
Black Forest Labs' Flux produces images so photorealistic that conventional detectors consistently score them as authentic. In testing, standard detectors gave a known Flux image only 15% AI probability. Our Flux probe scored the same image at 92%.
Built as a lightweight classifier on top of visual foundation features, trained specifically on confirmed Flux output. The model weights are just 4KB, making it fast to load and hot-reloadable without restarting the server.
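A minimal sketch of what a probe of this shape looks like (the class name and feature dimension are illustrative; a 1024-dim float32 weight vector is roughly 4 KB, which is consistent with the tiny checkpoint described above):

```python
import torch
import torch.nn as nn

class FluxProbe(nn.Module):
    """Linear probe: one logistic layer on frozen backbone features.

    Feature dimension is an assumption; 1024 float32 weights plus a
    bias come to about 4 KB, matching the hot-reloadable checkpoint
    size mentioned in the text.
    """
    def __init__(self, feature_dim: int = 1024):
        super().__init__()
        self.head = nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, feature_dim) from a frozen foundation model
        return torch.sigmoid(self.head(features)).squeeze(-1)
```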
Every night, the system scans images users have uploaded, identifies new Flux examples with high confidence, and retrains automatically. The detector gets better with every image it sees.
A targeted classifier trained on images generated by Google's AI image models. Each AI generator leaves its own statistical fingerprint — subtle patterns in how it handles lighting, textures, skin tones, and fine detail that are invisible to the eye but detectable by a model trained specifically on that generator's output.
A lightweight linear probe on foundation model features, designed to catch images from one of the most widely used AI image generators in the world.
Trained on images produced by OpenAI's image generation models. OpenAI's generators have distinctive characteristics: specific ways they render text, handle perspective, process fine detail in hair and fabric, and produce skin textures.
Recognizes OpenAI-specific patterns even when general-purpose detectors cannot distinguish the output from a real photograph. Particularly useful as GPT-generated images increasingly appear in social media and news contexts.
Trained on 140,000+ real photographs from actual camera hardware. Rather than looking for signs of AI, this model looks for signs of reality — the characteristic patterns that real cameras leave in images.
When an image closely matches a known camera fingerprint, it provides positive evidence of authenticity. When it doesn't match any known camera, it raises a flag. This inversion — proving realness rather than proving fakeness — is especially powerful for images that fool AI-detection models.
Most detectors ask "is this AI?" The camera matcher asks "is this a camera?" A manipulated real photo and a well-made AI image fail this test in different ways, but neither matches a real camera fingerprint cleanly.
Uses CLIP-based visual embeddings to detect AI-generated content through high-level semantic features. While other models look at noise patterns or frequency spectra, this detector analyzes the meaning of visual elements — how objects relate to each other, whether scene composition follows natural photographic patterns.
Complementary to frequency-domain and noise-based detectors: it catches a different class of artifacts. An AI image might have perfect pixel-level noise but unnatural semantic composition.
Not all fakes are fully AI-generated. Many of the most deceptive manipulations start with a real photograph and alter specific elements. These detectors find the seams.
Each returns a heatmap showing exactly where manipulation was detected. The combination of four independent localization models dramatically reduces false positives.
Performs fine-grained tampering detection by analyzing three layers simultaneously: digital noise consistency, JPEG compression artifact patterns, and pixel-level statistical anomalies. When an image has been edited, the modified region has different noise characteristics than the surrounding original content.
Produces a pixel-level heatmap showing which regions have been altered. Particularly effective at detecting copy-paste operations, object removal, and face replacement.
Divides the image into patches and compares each patch against its neighbors for consistency. A genuine photo has uniform characteristics across all patches. An edited photo has patches where noise level, compression, or color processing suddenly change.
The vision transformer architecture captures both local inconsistencies and global context. Returns a heatmap highlighting exactly which patches show signs of manipulation.
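A far simpler sketch of the same intuition, using NumPy rather than a transformer (patch size and the outlier statistic are illustrative): estimate per-patch noise from a high-pass residual and flag patches that deviate from the image-wide median.

```python
import numpy as np
from PIL import Image

def patch_noise_map(path, patch=32):
    """Illustrative patch-consistency check.

    Noise per patch is estimated as the std of a cheap high-pass
    residual; patches whose noise departs strongly from the median
    (in median-absolute-deviation units) are flagged.
    """
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    # High-pass residual: pixel minus the mean of its 4 neighbors.
    local_mean = (img[:-2, 1:-1] + img[2:, 1:-1]
                  + img[1:-1, :-2] + img[1:-1, 2:]) / 4
    residual = img[1:-1, 1:-1] - local_mean

    rows, cols = residual.shape[0] // patch, residual.shape[1] // patch
    noise = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = residual[i * patch:(i + 1) * patch,
                             j * patch:(j + 1) * patch]
            noise[i, j] = block.std()

    med = np.median(noise)
    mad = np.median(np.abs(noise - med)) + 1e-9
    return np.abs(noise - med) / mad  # high values = suspicious patches
```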
Focuses computational attention on the most suspicious regions rather than analyzing every pixel equally. Particularly efficient at catching small, targeted edits — a changed face in a crowd, an altered license plate, a modified date on a document.
The sparsity mechanism lets it zoom in on fine-grained details while maintaining awareness of global image context.
Simultaneously analyzes edge frequency patterns and illumination frequency patterns. Real photographs have consistent relationships between edge sharpness and lighting. Composited images often have mismatches — sharp edges where lighting suggests soft boundaries, or lighting gradients that don't follow the geometry.
By checking both channels independently and comparing results, it catches compositing artifacts invisible in either channel alone.
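A rough, hedged sketch of the two-channel comparison using SciPy (the grid size, blur sigmas, and scoring are all illustrative simplifications of whatever the production model does):

```python
import numpy as np
from PIL import Image
from scipy import ndimage

def edge_vs_illumination(path, grid=8):
    """Toy two-channel consistency check.

    Edge channel: gradient magnitude (high frequencies).
    Illumination channel: heavily blurred luminance (low frequencies).
    Cells where edge energy and illumination change disagree with the
    rest of the image are candidates for compositing seams.
    """
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    edges = ndimage.gaussian_gradient_magnitude(img, sigma=1.0)
    illum = ndimage.gaussian_filter(img, sigma=15.0)
    illum_grad = ndimage.gaussian_gradient_magnitude(illum, sigma=1.0)

    gh, gw = img.shape[0] // grid, img.shape[1] // grid
    ratio = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * gh, (i + 1) * gh), slice(j * gw, (j + 1) * gw))
            # Edge energy relative to illumination change in this cell.
            ratio[i, j] = edges[sl].mean() / (illum_grad[sl].mean() + 1e-6)
    return np.abs(ratio - np.median(ratio))  # outlier cells = mismatches
```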
The detection models do the math. Then AI actually looks at the image — the way a human investigator would. Calculator plus detective.
These inspections use a large language model with carefully engineered prompts. All eight are combined into a single API call, and web detection results are injected so the AI can identify people, places, and events using real-world context.
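A hypothetical sketch of how such a combined call could be assembled (the inspection names, wording, and message structure here are invented for illustration; they are not the production prompts):

```python
import json

# Illustrative subset of the eight inspection instructions.
INSPECTIONS = {
    "five_w": "Analyze the image as a journalist: who, what, where, when, why.",
    "scene": "Assess scene coherence in 3-4 sentences.",
    "forensic": "Describe lighting, edges, textures, shadows, compression.",
    # ...remaining inspections omitted for brevity
}

def build_combined_prompt(web_matches: list[dict]) -> list[dict]:
    """Bundle all inspections into one request and inject web context."""
    system = "You are a photo-forensics analyst. Answer every section below.\n\n"
    system += "\n".join(f"## {name}\n{text}" for name, text in INSPECTIONS.items())
    context = "Web detection results:\n" + json.dumps(web_matches, indent=2)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": context},
        # the image itself is attached per the LLM provider's API
    ]
```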
Produces a structured journalist-style analysis using the Who, What, Where, When, Why framework. Identifies people, events, and context. Cross-references against web detection results to name recognized individuals and link to known events.
Written for fact-checkers: "A man in a suit" becomes "appears to be [Name], based on web matches to [source]" when corroborating evidence exists.
A focused 3–4 sentence analysis of the visual environment. Goes beyond object identification to assess scene coherence: are elements consistent? Does lighting match across the scene? Are there incongruous objects or impossible spatial relationships?
A technical photographic assessment: lighting consistency, edge quality, texture patterns, shadow direction, and compression artifacts. Unlike detection models which output numbers, forensic observations produce human-readable explanations that tell a journalist something actionable.
Specifically searches for splicing, inpainting, face swapping, background replacement, and AI-assisted retouching. Also checks for AI generation signatures: nonsensical text, anatomical impossibilities, impossible reflections, repeating patterns. Distinguishes normal adjustments from deceptive manipulation.
Estimates where the photo was taken based on visible clues: text on signs, architecture, vegetation, road markings, vehicle types, cultural context, landmarks. Returns city, country, and confidence level. Cross-referenced against EXIF GPS data and user-provided claims.
Extracts and translates text visible in the image: signs, banners, documents, screens, clothing, watermarks. Returns both original text and English translation. AI-generated images often contain text with subtle errors — misspellings, nonsensical phrases — that this step catches.
Cross-references visual content against known news events, viral images, and documented incidents. Surfaces original context: when it happened, where, and what sources reported. Particularly valuable for images shared out of context — a real photo from 2020 recirculated as 2026.
The final synthesis. After all models and analyses complete, a journalist-focused summary: the story of the photo, key evidence, per-model explanations, and actionable advice. Explains the verdict in plain language accessible to anyone, not just technical experts.
These checks examine the technical properties of the image file itself: metadata, compression artifacts, geometric consistency, shadow physics, and provenance credentials.
Based on physical constraints that AI generators struggle to fake: real shadows follow real light sources, real JPEG compression leaves specific mathematical traces, and real cameras embed specific metadata.
Re-compresses the image and compares against the original. In an unmodified JPEG, error levels are uniform. Pasted-in regions have a different compression history and show up as brighter areas. Like a palimpsest: even when you write new text over erased text, the ghost of the original remains.
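This is classic Error Level Analysis, and a minimal version fits in a few lines of Pillow (the quality setting is a common default, not necessarily the production value):

```python
import io
from PIL import Image, ImageChops, ImageEnhance

def error_level_analysis(path, quality=90):
    """Re-save the image as JPEG and amplify the per-pixel difference.

    Regions with a different compression history than the rest of the
    image show up brighter in the returned ELA image.
    """
    original = Image.open(path).convert("RGB")
    buf = io.BytesIO()
    original.save(buf, "JPEG", quality=quality)
    buf.seek(0)
    recompressed = Image.open(buf)

    ela = ImageChops.difference(original, recompressed)
    max_diff = max(channel[1] for channel in ela.getextrema()) or 1
    return ImageEnhance.Brightness(ela).enhance(255.0 / max_diff)
```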
Extracts all embedded metadata: EXIF tags (camera, lens, exposure, GPS), XMP (editing history), IPTC (credits), ICC color profiles. Checks camera-lens compatibility, exposure parameter consistency, GPS plausibility, timestamps, and AI-specific software signatures. Real cameras typically embed 10+ internally consistent tags; AI images either have none or synthetic metadata that doesn't add up.
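A sketch of the extraction step using Pillow (the list of AI software strings is illustrative; GPS tags live in a sub-IFD and are omitted here):

```python
from PIL import Image
from PIL.ExifTags import TAGS

AI_SOFTWARE_HINTS = ("midjourney", "dall-e", "stable diffusion", "firefly")

def inspect_exif(path):
    """Dump EXIF tags by name and flag AI-related software strings.

    Real camera files typically carry many internally consistent tags;
    AI output usually carries none, or synthetic values.
    """
    exif = Image.open(path).getexif()
    tags = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
    software = str(tags.get("Software", "")).lower()
    flagged = any(hint in software for hint in AI_SOFTWARE_HINTS)
    return tags, flagged
```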
Checks for C2PA manifests — a cryptographic "chain of custody" backed by Adobe, Microsoft, Google, and camera manufacturers. When present, provides the strongest possible provenance signal: confirms the specific camera, time, and every edit made. Also checks for invisible AI watermarks.
Detects real people on AI-generated backgrounds (or vice versa) by analyzing four dimensions across the foreground/background boundary: noise patterns, compression levels, color temperature gradients, and edge coherence. One of the most important checks for "hybrid fakes" — images that are part real, part AI.
Traces lines that should converge at vanishing points. In real photography, all parallel lines converge to the same point. In composited or AI images, different parts often have different vanishing points — a geometric impossibility that proves the scene was assembled from multiple sources.
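A toy sketch of this check with OpenCV (thresholds are illustrative, and a production version would cluster line families properly, for example with RANSAC, rather than the crude angle filter used here):

```python
import cv2
import numpy as np

def vanishing_spread(path, max_segments=80):
    """Detect line segments, intersect near-family pairs, measure scatter.

    Tight intersection clusters suggest one coherent perspective; wide
    scatter suggests assembly from multiple sources.
    """
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=60, maxLineGap=5)
    if lines is None:
        return None
    segs = lines[:max_segments, 0].astype(float)   # rows of (x1, y1, x2, y2)
    angles = np.arctan2(segs[:, 3] - segs[:, 1],
                        segs[:, 2] - segs[:, 0]) % np.pi

    points = []
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            d = abs(angles[i] - angles[j])
            d = min(d, np.pi - d)
            # Same family: similar but not identical orientation.
            if np.radians(2) < d < np.radians(30):
                p = _intersect(segs[i], segs[j])
                if p is not None:
                    points.append(p)
    if not points:
        return None
    pts = np.array(points)
    return float(np.linalg.norm(pts.std(axis=0)))  # large = suspicious

def _intersect(a, b):
    """Intersection of the infinite lines through two segments."""
    x1, y1, x2, y2 = a
    x3, y3, x4, y4 = b
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
```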
Based on CVPR 2023 research. Computes a per-pixel gravity direction field. In a real photo, gravity is consistent everywhere. The check divides the image into a 4×4 grid and measures gravity angle deviation across regions; in a composite, regions sourced from photos taken at different camera angles point "down" in different directions.
Never overrides a verdict alone. Adds weight to existing concerns but cannot single-handedly change a green verdict, preventing false positives from unusual camera angles.
Analyzes shadow direction, intensity, and blur across the image. In a real photo with a single light source, all shadows point the same direction. Composited objects from different sources have shadows cast by different light sources. Extremely difficult to fake because correct shadow rendering requires accurate 3D geometry.
Detects visible AI watermarks that appear in the corner of images generated by certain AI tools. Uses pixel-geometry pattern matching rather than AI, making it extremely fast (~20ms) and zero-cost. Images that retain their watermarks provide instant, definitive identification.
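One zero-ML way to implement a check like this is OpenCV template matching against known watermark images; the production implementation may differ, and the template file, corner fraction, and threshold below are illustrative:

```python
import cv2

def corner_watermark_score(image_path, template_path, corner_frac=0.25):
    """Normalized template match against the bottom-right corner.

    Returns the peak match score; a score around 0.8 or higher would
    count as a watermark hit (threshold is an assumption).
    """
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    tpl = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    h, w = img.shape
    corner = img[int(h * (1 - corner_frac)):, int(w * (1 - corner_frac)):]
    if corner.shape[0] < tpl.shape[0] or corner.shape[1] < tpl.shape[1]:
        return 0.0
    result = cv2.matchTemplate(corner, tpl, cv2.TM_CCOEFF_NORMED)
    return float(result.max())
```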
An image doesn't exist in isolation. These checks reach outside the pixel data to verify the image against the world's information.
External checks can definitively resolve cases that pixel-level analysis cannot. A technically perfect AI image already debunked by Reuters is known to be fake — no model score required.
Searches the web for exact, partial, and visually similar matches. Finds where this image has appeared online, which websites published it, and what context it was shared in. Results are categorized: news outlets, social media, fact-checking organizations, AI art sites, stock agencies, government sources.
Every web match includes a "Trace Source" button that opens Google Lens to find the original source, helping journalists trace the image back to its first appearance.
Before any analysis begins, the image is compared against a curated database using perceptual hashing — recognizing images even after cropping, compression, rescaling, and screenshotting. A match triggers an instant verdict linking back to the original fact-check article.
The only check that can immediately produce a final verdict (~5ms) without waiting for all other analyses to complete.
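One common way to implement such a lookup is the imagehash library; whether the production system uses it is an assumption, and the distance threshold below is illustrative:

```python
import imagehash
from PIL import Image

def known_fake_lookup(path, known_hashes, max_distance=8):
    """Instant-verdict lookup via perceptual hashing.

    `known_hashes` maps a precomputed phash (imagehash.ImageHash) to a
    fact-check URL. phash tolerates recompression and rescaling well;
    surviving aggressive crops would require storing several hash
    variants per image.
    """
    h = imagehash.phash(Image.open(path))
    for known, url in known_hashes.items():
        if h - known <= max_distance:   # Hamming distance between hashes
            return url
    return None
```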
Scans web results for fact-check articles from Reuters, Snopes, PolitiFact, FactCheck.org, AFP, India Today, and dozens more across 10+ languages. Uses domain and keyword matching. Found sources appear as clickable links in the verdict. A fact-check match can override uncertain model scores.
Analyzes the credibility profile of web sources. An image that only appears on AI art platforms is treated very differently from one published by Reuters, AP, or government archives.
Checks if the uploaded image is hosted at the exact same URL found in web results. An exact match to a news wire or government source is strong provenance evidence. Matches to AI gallery sites provide opposing evidence.
When a user provides a claim ("This photo shows the aftermath of the earthquake in Turkey"), the system extracts key terms, searches news sources, and checks whether the claim matches published reporting. Catches the common pattern of real photos recaptioned to show a different event.
Multi-layered: GPS from EXIF, landmark identification via AI, reverse geocoding, Street View availability, sun position verification against timestamp. A photo claimed from "Paris" with EXIF GPS pointing to Lagos, with visible text in Yoruba, triggers a location mismatch alert.
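The cross-check itself is multi-layered; this sketch shows just the GPS-versus-claim distance test, with an illustrative tolerance:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def location_mismatch(exif_coords, claimed_coords, tolerance_km=100):
    """Flag when EXIF GPS and the claimed location disagree."""
    dist = haversine_km(*exif_coords, *claimed_coords)
    return dist > tolerance_km, dist
```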
Extracts visible text from signs, headlines, documents. Used as additional search context and to catch AI-generated gibberish text that looks plausible at a glance but is meaningless.
Reads QR codes and barcodes embedded in images. Decodes the content and presents the URL it points to. Zero-cost, offline decoding using the pyzbar library.
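Since pyzbar is named above, the decoding step is essentially this:

```python
from PIL import Image
from pyzbar import pyzbar

def decode_codes(path):
    """Decode QR codes and barcodes offline with pyzbar (no API cost)."""
    results = pyzbar.decode(Image.open(path))
    # Each result carries the symbology type and the raw payload bytes.
    return [(r.type, r.data.decode("utf-8", errors="replace"))
            for r in results]
```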
Follows redirect chains on bit.ly, tinyurl.com, t.co, and others to reveal actual destinations. Exposes links that might point to AI generators, misinformation sites, or phishing campaigns hidden behind shortened URLs.
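A minimal sketch of the expansion step, assuming the requests library (some hosts reject HEAD, in which case a production version would retry with a streamed GET):

```python
import requests

def expand_short_url(url, timeout=5):
    """Follow a shortener's redirect chain to the true destination."""
    resp = requests.head(url, allow_redirects=True, timeout=timeout)
    hops = [r.url for r in resp.history]  # each redirect along the way
    return resp.url, hops
```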
Identifies platform watermarks, compression signatures, and source indicators. Different platforms apply characteristic processing. Recognizing the platform of origin establishes the distribution path: first shared by a journalist on Twitter, or originated on an AI art Discord?
All 41 checks run in parallel. Results arrive in seconds. The verdict system weighs each signal and checks for corroboration — every override requires at least two independent models to agree. When models disagree, we tell you openly. When they agree, the verdict is decisive.
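A hedged sketch of that corroboration rule (model names, the score threshold, and the agreement count of two are drawn from the description above, but the exact logic is illustrative):

```python
def corroborated_upgrade(scores, threshold=0.8, required=2):
    """Upgrade a verdict only when enough independent signals agree.

    `scores` maps model name to its 0-1 AI probability. Returns whether
    the upgrade is allowed and which models corroborate it.
    """
    agreeing = [name for name, s in scores.items() if s >= threshold]
    return len(agreeing) >= required, agreeing

# e.g. corroborated_upgrade({"b_free": 0.91, "universal": 0.85, "clip": 0.40})
# -> (True, ["b_free", "universal"])
```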
Red: Strong evidence of AI generation. Multiple independent systems agree. Critical forensic indicators found.
Yellow: Mixed signals. Some concerning indicators but not conclusive. May indicate editing in a real photograph. Requires human review.
Green: Passes all critical tests. Noise patterns, perspective, and metadata consistent with real photography from a real camera.
Conflicting: Found in news sources with conflicting reports, or AI raised concerns that forensic evidence doesn't confirm. Human verification essential.
No single approach works alone; combining 41 independent signals catches what any one model would miss.