ABOUT_PROJECT

Project Disclosure

This dashboard provides a visual interface to a specialized crime incident database. The sections below describe its data source, limitations, and processing methods.

Data Source

The primary dataset is aggregated from FDeSouche. This project acts as a visualization layer and does not independently verify the primary source material.

Knowledge Cutoff

The current database snapshot covers data up to the following cutoff:

Cutoff Date
Nov 2025

AI Extraction Disclaimer

Structured data (names, locations, crime types, demographics) was extracted from unstructured text using Artificial Intelligence models.

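  • The models may "hallucinate" details that are not in the source text, or misclassify an incident.
  • Demographic inference (e.g., religion, ethnicity) is probabilistic, not a confirmed attribute.
  • Entity resolution (merging similar names) can wrongly merge distinct people or leave variants of one person unmerged.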

Methodology

Our data pipeline processes articles through several stages to extract structured crime incident data:

  1. Collection — Articles are scraped from the source website and stored with their original content, URL, and publication date.
  2. Filtering — Articles are classified as crime-relevant using keyword matching with LLM fallback for ambiguous cases.
  3. Extraction — LLMs extract structured data from article text: incident details, persons involved, locations, charges, and sentences.
  4. Normalization — Extracted values are normalized to canonical forms (e.g., name variants, nationality spellings, religion terminology).
  5. Deduplication — Duplicate articles, incidents, and persons are identified using content hashing and similarity scoring with LLM verification.
  6. Export — Deduplicated data is exported to the production database that powers this dashboard.
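As an illustration of the filtering and normalization stages above, here is a minimal Python sketch. It is not this project's actual code: the keyword list, the alias table, the two-hit threshold, and the function names (fold, classify_relevance, normalize_nationality) are all assumptions made for the example, and the LLM fallback is represented by a caller-supplied function.

```python
import re
import unicodedata
from typing import Callable

# Hypothetical keyword list and alias table, chosen only for illustration.
CRIME_KEYWORDS = {"arrested", "convicted", "sentenced", "assault", "robbery"}
NATIONALITY_ALIASES = {
    "french": "French",
    "francais": "French",
    "francaise": "French",
}


def fold(text: str) -> str:
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def classify_relevance(
    text: str,
    llm_fallback: Callable[[str], bool],
    strong_hits: int = 2,
) -> bool:
    """Keyword matching first; defer to the fallback only when ambiguous."""
    words = set(re.findall(r"[a-z]+", fold(text)))
    hits = len(words & CRIME_KEYWORDS)
    if hits >= strong_hits:
        return True   # enough distinct keywords: clearly relevant
    if hits == 0:
        return False  # no keywords: clearly irrelevant
    # Some keywords but fewer than strong_hits: ambiguous, ask the fallback.
    return llm_fallback(text)


def normalize_nationality(value: str) -> str:
    """Map a spelling variant to its canonical form, if one is known."""
    return NATIONALITY_ALIASES.get(fold(value), value.strip())
```

Passing the fallback in as a function keeps the cheap keyword path testable without calling any model, which is one plausible reason to structure a filter this way.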

Each extracted record keeps a provenance link back to its source articles, so any data point can be checked against the original text.
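The sketch below shows one way deduplication could preserve provenance: when two records are judged to be the same incident, their source URLs are merged rather than discarded. The IncidentRecord type, its fields, the 0.9 similarity threshold, and the verify callback (standing in for LLM verification) are hypothetical and not taken from the project's code.

```python
import hashlib
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List, Set


@dataclass
class IncidentRecord:
    # Hypothetical record shape: a text summary plus the articles it came from.
    summary: str
    source_urls: Set[str] = field(default_factory=set)


def content_hash(text: str) -> str:
    """Hash of the lowercased, whitespace-collapsed text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(
    records: List[IncidentRecord],
    threshold: float = 0.9,
    verify: Callable[[IncidentRecord, IncidentRecord], bool] = lambda a, b: True,
) -> List[IncidentRecord]:
    """Merge duplicates and keep the union of their source URLs."""
    merged: List[IncidentRecord] = []
    for record in records:
        for kept in merged:
            same_hash = content_hash(record.summary) == content_hash(kept.summary)
            near = similarity(record.summary, kept.summary) >= threshold
            # Exact matches merge directly; near matches need verification.
            if same_hash or (near and verify(kept, record)):
                kept.source_urls |= record.source_urls
                break
        else:
            # No match found: keep a copy so the input list is not mutated.
            merged.append(IncidentRecord(record.summary, set(record.source_urls)))
    return merged
```

Because every merged record carries the full set of URLs it was built from, a reader can still trace a single dashboard entry back to each article that contributed to it.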