
Project Disclosure

This dashboard provides a visual interface to a specialized database of crime incidents. The sections below describe its data source, limitations, and processing methods.

Data Source

The primary dataset is aggregated from FDeSouche. This project acts as a visualization layer and does not independently verify the primary source material.

Data Cutoff

The current database snapshot includes data up to the following cutoff:

Cutoff date: November 2025

AI Extraction Disclaimer

Structured data (names, locations, crime types, demographics) was extracted from unstructured article text using artificial intelligence (AI) models, specifically large language models (LLMs).

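The models can "hallucinate" (produce details not supported by the source text) or misclassify information.
Demographic attributes (e.g., religion, ethnicity) are probabilistic inferences, not verified facts.
Entity resolution (merging records that refer to the same person under similar names) can wrongly merge different people or miss true matches.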
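To illustrate why name merging can go wrong, here is a minimal Python sketch of threshold-based fuzzy matching. The accent stripping, the use of `difflib.SequenceMatcher`, the 0.85 threshold, and the function names are assumptions chosen for illustration; they are not the project's documented scoring method, and the names in the example are placeholders.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical cutoff: pairs scoring at or above this are treated as one person.
MERGE_THRESHOLD = 0.85


def canonical(name: str) -> str:
    """Strip accents, casefold, and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the canonical strings are identical."""
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()


def would_merge(a: str, b: str) -> bool:
    """Decide whether two name strings would be merged under the threshold."""
    return similarity(a, b) >= MERGE_THRESHOLD
```

With this rule, "Jérôme Dupont" and "jerome dupont" merge correctly (identical after canonicalization), but "Jean Martin" and "Jean Martinez" also merge (a ratio of about 0.92) even though they may be different people. Raising the threshold reduces such false merges at the cost of missing genuine spelling variants, which is why the disclaimer above applies.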

Methodology

Our data pipeline processes articles in six stages to extract structured crime incident data:

Collection: Articles are scraped from the source website and stored with their original content, URL, and publication date.
Filtering: Articles are classified as crime-relevant by keyword matching, with an LLM used as a fallback for ambiguous cases.
Extraction: LLMs extract structured data from the article text, including incident details, persons involved, locations, charges, and court sentences.
Normalization: Extracted values are mapped to canonical forms (e.g., unifying name variants, nationality spellings, and religion terminology).
Deduplication: Duplicate articles, incidents, and persons are identified using content hashing and similarity scoring, with candidate matches verified by an LLM.
Export: The deduplicated data is loaded into the production database that powers this dashboard.
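The keyword-first filter with an LLM fallback could look roughly like the following Python sketch. The term lists, the two-tier "strong/weak" rule for deciding what counts as ambiguous, and the `llm_classify` callable are all hypothetical; the project's actual keywords, rules, and prompts are not documented on this page.

```python
import re
from collections.abc import Callable

# Hypothetical keyword lists; the real lists are not documented here.
STRONG_TERMS = {"arrested", "convicted", "sentenced", "stabbing", "robbery"}
WEAK_TERMS = {"police", "court", "investigation", "suspect"}

WORD_RE = re.compile(r"\w+")


def is_crime_relevant(text: str, llm_classify: Callable[[str], bool]) -> bool:
    """Keyword pass first; call the (hypothetical) LLM only for ambiguous text."""
    words = set(WORD_RE.findall(text.casefold()))
    if words & STRONG_TERMS:
        return True  # clear positive: no LLM call needed
    if not words & WEAK_TERMS:
        return False  # no crime signal at all: clear negative
    return llm_classify(text)  # only weak signals: defer to the model
```

Passing the model in as a callable keeps the keyword pass testable on its own and means the LLM is consulted only for articles that contain weak signals but no strong ones, which limits cost while still catching borderline cases.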

Each extracted record keeps provenance links to its source articles, so any data point can be traced back to and checked against the original text.
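As a minimal Python sketch of how provenance can survive deduplication, the code below collapses articles whose normalized text hashes identically into a single record that keeps every source URL. The class and function names, the normalization (casefolding plus whitespace collapsing), and the rule of keeping the earliest publication date are assumptions for illustration; the similarity scoring and LLM verification described in the Deduplication step are not shown.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Article:
    # Fields mirror what the Collection step says is stored.
    url: str
    content: str
    published: date


@dataclass
class DedupedArticle:
    content_hash: str
    content: str
    published: date
    source_urls: list[str] = field(default_factory=list)  # provenance links


def content_hash(text: str) -> str:
    """SHA-256 of casefolded, whitespace-collapsed text (hypothetical normalization)."""
    normalized = " ".join(text.casefold().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles: list[Article]) -> list[DedupedArticle]:
    """Merge exact duplicates by hash, keeping every source URL for provenance."""
    merged: dict[str, DedupedArticle] = {}
    for art in articles:
        h = content_hash(art.content)
        if h not in merged:
            merged[h] = DedupedArticle(h, art.content, art.published)
        entry = merged[h]
        # Assumed rule: keep the earliest publication date among duplicates.
        entry.published = min(entry.published, art.published)
        if art.url not in entry.source_urls:
            entry.source_urls.append(art.url)
    return list(merged.values())
```

Because merged records accumulate URLs instead of discarding them, a data point on the dashboard can still be traced to every article that reported it.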
