Enterprise AI Analysis: Estimating species commonness and prevalence through unsupervised methods

The prevalence of a species in a given area is crucial for estimating the environmental conditions associated with its subsistence within Ecological Niche Models (ENMs). Prevalence is defined as the proportion of presences relative to the total number of sampled sites, reflecting a prior expectation of species commonness or rarity. However, reliable estimation is often hampered by limited or biased occurrence data, particularly for rare or poorly monitored species. This work presents a data-driven, multi-species methodology to estimate species prevalence for use in ENMs. It leverages species occurrence records from the Global Biodiversity Information Facility (GBIF) and is entirely unsupervised. It uses two clustering methods, one deep-learning model, and an ensemble model, plus statistical analysis, to classify species commonness and transform the classifications into prevalence probabilities. A case study is presented for 161 species living in the Massaciuccoli Lake basin (Tuscany, Italy), a wetland of high biodiversity value and ecological sensitivity. The models classified the species' prevalence based on observations from other Italian wetland sites and were evaluated against expert-based assessments. All models achieved high accuracy (~81-90%), with the deep-learning model achieving the highest. The proposed methodology is scalable and reproducible and can inform ENMs with objective, robust prevalence estimates.
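
As a minimal illustration of the definition above, prevalence can be computed as the number of presences divided by the number of sampled sites. The function name and the survey data below are hypothetical and are not taken from the study.

```python
def prevalence(site_presence):
    """Return the proportion of sampled sites where the species was present."""
    if not site_presence:
        raise ValueError("at least one sampled site is required")
    presences = sum(1 for present in site_presence if present)
    return presences / len(site_presence)


# Hypothetical survey: presence (True) or absence (False) at 8 sampled sites.
survey = [True, False, False, True, False, False, False, False]
print(prevalence(survey))  # 2 presences / 8 sites = 0.25
```

A value close to 0 corresponds to a rare species and a value close to 1 to a very common one, which is the prior expectation an ENM would receive.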

Quantifiable Impact & Strategic Advantages

This research demonstrates significant advancements in automated ecological modeling, providing robust, objective, and scalable solutions for enterprise-level biodiversity assessments.

90.06% VAE Binary Accuracy
85.71% VAE 3-Category Accuracy
0.80 VAE Kappa Score (Binary)

Deep Analysis & Enterprise Applications

Unsupervised Multi-Species Prevalence Estimation

This methodology introduces a data-driven, unsupervised framework for classifying species commonness and estimating prevalence probabilities for Ecological Niche Models (ENMs). It leverages species occurrence records from large biodiversity databases such as GBIF and aggregates them into ecologically meaningful features, which enables direct comparison across species without assuming data completeness. The approach combines two clustering algorithms (Multi K-means and X-means) with a deep-learning model (a Variational Autoencoder, or VAE) to identify prevalence patterns and detect rare species.

The goal is to provide objective and robust prior prevalence estimates, crucial for the accurate calibration and interpretation of ENMs, especially for poorly monitored or rare species. The system is designed to be scalable and reproducible across various ecosystems and taxa, contributing to more reliable habitat distribution estimates in biodiversity assessments.
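
To make the clustering idea concrete, the following is a minimal sketch of plain k-means applied to per-species feature vectors. It is not the Multi K-means or X-means variant named above: the number of clusters is fixed at 2, the initialisation is a simple deterministic rule, and the two-dimensional feature values are hypothetical, chosen only so that the two groups are easy to see.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    if not 2 <= k <= len(points):
        raise ValueError("k must be between 2 and the number of points")
    # Deterministic initialisation (an assumption): k points evenly spaced
    # through the sorted input.
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    def assign(cents):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]

    for _ in range(max_iter):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the old centroid if a cluster lost all its members.
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(centroids), centroids


# Hypothetical, already-scaled feature vectors for six species.
features = [(0.90, 0.80), (0.80, 0.90), (0.85, 0.95),
            (0.10, 0.20), (0.15, 0.10), (0.05, 0.15)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In this toy setting the cluster whose centroid has the larger feature values could be read as "very common" and the other as "less common"; how the study itself names its clusters is not described here, so that reading is an assumption.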

Superior Accuracy with Deep Learning

The Variational Autoencoder (VAE) consistently showed the highest accuracy and agreement with expert assessments in classifying species commonness. For binary classification ("very common" vs. "less common"), the VAE achieved 90.06% accuracy and a Kappa of 0.80 when compared with an expert ensemble. For three-category classification ("very common", "fairly common", or "rare"), the VAE reached 85.71% accuracy with a Kappa of 0.76.

The traditional clustering methods, Multi K-means (86.96% binary accuracy) and X-means (88.20% binary accuracy), also showed high agreement but were slightly less accurate than the VAE. Sensitivity analysis highlighted that three features contributed most to classification performance: the average number of occurrence records per dataset (IntraDs), the thresholded observation frequency (HF), and the average species abundance per occurrence record (A).
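
The three features named above can be sketched from a list of occurrence records as follows. This is an illustration under stated assumptions, not the study's feature extraction: the record keys (`dataset`, `year`, `abundance`), the definition of observation frequency as records per year with at least one record, and the threshold value are all hypothetical.

```python
from collections import Counter


def species_features(records, hf_threshold=10):
    """Compute IntraDs, A, and HF for one species from its occurrence records."""
    if not records:
        raise ValueError("at least one occurrence record is required")
    records_per_dataset = Counter(r["dataset"] for r in records)
    records_per_year = Counter(r["year"] for r in records)

    # IntraDs: average number of occurrence records per dataset.
    intra_ds = len(records) / len(records_per_dataset)
    # A: average species abundance per occurrence record.
    abundance = sum(r["abundance"] for r in records) / len(records)
    # HF: observation frequency compared with a threshold. Both the
    # per-year definition and the threshold are assumptions of this sketch.
    yearly_frequency = len(records) / len(records_per_year)
    hf = 1 if yearly_frequency >= hf_threshold else 0

    return {"IntraDs": intra_ds, "A": abundance, "HF": hf}


# Hypothetical occurrence records for a single species.
records = [
    {"dataset": "ds1", "year": 2019, "abundance": 2},
    {"dataset": "ds1", "year": 2019, "abundance": 4},
    {"dataset": "ds1", "year": 2020, "abundance": 1},
    {"dataset": "ds2", "year": 2020, "abundance": 1},
]
print(species_features(records))
```

Because every feature is an average or a thresholded rate, species with very different record counts end up on comparable scales, which is what allows them to be clustered together.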

Massaciuccoli Lake Basin: Validation & Discrepancies

The methodology was validated using a case study in the biodiversity-rich Massaciuccoli Lake basin, Tuscany, Italy, focusing on 161 species. Models were trained on GBIF data from other Italian wetlands and evaluated against expert-based assessments.

Key findings from the case study include:

  • The VAE's high performance validates its ability to capture species rarity and commonness patterns.
  • Analysis of misclassified species revealed cases where expert opinion (e.g., general knowledge of invasiveness or widespread distribution) diverged from model predictions based on available data, highlighting the challenge of sampling biases in large datasets.
  • Examples include Acrocephalus melanopogon (moustached warbler), Cyprinus carpio (common carp), Halyomorpha halys (brown marmorated stink bug), and Tarentola mauritanica (common wall gecko). For these species, sparse or spatially concentrated GBIF observations often led the models to indicate rarity, although experts classified them as common.

This underscores the methodology's strength in identifying data-driven commonness, even when it contrasts with generalized expert assumptions.

Enterprise Process Flow

Feature Extraction → Modelling (Clustering & VAE) → Classification → Ensemble Assessment → Prevalence Estimation

90.06%: Highest Accuracy in Binary Classification, Achieved by the VAE Model
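
The Ensemble Assessment step in the flow above combines the per-model classifications. How the study builds its ensemble is not detailed here, so the sketch below assumes a simple majority vote over the three models, with a hypothetical fallback class when all three disagree.

```python
from collections import Counter


def majority_vote(labels, fallback="fairly common"):
    """Return the most frequent label, or `fallback` when the top counts tie."""
    if not labels:
        raise ValueError("at least one label is required")
    ranked = Counter(labels).most_common()
    # A tie between the two most frequent labels has no majority winner.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback
    return ranked[0][0]


# Hypothetical per-model classifications for one species.
predictions = {"Multi K-means": "rare", "X-means": "rare", "VAE": "very common"}
print(majority_vote(list(predictions.values())))  # rare
```

With three models and three classes, the only tie is a three-way disagreement, so the fallback rule matters only in that case. Choosing the middle class there is a design choice of this sketch, not something stated in the source.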

Model Performance Comparison (vs. Expert Ensemble)

Model          Binary Accuracy (%)  3-Category Accuracy (%)  Kappa (Binary)  Kappa (3-Category)
Multi K-means  86.96                82.61                    0.73            0.72
X-means        88.20                84.47                    0.76            0.73
VAE            90.06                85.71                    0.80            0.76
Ensemble       87.58                81.37                    0.75            0.69
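
The two metrics in the table can be computed from paired expert and model labels. The sketch below uses the standard definitions: accuracy is the share of matching labels, and Cohen's Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The label lists are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def accuracy(expert, model):
    """Share of items on which the two label lists agree."""
    if len(expert) != len(model) or not expert:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(e == m for e, m in zip(expert, model)) / len(expert)


def cohen_kappa(expert, model):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(expert)
    p_o = accuracy(expert, model)
    expert_counts = Counter(expert)
    model_counts = Counter(model)
    # Chance agreement: sum over classes of the product of marginal shares.
    p_e = sum(expert_counts[c] * model_counts[c] for c in expert_counts) / (n * n)
    if p_e == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical binary labels for 10 species.
expert = ["very common"] * 5 + ["less common"] * 5
model = ["very common"] * 4 + ["less common"] * 5 + ["very common"]
print(accuracy(expert, model), round(cohen_kappa(expert, model), 2))  # 0.8 0.6
```

Kappa is lower than raw accuracy here because half of the agreement would be expected by chance with balanced classes, which is why the table reports both figures.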

Case Study: Insights from Massaciuccoli Lake Basin

The Massaciuccoli Lake basin served as a crucial validation ground for the unsupervised methods. Examining 161 species, the study revealed instances where model predictions based on empirical data challenged traditional expert assessments, primarily due to sampling biases inherent in large biodiversity datasets.

Misclassified Species Highlights:

  • Acrocephalus melanopogon (Moustached Warbler): Experts considered it "very common" due to established presence, but models indicated "rare" based on low average observation frequency (5 per year) and limited year-round data from other wetlands. This highlights the impact of temporal observation patterns.
  • Cyprinus carpio (Common Carp): Although the species is generally classified as invasive, its population in Massaciuccoli Lake is stable. Experts deemed it "very common," but the models, based on only 12 observations over a decade, leaned towards "rare." This illustrates how a general classification can override sparse data.
  • Halyomorpha halys (Brown Marmorated Stink Bug): Because the species is known to be highly invasive, experts classified it as "very common." However, the models, noting only 19 observations over a decade and no established invasion, indicated "rare," which shows how general knowledge of invasive behavior can outweigh actual local data.
  • Tarentola mauritanica (Common Wall Gecko): Experts considered it "very common" due to its widespread distribution in Tuscany. Models, observing sparse (40 records in a decade) and concentrated urban sightings, particularly with recent reports, classified it as "rare," revealing the gap between regional knowledge and specific GBIF data.

These examples underscore the methodology's strength in surfacing data-driven insights that might contradict conventional wisdom, pointing to the need for careful data interpretation in ecological assessments.
