Contemporary species distribution modeling tools for Python.
Documentation: earth-chris.github.io/elapid
Source code: earth-chris/elapid
Introduction
elapid is a collection of species distribution modeling tools for Python. It includes a custom implementation of Maxent and a suite of methods to simplify working with biogeography data.
The name is an homage to A Biogeographic Analysis of Australian Elapid Snakes (H.A. Nix, 1986), the paper widely credited with defining the essential bioclimatic variables to use in species distribution modeling. It's also a snake pun (a python wrapper for mapping snake biogeography).
Installation
pip install elapid
or conda install -c conda-forge elapid
Installing glmnet is optional, but recommended. This can be done with pip install elapid[glmnet] or conda install -c conda-forge elapid glmnet. For more support, and for information on why this package is recommended, see this page.
The conda install is recommended for Windows users. While there is a pip distribution, you may experience some challenges. The easiest way to overcome them is to use Windows Subsystem for Linux (WSL). Otherwise, see this page for support.
Why use elapid?
The amount and quality of biogeographic data have increased dramatically over the past decade, as have cloud-based tools for working with it. elapid was designed to provide a set of modern, Python-based tools for working with species occurrence records and environmental covariates to map different dimensions of a species' niche.
elapid supports working with modern geospatial data formats and uses contemporary approaches to training statistical models. It uses sklearn conventions to fit and apply models, rasterio to handle raster operations, geopandas for vector operations, and processes data under the hood with numpy.
This makes it easier to do things like fit/apply models to multi-temporal and multi-scale data, fit geographically-weighted models, create ensembles, precisely define background point distributions, and summarize model predictions.
It does the following things reasonably well:
Point sampling
Select random geographic point samples (aka background or pseudoabsence points) within polygons or rasters, handling nodata locations, as well as sampling from bias maps (using elapid.sample_raster(), elapid.sample_vector(), or elapid.sample_bias_file()).
Vector annotation
Extract and annotate point data from rasters, creating GeoDataFrames with sample locations and their matching covariate values (using elapid.annotate()). On-the-fly reprojection, dropping nodata, multi-band inputs and multi-file inputs are all supported.
Zonal statistics
Calculate zonal statistics from multi-band, multi-raster data into a single GeoDataFrame from one command (using elapid.zonal_stats()).
Feature transformations
Transform covariate data into derivative features to expand data dimensionality and improve prediction accuracy (like elapid.ProductTransformer(), elapid.HingeTransformer(), or the all-in-one elapid.MaxentFeatureTransformer()).
Species distribution modeling
Train and apply species distribution models based on annotated point data, configured with sensible defaults (like elapid.MaxentModel() and elapid.NicheEnvelopeModel()).
Training spatially-aware models
Compute spatially-explicit sample weights, checkerboard train/test splits, or geographically-clustered cross-validation splits to reduce spatial autocorrelation effects (with elapid.distance_weights(), elapid.checkerboard_split() and elapid.GeographicKFold()).
Applying models to rasters
Apply any pixel-based model with a .predict() method to raster data to easily create prediction probability maps (like training a RandomForestClassifier() and applying with elapid.apply_model_to_rasters()).
Cloud-native geo support
Work with cloud- or web-hosted raster/vector data (on https://, gs://, s3://, etc.) to keep your disk free of temporary files.
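To make the "derivative features" idea in the list above more concrete, here is a minimal pure-Python sketch of forward hinge features for a single covariate. It is an illustration only, not the code behind elapid.HingeTransformer(): the function name hinge_features, the n_knots parameter, and the evenly spaced knot placement are all assumptions made for this example.

```python
def hinge_features(values, n_knots=5):
    """Expand one covariate into forward hinge features, one column per knot.

    Hypothetical helper for illustration; not part of elapid.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("covariate is constant; no hinge features to build")
    # Evenly spaced knots in [lo, hi); hi itself is excluded to avoid dividing by zero.
    knots = [lo + (hi - lo) * i / n_knots for i in range(n_knots)]
    # Each feature is 0 below its knot and rises linearly to 1 at the covariate maximum.
    return [[max(0.0, (v - k) / (hi - k)) for k in knots] for v in values]


features = hinge_features([0.0, 5.0, 10.0], n_knots=2)
# Knots fall at 0 and 5, so the rows are [0.0, 0.0], [0.5, 0.0], [1.0, 1.0].
```

Each input column becomes several piecewise-linear columns, which is how a linear model can approximate a nonlinear response to a covariate.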
Check out some example code snippets and workflows on the Working with Geospatial Data page.
elapid requires some effort on the user's part to draw samples and extract covariate data. This is by design.
Selecting background samples, computing sample weights, splitting train/test data, and specifying training parameters are all critical modeling choices that have profound effects on inference and interpretation.
The extra flexibility provided by elapid gives users more control over the seemingly black-box approach of Maxent, so they can better tune and evaluate their models.
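The explicit steps described above might look roughly like the following sketch. The elapid functions sample_raster(), annotate(), MaxentModel() and apply_model_to_rasters() are named on this page, but the argument names used here (count, add_class_label, drop_na), the stack_geodataframes() helper, and the "class" label column are assumptions that should be checked against the API documentation. The wrapper function fit_and_map_maxent and its parameters are hypothetical.

```python
def fit_and_map_maxent(presence_path, raster_paths, output_path, n_background=10_000):
    """Sample background, annotate covariates, fit Maxent, write a prediction raster.

    Hypothetical wrapper for illustration; argument names are assumptions.
    """
    # Imported lazily so the sketch can be loaded without the geospatial stack installed.
    import elapid as ela
    import geopandas as gpd

    presence = gpd.read_file(presence_path)

    # Modeling choice: where background points come from and how many to draw.
    background = ela.sample_raster(raster_paths[0], count=n_background)

    # Combine presence and background points, labeling each with its class.
    merged = ela.stack_geodataframes(presence, background, add_class_label=True)

    # Extract covariate values at every point, dropping nodata locations.
    annotated = ela.annotate(merged, raster_paths, drop_na=True)
    x = annotated.drop(columns=["class", "geometry"])
    y = annotated["class"]

    # Fit with default settings (sklearn-style), then predict across the rasters.
    model = ela.MaxentModel()
    model.fit(x, y)
    ela.apply_model_to_rasters(model, raster_paths, output_path)
    return model
```

Each step is a separate call, so any of them can be swapped out, for example drawing background points from a bias map or passing sample weights when fitting, without changing the rest of the workflow.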
How to cite
BibTeX:
@article{Anderson2023,
  title = {elapid: Species distribution modeling tools for Python},
  journal = {Journal of Open Source Software},
  author = {Christopher B. Anderson},
  doi = {10.21105/joss.04930},
  url = {https://doi.org/10.21105/joss.04930},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {84},
  pages = {4930},
}
Or click "Cite this repository" on the GitHub page.