This function selects dominant species from abundance records, optionally taking habitat into account when that information is available.

PRE_FATE.selectDominant(
  mat.observations,
  doRuleA = TRUE,
  rule.A1 = 10,
  rule.A2_quantile = 0.9,
  doRuleB = TRUE,
  rule.B1_percentage = 0.25,
  rule.B1_number = 5,
  rule.B2 = 0.5,
  doRuleC = FALSE,
  opt.doRobustness = FALSE,
  opt.robustness_percent = seq(0.1, 0.9, 0.1),
  opt.robustness_rep = 10,
  opt.doSitesSpecies = TRUE,
  opt.doPlot = TRUE
)

Arguments

mat.observations

a data.frame with at least 3 columns :
sites, species, abund
(and optionally, habitat)
(see Details)

doRuleA

default TRUE.
If TRUE, selection is done including constraints on number of occurrences

rule.A1

default 10.
If doRuleA = TRUE or doRuleC = TRUE, minimum number of releves required for each species

rule.A2_quantile

default 0.9.
If doRuleA = TRUE or doRuleC = TRUE, quantile corresponding to the minimum number of total occurrences required for each species (between 0 and 1)

doRuleB

default TRUE.
If TRUE, selection is done including constraints on relative abundances

rule.B1_percentage

default 0.25.
If doRuleB = TRUE, minimum relative abundance required for each species in at least rule.B1_number sites (between 0 and 1)

rule.B1_number

default 5.
If doRuleB = TRUE, minimum number of sites in which each species has relative abundance >= rule.B1_percentage

rule.B2

default 0.5.
If doRuleB = TRUE, minimum average relative abundance required for each species (between 0 and 1)

doRuleC

default FALSE.
If TRUE, selection is done including constraints on number of occurrences at the habitat level (with the values of rule.A1 and rule.A2_quantile)

opt.doRobustness

(optional) default FALSE.
If TRUE, selection is also done on subsets of mat.observations, keeping only a percentage of releves or sites, to visualize the robustness of the selection

opt.robustness_percent

(optional) default c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9).
If opt.doRobustness = TRUE, vector containing values between 0 and 1 corresponding to the percentages with which to build subsets to evaluate robustness

opt.robustness_rep

(optional) default 10.
If opt.doRobustness = TRUE, number of repetitions for each percentage value defined by opt.robustness_percent to evaluate robustness

opt.doSitesSpecies

(optional) default TRUE.
If TRUE, abundance / occurrence tables for the selected species are built, saved and returned.

opt.doPlot

(optional) default TRUE.
If TRUE, plot(s) are also produced; otherwise the outputs are only calculated, reorganized, saved and returned.

Value

A list containing one vector, four or five data.frame objects with the following columns, and up to five ggplot2 objects :

species.selected

the names of the selected species

tab.rules

A1,A2,B1,B2, hab

for each rule that has been used, whether or not the species fulfills the condition

species

the species concerned

SELECTION

the summary of rules with which the species was selected, or not

SELECTED

TRUE if the species fulfills A1 and at least one other condition, FALSE otherwise

tab.robustness

...

same as tab.rules

type

the type of subset (either releves or sites)

percent

the percentage of releves or sites kept in the subset

rep

the repetition ID

tab.dom.AB

table containing sums of abundances for all selected species (sites in rows, species in columns)

tab.dom.PA

table containing counts of presences for all selected species (sites in rows, species in columns)

plot.A

ggplot2 object, representing the selection of species according to rules A1 and A2

plot.B

ggplot2 object, representing the selection of species according to rules B

plot.C

ggplot2 object, representing the selection of species according to rules C (A1 and A2 per habitat)

plot.pco

ggplot2 object, representing selected species with Principal Coordinates Analysis (see dudi.pco)

plot.robustness

ggplot2 object, representing the robustness of the selection of species for each rule
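The robustness subsets described by type, percent and rep can be pictured with the base R sketch below. It is only an illustration under assumptions: the site names are invented, only the sites type is shown, and the way the function actually draws its subsets is not documented here.

```r
## Hypothetical site identifiers (invented for illustration)
set.seed(1)
sites <- paste0("s", 1:20)

opt.robustness_percent <- c(0.5, 0.8)
opt.robustness_rep <- 3

## One row per (percent, rep) combination, mirroring the columns of tab.robustness
subsets <- expand.grid(rep = seq_len(opt.robustness_rep),
                       percent = opt.robustness_percent)
subsets$type <- "sites"

## Draw, for each combination, a random subset holding the requested share of sites
kept <- lapply(seq_len(nrow(subsets)), function(i) {
  sample(sites, size = round(subsets$percent[i] * length(sites)))
})
n.kept <- lengths(kept)

## The selection rules would then be re-applied to the records of each subset
```

Comparing the rule outcomes across subsets gives an idea of how stable the selection is when part of the data is withheld.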

The information is written in PRE_FATE_DOMINANT_[...].csv files :

TABLE_complete

the complete table of all species and the selection rules described above (tab.rules)

TABLE_species

only the names / ID of the species selected

TABLE_sitesXspecies_AB

abundances table of selected species

TABLE_sitesXspecies_PA

presence/absence table of selected species
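As a rough illustration of what the two sites x species tables contain, the base R sketch below builds them from invented records. The data and the vector of selected species are hypothetical, and the function itself may build and format these tables differently.

```r
## Toy records (invented values); sp_c plays the role of a non-selected species
obs <- data.frame(
  sites   = c("s1", "s1", "s1", "s2", "s2", "s2"),
  species = c("sp_a", "sp_b", "sp_c", "sp_a", "sp_a", "sp_b"),
  abund   = c(3, 1, 9, 2, 4, 5),
  stringsAsFactors = FALSE
)
selected <- c("sp_a", "sp_b")

## Keep only the records of selected species
obs.sel <- obs[obs$species %in% selected, ]

## Sums of abundances (sites in rows, species in columns)
tab.dom.AB <- xtabs(abund ~ sites + species, data = obs.sel)

## Counts of presences (sites in rows, species in columns)
tab.dom.PA <- table(obs.sel$sites, obs.sel$species)
```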

Up to six PRE_FATE_DOMINANT_[...].pdf files are also created :

STEP_1_rule_A

STEP_1_rule_B

STEP_1_rule_C

STEP_2_selectedSpecies_PHYLO

STEP_2_selectedSpecies_PCO

STEP_2_selectedSpecies_robustness

Details

This function provides a way to select dominant species based on presence/abundance sampling information.
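To make the expected input concrete, here is a minimal sketch of a mat.observations table built in base R. All site names, species names, abundances and habitats are invented; only the column names sites, species, abund and habitat come from the Arguments section.

```r
## Toy input table (all values are invented for illustration)
mat.observations <- data.frame(
  sites   = c("s1", "s1", "s2", "s2", "s3"),
  species = c("sp_a", "sp_b", "sp_a", "sp_c", "sp_b"),
  abund   = c(3, 1, 2, 4, 2),
  habitat = c("grassland", "grassland", "forest", "forest", "forest"),
  stringsAsFactors = FALSE
)

## Check that the three mandatory columns are present
required <- c("sites", "species", "abund")
has.required <- all(required %in% colnames(mat.observations))

## The habitat column is optional and only needed for rule C
has.habitat <- "habitat" %in% colnames(mat.observations)
```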

Three rules can be applied to make the species selection :

A. Presence releves :

both conditions must be fulfilled

on number of releves

the species should be found a minimum number of times (rule.A1)
This should ensure that the species has received sufficient sampling effort. This criterion MUST ALWAYS be fulfilled.

on number of sites

the species should be found in a certain number of sites, which corresponds to the quantile rule.A2_quantile of the total number of records per species
This should ensure that the species covers the whole study area (or at least a substantial part of it, assuming that the releves are well distributed throughout the area).
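One possible reading of rule A can be sketched in base R as follows. The toy records and thresholds are invented, and taking the A2 threshold as the rule.A2_quantile quantile of the per-species record counts is an interpretation of the description above, not a copy of the function's internals.

```r
## Toy records: one row per observation of a species in a site (invented values)
obs <- data.frame(
  sites   = c("s1", "s2", "s3", "s4", "s1", "s2", "s1"),
  species = c("sp_a", "sp_a", "sp_a", "sp_a", "sp_b", "sp_b", "sp_c"),
  stringsAsFactors = FALSE
)
rule.A1 <- 2
rule.A2_quantile <- 0.5

## Number of records and number of distinct sites per species
counts    <- table(obs$species)
species   <- names(counts)
n.records <- as.vector(counts)
n.sites   <- as.vector(tapply(obs$sites, obs$species,
                              function(x) length(unique(x))))

## A1 : minimum number of records
A1 <- n.records >= rule.A1

## A2 : number of sites compared with a quantile of the record counts (assumption)
threshold.A2 <- unname(quantile(n.records, probs = rule.A2_quantile))
A2 <- n.sites >= threshold.A2

tab.A <- data.frame(species = species, A1 = A1, A2 = A2)
```

With these toy values sp_a and sp_b pass both conditions, while sp_c, recorded only once, passes neither.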

B. Abundance releves :

at least one of the two conditions is required

on dominancy

the species should be dominant (i.e. have a relative abundance of at least rule.B1_percentage in the site) in at least rule.B1_number sites
This should ensure the selection of species that are frequently abundant.

on average abundance

the species should have a mean relative abundance greater than or equal to rule.B2
This should ensure the selection of species that are not frequent but are representative of the sites in which they are found.
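The two conditions of rule B can be illustrated with the base R sketch below. The records and thresholds are invented, and two choices are assumptions rather than documented behaviour: relative abundance is computed as the share of the site total, and the mean is taken over the sites where the species is present.

```r
## Toy records with abundances (invented values)
obs <- data.frame(
  sites   = c("s1", "s1", "s2", "s2", "s3"),
  species = c("sp_a", "sp_b", "sp_a", "sp_b", "sp_b"),
  abund   = c(3, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)
rule.B1_percentage <- 0.5
rule.B1_number <- 2
rule.B2 <- 0.6

## Relative abundance of each record within its site (assumption: share of site total)
obs$rel <- obs$abund / ave(obs$abund, obs$sites, FUN = sum)

## B1 : number of sites in which the species reaches rule.B1_percentage
n.dominant <- tapply(obs$rel >= rule.B1_percentage, obs$species, sum)
B1 <- as.vector(n.dominant >= rule.B1_number)

## B2 : mean relative abundance over the sites where the species occurs
mean.rel <- tapply(obs$rel, obs$species, mean)
B2 <- as.vector(mean.rel >= rule.B2)

## Rule B is met when at least one of the two conditions holds
tab.B <- data.frame(species = names(mean.rel), B1 = B1, B2 = B2, B = B1 | B2)
```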

C. Presence releves per habitat :

If habitat information is available (e.g. type of environment : urban, desert, grassland... ; type of vegetation : shrubs, forest, alpine grasslands... ; etc), the same rules as A can be applied, but for each habitat.
This should help to keep species that are not dominant at the large scale but could be representative of a specific habitat.
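A per-habitat version of rule A might look like the base R sketch below. The data are invented, and computing the quantile threshold separately within each habitat (zero counts included) is an assumption about how the rule is applied.

```r
## Toy records with a habitat column (invented values)
obs <- data.frame(
  sites   = c("s1", "s2", "s3", "s4", "s5", "s6"),
  species = c("sp_a", "sp_a", "sp_b", "sp_a", "sp_b", "sp_b"),
  habitat = c("forest", "forest", "forest",
              "grassland", "grassland", "grassland"),
  stringsAsFactors = FALSE
)
rule.A1 <- 2
rule.A2_quantile <- 0.5

## Records per species within each habitat (species in rows, habitats in columns)
n.hab <- table(obs$species, obs$habitat)

## Apply both A conditions habitat by habitat
tab.C <- apply(n.hab, 2, function(n) {
  n >= rule.A1 & n >= quantile(n, probs = rule.A2_quantile)
})
```

Here sp_a only qualifies in the forest and sp_b only in the grassland, which is the kind of habitat-specific species rule C is meant to retain.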

A table is created indicating, for each species, whether or not it fulfills the selected conditions, for example :

|    A1    A2    B1    B2 grass lands |
---------------------------------------
|  TRUE FALSE FALSE  TRUE  TRUE FALSE | species a
|  TRUE  TRUE  TRUE FALSE FALSE FALSE | species b
| FALSE FALSE FALSE FALSE FALSE  TRUE | species c

This table is transformed into a Euclidean distance matrix (with the gowdis and quasieuclid functions)
to cluster and represent species (see .pdf output files) :

  • through a phylogenetic tree (with hclust and as.phylo functions)

  • through Principal Coordinates Analysis (with dudi.pco)

according to their selection rules :

  • A2 : spatial dominancy (widespread but poorly abundant)

  • B1 : local dominancy (relatively abundant or dominant in a certain number of sites)

  • B2 : local dominancy (not widespread but dominant in few sites)

  • C : habitat dominancy (not widespread but dominant in a specific habitat)

  • A2 & B1 : (widespread and relatively abundant)

  • A2 & B2 : (widespread and dominant in few sites)

  • A2 & B1 & B2 : (widespread and dominant)

  • B1 & B2 : (relatively widespread but dominant)
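The clustering and ordination step can be approximated with base R alone, as sketched below on the example table. This is a stand-in, not the function's method: dist, hclust and cmdscale replace gowdis, quasieuclid, as.phylo and dudi.pco, so the distances and coordinates will differ from those in the .pdf outputs.

```r
## Rules table from the example above, coded as a logical matrix
rules <- rbind(
  "species a" = c(A1 = TRUE,  A2 = FALSE, B1 = FALSE, B2 = TRUE,
                  grass = TRUE,  lands = FALSE),
  "species b" = c(A1 = TRUE,  A2 = TRUE,  B1 = TRUE,  B2 = FALSE,
                  grass = FALSE, lands = FALSE),
  "species c" = c(A1 = FALSE, A2 = FALSE, B1 = FALSE, B2 = FALSE,
                  grass = FALSE, lands = TRUE)
)

## Euclidean distance on the 0/1 coding (base R substitute for a Gower distance)
d <- dist(rules * 1)

## Hierarchical clustering, the basis for a tree representation
tree <- hclust(d)

## Classical multidimensional scaling, i.e. a principal coordinates analysis
pco <- cmdscale(d, k = 2)
```

Calling plot(tree) or plot(pco) then gives a quick look at how species group according to the rules they meet.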

NB :
Species not meeting any criteria or only A1 are considered as "Not selected".
Priority is given to the A2, B1 and B2 rules over C. Hence, species selected according to A2, B1 and/or B2 can also meet criterion C, while species selected according to C do not meet any of these three criteria.
Species selected according to one (or more) criterion but not meeting criterion A1 are also considered as "Not selected".
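The selection logic in this note can be written out on the example table as a short base R sketch. The SELECTED column follows the definition given in the Value section; the label strings are illustrative only and are not guaranteed to match the SELECTION values the function writes.

```r
## Rules table from the example above
tab.rules <- data.frame(
  species = c("species a", "species b", "species c"),
  A1    = c(TRUE,  TRUE,  FALSE),
  A2    = c(FALSE, TRUE,  FALSE),
  B1    = c(FALSE, TRUE,  FALSE),
  B2    = c(TRUE,  FALSE, FALSE),
  grass = c(TRUE,  FALSE, FALSE),
  lands = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
main <- c("A2", "B1", "B2")
habitat.met <- tab.rules$grass | tab.rules$lands

## SELECTED : A1 plus at least one other condition
any.main <- tab.rules$A2 | tab.rules$B1 | tab.rules$B2
tab.rules$SELECTED <- tab.rules$A1 & (any.main | habitat.met)

## Illustrative labels : A2 / B1 / B2 take priority over C
label <- unname(apply(tab.rules[, main], 1,
                      function(x) paste(main[x], collapse = " & ")))
label[label == "" & habitat.met] <- "C"
label[!tab.rules$SELECTED] <- "Not selected"
tab.rules$SELECTION <- label
```

Species c meets a habitat criterion but not A1, so it ends up as "Not selected", which is the third case described in the note.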

Author

Isabelle Boulangeat, Maya Guéguen

Examples


## Load example data
Champsaur_PFG = .loadData('Champsaur_PFG', 'RData')

## Species observations
tab = Champsaur_PFG$sp.observations

## No habitat, no robustness -------------------------------------------------
tab.occ = tab[, c('sites', 'species', 'abund')]
sp.SELECT = PRE_FATE.selectDominant(mat.observations = tab.occ)
names(sp.SELECT)
str(sp.SELECT$tab.rules)
str(sp.SELECT$tab.dom.PA)
plot(sp.SELECT$plot.A)
plot(sp.SELECT$plot.B$abs)
plot(sp.SELECT$plot.B$rel)

## Habitat, change parameters, no robustness (!quite long!) --------------------
if (FALSE) {
tab.occ = tab[, c('sites', 'species', 'abund', 'habitat')]
sp.SELECT = PRE_FATE.selectDominant(mat.observations = tab.occ
                                    , doRuleA = TRUE
                                    , rule.A1 = 10
                                    , rule.A2_quantile = 0.9
                                    , doRuleB = TRUE
                                    , rule.B1_percentage = 0.2
                                    , rule.B1_number = 10
                                    , rule.B2 = 0.4
                                    , doRuleC = TRUE)
names(sp.SELECT)
str(sp.SELECT$tab.rules)
plot(sp.SELECT$plot.C)
plot(sp.SELECT$plot.pco$Axis1_Axis2)
plot(sp.SELECT$plot.pco$Axis1_Axis3)
}

## No habitat, robustness (!quite long!) --------------------
if (FALSE) {
tab.occ = tab[, c('sites', 'species', 'abund')]
sp.SELECT = PRE_FATE.selectDominant(mat.observations = tab.occ
                                    , opt.doSitesSpecies = FALSE
                                    , opt.doRobustness = TRUE
                                    , opt.robustness_percent = seq(0.1,0.9,0.1)
                                    , opt.robustness_rep = 10)
names(sp.SELECT)
str(sp.SELECT$tab.robustness)
names(sp.SELECT$plot.robustness)
plot(sp.SELECT$plot.robustness$`All dataset`)
}