Appendix 5: Chapter 4 Supplementary Materials

Article title: Forest gap dynamics: an underexplored factor that drives divergent adaptive growth strategies within tropical tree species

Authors: Sylvain Schmitt, Niklas Tysklind, Myriam Heuertz, Bruno Hérault

The following Supporting Information is available for this article:

SI Materials and Methods.

SI Materials and Methods.

Part of the analyses are common with and described in Schmitt et al. (in prep).

Study site

The study was conducted in the Paracou field station, in the coastal forests of French Guiana, South America. The site is characterized by an average of 3,041 mm annual rainfall and a mean air temperature of 25.71 °C (Aguilos et al. 2018). Old tropical forest with an exceptional richness (i.e. over 750 woody species) grows across the succession of small hills of this area, which rise to 10–40 m a.s.l. (Gourlet-Fleury et al. 2004). The site comprises 16 permanent plots (fifteen 6.25 ha plus one 25 ha) which have been censused (DBH>10) every 1-2 years for more than 35 years. Nine of the plots were logged and subjected to human-induced disturbance in 1986 (details on the experiment in Hérault and Piponiot 2018).

Plant material

Four hundred and two individuals of Symphonia globulifera (Clusiaceae) were sampled in 2017 during the dry season (from September to December) in Paracou. Symphonia globulifera L.f (Clusiaceae) was previously recognized as composed of two morphotypes in French Guiana (Sabatier et al. 1997, Molino and Sabatier 2001, Baraloto et al. 2007). S. globulifera sensu stricto and Symphonia sp.1 occur in sympatry but in differentiated habitats, with S. globulifera preferentially growing in valley bottoms with an acquisitive functional strategy and S. sp1 preferentially exploiting a variety of drier habitats with a conservative functional strategy (Allié et al. 2015, Schmitt et al., in prep; Schmitt et al. 2020). Symphonia have been highlighted as a species complex with low (phylo-)genetic species resolution and high levels of plastid DNA sharing among sister species (Gonzalez et al. 2009, Baraloto et al. 2012a, Torroba-Balmori et al. 2017, Caron et al. 2019). In addition, outgroups for genetic analysis in Symphonia were comprised of 13 individuals of Symphonia globulifera from Africa (Sao Tome, Gabon, Cameroun, Congo, Benin, Liberia, Ivory Coast, and Ghana), seven Symphonia globulifera from South America (Brazil, Costa Rica and Panama), two Symphonia nectarifera Jum. & H. Perrier from Madagascar, two Symphonia urophylla (Decne. ex Planch. & Triana) Benth. & Hook.f. ex Vesque from Madagascar, five Pentadesma butyracea Sabine from Benin and Cameroon and one Pentadesma grandifolia Baker f. from Cameroon. Leaves were collected from the 432 individuals (402 + 30 outgroups) and dessicated using silica gel.

Sequence capture

Design of probes set

The genomic and transcriptomic resources used for the design were comprised of a published low-coverage draft genome from Africa (Olsson et al. 2017), an unpublished draft genome from French Guiana [Scotti et al., in prep], an unpublished transcriptome from 20 juveniles from French Guiana [Tysklind et al., in prep], and reduced-representation genomic sequence reads of individuals from French Guiana [Torroba-Balmori et al., unpublished]. We aligned genomic reads on the two genome drafts with bwa (Li and Durbin 2009). We kept scaffolds from the two genome drafts with a length superior to 1 kbp and at least one matching alignment with a read with a single match on the genome, and merged the two filtered genome drafts with quickmerge (Chakraborty et al. 2016). We aligned transcripts on the new filtered genome draft with BLAT (Kent 2002) and selected 533 scaffolds without transcript-match, i.e. anonymous scaffolds. We masked repetitive regions with RepeatMasker (Smit et al. 2015) and selected 533 1-kbp anonymous loci within the 533 previous scaffolds.

Similarly, we filtered transcripts from the 20 juveniles of Symphonia globulifera from French Guiana [Tysklind et al., in prep] based on SNP quality, type and frequency. We further detected open reading frames (ORFs) using transdecoder (Haas et al. 2013), and selected transcripts with non-overlapping ORFs including a start codon. We kept ORFs with an alignment on scaffolds from the aforementioned genome draft for Symphonia using BLAT (Kent 2002), and masked repetitive regions with RepeatMasker (Smit et al. 2015). We selected 1,150 genic loci of 500-bp to 1-kbp, from 100 bp before the start to a maximum of 900 bp after the end of the ORFs, resulting in 1-Mbp genomic loci that included a coding region.

Genomic libraries and sequence capture

Genomic DNA was extracted from 5 mg of dried leaf tissue with a CTAB protocol (Doyle and Doyle 1987). DNA extracts were digested with ‘Ultra II FS Enzyme Mix’ (new England Biolabs Inc, MA, USA) for a target size of 150 bp, and libraries built with the ‘NEBNext Ultra II FS DNA Library Prep kit for Illumina’(New England Biolabs Inc, MA, USA). We amplified and tagged libraries using 5 \(\mu L\) of adaptor-ligated DNA, 8.3 \(\mu L\) of ‘NEBNext Ultra II Q5 Master Mix’ (new England Biolabs Inc, MA, USA), 2x 1.6 \(\mu L\) of Index Primer i5 and i7 from ‘NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1 and Set 2)’ (new England Biolabs Inc, MA, USA). Initial denaturation (98°C for 30 s) was followed by 8 cycles (98°C for 10 s and 65°C for 1 min 30 s) and a final extension (65°C for 5 min). We pooled libraries in four equimolar multiplexes for each genus. We obtained a custom made set of 20,000 80-mer probes for each genus using myBaits Custom 1-20K (Arbor Biosciences, MI, USA) and conducted the capture experiments using the corresponding myBaits V4 protocol with a hybridization time of 80 hours. We pooled the four multiplexes and sequenced them in two lanes of an Illumina HiSeq 4000 instrument obtaining 2x150bp pair-end reads for each genus.

SNP calling and filtering

We assessed the quality off raw reads using multiqc (Ewels et al. 2016) and trimmed them with trimmomatic (Bolger et al. 2014). We kept only pair-end reads without adaptors and a phred score above 15 in a sliding window of 4. Seventy percent of trimmed reads mapped off-targets using bwa (Li and Durbin 2009). We thus mapped trimmed reads on the hybrid reference built for the sequence capture experiment using bwa (Li and Durbin 2009), picard (Broad Institute 2018), samtools (Li et al. 2009) and bedtools (Quinlan and Hall 2010). We called variants for each individual using HaplotypeCaller, aggregated variants using GenomicsDBImport and jointly-genotyped individuals using GenotypeGVCFs all in GATK4 software (Auwera et al. 2013). We filtered biallelic SNPs with a quality above 30, a quality by depth above 2, a Fisher strand bias below 60 and strand odds ratio above 3 using GATK4 (Auwera et al. 2013). Finally, we filtered individuals and SNPs for missing data with a maximum of 95% and 15% of missing data per individual and SNP, respectively, using plink2 (Chen et al. 2019). We obtained 454,262 biallelic SNPs over 385 individuals without outgroups, that we used for population genetic analyses. Since low-frequency alleles and linkage disequilibrium will bias the number of fixed loci and increase the number of false-positives in genomic scans for outliers (Foll and Gaggiotti 2008), we built a second dataset for quantitative genomics and genomic scans, filtering variants with a minor allele frequency above 5% (18 individuals) and with linkage disequilibrium \(r^2<0.99\). We further removed admixed individuals (see population genetic analyses for criteria) and retained 70,737 biallelic SNPs over 372 individuals.

Analyses

Genetic species delimitation

We investigated population genetic structure using admixture (Alexander and Lange 2011), using 10 repetitions of K genetic groups varying from 1 to 10 and assessed the number of gene pools with cross validation. We defined individuals with a membership to gene pools below 90% as admixed and the remaining individuals as genetically pure. We further investigated admixture with the introgress R package (Gompert and Alex Buerkle 2010), using genetically pure individuals as parental populations and all individuals as the hybrid population. We validated gene pool delimitation by comparison with botanical identifications using a confusion matrix, and we conducted a second blind-identification of every collected individual in November 2019.

Neighbour crowding effect on neutral and adaptive genetic variation

We did environmental association analyses (Rellstab et al. 2015) in each complex using general linear mixed models developed for genome wide association studies (GWAS). We used mean neighbourhood crowding index (\(NCI\); Uriarte et al. 2004a) over the last 30 years, an indirect measurement of access to light and forest gap dynamics, as the response variable and genetic structure (gene pools representing species) and relatedness (kinship matrix) as explanatory variables, as it is common practice (Rellstab et al. 2015). This analysis assumed that the neighbour crowding conditions where individuals have grown above 10-cm DBH are strongly correlated to the individual heritable phenotypes (e.g. Eckert et al. 2010). The mean neighborhood crowding index \(NCI_i\) from tree individual \(i\) was calculated as follow:

\[NCI_i=\overline{\sum_{j|\delta_{i,j}<20m}DBH^2_{j,t}.e^{-\frac14\delta_{i,j}}}\]

with \(DBH_{j,t}\) the diameter of the neighbouring tree \(j\) in year \(t\) and \(\delta_{i,j}\) its distance to the individual tree \(i\). \(NCI_i\) is computed for all neighbours at a distance \(\delta_{i,j}\) inferior to the maximum neighbouring distance of 20 meters. The power of neighbours \(DBH_{j,t}\) effect was set to 2 to represent a surface. The decrease of neighbours diameter effect with distance was set to -0.25 to represent trees at 20 meters of the focal trees having 1% of the effect of the same tree at 0 meters. \(NCI_i\) is computed as the mean of yearly \(NCI_{i,t}\) over the last 30 years denoted by the overline.

We used genetic species and individual kinship in an animal model (Wilson et al. 2010) to estimate genetic variance associated with neighbour crowding index. We used a lognormal likelihood given that distributions of environmental variables were positive and skewed. We inferred individual kinship using KING (Manichaikul et al. 2010), as the method is robust to population structure. We set negative kinship values to null as they were confounding with population structure, and we further ensured that the matrix was positive-definite using the nearPD function from the R package Matrix. The environment \(y_{s,i}\) where individual \(i\) in species \(s\) grows was inferred with a lognormal distribution with the following formula:

\[y_{s,i} \sim logN(log(\mu_s.a_{i}),\sigma^2_1)\] \[a_{i} \sim MVlogN_N(log(1),\sigma^2_2.K)\]

where \(\mu_s\) is the mean environment of species \(s\), \(a_i\) is the breeding value of the individual \(i\) and \(\sigma^2_1\) is the shape parameter of the lognormal. Individual breeding values \(a_i\) are defined following a multivariate lognormal law \(\mathcal{MVlogN}\) of co-shape matrix defined as the product of the kinship matrix \(K\) with estimated individual genotypic variation \(\sigma^2_2\). To estimate variances on a normal scale, we log-transformed species fixed effect, genetic additive values, and we calculated conditional and marginal \(R^2\) (Nakagawa and Schielzeth 2013). A Bayesian method was used to infer parameters using stan language [Carpenter et al. (2017) and rstan package (Stan Development Team 2018) in the R environment (R Core Team 2020) using the No-U-Turn Sampler alogirthm (NUTS, Hoffman and Gelman 2014), which performs better for estimating genetic parameters and breeding values (Nishio and Arakawa 2019).

Neutral and adaptive genetic variation effect on individual growth

We investigated effects of ecological and evolutionary processes on individual growth, using genetic species and kinship. The individual growth of individual \(i\) in population \(p\) between individual recruitment \(y_0\) and 2017, correspond to the difference of DBH between the two years, and is defined with a hierarchical model in a lognormal distribution as follow:

\[DBH_{y=2017,p,i} - DBH_{y=y0,p,i} \sim logN(log[\sum_{y=y0}^{y=2017}AGR(DBH_{y,p,i})], \sigma^2_1)\]

where the difference of DBH \(DBH_{y=2017,p,i}-DBH_{y=y_0,p,i}\) is defined with a lognormal distribution located on the logarithm of the sum of annual growth rates \(AGR\) during the period \(y_0-2017\) and of shape \(\sigma_1\). The annual growth rates \(AGR\) for individual \(i\) in population \(p\) at year \(y\) with a diameter of \(DBH_{y,p,i}\) is defined following a Gompertz model (Gompertz 1825) already identified as the best model for growth-trajectories in Paracou (Hérault et al. 2011):

\[AGR(DBH_{y,p,i}) = Gmax_i.exp(-\frac12[\frac{log(\frac{DBH_{y,p,i}}{Doptp})}{Ksp}]^2)\]

where \(Gmax_i\) is the maximum growth potential (maximal AGR during individual life) for individual \(i\), \(Dopt_p\) is the population optimal diameter at which the individual reach its maximum growth potential, and \(Ks_p\) is the population kurtosis defining the width of the bell-shaped growth-trajectory (see figure 1 in Hérault et al. 2011). To ease model inference population optimal diameter \(Dopt_p\) and kurtosis \(Ks_p\) were defined as random population effect centered on a global \(Dopt\) and \(Ks\) with corresponding variances \(\sigma^2_{P,Dopt}\) and \(\sigma^2_{P,Ks}\). Individual \(i\) maximum growth potential \(Gmax_i\) was defined in a nested Animal model with a lognormal distribution:

\[Gmax_i \sim logN(log(Gmax_p.a_i), \sigma_{R,Gmax})\] \[a_i \sim MVlogN(log(1), \sigma_{G,Gmax}.K)\]

where \(Gmax_p\) is the mean \(Gmax\) of population \(p\), \(a_i\) is the breeding value of individual \(i\), and \(\sigma_{R,Gmax}\) is the shape of the lognormal distribution. Individual breeding values \(a_i\) are defined following a multivariate lognormal law \(MVlogN\) with a co-shape matrix defined as the product of the kinship matrix \(K\) and the genotypic variation \(\sigma_{G,Gmax}\). To estimate variances on a normal-scale, we log-transformed population fixed effect, genetic additive values, and calculated conditional and marginal \(R^2\) (Nakagawa and Schielzeth 2013). We used Bayesian inference with No-U-Turn Sampler (NUTS, Hoffman and Gelman 2014) using stan language (Carpenter et al. 2017).

References

Aguilos, M. et al. 2018. Interannual and Seasonal Variations in Ecosystem Transpiration and Water Use Efficiency in a Tropical Rainforest. - Forests 10: 14.

Alexander, D. H. and Lange, K. 2011. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. - BMC Bioinformatics in press.

Allié, E. et al. 2015. Pervasive local-scale tree-soil habitat association in a tropical forest community. - PLoS ONE 10: e0141488.

Auwera, G. A. et al. 2013. From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. - Current Protocols in Bioinformatics 43: 483–492.

Baraloto, C. et al. 2007. Seasonal water stress tolerance and habitat associations within four Neotropical tree genera. - Ecology 88: 478–489.

Baraloto, C. et al. 2012a. Using functional traits and phylogenetic trees to examine the assembly of tropical tree communities. - Journal of Ecology 100: 690–701.

Bolger, A. M. et al. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. - Bioinformatics 30: 2114–2120.

Broad Institute 2018. Picard Tools.

Caron, H. et al. 2019. Chloroplast DNA variation in a hyperdiverse tropical tree community. - Ecology and Evolution 9: ece3.5096.

Carpenter, B. et al. 2017. Stan : A Probabilistic Programming Language. - Journal of Statistical Software in press.

Chakraborty, M. et al. 2016. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. - Nucleic Acids Research 44: 1–12.

Chen, Z. L. et al. 2019. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. - Nature Communications in press.

Doyle, J. and Doyle, J. 1987. Genomic plant DNA preparation from fresh tissue-CTAB method. - Phytochem Bull 19: 11–15.

Eckert, A. J. et al. 2010. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). - Genetics 185: 969–982.

Ewels, P. et al. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. - Bioinformatics 32: 3047–3048.

Foll, M. and Gaggiotti, O. 2008. A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. - Genetics 180: 977–993.

Gompert, Z. and Alex Buerkle, C. 2010. Introgress: A software package for mapping components of isolation in hybrids. - Molecular Ecology Resources 10: 378–384.

Gompertz, B. 1825. On the nature of the function expressive of the law of humanmortality, and on a newmode of determining the value of life contingencies. - Philosophical Transactions of the Royal Society of London 115: 513–583.

Gonzalez, M. A. et al. 2009. Identification of amazonian trees with DNA barcodes. - PLoS ONE 4: e7483.

Gourlet-Fleury, S. et al. 2004. Ecology and management of a neotropical rainforest : lessons drawn from Paracou, a long-term experimental research site in French Guiana Ecology and management of a neotropical rainforest : lessons drawn from Paracou, a long-term experimental research sit. - Paris: Elsevier.

Haas, B. J. et al. 2013. De novo transcript sequence recostruction from RNA-Seq: reference generation and analysis with Trinity.

Hérault, B. and Piponiot, C. 2018. Key drivers of ecosystem recovery after disturbance in a neotropical forest. - Forest Ecosystems 5: 2.

Hérault, B. et al. 2011. Functional traits shape ontogenetic growth trajectories of rain forest tree species. - Journal of Ecology 99: 1431–1440.

Hoffman, M. D. and Gelman, A. 2014. The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. - Journal of Machine Learning Research 15: 1593–1623.

Kent, W. J. 2002. BLAT—The BLAST-Like Alignment Tool. - Genome Research 12: 656–664.

Li, H. and Durbin, R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. - Bioinformatics 25: 1754–1760.

Li, H. et al. 2009. The Sequence Alignment/Map format and SAMtools. - Bioinformatics 25: 2078–2079.

Manichaikul, A. et al. 2010. Robust relationship inference in genome-wide association studies. - Bioinformatics 26: 2867–2873.

Molino, J.-F. and Sabatier, D. 2001. Tree Diversity in Tropical Rain Forests: A Validation of the Intermediate Disturbance Hypothesis. - Science 294: 1702–1704.

Nakagawa, S. and Schielzeth, H. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. - Methods in Ecology and Evolution 4: 133–142.

Nishio, M. and Arakawa, A. 2019. Performance of Hamiltonian Monte Carlo and No-U-Turn Sampler for estimating genetic parameters and breeding values. - Genetics Selection Evolution 51: 1–12.

Olsson, S. et al. 2017. Development of genomic tools in a widespread tropical tree, Symphonia globulifera L.f.: a new low-coverage draft genome, SNP and SSR markers. - Molecular Ecology Resources 17: 614–630.

Quinlan, A. R. and Hall, I. M. 2010. BEDTools: A flexible suite of utilities for comparing genomic features. - Bioinformatics 26: 841–842.

R Core Team 2020. R: A Language and Environment for Statistical Computing.

Rellstab, C. et al. 2015. A practical guide to environmental association analysis in landscape genomics. 24: 4348–4370.

Sabatier, D. et al. 1997. The influence of soil cover organization on the floristic and structural heterogeneity of a Guianan rain forest. - Plant Ecology 131: 81–108.

Schmitt, S. et al. 2020. Topography consistently drives intra- and inter-specific leaf trait variation within tree species complexes in a Neotropical forest. - Oikos: oik.07488.

Smit, A. et al. 2015. RepeatMasker Open-4.0.: <http://www.repeatmasker.org>.

Stan Development Team 2018. Rstan: the R interface to Stan.

Torroba-Balmori, P. et al. 2017. Altitudinal gradients, biogeographic history and microhabitat adaptation affect fine-scale spatial genetic structure in African and Neotropical populations of an ancient tropical tree species. - PLOS ONE 12: e0182515.

Uriarte, M. et al. 2004a. A spatially explicit model of sapling growth in a tropical forest: Does the identity of neighbours matter? - Journal of Ecology 92: 348–360.

Wilson, A. J. et al. 2010. An ecologist’s guide to the animal model. - Journal of Animal Ecology 79: 13–26.