--- title: Estimating the linkage false discovery rate from sample size output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Estimating the linkage false discovery rate from sample size} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- This vignette provides an overview of the primary function of the linkage scenario portion of the phylosamp package: how to estimate the false discovery rate given a sample size. In the examples provided, we use the default `assumption` argument (multiple transmissions and multiple links, `mtml`), though alternative assumptions can also be specified. The most basic function of the package is `translink_tdr()`, which calculates the probability that an identified link represents a true transmission event. This calculation relies on the following parameters: | Param | Variable Name | Description | |:-----:|:-------------:|-------------| | **$\eta$** | sensitivity | the sensitivity of the linkage criteria for identifying transmission links | | **$\chi$** | specificity | the specificity of the linkage criteria for identifying transmission links | | **$\rho$** | rho | the proportion of infections sampled | | **$M$** | M | the number of infections sampled | | **$R$** | R | the average reproductive number (also denoted $R_\text{pop}$, see below) | ```{r setup, message=FALSE} library(phylosamp) ``` ```{r truediscoveryrate} translink_tdr(sensitivity=0.99, specificity=0.95, rho=0.75, M=100, R=1) ``` In other words, given a sample size of 100 infections (representing 75% of the total population), a linkage criteria with a specificity of 99% for identifying infections linked by transmission and a specificity of 95%, fewer than 25% of identified pairs will represent true transmission events. Increasing the specificity to 99.5% has a significant impact on our ability to distinguish linked and unlinked pairs: ```{r truediscoveryrate2} translink_tdr(sensitivity=0.99, specificity=0.995, rho=0.75, M=100, R=1) ``` The other core functions are designed to calculate the expected number of true transmission pairs identified in the sample (`translink_expected_links_true()`) and the total number of linkages one can expect to identify given the sensitivity and specificity of the linkage criteria and a particular sample size and proportion (`translink_expected_links_obs()`). ```{r expected_pairs} translink_expected_links_true(sensitivity=0.99, rho=0.75, M=100, R=1) translink_expected_links_obs(sensitivity=0.99, specificity=0.95, rho=0.75, M=100, R=1) ``` ##### Important Note It is important to recognize that $R$ in these functions represents the average $R$ in the sampled population (alternatively denoted $R_\text{pop}$). Because any sampling frame contains a finite number of cases, there will always be more cases than infection events (at minimum, all infectees in a transmission chain plus a single index case), so $R_\text{pop}\leq1$. For outbreaks with a single introduction, $R_\text{pop}$ is approximately equal to 1; sampling frames containing cases from separate introduction events will have lower values of $R_\text{pop}$.