TY - JOUR
T1 - Bayesian Modeling of Enteric Virus Density in Wastewater Using Left-Censored Data
AU - Kato, Tsuyoshi
AU - Miura, Takayuki
AU - Okabe, Satoshi
AU - Sano, Daisuke
N1 - Funding Information: This study was supported by the Japan Science and Technology Agency through a Grant-in-Aid for Core Research for Evolutionary Science and Technology (CREST), and by the Japan Society for the Promotion of Science through Grants-in-Aid for Young Scientists (A) (22686049) and Scientific Research (A) (24246089).
PY - 2013/12
Y1 - 2013/12
N2 - Stochastic models are used to express pathogen density in environmental samples for performing microbial risk assessment with quantitative uncertainty. However, enteric virus density in water often falls below the quantification limit (non-detect) of the analytical methods employed, and it is always difficult to apply stochastic models to a dataset with a substantially high number of non-detects, i.e., left-censored data. We applied a Bayesian model that is able to model both the detected data (detects) and non-detects to simulated left-censored datasets of enteric virus density in wastewater. One hundred paired datasets were generated for each of the 39 combinations of a sample size and the number of detects, in which three sample sizes (12, 24, and 48) and the number of detects from 1 to 12, 24 and 48 were employed. The simulated observation data were assigned to one of two groups, i.e., detects and non-detects, by setting values on the limit of quantification to obtain the assumed number of detects for creating censored datasets. Then, the Bayesian model was applied to the censored datasets, and the estimated mean and standard deviation were compared to the true values by root mean square deviation. The difference between the true distribution and posterior predictive distribution was evaluated by Kullback-Leibler (KL) divergence, and it was found that the estimation accuracy was strongly affected by the number of detects. It is difficult to describe universal criteria to decide which level of accuracy is enough, but eight or more detects are required to accurately estimate the posterior predictive distributions when the sample size is 12, 24, or 48. The posterior predictive distribution of virus removal efficiency with a wastewater treatment unit process was obtained as the log ratio posterior distributions between the posterior predictive distributions of enteric viruses in untreated wastewater and treated wastewater. The KL divergence between the true distribution and posterior predictive distribution of virus removal efficiency also depends on the number of detects, and eight or more detects in a dataset of treated wastewater are required for its accurate estimation.
KW - Bayesian model
KW - Enteric virus density
KW - Left-censored data
KW - Non-detects
KW - Predictive distribution
KW - Wastewater
UR - http://www.scopus.com/inward/record.url?scp=84887410405&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887410405&partnerID=8YFLogxK
U2 - 10.1007/s12560-013-9125-1
DO - 10.1007/s12560-013-9125-1
M3 - Article
AN - SCOPUS:84887410405
SN - 1867-0334
VL - 5
SP - 185
EP - 193
JO - Food and Environmental Virology
JF - Food and Environmental Virology
IS - 4
ER -