The bioconcentration factor, BCF, has been defined as the ratio of the chemical concentration in biota to that in water at steady state, as a result of absorption via the respiratory surface (Hamelink, 1977). Environmentally, the BCF describes the accumulation of pollutants partitioning from the aqueous phase into an organic phase (typically fish) and does not include uptake due to diet. The BCF has no units, as the equation below shows:
BCF = [Concentration in organism] / [Concentration in environment]
Here, the BCF and the steady-state BCF are synonymous; steady state is reached once the organism has been exposed long enough that the ratio no longer changes substantially. Environmentally, there may be concern if a significant amount of a substance is concentrated in a local environment (through dumping, spillage, production, etc.) and its BCF is significantly greater than 1. The European Union’s REACH framework calls for BCF measurements for chemicals produced or imported in quantities greater than 100 tons per year, as the BCF can be useful for classification and labeling, prioritization, and safety assessment purposes. In particular, it has been suggested that the BCF could be used in a first-tier risk assessment of secondary poisoning in wildlife and humans through dietary exposure. The Organization for Economic Co-operation and Development (OECD) 305 guideline specifies the preferred experimental conditions for BCF testing. The number of fish suggested for the test ranges from 132 to 240, with each test lasting from 44 to 116 days.
A literature data set of 592 substances with experimentally measured data points was compiled, and logBCF was modeled using the ANNE (artificial neural network ensemble) methodology. In both the 2D and 3D cases, 20% of the entire data set was set aside as the test set.
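As a minimal sketch of the BCF ratio and the logBCF value that is modeled, the following Python snippet computes both from a pair of steady-state concentrations. The function names and the example concentrations are hypothetical and chosen only for illustration; the sketch assumes both concentrations are expressed in equivalent units and that logBCF is the base-10 logarithm of the BCF.

```python
import math


def bioconcentration_factor(conc_organism: float, conc_water: float) -> float:
    """Return the steady-state BCF (concentration in organism / concentration in water).

    Assumption: both concentrations are given in equivalent units,
    so that the ratio carries no units.
    """
    if conc_organism <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return conc_organism / conc_water


def log_bcf(conc_organism: float, conc_water: float) -> float:
    """Return log10 of the BCF (assumed definition of logBCF)."""
    return math.log10(bioconcentration_factor(conc_organism, conc_water))


# Hypothetical steady-state measurements, not taken from any data set.
bcf = bioconcentration_factor(conc_organism=250.0, conc_water=0.5)
print(bcf)                   # BCF of 500, well above 1
print(log_bcf(250.0, 0.5))   # roughly 2.7
```

A BCF well above 1, as in this made-up example, is the situation the text flags as a potential environmental concern.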
Biodegradation of chemicals in the environment is of increasing concern. Several assays are available to classify whether a compound is biodegradable. Cheng (2012) provided data and references for a set of 1604 compounds that have undergone either the Japanese MITI test (14-day protocol) or the OECD 301C test (28-day protocol). The compound of interest is mixed with sludge from several different geographical locations, and oxygen consumption is then measured for the period of time specified by the protocol. Biological oxygen demand (BOD) and percent biodegradation are calculated via the following equations, where ThOD represents the theoretical oxygen demand. Both BOD and ThOD are given in units of [mg O2 / mg test substance].
BOD = ([mg O2 uptake by test substance] - [mg O2 uptake by blank]) / [mg of test substance in the vessel]
% Biodegradation = 100 * BOD / ThOD
A compound is considered readily biodegradable if its BOD is greater than or equal to 60% of the ThOD; otherwise, it is considered non-readily biodegradable.
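The two equations and the 60% criterion above can be sketched in a few lines of Python. The function names and the example oxygen-uptake values are hypothetical; only the formulas and the threshold come from the text.

```python
def biological_oxygen_demand(o2_uptake_test_mg: float,
                             o2_uptake_blank_mg: float,
                             substance_mg: float) -> float:
    """BOD in mg O2 per mg test substance, corrected for the blank."""
    if substance_mg <= 0:
        raise ValueError("amount of test substance must be positive")
    return (o2_uptake_test_mg - o2_uptake_blank_mg) / substance_mg


def percent_biodegradation(bod: float, thod: float) -> float:
    """Percent biodegradation = 100 * BOD / ThOD."""
    if thod <= 0:
        raise ValueError("ThOD must be positive")
    return 100.0 * bod / thod


def is_readily_biodegradable(bod: float, thod: float,
                             threshold_percent: float = 60.0) -> bool:
    """Apply the criterion: BOD at or above 60% of ThOD."""
    return percent_biodegradation(bod, thod) >= threshold_percent


# Hypothetical measurements: 50 mg O2 in the test vessel, 5 mg O2 in the
# blank, 30 mg of test substance, and a ThOD of 2.0 mg O2 / mg substance.
bod = biological_oxygen_demand(50.0, 5.0, 30.0)   # 1.5 mg O2 / mg
print(percent_biodegradation(bod, 2.0))           # 75 percent
print(is_readily_biodegradable(bod, 2.0))         # True
```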
After removing all metal-containing compounds and retaining only a single stereoisomer or tautomer of duplicated compounds, 1581 compounds remained. Approximately 22% of the data set was set aside as an external test set, and the remaining compounds were used as the training pool from which an artificial neural network ensemble was created. The resulting model has sensitivity and specificity above 82% on both the training pool and the external test set.
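For readers less familiar with the two classification statistics quoted above, the following sketch shows how sensitivity and specificity are conventionally computed from observed and predicted class labels. It assumes that "readily biodegradable" is treated as the positive class (label 1); the function name and the toy labels are hypothetical and are not taken from the model or its data set.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(observed: Sequence[int],
                            predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumption: 1 = readily biodegradable (positive class),
                0 = non-readily biodegradable (negative class).
    """
    if len(observed) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the observed labels")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for eight hypothetical compounds.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
print(sensitivity_specificity(observed, predicted))  # (0.75, 0.75)
```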
Beginning in 1995, the Mid-Continent Ecology Division of the US Environmental Protection Agency tested a set of industrial compounds for lethal effects on Pimephales promelas, the fathead minnow. The resulting database was used for internal efforts to develop a structure-activity relationship model and was also made available for public use under the EPA’s DSSTox program. This published data was the basis for training ADMET Predictor’s fathead minnow acute toxicity model, called TOX_FHM.
The result that appears in the TOX_FHM column is the predicted concentration in units of mg/L of a given compound that will kill 50 percent of a population of minnows after an exposure time of 96 hours.
Although relatively few pharmaceutical compounds appeared in the training set of this model, those molecules that do fall into its chemical space still have a high predictive confidence. The model is best suited to aromatic, amine-rich, halogenated, or non-polar compounds.
Another toxicity assay has been developed at the College of Veterinary Medicine at the University of Tennessee in the laboratory of Prof. T.W. Schultz (Schultz, 1997). The assay measures the concentration of toxicant needed to inhibit growth by 50% (IGC50) in the protozoan species Tetrahymena pyriformis after approximately 40 hours of exposure (8-9 cell cycles in the control group) at 27°C. Publicly available data from this assay were used in a recent publication to assess various QSAR modeling approaches by different research groups (Zhu et al., 2008). In that study, the dataset was partitioned into a training set (provided to each of the modeling groups) and two test sets (the second test set being discovered after the study was initiated). Using the same partitioning, we matched the best level of performance (on both test sets) reported among the individual models of that study. Repartitioning the full dataset into training/verification and test sets using a Kohonen map gave a further improvement in performance, as shown in the figure below. The output is pIGC50, where the IGC50 is expressed in units of mmol/L.
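To relate the pIGC50 output back to a concentration, the sketch below converts in both directions. It assumes the usual "p" convention, that is, pIGC50 is the negative base-10 logarithm of IGC50 with IGC50 in mmol/L; the function names and the example values are hypothetical.

```python
import math


def igc50_to_pigc50(igc50_mmol_per_l: float) -> float:
    """Convert IGC50 (mmol/L) to pIGC50, assuming pIGC50 = -log10(IGC50)."""
    if igc50_mmol_per_l <= 0:
        raise ValueError("IGC50 must be positive")
    return -math.log10(igc50_mmol_per_l)


def pigc50_to_igc50(pigc50: float) -> float:
    """Convert pIGC50 back to IGC50 in mmol/L under the same assumption."""
    return 10.0 ** (-pigc50)


# Hypothetical values: a larger pIGC50 means a more potent toxicant.
print(igc50_to_pigc50(0.1))   # about 1
print(pigc50_to_igc50(2.0))   # about 0.01 mmol/L
```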
The next aquatic toxicity model, TOX_DM, is based on the lethal concentration (in mg/L) that results in the death of 50% of Daphnia magna (water fleas) after 48 hours. The data for this study were obtained from the EPA’s website, with the endpoint given as pLC50 along with guidelines describing how it was derived from the EPA’s ECOTOX database. Although the model was developed in molar units (as the graph below indicates) to eliminate explicit molecular weight dependence, the model output is converted back to the LC50 expressed in units of mg/L. The model’s performance reflects the quality of the experimental data used to build it. It should be noted that approximately 10% of this data set was composed of pLC50 values derived from measurements varying by more than one log unit, which is a common problem with biological data obtained from different sources. On the other hand, the Spearman’s rank correlation coefficient is 0.88 for the training/verification and test sets, which implies that the artifact caused by the disparity of measurements may not significantly affect the qualitative (rank-ordering) side of the TOX_DM model.
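The conversion from a molar pLC50 back to an LC50 in mg/L can be illustrated with a short sketch. It assumes that pLC50 is the negative base-10 logarithm of the LC50 in mol/L, which the text does not state explicitly; the function name, the pLC50 value, and the molecular weight are hypothetical.

```python
def plc50_to_lc50_mg_per_l(plc50: float, mol_weight_g_per_mol: float) -> float:
    """Convert a molar pLC50 to LC50 in mg/L.

    Assumption: pLC50 = -log10(LC50 in mol/L).
    LC50 [mg/L] = 10**(-pLC50) [mol/L] * MW [g/mol] * 1000 [mg/g]
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    lc50_mol_per_l = 10.0 ** (-plc50)
    return lc50_mol_per_l * mol_weight_g_per_mol * 1000.0


# Hypothetical compound: pLC50 of 4.0 and molecular weight of 150 g/mol.
print(plc50_to_lc50_mg_per_l(4.0, 150.0))  # approximately 15 mg/L
```

Working in molar units and converting only at the end keeps the molecular weight out of the modeled quantity, which is the motivation the text gives for developing the model on pLC50.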