ELSEVIER

Regulatory Toxicology and Pharmacology

journal homepage: www.elsevier.com/locate/yrtph

Regulatory Toxicology and Pharmacology

Development of a prioritization method for chemical-mediated effects on steroidogenesis using an integrated statistical analysis of high-throughput H295R data

Check for updates

Derik E. Haggarda,b, R. Woodrow Setzerb, Richard S. Judsonb, Katie Paul Friedmanb,*

a Oak Ridge Institute for Science and Education, 100 ORAU Way, Oak Ridge, TN, 37830, USA

b National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA

ARTICLE INFO

Keywords: Steroidogenesis ToxCast Endocrine disruption Prioritization

ABSTRACT

Synthesis of 11 steroid hormones in human adrenocortical carcinoma cells (H295R) was measured in a high- throughput steroidogenesis assay (HT-H295R) for 656 chemicals in concentration-response as part of the US Environmental Protection Agency’s ToxCast program. This work extends previous analysis of the HT-H295R dataset and model by examining the utility of a novel prioritization metric based on the Mahalanobis distance that reduced these 11-dimensional data to 1-dimension via calculation of a mean Mahalanobis distance (mMd) at each chemical concentration screened for all hormone measures available. Herein, we evaluated the robustness of mMd values, and demonstrate that covariance and variance of the hormones measured appear independent of the chemicals screened and are inherent to the assay; the Type I error rate of the mMd method is less than 1%; and, absolute fold changes (up or down) of 1.5 to 2-fold have sufficient power for statistical significance. As a case study, we examined hormone responses for aromatase inhibitors in the HT-H295R assay and found high concordance with other ToxCast assays for known aromatase inhibitors. Finally, we used mMd and other ToxCast cytotoxicity data to demonstrate prioritization of the most selective and active chemicals as candidates for further in vitro or in silico screening.

1. Introduction1

The US Environmental Protection Agency (USEPA) Toxicity Forecaster, or ToxCast, program (Dix et al., 2007; Kavlock et al., 2012) currently includes alternative, high-throughput screening (HTS) assays to evaluate the endocrine bioactivity potential for hundreds to thou- sands of chemicals, including activity at the estrogen receptor, an- drogen receptor, steroid hormone biosynthesis, and thyroid hormone homeostasis (Judson et al., 2018a). These HTS assays, and the models developed based on them (Browne et al., 2015; Haggard et al., 2018; Judson et al., 2015, 2017; Kleinstreuer et al., 2017), are part of a larger continued effort to use new approach methodologies (NAMs) for prioritization and initial hazard screening for safety assessment (ECHA, 2016; Health Canada, 2016; USEPA, 2018). The acceptance of NAMs for this purpose typically requires a fit-for-purpose validation approach, as NAMs present new challenges in terms of the methodologies employed

and the approaches used for data interpretation and model building (Casati, 2018; Griesinger et al., 2016; Patlewicz et al., 2013; Rovida et al., 2015).

Disruption of hormone production can result in a wide range of diseases and adverse effects (as reviewed in Miller, 2017; Miller and Auchus, 2011). The USEPA Endocrine Disruptor Screening Program (EDSP) and the Organization of Economic Cooperation and Develop- ment (OECD) have developed testing guidelines for an in vitro ster- oidogenesis assay using the H295R cell line to measure chemical-in- duced perturbations of testosterone or estradiol synthesis (Hecker et al., 2011; OECD, 2011; USEPA, 2009). The H295R adrenocortical carci- noma cell line expresses the enzymes necessary for the synthesis of four major classes of hormones, including progestogens, corticosteroids, androgens, and estrogens (Gazdar et al., 1990), which makes these cells useful for screening for chemical effects on steroidogenesis. However, these cells do not fully resemble any one steroidogenic tissue in vivo,

*Corresponding author. 109 T.W. Alexander Drive, Mail Drop D143-02, Research Triangle Park, NC, 27711, USA.

E-mail address: paul-friedman.katie@epa.gov (K. Paul Friedman).

1 Endocrine Disruptor Screening Program (EDSP); U.S. Environmental Protection Agency, USEPA; high-throughput H295R assay, HT-H295R; mean Mahalanobis distance, mMd; maximum mean Mahalanobis distance, maxmMd; new approach methodology, NAM; progesterone, PROG; 17a-hydroxypregnenolone, OH-PREG; 17a-hydroxyprogesterone, OH-PROG; deoxycorticosterone, DOC; corticosterone, CORTIC; 11-deoxycortisol, 11-DCORT; cortisol, CORT; androstenedione, ANDR; testosterone, TESTO; estrone, E1; estrone E2.

and the toxicokinetics and toxicodynamics for H295R cells may be different compared to other model systems. For example, lower throughput models for steroidogenesis employing Leydig or ovarian cells ex vivo have demonstrated differing steroidogenic effects of che- mical exposures when compared to H295R which may be due to the presence of additional regulatory mechanisms (e.g. hypothalamic-pi- tuitary-gonadal feedback mechanisms, lack of corticosteroid synthesis in gonadal tissues, etc.) or differences in toxicokinetics/dynamics be- tween models (Botteri Principato et al., 2018; Pinto et al., 2018). H295R cells resemble zonally undifferentiated fetal adrenal cells in that the major classes of hormones are all synthesized, and the overall level of steroidogenic output can be controlled using different culture media conditions, e.g. angiotensin II and forskolin can increase output (Mangelis et al., 2016). H295R cells are useful as an in vitro model system to study chemical effects on steroidogenesis. Due to the low- throughput nature of the USEPA and OECD-validated test guidelines, the H295R assay was adapted to a high-throughput format (termed HT- H295R) to increase screening efficiency (Haggard et al., 2018; Karmaus et al., 2016). This was, in part, spurred by the thousands of substances present in the environment for which there is no information regarding potential effects on steroidogenesis and the push to adopt alternative high-throughput technologies for endocrine-related screening by USEPA and other regulatory agencies (ECHA, 2016; Health Canada, 2016; Health Canada, 2018; Judson et al., 2009; USEPA, 2011). Further development of the HT-H295R assay may result in its use as a part of NAMs that enable rapid screening of large numbers of chemicals to identify and prioritize substances with the highest potential to disrupt steroidogenesis.

The HT-H295R assay has been used to screen 2012 chemicals in single-concentration and 656 chemicals in multi-concentration. Unlike the USEPA and OECD testing guidelines, which only require measure- ment of testosterone and estradiol, 13 hormones were quantified in the HT-H295R assay using liquid-chromatography and mass spectrometry methods; of these 13 hormones, 11 were consistently measured above the lower limit of quantitation and comprised the input data for a statistical model to interpret chemical effects on steroidogenesis in the HT-H295R assay (Haggard et al., 2018). Comparison of this approach with the reference chemical set used in the OECD interlaboratory va- lidation study (Hecker et al., 2011) showed good reproducibility in the testosterone and estradiol responses. Further, inclusion of additional hormone measures provided not only more biological information in the H295R model, but also enabled a statistical model that provides a quantitative ranking measure for prioritization (Haggard et al., 2018; Hecker et al., 2011). Thus, it was demonstrated that the HT-H295R assay model represents a fit-for-purpose replacement for the low- throughput H295R assay, and that the HT-H295R model, including data from 11 hormones, may provide a means to identify chemicals that have impacts on steroidogenesis upstream of testosterone or estradiol production.

The HT-H295R assay and analysis approach are being considered for use as an alternative for the low-throughput H295R assay, sub- sequent to review by a Scientific Advisory Panel convened by the USEPA to review the performance of the HT-H295R assay and the statistical model that was implemented (USEPA, 2017). There were three main needs identified for technical refinement (USEPA, 2017): demonstration of the robustness and/or reproducibility of the metho- dology chosen using data simulations (with specific requests to un- derstand the false positive rate and normality of the data); investigation of whether the assay can identify specific mechanisms of disruption; and, demonstration of how the results may have been confounded by mitochondrial function and/or cytotoxicity despite use of a parallel mitochondrial toxicity assay in H295R cells. The objective of the work herein is to provide more context for the use of this statistical model in prioritization or hazard screening by addressing these questions, which may help move this approach further towards regulatory acceptance.

A statistical model was needed for interpretation of the HT-H295R

assay due to three major challenges: one, HT-H295R data are multi- variate, i.e. there are 11 hormone responses to understand at any given chemical concentration; two, the production of each hormone in the system is not independent of all of the other hormones, i.e. covariances and correlations between hormone outputs are observed; and, three, the high diversity of patterns in the 11 hormone panel and time-depen- dence of these patterns indicated that a mechanistic, pathway-based model might not be feasible with the available dataset, and as such, an unbiased way of describing the magnitude of chemical-induced per- turbation of the 11 hormone network was needed. Therefore, a novel statistical metric using the mean Mahalanobis distance (mMd) was used to quantify the effect of a given chemical on the overall steroidogenesis pathway measured in HT-H295R at each screened concentration; i.e. the mMd quantifies in a single metric (at each chemical concentration) the magnitude of the difference in the response of all 11 hormones compared to the DMSO control response. A primary justification for the use of the Mahalanobis distance for quantification of effects in the HT- H295R assay is its ability to control for the correlation and covariance that may be present in multivariate data (De Maesschalck et al., 2000). In our previous work, we estimated a covariance matrix for the steroid hormone responses using the HT-H295R dataset; however, it was un- known if minor changes in the dataset might introduce changes to this covariance matrix that would affect the calculation of the mMds. As previously described (Haggard et al., 2018; Karmaus et al., 2016), a phased screening approach was used to identify a subset of the 2012 unique chemicals screened in single concentration for multi-con- centration screening; many chemicals selected for multi-concentration screening demonstrated positive responses in at least four of the 11 hormone measures in the single-concentration study (Haggard et al., 2018; Karmaus et al., 2016). Our null hypothesis was that this heavy weighting of active chemicals may have resulted in a biased estimate for the covariance matrix. Here, we explore the reproducibility of the mMd method using covariance matrices derived from different subsets of the HT-H295R dataset using data simulation. In these simulations, we also included steroid response profiles indicative of a true negative response and simulated profiles with response patterns at different ef- fect sizes compared to DMSO control to better characterize the Type I (false positive) error rate and power of the mMd metric to identify experimental significance.

Beyond the multivariate mMd model that characterizes the global response, specific hormones measured in the HT-H295R assay may provide information regarding effects on specific hormone classes, namely progestogens, corticosteroids, androgens, and estrogens. In this paper, a case study for aromatase inhibition is further explored because of the importance of this mechanism of steroidogenesis disruption within the context of EDSP Tier 1 screening and breast cancer ther- apeutics development. In the EDSP Tier 1 battery, there is an in vitro assay for a mechanism of hormone synthesis disruption: inhibition of the aromatase enzyme (CYP19A1), which results in decreased estradiol and estrone production. H295R cells have previously been used to ex- amine this specific mechanism for a set of seven reference chemicals (Higley et al., 2010); however, this study was limited to only measures of testosterone, estradiol, and aromatase activity and did not include measurements for other hormones. Understanding specific effects on aromatase are difficult in the absence of a kinetic model to demonstrate enzyme inhibition. Differential effects on aromatase activity may manifest further upstream or along different terminal ends of ster- oidogenesis, e.g. corticosteroids, and other mechanisms such as es- trogen receptor modulation may also affect estrogen output. We per- formed hormone response profiling of the 656 chemicals tested in multi-concentration to identify the chemical exposures associated with decreased estrone and estradiol to understand if prototypical aromatase inhibitors would be reflected in this set. This analysis aimed to answer whether the estrogen signal in the HT-H295R assay could also be used as a screen for effects on estrogen synthesis inhibition.

Finally, a demonstration of how to use cytotoxicity and

mitochondrial bioactivity in concert with the mMd approach for prioritization was developed, as chemicals with specific effects on steroidogenesis and/or mitochondrial function might be of greater in- terest for endocrine activity than cytotoxic chemicals. Steroidogenesis is dependent on functional mitochondria, and so the HT-H295R assay should be particularly sensitive to mitochondrial toxicants or cytotoxic disruption of mitochondrial function. Here, a mitochondria-related toxicity assay, the MTT assay, was screened in parallel with the HT- H295R assay data, and these MTT data were used as a cell viability indicator filter on the HT-H295R data used in the mMd modeling ap- proach (Haggard et al., 2018). However, the sensitivity of the MTT assay for changes in mitochondrial health and cell viability versus other available methods is unknown, and as such we sought to leverage the additional data available via ToxCast for cytotoxicity, with the ex- pectation that this contextual information will be improved iteratively over time. Experience from high-throughput chemical screening pro- grams has demonstrated that the frequency of positive activity across diverse assays increases at concentrations at or approaching activity in cytotoxicity assays (termed the cytotoxicity “burst”), reducing the ability to identify mechanistic bioactivity (Judson et al., 2016). Uti- lizing cytotoxicity information from the in vitro ToxCast screening data alongside the parallel MTT in the HT-H295R assay, we developed a selectivity and ranking metric to enable prioritization of chemical samples that had the greatest potency and efficacy in the HT-H295R assay. The novel evaluation of the HT-H295R assay and mMd approach further support its use in identification of in vitro endocrine activity.

2. Materials and methods

2.1. HT-H295R assay information

The following sections describe the HT-H295R assay in brief, as this assay has been described in previous publications (Haggard et al., 2018; Karmaus et al., 2016).

2.2. Chemical library

As previously described (Haggard et al., 2018; Karmaus et al., 2016), 656 unique chemicals were tested in HT-H295R in concentra- tion-response. These chemicals were derived from the ToxCast phase I, II, and endocrine 1000 (E1K) chemical lists and cover a wide chemical landscape with diverse bioactivities (Richard et al., 2016). Tested chemical concentrations ranged from 0.041 nM to 100 uM, depending on solubility in dimethyl sulfoxide (DMSO) and cell viability. All in- formation regarding the chemicals within the ToxCast chemical library are publicly available (http://www.epa.gov/ncct/dsstox/or https:// comptox.epa.gov/dashboard/chemical_lists).

2.3. HT-H295R assay

Cell culture, treatment, hormone quantification, and cell viability components of the HT-H295R assay have been described previously (Haggard et al., 2018; Karmaus et al., 2016). Cell culture, treatments, and cell viability assessments were performed by Cyprotex US, LLC (Kalamazoo, MI). In brief, H295R cells were seeded at 50-60% con- fluency in 96-well microwell plates and allowed to adhere overnight. Cells were then exposed to media containing 10 uM forskolin for 48 h to stimulate steroidogenesis. Medium containing forskolin was then re- moved and replaced with media containing test chemical or experi- mental controls (10 uM forskolin, 3 uM prochloraz, 0.1% DMSO) to a final concentration of 0.1% DMSO for another 48 h.2 After chemical

exposure, media was removed and stored at -80 ℃ and shipped to OpAns, LLC (Durham, NC) for hormone quantification; the remaining cells in each well were assessed for cell viability using the MTT assay. Any test concentrations that resulted in lower than 70% cell viability were removed prior to analysis. Frozen medium was thawed, and hormone concentrations were quantified using high-performance liquid chromatography tandem mass spectrometry (HPLC-MS/MS) after ex- traction using methyl tert-butyl ether with an extra derivation using danzyl chloride to quantify estrone and estradiol concentrations. Thir- teen hormone measurements were attempted (pregnenolone, 17a-hy- droxypregnenolone, progesterone, 17a-hydroxyprogesterone, deox- ycorticosterone, corticosterone, 11-deoxycotrisol, cortisol, dehydroepiandrosterone, androstenedione, testosterone, estrone, 17ß- estradiol), but pregnenolone and dehydroepiandrosterone were often below the lower limit of quantitation (53.1% and 69.5% of all mea- surements, respectively), and therefore these hormones were not been included in this analysis, leaving 11 hormone measures as previously described (Haggard et al., 2018; Karmaus et al., 2016). Each test che- mical was assayed in technical duplicate, representing one biological replicate per chemical. However, as describe previously (Haggard et al., 2018), 103 and four of the 656 test chemicals have two or three, bio- logical replicates, respectively.

2.4. Software

All data management, analysis, and figures were generated using the R statistical programming language (3.5.1) using RStudio (version 0.98.501) on Linux (Red Hat version 6.10). Data simulations were run in parallel using 34 cores. All the original code and source files are available on (EPA FTP ftp://newftp.epa.gov/COMPTOX/NCCT_ Publication_Data/FriedmanPaul_K/CompTox-ToxCast_ EDSPsteroidogenesis prioritization) and the USEPA GitHub site (https://github.com/USEPA/CompTox-ToxCast_EDSPsteroidogenesis_ prioritization).

2.5. Overview of computational approach

In the sections below, we describe the computational approach employed in detail to answer several main questions. Answers to the first questions are meant to characterize the robustness of the mMd approach by evaluating: (1) how stable is the residual covariance ma- trix used in the mMd approach, and how do changes in the covariance matrix impact resulting mMd values compared to the mMd values calculated in original analysis of Haggard et al. (2018); and, (2) what is the power of the mMd approach to detect fold-changes in hormones and, conversely, what is the observed false positive rate? The second set of questions strive to define use cases for the HT-H295R data and the mMd model of these data: (1) does the HT-H295R assay detect che- micals known to disrupt estrogen synthesis; and (2) do additional cy- totoxicity and mitochondrial toxicity information help contextualize the mMd results from the HT-H295R assay?

The computational experiments to address the robustness of the mMd approach are illustrated in Fig. 1. In brief, we created simulated datasets of control and treatment responses for each chemical by sampling from multivariate normal distributions defined by estimated mean responses and covariances from the original data. The sampling

2 As described in Karmaus et al. (2016) and Haggard et al. (2018), a forskolin pre-stimulation was used. However, despite pre-stimulation, chemical-medi- ated induction of steroid hormone synthesis is still observed with the HT-H295R

(footnote continued)

assay. For example, comparison of the median DMSO control hormone response to the median response of the forskolin positive controls (10 uM) demonstrate positive fold-induction for 10 of 11 steroid hormones reported (1.99 ± 0.16, 0.45 ± 0.35, 1.25 + 0.20, 1.47 + 0.47, 2.93 ± 0.50, 1.42 ± 0.12, 1.86 ± 0.21, 1.45 ± 0.13, 1.42 + 0.09, 2.51 + 0.57, 2.59 + 0.40 log2 fold change in uM for OH-PREG, PROG, OH-PROG, DOC, CORTIC, 11-DCORT, CORT, ANDR, T, E1, and E2, respectively).

Partition Data from Original maxmMds

Block Covariance Estimation

DMSO

Low

InVitroDB HT- H295R Level 0 Data

MVN Sampling

x2000

Simulated Data

Medium

High

Mean hormone response vector for each concentration of test chemical or DMSO control

Fig. 1. Workflow of data simulation study.

Repartition Sim. Data from maxmMds

Block Covariance Estimation

Sim. Low

Simulated Low mMds

Low

x2000

Simulated Data

Sim. Global

Mahalanobis Calculation

Simulated Global mMds

Global

Sim. High

Simulated High mMds

High

(A) Original level 0 data from the ToxCast database, invitrodb, were downloaded and converted into micromolar concentrations and were subset into four datasets to represent vehicle controls, low, medium, and high responses in the HT-H295R assay based on the maxmMd values from Haggard et al. (2018). These subsets were used to calculate 4 covariance matrices for each block of assay data (for a total of 32 covariance matrices) for MVN sampling. Using mean hormone response values by chemical sample and the appropriate covariance matrices, 2000 unique simulated datasets were generated. (B) For each simulated dataset, three sets of mMds were calculated for all simulated chemical samples using three covariance matrices estimated from either low responding chemicals, high responding chemicals, or the global dataset, for a total of 6000 sets of mMd values.

(including estimation of means and covariances) was performed within experimental assay block and within four bins of chemicals: DMSO controls, and then those having low, medium, or high mMd values, and low mMds, and all chemicals (termed “DMSO”, “low”, “medium” and “high”) (Fig. 1A). Three sets of mMd values were then calculated using these different covariance matrices based on simulated data (Fig. 1B); these three repartitions of the simulated data produced covariance matrix estimates from “low” maxmMd values, all (termed “global”) maxmMd values, and “high” maxmMd values. A comparison between the original HT-H295R analysis and the data simulations was used to address the question: was the original covariance matrix estimate a reasonable approximation? These data simulations were then used in power analyses and Type I error approximation. In total, these experi- ments address the robustness and appropriateness of the mMd ap- proach. Additional details of the methods for the data simulation ex- periments follow in subsequent sections.

2.5.1. Source data

HT-H295R data for these analyses were downloaded using the ToxCast pipeline (tcpl) R package that is publicly available (https:// cran.r-project.org/web/packages/tcpl/index.html) (Filer et al., 2017). Multi-concentration level 0 data from invitrodb (version 3.1) were downloaded and converted from ug/ml into micromolar concentrations prior to calculation of mMds and data simulation (Supplemental Data 1). The data structure consisted of eight different data blocks, which represent the eight separate HT-H295R assay experiments that were performed between 2013 and 2017. Due to the presence of block ef- fects, analyses were performed within blocks. In total, 654 unique

chemicals were screened in multi-concentration with some replication within and across blocks for a total of 766 unique chemical samples. Measured hormones are abbreviated throughout the manuscript as follows: PROG, progesterone; OH-PREG, 17a-hydroxypregnenolone; OH-PROG, 17a-hydroxyprogesterone; DOC, deoxycorticosterone; CORTIC, corticosterone; 11-DCORT, 11-deoxycortisol; CORT, cortisol; ANDR, androstenedione; TESTO, testosterone; E1, estrone; E2, estrone.

2.5.2. Mahalanobis distance overview

Mahalanobis distances were calculated as previously described (Haggard et al., 2018). As demonstrated previously, there is a large degree of covariance in the residuals of the hormones measured in the HT-H295R assay. Unlike Euclidean distance, which cannot control for positive or negative covariance between multivariate measures, the Mahalanobis distance corrects for correlation and covariance in multi- variate data by including the inverse covariance matrix of all multi- variate measures in the calculation of the distance values. As such, the multivariate measures are rotated and scaled to make the variables uncorrelated and have the same variance. An example of the differences between multivariate Euclidean and Mahalanobis distance is demon- strated in Fig. 2. We employ the Mahalanobis distance as a measure of multivariate effect size of chemical treatment on the whole ster- oidogenesis pathway represented in the HT-H295R assay. This allows for the integration of all 11 hormone measures into a single value that represents the overall magnitude of effect of a specific concentration of a chemical in the HT-H295R assay. The Mahalanobis distance for a chemical at a given concentration is calculated as shown in Equation (1).

Fig. 2. Example of Mahalanobis distance. The difference between Euclidean distance and Mahalanobis distance using an example to two theoretical hormone measures with positive covariance. (A) Scatterplot for two theoretical hormone measures X and Y in Euclidean space. Grey points denote mean hormone measures for theoretical chemical samples. The error distribution, denoted by the dashed ellipse, shows the variance among measurements of hormone Y (standard deviation = 0.16) is greater than measurements of hormone X (standard deviation = 0.08), and these measures show positive covariance (correlation = 0.8). The points labeled conc 1, conc 2, and conc 3 represent the mean (natural logarithm) concentrations of three concentrations of an example test chemical. In terms of hormone concentrations, the response at conc 3 for hormone X is roughly twice as far from conc 1 as is the response at conc 2 (and is a larger number of standard deviations away from the mean of the distribution for hormone X); however, the Euclidean distances of conc 2 and conc 3 to conc 1 are the same. (B) Scatterplot of the same hormone measures as in A after Mahalanobis distance transformation. Mahalanobis distance adjusts for the covariance between measurements, which rotates and scales the data so that the data are uncorrelated and have the same variance (as denoted by the error ellipse now being a circle). After transformation, we see that conc 3 is now around four times as distant from conc 1 as is conc 2, better reflecting the difference of conc 3 from the overall distribution of measures in hormones X and Y compared to conc 2.

0.4

Measured Hormone Y (log uM)

0.2

conc 2 1

Rotated and Scaled Axis 2

conc

conc 3

0.0

conc

conc 3

conc 2

-2

-0.2

-4

-0.4

-0.2

0.0

0.2

0.4

Measured Hormone X (log uM)

-4

-2

Rotated and Scaled Axis 1

mMd = (Vc -1)” E (Vc - y1)/Nh (1)

Here, yc is the vector of natural log-transformed hormone con- centrations for a test chemical at the cth exposure concentration and y1 is the vector of natural log-transformed hormone concentrations for the matched DMSO control sample. _ is the covariance matrix which is estimated from fitting the natural log-transformed HT-H295R hormone concentration data for all chemicals and control samples to a multi- variate linear model, accounting for chemical and concentration effects. In this case, due to apparent block effects, a covariance matrix was estimated from the eight assay test date blocks as the unweighted average of the covariance matrices of these blocks. Nn is the number of hormones measured for the test chemical; this variable results in the computation of the mMd, which was done to allow for the best com- parability of Mahalanobis distance values between chemicals that have different numbers of hormone measurements, e.g. due to missing data. It is important to note that the same covariance matrix, E, is used to calculate the mMd for all chemical samples.

The maximum mMd (maxmMd) value for each chemical con- centration-response curve represents the overall magnitude of pertur- bation of the steroidogenesis pathway in HT-H295R for a chemical, which can be used to rank and prioritize chemicals for additional screening.

To determine the experimental significance of any given mMd value above background variability, we also calculated a critical value (Haggard et al., 2018) using the method developed by Nakamura and Imada (2005). This method is a multiple comparisons procedure, like the Dunnett’s procedure for univariate comparisons, and is based off the

similarity of the squared Mahalanobis distance to the Hotelling’s T2 statistic. The critical value is derived to approximate a nominal Type I error rate of 0.01.

2.5.3. Mahalanobis distance of different covariance matrices from original data

Prior to the data simulation study, we estimated additional sets of mMd values using covariance matrices derived from two data subsets of the HT-H295R data. The intent in doing this was to understand how the mMds might change using different covariance matrices based on the measured data themselves, thus serving as a check on the mMd values calculated using the covariance matrices derived via data simulation. “High” and “low” groups were defined by subsetting the data within assay date block into upper and lower halves based on maxmMd values calculated from the original HT-H295R dataset. For both the upper and lower groups, we estimated new covariance matrices following the same multivariate linear modeling approach as described above and recalculated the mMd values for all chemical samples using these covariance matrices.

2.5.4. Multivariate normal data simulation

2.5.4.1. Dataset generation. Data simulation was performed by sampling the HT-H295R data from a multivariate normal distribution (MVN) using the mvtnorm R package (https://cran.r-project.org/web/ packages/mvtnorm/index.html). Fig. 1 summarizes the workflow for the MVN data simulation (Fig. 1A) and generation of new mMds (Fig. 1B). Simulating hormone response measures of a chemical sample from an MVN distribution requires a vector of multivariate means (i.e. the mean hormone measures of a given concentration of chemical) and

a covariance matrix that should reflect the multivariate covariance of that chemical sample. In this case, each block was subset into a total of four groups (DMSO controls, low maxmMd chemical samples, medium maxmMd chemical samples, and high maxmMd chemical samples), and covariance matrices were estimated. This resulted in estimation of a total of 32 covariance matrices, i.e. one for each of the four groups for all eight experimental assay blocks. For each concentration of a test chemical or DMSO control, the average hormone response of the two technical replicates (following the same dimensions of the experimental data3) was calculated, and the covariance matrix from the set of 32 estimated covariance matrices that contains the chemical sample of interest was used as input to draw samples from a MVN distribution. The number of samples was equal to the number of technical replicates from the original mMd analysis so that each simulated dataset had the same dimensions as the original data. If the original hormone data for a test chemical was equal to or below the standardized lower limit of quantification (LLOQ) value, i.e. the LLOQ/V2 for a given hormone, it was set to that standardized value. 2000 simulated datasets were generated from the original HT-H295R data for subsequent analysis (Fig. 1A).

2.5.4.2. Derivation of covariance matrices from simulated data to examine stability of the mMd metric. To explore the possibility that covariances vary with treatment effect and may be different between highly responsive and non-responding chemicals (which would influence the calculation of the mMd), we generated three different covariance matrices to calculate new mMds (Fig. 1B). For each simulated dataset, we first estimated a pooled covariance matrix using the all simulated chemical data, termed the “global” covariance matrix, and recalculated the mMds using the same methodology as described above and in the original mMd analysis (Haggard et al., 2018). With these new mMds, we determined the maxmMd for the simulated chemical samples. The relative rank of chemicals by maxmMd was then used to subset the simulated data in half within assay date blocks into “high” maxmMd and “low” maxmMd chemical sample groups, i.e. strong and low responding chemical sets, respectively. We then estimated pooled covariance matrices using the high and low data subsets and calculated mMds using the high and low covariance matrices. To summarize, for each of the 2000 simulated datasets, three separate covariance matrices were generated (global, low, and high) and subsequently used to calculate new mMds for a total of three sets of mMds for each data simulation (for a total of 6000 sets of mMds). The output from the simulations, including the data, covariance matrices, mMds, and maxmMds can be found on USEPA FTP site (ftp://newftp.epa.gov/ COMPTOX/NCCT_Publication_Data/FriedmanPaul_K/CompTox- ToxCast_EDSPsteroidogenesis_prioritization).

2.5.5. Type I error rate computation

Using the simulated data described above, the rate of false positives, i.e. the Type I error, was also computed. This was done by generating a true null concentration response profile based off a randomly selected DMSO sample during the MVN sampling procedure. All 2000 samples generated from this null profile would be expected to be negative, with some rate of false positives due to sampling noise that inform an esti- mate of the true Type I error rate. For the mMd calculation, a threshold for a positive was defined by the critical value (described previously). For a chemical sample with 11 hormone measures and an N = 2 (N based on technical replicates), the critical value is 1.64 assuming a Type

I error rate with a = 0.01. This was the assumption of Type I error rate used in Haggard et al. (2018). If a Type I error rate with a = 0.05 is assumed, the critical value is 1.50 (when 11 hormone measures are available and N = 2). The frequency of positives in the null data si- mulation was defined as the number of null simulations that exceeded the critical value divided by the total number of null simulations (2000).

2.5.6. Power analyses

We generated ten theoretical hormone concentration response profiles with increasing effect sizes (1.1, 1.5, 2, and 2.5-fold increases compared to control) by amplifying the responses of the random DMSO sample used in the Type I error rate analysis above. These profiles were modeled after the following theoretical hormone changes: increases in all 11 measured hormones (all); increases in OH-PROG, 11-DCORT, CORT, ANDR, TESTO, E1, and E2 (some); increases in 11-DCORT and CORT (glucocorticoids); increases in DOC and CORTIC (miner- alocorticoids); increases in ANDR (androstenedione); increases in TESTO (testosterone); increases in ANDR and TESTO (androgens); in- creases in E1 (estrone); increases in E2 (estradiol); and increases in E1 and E2 (estrogens). The mMd values across the theoretical profiles were calculated using the global, low, and high covariance matrices as de- scribed above and the frequency of simulated mMd values above the critical value (assuming a Type I error rate with a = 0.01) across all the simulations was determined as the power to detect experimental sig- nificance.

2.5.7. Steroid hormone pattern analysis of putative aromatase inhibitors

Micromolar steroid hormone concentration-response profiles for all chemical samples were analyzed using an analysis of variance (ANOVA) with post hoc Dunnett’s procedure at an a = 0.05. Directionality of re- sponse was determined by the sign of the slope of the fitted data. Positive hit calls for hormone perturbation were determined following the same logic used in the OECD test guideline (TG) 456 method (OECD, 2011) and the interlaboratory validation report (Hecker et al., 2011), where a hit call was defined as a statistically significant effect at two consecutive concentrations of test chemical or at the highest con- centration tested. In this case, a chemical was coded as a hit in the positive direction (1), negative direction (-1), or a non-hit (0). A table of the hit calls can be found in Supplemental Data 2.

Since a putative aromatase inhibitor will theoretically decrease es- trogen concentrations in HT-H295R, hit call data were filtered to only include chemicals that were significant hits in the negative direction for both E2 and E1 with an efficacy threshold of - 1.5 fold change com- pared to plate-level DMSO controls. HT-H295R hormone profiles for Fig. 6 were visualized using the ComplexHeatmap package (Gu et al., 2016) using Ward’s method.

Further comparison was performed to relate these data to a putative aromatase inhibitor list identified by a related high-throughput screening initiative, Tox21 (Thomas et al., 2018). As part of the Tox21 program, a cell-based high-throughput screening for aromatase in- hibitors for the Tox21 10K chemical library (Chen et al., 2014), in combination with data filtering and target follow-up screening (Chen et al., 2015), was used to identify a list of 50 putative aromatase in- hibitors. Previously, a biochemical, cell-free assay from NovaScreen (NVS) (Sipes et al., 2013) was used by the ToxCast program in a tiered screen for 1882 chemicals in single concentration and 248 chemicals in multi-concentration response. Fourteen and 17 of the 50 putative ar- omatase inhibitors from Tox21 were included in the ToxCast NVS assay and HT-H295R assays, respectively. A simple comparison, using the HT- H295R hit call logic defined above and based on the OECD TG 456 method, is presented to understand if the HT-H295R screen can re- cognize putative aromatase inhibitors identified from previous screening efforts. As the mMd incorporates signal from other hormones in the H295R cell, a comparison to hit calls for estrogens seemed to be a more equivalent comparison.

3 No biological replicates were used in the data simulation because there were no biological replicates in the experimental dataset. In the experimental data, most chemicals had 1 biological replicate and 2 technical replicates, but ap- proximately 16% of chemicals were screened with 2-3 biological replicates and these data were used to evaluate assay result reproducibility in a previous analysis (Haggard et al., 2018).

2.5.8. Selectivity scoring and potency estimation of mMd benchmark dose values

Concentration-response curves of the mMds for each test chemical were fit to a four-parameter logistic curve and a benchmark dose (BMD), defined as the log10 concentration of a test chemical where the mMd value is equal to the critical value, was determined as previously described (Haggard et al., 2018). A minimum and maximum BMD value was set to - 4 on the log10 UM scale (0.0001 µM) and 3 on the log10 UM scale (1000 µM), respectively. A selectivity ranking metric was calcu- lated for each test chemical as defined in Equation (2).

selectivity = min(cytoburst, MTTacc) - BMD (2)

Where cytoburst is the lower bound (median AC50 - 3*baseline median absolute deviation of the 26 ToxCast cytotoxicity assays) of log10 AC50 values of the set of cytotoxicity assays used in the ToxCast high- throughput screening program (Supplemental Table 1), MTTacc is the concentration where the cell viability response was equal to the max- imum allowed cell viability of 70%, which was determined by fitting concentration-response data of the HT-H295R cell viability assay using tepl. This value represents the concentration where potential con- founding effects due to mitochondrial disruption may occur. The cy- toburst values were calculated using the tcplCytoPt function from the tcpl package (Filer et al., 2017; Judson et al., 2016). If a chemical was not considered a hit in at least two of the cytotoxicity assays, the cytoburst was set to 3. A test chemical with a selectivity score ≥0.5 log10 units were considered selective hits in the HT-H295R assay.

Potency and efficacy of chemical effects on the steroidogenesis pathway were compressed for relative prioritization by calculating the area under the curve (AUC) of the mMd concentration-response. AUC was calculated by integrating each curve using the model parameters from the four-parameter logistic curve fitting that was used to derive the BMD values for each chemical sample. The limits of integration were defined based on the lowest and highest exposure concentration for each test chemical. The calculated AUCs were used to rank chemical samples that passed the selectivity filtering described above.

3. Results

3.1. Stability of the mMd metric with different covariance matrices

To determine the stability of the mMd metric, the influence of the covariance matrix on the mMd calculation, and overall performance, we performed a data simulation study using MVN sampling of the original HT-H295R data. We generated a total of 2000 simulated HT- H295R datasets and recalculated mMds using three different covariance matrices. These covariance matrices were estimated using different subsets of the simulated data representing weak responding chemicals, all chemicals, and strong responding chemicals in the HT-H295R assay (termed low, global, and high respectively; Fig. 1). Comparison of the covariance structures of the three covariance matrices in the simula- tion, represented as the average covariance matrix of the three data subset groups, are summarized in Fig. 3. The pairwise correlation of the hormone covariances, as well as the variances, were similar across the three subsets; the maximum difference in the pairwise correlation of the hormone covariances was 0.05, with a range of 0-0.05. Running the same analysis using only the original HT-H295R data, partitioned in the same way as the simulated datasets, gave similar results (Supplemental Fig. 1).

For each data simulation, three sets of mMds were calculated, one set for each of the three covariance matrices. A scatterplot of mMd values for chemicals calculated from the HT-H295R data and the median mMd of the 2000 data simulations for the three covariance types showed that the median simulated mMds were very similar to the original, regardless of the covariance matrix type (low, global, or high) used in the mMd calculation (Fig. 4A). Linear regression by covariance matrix type showed that both the low and global covariance matrices

were the most accurate at estimating the original mMd with slopes equal to 0.98 and 0.97, respectively. In contrast, the high covariance matrix type estimated lower mMd values overall compared to the ori- ginal with a slope of 0.91. This lower slope translates to slightly lower mMd values calculated when using the high covariance matrix type, which was calculated based on the hormone effects of “high-re- sponding” chemicals only. This same general, but minor, trend is maintained when examining the mMd distributions of simulated hor- mone responses of prochloraz, butylparaben, and ethylene dimethane- sulfonate (Fig. 4B-D). In sum, the mMd values generated using 2000 data simulations and three distinct covariance matrix types were nearly identical, with only very slight differences observed for the mMd values generated using the high covariance type. The mMd-by-concentration boxplots for all chemical samples using the three covariance types can be found in Supplemental Fig. 2.

3.2. Specificity and power of the mMd metric

To determine a positive mMd response, we defined a critical value based on the method of Nakamura and Imada (2005). The critical value determines the minimum mMd needed for a multivariate response to be considered experimentally significant compared to the DMSO control response. However, the Type I error rate this critical value was eval- uated on (a = 0.01) is considered approximate. Through this data si- mulation, we determined the observed Type I error rate of the mMd metric by counting the number of times the calculated maxmMd value from “true negative” DMSO control data was larger than the defined critical value at either an approximate a of 0.05 or 0.01. As shown in Table 1, the true negative concentration-response elicited a Type I error rate slightly below the approximate value defined by the critical value regardless of the covariance matrix type used in the mMd calculation. In general, the low and global covariance matrix type were closer to the approximate Type I error rate compared to the high covariance type.

Along with generating a true negative response, we also simulated ten diverse hormone response profiles at four different effect sizes (1.1, 1.5, 2, and 2.5-fold change in hormone concentrations compared to control) to better define the power of the mMd method to detect varying experimentally significant responses. Hormone responses with effect sizes larger than 2-fold (interpreted bidirectionally, as an increase or decrease) had a power > 0.99 to detect a maxmMd above the 0.01 critical value for all tested response profiles (Fig. 5). For nearly all re- sponse profiles, except for an effect on both E2 and E1, a 1.5-fold effect size was equal to or greater than a power of 0.8. The power for most hormone response profiles was not affected by covariance type, how- ever, the high covariance type had the largest impact on the power to detect experimental significance for several response profiles including effects on both estrogens, both mineralocorticoids, E2 alone, E1 alone, and TESTO alone (Fig. 5).

3.3. Response of putative aromatase inhibitors in HT-H295R

Two approaches were taken to understanding the capability of HT- H295R assay data to identify putative aromatase inhibitors. The first approach involved naïve identification of chemicals that decreased E1 and E2, and examination to understand if these identified chemicals included recognized aromatase inhibitors. Note that chemicals that may have induced E1 and/or E2 are excluded from this analysis and visua- lization in order to isolate putative aromatase inhibitors for this ex- ample. A second approach relied on integrating information from pre- vious HTS assay data to identify putative aromatase inhibitors to characterize whether the HT-H295R assay identified experimentally- defined aromatase inhibitors as positives.

Based on significant decreases in E1 and E2 concentrations, using a 1.5-fold efficacy cutoff, we identified a set of 72 chemical samples (out of 766 samples total) that we considered putative aromatase inhibitors in HT-H295R. Fold change in hormone concentrations for the 72

Fig. 3. Hormone variance and pairwise corre- lation of covariances.
			Correlation								TESTO
											0.012 0.012
	-1.0	-0.5	0.0		0.5		1.0			ANDR	0.014
										0.007	0.67
										0.007	0.66
										0.008	0.67
									OH-PROG
									0.007	0.64	0.69
									0.006	0.64	0.67
									0.007	0.64	0.65
								CORT				face) and are clustered using Ward's method. (For
								0.012	0.56	0.54	0.61
								0.012	0.53	0.56	0.61
							11-DCORT	0.014	0.53	0.57	0.62
							0.004	0.69	0.69	0.67	0.67
							0.004	0.69	0.67	0.68	0.66
						E1	0.005	0.71	0.66	0.68	0.66
						0.017	0.55	0.46	0.31	0.45	0.45
						0.016	0.57	0.49	0.33	0.46	0.46
				E2		0.019	0.57	0.49	0.3	0.48	0.47
				0.019		0.76	0.55	0.5	0.38	0.47	0.49
				0.018		0.75	0.56	0.52	0.38	0.47	0.49
			OH-PREG	0.02		0.77	0.56	0.53	0.34	0.48	0.49
			0.013	0.22		0.25	0.39	0.29	0.32	0.3	0.27
			0.012	0.22		0.24	0.37	0.27	0.32	0.29	0.27
		CORTIC	0.012	0.23		0.24	0.39	0.29	0.32	0.3	0.28
		0.021	0.21	0.16		0.17	0.35	0.51	0.25	0.18	0.21
		0.019	0.2	0.17		0.18	0.33	0.47	0.24	0.19	0.2
	PROG	0.02	0.22	0.17		0.17	0.35	0.46	0.26	0.2	0.22
	0.014	0.19	0.21	-0.02		-0.02	0.19	0.09	0.25	0.15	0.07
	0.013	0.17	0.2	-0.01		-0.01	0.17	0.07	0.25	0.13	0.06
DOC	0.015	0.21	0.21	0		-0.02	0.16	0.08	0.27	0.12	0.05
0.01	0.4	0.29	0.23	-0.08		-0.04	0.18	0	0.11	-0.02	-0.09
0.01	0.36	0.27	0.2	-0.06		-0.02	0.18	-0.01	0.12	0	-0.09
0.011	0.4	0.28	0.21	-0.05	-0.03		0.19	0	0.15	0.02	-0.07

The pairwise correlation of the mean pooled cov- ariance matrix of the hormones for the low (ita- lics), global (boldface), and high (normal) covar- iance types are shown. The diagonal of the matrix indicates the mean variance for each measured hormone. The correlation matrix is colored (red to blue; - 1.0 to 1.0) based on the correlation coef- ficients of the global covariance matrices (bold-

interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

chemical samples were visualized as a heatmap (Fig. 6). In this case, chemical samples were hierarchically clustered using only the hor- mones directly affected by the aromatase enzyme: ANDR, TESTO, E2, and E1. This clustering identified four prominent clusters: cluster 1 has strong negative effects on both estrogens and androgens, cluster 2 with strong negative effects on the androgens with decreases in the estrogens at higher concentrations, cluster 3 with more moderate negative effects on androgens and estrogens, and cluster 4 with modest decreases in estrogens with small decreases in the androgens. Six of the 72 chemical samples that elicited a phenotype indicative of aromatase inhibition are known reference aromatase inhibitors: chrysin, metyrapone, ketoco- nazole, clotrimazole, letrozole, and fadrozole hydrochloride (Judson et al., 2018b). Two of these, fadrozole hydrochloride and letrozole, are pharmacological inhibitors of aromatase and were in the cluster with strong inhibitory effects on both estrogens and androgens. Ketocona- zole and clotrimazole are triazole fungicides, with known effects on aromatase as well as other steroidogenesis-relevant enzymes further upstream such as CYP17A1, and were in the cluster with strong in- hibitory effects on androgens and decreases in estrogens at high con- centrations (Ayub and Levell, 1987; Trosken et al., 2006). The last re- ference chemical, chrysin, is a plant-derived compound and was a member of the cluster with moderate effects on androgens and estro- gens (Kao et al., 1998). Expanding to the rest of the steroidogenesis pathway represented in HT-H295R, including progestogens and corti- costeroids, we observed a mixture of different response patterns for these chemical samples. For the two chemical clusters with strong ef- fects on estrogens and androgens, similar inhibitory pattern for OH- PROG, 11-DCORT, and CORT, particularly at high test concentrations, were apparent.

In a second approach, we characterized how available HTS methods for aromatase inhibition behaved in aggregate. Previous Tox21 screening and confirmatory assay work (Chen et al., 2015) suggested a

list of 50 putative aromatase inhibitors, of which only a small subset (17) were screened in the HT-H295R assay and the ToxCast NVS ar- omatase assay (NVS_ADME_hCYP19A1). As shown in Table 2, both HT- H295R and NVS assays identified the putative aromatase inhibitors. In the HT-H295R assay, 12 of 17 screened putative aromatase inhibitors were positive for ≥1.5-fold decreases E2 and E1, one was positive for a ≥1.5-fold decrease in E2 only (imazalil), one chemical was positive for a ≥1.5-fold decrease in E1 only (4,4’-oxydianiline), one chemical had ≥ 1.5-fold increases in E2 and E1 (cyproconazole) and two che- micals did not have ≥ 1.5-fold increases or decreases in E2 and E1 (propiconazole and tetraconazole), though they were mMd positives, with maxmMds of 6.83 and 3.51, respectively. Of the 17 putative ar- omatase inhibitors in NVS and HT-H295R, 12 were positive in NVS. Thus, we may have increased confidence for chemicals with E2 and/or E1 decreases and a NVS_ADME_hCYP19A1 positive.

3.4. Rank prioritization of steroidogenesis disruptors using the mMd metric and ToxCast viability assays

To determine which chemical samples had the most selective ac- tivity within the HT-H295R assays based on the mMd metric, and not due to non-specific bioactivity from mitochondrial toxicity or cyto- toxicity, we compared benchmark dose (BMD) values calculated from the mMd concentration-response curves to the parallel HT-H295R cell viability assay as well as ToxCast/Tox21 cytotoxicity assays. A se- lectivity scoring metric was designed to prioritize chemical samples for which the BMD values were less than the minimum of the cytotoxicity burst value as defined by Judson et al. (2016) or the concentration at the allowed cell viability threshold (70%) in the parallel MTT assay. Chemical samples with a selectivity score ≥0.5 log10 units, i.e. the BMD was 0.5 log10 UM less than the cytotoxicity or MTT values, were con- sidered selective for the HT-H295R assay. A total of 428 of the 709

Fig. 4. Stability of mMd values calculated with different covariance matrices. For each data simulation, three sets of mMds were calculated, one set for each of the three covariance types. (A) Scatterplot to compare the original mMd values calculated in Haggard et al. (2018) to the simulated mMd values calculated for each covariance type. Closed circles represent the median and the bars represent the 95% confidence interval of the mMd values for each covariance type. Dashed line indicates the identity line. (B) Boxplots of mMd concentration response values after prochloraz exposure in HT-H295R across the 2000 simulation and each covariance type. (C) Boxplots of mMd responses after butylparaben exposure in HT-H295R across the 2000 simulations and each covariance type. (D) Boxplots of mMd concentration response values after ethylene dimethanesulfonate exposure in HT-H295R across the 2000 simulation and each covariance type. For all plots, red, green, and blue correspond to the low, global, and high covariance type, respectively. Filled triangles represent mMd values estimated using the original dataset using the three covariance types. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Covariance Type

lower

global

upper

Median Simulated mMd

Original mMd

mMd

0.0041

0.012

0.037

0.11

0.33

0.041

0.12

0.37

1.1

3.3

0.41

1.2

3.7

100

Concentration (uM)

Table 1 mMd Type I error rate of true negative response in HT-H295R.
Covariance Type
Error Rate	Low	Global	High
0.05	0.015	0.019	0.012
0.01	0.007	0.008	0.006

chemical samples that were considered a maxmMd positive (maxmMd ≥ critical value at a = 0.01) and with sufficient ToxCast assay coverage to calculate the cytotoxicity burst value (Judson et al., 2016), were considered selective actives in HT-H295R. Since the BMD value is based on the calculated critical mMd value, there are some

instances where all mMd values for a test chemical are above the cri- tical value but have a small maxmMd. In this case, a test chemical would be considered an active having high selectivity, but low efficacy. To account for this possible discrepancy, we also estimated the AUC for the mMd concentration-response curves as a measure of efficacy. In general, we found that the maxmMd was more correlated with the AUC compared to the selectivity score (Supplemental Table 1; Supplemental Fig. 3). Chemical samples that passed the selectivity filter were then ranked according to the AUC (Supplemental Fig. 4). The 25 most and least efficacious chemical samples that were also selective are shown in Fig. 7. The chemical samples ranked the highest using this hybrid ap- proach included hormones, pharmaceuticals, insecticides, among others.

Fig. 5. Power analysis. As part of the MVN simulation, we generated ten different theoretical hormone response profiles at varying effects sizes (1.1, 1.5, 2, and 2.5-fold increases compared to a DMSO control sample), and calculated mMds for these profiles using the three covariance types. The percentage of maxmMd values above the pre-defined critical value at a = 0.01 (i.e. maxmMd ≥1.64) was considered the power to detect a maxmMd with experimental significance. Red, green, and blue correspond to the low, global, and high covariance type, respectively. Circles and triangles denote effect sizes of 1.1- and 1.5-fold, respectively. Note that effect sizes of 2.0- and 2.5-fold are not included in the figure due to these effect sizes having a power of ≥0.999 for all response profiles. Dashed line indicates a power of 0.8. Definitions for the response profile types are as follows: All: all measured hormones increased; Some: CORT, 11-DCORT, ANDR, TESTO, E1, and E2 increased; Glucocorticoids: 11- DCORT and CORT increased; Mineralocorticoids: DOC and CORTIC increased; Androstenedione: ANDR increased; Testosterone: TESTO increased; Androgens: ANDR and TESTO increased; Estrone: E1 increased; Estradiol: E2 increased; Estrogens: E1 and E2 increased. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

0.99

*2 and 2.5 fold changes were >0.999

0.75

Power

Covariance Type (Fold Change)

0.50

lower (1.1)

lower (1.5)

global (1.1)

global (1.5)

upper (1.1)

upper (1.5)

0.25

0.00

All

Some

Glucocorticoids

Mineralocorticoids

Androstenedione

Testosterone

Androgens

Estrone

Estradiol

Estrogens

4. Discussion

Developing screening strategies to rapidly and cost-effectively prioritize chemicals for effects on the endocrine system is an important challenge facing the global regulatory community, and high-throughput screening technologies are becoming a component of strategies to in- form and/or prioritize chemical safety assessment (ECHA, 2016; ECHA, 2017; Friedman et al., 2019; Health Canada, 2016; USEPA, 2019). Thus far, endocrine testing programs have focused on chemical bioactivity at the level of the nuclear receptor, particularly for the estrogen (Browne et al., 2015; Judson et al., 2015) and androgen receptors (Browne et al., 2015; Kleinstreuer et al., 2017; Rotroff et al., 2013). In addition, the EDSP incorporates in vitro and in vivo steroidogenesis assays in its tiered screening strategy, including the USEPA H295R guideline assay (USEPA, 2009; USEPA, 2017). These assays are low-throughput and the H295R guideline assay only examines chemical disruption of TESTO and E2, which lacks additional hormones in the steroidogenesis pathway that may serve to bolster interpretation of changes in E2 and TESTO in H295R cells. The HT-H295R assay is a demonstration of an informative high-throughput method to rapidly screen large numbers of chemicals for effects on the steroidogenesis pathway, providing a readout of 11 hormones and intermediates that covers the four major classes of hormones produced in humans (Karmaus et al., 2016). A multivariate statistical approach utilizing the Mahalanobis distance was performed on HT-H295R data in order to enable a data-driven statis- tical metric for chemical prioritization (Haggard et al., 2018). Herein, we furthered this novel approach by addressing key questions regarding

the stability and reproducibility of the mMd values, the sensitivity of the approach, the ability of the HT-H295R assay to detect aromatase inhibitors as a class of steroidogenesis disruptor, and a proof-of-concept on how to use the mMd metric with additional contextual information for prioritization of chemicals. We examined the robustness of the mMd approach via characterization of the effects on mMd of different cov- ariance and correlation relationships among hormones. To better un- derstand the false positive rate and sensitivity of this assay to fold- changes in hormones, the Type I error rate and power of the mMd method to determine experimental significance were evaluated. Fur- thermore, we examined whether the HT-H295R assay’s inclusion of hormones beyond TESTO and E2 allowed for the identification of spe- cific response profiles indicative of aromatase inhibition. Lastly, we incorporated additional cytotoxicity measures, and combined efficacy and potency estimates for the mMd analysis, to better prioritize bioactivity of chemicals in HT-H295R.

To increase confidence that the mMd metric is useful for screening level assessment, we explored some key statistical assumptions under- lying the method using a large data simulation; namely, we needed to understand how reproducible the mMd values are, what the power of the mMd approach is to detect fold-changes in hormones (increases and/or decreases), and what the observed Type I error rate (false po- sitive rate) might be. One of the main assumptions for using the Mahalanobis distance to calculate multivariate distances is the presence of substantial correlation or covariance among the multivariate mea- sures. As described earlier, the 656 chemicals tested in multi-con- centration largely demonstrated significant activity for three or more

Fig. 6. Heatmap of hormone response profiles. Putative aromatase inhibitors were identified after filtering HT-H295R response profiles for chemical samples with statistically significant decreases in E1 and E2 concentrations compared to plate-level DMSO controls (as measured by one-way ANOVA) with a fold decrease of ≥1.5. Rows denote chemical samples and columns are hormones. Rows were clustered using Ward's method only considering ANDR, TESTO, E1, and E2, and four prominent clusters were identified and are labeled. The aromatase relevant hormones, specifically ANDR, TESTO, E1, and E2, are presented to the left whereas the other hormones are presented to the right. The log10- maxmMd value is also denoted with an annotation bar to the right to indicate effect size for the 11 hormones. Reference chemicals (chrysin, ketoconazole, clo- trimazole, metyrapone, letrozole, and fadrozole hydrochloride) known to inhibit aromatase are specifically labeled to the right (Judson et al., 2018b). Grey denotes missing data due to sample loss during screening. Cluster numbers 1-4 referenced in the text are labeled.

Aromatase-relevant

Rest of Pathway

- Letrozole

Cluster 1

- Fadrozole hydrochloride

Cluster 2

- Ketoconazole

-Clotrimazole

- Metyrapone

log2 fold-change

2 1

Cluster 3

log10 maxmMd 1.5

0.5

Chrysin

Cluster 4

[

ANDR

TESTO

PROG

OHPREG

OHPROG

DOC

CORTIC

11DCORT

CORT

Table 2 Putative Aromatase Inhibitors from Tox21/ToxCast HTS and HT-H295R Assays. Available HT-H295R and NVS hCYP19A1 assay data are summarized for seventeen aromatase positives from Tox21 screening.
Name	Cas No.	maxmMd	HT-H295R E1ª	HT-H295R E2ª	HT-H295R Fold Changeb	HT-H295R Aromatase Positivec	NVS Aromatase Positived	Tox21 Aromatase Positivee
4,4'-Oxydianiline	101-80-4	18.35	↓	–	Yes	Yes	No	Yes
Clotrimazole	23593-75-1	18.35	↓	↓	Yes	Yes	No	Yes
Cyproconazole	94361-06-5	8.58	↑	↑	Yes	No	Yes	Yes
Epoxiconazole	133855-98-8	14.68	↓	↓	Yes	Yes	Yes	Yes
Fadrozole hydrochloride	102676-31-3	16.90	↓	↓	Yes	Yes	Yes	Yes
Fenarimol	60168-88-9	16.70	↓	↓	Yes	Yes	No	Yes
Flusilazole	85509-19-9	9.27	↓	↓	Yes	Yes	Yes	Yes
Imazalil	35554-44-0	22.62	↑	↓	Yes	Yes	Yes	Yes
Letrozole	112809-51-5	14.02	↓	↓	Yes	Yes	Yes	Yes
Myclobutanil	88671-89-0	16.43	↓	↓	Yes	Yes	Yes	Yes
Prochloraz	67747-09-5	18.56	↓	↓	Yes	Yes	Yes	Yes
Propiconazole	60207-90-1	5.14	–	–	No	No	Yes	Yes
SSR150106	NOCAS_47362	9.06	↓	↓	Yes	Yes	No	Yes
Tetraconazole	112281-77-3	3.41	↓	–	No	No	Yes	Yes
Triadimenol	55219-65-3	17.25	↓	↓	Yes	Yes	Yes	Yes
Triflumizole	68694-11-1	17.19	↓	↓	Yes	Yes	Yes	Yes
Vatalanib dihydrochloride	212141-51-0	8.34	↓	↓	Yes	Yes	No	Yes

a Arrows for E1 (estrone) and E2 (estradiol) indicate direction of effect on the hormone in the HT-H295R system based on a minimum 1.5-fold change (see Methods). A ”-” indicates no change.

b HT-H295R Fold-Change indicates that the chemical was positive (Yes) or negative (No) for meeting a statistically significant fold-change criteria (> 1.5-fold change; ANOVA p-value ≤ 0.05).

” HT-H295R Aromatase Positive indicates that the chemical would have been flagged for effects on estrogens consistent with an aromatase inhibitor on the basis of the HT-H295R data (i.e., E1 and/or E2 significantly decreased).

d NVS aromatase positives were pulled from the level 5 information for the assay in invitrodb (where positive is equivalent to hitcall of 1).

e All of the chemicals in this table were Tox21 Aromatase positives from Chen et al. (2015).

4-Androstene-3,17-dione

Estrone

Corticosterone

Mifepristone

17beta-Estradiol-

17alpha-Estradiol -

4,4’-Sulfonylbis[2-(prop-2-en-1-yl)phenol] -

Dehydroepiandrosterone

Testosterone propionate

Estrone

Equilin

4-(2-Phenylpropan-2-yl)-N-[4-(2-phenylpropan-2-yl)phenyl]aniline -

2,2-Bis(4-hydroxyphenyl)-1,1,1-trichloroethane

Pitavastatin calcium

Cerivastatin sodium

5alpha-Dihydrotestosterone

Daidzein

17alpha-Hydroxyprogesterone

4,4’-Methylenebis(2-methylaniline)

17alpha-Estradiol -

17beta-Estradiol -

Parathion

Melatonin

Danazol

Model Type

Methyl parathion

Methylbenzethonium chloride

BMD

Pirimicarb

MTTacc

Tetraconazole

cytoburst

Endrin

Propham

Flumioxazin

PharmaGSID48514-

Chloroneb

1-Bromo-3-chloro-5,5-dimethylhydantoin

3-Methylaniline

N,O-bis(Trimethylsilyl)acetamide

Bromacil

Tri-o-cresyl phosphate

Chloridazon

Metribuzin

Tralopyril

Octyl decyl phthalate

(2Z)-3,7-Dimethylocta-2,6-dien-1-ol-

(4Z)-4-Decenal-

Dicyclohexylcarbodiimide

Imidacloprid -

Hydroxyprogesterone caproate

Pirinixic acid-

Chlorfenapyr

5HPP-33

-4

-3

-2

-1

log10 Concentration (uM)

Selectivity	AUC	maxmMd
4.33	146.30	53.43
3.14	145.36	34.73
4.75	118.51	29.91
5.01	117.05	34.71
4.77	114.89	30.61
4.80	114.72	31.81
4.99	112.35	36.61
2.16	107.75	40.53
3.71	107.01	35.88
2.36	105.61	30.56
1.54	101.59	44.78
2.89	99.87	21.66
3.24	94.88	25.88
2.02	93.59	21.14
1.53	88.81	20.92
3.37	85.44	30.22
1.17	84.51	30.13
2.34	82.46	38.19
2.18	80.61	27.48
4.49	75.63	25.68
3.98	71.00	25.29
2.40	68.77	29.00
3.59	68.63	24.19
2.20	68.47	23.11
2.37	68.40	24.66
1.23	7.84	4.85
1.00	7.84	2.85
1.62	7.81	3.41
0.62	7.78	4.48
0.92	7.69	3.64
0.51	7.66	3.13
4.58	7.56	1.83
0.89	7.44	3.16
0.82	7.16	2.64
0.80	7.06	3.20
0.72	6.91	2.14
0.74	6.84	2.88
0.93	6.74	2.79
0.68	6.68	2.21
1.78	6.65	2.71
1.31	6.59	5.10
0.91	6.45	4.82
0.64	6.44	2.94
0.57	6.40	1.94
0.62	6.38	2.32
0.76	6.31	1.89
1.16	5.89	4.28
1.16	5.71	3.05
0.86	1.73	1.66
0.71	1.72	2.03

Fig. 7. Selectivity, area under the curve, and maxmMd as context for prioritization.

The most (top 25) and least (bottom 25) efficacious selective chemicals are illustrated in Fig. 7 (for all, see Supplemental Fig. 5 and Supplemental Table 1). The BMD is the potency at the critical value for the mMd curve (red); the MTTace is the potency at the threshold for a positive response in that assay (green); and the cytoburst is the potency value for the lower bound of a cytotoxicity distribution (blue). Selectivity (minimum of cytoburst or MTTace minus the BMD), AUC (area under the curve for mMd versus concentration), and maxmMd are shown to the right. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

hormones from the original single concentration HT-H295R chemical screen. Since this dataset was biased towards steroidogenesis positive chemicals due to the use of a tiered screening approach, a hypothesis to address was: did the disproportionately high number of positive che- micals in the dataset influence approximations of the covariance matrix used, which heavily influenced the mMd obtained? If we had screened a different chemical set, would the covariance matrix approximation have been different? Data simulation allowed us to explore the effects of different datasets on the covariance between hormones, and further how this may affect the mMd values. In short, this helped inform conclusions regarding the stability of the original mMd values from Haggard et al. (2018). As demonstrated in Fig. 3 and Supplemental Fig. 1 for the original HT-H295R data, the pairwise correlation of the hormone covariances, as well as the variances, were largely similar across the low (with a covariance matrix derived from “low”-re- sponding actives), global (with a covariance matrix derived from all chemical samples), and high (with a covariance matrix derived from “high”-responding actives) covariance types. This suggests the under- lying covariance of the hormone measures in the HT-H295R assay are

not heavily influenced by the chemical effects observed in screening; rather, the covariance between hormones may reflect some funda- mental biological aspect of the system of hormone synthesis in H295R cells. This similarity in covariance between hormone measures for the three data subsets of the original and simulated HT-H295R datasets was reflected in the calculated mMd values in which the median values, even in the presence of sampling noise, were still very good estimates of the original mMd values (Fig. 4A). We did observe a minor overall decrease in mMd values when using the high covariance type. As the Mahalanobis distance tends to diminish the significance of changes in measures with very large covariances, it is possible that the high cov- ariance type, despite having a similar correlation in the covariance of hormone measures, has an overall larger magnitude in covariances compared to the low and global covariance type. The diagonal of the correlation matrix shown in Fig. 3 indicates the mean variance of the 11 hormone measures across the simulations. We did observe that the high covariance type had slightly larger variances for E2, E1, 11-DCORT, CORT, ANDR, TESTO, DOC, and PROG (Fig. 3); and although these differences appear small, they could contribute to the slightly decreased

mMd values we observed. Given that the covariance (and variance) observed is stable during data simulation experiments, we conclude that the Mahalanobis distance-based methods that rely on the covar- iance matrix appear to be appropriate.

The definition of a critical value as a threshold signifying experi- mental significance is important for the mMd as a prioritization metric; often, users of this HT-H295R mMd model will want to understand in binary whether a chemical was positive or negative, in addition to the mMd values obtained. Here, we utilized a multivariate multiple com- parisons procedure first defined by Nakamura and Imada (2005) to establish the minimum mMd value to be considered statistically sig- nificant compared to the control response. To calculate this threshold, the variance-covariance matrix is assumed to be known and the sample size is set to the median sample size across all concentrations of a test chemical. Due to these requirements, the nominal Type I error rate is considered an approximation. As part of the data simulation study, we sought to determine how well the critical value approximated the Type I error rate at a = 0.05 and 0.01 by using a true null HT-H295R con- centration-response profile. As shown in Table 1, across all covariance types and a levels, the observed Type I error rates were similar. As we found that the approximate Type I error rate (a = 0.01) was slightly higher than the observed false positive rate in the data simulation, this indicated that the use of a 1% error rate was reasonable for establishing the critical value that sets a threshold for positive mMd values. Thus far, most guideline and experimental studies using H295R cells to measure chemical effects on hormone production rely on parametric and non- parametric analyses of variance to identify significant differences compared to control, many incorporating some form of efficacy threshold for activity (Hecker et al., 2011; Higley et al., 2010; Ohlsson et al., 2009; Strajhar et al., 2017; Tremoen et al., 2014). Having es- tablished that the Type I error rate used in the calculation of the critical value is a good approximation, it is important to further define the level of change in hormone concentrations required to detect experimental significance in HT-H295R using the mMd approach.

As the mMd reduces all hormone measures into a single value, it is necessary to examine a diverse set of possible hormone response pat- terns to determine whether the mMd metric is more responsive to certain hormone changes over others. We found that, across nine the- oretical combinations (all, some, glucocorticoids, mineralocorticoids) or single hormones (ANDR, TESTO, E2, E1), there was sufficient power (power > 0.8) to detect a 1.5-fold change (increase or decrease) in hormone(s) (Fig. 5). As the mMd approach adjusts for covariance, it is not surprising that some hormone responses have decreased power compared to others. The only combination with reduced power (~0.6) to detect a 1.5-fold change was for E1 and E2 combination. In general, the mMd calculation decreases the significance of changes in hormones that have large correlation in covariance. E2 and E1 had the largest correlation in covariance compared to all other measures; therefore, a larger fold-change is required to result in a significant mMd for these two hormones. However, a 2-fold change in any hormone, alone or in combination, demonstrated a power that approached 1. The OECD inter-laboratory validation of the low-throughput H295R assay sug- gested that based on inter-laboratory variability, a threshold for posi- tive responses for TESTO and E2 likely approached 1.5-fold and 2-fold, respectively (Hecker et al., 2011). As a result, the power of the HT- H295R mMd analysis is similar to the low-throughput version of the assay with only TESTO and E2 measures. An important conclusion is that the mMd approach for the HT-H295R dataset appears to have a reasonable false positive rate approximated at 1% and sufficient power to observe positive mMd responses when there are 1.5- to 2-fold changes.

The data generated using HT-H295R represents the largest chemical screen to date using H295R cells. This offers a unique opportunity to determine whether specific classes of steroidogenesis disruptors are detected in this model system. One widely studied mechanism of ster- oidogenesis disruption, which was incorporated into the EDSP Tier 1

screening strategy (USEPA, 2016), is inhibition of aromatase (CYP19A1) activity. Due to its regulatory and therapeutic relevance and because the synthesis of estrogens in HT-H295R are reliant on ar- omatase activity, we chose to examine the ability of the HT-H295R assay to identify putative aromatase inhibitors as a case study of how data from this assay can be used to classify chemicals with specific steroidogenic bioactivity. Reduction of the 11 hormone measures into a singular value using the mMd method appears to be a good ranking metric and indicator of pathway-level effects but does not directly in- dicate mechanism or hormone class affected (though this may be in- spected in the dataset). Inhibition of aromatase is a mechanism of ac- tion of interest, with two of the five in vitro assays in the EDSP Tier 1 screening battery targeting this enzyme, one of which is the low- throughput H295R guideline assay (USEPA, 2016). Within HT-H295R, aromatase inhibition should theoretically cause decreases in E2 and E1 concentrations and increases in ANDR and TESTO concentrations, with unknown effects on the remainder of the measured hormones. Filtering the HT-H295R dataset for this steroid response pattern only identified two chemical samples; therefore, we relaxed our search to only sig- nificant decreases in E2 and/or E1. We found that 72 of the 766 che- mical samples matched this response pattern. Contrary to what was expected, clustering the response profiles of these chemicals showed a prominent response of decreases in both the measured estrogens and androgens and appeared to group chemicals based on increasing po- tency and efficacy of disruption in these hormones; i.e., most potent disruptors in cluster 1 and the least potent disruptors in cluster 4 (Fig. 6). Another study of the activity of aromatase inhibitors, including letrozole and ketoconazole, observed similar changes in both E2 and T concentrations in H295R cells (Higley et al., 2010). Like what was observed for Higley et al. (2010), except at their maximum tested concentration of 100 uM, letrozole did not have a strong effect on an- drogen levels. This contrasts with the other pharmacological aromatase inhibitor, fadrozole hydrochloride, as well as the remaining chemical samples within cluster 1 of Fig. 6, which resulted in significant de- creases in both the estrogens and androgens in our study. It may be that an expanded concentration range is needed to better define the con- centration-dependent bioactivity of letrozole on estrogen and androgen levels in HT-H295R, since significant bioactivity on E1 and E2 was observed at all exposure concentrations. The two conazole reference chemicals, ketoconazole and clotrimazole, are known to inhibit ar- omatase as well as other steroidogenic enzymes, including CYP17A1 (Ayub and Levell, 1987; Trosken et al., 2006). Although we highlight known reference aromatase inhibitors in Fig. 6, it should be noted that there were many other conazoles including myclobutanil, epox- iconazole, prochloraz, triadimenol, triadimefon, and tebuconazole as well as other drugs such as danazol and lynestrenol that were members of clusters 1 and 2. This qualitative analysis demonstrates that known and suspected aromatase inhibitors appear to cause decreases in both estrogens and androgens, with variable effects for the other hormones. The HT-H295R assay as implemented can identify aromatase inhibitors (i.e. maxmMd > critical value), but the myriad responses across the rest of the hormone measures for these chemicals suggests that proper mechanistic evaluation of chemical effects requires multiple timepoints to better model the complex and dynamic steroidogenesis pathway present in H295R cells (Breen et al., 2010, 2011; Saito et al., 2016). The lack of a standard pattern observed for putative aromatase inhibitors extends to all chemicals tested in HT-H295R. In total, 608 of the 766 unique chemical samples tested had a unique hormone response pat- terns as measured by ANOVA (Supplemental Data 2), providing support that specific mechanistic hypotheses using the HT-H295R assay would require additional information regarding the kinetics of steroidogenesis and incorporation of orthogonal assays.

The quantification of 11 hormones in the HT-H295R assay enables identification of chemicals with diverse bioactivity on steroidogenesis, more so than solely E2 and TESTO measures in the H295R guideline assays. Additionally, including E1 and ANDR added greater certainty to

conclusions of chemical effects on estrogen and androgen production, especially with the observation that these intermediates had strong, positive covariance with their terminal steroid products (Fig. 3; Supplemental Fig. 1). Indeed, there are many non-guideline studies examining chemical effects across the steroidogenesis pathway present in H295R (Ahmed et al., 2018; Nielsen et al., 2012; Strajhar et al., 2017). Overall, this case study examining putative aromatase inhibitors highlights how HT-H295R data and the mMd metric presented here and previously (Haggard et al., 2018) can assist in identifying and prior- itizing chemicals with specific bioactivities of interest such as disrup- tion in glucocorticoid synthesis via disruption of CYP21A2 or HSD3B1, for example.

Development of a ranked prioritization metric should account for non-specific bioactivity and factor in potency and efficacy estimates for the target of interest, and so we demonstrate an approach for using the mMd prioritization metric in this way. As mentioned previously, the HT-H295R assay should be sensitive to cytotoxicity and mitochondrial dysfunction, and when the assay was conducted, a parallel MTT assay was performed to limit the upper concentration considered in the HT- H295R assay. The MTT assay, an indicator of mitochondrial toxicity and cell viability, may be confounded by potential dye reduction in other cellular compartments (Meyer et al., 2018). However, in ToxCast there are many cytotoxicity assays in a variety of cell types that could confirm a cytotoxicity threshold, and so these data were considered herein. A current limitation for this work is a lack of highly specific mitochondrial toxicity assays in ToxCast/Tox21 to identify mitochon- drial toxicants at lower concentrations than cytotoxicity. The hybrid ranking approach used in this study incorporates the cytotoxicity “burst” estimates from ToxCast, the parallel HT-H295R cell viability assay (70%), and the BMD and AUC estimates from the mMd con- centration-response curves of the chemical samples. We found that se- lectivity scoring alone was not a sufficient ranking metric. This appears to be a result of lack of coverage in the ToxCast cytotoxicity assays for some chemicals as well as artificially low BMD estimates that occur when the lowest exposure concentration has a mMd value above the critical value but the chemical demonstrated limited efficacy. However, filtering chemical samples that had BMDs within the cytotoxicity ‘burst’ range using the selectivity score, and then ordering by the AUC esti- mates, appeared to efficiently rank active chemicals. The highest ranked chemicals included chemicals that were identical to hormone measures in HT-H295R (e.g. ANDR, E1, and CORTIC), estrogen analo- gues (i.e. 17a-estradiol and the equine estrogen, equilin), the phytoes- trogen daidzein, and drugs with reproductive health applications (i.e. danazol and mifepristone; Fig. 7). Two statin drugs, pitavastatin cal- cium and cerivastatin sodium, cholesterol synthesis inhibitors that block HMG-CoA reductase activity, and melatonin, which helps reg- ulate circulating hormone levels due to its roles in circadian rhythms (Qin et al., 2015; Wu et al., 2001), were also ranked highly. We did observe that two organophosphate pesticides, parathion and methyl parathion, were a part of this highly prioritized chemical set. Although there are some observations showing induction of aromatase activity (Laville et al., 2006) and anti-glucocorticoid receptor activity (Vrzal et al., 2015), these chemicals are a good example of how this prior- itization metric can identify candidates to consider for additional study. It is important to note that the AUC and maxmMd appeared to be po- sitively linearly correlated, in that a large maxmMd was predictive of a large AUC for a test chemical (Supplemental Fig. 2). This suggests the maxmMd alone is a good indicator of both potency and efficacy, and that combining maxmMd with information on potential non-specific activity from cytotoxicity is sufficient for prioritization.

Here, we demonstrated that the multivariate approach we pre- viously established using the mMd to combine the hormone measures into a single value to prioritize chemicals is robust and stable. Due to the similarities in mMds, Type I error rate, and power to detect ex- perimental significance for data with different covariance structures, it appears that the initial decisions for covariance estimation and

assumptions were appropriate. The use of HT-H295R data to identify specific mechanisms of action across the whole steroidogenesis pathway remains a challenge. Hormone profiling of putative aromatase inhibitors showed a possible unique profile with respect to the direct substrates and products of aromatase, but after expanding to the rest of the hormone measures there was no obvious signature of aromatase inhibition. Further work is needed to understand the complex dynamics of the steroidogenesis pathway in H295R cells, both in the presence and absence of toxicants, to associate hormone response patterns to specific mechanistic perturbations. Despite some limitations in mechanistic in- terpretation of this dataset, the mMd approach for the HT-H295R data is a very useful method to prioritize chemicals for additional con- sideration. Accounting for non-specificity due to cytotoxicity alongside potency and efficacy further strengthens the use of the method and presents a data-driven strategy to identify chemicals with the highest potential for disruption of steroidogenesis. Together, these analyses of the HT-H295R assay suggest that the mMd is a robust, reproducible metric that can be used for prioritization, and that examination of ef- fects on specific hormone classes, such as the estrogens, may provide a binary determinant for identification of putative hormone class dis- ruptors. The novel evaluation of the HT-H295R assay and mMd ap- proach further support its use in identification of in vitro endocrine activity, with potential to be incorporated into more comprehensive tiered strategies for screening level chemical safety assessment.

Disclaimer

The United States Environmental Protection Agency (U.S. EPA) through its Office of Research and Development has subjected this ar- ticle to Agency administrative review and approved it for publication. Mention of trade names or commercial products does not constitute endorsement for use. The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the US EPA.

Funding

No external funding was received for this work. We were all re- searchers at the US Environmental Protection Agency. D.E.H. was supported by appointment to the Research Participation Program of the U.S. Environmental Protection Agency, Office of Research and Development, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. EPA.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influ- ence the work reported in this paper.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https:// doi.org/10.1016/j.yrtph.2019.104510.

References

Ahmed, K.E.M., et al., 2018. LC-MS/MS based profiling and dynamic modelling of the steroidogenesis pathway in adrenocarcinoma H295R cells. Toxicol. In Vitro 52, 332-341.

Ayub, M., Levell, M.J., 1987. Inhibition of testicular 17 alpha-hydroxylase and 17,20- lyase but not 3 beta-hydroxysteroid dehydrogenase-isomerase or 17 beta-hydro- xysteroid oxidoreductase by ketoconazole and other imidazole drugs. J. Steroid Biochem. 28, 521-531.

Botteri Principato, N.L., et al., 2018. The use of purified rat Leydig cells complements the H295R screen to detect chemical-induced alterations in testosterone production. Biol. Reprod. 98, 239-249.

Breen, M., et al., 2011. Mechanistic computational model of steroidogenesis in H295R cells: role of oxysterols and cell proliferation to improve predictability of biochemical response to endocrine active chemical-metyrapone. Toxicol. Sci. 123, 80-93.

Breen, M.S., et al., 2010. Computational model of steroidogenesis in human H295R cells to predict biochemical response to endocrine-active chemicals: model development for metyrapone. Environ. Health Perspect. 118, 265-272.

Browne, P., et al., 2015. Screening chemicals for estrogen receptor bioactivity using a computational model. Environ. Sci. Technol. 49, 8804-8814.

Casati, S., 2018. Integrated approaches to testing and assessment. Basic Clin. Pharmacol. Toxicol. 123 (Suppl. 5), 51-55.

Chen, S., et al., 2015. Cell-based high-throughput screening for aromatase inhibitors in the Tox21 10K library. Toxicol. Sci. 147, 446-457.

Chen, S., et al., 2014. AroER tri-screen is a biologically relevant assay for endocrine disrupting chemicals modulating the activity of aromatase and/or the estrogen re- ceptor. Toxicol. Sci. 139, 198-209.

De Maesschalck, R., et al., 2000. The Mahalanobis distance. Chemometr. Intell. Lab. Syst. 50, 1-18.

Dix, D.J., et al., 2007. The ToxCast program for prioritizing toxicity testing of environ- mental chemicals. Toxicol. Sci. 95, 5-12.

ECHA, 2016. New approach methodologies in regulatory science: proceedings of a sci- entific workshop. In: Agency, E. C ..

ECHA, 2017. Non-animal Approaches: Current Status of Regulatory Applicability under the REACH. CLP and Biocidal Products regulations.

Filer, D.L., et al., 2017. tepl: the ToxCast pipeline for high-throughput screening data. Bioinformatics 33, 618-620.

Friedman, K.P., et al., 2019. Utility of in vitro bioactivity as a lower bound estimate of in vivo adverse effect levels and in risk-based prioritization. Toxicol. Sci. https://doi. org/10.1093/toxsci/kfz201.

Gazdar, A.F., et al., 1990. Establishment and characterization of a human adrenocortical carcinoma cell line that expresses multiple pathways of steroid biosynthesis. Cancer Res. 50, 5488-5496.

Griesinger, C., et al., 2016. Validation of alternative in vitro methods to animal testing: concepts, challenges, processes and tools. Adv. Exp. Med. Biol. 856, 65-132.

Gu, Z., et al., 2016. Complex heatmaps reveal patterns and correlations in multi- dimensional genomic data. Bioinformatics 32, 2847-2849.

Haggard, D.E., et al., 2018. High-throughput H295R steroidogenesis assay: utility as an alternative and a statistical approach to characterize effects on steroidogenesis. Toxicol. Sci. 162, 509-534.

Health Canada, 2016. Integrating new approach methodologies within the CMP: identi- fying priorities for risk assessment, existing substances risk assessment program. In: Committee, C. M. P. C. S ..

Health Canada, 1999. Chemicals Management Plan Science Committee: Advancing Consideration of Endocrine-Disrupting Chemicals under the Canadian Environmental Protection Act, vol. 2019 2018.

Hecker, M., et al., 2011. The OECD validation program of the H295R steroidogenesis assay: phase 3. Final inter-laboratory validation study. Environ. Sci. Pollut. Res. Int. 18, 503-515.

Higley, E.B., et al., 2010. Assessment of chemical effects on aromatase activity using the H295R cell line. Environ. Sci. Pollut. Res. Int. 17, 1137-1148.

Judson, R., et al., 2016. Editor’s highlight: analysis of the effects of cell stress and cyto- toxicity on in vitro assay activity across a diverse chemical and assay space. Toxicol. Sci. 152, 323-339.

Judson, R., et al., 2009. The toxicity data landscape for environmental chemicals. Environ. Health Perspect. 117, 685-695.

Judson, R.S., et al., 2017. On selecting a minimal set of in vitro assays to reliably de- termine estrogen agonist activity. Regul. Toxicol. Pharmacol. 91, 39-49.

Judson, R.S., et al., 2015. Integrated model of chemical perturbations of a biological pathway using 18 in vitro high-throughput screening assays for the estrogen receptor. Toxicol. Sci. 148, 137-154.

Judson, R.S., et al., 2018a. New approach methods for testing chemicals for endocrine disruption potential. Curr. Opin. Toxicol. 9, 40-47.

Judson, R.S., et al., 2018b. Workflow for Defining Reference Chemicals for Assessing Performance of in Vitro Assays. ALTEX.

Kao, Y.C., et al., 1998. Molecular basis of the inhibition of human aromatase (estrogen synthetase) by flavone and isoflavone phytoestrogens: a site-directed mutagenesis study. Environ. Health Perspect. 106, 85-92.

Karmaus, A.L., et al., 2016. High-throughput screening of chemical effects on ster- oidogenesis using H295R human adrenocortical carcinoma cells. Toxicol. Sci. 150, 323-332.

Kavlock, R., et al., 2012. Update on EPA’s ToxCast program: providing high throughput decision support tools for chemical risk management. Chem. Res. Toxicol. 25,

1287-1302.

Kleinstreuer, N.C., et al., 2017. Development and validation of a computational model for androgen receptor activity. Chem. Res. Toxicol. 30, 946-964.

Laville, N., et al., 2006. Modulation of aromatase activity and mRNA by various selected pesticides in the human choriocarcinoma JEG-3 cell line. Toxicology 228, 98-108.

Mangelis, A., et al., 2016. Computational analysis of liquid chromatography-tandem mass spectrometric steroid profiling in NCI H295R cells following angiotensin II, forskolin and abiraterone treatment. J. Steroid Biochem. Mol. Biol. 155, 67-75.

Meyer, J.N., et al., 2018. Mitochondrial toxicity. Toxicol. Sci. 162, 15-23.

Miller, W.L., 2017. Disorders in the initial steps of steroid hormone synthesis. J. Steroid Biochem. Mol. Biol. 165, 18-37.

Miller, W.L., Auchus, R.J., 2011. The molecular biology, biochemistry, and physiology of human steroidogenesis and its disorders. Endocr. Rev. 32, 81-151.

Nakamura, T., Imada, T., 2005. Multiple comparison procedure of Dunnett’s type for multivariate normal means. J. Jpn. Soc. Comput. Stat. 18, 21-32.

Nielsen, F.K., et al., 2012. H295R cells as a model for steroidogenic disruption: a broader perspective using simultaneous chemical analysis of 7 key steroid hormones. Toxicol. In Vitro 26, 343-350.

OECD, 2011. Test No. 456: H295R Steroidogenesis Assay.

Ohlsson, A., et al., 2009. A biphasic effect of the fungicide prochloraz on aldosterone, but not cortisol, secretion in human adrenal H295R cells-underlying mechanisms. Toxicol. Lett. 191, 174-180.

Patlewicz, G., et al., 2013. Use and validation of HT/HC assays to support 21st century toxicity evaluations. Regul. Toxicol. Pharmacol. 65, 259-268.

Pinto, C.L., et al., 2018. Identification of candidate reference chemicals for in vitro steroidogenesis assays. Toxicol. In Vitro 47, 103-119.

Qin, F., et al., 2015. Inhibitory effect of melatonin on testosterone synthesis is mediated via GATA-4/SF-1 transcription factors. Reprod. Biomed. Online 31, 638-646.

Richard, A.M., et al., 2016. ToxCast chemical landscape: paving the road to 21st century toxicology. Chem. Res. Toxicol. 29, 1225-1251.

Rotroff, D.M., et al., 2013. Using in vitro high throughput screening assays to identify potential endocrine-disrupting chemicals. Environ. Health Perspect. 121, 7-14.

Rovida, C., et al., 2015. Integrated testing strategies (ITS) for safety assessment. ALTEX 32, 25-40.

Saito, R., et al., 2016. Estimation of the mechanism of adrenal action of endocrine-dis- rupting compounds using a computational model of adrenal steroidogenesis in NCI- H295R cells. J. Toxicol. 2016, 4041827.

Sipes, N.S., et al., 2013. Profiling 976 ToxCast chemicals across 331 enzymatic and re- ceptor signaling assays. Chem. Res. Toxicol. 26, 878-895.

Strajhar, P., et al., 2017. Steroid profiling in H295R cells to identify chemicals potentially disrupting the production of adrenal steroids. Toxicology 381, 51-63.

Thomas, R.S., et al., 2018. The US Federal Tox21 Program: a strategic and operational plan for continued leadership. ALTEX 35, 163-168.

Tremoen, N.H., et al., 2014. Exposure to the three structurally different PCB congeners (PCB 118, 153, and 126) results in decreased protein expression and altered ster- oidogenesis in the human adrenocortical carcinoma cell line H295R. J. Toxicol. Environ. Health 77, 516-534.

Trosken, E.R., et al., 2006. Inhibition of human CYP19 by azoles used as antifungal agents and aromatase inhibitors, using a new LC-MS/MS method for the analysis of estradiol product formation. Toxicology 219, 33-40.

USEPA, 2009. Endocrine Disruptor Screening Program Test Guidelines: OPPTS 890.1550 : Steroidogenesis (Human Cell Line - H295R). U.S. Environmental Protection Agency, Office of Chemical Safety and Pollution Prevention.

USEPA, 2011. Endocrine Disruptor Screening Program for the 21st Century: (EDSP21 Work Plan) the Incorporation of in Silico Models and in Vitro High Throughput Assays in the Endocrine Disruptor Screening Program (EDSP) for Prioritization and Screening. U.S. Environmental Protection Agency, Endocrine Disruptor Screening Program.

USEPA, 2016. Endocrine Disruptor Screening Program Tier 1 Battery of Assays, vol. 2018.

USEPA, 2017. Continuing Development of Alternative High Throughput Screens to Determine Endocrine Disruption, Focusing on Androgen Receptor, Steroidogenesis, and Thyroid Pathways. U.S. Environmental Protection Agency, Endocrine Disruptor Screening Program.

USEPA, 2018. Strategic plan to promote the development and implementation of alter- native test methods within the TSCA program. In: Prevention, O. O. C. S. a. P ..

USEPA, 2019. Directive to Prioritize Efforts to Reduce Animal Testing.

Vrzal, R., et al., 2015. Environmental pollutants parathion, paraquat and bisphenol A show distinct effects towards nuclear receptors-mediated induction of xenobiotics- metabolizing cytochromes P450 in human hepatocytes. Toxicol. Lett. 238, 43-53.

Wu, C.S., et al., 2001. Melatonin inhibits the expression of steroidogenic acute regulatory protein and steroidogenesis in MA-10 cells. J. Androl. 22, 245-254.

Quartz 4

Explorer

31676319

ARTICLE INFO

ABSTRACT

1. Introduction1

2. Materials and methods

2.1. HT-H295R assay information

2.2. Chemical library

2.3. HT-H295R assay

2.4. Software

2.5. Overview of computational approach

2.5.1. Source data

2.5.2. Mahalanobis distance overview

2.5.3. Mahalanobis distance of different covariance matrices from original data

2.5.4. Multivariate normal data simulation

2.5.5. Type I error rate computation

2.5.6. Power analyses

2.5.7. Steroid hormone pattern analysis of putative aromatase inhibitors

2.5.8. Selectivity scoring and potency estimation of mMd benchmark dose values

3. Results

3.1. Stability of the mMd metric with different covariance matrices

3.2. Specificity and power of the mMd metric

3.3. Response of putative aromatase inhibitors in HT-H295R

3.4. Rank prioritization of steroidogenesis disruptors using the mMd metric and ToxCast viability assays

4. Discussion

Disclaimer

Funding

Declaration of competing interest

Appendix A. Supplementary data

References

Graph View

Table of Contents

Backlinks