An International Ki67 Reproducibility Study in Adrenal Cortical Carcinoma

Thomas G. Papathomas, MD, ** Eugenio Pucci, MD, *¿ Thomas J. Giordano, MD, PhD,§ Hao Lu, PhD, | Eleonora Duregon, MD, [ Marco Volante, MD, PhD, | Mauro Papotti, MD, Ricardo V. Lloyd, MD, PhD,# Arthur S. Tischler, MD, ** Francien H. van Nederveen, MD, PhD,tt Vania Nose, MD, PhD, ¿¿ Lori Erickson, MD,§§ Ozgur Mete, MD, | | Sylvia L. Asa, MD, PhD, | | John Turchini, BMedSc, MBBS, || Anthony J. Gill, MD, FRCPA, 1|| Xavier Matias-Guiu, MD, PhD,## Kassiani Skordilis, MD, FRCPath, *** Timothy J. Stephenson, MD, FRCPath,ttt Frédérique Tissier, MD, PhD,###§§§ Richard A. Feelders, MD, PhD, || || | Marcel Smid, BSc, 111 Alex Nigg, BSc,* Esther Korpershoek, PhD,* Peter J. van der Spek, PhD, | Winand N.M. Dinjens, PhD,* Andrew P. Stubbs, PhD, | and Ronald R. de Krijger, MD, PhD *;

Abstract: Despite the established role of Ki67 labeling index in prognostic stratification of adrenocortical carcinomas and its recent integration into treatment flow charts, the reproducibility of the assessment method has not been determined. The aim of this study was to investigate interobserver variability among endocrine pathologists using a web-based virtual microscopy approach. Ki67-stained slides of 76 adrenocortical carcinomas were analyzed independently by 14 observers, each according to their method of preference including eyeballing, formal manual counting, and digital image analysis. The interobserver variation was statistically significant (P < 0.001) in the absence of any correlation between the various methods. Subsequently, 61 static images were distributed among 15 observers who were instructed to follow a category-based scoring approach. Low levels of interobserver (F = 6.99; Fcrit = 1.70; P < 0.001) as well as intraobserver concordance (n = 11; Cohen κ ranging from -0.057 to 0.361) were detected. To improve harmonization of Ki67 analysis, we tested the utility of an open-source Galaxy virtual machine application, namely Automated Selection of Hotspots, in 61 virtual slides. The software-provided Ki67 values were validated by digital image analysis in identical images, displaying a strong correlation of 0.96 (P < 0.0001) and dividing the cases into 3 classes (cutoffs of 0%-15%-30% and/or 0%-10%-20%) with significantly different overall survivals (P < 0.05). We conclude that current practices in Ki67 scoring assessment vary greatly, and interobserver variation sets particular limitations to its clinical utility, especially around clinically relevant cutoff values. Novel digital microscopy-enabled methods could provide critical aid in reducing variation, increasing reproducibility, and improving reliability in the clinical setting.

Key Words: Ki67 labeling index, proliferation, adrenal cortical carcinoma, interobserver variation, digital pathology (Am J Surg Pathol 2016;40:569-576)

From the Departments of *Pathology; | Bioinformatics; |1|Medical Oncology; | | | Internal Medicine, Division of Endocrinology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam; 11Laboratory for Pathology, PAL Dordrecht, Dordrecht; Department of Pathology, Reinier de Graaf Hospital, Delft; **** Department of Pathology, University Medical Center Utrecht, Princess Maxima Center for Pediatric Oncology, Utrecht, The Netherlands; +Department of Histopathology, King’s College Hospital, London; *** Department of Pathology, University Hospitals Birmingham, Birmingham; *** Department of Histopathology, Royal Hallamshire Hospital, Sheffield, UK; ¿ Department of Clinical and Molecular Medicine, Pathology Unit, Sant’ Andrea Hospital, Sapienza University, Rome; [Department of Oncology, University of Turin at San Luigi Hospital, Orbassano, Italy; §Department of Pathology, Department of Internal Medicine, University of Michigan Comprehensive Cancer Center, University of Michigan Health System, Ann Arbor, MI; Department of Pathology and Laboratory Medicine, University of Wisconsin School of Medicine and Public Health, Madison, WI; ** Department of Pathology and Laboratory Medicine, Tufts Medical Center, Tufts University School of Medicine; ¿¿ Department of Pathology, Massachusetts General Hospital, Boston, MA; §§Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN; || | Department of Pathology, University Health Network, University of Toronto, Toronto, ON, Canada; 11|Department of Anatomical Pathology, Royal North Shore Hospital and University of Sydney, Sydney, NSW, Australia; Department of Pathology and Molecular Genetics and Research Laboratory, Hospital Universitari Arnau de Vilanova, IRBLLEIDA, University of Lleida, Lleida, Spain; *¿¿ INSERM U1016 CNRS UMR8104, Institut Cochin, Paris Descartes University, Sorbonne Paris Cité; and §§§Department of Pathology, Pitié-Salpetrière Hospital, APHP, Pierre and Marie Curie University, Sorbonne Universities, Paris, France.

T.G.P. and E.P. contributed equally.

Conflicts of Interest and Source of Funding: Supported by the Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 259735 (ENS@T-Cancer). Further support was partially provided by grants from AIRC, Milan no. IG/14820/2013 (to M.P.). The authors have disclosed that they have no significant relationships with, or financial interest in, any commercial companies pertaining to this article.

Correspondence: Thomas G. Papathomas, MD, Department of Histopathology, King’s College Hospital, Denmark Hill, London, SE5 9RS, UK (e-mails: t.papathomas@erasmusmc.nl, thomaspapathomas@nhs.net).

Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s Website, www.ajsp.com.

Copyright © 2015 Wolters Kluwer Health, Inc. All rights reserved.

Adrenocortical carcinoma (ACC) is a rare endocrine malignancy with a poor overall prognosis and an estimated incidence of 0.7 to 2 cases per million.1 When confronted with this tumor, pathologists are expected to provide the Weiss score, the status of resection margins, and prognosticators including the Weiss score, mitotic grade, and Ki67 labeling index (LI) and, if diagnostically challenging, confirm its adrenocortical origin on immunohistochemical grounds.2,3 It has been shown4 that ACCs can be subdivided using a variety of methods including the mitotic frequency into low grade (<20 mitoses/50 high-power fields) and high grade (>20 mitoses/50 high-power fields),5 Steroidogenic Factor-1 immunohistochemistry,6,7 and other proliferation-based scoring methods such as phosphohistone H3-specific immunohistochemistry.8

According to recent data generated by the European Network for the Study of Adrenal Tumors (ENS@T) ACC study group,9,10 the resection status and the Ki67 LI in both localized and advanced ACCs constitute the most relevant prognostic parameters.2 In accordance, Duregon et al8 demonstrated that Ki67 LI is the most powerful tool in terms of prognostic stratification. In addition to its emerging value as a critical determinant of prognosis, Ki67 LI has been recently integrated in treatment flow charts for adrenocortical cancer patients suffering from tumors either amenable to radical resection or at advanced presentation. Accordingly, thresholds of 10%, 20%, and 30% seem to be crucial in therapeutic decisions, including adjuvant mitotane, radiotherapy of the tumor bed, and combination therapy of mitotane and 3 cycles of cisplatin, respectively.1,2

The standardized assessment of Ki67 LI is important and remains a key issue and responsibility of histopathologists. Nevertheless, various factors, including preanalytical and analytical variables, interpretation, scoring, and data analysis, might affect the Ki67 LI.11 In particular, lack of uniformity and consistency in quantification12 as well as intratumoral heterogeneity of proliferation5,11,13,14 might limit its assessment. In this context, we have implemented an open-source toolset, namely Automated Selection of Hotspots (ASH), aiming at improved accuracy and reproducibility of reporting of the Ki67 LI.15

In the present study, we determined the interobserver variability for Ki67 LI and examined the current practices among expert endocrine pathologists in a multicenter cohort of conventional ACCs using virtual microscopy. The impact of various parameters, that is, readout technique of preference in diagnostics, selected fields for evaluation, and estimated total number of cells on Ki67 assessment was further investigated. Moreover, we evaluated the variability of Ki67 LI around clinically relevant cutoffs1,2 and validated the efficiency of ASH as compared with the human independent selection of hotspot areas.

MATERIALS AND METHODS

Case Selection and Ki67 (MIB1) Immunohistochemistry

A total of 101 conventional ACCs were collected from 4 specialized centers in Europe and the United States: (1) San Luigi Gonzaga Hospital and University of Turin, Turin, Italy (25 samples), (2) Erasmus MC Cancer Institute, Rotterdam, the Netherlands (12 samples), (3) University of Wisconsin School of Medicine and Public Health (5 samples), and (4) University of Michigan Health System (59 samples). Borderline/atypical adrenocortical neoplasms and ACC variants (oncocytic, myxoid, and sarcomatoid) were not included in the present study. Each case was thoroughly reviewed and representative unstained glass slide(s) were selected and provided for immunohistochemical analysis within a single center (Department of Pathology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands) with the following protocol. Slides and formalin-fixed paraffin-embedded whole-tissue sections of 4 µm thickness were stained with a commercially available antibody: mouse monoclonal MIB1 M7240 antibody (1:400 dilution; Dako, Glostrup, Denmark) against Ki67 on an automatic Ventana Benchmark Ultra System (Ventana Medical Systems Inc., Tucson, AZ) using Ultraview DAB detection system preceded by heat-induced epitope retrieval with Ventana Cell Conditioning 1 (pH 8.4) at 97°C for 52 minutes. Diaminobenzidine was used as the chromogen. All cases were assessed anonymously according to the proper secondary use of Human Tissue code established by the Dutch Federation of Medical Scientific Societies (http://www.federa.org). The Medical Ethical Committee of the Erasmus MC approved the study. Cases displaying artifactual intratumoral variation in labeling were excluded by use of Ki67-labeled mitotic figures as internal positive controls.

Digital Pathology Application

High-resolution, whole-slide images were acquired from all Ki67 (MIB1)-stained slides using a NanoZoomer Digital Pathology (NDP) System (Hamamatsu Photonics K.K., Japan) working at a resolution of 0.23 µm/pixel. The immunostains were scanned at ×40 magnification and automatically digitized in their proprietary NDP Image (NDPI) file format. Between October 2013 and March 2014, digital files were consecutively uploaded in 1 set to a server at Erasmus MC through the standard file transfer protocol (URL: http://digimic.erasmusmc.nl/), enabling online worldwide viewing through a virtual microscopy interface (NDP.view Viewer Software; Hamamatsu Photonics K.K.).

Participants and Interpretation of Staining Results

In the first round (Supplemental Digital Content 1, http://links.lww.com/PAS/A325), 14 observers, comprising 11 expert endocrine pathologists (R.V.L., L.E., V.N., O.M., S.L.A., X.M.-G., T.J.S., K.S., F.T., F.H.v.N., and R.R.d.K.) and 3 residents (T.G.P., E.D., and J.T.), received: (i) an email detailing the objectives of the project and clearly stating that only nuclear staining (plus mitotic figures that are stained by Ki67) should be incorporated into the Ki67 score, defined as the percentage of positively stained cells among the total number of malignant cells scored, with staining intensity being of no relevance,11 (ii) the corresponding link providing access to the virtual slides, and (iii) a scoring list to be completed during Ki67 immunohistochemical evaluations.
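The score definition given to the observers can be written out as a short calculation. The following is a minimal sketch in Python; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def ki67_labeling_index(positive_nuclei: int, total_tumor_cells: int) -> float:
    """Return the Ki67 LI as the percentage of positively stained tumor cells."""
    if total_tumor_cells <= 0:
        raise ValueError("total_tumor_cells must be positive")
    if not 0 <= positive_nuclei <= total_tumor_cells:
        raise ValueError("positive_nuclei must lie between 0 and total_tumor_cells")
    # Staining intensity is ignored: a nucleus is either counted as positive or not.
    return 100.0 * positive_nuclei / total_tumor_cells


# Hypothetical counts for one evaluated field.
example_li = ki67_labeling_index(positive_nuclei=180, total_tumor_cells=1200)
print(f"Ki67 LI = {example_li:.1f}%")
```

Because the denominator is the total number of malignant cells scored, two observers who count the same positive nuclei but sample different numbers of cells will report different indices.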

All virtual slides were distributed online and reviewed by each observer in a blinded manner without knowledge of the corresponding clinicopathologic data or scores assigned by other pathologists. In particular, participants were asked to assess (i) the Ki67 LI based on (ii) the method of their preference/practice in diagnostics (visual estimation, formal manual count, or digital image analysis [DIA]) reporting on (iii) the estimated total number of cells and (iv) the selected fields for evaluation, that is, hot spot area(s) or average score across the section, or average score across the section adding hot spot area(s).

Twenty-five cases were excluded from the analysis due to suboptimal staining, poor scan quality, and fixation artifacts. The remaining tumors from 76 patients of a mean age of 47.6 years (ranging from 8 to 85 y; 1.17 female:male ratio) comprised 62 primary tumors, 6 recurrences, and 8 metastases. Thirty-four patients died of the disease, whereas 42 are alive with or without evidence of disease. The latter are currently in follow-up at various institutions with a mean of 34.27 months (range, 1 wk to 169 mo).

In the second round of assessment performed 9 months later (Supplemental Digital Content 1, http://links.lww.com/PAS/A325), 61 static images (.JPG files) were circulated among 15 observers, including 11 expert endocrine pathologists (R.V.L., L.E., O.M., S.L.A., T.J.S., K.S., M.V., A.S.T., A.J.G., F.H.v.N., and R.R.d.K.) and 4 residents (T.G.P., E.P., E.D., and J.T.). These images were selected as the most active areas based on an automated approach.15 The participants were instructed to follow a category-based evaluation of the Ki67 LI on the basis of visual estimation without performing formal manual count or DIA.

Software Application

Seventy-six virtual slides were assessed with a recently developed open-source Galaxy virtual machine application designed for Ki67 hotspot detection in adrenocortical cancer (Supplemental Digital Content 1, http://links.lww.com/PAS/A325). In brief, ASH comprises 3 classes: NDPI Segmentation, Adaptive Step Finding, and a Reporting Visualization, which utilizes the NDPI splitter to convert the specific NDPI format digital slide into a conventional tiff or jpeg format image for automated segmentation and adaptive step finding hotspots detection algorithm.15 Quantitative hotspot ranking is provided by the functionality from the open-source application ImmunoRatio16 as part of the ASH protocol. Accordingly, the output is a ranked set of hotspots with concomitant quantitative values based on whole-slide ranking.
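The form of the output described above, a ranked set of hotspots with quantitative values, can be illustrated with a small sketch. This is not the ASH segmentation or adaptive step finding algorithm, and it does not call ImmunoRatio; the `Tile` structure, the `min_nuclei` filter, and all counts are hypothetical and serve only to show how candidate regions of one slide might be ranked once per-region counts are available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical candidate region of a virtual slide with nuclear counts."""
    tile_id: str
    positive_nuclei: int
    total_nuclei: int

    @property
    def labeling_index(self) -> float:
        if self.total_nuclei == 0:
            return 0.0
        return 100.0 * self.positive_nuclei / self.total_nuclei


def rank_hotspots(tiles, top_n=10, min_nuclei=100):
    """Return the top_n tiles by Ki67 LI, skipping sparsely populated tiles."""
    # min_nuclei is an assumed safeguard against tiles with too few cells.
    eligible = [t for t in tiles if t.total_nuclei >= min_nuclei]
    return sorted(eligible, key=lambda t: t.labeling_index, reverse=True)[:top_n]


# Hypothetical tiles from a single slide.
tiles = [
    Tile("a", 10, 200),
    Tile("b", 90, 300),
    Tile("c", 5, 20),   # high ratio but too few nuclei, so it is skipped
    Tile("d", 60, 400),
]
for tile in rank_hotspots(tiles, top_n=2):
    print(tile.tile_id, f"{tile.labeling_index:.1f}%")
```

The default of 10 regions mirrors the 10 images per virtual slide used in the validation, whereas the minimum cell count is purely illustrative.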

Statistical Analysis

Interobserver variability using either virtual microscopy (first evaluation) or visual estimation on static images (second evaluation) as well as differences in the type of assessment were assessed with single-factor analysis of variance (ANOVA). To evaluate intraobserver agreement, Cohen κ was calculated after conversion of the Ki67 values of the initial numerical assessment into categorical variables. With regard to automatically selected areas, we compared computerized counts based on ImmunoRatio and DIA, respectively, in identical images using Pearson correlation.17 The correlation between human independent selection and software selection of hotspot areas was examined with Spearman rank order correlation. To compare the results of Ki67 assessment with the overall survival, Kaplan-Meier curves were plotted and P-values were calculated using the log rank test. The level of significance was set at P < 0.05. All other statistical analyses were performed using SPSS software (SPSS version 21; SPSS Inc., Chicago, IL).
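For readers less familiar with single-factor ANOVA, the F statistic compares the variance between groups with the variance within groups. The sketch below assumes that each group holds the Ki67 scores of one observer across all cases, which is one plausible arrangement of the data and is not stated explicitly in the text; the function name and the numbers are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the scores around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores (%) of 3 observers on the same 3 cases.
scores_by_observer = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(scores_by_observer)
print(f"F = {f_stat:.2f} (df = {df_b}, {df_w})")
```

The variation is called significant when F exceeds the critical value Fcrit of the F distribution for the same degrees of freedom at the chosen significance level; that lookup requires a statistics package and is omitted here.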

RESULTS

Interobserver Variation in Ki67 LI Assessment

Seventy-six cases were initially analyzed displaying statistically significant variance between 14 observers (ANOVA, F = 10.43; Fcrit = 1.73; P < 0.001) (Fig. 1). Differences in current practices concerning the Ki67 LI assessment are highlighted in Figure 2. Of the 14 observers, 8 preferred formal manual counting, 4 visual estimation, and 2 DIA (ImageJ software, 1.47v; Wayne Rasband, NIH and KS400 image analysis software, version 3.0; Carl Zeiss Vision GmbH). With regard to the residents, 2 used formal manual count and 1 DIA (KS400 image analysis software, version 3.0; Carl Zeiss Vision GmbH). The overall agreement was not affected by different levels of experience in endocrine pathology (data not shown). No statistically significant difference was found between the different methods of assessment (ANOVA, P = 0.079), except between visual estimation and formal manual count (t test, P = 0.014). Kaplan-Meier curves based on the overall survival were plotted against 0%-15%-30% cutoffs (Fig. 3).
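The Kaplan-Meier curves referred to above are product-limit estimates of survival computed separately for each Ki67 class. A minimal sketch of the estimator for a single class follows; the follow-up times and outcomes are hypothetical, and the log rank comparison between classes is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  : follow-up in months for each patient
    events : True if the patient died at that time, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, died in zip(times, events) if ti == t and died)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up of 5 patients in one Ki67 class.
months = [5, 8, 12, 12, 20]
died = [True, False, True, True, False]
for t, s in kaplan_meier(months, died):
    print(f"{t:>3} mo: S = {s:.3f}")
```

Censored patients lower the number at risk without producing a step in the curve, which is why the class sizes in Figure 3 are reported together with the number of censored cases.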

Impact of Visual Estimation on Variation

Given the large variation observed in the initial Ki67 assessment, we decided to reduce potential complexities by using visual estimation and following a category-based approach in 61 predetermined images. In this context, the variation remained statistically significant between 15 observers (ANOVA, F = 6.99; Fcrit = 1.70; P < 0.001). To evaluate interobserver concordance, ASH maximum values were utilized as “gold standard” and transformed into categorical variables. The highest levels of concordance were achieved within the lowest range of Ki67 values, that is, 0% to 10% (Fig. 4). Likewise, the overall agreement was not affected by different levels of experience in endocrine pathology (data not shown). To assess intraobserver concordance, we transformed the numerical values of the initial assessment into categorical variables. A very low degree of concordance was detected for every observer (n = 11) (Supplemental Digital Content 2, http://links.lww.com/PAS/A326) with the majority having a higher score on visual estimation of predetermined images (Fig. 5).
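The conversion of numerical scores into categories and the subsequent agreement calculation can be sketched as follows. The 10% bands are an assumption based on the categories shown in Figure 4, unweighted Cohen κ is assumed, and all scores are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Assumed upper bounds of the 10% bands (0-10, 11-20, 21-30, ...).
BAND_UPPER_BOUNDS = (10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0)


def ki67_category(li_percent, upper_bounds=BAND_UPPER_BOUNDS):
    """Map a Ki67 LI (%) to a band index; a value of exactly 10 stays in band 0."""
    return bisect_left(upper_bounds, li_percent)


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two equally long lists of categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from the marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical observer: numerical first-round scores and second-round categories.
first_round = [ki67_category(x) for x in (4.0, 12.0, 18.0, 25.0, 35.0, 8.0)]
second_round = [0, 1, 2, 2, 3, 1]
print(f"kappa = {cohen_kappa(first_round, second_round):.3f}")
```

A score of 18% on one occasion and 21% on another falls into different bands although the absolute difference is small, which illustrates why concordance is particularly fragile near category boundaries.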

FIGURE 1. Ki67 LI determined by 14 observers on 76 virtual slides with various methods of assessment. Ki67 was quantified as percentage of positive immunoreactive tumor cells against total tumor cells and was expressed as mean. [Plot: x-axis, cases (ACC 1 to ACC 127); y-axis, Ki67 LI (%), 0 to 100.]

FIGURE 2. Observers’ evaluation as referred to the method of assessment, fields of evaluation, and total number of cells utilized in the count. [Pie charts. Method of assessment: eyeballing, formal manual count, digital image analysis. Number of cells: 500-1000, 1000-1500, 1500-2000. Fields of evaluation: single hotspot, hotspots, average across the section, average across the section + hotspots.]

FIGURE 3. Overall survival for DIA-MC performers (A), best performer (B), eyeballers (C), and all pathologists (D) using 0%-15%-30% as cutoffs. [Kaplan-Meier plots, panels A to D: x-axis, months; y-axis, overall survival; classes <15%, ≥15-30%, and >30%.]

FIGURE 4. Levels of concordance between observers following a category-based Ki67 scoring by visual estimation. [Bar chart: x-axis, Ki67 LI category (0-10 to 51-60); y-axis, % of concordance.]

FIGURE 5. Intraobserver concordance of 11 observers participating both in numerical and category-based assessment of the Ki67 LI (=, equal; >, higher; <, lower score on visual estimation of predetermined images). [Bar chart: x-axis, observers 1 to 11; y-axis, number of observations.]

ASH: A Virtual Microscopy-enabled Assessment of Ki67 LI

After software assessment, 15 of 76 cases were excluded due to artifacts interfering with the analysis. To verify its applicability in the remaining 61 cases, we determined the degree of concordance (i) between computerized counts as provided by the software (ImmunoRatio) and by DIA (KS400 image analysis software, version 3.0; Carl Zeiss Vision GmbH) in identical images (n = 610; 10 images as selected by the ASH per virtual slide); and (ii) between computerized counts as provided by the software (ImmunoRatio) and as generated by human independent selection (DIA) in different images displaying the highest Ki67 expression (n = 61; 1 image per virtual slide). To this end, an observer (T.G.P.) selected 10 hotspot areas by visual estimation on a virtual microscopy interface and subsequently performed DIA (KS400 image analysis software, version 3.0; Carl Zeiss Vision GmbH). In this setting, strong correlations of 0.96 and 0.84 were detected, respectively (P < 0.001). From a clinical standpoint, we determined whether software-provided Ki67 values could divide the cases into 3 classes with significantly different overall survivals. In fact, when overall survival Kaplan-Meier curves were plotted against 0%-15%-30% and/or 0%-10%-20% cutoffs (Fig. 6), overall comparisons were statistically significant (P < 0.05).
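The two correlation measures used for this validation differ in what they compare: Pearson correlation works on the raw values, whereas Spearman correlation works on their ranks. The sketch below shows both under that textbook definition; the paired values are hypothetical and do not reproduce the reported coefficients.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical Ki67 LI values (%) from two counting approaches on the same images.
software_counts = [5.0, 12.0, 18.0, 33.0, 47.0]
dia_counts = [6.0, 10.0, 21.0, 30.0, 52.0]
print(f"Pearson r = {pearson_r(software_counts, dia_counts):.3f}")
print(f"Spearman rho = {spearman_rho(software_counts, dia_counts):.3f}")
```

A high correlation indicates that two approaches order and scale the cases similarly; it does not by itself show that they assign a case to the same side of a clinical cutoff.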

DISCUSSION

Ki67 immunohistochemistry has been integrated in routine pathology practice not only in diagnostics, that is, grading and tumor classification, diagnosis of intraepithelial neoplasia, and assessment of malignant potential, but also as a prognostic and predictive biomarker. With regard to ACCs, it has been proposed in diagnostics,18,19 prognostics,8-10,20 and in guiding treatment decisions.1,2 The current study highlights the need for standardized use of the Ki67 LI discouraging visual estimation and verifies the applicability of ASH in Ki67 assessment.

A large variation was noted among 14 observers in Ki67 index determination using a virtual microscopy interface. Because of the stringent centralized staining protocol, all participants viewed the same slides. The variation therefore could not be explained by technical issues and had to be attributed to different practices with respect to interpretation and scoring such as area(s) of slide read, total number of cells in fields of evaluation, and methods of assessment.11 In support of the last, we still observed significant levels of variation even when reducing complexities by estimating Ki67 LI levels in preselected areas and following a category-based approach using visual estimation. This is consistent with studies in breast carcinomas using a tissue microarray platform21 as well as in gastroenteropancreatic neuroendocrine tumors using predetermined images.12

Although visual estimation has been suggested as an acceptable method of assessment on expert diagnostic13 and/or research grounds,22-24 our findings further reinforce the notion that this readout technique is subjective, inaccurate, and thus unreliable.12,21,25,26 Importantly, low levels of concordance were revealed around categorical cutoff values recently proposed in ACCs. This is in keeping with Tang et al12 who reported significant discordance among 18 observers, which was sufficient to alter the final grade of the majority of 45 neuroendocrine tumors. Whether such discordances could be solely ascribed to the method of assessment or partly to parameters residing in the realm of cognitive psychology22 remains uncertain.

FIGURE 6. Overall survival determined by pathologists using 0%-15%-30% (A) and 0%-10%-20% (B) cutoffs compared with the software (ASH) cutoff ranges of 0%-15%-30% (C) and 0%-10%-20% (D), respectively. [Kaplan-Meier plots, panels A to D: x-axis, months; y-axis, overall survival; classes <15%, ≥15-30%, and >30% (A, C) and <10%, ≥10-20%, and >20% (B, D).]

The aforementioned data challenge the clinical applicability of clinically relevant cutoffs in ACCs. In accordance with Polley et al21 and Mengel et al,27 Beuschlein et al9 suggested that Ki67 LI variability is to be expected in ACCs at different clinical centers, highlighting the issue of interlaboratory variation due to preanalytical and analytical parameters.21,27 In this setting, rigorous methods in tissue preparation, that is, fixation, processing, and generation of uniform sections, would seem to be important. Interlaboratory variables at play, for example, variation affecting controlled conditions, variability in microtomes used, and differences in the temperature of the formalin-fixed paraffin-embedded blocks, might have affected the thickness of the immunostained sections in the current study. In addition to the interlaboratory variation, interobserver and intraobserver variation12,22,28,29 and tumor heterogeneity of Ki67 expression levels13,14,30,31 seem to add further levels of complexity to the issue of reproducibility, thereby hampering its clinical utility. This issue was emphasized by the International Ki67 in Breast Cancer Working Group,11 which was unable to reach a consensus in the absence of harmonized methodology with respect to ideal thresholds that could be useful in clinical routine practice. Accordingly, they recommended that cutoffs for prognosis, prediction, and monitoring should be applied only if the results from local practice have been validated against the respective ones in studies that have defined these particular cutoffs.11,21

Various approaches have been developed to obtain standardized Ki67 scoring. These include efforts to reduce interlaboratory variation by calibrating to a common scoring method using a web-based tool32 and efforts to reduce interobserver and intraobserver variation by either selecting the most representative tumor areas based on an automated approach15,33,34 or providing a software-automated quantitation of Ki67 LI.16,35-37 In the setting of computerized image analysis, we verified the applicability of a digital microscopy-enabled method for assessment of Ki67 expression in adrenocortical cancer. The novel approach of software-selected areas aims not only to reduce the interobserver variation, but also to characterize Ki67 levels of heterogeneity in primary tumors, recurrences, and metastases.

User interaction is recommended before virtual slide analysis to ensure that areas leading to miscalculations, that is, intrinsic and extrinsic pigmentation (deposit artifacts), necrotic areas, tissue folds, etc., are excluded.15 In this series, excluding certain tissue regions was not sufficient to avoid serious miscalculations with regard to 15 cases (15/76, 20%) that were subsequently excluded from the analysis, calling into question potential clinical actions based on such cases. Future efforts should focus on software amendments to overcome technical shortcomings in addition to improving methods of scoring.

In conclusion, current practices in Ki67 scoring assessment vary greatly, and interobserver variation sets particular limitations to the clinical utility of Ki67 LI, especially around clinically relevant cutoff values, in adrenocortical cancer. Our results highlight the need for standardization and suggest that visual estimation should be strongly discouraged as a readout technique, while computerized DIA seems to provide a reliable alternative. To drive forward harmonization of Ki67 analysis, we have previously developed and now validated an open-source Galaxy virtual machine application, namely ASH. Given certain preanalytical and analytical concerns, quality assurance schemes, that is, standardized tissue fixation along with fine-tuned immunohistochemical staining protocols, are expected to additionally increase reproducibility and reliability of the Ki67 LI in endocrine pathology practice.

REFERENCES

1. Fassnacht M, Libé R, Kroiss M, et al. Adrenocortical carcinoma: a clinician’s update. Nat Rev Endocrinol. 2011;7:323-335.

2. Fassnacht M, Kroiss M, Allolio B. Update in adrenocortical carcinoma. J Clin Endocrinol Metab. 2013;98:4551-4564.

3. van’t Sant HP, Bouvy ND, Kazemier G, et al. The prognostic value of two different histopathological scoring systems for adrenocortical carcinomas. Histopathology. 2007;51:239-245.

4. Mouat IC, Giordano TJ. Assessing biological aggression in adrenocortical neoplasia. Surg Pathol Clin. 2014;7:533-541.

5. Giordano TJ. The argument for mitotic rate-based grading for the prognostication of adrenocortical carcinoma. Am J Surg Pathol. 2011;35:471-473.

6. Sbiera S, Schmull S, Assie G, et al. High diagnostic and prognostic value of steroidogenic factor-1 expression in adrenal tumors. J Clin Endocrinol Metab. 2010;95:E161-E171.

7. Duregon E, Volante M, Giorcelli J, et al. Diagnostic and prognostic role of steroidogenic factor 1 in adrenocortical carcinoma: a validation study focusing on clinical and pathologic correlates. Hum Pathol. 2013;44:822-828.

8. Duregon E, Molinaro L, Volante M, et al. Comparative diagnostic and prognostic performances of the hematoxylin-eosin and phospho-histone H3 mitotic count and Ki-67 index in adrenocortical carcinoma. Mod Pathol. 2014;27:1246-1254.

9. Beuschlein F, Weigel J, Saeger W, et al. Major prognostic role of Ki67 in localized adrenocortical carcinoma after complete resection. J Clin Endocrinol Metab. 2015;100:841-849.

10. Libé R, Borget I, Ronchi CL, et al. Prognostic factors in stage III-IV adrenocortical carcinomas (ACC): an European Network for the Study of Adrenal Tumor (ENSAT) study. Ann Oncol. 2015;10:2119-2125.

11. Dowsett M, Nielsen TO, A’Hern R, et al. Assessment of Ki67 in breast cancer: recommendations from the International Ki67 in Breast Cancer working group. J Natl Cancer Inst. 2011;103:1656-1664.

12. Tang LH, Gonen M, Hedvat C, et al. Objective quantification of the Ki67 proliferative index in neuroendocrine tumors of the gastroenteropancreatic system: a comparison of digital image analysis with manual methods. Am J Surg Pathol. 2012;36:1761-1770.

13. Adsay V. Ki67 labeling index in neuroendocrine tumors of the gastrointestinal and pancreatobiliary tract: to count or not to count is not the question, but rather how to count. Am J Surg Pathol. 2012;36:1743-1746.

14. Yang Z, Tang LH, Klimstra DS. Effect of tumor heterogeneity on the assessment of Ki67 labeling index in well-differentiated neuroendocrine tumors metastatic to the liver: implications for prognostic stratification. Am J Surg Pathol. 2011;35:853-860.

15. Lu H, Papathomas TG, van Zessen D, et al. Automated Selection of Hotspots (ASH): enhanced automated segmentation and adaptive step finding for Ki67 hotspot detection in adrenal cortical cancer. Diagn Pathol. 2014;9:216.

16. Tuominen VJ, Ruotoistenmäki S, Viitanen A, et al. ImmunoRatio: a publicly available web application for quantitative image analysis of estrogen receptor (ER), progesterone receptor (PR), and Ki-67. Breast Cancer Res. 2010;12:R56.

17. Wessa P. 2015. Free Statistics Software, Office for Research Development and Education, version 1.1.23-r7. Available at: http://www.wessa.net/. Accessed November 1, 2014.

18. Schmitt A, Saremaslani P, Schmid S, et al. IGFII and MIB1 immunohistochemistry is helpful for the differentiation of benign from malignant adrenocortical tumours. Histopathology. 2006;49:298-307.

19. Soon PS, Gill AJ, Benn DE, et al. Microarray gene expression and immunohistochemistry analyses of adrenocortical tumors identify IGF2 and Ki-67 as useful in differentiating carcinomas from adenomas. Endocr Relat Cancer. 2009;16:573-583.

20. Ip JC, Pang TC, Glover AR, et al. Immunohistochemical validation of overexpressed genes identified by global expression microarrays in adrenocortical carcinoma reveals potential predictive and prognostic biomarkers. Oncologist. 2015;20:247-256.

21. Polley MY, Leung SC, McShane LM, et al. An international Ki67 reproducibility study. J Natl Cancer Inst. 2013;105:1897-1906.

22. Varga Z, Diebold J, Dommann-Scherrer C, et al. How reliable is Ki-67 immunohistochemistry in grade 2 breast carcinomas? A QA study of the Swiss Working Group of Breast- and Gynecopathologists. PLoS One. 2012;7:e37379.

23. Hida AI, Bando K, Sugita A, et al. Visual assessment of Ki67 using a 5-grade scale (Eye-5) is easy and practical to classify breast cancer subtypes with high reproducibility. J Clin Pathol. 2015;68:356-361.

24. Hida AI, Oshiro Y, Inoue H, et al. Visual assessment of Ki67 at a glance is an easy method to exclude many luminal-type breast cancers from counting 1000 cells. Breast Cancer. 2015;22:129-134.

25. Reid MD, Bagci P, Ohike N, et al. Calculation of the Ki67 index in pancreatic neuroendocrine tumors: a comparative analysis of four counting methodologies. Mod Pathol. 2015;28:686-694.

26. Mikami Y, Ueno T, Yoshimura K, et al. Interobserver concordance of Ki67 labeling index in breast cancer: Japan Breast Cancer Research Group Ki67 ring study. Cancer Sci. 2013;104:1539-1543.

27. Mengel M, von Wasielewski R, Wiese B, et al. Inter-laboratory and inter-observer reproducibility of immunohistochemical assessment of the Ki-67 labelling index in a large multi-centre trial. J Pathol. 2002;198:292-299.

28. Niikura N, Sakatani T, Arima N, et al. Assessment of the Ki67 labeling index: a Japanese validation ring study. Breast Cancer. 2014. [Epub ahead of print].

29. Gudlaugsson E, Skaland I, Janssen EA, et al. Comparison of the effect of different techniques for measurement of Ki67 proliferation on reproducibility and prognosis prediction accuracy in breast cancer. Histopathology. 2012;61:1134-1144.

30. Shi C, Gonzalez RS, Zhao Z, et al. Liver metastases of small intestine neuroendocrine tumors: Ki-67 heterogeneity and World Health Organization grade discordance with primary tumors. Am J Clin Pathol. 2015;143:398-404.

31. Couvelard A, Deschamps L, Ravaud P, et al. Heterogeneity of tumor prognostic markers: a reproducibility study applied to liver metastases of pancreatic endocrine tumors. Mod Pathol. 2009;22:273-281.

32. Polley MY, Leung SC, Gao D, et al. An international study to increase concordance in Ki67 scoring. Mod Pathol. 2015;28:778-786.

33. Lopez XM, Debeir O, Maris C, et al. Clustering methods applied in the detection of Ki67 hot-spots in whole tumor slide images: an efficient way to characterize heterogeneous tissue-based biomarkers. Cytometry A. 2012;81:765-775.

34. Elie N, Plancoulaine B, Signolle JP, et al. A simple way of quantifying immunostained cell nuclei on the whole histologic section. Cytometry A. 2003;56:37-45.

35. Klauschen F, Wienert S, Schmitt W, et al. Standardized Ki67 diagnostics using automated scoring: clinical validation in the GeparTrio breast cancer study. Clin Cancer Res. 2015;21:3651-3657.

36. Samols MA, Smith NE, Gerber JM, et al. Software-automated counting of Ki-67 proliferation index correlates with pathologic grade and disease progression of follicular lymphomas. Am J Clin Pathol. 2013;140:579-587.

37. Schaffel R, Hedvat CV, Teruya-Feldstein J, et al. Prognostic impact of proliferative index determined by quantitative image analysis and the International Prognostic Index in patients with mantle cell lymphoma. Ann Oncol. 2010;21:133-139.