available at www.sciencedirect.com
Science of the Total Environment: An International Journal for Scientific Research into the Environment and its Relationship with Humankind
ScienceDirect
ELSEVIER
Exploratory analysis of potential risk factors of a rare disease: Spatial distribution of adrenocortical carcinoma in Israel as a case study
Boris A. Portnov a,*, Micha Barchana c,d, Jonathan Dubnov b,d
a Department of Natural Resources & Environmental Management, Graduate School of Management, University of Haifa, Israel
b Haifa District Health Office, Ministry of Health, Israel
c Israel National Cancer Registry, Ministry of Health, Israel
d School of Public Health, University of Haifa, Israel
ARTICLE DATA
Article history: Received 6 December 2007 Received in revised form 16 October 2008 Accepted 17 October 2008 Available online 30 November 2008
Keywords: Risk factors Rare disease Spatial clustering
ABSTRACT
The underlying assumption of the proposed exploratory approach is that, if the geographic patterns of different diseases are compared, the cases of a ‘subject’ disease should occur closer to cases of a disease with similar environmental risk factors (etiology) and farther away from cases of a disease with a different etiology. In the present study, the performance of the proposed approach is investigated by cross-examining the spatial patterns of three widespread cancers - lung, larynx and colorectal (CRC) - with that of a rare malignant disease - Adrenocortical Carcinoma (ACC). As the analysis indicates, the spatial distribution of ACC is more likely to be related to hereditary factors than to environmental causes, in accordance with current knowledge about this rare disease.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Malignant diseases (cancer) are a result of multi-factorial processes, which manifest themselves in the alteration and transformation of cells and the loss of their regular functions (Adami et al., 2002). Current knowledge about potential risk factors of these diseases is still limited, but it appears that most of the processes that lead to cancer are related to the indoor and outdoor environment (including personal habits and attitudes), while only a small fraction of causes is attributed to genetic and hereditary factors that facilitate the potential harmful effect of the environment on predisposed individuals (Doll and Peto, 1981; Doll, 1998; NCI and NIEHS, 2003).
The use of geography to explain disease etiology is not new and dates back to 61 A.D., when the Roman philosopher Seneca described the ‘bad air’ hidden inside the earth and released during earthquakes as the main cause of epidemics. He also pointed to a ‘dose-response’ relationship, claiming that sheep are more prone to inhale this bad air than men because of their lower stature and their proximity to the ground (Seneca; Gummere, 1972).
Modern technology provides us with powerful tools for analyzing spatial patterns of disease occurrence, the most prominent of which are Geographical Information Systems (GIS). GIS technology makes it possible to investigate the geographic patterns of various diseases, improve exposure measurements for various environmental risk factors and facilitate the study of the effects of these factors on public health (Elliott et al., 1992, 2000; Elliott and Wartenberg, 2004; Nuckols et al., 2004).
Since people live in population conglomerates (villages, towns, and cities) and people in certain conglomerates share most of the same environmental risk factors, the geographical patterns of certain diseases are likely to be similar. Although such clustering does not demonstrate causality, it may serve
* Corresponding author. E-mail address: portnov@nrem.haifa.ac.il (B.A. Portnov).
Table 1

| Number of ACC cases in SCA | Frequency | Percent |
|---|---|---|
| 0 | 2103 | 89.6 |
| 1 | 189 | 8.05 |
| 2 | 23 | 0.98 |
| 3 | 3 | 0.13 |
| Total | 2347 | 100.00 |

| Statistic | ACC rate (per 100,000 residents) |
|---|---|
| Minimum | 0.00 |
| Maximum | 147.60 |
| Mean | 3.15 |

| Normality test | # of cases | Rate |
|---|---|---|
| Kolmogorov-Smirnov Z | 25.494 | 24.956 |
| Asymp. sig. (2-tailed) | <0.001 | <0.001 |
as a useful indication of the similarity of environmental risk factors, as expressed in geographical patterns (Elliott et al., 2000; Jacquez and Greiling, 2003; Jacquez, 2004). A practical application of this approach is to explore disease etiology by comparing the spatial distribution of a disease whose etiology is unclear or unknown with that of another disease with known etiological risk factors. The present study describes this approach.
2. Materials and methods
2.1. Data sources
The present analysis is based on the examination of geographic patterns of three widespread cancers - lung, larynx and colorectal (CRC) - for which risk factors are known (at least for the former two), and on the comparison of their spatial distribution with that of a rare malignant disease - Adrenocortical Carcinoma (ACC).
We used data from the Israel National Cancer Registry (INCR), a population-based cancer registry established in 1960 and covering the entire country. Since 1982, reporting to the registry has been mandatory for all medical facilities (i.e., medical institutions and pathology laboratories, both public and private), and the completeness of registration exceeds 94% (Fishler et al., 2003).¹
At present, the database contains 126,875 exact addresses (including street names and house numbers) for 59,007 CRC patients who were diagnosed with cancer from 1980 to 2005 (an average of 2.15 addresses per patient). Lung cancer patients (30,910 in the corresponding period) had 67,489 different addresses (2.18 per person, on average), and the 6952 larynx cases had 15,298 different addresses (2.2 per person, on average). The database also contains 244 cases of ACC diagnosed in the same period (1980-2005). [Some of these cases were corrected to adrenal carcinoma (and not cortex) but were kept in our analysis].
Notification of address change is mandatory by law, and all address changes are retrieved routinely for all new cancer cases registered in the INCR. In the present study, only patients residing at the same address for 10 years or more prior to the date of diagnosis were included in the analysis. Because cancer occurrence is most common among the elderly, who do not change their address frequently, this condition was easily met.
During the past two decades, only a few dozen cases of ACC have been recorded in Israel, and they are spread unevenly across small census areas (Table 1). Their scarcity and deviation from normality make it unfeasible to calculate the rates of the disease and to subject them to a thorough epidemiological investigation using ‘traditional’ multivariate analysis techniques. As Table 1 shows, approximately 90% of the small census areas (SCAs) of the country have not a single recorded case of ACC, while 189 SCAs have only one case and only about 1% of SCAs have two or more recorded ACC cases. Disease rates calculated for such sparse data will naturally reflect differences in the population sizes of SCAs, rather than the incidence rate of the disease itself.
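The point about sparse data can be illustrated with a minimal sketch. The population figures below are hypothetical and are not taken from the study; the function name `rate_per_100k` is likewise only illustrative. With a single recorded case per area, the computed rate is driven entirely by the size of the denominator.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Crude rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical SCA population sizes (assumed for illustration only).
populations = [700, 3_000, 10_000]

# One recorded case in each area: the rate varies only with population size.
rates = [rate_per_100k(1, p) for p in populations]

for p, r in zip(populations, rates):
    print(f"1 case among {p:>6} residents -> {r:7.2f} per 100,000")
```

In this sketch the same single case yields rates that differ more than tenfold, which is why rates computed for such sparse data say more about SCA population sizes than about incidence.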
While smoking is the most important risk factor for larynx and lung cancer, and a personal or family history of colorectal polyps or inflammatory bowel diseases is a risk factor for colorectal cancer, the etiology of Adrenocortical Carcinoma (ACC) is largely unknown (Adami et al., 2002; Boushey and Dackiw, 2001; Gicquel et al., 1997). However, several recent studies point to the hereditary origin of this disease, attributing ACC development to genetic alterations in chromosomal regions (such as 11p15) and to p53 gene mutation (see inter alia, Allolio and Fassnacht, 2006; Dackiw et al., 2001; Figueiredo et al., 2006).
We assumed that the spatial patterns of non-rare malignancies with known risk factors (such as larynx, lung and CRC) can be compared with the spatial pattern of a cancer with unknown risk factors (that is, ACC). The pair-wise comparisons of the spatial distributions of ‘cases’ and ‘controls’ can then be used to verify whether ACC risk factors are related to the environment or to ‘family history’ (genetics).
The approach to comparing the spatial patterns of different diseases that we propose and use in this study is based on the general logic of several commonly used measures of spatial association (such as Moran’s I, Geary’s C, Getis-Ord Gi*(d), etc.)²
² Indicators of spatial association (Moran’s I, Geary’s C, etc.) provide summary information about the intensity of spatial interaction, thus helping to determine whether the values of a variable are arranged in space in a systematic manner. If such a systematic distribution of values occurs, the phenomenon is called ‘spatial association’. One such indicator, commonly used in applied studies, is Moran’s I measure of spatial autocorrelation. It is calculated as follows:
$$
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}},
$$

where $W_{ij}$ is a binary weight matrix of the general cross-product statistic ($W_{ij}=1$ if two localities, $i$ and $j$, are adjacent, and $W_{ij}=0$ otherwise; $W_{ii}$ also equals zero, meaning that a locality is not adjacent to itself); $x_i$ and $x_j$ are values of a particular variable in localities $i$ and $j$; $\bar{x}$ is the mean value of the data sequence, and $n$ is the total number of observations (localities) included in the sample.
1 In accordance with the law, INCR retrieves each cancer patient’s personal data from the central population registry, including the place of birth, immigration date, current and historical place of residence (street address including house number), ethnicity and gender (smoking habits and occupational information are not reported in the registry).
[Fig. 1. Map legend: nearest neighbor distance; Disease A; Disease B (control); Disease C (control); small statistical areas; search radius. Scale bar: 0-500 meters.]
(see inter alia Anselin, 1999; Getis and Ord, 1992; Cockings et al., 2004). The main underlying assumption of our approach is simple: cases of a ‘subject’ disease should occur closer to cases of a disease with similar risk factors (etiology) and farther from cases of a disease characterized by different etiologies.
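For reference, Moran’s I, the measure of spatial association defined in footnote 2, can be computed directly from a binary adjacency matrix. The sketch below is a plain-Python illustration under that definition; the function name `morans_i` and the four-locality example are assumptions made for illustration and are not part of the study.

```python
from typing import Sequence


def morans_i(values: Sequence[float],
             weights: Sequence[Sequence[float]]) -> float:
    """Moran's I for values x_i and a weight matrix W_ij (W_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of all weights and the weighted cross-product of deviations.
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)

    if w_sum == 0 or ss == 0:
        raise ValueError("Moran's I is undefined for zero weights or zero variance")
    return (n / w_sum) * (cross / ss)


# Hypothetical example: four localities in a row (1-2, 2-3, 3-4 adjacent).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

trend = morans_i([1, 2, 3, 4], W)        # similar values are adjacent
alternating = morans_i([1, 0, 1, 0], W)  # dissimilar values are adjacent
print(trend, alternating)
```

Positive values indicate that similar values tend to be adjacent, and negative values indicate the opposite; the approach proposed here borrows this logic but compares point patterns of different diseases rather than values within one variable.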
The proposed approach uses two analytical techniques - the ‘nearest neighbor’ approach and the ‘immediate neighborhood’ approach, both of which are illustrated by Fig. 1. According to the former approach, among the diseases under investigation (A, B and C), those with similar etiology are expected to show the smallest average ‘nearest neighbor’ distance, due to their likely clustering in the same environmental ‘risk areas’. According to the latter, for a given search radius, the overall number of neighboring cases should be significantly larger for diseases sharing similar etiological factors than for diseases with different etiologies, due to the same clustering effect.
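The two techniques can be sketched as follows. This is a minimal brute-force illustration, assuming planar (projected) coordinates in meters; the function names and the toy coordinates are hypothetical and do not reproduce the GIS workflow used in the study.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed projected coordinates, in meters


def nearest_neighbor_distance(case: Point, controls: List[Point]) -> float:
    """Distance from one case to its nearest control."""
    return min(math.dist(case, c) for c in controls)


def neighbors_within(case: Point, controls: List[Point], radius: float) -> int:
    """Number of controls inside the search radius around one case."""
    return sum(1 for c in controls if math.dist(case, c) <= radius)


def summarize(cases: List[Point], controls: List[Point],
              radius: float) -> Tuple[float, float]:
    """Mean nearest-neighbor distance and mean neighbor count over all cases."""
    nn = [nearest_neighbor_distance(p, controls) for p in cases]
    counts = [neighbors_within(p, controls, radius) for p in cases]
    return sum(nn) / len(nn), sum(counts) / len(counts)


# Hypothetical toy data: disease A cases and two 'control' diseases.
disease_a = [(0.0, 0.0), (1000.0, 0.0)]
disease_b = [(100.0, 0.0), (900.0, 0.0), (1000.0, 300.0)]
disease_c = [(0.0, 600.0), (2000.0, 0.0)]

mean_nn_b, mean_cnt_b = summarize(disease_a, disease_b, radius=500.0)
mean_nn_c, mean_cnt_c = summarize(disease_a, disease_c, radius=500.0)
print(mean_nn_b, mean_cnt_b, mean_nn_c, mean_cnt_c)
```

In this toy example, disease A lies closer to disease B on both measures (a smaller mean nearest-neighbor distance and more neighbors within the radius), which under the stated assumption would point to B as the disease with the more similar etiology. For samples of thousands of addresses, a spatial index would be preferable to the brute-force search shown here.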
The analysis was carried out in three phases. First, we mapped ACC cases and their three ‘control’ groups (lung, larynx and CRC) as separate layers in the ArcGIS 9™ software.
Second, we compared the spatial distributions of the three groups of controls (lung, larynx and CRC), to verify that the proposed analytical technique works. In the final phase of the analysis, we juxtaposed the geographic distribution of ACC cases with that of its ‘controls’, to determine the degree of similarity between them.³
³ The process of geocoding results in a precise coordinate of a patient’s house location on the map. The geocoding was done using the ArcGIS 9™ software and commercially available street maps. The geocoding procedure was done in a single run, with no manual correction for misspelled addresses and non-permanent residential addresses (such as hotels, community houses, etc.). We obtained 27,927 house coordinates for CRC patients, 13,590 for lung cancer patients and 6950 for laryngeal cancer patients. In the second step of the analysis, we randomly selected 6950 CRC cases and the same number of lung cancer patients to match the number of cases of laryngeal cancer. The 244 ACC cases were all geocoded as well.
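The random selection described in footnote 3 can be sketched as simple random sampling without replacement. The snippet below is only an illustration: the record identifiers are hypothetical stand-ins for geocoded addresses, the function name `match_sample_size` and the fixed seed are assumptions, and the study does not state which sampling tool was used.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def match_sample_size(records: Sequence[T], target_size: int,
                      seed: int = 0) -> List[T]:
    """Draw target_size records at random, without replacement."""
    if target_size > len(records):
        raise ValueError("target_size exceeds the number of records")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(records, target_size)


# Hypothetical identifiers standing in for geocoded addresses.
crc_ids = list(range(27_927))
lung_ids = list(range(13_590))

# Match both groups to the 6950 laryngeal cancer addresses.
crc_sample = match_sample_size(crc_ids, 6950)
lung_sample = match_sample_size(lung_ids, 6950)
print(len(crc_sample), len(lung_sample))
```

Equalizing group sizes in this way keeps the neighbor counts of the control groups comparable, since a larger control group would otherwise yield more neighbors within any radius simply because it contains more points.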
Table 2

| Control group | Whole sample ᵃ | | | | Metro areas ᵇ | | | |
|---|---|---|---|---|---|---|---|---|
| **Test 1 (distance to the nearest neighbor)** | Distance, m | SD ᶜ | t | Sig. ᵈ | Distance, m | SD ᶜ | t | Sig. ᵈ |
| CRC | 126.27 | 185.29 | 2.531 | 0.011 | 113.00 | 238.60 | 4.858 | <0.001 |
| Lung | 121.62 | 178.54 | | | 104.02 | 223.10 | | |
| **Test 2 (average number of neighbors within a 500-m search radius)** | # of neighbors | SD ᶜ | t | Sig. ᵈ | # of neighbors | SD ᶜ | t | Sig. ᵈ |
| CRC | 21.77 | 17.96 | -2.623 | 0.009 | 25.72 | 18.28 | -4.027 | <0.001 |
| Lung | 22.02 | 18.43 | | | 26.23 | 19.09 | | |
| **Test 3 (average number of neighbors within a 5000-m search radius)** | # of neighbors | SD ᶜ | t | Sig. ᵈ | # of neighbors | SD ᶜ | t | Sig. ᵈ |
| CRC | 542.11 | 477.66 | -6.688 | <0.001 | 720.36 | 461.35 | -7.364 | <0.001 |
| Lung | 546.66 | 485.47 | | | 727.40 | 469.76 | | |

ᵃ Total number of cases = 6950.
ᵇ Total number of cases = 4888.
ᶜ Standard deviation.
ᵈ 2-tailed significance level.
Depending on the actual distribution of cases, the analysis of ‘case-control’ differences was carried out using either a t-test or, if the normality assumption was not upheld, the non-parametric Friedman test.
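As an illustration of the non-parametric alternative, the Friedman statistic can be computed from within-case ranks. The sketch below is a plain-Python version without the correction for ties, and the distances in the example are hypothetical; it is not the statistical package procedure used in the study. For three groups the statistic has 2 degrees of freedom, in which case the asymptotic chi-square tail probability reduces to exp(-Q/2).

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Ranks 1..k within one block; tied values share their average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def friedman_statistic(blocks: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Friedman Q (no tie correction) and the mean rank of each group."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return q, [s / n for s in rank_sums]


# Hypothetical distances (m) from four cases to the nearest larynx, CRC, lung case.
blocks = [
    (120.0, 80.0, 95.0),
    (200.0, 150.0, 160.0),
    (90.0, 60.0, 100.0),
    (110.0, 70.0, 85.0),
]

q, mean_ranks = friedman_statistic(blocks)
p_asymptotic = math.exp(-q / 2)  # valid only for df = 2 (three groups)
print(q, mean_ranks, p_asymptotic)
```

The lowest mean rank identifies the control group that is, on average, nearest to the cases. As a consistency check on the df = 2 shortcut, a chi-square value of 8.96 gives exp(-4.48), or about 0.011.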
3. Results
The results of the analysis are reported in Tables 2 and 3. Table 2 shows the results of the comparison of the geographic patterns of larynx with those of CRC and lung, whereas Table 3 reports the comparison between the distribution of ACC cases and their ‘controls’ - larynx, CRC, and lung.
Table 3

| Control group | Distance to the nearest neighbor | | | Number of neighbors within a 5000-m radius | | |
|---|---|---|---|---|---|---|
| | Mean | SD ᵇ | Mean rank | Mean | SD ᵇ | Mean rank |
| Larynx | 106.53 | 77.34 | 2.18 | 544.60 | 325.91 | 1.53 |
| CRC | 86.61 | 71.94 | 1.89 | 659.11 | 426.86 | 2.23 |
| Lung | 91.08 | 88.00 | 1.93 | 648.19 | 421.58 | 2.24 |
| Chi-square | 8.96 | | | 60.72 | | |
| df | 2 | | | 2 | | |
| Asymp. sig. | 0.011 | | | <0.001 | | |

ᵃ The total number of ACC cases = 244.
ᵇ Standard deviation.
Two different tests - ‘distance to the nearest neighbor’ and the ‘average number of neighbors in a given search radius’ - were run. Because the size of the search radius may affect the outcome of the analysis, we used two different search radii - 500 m and 5000 m - to account for this possibility.
We also treated separately the cases spread over the entire country (the ‘whole sample’) and the cases located in its major metropolitan areas - Jerusalem, Tel Aviv and Haifa (‘metro areas’; see Table 2). Small towns are often far apart, especially in sparsely populated peripheral areas, and this may affect the outcome of the analysis. The separate treatment of cases recorded in major metropolitan areas may therefore account for this ‘sparseness’ effect and improve our estimates.
As Table 2 shows, the overall trend remains consistent across all tests and scales of the analysis. As expected, the distribution of larynx cases appears to be significantly closer to that of lung cases than to that of CRC cases. In particular, each larynx case appears to be closer to its nearest lung ‘counterpart’ (Distance=121.62 m (Larynx-Lung) vs. 126.27 m (Larynx-CRC) for the ‘whole sample’, P=0.011, and Distance=104.02 m (Larynx-Lung) vs. 113.00 m (Larynx-CRC) for ‘metro areas’, P<0.001; see Table 2). Moreover, each larynx case has more neighbors in the lung distribution than in the CRC one (P<0.01).
Notably, an increase of the search radius from 500 m to 5000 m raises the statistical significance of the estimates, while the cross-distribution differences are stronger for ‘metro areas’ than for the ‘whole sample’ (see Table 2).
Once the verification phase of the analysis was completed and the proposed approach was found to be sufficiently sensitive, we moved to the joint analysis of ACC vs. its ‘controls’. As Table 3 demonstrates, the ACC distribution appears to be significantly closer to CRC than to larynx and lung, as shown by both of our tests - ‘nearest neighbor’ (P=0.011) and ‘search radius’ (P<0.001; Table 3).
4. Discussion
Comparisons of the geographical distributions of malignant diseases are frequently used for the detection of disease clusters (Goovaerts, 2006; Jacquez et al., 2006; Wheeler, 2007) and for verifying hypotheses about specific environmental risk factors (Benigni and Giuliani, 2002; Elliott et al., 2000; Jemal et al., 2002). In our study, we compared the spatial distribution of Adrenocortical Carcinoma (ACC) with that of three types of cancer (larynx, lung and CRC), used as controls. The analysis indicated that the spatial distribution of ACC was closer to that of CRC than to those of larynx and lung. ACC is thus more likely to be associated with hereditary factors than with environmental causes, which generally corresponds to the current state of knowledge about this rare disease (Adami et al., 2002; Allolio and Fassnacht, 2006; Beuschlein et al., 2001; Boushey and Dackiw, 2001; Figueiredo et al., 2006; Gicquel et al., 1997).
The proposed approach to the exploratory analysis of disease etiology is somewhat similar to that used in previous studies conducted elsewhere (Brody et al., 2004; Langholz et al., 2002; Reynolds et al., 2004; Wheeler, 2007). The main difference, however, is that we chose as a control group not healthy individuals, but individuals diagnosed with different types of cancer. Although we believe that the proposed approach (based on pair-wise comparison of spatial patterns) is limited in its ability to support causal inferences (Jacquez, 2004), it may nevertheless help to explore rare or new types of diseases and the possible influence of environmental factors on them.
Some other rare diseases and syndromes (such as Li-Fraumeni and Beckwith-Wiedemann) associated with ACC (Figueiredo et al., 2006; Fottner et al., 2004) could have been used in the present analysis as additional test cases. However, it appeared that these health conditions were extremely rare in the population under study. For instance, we found only one breast cancer diagnosed at the age of 29 (just before the ACC diagnosis), two malignant melanomas of the skin (diagnosed at the age of 41) and one concomitant diagnosis of hepatocellular carcinoma in an ACC patient. Out of 25 second primaries in the entire group, there were four other malignancies occurring prior to 40 years of age (one Hodgkin disease, one CLL, one melanoma and the above-mentioned breast cancer case). Although the suggested analytical method could have been used for the analysis of spatial clustering of those syndromes as well, the extremely small number of cases made such an analysis infeasible.
An apparent limitation of our study is the use of a national cancer registry that lacks information about behavioral, occupational and geographical exposures prior to cancer diagnosis (Boscoe et al., 2004). Typical limitations of national cancer registries are incomplete notification and a lack of critical exposure data. However, these limitations are unlikely to have had a significant influence in our case, because the three types of ‘control’ cancers used in the study were chosen by a random procedure, with good overall completeness compared to the whole registry (P<0.01). Moreover, our study is only an exploratory approach in spatial epidemiology, which makes no claim of causality but which, we believe, could be helpful for public health practitioners.
REFERENCES
Adami H-O, Hunter D, Trichopoulos D, editors. Textbook of cancer epidemiology. New York: Oxford University Press; 2002.
Allolio B, Fassnacht M. Clinical review: adrenocortical carcinoma: clinical update. J Clin Endocrinol Metab 2006;91(6):2027-37 [Jun, Review].
Anselin L. Spatial Econometrics: Bruton Center, School of Social Sciences, University of Texas at Dallas; 1999.
Benigni R, Giuliani A. Cancer incidence and socioeconomic geography of Finland: a correlation study. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 2002;20(1):29-43 [May].
Beuschlein F, Fassnacht M, Klink A, Allolio B, Reincke M. ACTH-receptor expression, regulation and role in adrenocortical tumor formation. Eur J Endocrinol 2001;144(3):199-206 [Mar, Review].
Boscoe FP, Ward MH, Reynolds P. Current practices in spatial analysis of cancer data: data characteristics and data sources for geographic studies of cancer. Int J Health Geogr 2004;3(1):28 [December 1].
Boushey RP, Dackiw AP. Adrenal cortical carcinoma. Curr Treatm Opt Oncol 2001;2(4):355-64 [Aug].
Brody JG, Aschengrau A, McKelvey W, Rudel RA, Swartz CH, Kennedy T. Breast cancer risk and historical exposure to pesticides from wide-area applications assessed with GIS. Environ Health Perspect 2004;112:889-97.
Cockings S, Dunn CE, Bhopal RS, Walker DR. Users’ perspectives on epidemiological, GIS and point pattern approaches to analysing environment and health data. Health & Place 2004;10(2):169-82.
Dackiw AP, Lee JE, Gagel RF, Evans DB. Adrenal cortical carcinoma. World J Surg 2001;25(7):914-26 [Jul, Review].
Doll R. Epidemiological evidence of the effects of behavior and the environment on the risk of human cancer. Recent Results Cancer Res 1998;154:3-21.
Doll R, Peto R. The causes of cancer: quantitative estimates of avoidable risks of cancer in the United States today. J Natl Cancer Inst 1981;66:1191-308.
Elliott P, Wartenberg D. Spatial epidemiology: current approaches and future challenges. Environ Health Perspect 2004; 112(9):998-1006 [Jun].
Elliott P, Cuzick J, English D, Stern R, editors. Geographical and environmental epidemiology: methods for small area studies. Oxford: Oxford University Press; 1992 (1996 reprint). 404 pp.
Elliott P, Wakefield JC, Best NG, Briggs DJ, editors. Spatial epidemiology: methods and applications. USA: Oxford University Press; 2000.
Figueiredo BC, Sandrini R, Zambetti GP, Pereira RM, Cheng C, Liu W, et al. Penetrance of adrenocortical tumours associated with the germline TP53 R337H mutation. J Med Genet 2006;43(1):91-6 [Jan].
Fishler Y, Chitrit A, Barchana M, Modan B. Examination of Israel national cancer data accumulation completeness for 1991. [Hebrew], The National Center for Disease Control, publication no. 230. Israel: Tel Hashomer; 2003.
Fottner CH, Hoeflich A, Wolf E, Weber MM. Role of the insulin-like growth factor system in adrenocortical growth control and carcinogenesis. Horm Metab Res 2004;36(6):397-405 [Jun, Review].
Getis A, Ord JK. The analysis of spatial association by use of distance statistics. Geogr Anal 1992;24(3):189-206.
Gicquel C, Baudin E, Lebouc Y, Schlumberger M. Adrenocortical carcinoma. Ann Oncol 1997;8(5):423-7 [May].
Goovaerts P. Geostatistical analysis of disease data: accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging. Int J Health Geogr 2006;5:52 [Nov 30].
Jacquez GM. Current practices in the spatial analysis of cancer: flies in the ointment. Int J Health Geogr 2004;3(1):22 [Oct 12].
Jacquez GM, Greiling DA. Local clustering in breast, lung and colorectal cancer in Long Island, New York. Int J Health Geogr 2003;2(1):3 [Feb 17].
Jacquez GM, Meliker JR, Avruskin GA, Goovaerts P, Kaufmann A, Wilson ML, et al. Case-control geographic clustering for residential histories accounting for risk factors and covariates. Int J Health Geogr 2006;5:32 [Aug 3].
Jemal A, Kulldorff M, Devesa SS, Hayes RB, Fraumeni Jr JF. A geographic analysis of prostate cancer mortality in the United States, 1970-89. Int J Cancer 2002;101(2):168-74 [September 10].
Langholz B, Ebi KL, Thomas DC, Peters JM, London SJ. Traffic density and the risk of childhood leukemia in a Los Angeles case-control study. Ann Epidemiol 2002;12:482-7.
National Cancer Institute & National Institute of Environmental Health Sciences (NCI & NIEHS). Cancer and the environment, National Institute of Health pub. 03-2039; 2003.
Nuckols JR, Ward MH, Jarup L. Using geographic information systems for exposure assessment in environmental epidemiology studies. Environ Health Perspect 2004;112 (9):1007-15 [June].
Reynolds P, Von Behren J, Gunier RB, Goldberg DE, Hertz A. Residential exposure to traffic in California and childhood cancer. Epidemiology 2004;15:6-12.
Seneca L. In: Gummere RM, editor. Epistles, vol. 5. Harvard University Press; 1972. p. 66-92.
Wheeler DC. A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996-2003. Int J Health Geogr 2007;6:13 [Mar 27].