TY - JOUR
T1 - Clustering by phenotype and genome-wide association study in autism
AU - Narita, Akira
AU - Nagai, Masato
AU - Mizuno, Satoshi
AU - Ogishima, Soichi
AU - Tamiya, Gen
AU - Ueki, Masao
AU - Sakurai, Rieko
AU - Makino, Satoshi
AU - Obara, Taku
AU - Ishikuro, Mami
AU - Yamanaka, Chizuru
AU - Matsubara, Hiroko
AU - Kuniyoshi, Yasutaka
AU - Murakami, Keiko
AU - Ueno, Fumihiko
AU - Noda, Aoi
AU - Kobayashi, Tomoko
AU - Kobayashi, Mika
AU - Usuzaki, Takuma
AU - Ohseto, Hisashi
AU - Hozawa, Atsushi
AU - Kikuya, Masahiro
AU - Metoki, Hirohito
AU - Kure, Shigeo
AU - Kuriyama, Shinichi
N1 - Funding Information:
We are grateful to all of the families at the participating SSC sites, as well as the staff at the Simons Foundation Autism Research Initiative (SFARI). The present study was supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) KAKENHI grant numbers 19390171, 16H05242 and 19H03894. MEXT had no role in the design or execution of the study.
Publisher Copyright:
© 2020, The Author(s).
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Autism spectrum disorder (ASD) has phenotypically and genetically heterogeneous characteristics. A simulation study demonstrated that attempts to categorize patients with a complex disease into more homogeneous subgroups could have more power to elucidate hidden heritability. We conducted cluster analyses using the k-means algorithm with a cluster number of 15 based on phenotypic variables from the Simons Simplex Collection (SSC). As a preliminary study, we conducted a conventional genome-wide association study (GWAS) with a data set of 597 ASD cases and 370 controls. In the second step, we divided cases based on the clustering results and conducted GWAS in each of the subgroups vs controls (cluster-based GWAS). We also conducted cluster-based GWAS on another SSC data set of 712 probands and 354 controls in the replication stage. In the preliminary study, which was conducted in conventional GWAS design, we observed no significant associations. In the second step of cluster-based GWASs, we identified 65 chromosomal loci, which included 30 intragenic loci located in 21 genes and 35 intergenic loci that satisfied the threshold of P < 5.0 × 10⁻⁸. Some of these loci were located within or near previously reported candidate genes for ASD: CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A5, ITPR1, NTM, SDK1, SNCA, and SRRM4. Of these 65 significant chromosomal loci, rs11064685 located within the SRRM4 gene had a significantly different distribution in the cases vs controls in the replication cohort. These findings suggest that clustering may successfully identify subgroups with relatively homogeneous disease etiologies. Further cluster validation and replication studies are warranted in larger cohorts.
AB - Autism spectrum disorder (ASD) has phenotypically and genetically heterogeneous characteristics. A simulation study demonstrated that attempts to categorize patients with a complex disease into more homogeneous subgroups could have more power to elucidate hidden heritability. We conducted cluster analyses using the k-means algorithm with a cluster number of 15 based on phenotypic variables from the Simons Simplex Collection (SSC). As a preliminary study, we conducted a conventional genome-wide association study (GWAS) with a data set of 597 ASD cases and 370 controls. In the second step, we divided cases based on the clustering results and conducted GWAS in each of the subgroups vs controls (cluster-based GWAS). We also conducted cluster-based GWAS on another SSC data set of 712 probands and 354 controls in the replication stage. In the preliminary study, which was conducted in conventional GWAS design, we observed no significant associations. In the second step of cluster-based GWASs, we identified 65 chromosomal loci, which included 30 intragenic loci located in 21 genes and 35 intergenic loci that satisfied the threshold of P < 5.0 × 10⁻⁸. Some of these loci were located within or near previously reported candidate genes for ASD: CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A5, ITPR1, NTM, SDK1, SNCA, and SRRM4. Of these 65 significant chromosomal loci, rs11064685 located within the SRRM4 gene had a significantly different distribution in the cases vs controls in the replication cohort. These findings suggest that clustering may successfully identify subgroups with relatively homogeneous disease etiologies. Further cluster validation and replication studies are warranted in larger cohorts.
UR - http://www.scopus.com/inward/record.url?scp=85089287556&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089287556&partnerID=8YFLogxK
U2 - 10.1038/s41398-020-00951-x
DO - 10.1038/s41398-020-00951-x
M3 - Article
C2 - 32807774
AN - SCOPUS:85089287556
SN - 2158-3188
VL - 10
JO - Translational Psychiatry
JF - Translational Psychiatry
IS - 1
M1 - 290
ER -