Genomic Sequence Variation Markup Language (GSVML)

Jun Nakaya, Michio Kimura, Kaei Hiroi, Keisuke Ido, Woosung Yang, Hiroshi Tanaka

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


Objective: Genomic sequence variation data are accumulating rapidly worldwide as genomic research expands. Making good use of these data requires an interoperable data exchange format and its international standardization. Genomic Sequence Variation Markup Language (GSVML) focuses on genomic sequence variation data and human health applications, such as gene-based medicine and pharmacogenomics.

Design and method: We developed GSVML in eight steps, based on case analysis and domain investigations. By restricting the design scope to human health applications and genomic sequence variation, we sought to eliminate ambiguity and ensure practicability. We aimed to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we sought to minimize redundancy in the data format while maximizing the range of data covered. We also sought to ensure that GSVML can communicate and interface with other Markup Languages, so that omics data can be exchanged among omics researchers and facilities. The ability to interface with clinical standards under development, such as the Health Level Seven Genotype Information model, was analyzed.

Results: We developed the human health-oriented GSVML, which comprises three categories: variation data, direct annotation, and indirect annotation. The variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information and have internal relationships. In the design process, we examined 6 cases against three criteria for human health applications, and 15 data elements against three criteria for data formats for genomic sequence variation data exchange. We investigated the data formats of five international SNP databases and six Markup Languages, and the ability to interface with the Health Level Seven Genotype Model in terms of 317 items.

Conclusion: GSVML was developed as a potential format for exchanging genomic sequence variation data, focusing on human health applications. The international standardization of GSVML is necessary and is currently underway. GSVML can enhance the utilization of genomic sequence variation data worldwide by providing a platform for communication between clinical and research applications.
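The category structure described in the Results (one required variation data category, two optional annotation categories) can be illustrated with a minimal Python sketch. This is not the GSVML schema: the abstract does not list GSVML's element names, so every class and field name below (`VariationData`, `Annotation`, `VariationRecord`, `reference_id`, and so on) is a hypothetical placeholder chosen only to show the required/optional split.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariationData:
    # Hypothetical fields; the abstract does not specify GSVML's actual elements.
    reference_id: str
    position: int
    ref_allele: str
    alt_allele: str


@dataclass
class Annotation:
    # Per the abstract, annotation categories carry omics and clinical information.
    omics: dict = field(default_factory=dict)
    clinical: dict = field(default_factory=dict)


@dataclass
class VariationRecord:
    variation_data: VariationData                      # required category
    direct_annotation: Optional[Annotation] = None     # optional category
    indirect_annotation: Optional[Annotation] = None   # optional category


def present_categories(record: VariationRecord) -> List[str]:
    """Return the names of the categories populated in a record."""
    names = ["variation_data"]
    if record.direct_annotation is not None:
        names.append("direct_annotation")
    if record.indirect_annotation is not None:
        names.append("indirect_annotation")
    return names
```

In this sketch a record cannot be constructed without its variation data, while either annotation category may be omitted, which mirrors the required/optional distinction stated above. An actual implementation would follow the published GSVML specification rather than these placeholder names.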

Original language: English
Pages (from-to): 130-142
Number of pages: 13
Journal: International Journal of Medical Informatics
Issue number: 2
Publication status: Published - 2010 Feb


Keywords

  • Clinical genomics
  • Data interchange
  • Global interoperability
  • Information model
  • Markup Language
  • Sequence variation

ASJC Scopus subject areas

  • Health Informatics

