Monday, October 7 (Day 1) -10:00 to 17:00 h-

Satoru Miyano
miyano@ims.u-tokyo.ac.jp
Human Genome Center, Institute of Medical Science, University of Tokyo
Tokyo, Japan

We present our computational methods for Cancer Systems Biology and the analyses they enable, which run on the supercomputers at the Human Genome Center of The University of Tokyo and on the K computer at the RIKEN Advanced Institute for Computational Science. The first is Genomon (https://github.com/Genomon-Project/) (Fig. 1), a pipeline for cancer genome analysis that consists of a suite of bioinformatics tools for analyzing cancer genome data (WGS, WES, RNA-seq). It enables highly sensitive and accurate detection of most types of genomic variants (single nucleotide variants, short indels, mid-size indels and large-scale structural variations) and of transcriptomic changes (gene fusions, aberrant splicing patterns). It adopts an efficient job scheduling framework that makes it easy to analyze several hundred genome and transcriptome sequencing data sets simultaneously. We present some of our recent contributions to cancer genomics with Genomon.

The second is a computational strategy for unraveling, from gene expression profiles of cancer cells, gene networks and their diversity across genetic variations, mutations, environments and diseases. We developed methods that show how gene networks vary from patient to patient according to a modulator, which is any score representing a characteristic of cells, e.g. survival or drug resistance [3-4]. We also developed a microRNA/mRNA gene network analysis based on Bayesian networks that revealed subnetworks with hub genes that may switch cancer survival. Ongoing cancer research is also introduced, including the discovery, using the K computer, of the first lncRNA modulating MYC gene regulation.

Kotoe Katayama
k-kataya@hgc.jp
Human Genome Center, The Institute of Medical Science, The University of Tokyo
Tokyo, Japan

Detecting somatic mutations is a crucial part of cancer genome analysis and of the characterization of cancer. The Human Genome Center has been developing a computational pipeline called Genomon for analyzing cancer genome sequence data and RNA-seq data produced by next-generation sequencers. It is already installed on SHIROKANE, the supercomputer system at the Human Genome Center, so SHIROKANE users can use Genomon immediately. Genomon efficiently detects genomic variants, including single nucleotide variants, insertions, deletions and structural variants, from whole-genome or exome sequence data, and transcriptomic changes from RNA-seq data. Furthermore, Genomon automatically produces detailed analysis reports covering data quality and a summary of the detected variants. In the presentation, we will introduce the methods used in the Genomon pipeline and provide a hands-on session for the participants.

Siew-Kee (Amanda) Low
siewkee.low@jfcr.or.jp
Cancer Precision Medicine Center, Japanese Foundation for Cancer Research
Tokyo, Japan

The common disease-common variant hypothesis postulates that the cumulative effects of common genetic variations, represented by single nucleotide polymorphisms (SNPs), are associated with susceptibility to complex diseases, responsiveness to drugs and the likelihood of adverse drug reactions (pharmacogenomics studies). With advances in biotechnology and the development of tagging SNP algorithms, it is now feasible to evaluate the associations of SNPs across the genome through genome-wide association studies (GWAS). Common genetic variations are known for their small effect sizes, so a large study population is required to identify a significant signal.

In this workshop, I will introduce the methods used to perform GWAS, using atrial fibrillation as a study example:
i)   Quality control (QC): sample and SNP QC
ii)  Types of association analyses and multiple testing
iii) Post-GWAS analyses, including eQTL analysis, gene-enrichment pathway analysis and weighted genetic risk scores
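
Steps i) and ii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow used in the workshop: genotypes are coded 0/1/2 (alternate allele count, None for a missing call), SNP QC is reduced to a call-rate and minor allele frequency filter (sample QC and Hardy-Weinberg checks are omitted), the association test is a 1-df allelic chi-square test, and multiple testing is handled with a Bonferroni correction. The function names, SNP identifiers and QC thresholds are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def maf(genotypes):
    """Minor allele frequency from genotypes coded 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def allelic_chi2(case_genos, control_genos):
    """1-df chi-square test on the 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio) for the alternate allele.
    """
    def allele_counts(genos):
        called = [g for g in genos if g is not None]
        alt = sum(called)
        return alt, 2 * len(called) - alt

    a, b = allele_counts(case_genos)      # case alt / ref alleles
    c, d = allele_counts(control_genos)   # control alt / ref alleles
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0, float("nan")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


def run_gwas(cases, controls, min_call_rate=0.99, min_maf=0.01, alpha=0.05):
    """cases/controls: dict mapping SNP id -> list of genotypes.

    Returns {snp: (chi2, p_value, odds_ratio, significant)} for SNPs
    that pass QC; significance uses a Bonferroni-corrected threshold.
    """
    tested = {}
    for snp in cases:
        pooled = cases[snp] + controls[snp]
        if call_rate(pooled) >= min_call_rate and maf(pooled) >= min_maf:
            tested[snp] = allelic_chi2(cases[snp], controls[snp])
    threshold = alpha / max(len(tested), 1)
    return {snp: (chi2, p, oratio, p < threshold)
            for snp, (chi2, p, oratio) in tested.items()}
```

The Bonferroni threshold here is divided by the number of SNPs that survive QC, which is why QC is applied before testing. In genome-wide studies a fixed threshold such as 5e-8 is commonly used instead.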

I will also discuss the challenges and points to consider when performing GWAS.

References:

1. SK Low et al., Nat Genet. 2017 Jun;49(6):953-958.
2. SK Low et al., Clin Cancer Res. 2014 May 15;20(10):2541-52.

Seiya Imoto
imoto@ims.u-tokyo.ac.jp
Health Intelligence Center, The Institute of Medical Science, The University of Tokyo
Tokyo, Japan

We introduce statistical models, such as Bayesian networks, dynamic Bayesian networks and state space models, for estimating gene networks from RNA expression data and other biological data. We will explain (1) the fundamental elements of statistical gene network modeling, e.g. Bayesian networks, regression models, regularized parameter estimation using lasso-type shrinkage and the derivation of score functions for structural learning of networks, and (2) efficient algorithms, enhanced with high-performance computing on a supercomputer, for learning gene network structure.
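
As a rough illustration of what a score function for structural learning looks like, the sketch below scores one gene (node) against a candidate parent set with the BIC of a linear-Gaussian regression, then adds parents greedily while the score improves. This is a generic textbook-style construction, not the score or search algorithm used in the talk; the function names are hypothetical, and the caller is assumed to restrict the candidate parents (for example with a fixed gene ordering) so that the resulting graph stays acyclic.

```python
import numpy as np


def local_bic(X, child, parents):
    """BIC of a linear-Gaussian model of column `child` given `parents`.

    X is an (n_samples, n_genes) expression matrix. Lower is better.
    """
    n = X.shape[0]
    y = X[:, child]
    # Design matrix: intercept plus one column per parent gene.
    A = np.column_stack([np.ones(n)] + [X[:, p] for p in parents])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    sigma2 = max(rss / n, 1e-12)  # ML estimate of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = A.shape[1] + 1  # regression coefficients plus the variance
    return -2 * loglik + k * np.log(n)


def greedy_parents(X, child, candidates, max_parents=3):
    """Greedily add the parent that lowers the local BIC the most."""
    parents = []
    best = local_bic(X, child, parents)
    while len(parents) < max_parents:
        scored = [(local_bic(X, child, parents + [c]), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        score, gene = min(scored)
        if score >= best:  # no candidate improves the score
            break
        parents.append(gene)
        best = score
    return parents, best
```

Because the score decomposes into one local term per node, the terms for different genes can be evaluated independently, which is one reason this kind of structure learning lends itself to parallel computation.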

We illustrate our gene network estimation method with a real example in which human endothelial cell gene networks were estimated from time-course RNA expression data following treatment with the drug fenofibrate and from 270 gene knock-downs. We succeeded in inferring the gene network related to PPAR-alpha, a known target of fenofibrate. Based on this analysis, we show some practical examples of computational analysis for drug target discovery. Software implementing our gene network estimation methods is installed on our supercomputer system “SHIROKANE”; information on the Bayesian network software can be found at http://sign.hgc.jp/signbn/index.html.

Tuesday, October 8 (Day 2) -10:00 to 17:00 h-

Takuya Moriyama
moriyama@hgc.jp
Human Genome Center, The Institute of Medical Science, The University of Tokyo
Tokyo, Japan

Acquired somatic mutations have a large effect on cancer evolution, and mutation profiles from multi-regional tumor sequencing data sets provide useful information for understanding tumor evolution and intratumoral heterogeneity. To better understand intratumoral heterogeneity, it is important to detect subclonal mutations with low variant allele frequencies. Researchers have therefore developed mutation calling methods suited to multi-regional tumor data sets.

Here, we introduce a Bayesian method termed MultiMuC for accurate detection of somatic mutations in multi-regional tumor sequence data sets. To improve detection performance, our method is based on the assumption of mutation sharing: if we can predict that at least one tumor region has the mutation, we can detect the mutation in more tumor regions with greater confidence by lowering the original detection threshold. We find two drawbacks in existing methods that leverage the assumption of mutation sharing. First, existing methods do not consider the probability of the “No-TP (True Positive)” case, in which mutation candidates are detected in multiple regions but no true mutation exists. Second, existing methods cannot leverage scores from other state-of-the-art mutation calling methods for a single-regional tumor, e.g., Strelka2 and NeuSomatic. We overcome the first drawback by evaluating the probability of the No-TP case. We address the second drawback with Bayes-factor-based model construction, which enables flexible integration of probability-based mutation call scores as building blocks of a Bayesian statistical model.
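
The interplay between per-region Bayes factors, the No-TP probability and the detection threshold can be illustrated with a toy calculation. The sketch below is not the MultiMuC model: it assumes finite Bayes factors, treats the regions as independent with a common prior (a simplification made here only for illustration), and the function names, prior and thresholds are all hypothetical. It converts each region's Bayes factor into a posterior probability, multiplies the complements to obtain the probability that no region carries a true mutation, and relaxes the calling threshold only when that No-TP probability is small.

```python
import math


def posterior_from_bayes_factor(bayes_factor, prior):
    """Posterior probability of a true mutation in one region.

    bayes_factor = P(data | mutation) / P(data | no mutation), finite.
    """
    odds = bayes_factor * prior / (1 - prior)
    return odds / (1 + odds)


def call_multiregion(bayes_factors, prior=0.01, strict=0.9,
                     relaxed=0.5, max_no_tp=0.05):
    """Call a candidate mutation across tumor regions.

    Returns (posteriors, p_no_tp, calls). The relaxed threshold is
    used only when the No-TP probability is at most max_no_tp.
    """
    posteriors = [posterior_from_bayes_factor(bf, prior)
                  for bf in bayes_factors]
    # Probability that none of the regions has a true mutation,
    # under the independence assumption stated above.
    p_no_tp = math.prod(1 - p for p in posteriors)
    threshold = relaxed if p_no_tp <= max_no_tp else strict
    calls = [p >= threshold for p in posteriors]
    return posteriors, p_no_tp, calls
```

For example, with Bayes factors of 5000, 150 and 0.2 in three regions, the strong evidence in the first region makes the No-TP case unlikely, so the second region is called under the relaxed threshold even though the same score would be rejected if that region were analyzed alone.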

Kazuma Kiyotani
kazuma.kiyotani@jfcr.or.jp
Cancer Precision Medicine Center, Japanese Foundation for Cancer Research
Tokyo, Japan

Advances in genomic sequencing technologies have improved our understanding of immunopharmacogenomics and allowed us to identify novel cancer-specific immune targets. Neoantigens, which are highly cancer-specific antigens generated by somatic nonsynonymous mutations in cancer cells, are considered good targets for T cells to eradicate cancer cells. However, it is still challenging to accurately predict from genome sequence data which neoantigens are targeted by T cells in tumors. To predict the neoantigens likely to be targeted, we need information on somatic mutations, gene expression and HLA genotypes. In this presentation, I will introduce our analysis pipeline to identify somatic mutations, determine HLA genotypes and predict candidate neoantigens from whole-exome and RNA sequencing data.

These analyses would help to develop personalized immunotherapies targeting neoantigens.
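
One step of such a pipeline, enumerating the mutant peptides that span a missense mutation and filtering them by expression and predicted HLA binding, can be sketched as follows. This is an illustrative outline rather than the pipeline presented in the talk. The binding predictor is deliberately left as a caller-supplied function (`predict_ic50` is a hypothetical name), and the peptide lengths, the expression cutoff and the 500 nM affinity cutoff are assumptions chosen as commonly used defaults.

```python
def mutant_peptides(protein, position, alt_aa, lengths=(8, 9, 10, 11)):
    """List (mutant, wild_type) peptide pairs that overlap a missense change.

    protein  : wild-type amino acid sequence
    position : 0-based index of the substituted residue
    alt_aa   : substituted amino acid
    """
    if not 0 <= position < len(protein):
        raise ValueError("position outside the protein sequence")
    if protein[position] == alt_aa:
        raise ValueError("substitution does not change the residue")
    mutated = protein[:position] + alt_aa + protein[position + 1:]
    pairs = []
    for k in lengths:
        first = max(0, position - k + 1)        # leftmost window start
        last = min(position, len(mutated) - k)  # rightmost window start
        for start in range(first, last + 1):
            pairs.append((mutated[start:start + k],
                          protein[start:start + k]))
    return pairs


def select_candidates(pairs, hla_alleles, predict_ic50, tpm,
                      min_tpm=1.0, max_ic50=500.0):
    """Keep mutant peptides from an expressed gene that bind an HLA allele.

    predict_ic50(peptide, allele) must return a predicted affinity in nM;
    it stands in for an external binding predictor supplied by the caller.
    Returns (peptide, allele, ic50) tuples sorted by predicted affinity.
    """
    if tpm < min_tpm:
        return []
    hits = []
    for mutant, _wild_type in pairs:
        for allele in hla_alleles:
            ic50 = predict_ic50(mutant, allele)
            if ic50 <= max_ic50:
                hits.append((mutant, allele, ic50))
    return sorted(hits, key=lambda hit: hit[2])
```

Keeping the wild-type counterpart of each peptide makes it possible to compare mutant and wild-type binding later, although this sketch does not use it for filtering.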

Yao-zhong Zhang
yaozhong@ims.u-tokyo.ac.jp
Human Genome Center, The Institute of Medical Science, The University of Tokyo
Tokyo, Japan

Nanopore sequencing is a rapidly developing third-generation sequencing technology that can generate long nucleotide reads in real time on a portable device. The nucleotide sequence is determined by detecting changes in the ionic current signal as a DNA/RNA fragment passes through a nanopore. Currently, nanopore base-calling has a higher error rate than short-read base-calling. By utilizing deep neural networks, state-of-the-art nanopore base-callers achieve base-calling accuracies ranging from 85% to 95%.

In this talk, we first introduce the latest nanopore base-calling methods. From an algorithmic view, we illustrate how current signals can be transformed into nucleotide bases using deep neural networks. From a practical perspective, we demonstrate how to use publicly available toolkits to build a base-calling and genome assembly pipeline with state-of-the-art performance. In the later part of the talk, we introduce how nanopore sequencing can be used for structural variant detection.
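
To make the signal-to-base step concrete, the sketch below shows the final decoding stage under one common design: a network that emits, for each time step of the current signal, a probability over a blank symbol and the four bases, trained with connectionist temporal classification (CTC). Whether the methods covered in the talk use CTC is an assumption; the network itself is omitted, and the label ordering and function name are hypothetical. Greedy CTC decoding takes the most likely label at each step, merges consecutive repeats and then drops blanks.

```python
import numpy as np

ALPHABET = "-ACGT"  # index 0 is the CTC blank symbol


def ctc_greedy_decode(probs):
    """Greedy CTC decoding of per-time-step label probabilities.

    probs: array of shape (time_steps, 5), columns ordered as ALPHABET.
    Returns the decoded base sequence as a string.
    """
    best_path = np.argmax(probs, axis=1)
    bases = []
    previous = 0  # start as if the previous label were a blank
    for label in best_path:
        # Emit a base only when the label changes and is not a blank.
        if label != previous and label != 0:
            bases.append(ALPHABET[label])
        previous = label
    return "".join(bases)
```

The blank symbol is what allows a homopolymer such as "AA" to be represented: two runs of A separated by a blank decode to two bases, whereas an unbroken run of A decodes to one.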

Rui Yamaguchi
r.yamaguchi@aichi-cc.jp
Division of Cancer Systems Biology, Aichi Cancer Center Research Institute
Nagoya, Japan


Detecting differences in gene regulatory systems among multiple cell types from gene expression data is an important task for elucidating their hallmarks, e.g., drug resistance and susceptibility. However, it is fundamentally difficult to identify such regulatory differences simply by detecting differentially expressed genes between cells, since identical systems may produce differential gene expression. To overcome this difficulty, we developed a methodology that distinguishes differentially regulated genes between case and control samples from time-course gene expression data, using a state space model (SSM) as a model of the gene regulatory system. With an SSM, we can infer gene regulatory relationships and also obtain a predictive model. Using the predictive ability of the SSM, we can discriminate between two situations behind differentially expressed genes in a time course: 1) genes that are differentially expressed because the regulatory systems differ between the case and control, and 2) genes that are differentially expressed under the same regulatory system but with different states of the regulators. The method was applied to time-course gene expression data of normal human lung cells treated with (case) or without (control) gefitinib, an inhibitor of EGFR, and identified candidate genes under differential regulation between the case and control. Furthermore, the identified gene set was used to build a classifier for prognosis prediction of lung cancer patients, which showed good performance on independent data sets. These results suggest that the proposed method is a promising tool for systems biology and precision medicine. Our tool for estimating gene regulatory networks and prediction models with SSMs, SiGN-SSM, is available as open-source software at http://sign.hgc.jp/signssm/.
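
The idea of using a model's predictive ability to separate the two situations can be sketched with a linear Gaussian SSM and a Kalman filter. This is a simplified illustration, not the SiGN-SSM implementation or its interface: the model parameters are assumed to have been estimated already from control data, and the function names, the error-ratio rule and its threshold are hypothetical. A gene whose case time course is predicted poorly by the control model, relative to how well the control time course is predicted, is a candidate for differential regulation, whereas a gene that is predicted well despite different expression is consistent with the same system in a different state.

```python
import numpy as np


def prediction_errors(Y, F, H, Q, R, x0, P0):
    """One-step-ahead prediction errors of a linear Gaussian SSM.

    State:       x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
    Observation: y_t = H x_t + v_t,      v_t ~ N(0, R)
    Y has shape (time_points, genes). Returns y_t - E[y_t | y_1..y_{t-1}].
    """
    x, P = x0.copy(), P0.copy()
    errors = np.zeros_like(Y, dtype=float)
    for t, y in enumerate(Y):
        # Predict the next state and its covariance.
        x = F @ x
        P = F @ P @ F.T + Q
        innovation = y - H @ x
        errors[t] = innovation
        # Update the state estimate with the new observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(len(x)) - K @ H) @ P
    return errors


def per_gene_rmse(Y, model):
    """Root mean squared one-step prediction error for each gene."""
    return np.sqrt(np.mean(prediction_errors(Y, **model) ** 2, axis=0))


def flag_differentially_regulated(Y_case, Y_control, model,
                                  ratio=3.0, floor=1e-6):
    """Flag genes predicted much worse in the case than in the control.

    `model` holds parameters fitted to control data (keys F, H, Q, R,
    x0, P0); `floor` guards against a near-zero control error.
    """
    rmse_case = per_gene_rmse(Y_case, model)
    rmse_control = per_gene_rmse(Y_control, model)
    return rmse_case > ratio * np.maximum(rmse_control, floor)
```

In practice the threshold would need to be calibrated, for example against the variability of prediction errors across replicates, rather than fixed as in this sketch.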