Ethical statement
Written informed consent was obtained from all patients, and all aspects of this study were approved by the Institutional Review Board of Shizuoka Cancer Center (authorization number 25–33). In this study, pathogenic germline mutations could be unintentionally predicted from retrospective FFPE specimens. To avoid disadvantaging specimen donors, we obtained informed consent, with the approval of the Ethics Review Board, that covered the possibility of secondary findings such as those found in blood-based constitutional analysis. All experiments using clinical samples were performed in accordance with the approved Japanese ethical guidelines (Human Genome/Gene Analysis Research, 2017, provided by the Ministry of Health, Labor, and Welfare; https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/hokabunya/kenkyujigyou/i-kenkyu/index.html).
Clinical samples
Two diffuse-type and two intestinal-type gastric cancers were selected from the Japanese pan-cancer cohort (project HOPE) comprising 5521 tumor specimens3. These samples were clinicopathologically diagnosed by a pathologist after surgery. Tumors were dissected from surgical specimens immediately after resection of the lesion at the Shizuoka Cancer Center Hospital, and the specimens were then stored as FFPE tissue. In addition, peripheral blood was collected as a paired control to exclude germline mutations. Details of the experimental protocols have been described previously3,6,9,14,15,16. Briefly, DNA was extracted from tissues and peripheral blood samples using a QIAamp DNA Blood Mini Kit (Qiagen, Venlo, The Netherlands). Purified DNA was quantified using a NanoDrop and a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA).
Dissociation and suspension of FFPE tissue samples
From the cohort, we selected relatively recent samples, to limit the effect of DNA degradation, with low tumor content and high tumor mutational burden (TMB) for whole-exome sequencing. An FFPE tissue block of gastric cancer was cut into 10-, 20-, and 50-μm-thick sections. The sections were dewaxed by three 10-min incubations in xylene and then rehydrated by sequential 30-s incubations in the following dilutions of ethanol: 100% (twice), 70%, 50%, and 30%. Rehydration was completed with 30-s incubations in deionized water. After heat-induced antigen retrieval performed according to the manufacturer’s protocol, the dewaxed samples were suspended using a gentleMACS Octo Dissociator with Heaters (program 37C_FFPE_1; Miltenyi Biotec, Bergisch Gladbach, Germany).
Isolation and staining of cells
Fully automated cell labeling and separation were performed using the autoMACS Pro Separator (Miltenyi Biotec) according to the manufacturer’s protocol. Cell suspensions derived from FFPE tissue sections were separated using Anti-Cytokeratin MicroBeads (Miltenyi Biotec) and stained with anti-cytokeratin-FITC (clone REA831, Miltenyi Biotec), anti-vimentin-APC (clone REA409, Miltenyi Biotec), and CD235a (Glycophorin A)-PE (clone REA175, Miltenyi Biotec) antibodies. Nuclei were stained with DAPI Staining Solution (Miltenyi Biotec).
DNA isolation
DNA was extracted from FFPE tissue and peripheral blood samples using a GeneRead DNA FFPE Kit and a QIAamp DNA Blood Mini Kit (both Qiagen), respectively. Purified DNA was quantified using a NanoDrop and a Qubit 2.0 Fluorometer (Thermo Fisher Scientific). To check DNA quality, the DNA integrity number (DIN) was determined using a TapeStation (Agilent Technologies, Santa Clara, CA).
Targeted sequencing of the gene panel
For targeted sequencing of genes in DNA isolated from the FFPE tissue, a library of 225 genes (listed in Table S1) was constructed using a hybridization-based enrichment protocol (SureSelect Custom panel, Agilent). In total, 2.427 Mb of the human genome, including 0.723 Mb of exon regions of RefSeq genes, was covered by 55,765 biotinylated RNA oligomers (120 bp length). Binary raw data derived from the sequencer were converted into sequence reads using bcl2fastq (ver. 2.20, Illumina), and the reads were mapped to the reference human genome (UCSC hg19). Genomic alterations were identified using VarDictJava (https://github.com/AstraZeneca-NGS/VarDictJava)17. To reduce false-positive findings, mutations meeting any of the following criteria were eliminated: (1) quality score < 20; (2) depth of coverage < 100; (3) depth of coverage for the alternate allele < 5; (4) VAF < 0.5%; (5) not fitting the filtering criteria of the variant caller (the FILTER field of the VCF record was not “PASS”). After annotation, mutations with an allele frequency of 1% or more in any of the following databases were excluded as common SNVs: (1) the 1000 Genomes Project (global or East Asia); (2) ExAC; (3) gnomAD. In addition, mutations that appeared to affect protein structure were extracted, namely missense variants, splice acceptor variants, splice donor variants, splice region variants, stop-gain variants, stop-lost variants, stop-retained variants, 5′-untranslated region premature start codon gain variants, exon-loss variants, disruptive inframe deletions, disruptive inframe insertions, frameshift variants, inframe deletions, inframe insertions, and initiator codon variants. To ensure reproducibility of the sequencing, mutations with VAF ≥ 3% were defined as valid mutations. The tumor content was estimated by the All-FIT algorithm based on tumor-only sequencing data18.
All mutations identified as somatic were manually verified using the Integrative Genomics Viewer (IGV, https://software.broadinstitute.org/software/igv/).
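The filtering steps described above for the targeted panel can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the `Variant` record and its field names are hypothetical, the VAF is assumed to be the alternate-allele depth divided by the total depth, and the consequence strings are assumed to follow Sequence Ontology-style naming, which may differ from the annotation tool used in the study.

```python
from dataclasses import dataclass, field

# Consequence classes retained by the text. The exact spellings are an
# assumption (Sequence Ontology-style terms), not taken from the study.
PROTEIN_AFFECTING = {
    "missense_variant", "splice_acceptor_variant", "splice_donor_variant",
    "splice_region_variant", "stop_gained", "stop_lost",
    "stop_retained_variant",
    "5_prime_UTR_premature_start_codon_gain_variant", "exon_loss_variant",
    "disruptive_inframe_deletion", "disruptive_inframe_insertion",
    "frameshift_variant", "inframe_deletion", "inframe_insertion",
    "initiator_codon_variant",
}


@dataclass
class Variant:
    """Hypothetical minimal record for one annotated call."""
    qual: float          # variant quality score
    depth: int           # total depth of coverage at the site
    alt_depth: int       # reads supporting the alternate allele
    vcf_filter: str      # FILTER field of the VCF record
    consequence: str     # annotated consequence term
    pop_af: dict = field(default_factory=dict)  # database name -> allele frequency

    @property
    def vaf(self) -> float:
        # Assumption: VAF = alternate-allele depth / total depth.
        return self.alt_depth / self.depth if self.depth else 0.0


def passes_hard_filters(v: Variant) -> bool:
    """Criteria (1) to (5): a variant failing any one of them is eliminated."""
    return (v.qual >= 20 and v.depth >= 100 and v.alt_depth >= 5
            and v.vaf >= 0.005 and v.vcf_filter == "PASS")


def is_common(v: Variant, threshold: float = 0.01) -> bool:
    """True if the allele frequency is 1% or more in any listed database."""
    return any(af >= threshold for af in v.pop_af.values())


def is_valid_mutation(v: Variant) -> bool:
    """Hard filters, common-SNV exclusion, consequence class, and VAF >= 3%."""
    return (passes_hard_filters(v) and not is_common(v)
            and v.consequence in PROTEIN_AFFECTING and v.vaf >= 0.03)


# Illustrative values only; not data from the study.
example = Variant(qual=35, depth=400, alt_depth=40, vcf_filter="PASS",
                  consequence="missense_variant", pop_af={"gnomAD": 0.0001})
print(is_valid_mutation(example))
```

Keeping each criterion in its own predicate makes it easy to report which step removed a given call, which helps when the surviving calls are later reviewed by eye.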
Whole-exome sequencing
To accurately distinguish germline mutations without relying on database-based estimation, we used our previously constructed pipeline3. In brief, the exome library was constructed using an Ion Torrent AmpliSeq RDY Exome Kit (Thermo Fisher Scientific). The exome library supplied 292,903 amplicons covering 57.7 Mb of the human genome, comprising 34.8 Mb of exonic sequences from 18,835 genes registered in RefSeq. Raw binary data produced by the sequencers were processed using the Torrent Suite Software (ver. 5, Thermo Fisher Scientific). Processed sequence reads were mapped to the reference human genome (UCSC hg19), and genomic alterations were identified using the Torrent Variant Caller (ver. 5, Thermo Fisher Scientific). To avoid sequencer- and amplicon-derived errors, arbitrarily selected somatic mutations (VAF ≥ 10%) were manually inspected using IGV, and somatic mutation candidates containing multiple nucleotide variations (~ 1000 sites) were validated by Sanger sequencing.
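The idea of using paired blood to separate germline from somatic calls, and then shortlisting calls for manual inspection, can be sketched as below. This is an assumption-laden illustration and not the authors' pipeline, whose details are given in the cited reference: the `Call` record, the exact-match comparison on chromosome, position, and alleles, and the function names are all hypothetical.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical minimal variant call."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float


def split_somatic_germline(tumor, blood):
    """Label tumor calls also seen in paired blood as germline.

    Assumption: a call is the same variant when chromosome, position,
    reference allele, and alternate allele all match exactly.
    """
    blood_keys = {call[:4] for call in blood}
    somatic = [call for call in tumor if call[:4] not in blood_keys]
    germline = [call for call in tumor if call[:4] in blood_keys]
    return somatic, germline


def review_queue(somatic, min_vaf=0.10):
    """Somatic calls at or above the VAF cut-off, queued for inspection in IGV."""
    return [call for call in somatic if call.vaf >= min_vaf]


# Toy coordinates for illustration only; not data from the study.
tumor_calls = [
    Call("chr1", 1000, "A", "G", 0.48),
    Call("chr2", 2000, "C", "T", 0.15),
    Call("chr3", 3000, "G", "A", 0.04),
]
blood_calls = [Call("chr1", 1000, "A", "G", 0.50)]

somatic, germline = split_somatic_germline(tumor_calls, blood_calls)
to_inspect = review_queue(somatic)
```

With the toy input, the chr1 call is shared with blood and is labeled germline, while only the chr2 call reaches the 10% cut-off for inspection.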
Statistical analysis
Differences in read depth and VAF (including VAF ratio) were tested for significance using Welch’s t-test. Bonferroni correction was applied for multiple comparisons. A P-value < 0.01 was considered significant.
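For reference, the two statistical steps named above can be sketched with the Python standard library alone. The function names are hypothetical, and the sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom; the P-value would then be read from a t distribution with that many degrees of freedom, for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`. The 0.01 threshold is taken from the text, and applying it to the Bonferroni-adjusted P-value is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (n - 1 denominator) and does not assume
    equal variances in the two groups.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni(p_values, alpha=0.01):
    """Return (adjusted P-value, significant?) for each raw P-value.

    Each P-value is multiplied by the number of comparisons and capped at 1.
    """
    m = len(p_values)
    return [(min(p * m, 1.0), p * m < alpha) for p in p_values]


# Illustrative numbers only; not data from the study.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
adjusted = bonferroni([0.001, 0.004, 0.02])
```

Multiplying each P-value by the number of comparisons is equivalent to testing the raw P-values against the threshold divided by that number.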

