Study Description

The Office of Cancer Genomics at the National Cancer Institute is sponsoring a series of studies as part of the Cancer Genome Characterization Initiative (CGCI) to assess novel emerging sequencing technologies in cancer. The CGCI program includes comprehensive characterization of the genetic aberrations found in different pediatric and/or adult tumors.

CGCI is currently characterizing a number of B-cell non-Hodgkin lymphomas (including diffuse large B-cell lymphoma (DLBCL) from patients with and without HIV infection, follicular lymphoma (FL), and adult and pediatric Burkitt lymphomas), medulloblastoma (MB), and additional HIV-associated tumors (including lung and cervical cancers). Additional tumor types may be characterized in the future. All data from these projects will be released into publicly accessible databases, with a majority of data in an open-access tier. A subset of data will be available only through a controlled-access tier due to patient privacy concerns.

Individual project descriptions are available by disease on the substudy pages (linked on the right-hand side of this page). Brief summaries follow:

  • Non-Hodgkin Lymphoma (NHL) - CGCI investigators are probing genomic alterations more deeply than has been previously possible by using state-of-the-art RNA sequencing (mRNA-seq) and whole genome shotgun sequencing (WGS) coupled with leading-edge bioinformatics, data management and analysis approaches. To date, the project has sequenced tumor DNA and/or RNA from 117 NHL tumor samples and 10 cell lines. This includes the genomes or exomes of 1 follicular lymphoma (FL) and 13 diffuse large B-cell lymphoma (DLBCL) cases, all with matched constitutional DNA sequenced to comparable depths, as well as mRNA-seq of 92 DLBCL, 12 FL and 8 B-cell NHL cases with other histologies and 10 DLBCL-derived cell lines. The DLBCL cases and cell lines are from the two major subtypes of DLBCL: germinal center B-cell (GCB) and activated B-cell (ABC).

  • Medulloblastoma (MB) - To identify the genetic alterations in MB, investigators searched for copy number alterations using high-density microarrays and sequenced all known protein-coding genes and miRNA genes using Sanger sequencing in a set of 22 pediatric MB samples and one matched normal blood sample. All tumor samples were obtained at the time of original surgery (pre-treatment) except for one sample, which was obtained at the time of MB recurrence. The protein-encoding transcripts were supplemented with microRNA transcripts downloaded from the Sanger miRBase Sequence Database (Release 13.0) to yield a combined set of transcripts representing 24,893 genes (24,178 protein-encoding and 715 microRNA). The regions of interest (ROIs) targeted for sequencing comprised the entire transcribed portion of the microRNA exons and, for the protein-encoding exons, the protein-encoding portion plus 4 bases of flanking sequence. The Illumina Infinium II Whole Genome Genotyping Assay on the BeadChip platform was used to analyze the same set of tumor samples at 1,199,187 (1M-Duo) SNP loci in order to detect copy number alterations.

  • HIV+ Tumor Molecular Characterization Project (HTMCP) - This project is a joint effort of the Office of Cancer Genomics (OCG) and the Office of HIV and AIDS Malignancy (OHAM). Its goals are to characterize HIV-associated cancers (obtained from HIV-infected patients) and compare them to the same types of cancers from patients without HIV infection. Investigators will perform 30X genome sequencing of 100 cases of paired tumor and germline DNA, along with transcriptome sequencing, in each of 3 types of HIV+ tumors (DLBCL, lung and cervical cancers). These platforms allow discovery of mutations in both coding and non-coding genomic regions, as well as gene expression changes and genomic alterations (including translocations, insertions and deletions). Comparing tumors from cancer patients with and without HIV infection will provide insight into the potential function of this virus in certain cancers.

  • Burkitt Lymphoma Genome Sequencing Project (BLGSP) - This project is a collaborative effort between the National Cancer Institute and the Foundation for Burkitt Lymphoma Research to develop a databank of the many alterations found in Burkitt lymphoma (BL), an uncommon type of non-Hodgkin lymphoma that occurs most often in children and young adults. The goal of the BLGSP is to explore potential genetic changes in patients with BL that could lead to better prevention, detection and treatment of the cancer. The project will characterize the alterations of the tumors' genomes (with matched normal as control) and transcriptomes by sequencing the DNA and RNA of each case. The ultimate goals of the project are to use the data generated to discover the molecular changes present in BL patients and to determine how those changes correlate with treatment regimen and outcome.
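
The ROI definition in the medulloblastoma summary above (the protein-encoding portion of each exon plus 4 bases of flanking sequence) can be illustrated with a minimal sketch. This is not the project's actual pipeline: the `Interval` type, the `coding_roi` function, the 0-based half-open coordinate convention, and the reading of "4 bases" as 4 bases on each side are all assumptions made for illustration. The sketch also checks that the stated gene counts add up.

```python
from typing import NamedTuple, Optional

FLANK_BASES = 4  # from the MB summary; applied per side (an assumption)

# Gene counts stated in the MB summary.
PROTEIN_ENCODING_GENES = 24_178
MICRORNA_GENES = 715
TOTAL_GENES = PROTEIN_ENCODING_GENES + MICRORNA_GENES  # 24,893


class Interval(NamedTuple):
    """Hypothetical genomic interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def coding_roi(exon: Interval, cds_start: int, cds_end: int,
               flank: int = FLANK_BASES) -> Optional[Interval]:
    """Clip an exon to the coding region, then pad it by `flank` bases.

    Returns None when the exon does not overlap the coding region
    (for example, an exon that is entirely untranslated).
    """
    start = max(exon.start, cds_start)
    end = min(exon.end, cds_end)
    if start >= end:
        return None
    # Clamp the padded start so it never runs off the chromosome start.
    return Interval(exon.chrom, max(0, start - flank), end + flank)
```

For example, under these assumptions an exon spanning 100 to 200 whose coding sequence begins at position 150 would yield an ROI of 146 to 204.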

As described above, the projects currently in CGCI will provide a range of data, including whole-genome, transcriptomic (mRNA-seq and miRNA-seq) and mutational analyses of the tumor types being studied. This page will be amended as additional projects and characterization platforms are added to the CGCI portfolio.

Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. If available, the site also contains data dictionaries, variable summaries, documents, and truncated analyses.

Study Inclusion/Exclusion Criteria

All specimens and all clinical and laboratory data gathered for this project meet the strict set of criteria established by The Cancer Genome Atlas (TCGA). In particular, the following specific criteria will be met.

  1. Focus on primary untreated tumors that were snap frozen upon tissue resection.
  2. All samples are collected and utilized following strict human subjects protection guidelines, informed consent and IRB reviewed protocols.
  3. Whenever possible, clinical data are gathered prospectively and stored in continuously updated electronic format using a standard relational database (MS Access) employing caDSR compliant terminology and from which the data can be easily exported.

Additional information on specimen inclusion and exclusion criteria for the specific tumor types investigated as part of CGCI can be found on the CGCI website and within referenced publications for this initiative.

Study History

Cancer is a genetic disease. Alterations at the DNA level drive the cellular changes that are hallmarks of cancer, including aberrant cell division and survival. Historically, genetic causes of cancer were studied by analysis of one or a few genes at a time. More recently, however, novel high-throughput technologies have provided unprecedented capabilities to examine the cancer genome. These technologies enable systematic characterization of genetic and epigenetic alterations, allowing investigators to identify the underlying genetic changes found in cancer. The CGCI incorporates multiple approaches for genomic characterization, including exome sequencing and transcriptome analysis using next-generation sequencing. To encourage collaboration and leverage the collective knowledge and innovation of the entire cancer research community, all data collected will be publicly available through databases supported by the National Institutes of Health and National Cancer Institute.

NHL, Medulloblastoma, and some HTMCP and BLGSP cases have data available at NCBI and NCI repositories. Sequence data stored in the NCBI databases can be accessed through the dbGaP parent and substudy sites. Additional data can be accessed through CGCI Data Matrix (
