Research Program

Unraveling the molecular complexity of B-ALL

Integrating advanced computational analysis with rigorous biological validation to drive precise diagnostics and targeted therapies.

Gu Lab Research

Overarching Goal

We aim to advance the understanding of the genetic and epigenetic alterations that drive the pathogenesis of different leukemias, and to develop faithful experimental models and novel diagnostic and therapeutic approaches.

Research Background

Our research focuses on acute lymphoblastic leukemia (ALL), the most common pediatric cancer. ALL comprises dozens of distinct subtypes, each characterized by unique genetic alterations and gene expression profiles. Precise classification of these subtypes is critical for accurate prognosis and effective treatment strategies.

In our recent work, we conducted an integrative genomic analysis of over 3,000 RNA-seq samples of B-cell ALL (B-ALL) and developed a molecular classification system based solely on bulk RNA-seq (Hu et al., 2024). This system enables automated classification of all known B-ALL subtypes and has been widely adopted in both basic research and clinical practice.

Beyond B-ALL classification, our lab has extensive expertise in genomics and bioinformatics. We develop computational tools powered by machine learning models to identify novel molecular markers of each B-ALL subtype, which further advances our understanding of leukemia genomics. Some of these tools are specifically designed for B-ALL, reflecting our in-depth knowledge of this highly aggressive disease. On the experimental side, we employ both in vitro and in vivo models to study how these molecular markers contribute to leukemia initiation, progression, and drug response. By dissecting the underlying mechanisms, we aim to develop and test novel therapeutic strategies that could lead to more effective treatments for high-risk B-ALL subtypes.

By integrating advanced computational analysis with biological validation, we aim to unravel the molecular complexities of B-ALL and drive the development of more precise diagnostic tools and targeted therapies.

Research Support (ongoing & completed)

NIH/NCI, Research Project Grant (R01CA303874) · Gu (PI) · 09/01/2025 – 08/31/2030
NIH/NCI, MERIT Award (R37CA300358) · Gu (PI) · 08/18/2025 – 07/31/2030
Pediatric Cancer Research Foundation, Translational Research Grant · Gu (PI) · 01/01/2024 – 12/31/2025
The V Foundation, V Scholar Award · Gu (PI) · 09/01/2023 – 08/31/2026
Leukemia Research Foundation, New Investigator Grant · Gu (PI) · 07/01/2023 – 06/30/2025
Andrew McDonough B+ Foundation, Childhood Cancer Research Grant · Gu (PI) · 01/01/2023 – 12/31/2024
NIH/NCI, Pathway to Independence Award (K99/R00CA241297) · Gu (PI) · 07/01/2019 – 06/30/2023
Leukemia & Lymphoma Society, Special Fellow Award · Gu (PI) · 07/01/2018 – 12/31/2021
American Society of Hematology, Scholar Award · Gu (PI) · 07/01/2018 – 06/30/2019

MD-ALL Platform
Direction 01 · Platform

MD-ALL platform & fusion/SV callers

Advances in RNA-seq have revealed novel B-ALL subtypes and improved risk stratification, yet implementing these advances in research and clinical settings remains challenging.

MD-ALL (Molecular Diagnosis of ALL) is an integrative computational platform for accurate, comprehensive B-ALL classification. Using machine learning, it identifies key feature genes and classifies subtypes from gene expression and genetic alterations — integrating sequence mutations, copy number variations, and gene rearrangements for definitive classification. It outperformed existing tools across a validation cohort of 974 samples.
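To make the idea of "feature genes plus expression-based classification" concrete, here is a minimal sketch in Python. It is not MD-ALL's actual algorithm: it swaps in a simple variance-based gene ranking and a nearest-centroid classifier, and it ignores mutations, copy number variations, and rearrangements entirely. The gene names, subtype labels, expression values, and function names are all hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical toy log-expression values: sample -> (expression, subtype label).
TRAIN = {
    "s1": ({"GENE_A": 9.0, "GENE_B": 1.0, "GENE_C": 5.0}, "subtype_1"),
    "s2": ({"GENE_A": 8.5, "GENE_B": 1.5, "GENE_C": 5.1}, "subtype_1"),
    "s3": ({"GENE_A": 2.0, "GENE_B": 8.0, "GENE_C": 4.9}, "subtype_2"),
    "s4": ({"GENE_A": 1.5, "GENE_B": 9.0, "GENE_C": 5.0}, "subtype_2"),
}


def select_feature_genes(train, n_genes):
    """Rank genes by variance across all samples and keep the top n_genes."""
    genes = sorted(next(iter(train.values()))[0])
    variance = {
        g: pvariance([expr[g] for expr, _ in train.values()]) for g in genes
    }
    return sorted(genes, key=lambda g: variance[g], reverse=True)[:n_genes]


def fit_centroids(train, genes):
    """Average each feature gene within each subtype."""
    by_subtype = {}
    for expr, subtype in train.values():
        by_subtype.setdefault(subtype, []).append(expr)
    return {
        subtype: {g: mean(e[g] for e in exprs) for g in genes}
        for subtype, exprs in by_subtype.items()
    }


def classify(expr, centroids):
    """Return the subtype whose centroid is nearest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((expr[g] - centroid[g]) ** 2 for g in centroid)
    return min(centroids, key=lambda s: distance(centroids[s]))


FEATURES = select_feature_genes(TRAIN, n_genes=2)
CENTROIDS = fit_centroids(TRAIN, FEATURES)
PREDICTED = classify({"GENE_A": 8.8, "GENE_B": 1.2, "GENE_C": 5.0}, CENTROIDS)
```

In this toy data GENE_C barely varies between samples, so it is dropped during feature selection and the new sample is assigned using GENE_A and GENE_B alone. A real classifier would be trained on thousands of samples and would combine expression with genetic alterations, as described above.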

Because fusions and structural variations are the most common drivers in B-ALL, we are also building highly sensitive, accurate fusion/SV callers from RNA-seq, customized for B-ALL.

B-cell Differentiation
Direction 02 · Development

B-cell differentiation & BCR development

B-ALL arises from a block in B-cell differentiation. B cells originate from hematopoietic stem cells and progress through several stages to become mature B cells with a functional B-cell receptor (BCR), composed of immunoglobulin heavy and light chains.

Our work shows that different B-ALL subtypes correlate with different stages of B-cell differentiation and BCR development. This opens an opportunity to study how the 3D organization of the IgH/IgL loci is regulated and what roles key factors such as PAX5, WAPL, cohesin, CTCF, RAG1, and RAG2 play.

PAX5 B-ALL Subtypes
Direction 03 · Mechanism

PAX5alt and PAX5 P80R B-ALL subtypes

PAX5 mutations define two distinct B-ALL subtypes with markedly different clinical outcomes, but the underlying mechanisms remain largely unknown. To study PAX5's role in B-ALL initiation and progression, we developed multiple genetically engineered knock-in mouse models.

Using single-cell multi-omics (scRNA-seq, scBCR-seq, and single-cell mutational profiling), we identified the stepwise mutagenesis and leukemogenesis process — from WT Pax5 allele deletion to the acquisition of secondary mutations (e.g., Jak mutations) leading to overt leukemia. Leveraging our mouse models and PDX samples, we uncovered mechanisms behind the distinct outcomes of different PAX5 mutations, informing future precision-medicine strategies.

A978 in High-Risk B-ALL
Direction 04 · Therapy

Driver molecular markers in high-risk B-ALL

Philadelphia chromosome-positive (Ph) and Ph-like B-ALL are high-risk subtypes with very poor outcomes. Through large-scale transcriptomic analysis, we identified A978 as the most significant RNA marker of Ph/Ph-like B-ALL.

To investigate its role, we employ growth-competition assays, scRNA-seq, BioID proximity labeling, dTAG fast degradation, CUT&RUN/CUT&Tag, and 4C assays, along with (immuno-)TEM to localize A978 within cellular compartments. For translation, we designed a CpG-conjugated siRNA (CpG-siRNA) to specifically target A978 in Ph B-ALL and tested it in vitro and in vivo in PDX models, where it showed significant therapeutic potential.

Molecular Markers
Direction 05 · Discovery

Novel molecular markers in B-ALL subtypes

B-ALL is driven by diverse genetic and molecular changes. RNA-based markers help classify subtypes and reveal how these changes contribute to leukemia development, progression, drug response, and therapeutic vulnerability.

  • Long noncoding RNA (lncRNA): long RNAs that don't code for proteins but regulate gene expression; some promote leukemia by altering signaling or blocking B-cell development.
  • Circular RNA (circRNA): stable, circular RNAs that act as molecular sponges, trapping other RNAs or proteins to regulate gene activity.
  • Alternative splicing: lets a single gene produce different protein versions by including or skipping RNA segments.
  • Alternative polyadenylation (APA): controls mRNA 3′-end length, affecting protein output and RNA stability; changes can make leukemia cells more aggressive or drug-resistant.