Publications

Peer-reviewed

DiffusionST: a deep generative diffusion model-based framework for enhancing spatial transcriptomics data quality and identifying spatial domains [Link]

Briefings in Bioinformatics

Cui, Y.*, Cui, Y.*, Wang, R., Zhu, Z., Zeng, X., Nakai, K., Cui, F., Zhang, Z., Shi, H., Chen, Y., Ye, X., Sakurai, T., & Wei, L.

Abstract

Recent advancements in spatial transcriptomics (ST) technology have generated substantial volumes of spatial transcriptome data. However, the quality of these data is often compromised by the limitations of current sequencing technologies. To address this issue, we propose DiffusionST, a method for imputing ST data and clustering the imputed data. The method combines a graph convolutional network with a newly designed loss function, denoises data using the zero-inflated negative binomial distribution, and enhances data through a diffusion model to improve clustering accuracy. DiffusionST achieves superior clustering accuracy compared with eight of the most popular ST clustering algorithms, and it also excels in data imputation compared with five single-cell RNA sequencing imputation algorithms. Additionally, DiffusionST’s robustness to noise is quantitatively validated by manually introducing random dropout noise into the dataset, under which our model significantly enhances the quality of ST data. Moreover, DiffusionST is well suited to high-resolution ST data and, through survival analysis and cell–cell communication studies, has been shown to dissect spatial domains within breast cancer tissues. These findings provide strong evidence of DiffusionST’s efficacy in handling ST data, especially data with strong noise, making it a valuable tool in this field.
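
The abstract mentions denoising with the zero-inflated negative binomial (ZINB) distribution. For readers unfamiliar with it, the sketch below shows the standard ZINB negative log-likelihood for one count, given a mean `mu`, a dispersion `theta`, and a dropout probability `pi`. It illustrates the textbook formula only and is not DiffusionST's actual loss. The function names are hypothetical.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu and dispersion theta."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under a ZINB with dropout probability pi in [0, 1)."""
    if x == 0:
        # A zero can come either from dropout (probability pi) or from the NB component itself.
        return -math.log(pi + (1 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    # A nonzero count can only come from the NB component.
    return -(math.log(1 - pi) + nb_log_pmf(x, mu, theta))

def zinb_loss(counts, mus, thetas, pis):
    """Mean ZINB negative log-likelihood over paired lists of counts and parameters."""
    terms = [zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis)]
    return sum(terms) / len(terms)
```

In an autoencoder-style denoiser, `mu`, `theta`, and `pi` would typically be predicted for each spot and gene by the network, and this loss would be averaged over all entries. That wiring is an assumption here.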

OmniClust: A versatile clustering toolkit for single-cell and spatial transcriptomics data [Link]

Methods

Cui, Y.*, Cui, Y.*, Ding, Y., Nakai, K., Wei, L., Le, Y., Ye, X., & Sakurai, T.

Abstract

In recent years, RNA transcriptome sequencing technology has evolved continuously, from single-cell transcriptomics to spatial transcriptomics. Although these technologies are all based on RNA sequencing, each has its own unique characteristics, and there is an urgent need for an algorithmic toolkit that supports both. To address this, we have developed OmniClust, a toolkit for single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data. OmniClust employs deep learning algorithms for feature learning and clustering of spatial transcriptomics data, and machine learning algorithms for clustering scRNA-seq data. OmniClust was tested on 12 spatial transcriptomics benchmark datasets and four scRNA-seq benchmark datasets, achieving high clustering accuracy across multiple clustering evaluation metrics. Furthermore, we applied OmniClust to downstream analyses of spatial transcriptomics and single-cell RNA sequencing data from breast cancer, showcasing its potential to uncover and interpret the biological significance of cancer transcriptome data. In summary, OmniClust is a clustering tool for both single-cell and spatial transcriptomics data that demonstrates outstanding performance.
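
The abstract reports accuracy across several clustering evaluation metrics but does not name them. The adjusted Rand index (ARI) is one metric commonly used in clustering benchmarks. The sketch below computes it from the contingency table of two labelings. It is offered purely as an illustration and does not indicate which metrics OmniClust reports. `adjusted_rand_index` is a hypothetical name; in practice, `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same (at least two) items."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))  # contingency table entries n_ij
    rows = Counter(labels_true)                     # row sums a_i
    cols = Counter(labels_pred)                     # column sums b_j
    index = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    expected = sum_a * sum_b / comb(n, 2)           # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                       # degenerate case, e.g. both trivial
        return 1.0
    return (index - expected) / (max_index - expected)
```

ARI is 1.0 for identical partitions, regardless of how the clusters are named, and close to 0 for random agreement. It can also be negative, which makes it stricter than raw accuracy for comparing clusterings.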

STAIG: Spatial transcriptomics analysis via image-aided graph contrastive learning for domain exploration and alignment-free integration [Link]

Nature Communications

Yang, Y., Cui, Y., Zeng, X., Zhang, Y., Loza, M., Park, S. J., & Nakai, K.

Abstract

Spatial transcriptomics is an essential approach for investigating cellular structures and interactions, and studying spatial domains precisely requires multimodal information. Here, we propose STAIG, a deep-learning model that integrates gene expression, spatial coordinates, and histological images using graph-contrastive learning coupled with high-performance feature extraction. STAIG can integrate tissue slices without prealignment and remove batch effects. Moreover, it is designed to accept data acquired from various platforms, with or without histological images. Through extensive benchmarks, we demonstrate the capability of STAIG to recognize spatial regions with high precision and uncover new insights into tumor microenvironments, highlighting its promising potential in deciphering spatial biological intricacies.
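
STAIG is described as using graph-contrastive learning. The sketch below is a generic illustration of the idea and not STAIG's exact objective, which the abstract does not specify. It computes a simplified InfoNCE loss between two augmented views of the same spots. Each spot's embedding in view 1 is pushed to be most similar to its own embedding in view 2 and less similar to other spots. The function names and the temperature `tau` are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def infonce(z1, z2, tau=0.5):
    """Mean InfoNCE loss where row i of view 1 should match row i of view 2."""
    n = len(z1)
    total = 0.0
    for i in range(n):
        # Similarities of spot i (view 1) to every spot in view 2, scaled by temperature.
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive pair
    return total / n
```

In a real pipeline, the two views would come from a graph encoder applied to perturbed versions of the spatial graph. Here, the embeddings are just lists of floats, so the loss can be inspected directly.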

Comparative single-cell transcriptomic analysis reveals putative differentiation drivers and potential origin of vertebrate retina [Link]

NAR Genomics and Bioinformatics

Zeng, X., Gyoja, F., Cui, Y., Loza, M., Kusakabe, T. G., & Nakai, K.

Abstract

Although single-cell expression profiles of vertebrate retinas are known, understanding of their developmental and evolutionary expression patterns among homologous cell classes remains limited. We examined and compared approximately 240 000 retinal cells from four species and found significant similarities among homologous cell classes, indicating inherent regulatory patterns. To understand these shared patterns, we constructed gene regulatory networks for each developmental stage in three of these species. We identified 690 regulons governed by 530 regulators across the three species, along with 10 common cell class-specific regulators and 16 highly preserved regulons. RNA velocity analysis pinpointed conserved putative driver genes and regulators of retinal cell differentiation in both mouse and zebrafish. Investigating the origins of retinal cells by examining conserved expression patterns between vertebrate retinal cells and photoreceptor-related cells of the invertebrate Ciona intestinalis implied functional similarities in light transduction mechanisms. Our findings offer insights into the evolutionarily conserved regulatory frameworks and differentiation drivers of vertebrate retinal cells.

Computational analysis of the functional impact of MHC-II-expressing triple-negative breast cancer [Link]

Frontiers in Immunology

Cui, Y., Zhang, W., Zeng, X., Yang, Y., Park, S. J., & Nakai, K.

Abstract

The tumor microenvironment (TME) plays a crucial role in tumor progression and immunoregulation. Major histocompatibility complex class II (MHC-II) is essential for immune surveillance within the TME. While MHC-II genes are typically expressed by professional antigen-presenting cells, they are also expressed in tumor cells, potentially facilitating antitumor immune responses. To understand the role of MHC-II-expressing tumor cells, we analyzed triple-negative breast cancer (TNBC), an aggressive subtype with poor prognosis and limited treatment options, using public bulk RNA-seq, single-cell RNA-seq, and spatial transcriptomics datasets. Our analysis revealed a distinct tumor subpopulation that upregulates MHC-II genes and actively interacts with immune cells. Our results implicate this subpopulation as being preferentially located near regions of immune infiltration in TNBC patient cohorts with better prognosis, suggesting the functional importance of MHC-II-expressing tumor cells in modulating the immune landscape and influencing patient survival outcomes. Notably, we identified a prognostic signature comprising 40 significant genes in the MHC-II-expressing tumors, and machine learning models built on this signature successfully predicted patient survival outcomes and the degree of immune infiltration. This study advances our understanding of the immunological basis of cancer progression and suggests promising new directions for therapeutic strategies.
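
Several studies on this page rely on survival analysis. The Kaplan-Meier estimator is a standard way to draw a survival curve from follow-up times with censored patients. The minimal sketch below is a generic illustration, not the analysis pipeline used in the study, and `kaplan_meier` is a hypothetical name.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if patient i died at times[i], or 0 if they were censored then.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)  # patients still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group all patients whose time equals t (deaths and censorings).
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # conditional probability of surviving past t
            curve.append((t, surv))
        at_risk -= removed  # both deaths and censored patients leave the risk set
    return curve
```

Comparing such curves between patient groups, for example groups split by a gene signature score, is usually followed by a log-rank test. That test is not shown here.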

HyGAnno: hybrid graph neural network–based cell type annotation for single-cell ATAC sequencing data [Link]

Briefings in Bioinformatics

Zhang, W., Cui, Y., Liu, B., Loza, M., Park, S. J., & Nakai, K.

Abstract

Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking for single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present HyGAnno, a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target in a semi-supervised manner via a parallel graph neural network. Unlike existing methods that use only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference–target cell graph to detect cells with low prediction reliability according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showing its strengths in precise cell annotation, generation of interpretable cell embeddings, robustness to noisy reference data, and adaptability to tumor tissues.
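
HyGAnno itself uses a hybrid graph neural network. The sketch below is a much simpler stand-in that conveys two ideas from the abstract: transferring labels from a labeled reference, and flagging cells whose graph neighborhood gives an unreliable call. It assigns each target cell the majority label of its k nearest reference cells and marks the call as reliable only when enough neighbors agree. It assumes both sets of cells are already embedded in a common feature space, which glosses over the modality gap that HyGAnno is designed to handle. All names and thresholds are hypothetical.

```python
import math
from collections import Counter

def knn_transfer(ref_x, ref_labels, target_x, k=3, min_agree=0.8):
    """Label each target cell by majority vote of its k nearest reference cells.

    Returns a list of (label, reliable) pairs, where reliable is True only if
    at least a fraction `min_agree` of the k neighbors share the winning label.
    """
    results = []
    for x in target_x:
        # Euclidean distance from this target cell to every reference cell.
        neighbors = sorted((math.dist(x, r), lab) for r, lab in zip(ref_x, ref_labels))
        votes = Counter(lab for _, lab in neighbors[:k])
        label, count = votes.most_common(1)[0]
        results.append((label, count / k >= min_agree))
    return results
```

A cell sitting between two reference populations receives mixed votes and is flagged as unreliable. This is a crude analogue of using graph connectivity to detect low-confidence predictions.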

Preprints

DenoiseST: A dual-channel unsupervised deep learning-based denoising method to identify spatial domains and functionally variable genes in spatial transcriptomics [Link]

bioRxiv

Cui, Y., Wang, R., Zeng, X., Cui, Y., Zhu, Z., Nakai, K., Ye, X., Sakurai, T. & Wei, L.

Abstract

Spatial transcriptomics provides a unique opportunity to understand cellular organization and function in a spatial context. However, spatial transcriptome data suffer from dropout noise, which poses a major challenge for accurate downstream analysis. Here, we propose DenoiseST, a dual-channel unsupervised adaptive deep learning-based denoising method for data imputation, clustering, and identification of functionally variable genes in spatial transcriptomics. To leverage both spatial information and gene expression profiles, we propose a dual-channel joint learning strategy with graph convolutional networks that explores both linear and nonlinear representation embeddings in an unsupervised manner. This strategy enhances the ability to learn discriminative information from the global perspective of data distributions. In particular, DenoiseST adaptively fits different gene distributions to the clustered domains and uses tissue-level spatial information to accurately identify functionally variable genes at different spatial resolutions, revealing their enrichment in corresponding gene pathways. Extensive validation on a total of 18 real spatial transcriptome datasets shows that DenoiseST achieves excellent performance. Results on brain tissue datasets indicate that it outperforms state-of-the-art methods by a remarkable margin of ∼15% when handling artificial dropout noise, demonstrating its effectiveness and robustness. In a case study on human breast cancer spatial transcriptomic datasets, DenoiseST detected biologically significant immune-related structural regions, which were subsequently validated through Gene Ontology (GO), cell-cell communication, and survival analysis. In conclusion, we expect DenoiseST to be a novel and efficient method for spatial transcriptome analysis, offering unique insights into spatial organization and function.
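
Both DenoiseST and DiffusionST are evaluated by artificially injecting dropout noise. One simple way to simulate this is assumed here rather than taken from either paper. The sketch zeroes out a random fraction of the nonzero entries in a spots-by-genes count matrix and records which entries were masked, so that imputed values can later be scored on exactly those positions. `add_dropout` and `masked_rmse` are hypothetical names.

```python
import math
import random

def add_dropout(matrix, rate, seed=0):
    """Zero out a random fraction `rate` of the nonzero entries of a spots-by-genes matrix.

    Returns the noisy copy and the list of (spot, gene) positions that were zeroed.
    """
    rng = random.Random(seed)              # seeded so the noise is reproducible
    noisy = [list(row) for row in matrix]  # copy so the input stays untouched
    nonzero = [(i, j) for i, row in enumerate(matrix)
               for j, value in enumerate(row) if value != 0]
    masked = rng.sample(nonzero, round(rate * len(nonzero)))
    for i, j in masked:
        noisy[i][j] = 0
    return noisy, masked

def masked_rmse(original, imputed, masked):
    """Root-mean-square error of imputed values, computed only on the masked entries."""
    sq = [(original[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(sq) / len(sq))
```

Running an imputation method on `noisy` and comparing `masked_rmse(original, imputed, masked)` across several dropout rates gives a simple robustness check. The papers' own evaluation metrics may differ.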

Posters

Computational Transcriptomic Analysis Reveals MHC-II Expressing Tumor Cells Influence Immune Surveillance and Prognostic Outcomes in the Tumor Microenvironment of Triple-Negative Breast Cancer

Cui, Y., Zhang, W., Zeng, X., Yang, Y., Park, S. J., & Nakai, K.

1st Asia & Pacific Bioinformatics Joint Conference (APBJC2024), Oct 22-25, 2024, Okinawa, Japan.

Computational Transcriptomic Analysis Identifies a Novel Immune-dysregulated TNBC Subtype Regulated by STAT3

Cui, Y., Nakai, K.

The 47th Annual Meeting of the Molecular Biology Society of Japan (MBSJ2024), Dec 4-7, 2024, Kobe, Japan.

Unveiling a Distinct Immunological Characteristic and a Prognostic Prediction Model in Breast Cancer through Integrated Transcriptomic Analysis

Cui, Y., Nakai, K.

The 21st Awaji International Forum on Infection and Immunity, Sep 3-6, 2023, Karuizawa, Japan.