Scientists Tackle Single-Cell Data’s “Reliability Crisis” with New Tool ‘scICE’

– scICE validates clustering results up to 30× faster, resolving instability in one of biology’s most widely used analysis methods –

The ability to analyze gene expression at the single-cell level — known as single-cell RNA sequencing (scRNA-seq) — has transformed life sciences, driving discoveries across immunology, oncology, and developmental biology. Over 40,000 studies have leveraged this technique to map the complex diversity of cells within tissues and organisms.

Yet beneath this explosive growth lies a persistent problem: clustering instability. When researchers attempt to group cells by expression patterns to identify cell types or disease states, they often face inconsistent results — even when analyzing the same dataset repeatedly.

Inaccurate clustering can lead to misclassifying normal cells as cancerous or missing rare but critical cell types — jeopardizing interpretation and therapeutic decisions. This “reliability crisis” forces scientists to rerun analyses or rely on computationally expensive pipelines to extract trustworthy insights.

Now, a research team led by Professor KIM Jae Kyoung of the Korea Advanced Institute of Science and Technology (KAIST) and the Institute for Basic Science (IBS) has developed a solution: a mathematical framework named scICE (single-cell Inconsistency Clustering Estimator).

Traditionally, clustering reliability is assessed by repeating the analysis many times and deriving a consensus from whether each pair of cells is assigned to the same cluster. However, because the number of cell pairs grows quadratically with the number of cells, this approach is computationally demanding and ill-suited for large-scale datasets with tens of thousands of cells.
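To illustrate why a pairwise consensus scales poorly, here is a minimal Python sketch of the idea. It is an illustration only, not code from the study; the function name `co_clustering_frequency` and the toy label lists are hypothetical.

```python
from itertools import combinations


def co_clustering_frequency(runs):
    """Fraction of runs in which each pair of cells shares a cluster.

    `runs` is a list of label lists, one per clustering run; position i in
    every list refers to the same cell.
    """
    n_cells = len(runs[0])
    freq = {}
    # Every unordered pair of cells is visited once: n * (n - 1) / 2 pairs.
    for i, j in combinations(range(n_cells), 2):
        together = sum(1 for labels in runs if labels[i] == labels[j])
        freq[(i, j)] = together / len(runs)
    return freq


# Three hypothetical runs over five cells; cell 2 switches clusters in run 2.
runs = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
freq = co_clustering_frequency(runs)
print(len(freq))     # 10 pairs for 5 cells
print(freq[(0, 1)])  # 1.0: cells 0 and 1 always share a cluster
print(freq[(1, 2)])  # about 0.67: cells 1 and 2 share a cluster in 2 of 3 runs
```

Because every pair must be visited, a dataset of 50,000 cells involves about 1.25 billion pairs for a single consensus.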

In contrast, scICE scales to large datasets because it bypasses pairwise co-clustering altogether. Instead, it uses a mathematically defined Inconsistency Coefficient (IC) to assess the stability of cell assignments directly. This allows the tool to efficiently detect and filter out unreliable assignments, preserving only the most stable and biologically meaningful clusters.
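This release does not give the formula for the IC, so the following Python sketch uses a stand-in score rather than the scICE definition: the inverse of the mean Rand index across repeated clustering runs. Like the IC as described in this release, the stand-in equals 1.0 when every run agrees and rises as runs diverge. The names `rand_index` and `inconsistency_score` and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def rand_index(labels_a, labels_b):
    """Rand index computed from cluster-size counts, with no per-pair loop."""
    n = len(labels_a)
    total = comb(n, 2)
    # Pairs of cells placed together in both labelings.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling on its own.
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return (total + 2 * both - same_a - same_b) / total


def inconsistency_score(runs):
    """Stand-in score (NOT the scICE IC): 1 / mean similarity over all
    ordered pairs of runs, including each run paired with itself."""
    m = len(runs)
    mean_sim = sum(rand_index(a, b) for a in runs for b in runs) / (m * m)
    return 1.0 / mean_sim


stable = [[0, 0, 0, 1, 1]] * 3
unstable = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 1, 0, 1, 0]]
print(inconsistency_score(stable))    # 1.0: all runs agree
print(inconsistency_score(unstable))  # greater than 1.0: runs disagree
```

The similarity here is computed from counts of cluster sizes, so the cost per comparison depends on the number of cells and clusters rather than on the number of cell pairs.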

Dr. KIM Hyun of IBS, the first author of the paper, explained, “Reliability in single-cell clustering has long been overlooked. scICE opens a new path for quickly and easily verifying results.”

The research team validated the effectiveness of scICE by applying it to 48 real and simulated scRNA-seq datasets collected from various tissues, including the brain, lungs, and blood. The results revealed that approximately two-thirds of existing analyses were statistically unstable and unreliable. Meanwhile, scICE efficiently selected only a small number of reliable results, saving researchers' time and computational resources while maintaining high accuracy.

scICE provides a way to validate clustering outcomes mathematically, ensuring higher confidence in conclusions drawn from single-cell data. It has also drawn attention for its ability to detect rare cell types that conventional clustering methods often overlook. In practice, subclustering based on the scICE framework reliably identified rare immune cells that are easily missed in conventional analyses.

Corresponding author Professor KIM Jae Kyoung stated, “scICE will help researchers swiftly pursue follow-up studies based on reliable results. I hope it becomes a standard tool for trustworthy data interpretation across the life sciences.”

The research team made scICE publicly available on [GitHub](https://github.com/Mathbiomed/scICE) for easy access by all users.


Figure 1. Cell clustering can vary wildly depending on algorithm settings like the random seed — even with the exact same data. scICE automatically detects and removes unstable groupings, giving researchers results they can trust.

Figure 2. Overview of the scICE method.
(a) scICE begins by selecting a range of “resolution” values that produce a specific target number of clusters. These resolutions are then tested to find the most stable one.
(b) To measure stability, scICE calculates the Inconsistency Coefficient (IC) for each resolution setting — with an IC of 1.0 representing perfect stability, and higher values indicating greater inconsistency. The resolution with the lowest IC is selected as the most stable for that cluster number.
(c) The selected result is further validated by repeating the clustering multiple times to confirm that its IC remains low and stable across tests.
(d) scICE identifies a “consistent clustering set”: only cluster numbers that consistently show low IC values are accepted as reliable, while unstable results are excluded. This ensures that only the most trustworthy cluster groupings are reported.
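Steps (a), (b), and (d) of this overview can be sketched in Python as a selection over precomputed IC values. This is an illustration under stated assumptions, not the published implementation: the function name, the `threshold` parameter, and all numbers are hypothetical, and the repeated validation in step (c) is omitted.

```python
def consistent_cluster_numbers(ic_by_k, threshold):
    """Pick the lowest-IC resolution per cluster number, then keep stable ones.

    `ic_by_k` maps a cluster number k to a dict {resolution: IC}.
    Returns {k: (best_resolution, best_ic)} for every k whose best IC does
    not exceed `threshold` (a hypothetical cutoff chosen by the caller).
    """
    selected = {}
    for k, ic_by_resolution in ic_by_k.items():
        # Step (b): the resolution with the lowest IC is the most stable.
        best_res = min(ic_by_resolution, key=ic_by_resolution.get)
        best_ic = ic_by_resolution[best_res]
        # Step (d): accept only cluster numbers with a low IC.
        if best_ic <= threshold:
            selected[k] = (best_res, best_ic)
    return selected


# Hypothetical values for illustration only; not results from the paper.
ic_by_k = {
    3: {0.10: 1.00, 0.15: 1.02},
    4: {0.30: 1.08, 0.35: 1.05},
    5: {0.50: 1.01, 0.55: 1.00},
}
result = consistent_cluster_numbers(ic_by_k, threshold=1.01)
print(result)  # {3: (0.1, 1.0), 5: (0.55, 1.0)}
```

In this toy input, four clusters are rejected because even the best resolution has an IC above the cutoff, while three and five clusters are kept.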

Figure 3. Using scICE, the researchers detected rare macrophage subtypes in mouse gonadal white adipose tissue (GWAT) data (top) and resolved critical pneumocyte populations in SARS-CoV-2-infected mouse lung data (bottom).

Notes for editors

- References
Kim, H., Park, I., Park, J.E., Kim, J.K., Seo, M. & Kim, J.K. scICE: Enhancing clustering reliability and efficiency of scRNA-seq data with multi-cluster label consistency evaluation. Nat. Commun. (2025).


- Media Contact
For further information or to request media assistance, please contact William I. Suh at the IBS Public Relations Team (willisuh@ibs.re.kr).


- About the Institute for Basic Science (IBS)
IBS was founded in 2011 by the government of the Republic of Korea with the sole purpose of driving forward the development of basic science in South Korea. IBS has 8 research institutes and 33 research centers as of May 2025. There are nine physics, three mathematics, five chemistry, seven life science, two earth science, and seven interdisciplinary research centers.


