Datasets

The HoneyBee framework provides processed embeddings for large-scale public datasets, including TCGA, for use in a range of oncology applications.

TCGA Dataset

The Cancer Genome Atlas (TCGA) dataset contains over 11,000 samples, spanning 33 cancer types, and includes clinical data, pathology images, radiology images, and molecular data.

Hugging Face Integration

All processed datasets are hosted on the Hugging Face Hub, so researchers can download the precomputed embeddings and integrate them directly into their own workflows.
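The sketch below illustrates what that integration might look like in Python. It is not HoneyBee's own API. `load_embeddings` is a hypothetical thin wrapper around the `datasets` library's `load_dataset`, and the repository ID, config name, and split are placeholders; take the real values from the dataset card on the Hub. The column names `embedding` and `patient_id` are also assumptions. As one illustrative downstream use, the helpers rank records by cosine similarity to a query embedding.

```python
import math
from typing import Iterable, List, Mapping, Optional, Sequence, Tuple


def load_embeddings(repo_id: str, config: Optional[str] = None, split: str = "train"):
    """Load one split of a dataset from the Hugging Face Hub.

    Hypothetical helper: requires the `datasets` package and network access.
    `repo_id`, `config`, and `split` are placeholders, not confirmed HoneyBee values.
    """
    # Imported lazily so the pure-Python helpers below work without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id, name=config, split=split)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare against a zero vector")
    return dot / norm


def nearest_neighbors(
    query: Sequence[float],
    records: Iterable[Mapping],
    k: int = 5,
    embedding_column: str = "embedding",  # assumed column name
    id_column: str = "patient_id",        # assumed column name
) -> List[Tuple[str, float]]:
    """Return the k records most similar to `query` as (id, score) pairs."""
    scored = [
        (rec[id_column], cosine_similarity(query, rec[embedding_column]))
        for rec in records
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because iterating over a `datasets.Dataset` yields one dict per row, the object returned by `load_embeddings` could be passed straight to `nearest_neighbors`, provided the column names match the actual dataset schema.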