Welcome to HoneyBee

A scalable modular framework for building multimodal oncology datasets

Data Acquisition and Integration

Beyond integrating data, HoneyBee applies preprocessing steps that ensure data quality and compatibility across modalities.

Embedding Generation

HoneyBee uses foundation models to generate embeddings from raw medical data. These embeddings support downstream tasks such as similarity search, clustering, and machine learning (ML) model training.
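
As a rough illustration of one downstream task named above, the sketch below ranks stored embeddings by cosine similarity to a query embedding. It does not use HoneyBee's own API: the sample names, the three-dimensional toy vectors, and the helper functions `cosine_similarity` and `top_k` are all hypothetical and chosen only to keep the example self-contained.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real foundation-model embeddings
# typically have hundreds or thousands of dimensions.
embeddings = {
    "patient_a": [1.0, 0.0, 0.0],
    "patient_b": [0.9, 0.1, 0.0],
    "patient_c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

For collections of realistic size, a vectorized or approximate nearest-neighbor index would likely replace this linear scan.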

Data Storage and Accessibility

The generated embeddings are stored using the Hugging Face datasets library, organized in a structured format for easy access and integration into ML pipelines.