Installation

You can install HoneyBee using pip:

pip install honeybee-ml

Alternatively, you can install from source:

git clone https://github.com/lab-rasool/HoneyBee.git
cd HoneyBee
pip install -e .

Dependencies

HoneyBee requires the following dependencies:

  • Python 3.8+
  • PyTorch 1.9+
  • HuggingFace Transformers
  • HuggingFace Datasets
  • OpenSlide (for pathology)
  • PyDicom (for radiology)
  • NumPy, Pandas, Scikit-learn
  • CUDA-compatible GPU (recommended)
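
Before running the examples, it can help to confirm that the listed packages are visible to your interpreter. The following is a minimal sketch using only the standard library. The helper names `missing_modules` and `python_ok` are illustrative and not part of HoneyBee, and the import names (for example `sklearn` for Scikit-learn and `openslide` for the OpenSlide Python bindings) are assumed to be the usual ones for these packages.

```python
import sys
from importlib.util import find_spec

# Import names assumed for the dependencies listed above. "honeybee" is the
# module name used by the import statements in this guide.
REQUIRED_MODULES = [
    "torch", "transformers", "datasets", "openslide", "pydicom",
    "numpy", "pandas", "sklearn", "honeybee",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found (nothing is imported)."""
    return [name for name in names if find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info >= minimum


print("Python version OK:", python_ok())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty "Missing modules" list only shows that the packages can be located. It does not check their versions or whether a GPU is available.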

Basic Usage

Here's a simple example to get you started with HoneyBee:


from honeybee import HoneyBee

# Initialize HoneyBee
hb = HoneyBee()

# Load and process data (example with clinical text)
clinical_text = "Patient presents with stage III non-small cell lung cancer..."
processed_text = hb.process_clinical(clinical_text)

# Generate embeddings
embeddings = hb.generate_embeddings(processed_text, modality="clinical")

# Use embeddings for downstream tasks
# Example: Classification
results = hb.classify(embeddings, task="cancer_type")
print(results)

# Example: Survival analysis
survival_prediction = hb.predict_survival(embeddings)
print(survival_prediction)

Working with Different Modalities

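HoneyBee provides a dedicated processor for each supported data modality: clinical text, pathology images, radiological images, and molecular data.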

Clinical Data


from honeybee.processors import ClinicalProcessor

# Initialize processor
clinical_processor = ClinicalProcessor()

# Process clinical text
processed_text = clinical_processor.process("Patient clinical notes...")

# Generate embeddings
embeddings = clinical_processor.generate_embeddings(processed_text)

Pathology Images


from honeybee.processors import PathologyProcessor

# Initialize processor
pathology_processor = PathologyProcessor()

# Load and process whole slide image
wsi = pathology_processor.load_wsi("path/to/slide.svs")
processed_wsi = pathology_processor.process(wsi)

# Generate embeddings
embeddings = pathology_processor.generate_embeddings(processed_wsi)

Radiological Images


from honeybee.processors import RadiologyProcessor

# Initialize processor
radiology_processor = RadiologyProcessor()

# Load and process radiology image
image = radiology_processor.load_dicom("path/to/dicom_series/")
processed_image = radiology_processor.process(image)

# Generate embeddings
embeddings = radiology_processor.generate_embeddings(processed_image)

Molecular Data


from honeybee.processors import MolecularProcessor

# Initialize processor
molecular_processor = MolecularProcessor()

# Load and process molecular data
molecular_data = molecular_processor.load_data("path/to/gene_expression.csv")
processed_data = molecular_processor.process(molecular_data)

# Generate embeddings
embeddings = molecular_processor.generate_embeddings(processed_data)
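
All four processors above follow the same two-step pattern: `process`, then `generate_embeddings`. The sketch below wraps that pattern in one helper. The `embed` function and the `EchoProcessor` stand-in are hypothetical and not part of HoneyBee. The stand-in exists only so the snippet runs without the library installed, and its "embedding" is a toy.

```python
def embed(processor, data):
    """Run the process -> generate_embeddings sequence on already-loaded data."""
    processed = processor.process(data)
    return processor.generate_embeddings(processed)


class EchoProcessor:
    """Hypothetical stand-in (not a HoneyBee class) that mimics the two method names."""

    def process(self, data):
        # Toy preprocessing: trim whitespace and lowercase the text.
        return data.strip().lower()

    def generate_embeddings(self, processed):
        # Toy "embedding": the length of each whitespace-separated token.
        return [float(len(token)) for token in processed.split()]


vector = embed(EchoProcessor(), "  Patient clinical notes  ")
print(vector)
```

Assuming the real processors keep the method names shown in this guide, the same helper could be called as `embed(clinical_processor, text)` or `embed(pathology_processor, wsi)`, after loading the input with the modality's own loader.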

Multimodal Integration

To analyze a case across modalities, generate embeddings for each modality and combine them into a single representation:


from honeybee import HoneyBee

# Initialize HoneyBee
hb = HoneyBee()

# Generate embeddings for each modality.
# clinical_data, pathology_data, radiology_data, and molecular_data are
# placeholders for inputs you have already loaded and processed, as in the
# per-modality examples above.
clinical_embeddings = hb.generate_embeddings(clinical_data, modality="clinical")
pathology_embeddings = hb.generate_embeddings(pathology_data, modality="pathology")
radiology_embeddings = hb.generate_embeddings(radiology_data, modality="radiology")
molecular_embeddings = hb.generate_embeddings(molecular_data, modality="molecular")

# Integrate embeddings
multimodal_embeddings = hb.integrate_embeddings([
    clinical_embeddings, 
    pathology_embeddings, 
    radiology_embeddings, 
    molecular_embeddings
])

# Use integrated embeddings for downstream tasks
results = hb.predict_survival(multimodal_embeddings)
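
This guide does not say how `integrate_embeddings` combines its inputs. As an illustration of one common fusion strategy, and not necessarily what HoneyBee does, the sketch below L2-normalizes each modality's vector and concatenates the results. The function names and the short example vectors are made up, and each embedding is assumed to be a flat list of floats.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0.0 else [x / norm for x in vector]


def concatenate_embeddings(embeddings):
    """Fuse per-modality vectors by normalizing each one and joining them end to end."""
    fused = []
    for vector in embeddings:
        fused.extend(l2_normalize(vector))
    return fused


# Made-up two- and three-dimensional vectors standing in for real embeddings.
clinical = [3.0, 4.0]
pathology = [0.0, 2.0, 0.0]

fused = concatenate_embeddings([clinical, pathology])
print(len(fused), fused)
```

Normalizing before concatenation keeps a modality with large raw values from dominating the fused vector. Learned fusion, such as attention over modalities, is another option not shown here.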

Next Steps

Now that you understand the basics, you can explore the specific processing pipelines for each modality: