Installation
You can install HoneyBee using pip:
```bash
pip install honeybee-ml
```
Alternatively, you can install from source:
```bash
git clone https://github.com/lab-rasool/HoneyBee.git
cd HoneyBee
pip install -e .
```
Dependencies
HoneyBee requires the following dependencies:
- Python 3.8+
- PyTorch 1.9+
- HuggingFace Transformers
- HuggingFace Datasets
- OpenSlide (for pathology)
- PyDicom (for radiology)
- NumPy, Pandas, Scikit-learn
- CUDA-compatible GPU (recommended)
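Before running the examples, it can help to confirm that the interpreter and the listed dependencies are available. The sketch below uses only the standard library. The import names (`torch`, `sklearn`, `openslide`, and so on) are assumptions about how each package is usually imported. `importlib.util.find_spec` only checks that a module can be located; for example, it does not confirm that the native OpenSlide C library is installed.

```python
import importlib.util
import sys

# Assumed import names for the dependencies listed above. PyPI package names
# can differ (e.g. scikit-learn is imported as "sklearn").
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "HuggingFace Transformers": "transformers",
    "HuggingFace Datasets": "datasets",
    "OpenSlide": "openslide",
    "PyDicom": "pydicom",
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Scikit-learn": "sklearn",
}


def check_environment(min_python=(3, 8)):
    """Report whether the interpreter and each dependency are available."""
    report = {"Python >= %d.%d" % min_python: sys.version_info[:2] >= min_python}
    for label, module_name in REQUIRED_MODULES.items():
        # find_spec returns None when a top-level module cannot be found,
        # without importing it.
        report[label] = importlib.util.find_spec(module_name) is not None
    return report


if __name__ == "__main__":
    for label, ok in check_environment().items():
        print(f"{label:28s} {'ok' if ok else 'MISSING'}")
```

If PyTorch is present, `torch.cuda.is_available()` tells you whether a CUDA-compatible GPU can actually be used.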
Basic Usage
Here's a simple example to get you started with HoneyBee:
```python
from honeybee import HoneyBee

# Initialize HoneyBee
hb = HoneyBee()

# Load and process data (example with clinical text)
clinical_text = "Patient presents with stage III non-small cell lung cancer..."
processed_text = hb.process_clinical(clinical_text)

# Generate embeddings
embeddings = hb.generate_embeddings(processed_text, modality="clinical")

# Use embeddings for downstream tasks
# Example: Classification
results = hb.classify(embeddings, task="cancer_type")
print(results)

# Example: Survival analysis
survival_prediction = hb.predict_survival(embeddings)
print(survival_prediction)
```
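To show what "using embeddings for downstream tasks" can mean, here is a minimal classification sketch. It is not how `hb.classify` works internally, which this guide does not describe. It swaps in a simple nearest-centroid classifier with cosine similarity and assumes embeddings are plain lists of floats. The toy vectors and the `LUAD`/`LUSC` labels are made up for illustration.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(embeddings, labels):
    """Average the embeddings of each class into one centroid vector."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, embedding):
    """Assign the class whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine_similarity(centroids[label], embedding))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings (hypothetical data).
    train = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
    labels = ["LUAD", "LUAD", "LUSC", "LUSC"]
    centroids = fit_centroids(train, labels)
    print(predict(centroids, [0.8, 0.2]))  # LUAD
```

Nearest-centroid is only a baseline. With real embeddings you would more likely train a regularized linear model on a held-out split. The point is that any standard classifier can consume the embedding vectors directly.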
Working with Different Modalities
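HoneyBee provides a separate processor for each supported modality: clinical text, pathology whole-slide images, radiology images, and molecular data.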
Clinical Data
```python
from honeybee.processors import ClinicalProcessor

# Initialize processor
clinical_processor = ClinicalProcessor()

# Process clinical text
processed_text = clinical_processor.process("Patient clinical notes...")

# Generate embeddings
embeddings = clinical_processor.generate_embeddings(processed_text)
```
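Clinical notes are often longer than a transformer encoder accepts in one pass (BERT-style models, for example, take at most 512 tokens). This guide does not say whether `ClinicalProcessor` splits long notes. The sketch below assumes a common workaround: split the note into overlapping windows, embed each window separately, and average the window embeddings. The helpers `chunk_words` and `mean_pool` are hypothetical. Word counts only approximate subword token counts, so a real pipeline should count tokens with the model's own tokenizer.

```python
def chunk_words(text, max_words=256, overlap=32):
    """Split text into overlapping windows of at most max_words words (hypothetical helper)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def mean_pool(vectors):
    """Average equal-length chunk embeddings into one document vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

The overlap keeps a sentence that straddles a window boundary visible in at least one window. Mean pooling is the simplest way to combine the windows, and it weights every window equally.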
Pathology Images
```python
from honeybee.processors import PathologyProcessor

# Initialize processor
pathology_processor = PathologyProcessor()

# Load and process whole slide image
wsi = pathology_processor.load_wsi("path/to/slide.svs")
processed_wsi = pathology_processor.process(wsi)

# Generate embeddings
embeddings = pathology_processor.generate_embeddings(processed_wsi)
```
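Whole-slide images are far too large to embed in one pass, so pipelines typically cut them into fixed-size tiles and discard tiles that are mostly background. This guide does not say how `PathologyProcessor.process` does this. The sketch below is a generic version under that assumption. It computes a tile grid for one slide level and applies a simple brightness heuristic for tissue detection. `tile_coordinates`, `is_tissue`, and their thresholds are hypothetical.

```python
def tile_coordinates(width, height, tile_size=256, stride=None):
    """Return (x, y) top-left corners of full tiles covering a width x height level."""
    stride = stride or tile_size  # non-overlapping tiles by default
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, stride)
        for x in range(0, width - tile_size + 1, stride)
    ]


def is_tissue(gray_pixels, white_threshold=220, min_fraction=0.5):
    """Heuristic: keep a tile if enough grayscale pixels are darker than the near-white background."""
    if not gray_pixels:
        return False
    dark = sum(1 for p in gray_pixels if p < white_threshold)
    return dark / len(gray_pixels) >= min_fraction
```

With OpenSlide you could take `width, height = slide.dimensions` from an `openslide.OpenSlide` object. Then, for each coordinate, read a tile with `slide.read_region((x, y), 0, (256, 256))` and pass `tile.convert("L").getdata()` to `is_tissue`. Partial tiles at the right and bottom edges are dropped here for simplicity.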
Radiological Images
```python
from honeybee.processors import RadiologyProcessor

# Initialize processor
radiology_processor = RadiologyProcessor()

# Load and process radiology image
image = radiology_processor.load_dicom("path/to/dicom_series/")
processed_image = radiology_processor.process(image)

# Generate embeddings
embeddings = radiology_processor.generate_embeddings(processed_image)
```
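For CT, a typical preprocessing step converts stored pixel values to Hounsfield units (HU) with the DICOM Rescale Slope and Rescale Intercept attributes, then applies an intensity window. This guide does not document what `RadiologyProcessor.process` does, so the following is a generic sketch of those two steps on plain lists. A center of 40 HU and width of 400 HU is a commonly used soft-tissue window, chosen here only as an example.

```python
def to_hounsfield(pixels, slope, intercept):
    """Map stored pixel values to Hounsfield units: HU = value * slope + intercept."""
    return [p * slope + intercept for p in pixels]


def apply_window(hu_values, center, width):
    """Clip HU values to [center - width/2, center + width/2] and rescale to [0, 1]."""
    lower = center - width / 2
    upper = center + width / 2
    return [(min(max(v, lower), upper) - lower) / (upper - lower) for v in hu_values]
```

With pydicom, the inputs would come from `ds = pydicom.dcmread(path)`, using `ds.pixel_array`, `ds.RescaleSlope`, and `ds.RescaleIntercept`. In practice you would vectorize these steps with NumPy rather than loop over Python lists.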
Molecular Data
```python
from honeybee.processors import MolecularProcessor

# Initialize processor
molecular_processor = MolecularProcessor()

# Load and process molecular data
molecular_data = molecular_processor.load_data("path/to/gene_expression.csv")
processed_data = molecular_processor.process(molecular_data)

# Generate embeddings
embeddings = molecular_processor.generate_embeddings(processed_data)
```
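Gene expression values are usually heavily skewed, so a common normalization is a log2(x + 1) transform followed by per-gene z-scoring across samples. The sketch below shows that step. It assumes the CSV has one row per sample, a sample ID in the first column, and one column per gene. That layout, the helper names, and the toy values are hypothetical; `MolecularProcessor.load_data` may expect a different format.

```python
import csv
import io
import math
import statistics


def load_expression(csv_file):
    """Read a samples-by-genes table: first column is the sample ID, remaining columns are genes."""
    reader = csv.reader(csv_file)
    header = next(reader)
    genes = header[1:]
    samples, rows = [], []
    for record in reader:
        samples.append(record[0])
        rows.append([float(v) for v in record[1:]])
    return samples, genes, rows


def log_zscore(rows):
    """Apply log2(x + 1), then z-score each gene (column) using the population standard deviation."""
    logged = [[math.log2(v + 1.0) for v in row] for row in rows]
    stats = [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*logged)]
    return [
        # Constant genes (zero spread) are mapped to 0.0 instead of dividing by zero.
        [(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)]
        for row in logged
    ]


if __name__ == "__main__":
    table = io.StringIO("sample,TP53,EGFR\nS1,0,3\nS2,1,7\nS3,3,15\n")  # toy values
    samples, genes, rows = load_expression(table)
    print(genes, log_zscore(rows))
```

Z-scoring per gene puts genes with very different expression ranges on a comparable scale before embedding.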
Multimodal Integration
To combine modalities, generate embeddings for each one and then integrate them into a single multimodal representation:
```python
from honeybee import HoneyBee

# Initialize HoneyBee
hb = HoneyBee()

# clinical_data, pathology_data, radiology_data and molecular_data are
# placeholders for inputs you have already loaded for each modality.

# Generate embeddings for each modality
clinical_embeddings = hb.generate_embeddings(clinical_data, modality="clinical")
pathology_embeddings = hb.generate_embeddings(pathology_data, modality="pathology")
radiology_embeddings = hb.generate_embeddings(radiology_data, modality="radiology")
molecular_embeddings = hb.generate_embeddings(molecular_data, modality="molecular")

# Integrate embeddings
multimodal_embeddings = hb.integrate_embeddings([
    clinical_embeddings,
    pathology_embeddings,
    radiology_embeddings,
    molecular_embeddings,
])

# Use integrated embeddings for downstream tasks
results = hb.predict_survival(multimodal_embeddings)
```
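This guide does not say which fusion strategy `integrate_embeddings` uses. As an illustration, the sketch below shows two simple, generic options on plain lists: concatenation and element-wise averaging. Averaging also skips modalities that are missing for a patient, passed as `None`. Both function names are hypothetical.

```python
def concat_fusion(embeddings):
    """Fuse by concatenation: the output length is the sum of the input lengths."""
    fused = []
    for vec in embeddings:
        fused.extend(vec)
    return fused


def mean_fusion(embeddings):
    """Fuse by element-wise averaging; modalities passed as None (missing) are skipped."""
    available = [vec for vec in embeddings if vec is not None]
    if not available:
        raise ValueError("at least one modality embedding is required")
    dims = {len(vec) for vec in available}
    if len(dims) != 1:
        raise ValueError(f"mean fusion needs equal dimensions, got {sorted(dims)}")
    n = len(available)
    return [sum(values) / n for values in zip(*available)]
```

Concatenation keeps every modality's features separate, but its output size changes if a modality is missing. Averaging tolerates missing modalities but needs all embeddings in a shared dimension, for example after a learned projection.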
Next Steps
Now that you understand the basics, you can explore the specific processing pipelines for each modality: