Overview
The pathology processing pipeline in HoneyBee handles Whole Slide Images (WSIs), which are high-resolution scans of tissue samples. These images present unique computational challenges due to their extreme size (often several gigabytes), multi-resolution pyramid structure, and vendor-specific file formats.
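For context, the multi-resolution pyramid can be inspected directly with a general-purpose WSI reader such as OpenSlide. The sketch below is illustration only, not part of the HoneyBee API, and the file path is a placeholder.
# Illustration only: inspecting a WSI pyramid with OpenSlide (not part of HoneyBee).
import openslide

slide = openslide.OpenSlide("path/to/slide.svs")
print("Vendor:", slide.properties.get(openslide.PROPERTY_NAME_VENDOR))
print("Base dimensions:", slide.dimensions)  # full-resolution (width, height) in pixels
for level in range(slide.level_count):
    print(f"Level {level}: size={slide.level_dimensions[level]}, "
          f"downsample={slide.level_downsamples[level]:.1f}")
slide.close()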

Key Features
- Support for multiple WSI formats (Aperio SVS, Philips TIFF, etc.)
- GPU-accelerated image processing
- Tissue detection and segmentation
- Stain normalization and separation
- Efficient patch extraction
- Multiple embedding models for feature extraction
WSI Loading and Data Management
HoneyBee uses CuImage (from the RAPIDS cuCIM library) for efficient loading and handling of WSIs:
from honeybee.processors import PathologyProcessor
# Initialize the pathology processor
processor = PathologyProcessor()
# Load a whole slide image
wsi = processor.load_wsi("path/to/slide.svs")
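The object returned by load_wsi() wraps the underlying slide. The sketch below shows the raw cuCIM CuImage API that HoneyBee builds on, assuming direct access to cuCIM; the exact attributes exposed by HoneyBee's wrapper may differ.
# Illustration of the underlying cuCIM CuImage API.
import numpy as np
from cucim import CuImage

img = CuImage("path/to/slide.svs")
print(img.resolutions["level_count"])        # number of pyramid levels
print(img.resolutions["level_dimensions"])   # (width, height) per level

# Read a 1024x1024 region from the top-left of level 0 without loading the
# whole slide into memory, then convert it to a NumPy array.
region = img.read_region(location=(0, 0), size=(1024, 1024), level=0)
region_np = np.asarray(region)               # shape: (1024, 1024, 3), dtype uint8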
Tissue Detection and Segmentation
HoneyBee implements two approaches for tissue detection:
1. Otsu-based method
from honeybee.processors import PathologyProcessor
processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")
# Detect tissue using Otsu thresholding
tissue_mask = processor.detect_tissue(wsi, method="otsu")
2. Deep learning-based approach
from honeybee.processors import PathologyProcessor
processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")
# Detect tissue using pretrained model
tissue_mask = processor.detect_tissue(wsi, method="deeplearning")
Stain Normalization
HoneyBee implements three widely used stain normalization methods (Reinhard, Macenko, and Vahadane):
from honeybee.processors import PathologyProcessor
processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")
# Reinhard normalization
normalized_wsi_reinhard = processor.normalize_stain(wsi, method="reinhard")
# Macenko normalization
normalized_wsi_macenko = processor.normalize_stain(wsi, method="macenko")
# Vahadane normalization
normalized_wsi_vahadane = processor.normalize_stain(wsi, method="vahadane")
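Macenko and Vahadane estimate stain vectors from the image itself, while Reinhard matches color statistics in LAB space. A minimal sketch of the Reinhard approach is shown below; normalize_stain() performs this (and the other methods) internally.
# Minimal sketch of Reinhard normalization: match per-channel mean and standard
# deviation to a reference image in LAB color space.
import numpy as np
from skimage.color import lab2rgb, rgb2lab

def reinhard_normalize(source_rgb: np.ndarray, target_rgb: np.ndarray) -> np.ndarray:
    src, tgt = rgb2lab(source_rgb), rgb2lab(target_rgb)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    tgt_mean, tgt_std = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))
    normalized = (src - src_mean) / (src_std + 1e-8) * tgt_std + tgt_mean
    return np.clip(lab2rgb(normalized), 0, 1)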
Stain Separation
HoneyBee implements color deconvolution for stain separation:
from honeybee.processors import PathologyProcessor
processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")
# Separate stains (returns a dictionary of stain components)
stains = processor.separate_stains(wsi)
# Access individual stain components
hematoxylin = stains['hematoxylin']
eosin = stains['eosin']
dab = stains.get('dab') # May be None if no DAB staining present
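For reference, the sketch below performs color deconvolution with scikit-image's built-in H&E+DAB stain matrix; HoneyBee's separate_stains() may use a different or image-estimated stain matrix.
# Sketch of color deconvolution using scikit-image's fixed H&E+DAB stain matrix.
import numpy as np
from skimage.color import hed2rgb, rgb2hed

def separate_hed(rgb: np.ndarray) -> dict:
    hed = rgb2hed(rgb)  # optical-density space, channels: hematoxylin, eosin, DAB
    zeros = np.zeros_like(hed[..., 0])
    return {
        "hematoxylin": hed2rgb(np.stack([hed[..., 0], zeros, zeros], axis=-1)),
        "eosin":       hed2rgb(np.stack([zeros, hed[..., 1], zeros], axis=-1)),
        "dab":         hed2rgb(np.stack([zeros, zeros, hed[..., 2]], axis=-1)),
    }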
Patch Extraction
Extract patches from tissue regions for analysis:
from honeybee.processors import PathologyProcessor
processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")
tissue_mask = processor.detect_tissue(wsi)
# Extract patches
patches = processor.extract_patches(
    wsi,
    tissue_mask,
    patch_size=256,
    overlap=0.2,
    min_tissue_percentage=0.5
)
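Conceptually, patch extraction walks a regular grid whose stride is derived from patch_size and overlap, keeping only windows that meet the tissue-coverage threshold. The sketch below assumes the tissue mask is at the same resolution as the extraction level; the actual return format of extract_patches() may differ.
# Sketch of grid-based patch coordinate selection with a tissue-coverage filter.
import numpy as np

def grid_patch_coords(tissue_mask: np.ndarray, patch_size: int = 256,
                      overlap: float = 0.2, min_tissue_percentage: float = 0.5):
    stride = max(1, int(patch_size * (1 - overlap)))
    coords = []
    h, w = tissue_mask.shape
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            window = tissue_mask[y:y + patch_size, x:x + patch_size]
            if window.mean() >= min_tissue_percentage:  # fraction of tissue pixels
                coords.append((x, y))
    return coords  # (x, y) top-left corners of patches to read from the WSI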
Embedding Generation
Generate embeddings from tissue patches using pretrained models:
from honeybee.processors import PathologyProcessor
processor = PathologyProcessor(model="uni") # Options: uni, remedis
wsi = processor.load_wsi("path/to/slide.svs")
tissue_mask = processor.detect_tissue(wsi)
patches = processor.extract_patches(wsi, tissue_mask)
# Generate embeddings for all patches
embeddings = processor.generate_embeddings(patches)
# Shape: (num_patches, embedding_dim); embedding_dim depends on the model
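Under the hood, embedding generation amounts to batching patches through a pretrained encoder. The sketch below uses a torchvision ResNet-50 purely as a stand-in; the "uni" and "remedis" options load the corresponding pathology foundation models instead.
# Sketch of batched embedding generation with a generic torch encoder
# (ResNet-50 as a stand-in; ImageNet mean/std normalization omitted for brevity).
import numpy as np
import torch
from torchvision.models import ResNet50_Weights, resnet50

encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()  # drop the classifier head, keep 2048-d features
encoder.eval()

def embed_patches(patches: np.ndarray, batch_size: int = 32) -> np.ndarray:
    # patches: (num_patches, H, W, 3) uint8 array
    feats = []
    with torch.no_grad():
        for i in range(0, len(patches), batch_size):
            batch = torch.from_numpy(patches[i:i + batch_size]).permute(0, 3, 1, 2)
            batch = batch.float() / 255.0
            feats.append(encoder(batch).cpu().numpy())
    return np.concatenate(feats, axis=0)  # shape: (num_patches, 2048)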
Complete Example
Full pipeline from WSI loading to embedding generation:
from honeybee.processors import PathologyProcessor
# Initialize processor with specific model
processor = PathologyProcessor(model="uni")
# Load WSI
wsi = processor.load_wsi("path/to/slide.svs")
# Normalize staining
normalized_wsi = processor.normalize_stain(wsi, method="macenko")
# Detect tissue
tissue_mask = processor.detect_tissue(normalized_wsi, method="deeplearning")
# Extract patches
patches = processor.extract_patches(
    normalized_wsi,
    tissue_mask,
    patch_size=256,
    overlap=0,
    min_tissue_percentage=0.7
)
# Generate embeddings
embeddings = processor.generate_embeddings(patches)
# Aggregate patch-level embeddings to slide-level
slide_embedding = processor.aggregate_embeddings(embeddings, method="mean")
# Use for downstream tasks
# ...
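Aggregation with method="mean" is assumed to correspond to simple mean pooling over patch embeddings; a minimal sketch of mean and max pooling:
# Simple strategies for pooling patch-level embeddings into one slide-level vector.
import numpy as np

def aggregate(embeddings: np.ndarray, method: str = "mean") -> np.ndarray:
    # embeddings: (num_patches, embedding_dim)
    if method == "mean":
        return embeddings.mean(axis=0)
    if method == "max":
        return embeddings.max(axis=0)
    raise ValueError(f"Unknown aggregation method: {method}")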
Performance Considerations
When processing large WSIs, consider the following:
- Leverage GPU acceleration when available
- Use an appropriate magnification level for analysis (typically 20x or 40x)
- Process slides in batches to manage memory usage
- Use multi-threading for patch extraction and processing
- Consider downsampling for initial tissue detection before high-resolution analysis (the last two points are sketched below)
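A minimal sketch combining the last two points: detect tissue on a heavily downsampled pyramid level, then read only tissue-bearing regions at full resolution in parallel. OpenSlide is used here as an illustrative reader, and the coordinates are placeholders.
# Sketch: low-resolution tissue detection followed by threaded high-resolution reads.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import openslide

slide = openslide.OpenSlide("path/to/slide.svs")

# 1. Cheap tissue detection on the lowest-resolution pyramid level.
low_level = slide.level_count - 1
thumb = slide.read_region((0, 0), low_level, slide.level_dimensions[low_level])
thumb = np.asarray(thumb.convert("RGB"))
# ... run Otsu-based detection on thumb and map tissue coordinates back to level 0 ...

# 2. Read only tissue-bearing patches at full resolution, using a thread pool.
def read_patch(xy, size=256, level=0):
    return np.asarray(slide.read_region(xy, level, (size, size)).convert("RGB"))

coords = [(0, 0), (256, 0)]  # placeholder: level-0 coordinates of tissue patches
with ThreadPoolExecutor(max_workers=8) as pool:
    patches = list(pool.map(read_patch, coords))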
References
- UNI Model: https://arxiv.org/abs/2310.05694
- REMEDIS: https://arxiv.org/abs/2308.16184
- Stain Normalization Methods: https://bio-medical.github.io/staintools/