Overview

The pathology processing pipeline in HoneyBee handles Whole Slide Images (WSIs), which are high-resolution scans of tissue samples. These images present unique computational challenges due to their extreme size (often several gigabytes), multi-resolution pyramid structure, and vendor-specific file formats.

[Figure: Whole Slide Image Processing Pipeline]

Key Features

  • Support for multiple WSI formats (Aperio SVS, Philips TIFF, etc.)
  • GPU-accelerated image processing
  • Tissue detection and segmentation
  • Stain normalization and separation
  • Efficient patch extraction
  • Multiple embedding models for feature extraction

WSI Loading and Data Management

HoneyBee uses CuImage (from NVIDIA's cuCIM library) to load and handle WSIs efficiently:

from honeybee.processors import PathologyProcessor

# Initialize the pathology processor
processor = PathologyProcessor()

# Load a whole slide image
wsi = processor.load_wsi("path/to/slide.svs")

Tissue Detection and Segmentation

HoneyBee implements two approaches for tissue detection:

1. Otsu-based method

from honeybee.processors import PathologyProcessor

processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")

# Detect tissue using Otsu thresholding
tissue_mask = processor.detect_tissue(wsi, method="otsu")

2. Deep learning-based approach

from honeybee.processors import PathologyProcessor

processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")

# Detect tissue using a pretrained model
tissue_mask = processor.detect_tissue(wsi, method="deeplearning")
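
To make the Otsu-based method above more concrete, here is a minimal pure-Python sketch of Otsu thresholding on a flat list of 8-bit grayscale values. It is not HoneyBee's implementation: the function names `otsu_threshold` and `tissue_mask_from_gray` are hypothetical, and the sketch assumes brightfield slides where the background is brighter than tissue.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count0 = 0  # number of pixels at or below t
    sum0 = 0    # intensity sum of those pixels
    for t in range(levels):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0:
            continue  # lower class still empty
        if count1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / count0
        mean1 = (sum_all - sum0) / count1
        between = count0 * count1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def tissue_mask_from_gray(pixels):
    """Assumption: tissue is darker than the (near-white) background."""
    t = otsu_threshold(pixels)
    return [p <= t for p in pixels]
```

In practice this kind of thresholding is usually run on a low-resolution thumbnail of the slide rather than on the full-resolution image.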

Stain Normalization

HoneyBee implements three stain normalization methods: Reinhard, Macenko, and Vahadane.

from honeybee.processors import PathologyProcessor

processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")

# Reinhard normalization
normalized_wsi_reinhard = processor.normalize_stain(wsi, method="reinhard")

# Macenko normalization
normalized_wsi_macenko = processor.normalize_stain(wsi, method="macenko")

# Vahadane normalization
normalized_wsi_vahadane = processor.normalize_stain(wsi, method="vahadane")
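
As a rough illustration of the idea behind Reinhard normalization, the sketch below shifts and scales one color channel so that its mean and standard deviation match those of a reference image. Reinhard's method applies this per channel in a perceptual color space; the color-space conversion is omitted here, and `match_channel_stats` is a hypothetical helper, not part of HoneyBee.

```python
from statistics import fmean, pstdev


def match_channel_stats(source, target_mean, target_std):
    """Shift and scale one channel so its mean/std match the target's.

    `source` is a flat list of channel values. In Reinhard normalization
    this step is applied to each channel after converting to a perceptual
    color space (conversion not shown in this sketch).
    """
    src_mean = fmean(source)
    src_std = pstdev(source)
    if src_std == 0:
        # A constant channel carries no contrast to rescale.
        return [target_mean for _ in source]
    scale = target_std / src_std
    return [(v - src_mean) * scale + target_mean for v in source]
```

The target mean and standard deviation would normally be computed once from a reference slide and reused for every slide in a cohort.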

Stain Separation

HoneyBee implements color deconvolution for stain separation:

from honeybee.processors import PathologyProcessor

processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")

# Separate stains (returns a dictionary of stain components)
stains = processor.separate_stains(wsi)

# Access individual stain components
hematoxylin = stains['hematoxylin']
eosin = stains['eosin']
dab = stains.get('dab')  # May be None if no DAB staining present

Patch Extraction

Extract patches from tissue regions for analysis:

from honeybee.processors import PathologyProcessor

processor = PathologyProcessor()
wsi = processor.load_wsi("path/to/slide.svs")
tissue_mask = processor.detect_tissue(wsi)

# Extract patches
patches = processor.extract_patches(
    wsi,
    tissue_mask,
    patch_size=256,
    overlap=0.2,
    min_tissue_percentage=0.5
)
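
To show how the `patch_size`, `overlap`, and `min_tissue_percentage` parameters might interact, here is a small sketch of grid-based patch selection over a binary tissue mask. It assumes that `overlap` is a fraction of the patch size, that `min_tissue_percentage` is a fraction between 0 and 1, and that the mask has the same resolution as the patch grid. The function names are hypothetical and HoneyBee's internals may differ.

```python
def patch_origins(width, height, patch_size=256, overlap=0.2):
    """Top-left corners of patches laid out on a regular grid."""
    stride = max(1, int(patch_size * (1 - overlap)))
    xs = range(0, width - patch_size + 1, stride)
    ys = range(0, height - patch_size + 1, stride)
    return [(x, y) for y in ys for x in xs]


def tissue_fraction(mask, x, y, patch_size):
    """Fraction of mask pixels inside the patch that are marked as tissue."""
    rows = mask[y:y + patch_size]
    covered = sum(sum(row[x:x + patch_size]) for row in rows)
    return covered / (patch_size * patch_size)


def select_patches(mask, patch_size=256, overlap=0.2, min_tissue_percentage=0.5):
    """Keep only patch origins whose tissue coverage meets the threshold."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    return [
        (x, y)
        for x, y in patch_origins(width, height, patch_size, overlap)
        if tissue_fraction(mask, x, y, patch_size) >= min_tissue_percentage
    ]
```

With `patch_size=256` and `overlap=0.2`, the stride under these assumptions is 204 pixels, so neighboring patches share roughly a fifth of their width.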

Embedding Generation

Generate embeddings from tissue patches using pretrained models:

from honeybee.processors import PathologyProcessor

processor = PathologyProcessor(model="uni")  # Options: "uni", "remedis"
wsi = processor.load_wsi("path/to/slide.svs")
tissue_mask = processor.detect_tissue(wsi)
patches = processor.extract_patches(wsi, tissue_mask)

# Generate embeddings for all patches
embeddings = processor.generate_embeddings(patches)

# Shape: (num_patches, embedding_dim)  # embedding_dim depends on the model

Complete Example

Full pipeline from WSI loading to embedding generation:

from honeybee.processors import PathologyProcessor

# Initialize processor with specific model
processor = PathologyProcessor(model="uni")

# Load WSI
wsi = processor.load_wsi("path/to/slide.svs")

# Normalize staining
normalized_wsi = processor.normalize_stain(wsi, method="macenko")

# Detect tissue
tissue_mask = processor.detect_tissue(normalized_wsi, method="deeplearning")

# Extract patches
patches = processor.extract_patches(
    normalized_wsi,
    tissue_mask,
    patch_size=256,
    overlap=0,
    min_tissue_percentage=0.7
)

# Generate embeddings
embeddings = processor.generate_embeddings(patches)

# Aggregate patch-level embeddings to slide-level
slide_embedding = processor.aggregate_embeddings(embeddings, method="mean")

# Use for downstream tasks
# ...
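
The mean aggregation step in the example above can be pictured as an element-wise average over the patch embeddings. This is a minimal sketch with plain Python lists; `mean_pool` is a hypothetical name, and the real `aggregate_embeddings` may operate on arrays or tensors instead.

```python
def mean_pool(embeddings):
    """Average patch embeddings element-wise into one slide-level vector.

    `embeddings` is a list of equal-length lists of floats with shape
    (num_patches, embedding_dim).
    """
    if not embeddings:
        raise ValueError("need at least one patch embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]
```

Mean pooling weights every patch equally, which keeps the slide-level vector the same size regardless of how many patches a slide yields.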

Performance Considerations

When processing large WSIs, consider the following:

  • Leverage GPU acceleration when available
  • Use an appropriate magnification level for the analysis (typically 20x or 40x)
  • Process slides in batches to manage memory usage
  • Use multi-threading for patch extraction and processing
  • Consider downsampling for initial tissue detection before high-resolution analysis
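
One simple way to bound memory when working through many patches or slides is to consume them in fixed-size batches. The generator below is a generic sketch, not a HoneyBee API; `batched` is a hypothetical helper name.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each batch can then be passed to the embedding step on its own, so that only one batch of patches needs to be held in memory at a time.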
