Overview

The pathology processing pipeline in HoneyBee handles Whole Slide Images (WSIs), which are high-resolution scans of tissue samples. These images present unique computational challenges due to their extreme size (often several gigabytes), multi-resolution pyramid structure, and vendor-specific file formats. HoneyBee uses a four-class modular design (Slide, PatchExtractor, Patches, and PathologyProcessor) so each stage can be used independently or composed into end-to-end pipelines.

Whole Slide Image Processing Pipeline

Key Features

  • Support for multiple WSI formats (Aperio SVS, Philips TIFF, and more)
  • Auto-detecting backend: CuCIM (GPU-accelerated) with OpenSlide fallback
  • Deep-learning and classical tissue detection (Otsu, HSV, gradient)
  • Stain normalization (Reinhard, Macenko, Vahadane) and H&E stain separation
  • Grid-based patch extraction with tissue filtering and quality scoring
  • 8 foundation model presets (UNI, UNI2, Virchow2, H-optimus, GigaPath, Phikon-v2, MedSigLIP, REMEDIS)
  • Slide-level aggregation (mean, max, median, std, concat)
  • Built-in visualizations: tissue masks, patch galleries, quality distributions, UMAP feature maps

Quick Start

After installing HoneyBee with its pathology dependencies, download a sample slide from HuggingFace:

quickstart.py
import torch
from huggingface_hub import hf_hub_download

from honeybee.loaders.Slide.slide import Slide
from honeybee.processors import PathologyProcessor
from honeybee.processors.wsi import PatchExtractor

# Download a sample WSI from HuggingFace
SLIDE_PATH = hf_hub_download(
    repo_id="Lab-Rasool/honeybee-samples",
    filename="sample.svs",
    repo_type="dataset",
)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

Slide Loading

The Slide class auto-detects the best available backend (CuCIM for GPU acceleration, OpenSlide as fallback) and provides a unified API for reading WSI files. Use slide.info to inspect metadata and slide.dimensions for the full-resolution size.

slide_loading.py
slide = Slide(SLIDE_PATH)

# Slide metadata (backend, dimensions, level count, magnification, etc.)
print(slide.info)

# Full-resolution dimensions (width, height)
print(slide.dimensions)  # e.g. (27965, 25146)
Output
{
  "path": "sample.svs",
  "backend": "cucim",
  "dimensions": [27965, 25146],
  "level_count": 3,
  "level_dimensions": [[27965, 25146], [6991, 6286], [1747, 1571]],
  "level_downsamples": [1.0, 4.0, 16.0],
  "magnification": null,
  "mpp": 1.0,
  "vendor": null
}

Thumbnails and Region Reading

get_thumbnail() returns a downsampled overview of the entire slide. read_region() reads pixels at a specific level-0 location and size, returning an RGB NumPy array.

thumbnails_regions.py
# Downsampled overview
thumbnail = slide.get_thumbnail(size=(512, 512))

# Read a 1024x1024 region from the center of the slide
cx, cy = slide.dimensions[0] // 2, slide.dimensions[1] // 2
region = slide.read_region(
    location=(cx - 512, cy - 512),
    size=(1024, 1024),
    level=0,
)
Slide thumbnail and center region side-by-side

Tissue Detection

HoneyBee provides both deep-learning and classical approaches for tissue detection. Results are stored on the slide object as slide.tissue_mask and slide.prediction_map.

Deep Learning Detection

Uses a pretrained DenseNet121 model for fine-grained tissue segmentation. The patch_size parameter controls tile resolution — smaller patches yield finer masks at the cost of more inference passes.

tissue_dl.py
slide.detect_tissue(
    method="dl",
    device=DEVICE,
    patch_size=64,
    thumbnail_size=(4096, 4096),
)

print(f"Tissue mask: {slide.tissue_mask.shape}")
print(f"Tissue ratio: {slide.tissue_mask.mean():.2%}")
print(f"Prediction map: {slide.prediction_map.shape}")

# Visualize the detection result
slide.plot_tissue_detection()
Output
Tissue mask: (3683, 4095)
Tissue ratio: 29.48%
Prediction map: (24, 27, 3)
Deep learning tissue detection visualization

Classical Methods

Three classical approaches are available: Otsu thresholding ("otsu"), HSV color filtering ("hsv"), and their combination ("otsu_hsv").

tissue_classical.py
# Otsu thresholding
slide.detect_tissue(method="otsu")

# HSV color filtering
slide.detect_tissue(method="hsv")

# Combined Otsu + HSV
slide.detect_tissue(method="otsu_hsv")

Method Comparison

Compare detection methods side-by-side to choose the best fit for your data:

slide.compare_tissue_methods(["dl", "otsu", "hsv", "otsu_hsv"])
Comparison of dl, otsu, hsv, and otsu_hsv tissue detection methods

Patch Extraction

PatchExtractor performs grid-based extraction using the slide's tissue mask to filter out background tiles. Configure patch size, stride, and minimum tissue ratio to control density.

Grid Preview

Visualize the extraction grid over the tissue mask before committing to pixel reads:

grid_preview.py
extractor = PatchExtractor(
    patch_size=256,
    stride=256,
    min_tissue_ratio=0.5,
)

# Preview the grid overlay on the tissue mask
extractor.plot_grid_preview(slide)
Extraction grid overlaid on tissue mask

Extract and Inspect

extract() returns a Patches container holding image arrays and coordinates. Use built-in visualizations to inspect results.

extract_patches.py
patches = extractor.extract(slide)

print(f"Extracted {len(patches)} patches")
print(f"Images shape: {patches.images.shape}")
print(f"Coordinates shape: {patches.coordinates.shape}")

# Gallery of extracted patches
patches.plot_gallery(cols=8, max_patches=64)

# Patch locations overlaid on the slide thumbnail
patches.plot_on_slide(slide)
Output
Extracted 2993 patches
Images shape: (2993, 256, 256, 3)
Coordinates shape: (2993, 4)
Gallery of extracted patches
Patch locations overlaid on slide thumbnail

Quality Filtering

Quality scores combine tissue ratio, color variance, and edge content. Use plot_quality_distribution() to choose a threshold, then filter() to discard low-quality patches. The Patches container is immutable — filtering returns a new instance.

quality_filter.py
# Visualize quality score distribution with a threshold line
patches.plot_quality_distribution(threshold=0.7)

# Filter patches (returns a new Patches instance)
good_patches = patches.filter(min_quality=0.7)
print(f"Filtered: {len(patches)} -> {len(good_patches)} patches")
Output
Filtered: 2993 -> 249 patches
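
The exact scoring formula is internal to HoneyBee; as a rough illustration of the three ingredients named above, the sketch below combines a saturation-based tissue ratio, a color-variance term, and a Laplacian edge term into a single score. The weights and scaling constants are illustrative assumptions, not the library's implementation.

quality_sketch.py
import numpy as np
from skimage.color import rgb2gray, rgb2hsv
from skimage.filters import laplace

def illustrative_quality(patch: np.ndarray) -> float:
    """Rough stand-in for a patch quality score (not HoneyBee's formula)."""
    rgb = patch.astype(np.float32) / 255.0
    # Tissue ratio: fraction of pixels with noticeable saturation
    tissue_ratio = float((rgb2hsv(rgb)[..., 1] > 0.1).mean())
    # Color variance: spread of pixel values, scaled to roughly [0, 1]
    color_variance = float(np.clip(rgb.std() * 4.0, 0.0, 1.0))
    # Edge content: mean absolute Laplacian response, scaled to roughly [0, 1]
    edge_content = float(np.clip(np.abs(laplace(rgb2gray(rgb))).mean() * 20.0, 0.0, 1.0))
    return (tissue_ratio + color_variance + edge_content) / 3.0

print(f"Quality of first patch: {illustrative_quality(patches.images[0]):.2f}")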

Multi-Resolution Extraction

For large slides, run tissue detection at low magnification to build a coarse spatial grid, then pass it to a high-resolution extractor via the tissue_coordinates parameter. This avoids re-running detection at full resolution.

multi_resolution.py
# Low-res tissue grid: small patches at coarse magnification
lowres_extractor = PatchExtractor(
    patch_size=16, stride=16, magnification=5.0, min_tissue_ratio=0.3
)
tissue_grid = lowres_extractor.get_coordinates(slide)
print(f"Low-res tissue grid: {len(tissue_grid)} tiles")

# High-res extraction using the tissue grid as a spatial filter
hires_extractor = PatchExtractor(
    patch_size=256, stride=256, magnification=20.0, min_tissue_ratio=0.5
)
hires_patches = hires_extractor.extract(slide, tissue_coordinates=tissue_grid)
print(f"High-res patches: {len(hires_patches)}")
Output
Low-res tissue grid: 809280 tiles (16px @ ~5x)
High-res patches: 3188 (256px @ ~20x)
Tissue filter: tissue_coordinates
Tissue coordinates used: 809280
High-resolution extraction grid preview

Stain Normalization

Stain normalization reduces color variability across slides from different scanners and labs. Three methods are available: Reinhard, Macenko, and Vahadane. All stain operations on Patches are immutable and return new instances.

stain_normalization.py
# Compare all normalization methods side-by-side on a single patch
good_patches.plot_normalization_comparison()

# Apply Macenko normalization (returns a new Patches instance)
normalized = good_patches.normalize(method="macenko")

# Before/after visualization
good_patches.plot_normalization_before_after(normalized)
Output
Normalized 249 patches
Reinhard, Macenko, and Vahadane normalization comparison
Before and after Macenko stain normalization
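
To make the idea concrete, the sketch below implements the simplest of the three methods, Reinhard normalization, directly with scikit-image: each LAB channel of a source patch is shifted and scaled to match the mean and standard deviation of a reference patch. This is a minimal illustration of the underlying technique, not HoneyBee's implementation.

reinhard_sketch.py
import numpy as np
from skimage.color import lab2rgb, rgb2lab

def reinhard_normalize(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Match the LAB mean/std of `source` to `reference` (Reinhard et al., 2001)."""
    src, ref = rgb2lab(source), rgb2lab(reference)
    # Per-channel statistics in LAB space
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    # Shift and scale each channel, then convert back to RGB
    matched = (src - src_mean) / (src_std + 1e-8) * ref_std + ref_mean
    return (np.clip(lab2rgb(matched), 0.0, 1.0) * 255).astype(np.uint8)

# Normalize one extracted patch against another patch used as the reference
example = reinhard_normalize(good_patches.images[0], good_patches.images[1])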

H&E Stain Separation

Deconvolve patches into hematoxylin, eosin, and background channels using color deconvolution:

patches.plot_stain_separation()
H&E stain separation into hematoxylin, eosin, and background channels
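
Under the hood this is standard color deconvolution. As a minimal, library-agnostic illustration, scikit-image's rgb2hed separates an RGB patch into hematoxylin, eosin, and DAB/residual components; HoneyBee's own implementation and channel naming may differ.

stain_separation_sketch.py
from skimage.color import rgb2hed

# Deconvolve one patch into stain channels (illustrative; not HoneyBee's API)
hed = rgb2hed(patches.images[0])  # (256, 256, 3) float array
hematoxylin, eosin, residual = hed[..., 0], hed[..., 1], hed[..., 2]
print(f"Hematoxylin intensity range: {hematoxylin.min():.3f} to {hematoxylin.max():.3f}")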

Embedding Generation

PathologyProcessor wraps the model registry to generate patch-level embeddings from any supported foundation model. Pass a Patches object directly to generate_embeddings().

Available Models

HoneyBee ships with 8 preset foundation models. Use list_models() to see all available presets, or pass any HuggingFace / timm model ID with an explicit provider.

Alias       Embedding Dim  Provider     Description
uni         1024           timm         UNI ViT-L/16 pathology foundation model (MahmoodLab)
uni2        1536           timm         UNI2-h ViT-H/14 pathology foundation model (MahmoodLab)
virchow2    2560           timm         Virchow2 ViT-H/14 pathology model (Paige AI) - cls+mean pooling
h-optimus   1536           timm         H-optimus-0 pathology foundation model (Bioptimus)
gigapath    1536           timm         Prov-GigaPath DINOv2-based pathology model
phikon-v2   1024           huggingface  Phikon-v2 pathology foundation model (Owkin)
medsiglip   1152           huggingface  MedSigLIP medical image-text model (Google) - 448x448
remedis     2048           onnx         REMEDIS CXR model (Google) - requires ONNX model_path
list_models.py
from honeybee.models.registry import list_models

# List all registered model presets
for m in list_models():
    print(f"  {m['alias']:>12s}  dim={m['embedding_dim']:>4d}  provider={m['provider']}")
Output
      gigapath  dim=1536  provider=timm
     h-optimus  dim=1536  provider=timm
     medsiglip  dim=1152  provider=huggingface
     phikon-v2  dim=1024  provider=huggingface
       remedis  dim=2048  provider=onnx
           uni  dim=1024  provider=timm
          uni2  dim=1536  provider=timm
      virchow2  dim=2560  provider=timm
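
The snippet below sketches what passing a non-preset model ID might look like; the provider keyword and its accepted values are assumptions based on the description above, not confirmed API.

custom_model.py
# Hypothetical sketch only: the exact parameter name for the provider is an assumption
custom_processor = PathologyProcessor(
    model="some-org/some-vit-model",  # any HuggingFace / timm model ID
    provider="huggingface",           # assumed keyword; check the registry docs
)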

Generate Embeddings

Initialize PathologyProcessor with a model alias, then call generate_embeddings() with a Patches object:

generate_embeddings.py
processor = PathologyProcessor(model="uni2")

# Inspect model configuration
info = processor.get_model_info()
print(f"Model: {info['alias']}, dim: {info['embedding_dim']}")

# Generate patch-level embeddings
embeddings = processor.generate_embeddings(
    patches,
    batch_size=32,
    progress=True,
)
print(f"Embeddings shape: {embeddings.shape}")  # (num_patches, embedding_dim)
Output
Model: uni2, dim: 1536
Embeddings shape: (2993, 1536)

Slide-Level Aggregation

Aggregate patch-level embeddings into a single slide-level representation using one of five methods:

aggregation.py
# Available methods: mean, max, median, std, concat
for method in ["mean", "max", "median", "std", "concat"]:
    agg = processor.aggregate_embeddings(embeddings, method=method)
    print(f"  {method:>8s}: shape={agg.shape}")
Output
      mean: shape=(1536,)
       max: shape=(1536,)
    median: shape=(1536,)
       std: shape=(1536,)
    concat: shape=(3072,)
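
For intuition, these aggregations are per-dimension statistics over the patch axis. The NumPy sketch below reproduces the shapes shown above; judging by the 3072-dimensional result, concat appears to stack two such statistics (for example mean and std), though the exact pair is an assumption rather than documented behavior.

aggregation_sketch.py
import numpy as np

emb = np.asarray(embeddings)                        # (num_patches, 1536)
mean_vec = emb.mean(axis=0)                         # (1536,)
max_vec = emb.max(axis=0)                           # (1536,)
median_vec = np.median(emb, axis=0)                 # (1536,)
std_vec = emb.std(axis=0)                           # (1536,)
concat_vec = np.concatenate([mean_vec, std_vec])    # (3072,), assumed composition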

UMAP Feature Maps

Project high-dimensional embeddings to 3D with UMAP, map each dimension to an RGB channel, and overlay on the slide thumbnail. Similar tissue regions receive similar colors.

processor.plot_feature_map(patches, embeddings, slide)
Multi-model UMAP feature map comparison
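
A minimal sketch of the projection step using the umap-learn package directly: reduce the patch embeddings to three components, min-max scale each to [0, 1], and treat the result as a per-patch RGB color. plot_feature_map handles this plus the spatial overlay internally; the scaling choice shown here is illustrative.

umap_rgb_sketch.py
import numpy as np
import umap

# Project patch embeddings to three UMAP components
reducer = umap.UMAP(n_components=3, random_state=42)
coords = reducer.fit_transform(embeddings)       # (num_patches, 3)

# Min-max scale each component to [0, 1] and interpret as RGB per patch
lo, hi = coords.min(axis=0), coords.max(axis=0)
patch_colors = (coords - lo) / (hi - lo + 1e-8)  # (num_patches, 3)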

Complete Pipeline Example

Full end-to-end workflow from WSI loading to slide-level embeddings:

complete_pipeline.py
import torch
from huggingface_hub import hf_hub_download

from honeybee.loaders.Slide.slide import Slide
from honeybee.processors import PathologyProcessor
from honeybee.processors.wsi import PatchExtractor

# 1. Load slide
slide_path = hf_hub_download(
    repo_id="Lab-Rasool/honeybee-samples",
    filename="sample.svs",
    repo_type="dataset",
)
slide = Slide(slide_path)

# 2. Detect tissue
device = "cuda" if torch.cuda.is_available() else "cpu"
slide.detect_tissue(method="dl", device=device, patch_size=64)

# 3. Extract patches
extractor = PatchExtractor(patch_size=256, stride=256, min_tissue_ratio=0.5)
patches = extractor.extract(slide)

# 4. Quality filtering
good_patches = patches.filter(min_quality=0.7)

# 5. Stain normalization
normalized = good_patches.normalize(method="macenko")

# 6. Generate embeddings
processor = PathologyProcessor(model="uni2")
embeddings = processor.generate_embeddings(normalized, batch_size=32, progress=True)

# 7. Slide-level aggregation
slide_embedding = processor.aggregate_embeddings(embeddings, method="mean")
print(f"Slide embedding: {slide_embedding.shape}")

# 8. Visualize
processor.plot_feature_map(normalized, embeddings, slide)

Performance Considerations

When processing large WSIs, consider the following:

  • CuCIM backend: Automatically preferred when available; provides GPU-accelerated slide reading
  • Thumbnail-resolution detection: Run tissue detection on downsampled thumbnails to save time on initial segmentation
  • Multi-resolution extraction: Build a coarse tissue grid at low magnification, then pass it to a high-resolution extractor via tissue_coordinates
  • Batch sizes: Tune batch_size in generate_embeddings() to balance GPU memory and throughput
  • Quality filtering before embedding: Filter out low-quality patches before the expensive embedding step to avoid wasted compute
  • Immutable Patches: filter(), normalize(), and slicing return new Patches instances — the original is never modified