Tables and Figures Detection Using Layout Parser

The complete implementation is available as a Google Colab notebook: Tables and Figures Detection Notebook

Introduction

Layout Parser is a toolkit for Document Layout Analysis that helps detect and extract various elements from documents, including tables, figures, text blocks, and more. It uses deep learning models trained on large datasets like PubLayNet to identify different components in document images.

Installation

Install LayoutParser and its dependencies:

pip install layoutparser
# If you encounter any problem with this installation, refer to readme.md
pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"
pip install "layoutparser[layoutmodels]"
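
After installing, it can help to confirm that the packages are importable before loading a model. The following is a minimal sketch using only the standard library; the helper name check_packages is hypothetical and not part of LayoutParser.

```python
import importlib.util


def check_packages(names=("layoutparser", "detectron2", "PIL")):
    """Return {package_name: bool} telling whether each top-level package can be found."""
    # find_spec returns None when a top-level package is not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If detectron2 is reported as missing, the troubleshooting notes in readme.md are the place to look.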

Components

Model Initialization

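import layoutparser as lp

# table_threshold and figure_threshold must be defined beforehand
# (they are the parameters of process_single_page).
model = lp.Detectron2LayoutModel(
    'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", min(table_threshold, figure_threshold)],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}
)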

For customization:
- Modify label_map so that it names the classes of the model you load (the model itself determines which elements can be detected)
- Adjust thresholds for detection sensitivity
- Use different pre-trained models (e.g., lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config, trained on the PRImA layout dataset)
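
The model initialization passes the lower of the two thresholds to the detector, presumably so that stricter per-type filtering can still be applied afterwards. A minimal sketch of collecting these settings in one place is shown here; build_model_kwargs is a hypothetical helper, and it only builds a dictionary so it runs without LayoutParser installed.

```python
def build_model_kwargs(table_threshold=0.3, figure_threshold=0.8,
                       config_path="lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
                       label_map=None):
    """Collect the model arguments in a dict (hypothetical helper, not a LayoutParser API)."""
    if label_map is None:
        # PubLayNet label map, as used in the model initialization above
        label_map = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}
    # The detector applies a single global score threshold, so use the lowest
    # per-type value here and filter each type more strictly afterwards.
    base_threshold = min(table_threshold, figure_threshold)
    return {
        "config_path": config_path,
        "extra_config": ["MODEL.ROI_HEADS.SCORE_THRESH_TEST", base_threshold],
        "label_map": label_map,
    }


kwargs = build_model_kwargs(table_threshold=0.1, figure_threshold=0.6)
print(kwargs["extra_config"])
```

With LayoutParser installed, the result could then be passed on as lp.Detectron2LayoutModel(**kwargs).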

Block Type Detection

def get_block_type(block):
    """Helper function to safely get block type from layout detection"""
    if hasattr(block, 'type'):
        return block.type

    if hasattr(block, 'label'):
        if isinstance(block.label, str):
            return block.label
        if isinstance(block.label, (int, float)):
            type_mapping = {
                0: 'Text',
                1: 'Title',
                2: 'List',
                3: 'Table',
                4: 'Figure'
            }
            return type_mapping.get(int(block.label), 'Unknown')

    return 'Unknown'

For customization:
- Add new types to type_mapping
- Modify return values for different classification needs
- Add custom type detection logic
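
One way to add new types without editing the function is to pass the mapping in as an argument. The sketch below is a variant of get_block_type, not the original: the name resolve_block_type, the extra "Equation" class, and the SimpleNamespace stand-ins for layout blocks are all assumptions made for illustration.

```python
from types import SimpleNamespace

DEFAULT_TYPES = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}


def resolve_block_type(block, type_mapping=None):
    """Return a type name from block.type or block.label, using a configurable mapping."""
    mapping = DEFAULT_TYPES if type_mapping is None else type_mapping
    for attr in ("type", "label"):
        value = getattr(block, attr, None)
        if isinstance(value, str):
            return value
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return mapping.get(int(value), "Unknown")
    return "Unknown"


# Hypothetical extended mapping with one extra class id
extended = {**DEFAULT_TYPES, 5: "Equation"}

print(resolve_block_type(SimpleNamespace(type="Table")))
print(resolve_block_type(SimpleNamespace(label=5)))
print(resolve_block_type(SimpleNamespace(label=5), extended))
```

Unlike the original helper, this variant falls through to label when type is present but empty.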

Visualization

from PIL import ImageDraw

def create_visualization(image, detected_elements, show_plot=True):
    """Create visualization of detected tables and figures"""
    # Draw on a copy so the original image is left untouched
    viz_image = image.copy()
    draw = ImageDraw.Draw(viz_image)

    # Customize colors and labels for different element types
    element_styles = {
        'tables': {'color': 'red', 'label': 'Table'},
        'figures': {'color': 'green', 'label': 'Figure'}
    }
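
The excerpt above stops after the style table. As a rough sketch of how such styles might be applied with Pillow, the example below draws one outlined box and a label per detection. It assumes detected_elements is a dict keyed like element_styles, with each entry holding a "bbox" tuple of (x1, y1, x2, y2); that structure and the name draw_detections are assumptions, not taken from the notebook.

```python
from PIL import Image, ImageDraw


def draw_detections(image, detected_elements, element_styles):
    """Return a copy of image with a colored box and label for each detection."""
    viz_image = image.copy()
    draw = ImageDraw.Draw(viz_image)
    for key, style in element_styles.items():
        # Assumed structure: detected_elements[key] is a list of dicts with a "bbox" entry
        for index, element in enumerate(detected_elements.get(key, []), start=1):
            x1, y1, x2, y2 = element["bbox"]
            draw.rectangle([x1, y1, x2, y2], outline=style["color"], width=3)
            draw.text((x1 + 4, y1 + 4), f"{style['label']} {index}", fill=style["color"])
    return viz_image


styles = {
    "tables": {"color": "red", "label": "Table"},
    "figures": {"color": "green", "label": "Figure"},
}
# A blank page and made-up boxes stand in for a real document and real detections
page = Image.new("RGB", (200, 300), "white")
detections = {
    "tables": [{"bbox": (20, 30, 180, 120)}],
    "figures": [{"bbox": (20, 150, 180, 280)}],
}
annotated = draw_detections(page, detections, styles)
```

The annotated image can then be saved with annotated.save(...) or displayed in a notebook.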

Detection Processing

def process_single_page(image_path, table_threshold=0.3, figure_threshold=0.8):
    """Process a single page to detect tables and figures"""

Parameters to adjust:
- table_threshold: Lower values detect more tables but may increase false positives
- figure_threshold: Higher values keep only high-confidence figure detections
- Add new threshold parameters to cover more element types
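
To illustrate how per-type thresholds change what is kept, here is a minimal sketch that filters detections by type and score. The Detection class is a stand-in for a real layout block, and filter_by_threshold is a hypothetical helper; the actual body of process_single_page may differ.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Stand-in for a detected layout block (hypothetical, for illustration only)."""
    type: str
    score: float


def filter_by_threshold(detections, thresholds):
    """Keep detections whose score meets the threshold configured for their type."""
    kept = {name: [] for name in thresholds}
    for det in detections:
        limit = thresholds.get(det.type)
        # Types without a configured threshold are dropped
        if limit is not None and det.score >= limit:
            kept[det.type].append(det)
    return kept


# Made-up scores, chosen to show the effect of each setting
sample = [
    Detection("Table", 0.35),
    Detection("Table", 0.20),
    Detection("Figure", 0.75),
    Detection("Figure", 0.92),
    Detection("Text", 0.99),
]

default = filter_by_threshold(sample, {"Table": 0.3, "Figure": 0.8})
lenient = filter_by_threshold(sample, {"Table": 0.1, "Figure": 0.6})
print(len(default["Table"]), len(default["Figure"]))  # 1 1
print(len(lenient["Table"]), len(lenient["Figure"]))  # 2 2
```

Because the thresholds live in a dictionary, supporting another element type only requires adding one more entry.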

Usage Examples

Basic usage with default thresholds:

result = process_single_page("path/to/document.png")

Adjust detection sensitivity:

# More lenient detection
result_lenient = process_single_page(
    "path/to/document.png",
    table_threshold=0.1,
    figure_threshold=0.6
)

# Stricter detection
result_strict = process_single_page(
    "path/to/document.png",
    table_threshold=0.5,
    figure_threshold=0.9
)