GridGulp

Automatically detect and extract tables from Excel, CSV, and text files

What is GridGulp?

GridGulp finds tables in your spreadsheets, even when a sheet holds several tables or a table doesn't start at cell A1. It's designed to handle real-world spreadsheets that don't follow a standard format.

Fast & Lightweight

Pure Python implementation with zero external API dependencies. Process files in milliseconds, not minutes.

Accurate Detection

Solid success rate on real-world spreadsheets using proven heuristics.

Multiple Formats

Supports Excel (.xlsx, .xls, .xlsm), CSV, TSV, and text files with automatic format detection.

Multi-Table Support

Detects multiple tables per sheet, handles merged cells, and identifies hierarchical headers.
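To give a feel for what automatic format detection can involve, here is a minimal sketch that relies only on the standard library. It is not GridGulp's actual logic: the `guess_format` function and its return labels are hypothetical names chosen for illustration. It relies on two well-known facts: .xlsx and .xlsm files are ZIP containers, and legacy .xls files are OLE2 compound documents. Anything else is passed to `csv.Sniffer` to guess the delimiter.

```python
import csv

# Magic bytes for the two container formats used by Excel files.
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx / .xlsm (ZIP container)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls (OLE2 container)


def guess_format(raw: bytes) -> str:
    """Hypothetical helper: guess a file's format from its leading bytes."""
    if raw.startswith(ZIP_MAGIC):
        return "excel-zip"
    if raw.startswith(OLE2_MAGIC):
        return "excel-ole2"

    # Not a known binary container: treat it as delimited text.
    text = raw.decode("utf-8", errors="replace")
    try:
        dialect = csv.Sniffer().sniff(text, delimiters=",\t;|")
    except csv.Error:
        return "text"  # no consistent delimiter found
    return "tsv" if dialect.delimiter == "\t" else "csv"


print(guess_format(b"PK\x03\x04" + b"\x00" * 16))
print(guess_format(b"name,qty,price\nbolt,7,0.25\nnut,9,0.10\n"))
```

Checking content rather than the file extension means a mislabeled file (for example, a CSV saved with an .xls extension) can still be routed to the right parser.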

Quick Start

Installation

pip install gridgulp

Basic Usage

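import asyncio

from gridgulp import GridGulp


async def main():
    # Detect tables in a file
    gg = GridGulp()
    result = await gg.detect_tables("sales_report.xlsx")

    # Report the Excel range of each table, sheet by sheet
    for sheet in result.sheets:
        print(f"{sheet.name}: {len(sheet.tables)} tables found")
        for table in sheet.tables:
            print(f"  - {table.range.excel_range}")


# detect_tables is awaited, so it must run inside an event loop
asyncio.run(main())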

Jupyter Notebook

For Jupyter notebooks, use the synchronous API:

from gridgulp import GridGulp

gg = GridGulp()
result = gg.detect_tables_sync("sales_report.xlsx")

# Extract as pandas DataFrame
for sheet in result.sheets:
    for table in sheet.tables:
        df = gg.extract_dataframe_sync(result.file_data, table)
        print(f"Table shape: {df.shape}")

Key Features

🎯 Smart Table Detection

SimpleCaseDetector: Handles ~80% of spreadsheets with single tables
IslandDetector: Finds multiple disconnected data regions
ExcelMetadataExtractor: Uses native Excel table definitions
Multi-row header detection: Identifies complex hierarchical headers
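The idea of finding "disconnected data regions" can be illustrated with a flood fill over non-empty cells. The sketch below is an assumption about the general technique, not the code behind IslandDetector: `find_islands` and `to_a1` are hypothetical helpers, and the sheet is modeled as a plain list of rows where `None` or `""` marks an empty cell.

```python
from collections import deque


def find_islands(grid):
    """Return (top, left, bottom, right) boxes of connected non-empty regions."""
    rows = len(grid)
    cols = max((len(row) for row in grid), default=0)

    def filled(r, c):
        # Short rows are treated as padded with empty cells.
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # Breadth-first flood fill over the 8 neighbouring cells.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and (nr, nc) not in seen and filled(nr, nc)):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


def to_a1(box):
    """Convert a zero-based bounding box to an Excel-style range string."""
    def col_name(idx):
        name, idx = "", idx + 1
        while idx:
            idx, rem = divmod(idx - 1, 26)
            name = chr(ord("A") + rem) + name
        return name

    top, left, bottom, right = box
    return f"{col_name(left)}{top + 1}:{col_name(right)}{bottom + 1}"


sheet = [
    ["Region", "Sales", None, None, None],
    ["North", 120, None, None, None],
    [None, None, None, None, None],
    [None, None, None, "Item", "Qty"],
    [None, None, None, "Bolt", 7],
]
print([to_a1(box) for box in find_islands(sheet)])  # ['A1:B2', 'D4:E5']
```

A real detector would need more than this: a table containing a blank row or column would be split into two islands here, so some tolerance for small gaps is usually required.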

🚀 Performance

Process most files in under 1 second
Memory-efficient streaming for large files
Configurable performance/accuracy trade-offs
Optimized parsing for Excel and CSV files

🔧 Flexible Configuration

Confidence thresholds
Table size limits
Detection method selection
Custom format analyzers

Why GridGulp?

Most spreadsheet processing tools assume your data is perfectly formatted: one table per sheet, starting at A1, with clean headers. Real-world data is messier:

Multiple tables on one sheet
Tables that don't start at A1
Merged cells and multi-row headers
Mixed data and formatting

GridGulp handles all of these cases automatically, so you can focus on using your data instead of cleaning it.

Next Steps

Getting Started

Install GridGulp and process your first spreadsheet

User Guide

Learn about configuration, detection methods, and advanced features

API Reference

Detailed documentation of all classes and methods

GitHub

View source code, report issues, and contribute