GridGulp

Automatically detect and extract tables from Excel, CSV, and text files

Available on PyPI · Python 3.10+ · License: MIT

What is GridGulp?

GridGulp finds tables in your spreadsheets - even when there are multiple tables on one sheet or when tables don't start at cell A1. It's designed to handle real-world spreadsheets that don't follow a standard format.

  • Fast & Lightweight

    Pure Python implementation that does not depend on any external API. Most files are processed in under a second.

  • Accurate Detection

    Heuristic detection tuned for real-world spreadsheets rather than idealized layouts.

  • Multiple Formats

    Supports Excel (.xlsx, .xls, .xlsm), CSV, TSV, and text files with automatic format detection.

  • Multi-Table Support

    Detects multiple tables per sheet, handles merged cells, and identifies hierarchical headers.
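
To make "merged cells" and "hierarchical headers" concrete, here is a minimal sketch of how stacked header rows can be collapsed into one label per column. It is an illustration only, not GridGulp's implementation: the `flatten_headers` function and its `sep` parameter are hypothetical names, and the sketch assumes a merged cell arrives as a value followed by `None` cells.

```python
def flatten_headers(header_rows, sep=" / "):
    """Combine stacked header rows into one label per column.

    Assumption: a merged cell shows up as a value followed by None,
    so every row except the last is forward-filled left to right.
    """
    filled_rows = []
    last_index = len(header_rows) - 1
    for i, row in enumerate(header_rows):
        filled, last = [], None
        for cell in row:
            text = None if cell is None else str(cell).strip()
            if text:
                last = text
                filled.append(text)
            else:
                # Only parent rows inherit the label to their left.
                filled.append(last if i < last_index else None)
        filled_rows.append(filled)

    labels = []
    for column in zip(*filled_rows):
        parts = []
        for part in column:
            # Skip gaps and avoid repeating a label that spans rows.
            if part is not None and (not parts or parts[-1] != part):
                parts.append(part)
        labels.append(sep.join(parts))
    return labels


header_rows = [
    ["Region", "Q1", None, "Q2", None],
    [None, "Units", "Revenue", "Units", "Revenue"],
]
print(flatten_headers(header_rows))
# ['Region', 'Q1 / Units', 'Q1 / Revenue', 'Q2 / Units', 'Q2 / Revenue']
```

Forward-filling only the parent rows keeps a blank sub-header from silently inheriting its neighbour's name.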

Quick Start

Installation

pip install gridgulp

Basic Usage

import asyncio

from gridgulp import GridGulp


async def main():
    # Detect tables in a file (detect_tables is a coroutine)
    gg = GridGulp()
    result = await gg.detect_tables("sales_report.xlsx")

    # Process results
    for sheet in result.sheets:
        print(f"{sheet.name}: {len(sheet.tables)} tables found")
        for table in sheet.tables:
            print(f"  - {table.range.excel_range}")


# In a plain script, await is only valid inside a coroutine
asyncio.run(main())

Jupyter Notebook

For Jupyter notebooks, use the synchronous API:

from gridgulp import GridGulp

gg = GridGulp()
result = gg.detect_tables_sync("sales_report.xlsx")

# Extract as pandas DataFrame
for sheet in result.sheets:
    for table in sheet.tables:
        df = gg.extract_dataframe_sync(result.file_data, table)
        print(f"Table shape: {df.shape}")

Key Features

🎯 Smart Table Detection

  • SimpleCaseDetector: Handles single-table spreadsheets, which make up ~80% of files
  • IslandDetector: Finds multiple disconnected data regions
  • ExcelMetadataExtractor: Uses native Excel table definitions
  • Multi-row header detection: Identifies complex hierarchical headers
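
The idea of finding "disconnected data regions" can be sketched as a flood fill over non-empty cells. The code below is a generic connected-components illustration and is not taken from GridGulp's IslandDetector. The names `find_islands`, `column_letter`, and `to_a1` are hypothetical, and the sketch assumes the sheet is already loaded as a list of rows with `None` or `""` for empty cells.

```python
from collections import deque


def find_islands(grid):
    """Return (top, left, bottom, right) boxes of connected non-empty cells."""
    rows = len(grid)
    cols = max((len(row) for row in grid), default=0)

    def filled(r, c):
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # Breadth-first flood fill over the 8 neighbouring cells.
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and (nr, nc) not in seen and filled(nr, nc)):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


def column_letter(index):
    """Convert a zero-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_a1(box):
    """Format a zero-based bounding box as an Excel range string."""
    top, left, bottom, right = box
    return f"{column_letter(left)}{top + 1}:{column_letter(right)}{bottom + 1}"


grid = [
    ["Name", "Qty", None, None, None],
    ["Apple", 3, None, None, None],
    [None, None, None, None, None],
    [None, None, None, "City", "Pop"],
    [None, None, None, "Oslo", 700000],
]
print([to_a1(box) for box in find_islands(grid)])
# ['A1:B2', 'D4:E5']
```

A real detector would also need to tolerate sparse tables with blank rows or columns inside them, which a strict flood fill splits into separate islands.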

🚀 Performance

  • Process most files in under 1 second
  • Memory-efficient streaming for large files
  • Configurable performance/accuracy trade-offs
  • Optimized parsing for Excel and CSV files
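
As a rough illustration of what streaming buys for large files, the sketch below locates the occupied area of a CSV in a single pass, holding only one row in memory at a time. It uses the standard library `csv` module and makes no claim about how GridGulp streams data internally; `data_extent` is a hypothetical helper.

```python
import csv
import io


def data_extent(lines, delimiter=","):
    """Stream rows and return (top, left, bottom, right) of non-empty cells.

    Indices are zero-based. Returns None when every cell is blank.
    """
    top = left = bottom = right = None
    for r, row in enumerate(csv.reader(lines, delimiter=delimiter)):
        occupied = [c for c, cell in enumerate(row) if cell.strip()]
        if not occupied:
            continue
        if top is None:
            top = r
        bottom = r
        left = occupied[0] if left is None else min(left, occupied[0])
        right = occupied[-1] if right is None else max(right, occupied[-1])
    return None if top is None else (top, left, bottom, right)


# Any iterable of lines works, including an open file handle.
sample = io.StringIO(",,\n,Name,Qty\n,Apple,3\n,,\n")
print(data_extent(sample))
# (1, 1, 2, 2)
```

Because `csv.reader` pulls lines lazily, the same function can be pointed at an open file of any size without loading it whole.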

🔧 Flexible Configuration

  • Confidence thresholds
  • Table size limits
  • Detection method selection
  • Custom format analyzers
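
To show how a confidence threshold and table size limits interact, here is a self-contained sketch that filters candidate tables. Every name in it (`Candidate`, `Settings`, `select_tables`, and the field names and defaults) is hypothetical and chosen for illustration. It does not reproduce GridGulp's configuration object, so consult the User Guide for the real options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical detection result, not a GridGulp class."""
    excel_range: str
    rows: int
    cols: int
    confidence: float


@dataclass(frozen=True)
class Settings:
    """Hypothetical tuning knobs with illustrative defaults."""
    min_confidence: float = 0.7
    min_rows: int = 2
    min_cols: int = 2


def select_tables(candidates, settings):
    """Keep candidates that pass every limit, most confident first."""
    kept = [
        c for c in candidates
        if c.confidence >= settings.min_confidence
        and c.rows >= settings.min_rows
        and c.cols >= settings.min_cols
    ]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


candidates = [
    Candidate("A1:D20", rows=20, cols=4, confidence=0.95),
    Candidate("F1:F3", rows=3, cols=1, confidence=0.90),    # too narrow
    Candidate("A25:C30", rows=6, cols=3, confidence=0.55),  # low confidence
]

strict = select_tables(candidates, Settings())
lenient = select_tables(candidates, Settings(min_confidence=0.5))
print([c.excel_range for c in strict])   # ['A1:D20']
print([c.excel_range for c in lenient])  # ['A1:D20', 'A25:C30']
```

Lowering the threshold trades precision for recall: more regions are reported, and more of them may be notes or stray cells rather than real tables.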

Why GridGulp?

Most spreadsheet processing tools assume your data is perfectly formatted - one table per sheet, starting at A1, with clean headers. Real-world data is messier:

  • Multiple tables on one sheet
  • Tables that don't start at A1
  • Merged cells and multi-row headers
  • Mixed data and formatting

GridGulp handles all of these cases automatically, so you can focus on using your data instead of cleaning it.

Next Steps

  • Getting Started

    Install GridGulp and process your first spreadsheet

  • User Guide

    Learn about configuration, detection methods, and advanced features

  • API Reference

    Detailed documentation of all classes and methods

  • GitHub

    View source code, report issues, and contribute