GridGulp¶
Automatically detect and extract tables from Excel, CSV, and text files
What is GridGulp?¶
GridGulp finds tables in your spreadsheets - even when there are multiple tables on one sheet or when tables don't start at cell A1. It's designed to handle real-world spreadsheets that don't follow a standard format.
-
Fast & Lightweight
Pure Python implementation with zero external API dependencies. Process files in milliseconds, not minutes.
-
Accurate Detection
Solid success rate on real-world spreadsheets using proven heuristics.
-
Multiple Formats
Supports Excel (
.xlsx,.xls,.xlsm), CSV, TSV, and text files with automatic format detection. -
Multi-Table Support
Detects multiple tables per sheet, handles merged cells, and identifies hierarchical headers.
Quick Start¶
Installation¶
Basic Usage¶
from gridgulp import GridGulp
# Detect tables in a file
gg = GridGulp()
result = await gg.detect_tables("sales_report.xlsx")
# Process results
for sheet in result.sheets:
print(f"{sheet.name}: {len(sheet.tables)} tables found")
for table in sheet.tables:
print(f" - {table.range.excel_range}")
Jupyter Notebook¶
For Jupyter notebooks, use the synchronous API:
from gridgulp import GridGulp
gg = GridGulp()
result = gg.detect_tables_sync("sales_report.xlsx")
# Extract as pandas DataFrame
for sheet in result.sheets:
for table in sheet.tables:
df = gg.extract_dataframe_sync(result.file_data, table)
print(f"Table shape: {df.shape}")
Key Features¶
🎯 Smart Table Detection¶
- SimpleCaseDetector: Handles ~80% of spreadsheets with single tables
- IslandDetector: Finds multiple disconnected data regions
- ExcelMetadataExtractor: Uses native Excel table definitions
- Multi-row header detection: Identifies complex hierarchical headers
🚀 Performance¶
- Process most files in under 1 second
- Memory-efficient streaming for large files
- Configurable performance/accuracy trade-offs
- Optimized parsing for Excel and CSV files
🔧 Flexible Configuration¶
- Confidence thresholds
- Table size limits
- Detection method selection
- Custom format analyzers
Why GridGulp?¶
Most spreadsheet processing tools assume your data is perfectly formatted - one table per sheet, starting at A1, with clean headers. Real-world data is messier:
- Multiple tables on one sheet
- Tables that don't start at A1
- Merged cells and multi-row headers
- Mixed data and formatting
GridGulp handles all of these cases automatically, so you can focus on using your data instead of cleaning it.
Next Steps¶
-
Install GridGulp and process your first spreadsheet
-
Learn about configuration, detection methods, and advanced features
-
Detailed documentation of all classes and methods
-
View source code, report issues, and contribute