Dataframe extractor
gridgulp.extractors.dataframe_extractor
Extract DataFrames from detected table regions with header detection.
HeaderDetectionResult
Bases: BaseModel
Result of header detection analysis.
DataFrameExtractor
Extract pandas DataFrames from sheet data with intelligent header detection.
Initialize extractor.
Parameters:
- min_data_rows (int, default: 2) – Minimum number of data rows (excluding headers) for a valid table
- min_data_density (float, default: 0.3) – Minimum ratio of non-empty cells for a valid table
Source code in src/gridgulp/extractors/dataframe_extractor.py
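The two constructor thresholds can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name `passes_thresholds`, the `header_rows` argument, and the choice to measure density over data rows only are assumptions made for illustration.

```python
def passes_thresholds(rows, header_rows=1, min_data_rows=2, min_data_density=0.3):
    """Hypothetical check mirroring the two documented thresholds.

    Assumption: density is the share of non-empty cells among the data rows;
    the real extractor may compute it differently.
    """
    data = rows[header_rows:]  # drop the header rows before counting
    if len(data) < min_data_rows:
        return False
    cells = [cell for row in data for cell in row]
    if not cells:
        return False
    # Treat None and blank strings as empty cells
    non_empty = sum(1 for c in cells if c is not None and str(c).strip() != "")
    return non_empty / len(cells) >= min_data_density


dense = [["name", "qty"], ["a", 1], ["b", None], ["c", ""]]
too_short = [["name", "qty"], ["a", 1]]
sparse = [["h1", "h2", "h3", "h4"], [1, None, None, None], [None, None, None, None]]

print(passes_thresholds(dense))
print(passes_thresholds(too_short))
print(passes_thresholds(sparse))
```

With the defaults, a region needs at least two data rows below its header and at least 30% of its cells filled to count as a valid table under this reading.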
extract_dataframe
extract_dataframe(sheet_data: SheetData, cell_range: CellRange, detect_headers: bool = True) -> tuple[pd.DataFrame | None, HeaderDetectionResult | None, float]
Extract a DataFrame from the specified range.
Parameters:
- sheet_data (SheetData) – Sheet containing the data
- cell_range (CellRange) – Range to extract
- detect_headers (bool, default: True) – Whether to detect headers automatically
Returns:
- tuple[pd.DataFrame | None, HeaderDetectionResult | None, float] – Tuple of (dataframe, header_info, quality_score). Returns (None, None, 0.0) if extraction fails.
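Because a failed extraction is signalled by `None` values rather than an exception, callers need to check the first element of the tuple before using it. The sketch below shows one way to do that. The helper `describe_extraction` is hypothetical, and stand-in tuples are used in place of a real `extract_dataframe` call so the snippet runs without gridgulp installed.

```python
def describe_extraction(result):
    """Hypothetical helper: summarize an (dataframe, header_info, quality_score) tuple."""
    df, header_info, quality = result
    # Compare against None explicitly: a pandas DataFrame cannot be used
    # directly in a boolean test, so `if not df` would raise an error.
    if df is None:
        return "extraction failed"
    return f"extracted table (quality {quality:.2f})"


# In real use the tuple would come from something like:
#   result = extractor.extract_dataframe(sheet_data, cell_range)
failed = (None, None, 0.0)          # documented failure value
succeeded = (object(), None, 0.85)  # stand-in for a real DataFrame

print(describe_extraction(failed))
print(describe_extraction(succeeded))
```

The explicit `is None` comparison is the important design choice here; the quality score alone is not checked, since the documentation only states that it is 0.0 on failure.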