# textual-fastdatatable
A performance-focused reimplementation of Textual's DataTable widget, with a pluggable data storage backend.

Textual's built-in DataTable widget is beautiful and powerful, but it can be slow to load large datasets.

Here are some benchmarks on my relatively weak laptop. For each benchmark, we initialize a Textual App that
loads a dataset from a parquet file and mounts a data table; it then scrolls around the table
(10 pagedowns and 15 right arrows).

For the built-in table and the others marked "from Records", the data is loaded into memory before the timer
is started; for the "Arrow from Parquet" back-end, the timer is started immediately.

Each cell shows two times: the time to the first paint of the table, and the time after scrolling
is completed (we wait until the table is fully rendered after each scroll):

Records | Built-In DataTable | FastDataTable (Arrow from Parquet) | FastDataTable (Arrow from Records) | FastDataTable (Numpy from Records)
--------|--------|--------|--------|--------
lap_times_100.parquet | 0.019s / 1.716s | 0.012s / 1.724s | 0.011s / 1.700s | 0.011s / 1.688s
lap_times_1000.parquet | 0.103s / 1.931s | 0.011s / 1.859s | 0.011s / 1.799s | 0.015s / 1.848s
lap_times_10000.parquet | 0.977s / 2.824s | 0.013s / 1.834s | 0.016s / 1.812s | 0.078s / 1.869s
lap_times_100000.parquet | 11.773s / 13.770s | 0.025s / 1.790s | 0.156s / 1.824s | 0.567s / 2.347s
lap_times_538121.parquet | 62.960s / 65.760s | 0.077s / 1.803s | 0.379s / 2.234s | 3.324s / 5.031s
wide_10000.parquet | 5.110s / 10.539s | 0.024s / 3.373s | 0.042s / 3.278s | 0.369s / 3.461s
wide_100000.parquet | 51.144s / 56.604s | 0.054s / 3.294s | 0.429s / 3.642s | 3.628s / 6.732s

**NB:** FastDataTable currently does not support rows with a height of more than one line. See below for
more limitations relative to the built-in DataTable.
## Installation
```bash
pip install textual-fastdatatable
```
## Usage
If you already have data in Apache Arrow or another common table format:
```py
from textual_fastdatatable import DataTable
data_table = DataTable(data=my_data)
```
The currently supported types are:
```py
AutoBackendType = Union[
    pa.Table,
    pa.RecordBatch,
    Path,  # to parquet only
    str,  # path to parquet only
    Sequence[Iterable[Any]],
    Mapping[str, Sequence[Any]],
]
```
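The two plain-Python shapes in that list differ in orientation: a sequence of rows versus a mapping of column name to column values. As a rough sketch of how one relates to the other, the row-oriented form can be pivoted into the column-oriented form as below. The helper name `records_to_columns` is hypothetical and not part of this library, and it assumes the first record holds the column labels, as in the `ArrowBackend.from_records` example later in this README.

```python
from typing import Any, Dict, Iterable, List, Sequence


def records_to_columns(records: Sequence[Iterable[Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into a column-oriented mapping.

    Hypothetical helper for illustration; assumes records[0] is the header row.
    """
    rows = [tuple(r) for r in records]
    if not rows:
        return {}
    header, *body = rows
    columns: Dict[str, List[Any]] = {str(label): [] for label in header}
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} values, got {len(row)}")
        for label, value in zip(header, row):
            columns[str(label)].append(value)
    return columns


records = [("col one", "col two"), (1, "a"), (2, "b"), (3, "c"), (4, "d")]
columns = records_to_columns(records)
print(columns)
```

Note that duplicate labels in the header row would collapse into a single key in this sketch, so it is only suitable for data with unique column names.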
To override the column labels and widths supplied by the backend:
```py
from textual_fastdatatable import DataTable
data_table = DataTable(
    data=my_data,
    column_labels=["Supports", "[red]Console[/]", "Markup!"],
    column_widths=[10, 5, None],
)
```
You can also pass in a `backend` manually (if you want more control or want to plug in your own).
```py
from textual_fastdatatable import ArrowBackend, DataTable, create_backend

# let the library create a backend from any supported data type
backend = create_backend(my_data)

# or construct an ArrowBackend directly from an Arrow table
backend = ArrowBackend(my_arrow_table)

# from python dictionary in the form key: col_values
backend = ArrowBackend.from_pydict(
    {
        "col one": [1, 2, 3, 4],
        "col two": ["a", "b", "c", "d"],
    }
)

# from a list of tuples or another sequence of iterables
backend = ArrowBackend.from_records(
    [
        ("col one", "col two"),
        (1, "a"),
        (2, "b"),
        (3, "c"),
        (4, "d"),
    ]
)

# from a path to a Parquet file:
backend = ArrowBackend.from_parquet("path/to/file.parquet")
```
## Limitations and Caveats
The `DataTable` does not currently support rows with a height of more than one line. Only the first line of each row will be displayed.

The `DataTable` does not currently support row labels.

The `ArrowBackend` is optimized to be fast for large, immutable datasets. Mutating the data,
especially adding or removing rows, may be slow.

The `ArrowBackend` cannot be initialized without data; however, the `DataTable` can (either with or without `column_labels`).

The `ArrowBackend` cannot store arbitrary Python objects or Rich Renderables as values. It may widen types to strings unnecessarily.
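
Two of these caveats (single-line rows, and no arbitrary Python objects) can be sidestepped by normalizing values before handing them to the table. Here is a minimal sketch, assuming you are happy to display objects via `str()` and to keep only the first line of multi-line text. `to_cell_value` and `normalize_columns` are hypothetical helper names, not part of textual-fastdatatable, and the set of pass-through types is an assumption rather than a statement about what the backend stores natively.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Types passed through untouched; everything else is converted with str().
# Which types the backend stores natively is an assumption here.
_PASSTHROUGH = (int, float, bool, type(None))


def to_cell_value(value: Any) -> Any:
    """Return a single-line, primitive stand-in for `value`."""
    if isinstance(value, _PASSTHROUGH):
        return value
    text = value if isinstance(value, str) else str(value)
    # Keep only the first line, since taller rows are not displayed anyway.
    lines = text.splitlines()
    return lines[0] if lines else ""


def normalize_columns(columns: Mapping[str, Sequence[Any]]) -> Dict[str, List[Any]]:
    """Apply to_cell_value to every value in a column-oriented mapping."""
    return {name: [to_cell_value(v) for v in values] for name, values in columns.items()}


raw = {"id": [1, 2], "note": ["first line\nsecond line", complex(1, 2)]}
clean = normalize_columns(raw)
print(clean)
```

The result is still a mapping of column names to sequences, which is one of the supported input types listed under Usage.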
## Additional Features
### Copying Data from the Table
`ctrl+c` posts a `SelectionCopied` message containing a list of tuples of the values selected by the cursor. To use it, initialize the table with `cursor_type="range"` in an app that does NOT inherit bindings.
```py
from textual.app import App, ComposeResult
from textual_fastdatatable import ArrowBackend, DataTable


class TableApp(App, inherit_bindings=False):
    BINDINGS = [("ctrl+q", "quit", "Quit")]

    def compose(self) -> ComposeResult:
        backend = ArrowBackend.from_parquet("./tests/data/lap_times_538121.parquet")
        yield DataTable(backend=backend, cursor_type="range")


if __name__ == "__main__":
    app = TableApp()
    app.run()
```
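Once you have the list of tuples carried by the message, a common next step is turning it into text for the system clipboard. How the message exposes that list is not shown in this README, so the sketch below starts from a plain list of tuples. `selection_to_tsv` is a hypothetical helper, and tab-separated output is just one reasonable choice of format.

```python
import csv
import io
from typing import Any, Iterable, Sequence


def selection_to_tsv(values: Iterable[Sequence[Any]]) -> str:
    """Serialize selected rows as tab-separated text, one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in values:
        # Render missing values as empty cells rather than the text "None".
        writer.writerow(["" if v is None else v for v in row])
    return buffer.getvalue()


selected = [(1, "a"), (2, None), (3, "c\td")]
text = selection_to_tsv(selected)
print(text)
```

Using the `csv` module rather than a plain `"\t".join(...)` means values that themselves contain tabs are quoted, so the pasted text keeps its column boundaries.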
### Truncating long values
The `DataTable` calculates column widths automatically. If you set `max_column_content_width` at initialization, values longer than that width are truncated in the display. The full value remains visible in a tooltip on hover, and the full value is always what gets copied to the clipboard.
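
To make the display-versus-full-value distinction concrete, here is a rough standalone sketch of width-limited truncation. It is not the widget's actual implementation, which this README does not show. The function name `truncate_for_display`, the use of an ellipsis character, and counting width in characters rather than terminal cells are all assumptions made for illustration.

```python
from typing import Tuple


def truncate_for_display(value: str, max_width: int, ellipsis: str = "…") -> Tuple[str, str]:
    """Return (display_text, full_text) for a cell limited to max_width characters."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    if len(value) <= max_width:
        return value, value
    # Reserve room for the ellipsis so the result never exceeds max_width.
    keep = max(max_width - len(ellipsis), 0)
    return (value[:keep] + ellipsis)[:max_width], value


shown, full = truncate_for_display("a fairly long cell value", max_width=10)
print(shown)  # what would be drawn in the cell
print(full)   # what a tooltip or a copy operation would use
```

Keeping the full text alongside the shortened one mirrors the behavior described above, where only the on-screen rendering is shortened.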