Orso is not intended to compete with Polars or Pandas (or your favorite bear DataFrame technology); instead, it is developed as a common layer for Mabel and Opteryx.
Key Use Cases:
- In Opteryx, Orso provides most of the database Cursor functionality
- In Mabel, Orso provides the data schema and validation functionality
Orso DataFrames are row-based, a design driven by their initial target use cases as the WAL for Mabel and the Cursor for Opteryx. Each row in an Orso DataFrame can be quickly converted to a tuple of values, a dictionary, or a byte representation.
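To make the row-oriented idea concrete, here is a minimal standard-library sketch of a row that converts to a tuple, a dictionary, and bytes. It is not Orso's implementation: `SketchRow`, `make_row_type`, and the JSON byte encoding are hypothetical names and choices made only for illustration, and Orso's real byte format is likely different.

```python
import json
from typing import Any, Dict, Iterable, Tuple


class SketchRow(tuple):
    """Hypothetical row type: a tuple of values plus shared column names."""

    # Column names live on the class, so every row of one table shares them.
    columns: Tuple[str, ...] = ()

    def as_dict(self) -> Dict[str, Any]:
        # Pair each column name with the value at the same position.
        return dict(zip(self.columns, self))

    def to_bytes(self) -> bytes:
        # JSON is an illustrative encoding only, not Orso's wire format.
        return json.dumps(list(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "SketchRow":
        return cls(json.loads(data.decode("utf-8")))


def make_row_type(columns: Iterable[str]) -> type:
    """Build a row class bound to one set of column names."""
    return type("BoundRow", (SketchRow,), {"columns": tuple(columns)})


PersonRow = make_row_type(["name", "age", "city"])
row = PersonRow(("Alice", 30, "New York"))

as_tuple = tuple(row)                        # plain tuple of values
as_dict = row.as_dict()                      # column name -> value
as_bytes = row.to_bytes()                    # serialized form
restored = PersonRow.from_bytes(as_bytes)    # round trip back to a row
```

Because the row is itself a tuple, the tuple conversion is essentially free, and the dictionary and byte forms are derived only when asked for. That is one plausible reason a row-based layout suits log and cursor workloads, where whole records are read and written one at a time.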
Install Orso from PyPI:
pip install orso
import orso
# Create from list of dictionaries
df = orso.DataFrame([
{'name': 'Alice', 'age': 30, 'city': 'New York'},
{'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
{'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
])
print(f"Created DataFrame with {df.rowcount} rows and {df.columncount} columns")
# Display the DataFrame
print(df.display())
# Convert to different formats
arrow_table = df.arrow() # PyArrow Table
pandas_df = df.pandas() # Pandas DataFrame
# Access column names
print("Columns:", df.column_names)
# Access schema information
print("Schema:", df.schema)
# From PyArrow
import pyarrow as pa
arrow_table = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
orso_df = orso.DataFrame.from_arrow(arrow_table)
# To Pandas
pandas_df = orso_df.pandas()
- Lightweight: Minimal overhead for tabular data operations
- Row-based: Optimized for row-oriented operations
- Interoperable: Easy conversion to/from PyArrow, Pandas
- Schema-aware: Built-in data validation and type checking
- Fast serialization: Efficient conversion to bytes, tuples, and dictionaries
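As an illustration of what "schema-aware" validation can mean for row data, the sketch below checks dictionaries against a list of column definitions using only the standard library. `ColumnSpec` and `validate_row` are hypothetical names invented for this example; they are not Orso's schema classes, and Orso's actual validation rules may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical column definition: name, expected Python type, nullability."""

    name: str
    py_type: type
    nullable: bool = True


def validate_row(row: Dict[str, Any], schema: List[ColumnSpec]) -> List[str]:
    """Return a list of problems; an empty list means the row is valid."""
    problems: List[str] = []
    for column in schema:
        if column.name not in row:
            problems.append(f"missing column '{column.name}'")
            continue
        value = row[column.name]
        if value is None:
            # Nulls are only a problem for non-nullable columns.
            if not column.nullable:
                problems.append(f"column '{column.name}' must not be null")
        elif not isinstance(value, column.py_type):
            problems.append(
                f"column '{column.name}' expected {column.py_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


schema = [ColumnSpec("name", str, nullable=False), ColumnSpec("age", int)]

good = validate_row({"name": "Alice", "age": 30}, schema)
bad = validate_row({"name": None, "age": "thirty"}, schema)
```

Returning a list of problems instead of raising on the first one is a design choice for this sketch: it lets a caller report every issue in a record at once.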
The main DataFrame class provides the following key methods:
- DataFrame(dictionaries=None, *, rows=None, schema=None) - Constructor
- display(limit=5, colorize=True, show_types=True) - Pretty print the DataFrame
- arrow(size=None) - Convert to PyArrow Table
- pandas(size=None) - Convert to Pandas DataFrame
- from_arrow(tables) - Create DataFrame from PyArrow Table(s)
- fetchall() - Get all rows as list of Row objects
- collect() - Materialize the DataFrame
- append(other) - Append another DataFrame
- distinct() - Get unique rows
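The `fetchall()` name follows the Python DB-API (PEP 249) cursor convention, which fits Orso's role as the Cursor layer for Opteryx. The sketch below shows that convention over a lazily consumed stream of rows using only the standard library. `SketchCursor` is a hypothetical class, and `fetchone` and `fetchmany` are taken from PEP 249, not from the method list above, so treat their presence in Orso as an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


class SketchCursor:
    """Hypothetical DB-API style cursor over an iterable of row tuples."""

    def __init__(self, rows: Iterable[tuple]):
        # Keep an iterator so rows are consumed lazily, one fetch at a time.
        self._rows: Iterator[tuple] = iter(rows)

    def fetchone(self) -> Optional[tuple]:
        # Return the next row, or None once the rows are exhausted.
        return next(self._rows, None)

    def fetchmany(self, size: int = 1) -> List[tuple]:
        # Return up to `size` rows; fewer if the stream runs out.
        return list(islice(self._rows, size))

    def fetchall(self) -> List[tuple]:
        # Return every remaining row, materializing the rest of the stream.
        return list(self._rows)


cursor = SketchCursor([("Alice", 30), ("Bob", 25), ("Charlie", 35)])

first = cursor.fetchone()     # consumes one row
rest = cursor.fetchall()      # consumes whatever is left
after = cursor.fetchone()     # nothing left to fetch
```

Each fetch advances the same underlying iterator, so rows are never returned twice.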
The DataFrame also exposes the following properties:
- rowcount - Number of rows
- columncount - Number of columns
- column_names - List of column names
- schema - Schema information
# Clone the repository
git clone https://github.com/mabel-dev/orso.git
cd orso
# Install dependencies
pip install -r requirements.txt
pip install -r tests/requirements.txt
# Build Cython extensions
make compile
# Run tests
make test
Orso is part of the Mabel ecosystem. Contributions are welcome! Please ensure:
- All tests pass: make test
- Code follows the project style: make lint
- New features include appropriate tests
- Documentation is updated for API changes
Orso is licensed under Apache 2.0 unless explicitly indicated otherwise.
Orso is in beta. Beta means different things to different people; to us, being in beta means:
- Interfaces are generally stable but may still have breaking changes
- Unit tests are not reliable enough to catch every break in functionality
- Bugs are likely to exist in edge cases
- Code may not be tuned for performance
As such, we really don't recommend using Orso in critical applications.