Backend Support
FrameRight supports multiple DataFrame backends. You can choose your backend using:
Backend-specific classes (recommended for type safety)
Base Schema class (defaults to pandas, or specify the backend parameter)
Supported Backends
Pandas — Mature ecosystem, extensive third-party library support
Polars — High-performance, Rust-based, with lazy evaluation
Narwhals — Backend-agnostic DataFrame API
Backend Selection
Explicit Module Imports (Required)
Import Schema and Col from backend-specific modules:
from frameright.pandas import Schema as PandasSchema, Col, Field
from frameright.polars.eager import Schema as PolarsSchema, Col as PolarsCol, Field
from frameright.polars.lazy import Schema as LazySchema, Col as LazyCol, Field

# Pandas backend
class OrdersPandas(PandasSchema):
    order_id: Col[int] = Field(unique=True)
    revenue: Col[float] = Field(ge=0)

# Polars eager backend
class OrdersPolars(PolarsSchema):
    order_id: PolarsCol[int] = Field(unique=True)
    revenue: PolarsCol[float] = Field(ge=0)

# Polars lazy backend
class OrdersLazy(LazySchema):
    order_id: LazyCol[int] = Field(unique=True)
    revenue: LazyCol[float] = Field(ge=0)
Backend Auto-Detection
Each backend module’s Schema class is tied to its specific backend.
The underlying data type determines validation behavior (e.g., pl.DataFrame uses pandera.polars):
from frameright import Schema, Field
from frameright.typing import Col

class Orders(Schema):  # Defaults to pandas
    order_id: Col[int] = Field(unique=True)
    revenue: Col[float] = Field(ge=0)

# Pandas backend (default)
import pandas as pd
pandas_df = pd.DataFrame({...})
orders_pd = Orders(pandas_df) # Uses pandas backend
# Polars backend (explicit parameter)
import polars as pl
polars_df = pl.DataFrame({...})
orders_pl = Orders(polars_df, backend="polars") # Explicitly use polars
Type Safety: Explicit module imports such as frameright.pandas and frameright.polars.eager provide stronger
type guarantees and are recommended for production code.
Typing Notes
FrameRight schemas are backend-agnostic, but you can opt into backend-specific typing for a better IDE experience:
Pandas: from frameright.typing.pandas import Col (columns can type-check as pd.Series[T] with pandas stubs)
Polars eager: from frameright.typing.polars_eager import Col (columns type-check as pl.Series)
Polars lazy: from frameright.typing.polars_lazy import Col (columns type-check as pl.Expr for expression chaining)
Note
Polars and Narwhals do not currently expose fully generic Series[T] / Expr[T] types upstream.
FrameRight’s Col[T] is still valuable as a schema contract and for IDE autocomplete, but type checkers
generally treat the runtime values as unparameterized pl.Series / pl.Expr / nw.Series today.
At runtime, the actual values you get depend on the backend:
Pandas: properties return pd.Series
Polars eager (pl.DataFrame): properties return pl.Series
Polars lazy (pl.LazyFrame): properties return pl.Expr (lazy expressions)
Python 3.12+ Generic Syntax (PEP 695)
Python 3.12 adds a new generic class syntax that avoids manual TypeVar boilerplate.
FrameRight works well with this style and still preserves backend inference:
from frameright import Schema, Field
from frameright.typing import Col
class Sales[T](Schema[T]):
    order_id: Col[int] = Field(unique=True)
    customer: Col[str]
    revenue: Col[float] = Field(ge=0)
Why both Ts?
Sales[T] declares the generic parameter.
Schema[T] forwards it to the base class so type checkers can infer the backend type from the constructor argument.
This is the shortest syntax that keeps full static typing without defaulting
to a specific backend or collapsing to Any.
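On Python 3.11 or earlier, the same pattern can be spelled with an explicit TypeVar. The sketch below uses a hypothetical stand-in Schema class (not FrameRight's own) only to show how the type parameter flows from the constructor argument to the instance; the attribute name data is an assumption made for illustration.

```python
from typing import Generic, TypeVar

T = TypeVar("T")  # stands for the backend DataFrame type


class Schema(Generic[T]):
    """Hypothetical stand-in for a generic schema base class."""

    def __init__(self, data: T) -> None:
        self.data: T = data


class Sales(Schema[T]):
    """Pre-3.12 spelling of `class Sales[T](Schema[T])`."""


# A type checker infers the parameter from the argument,
# here Sales[dict[str, list[int]]].
sales = Sales({"order_id": [1, 2, 3]})
```

Dropping the [T] on the base (class Sales(Schema)) would still run, but type checkers would then treat the backend type as unknown.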
Pandas Backend
Installation:
pip install frameright
Pandas comes as a default dependency.
Features:
Full validation with pandera.pandas
Access to the entire Pandas ecosystem
Familiar API for existing Pandas users
Great for exploratory analysis and data science workflows
Example:
import pandas as pd
df = pd.read_csv("data.csv")
orders = Orders(df)
# Use fr_data for backend-specific operations
customer_totals = orders.fr_data.groupby(orders.customer_id).sum()
# You can always access the underlying DataFrame directly
print(orders.fr_data.columns)
Polars Backend
Installation:
pip install frameright[polars]
Why Polars?
Often 5-100x faster than Pandas on large datasets (1M+ rows), depending on the operation
Parallel execution — uses all CPU cores automatically
Lazy evaluation — build optimized query plans
Memory efficient — better memory layout and columnar processing
Modern API — expressive, consistent, and type-safe
Example:
import polars as pl
df = pl.read_csv("data.csv")
orders = Orders(df)
# Use fr_data for backend-specific operations
customer_totals = orders.fr_data.group_by(orders.customer_id).sum()
# You can always access the underlying DataFrame directly
print(orders.fr_data.columns)
Lazy Evaluation:
Polars supports lazy evaluation for complex query optimization:
# LazyFrame is automatically handled
lazy_df = pl.scan_csv("data.csv")
orders = Orders(lazy_df) # Schema works with LazyFrames too
# Operations are lazy until you collect()
filtered_df = orders.fr_data.filter(orders.revenue > 1000)
# Execute the full query plan
result = filtered_df.collect()
Backend-Agnostic Schemas
The key benefit: write your schema once, use it with any backend.
class SalesData(Schema):
    """Works with both Pandas and Polars."""
    date: Col[str]
    product: Col[str]
    revenue: Col[float] = Field(ge=0)
    quantity: Col[int] = Field(ge=1)

# Use with Pandas during development
dev_df = pd.read_csv("sample.csv")
dev_data = SalesData(dev_df)
# Switch to Polars in production for better performance
prod_df = pl.read_csv("full_dataset.csv")
prod_data = SalesData(prod_df)
This means you can:
Prototype with Pandas (familiar, extensive library ecosystem)
Scale with Polars (performance, parallelism, memory efficiency)
Never rewrite your schema definitions
Validation with Pandera
Both backends use Pandera for validation:
Pandas backend uses pandera.pandas
Polars backend uses pandera.polars
The validation logic is identical. Pandera automatically handles backend-specific validation:
class Validated(Schema):
    amount: Col[float] = Field(ge=0, le=1000)
    status: Col[str] = Field(isin=["active", "inactive"])

# Pandera validates with pandas
pandas_df = pd.DataFrame({...})
data_pd = Validated(pandas_df) # Uses pandera.pandas.DataFrameSchema
# Pandera validates with polars
polars_df = pl.DataFrame({...})
data_pl = Validated(polars_df) # Uses pandera.polars.DataFrameSchema
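To make the constraints concrete, here is a plain-Python sketch of what ge, le, and isin assert about each value. It is not how Pandera or FrameRight implement validation (both work on whole columns), and the helper name check_rows and the sample rows are hypothetical.

```python
def check_rows(rows):
    """Return (row_index, column, value) for every violated constraint."""
    allowed_status = {"active", "inactive"}
    failures = []
    for i, row in enumerate(rows):
        # amount: Field(ge=0, le=1000) means 0 <= amount <= 1000
        if not (0 <= row["amount"] <= 1000):
            failures.append((i, "amount", row["amount"]))
        # status: Field(isin=[...]) means membership in the allowed values
        if row["status"] not in allowed_status:
            failures.append((i, "status", row["status"]))
    return failures


rows = [
    {"amount": 10.0, "status": "active"},     # passes both checks
    {"amount": -1.0, "status": "inactive"},   # violates ge=0
    {"amount": 2000.0, "status": "unknown"},  # violates le=1000 and isin
]
failures = check_rows(rows)
```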
Backend-Specific Operations
For backend-specific operations, use fr_data to access the underlying DataFrame directly:
orders = Orders(df) # Works with either backend
# Pandas-specific
if isinstance(orders.fr_data, pd.DataFrame):
    result = orders.fr_data.groupby(orders.customer_id).sum()
# Polars-specific
elif isinstance(orders.fr_data, pl.DataFrame):
    result = orders.fr_data.group_by(orders.customer_id).sum()
For backend-agnostic access, use fr_data which returns a narwhals wrapper with full IDE autocomplete:
import narwhals as nw
# These work regardless of backend
orders.fr_data # Returns nw.DataFrame or nw.LazyFrame
orders.fr_data.columns # Column names (backend-agnostic)
orders.fr_data.schema # Narwhals schema
orders.fr_data.to_native() # Escape to native DataFrame (zero-copy)
Performance Comparison
Rough performance guidelines (results vary by dataset and operation):
| Operation | Pandas | Polars |
|---|---|---|
| Small datasets (<100K rows) | Similar | Similar |
| Medium datasets (1M rows) | 1x | 5-20x |
| Large datasets (10M+ rows) | 1x | 10-100x |
| Memory usage | 1x | 0.3-0.7x |
| Parallel aggregations | Single core | All cores |
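Because these figures vary by dataset and operation, it is worth measuring on your own workload. The following is a minimal timing sketch using only the standard library; the helper names best_of and speedup are hypothetical, and the two lambdas are pure-Python stand-ins for a Pandas call and a Polars call.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_fn, candidate_fn, repeats=5):
    """Ratio of baseline time to candidate time (above 1 means faster)."""
    return best_of(baseline_fn, repeats) / best_of(candidate_fn, repeats)


# Stand-ins: replace with e.g. a pandas groupby and a polars group_by
# over the same data.
baseline = lambda: sorted(range(50_000), reverse=True)
candidate = lambda: list(range(50_000))
ratio = speedup(baseline, candidate)
```

Taking the minimum over several runs reduces noise from other processes, but it does not account for one-off costs such as file I/O or cache warm-up.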
When to use Pandas:
Exploratory data analysis with lots of interactivity
Working with libraries that only support Pandas
Small to medium datasets where performance isn’t critical
When you need the extensive Pandas ecosystem
When to use Polars:
Large datasets (1M+ rows)
Performance-critical production pipelines
Memory-constrained environments
When you can benefit from parallel execution
Migrating Between Backends
Switching backends requires minimal code changes:
# Before (Pandas)
df = pd.read_csv("data.csv")
orders = Orders(df)
result = orders.fr_data.groupby(orders.customer_id).sum()
# After (Polars)
df = pl.read_csv("data.csv")
orders = Orders(df)
result = orders.fr_data.group_by(orders.customer_id).sum() # Note: group_by vs groupby
The schema definition (Orders) stays exactly the same. Only the DataFrame creation and backend-specific method calls change.
For backend-agnostic code, use fr_data — the narwhals API is the same regardless of backend.
Adding a Backend (Advanced)
FrameRight’s backend layer is a simple adapter interface implemented per DataFrame library.
Each backend module (frameright.pandas, frameright.polars.eager, etc.) provides its own
Schema class with a hardcoded backend adapter instance.
There is no auto-detection or dispatch logic: importing from a specific module gives you that backend. This design is intentionally simple and fast:
from frameright.pandas import Schema # _fr_backend = PandasBackend()
from frameright.polars.eager import Schema # _fr_backend = PolarsEagerBackend()
from frameright.polars.lazy import Schema # _fr_backend = PolarsLazyBackend()
If you want to integrate another DataFrame implementation:
Implement a BackendAdapter (see frameright.backends.base.BackendAdapter)
Create a new module with a Schema class that sets _fr_backend
from frameright.backends.base import BackendAdapter
from frameright.core import BaseSchema
class MyBackend(BackendAdapter):
    # Implement required methods...
    pass

class Schema(BaseSchema):
    _fr_backend = MyBackend()

# Users import your Schema directly
from mypackage import Schema
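The shape of this pattern can be sketched without FrameRight installed. Everything below is a self-contained stand-in: the adapter methods get_column and column_names and the column helper are hypothetical, since the real BackendAdapter interface is not listed here and may require different methods.

```python
from abc import ABC, abstractmethod


class BackendAdapter(ABC):
    """Hypothetical stand-in; FrameRight's real interface may differ."""

    @abstractmethod
    def column_names(self, data): ...

    @abstractmethod
    def get_column(self, data, name): ...


class DictBackend(BackendAdapter):
    """Toy backend over a dict that maps column names to lists."""

    def column_names(self, data):
        return list(data)

    def get_column(self, data, name):
        return data[name]


class BaseSchema:
    """Hypothetical stand-in: delegates all data access to _fr_backend."""

    _fr_backend: BackendAdapter

    def __init__(self, data):
        self.fr_data = data

    def column(self, name):
        return self._fr_backend.get_column(self.fr_data, name)


class Schema(BaseSchema):
    # The backend is fixed on the class, so no dispatch happens at runtime.
    _fr_backend = DictBackend()


orders = Schema({"order_id": [1, 2], "revenue": [10.0, 20.0]})
revenue = orders.column("revenue")
```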
Notes on cuDF
cuDF is a natural candidate because its API is intentionally close to Pandas. That said, there are two separate concerns:
- DataFrame operations (get/set columns, filtering, I/O, etc.): cuDF can often be supported with a
fairly thin adapter because many method names mirror Pandas.
- Runtime validation (Pandera): Schema currently relies on Pandera’s Pandas and Polars backends.
If Pandera doesn’t support cuDF validation in your environment, a cuDF adapter would need to do one of the following:
raise a clear NotImplementedError for fr_validate(), or
validate by materialising to Pandas (acceptable for small/medium data, but defeats GPU benefits), or
provide an alternative validation implementation.
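The three options can be expressed as a single fallback chain. This is a sketch under stated assumptions, not FrameRight or cuDF code: the function validate_or_fallback and its parameters are hypothetical, and the validators are passed in as plain callables so the example runs without a GPU.

```python
def validate_or_fallback(data, native_validate=None, to_pandas=None,
                         pandas_validate=None):
    """Validate `data` with the best available strategy, or fail loudly."""
    # Alternative implementation: use a backend-native validator if supplied.
    if native_validate is not None:
        native_validate(data)
        return data
    # Materialise to pandas and validate there (copies data off the GPU).
    if to_pandas is not None and pandas_validate is not None:
        pandas_validate(to_pandas(data))
        return data
    # No strategy available: raise rather than silently skip validation.
    raise NotImplementedError(
        "fr_validate() is not supported for this backend"
    )


# Usage with stand-ins: a list plays the GPU frame, list() the conversion.
seen = []
checked = validate_or_fallback([1, 2], to_pandas=list, pandas_validate=seen.append)
```

Raising by default keeps a missing validator from being mistaken for a passing check.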
If your primary goal is “typed column access + autocomplete” in production analysis code, cuDF can still be valuable even before full runtime validation is available — but it’s best treated as an experimental backend until the validation story is nailed down.