Backend Support

FrameRight supports multiple DataFrame backends. You can choose your backend using:

Backend-specific classes (recommended for type safety)
Base Schema class (defaults to pandas, or specify backend parameter)

Supported Backends

Pandas — Mature ecosystem, extensive third-party library support
Polars — High-performance, Rust-based, with lazy evaluation
Narwhals — Backend-agnostic DataFrame API

Backend Selection

Explicit Module Imports (Recommended)

Import Schema and Col from backend-specific modules:

from frameright.pandas import Schema as PandasSchema, Col, Field
from frameright.polars.eager import Schema as PolarsSchema, Col as PolarsCol, Field
from frameright.polars.lazy import Schema as LazySchema, Col as LazyCol, Field

# Pandas backend
class OrdersPandas(PandasSchema):
    order_id: Col[int] = Field(unique=True)
    revenue: Col[float] = Field(ge=0)

# Polars eager backend
class OrdersPolars(PolarsSchema):
    order_id: PolarsCol[int] = Field(unique=True)
    revenue: PolarsCol[float] = Field(ge=0)

# Polars lazy backend
class OrdersLazy(LazySchema):
    order_id: LazyCol[int] = Field(unique=True)
    revenue: LazyCol[float] = Field(ge=0)

Backend Auto-Detection

Each backend module’s Schema class is tied to its specific backend. The underlying data type determines validation behavior (e.g., pl.DataFrame uses pandera.polars):

from frameright import Schema, Field
from frameright.typing import Col

class Orders(Schema):  # Defaults to pandas
    order_id: Col[int] = Field(unique=True)
    revenue: Col[float] = Field(ge=0)

# Pandas backend (default)
import pandas as pd
pandas_df = pd.DataFrame({...})
orders_pd = Orders(pandas_df)  # Uses pandas backend

# Polars backend (explicit parameter)
import polars as pl
polars_df = pl.DataFrame({...})
orders_pl = Orders(polars_df, backend="polars")  # Explicitly use polars

Type Safety: Explicit module imports such as frameright.pandas and frameright.polars.eager provide stronger type guarantees and are recommended for production code.

Typing Notes

FrameRight schemas are backend-agnostic, but you can opt into backend-specific typing for a better IDE experience:

Pandas: from frameright.typing.pandas import Col (columns can type-check as pd.Series[T] with pandas stubs)
Polars eager: from frameright.typing.polars_eager import Col (columns type-check as pl.Series)
Polars lazy: from frameright.typing.polars_lazy import Col (columns type-check as pl.Expr for expression chaining)

Note

Polars and Narwhals do not currently expose fully generic Series[T] / Expr[T] types upstream. FrameRight’s Col[T] is still valuable as a schema contract and for IDE autocomplete, but type checkers generally treat the runtime values as unparameterized pl.Series / pl.Expr / nw.Series today.

At runtime, the actual values you get depend on the backend:

Pandas: properties return pd.Series
Polars eager (pl.DataFrame): properties return pl.Series
Polars lazy (pl.LazyFrame): properties return pl.Expr (lazy expressions)

Python 3.12+ Generic Syntax (PEP 695)

Python 3.12 adds a new generic class syntax that avoids manual TypeVar boilerplate. FrameRight works well with this style and still preserves backend inference:

from frameright import Schema, Field
from frameright.typing import Col

class Sales[T](Schema[T]):
    order_id: Col[int] = Field(unique=True)
    customer: Col[str]
    revenue: Col[float] = Field(ge=0)

Why both Ts?

Sales[T] declares the generic parameter.
Schema[T] forwards it to the base class so type checkers can infer the backend type from the constructor argument.

This is the shortest syntax that keeps full static typing without defaulting to a specific backend or collapsing to Any.
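
For comparison, here is a minimal sketch of the pre-3.12 spelling that the PEP 695 syntax replaces. The Schema class below is a toy stand-in written for illustration, not FrameRight's implementation; only TypeVar and Generic from the standard library are real.

```python
from typing import Generic, TypeVar

# Manual TypeVar declaration: the boilerplate that `class Sales[T](Schema[T])` avoids.
T = TypeVar("T")


class Schema(Generic[T]):
    """Toy stand-in for a generic schema base class (hypothetical)."""

    def __init__(self, data: T) -> None:
        # A type checker binds T to the type of `data` at the call site.
        self.data: T = data


class Sales(Schema[T]):
    """Equivalent of `class Sales[T](Schema[T])` on Python < 3.12."""


# A type checker can infer the parameter from the argument,
# so `sales.data` is typed as a dict rather than Any.
sales = Sales({"revenue": [10.0, 20.0]})
```

Both spellings declare the same generic parameter; the 3.12 form simply scopes T to the class instead of the module.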

Pandas Backend

Installation:

pip install frameright

Pandas comes as a default dependency.

Features:

Full validation with pandera.pandas
Access to the entire Pandas ecosystem
Familiar API for existing Pandas users
Great for exploratory analysis and data science workflows

Example:

import pandas as pd

df = pd.read_csv("data.csv")
orders = Orders(df)

# Use fr_data for backend-specific operations
customer_totals = orders.fr_data.groupby(orders.customer_id).sum()

# You can always access the underlying DataFrame directly
print(orders.fr_data.columns)

Polars Backend

Installation:

pip install "frameright[polars]"

Why Polars?

Often 5-100x faster than Pandas on large datasets (1M+ rows), depending on data size and operation
Parallel execution — uses all CPU cores automatically
Lazy evaluation — build optimized query plans
Memory efficient — better memory layout and columnar processing
Modern API — expressive, consistent, and type-safe

Example:

import polars as pl

df = pl.read_csv("data.csv")
orders = Orders(df)

# Use fr_data for backend-specific operations
customer_totals = orders.fr_data.group_by(orders.customer_id).sum()

# You can always access the underlying DataFrame directly
print(orders.fr_data.columns)

Lazy Evaluation:

Polars supports lazy evaluation for complex query optimization:

# LazyFrame is automatically handled
lazy_df = pl.scan_csv("data.csv")
orders = Orders(lazy_df)  # Schema works with LazyFrames too

# Operations are lazy until you collect(); the result here is still a LazyFrame
filtered_lf = orders.fr_data.filter(orders.revenue > 1000)
# Execute the full query plan
result = filtered_lf.collect()

Backend-Agnostic Schemas

The key benefit: write your schema once, use it with any backend.

class SalesData(Schema):
    """Works with both Pandas and Polars."""
    date: Col[str]
    product: Col[str]
    revenue: Col[float] = Field(ge=0)
    quantity: Col[int] = Field(ge=1)

# Use with Pandas during development
dev_df = pd.read_csv("sample.csv")
dev_data = SalesData(dev_df)

# Switch to Polars in production for better performance
prod_df = pl.read_csv("full_dataset.csv")
prod_data = SalesData(prod_df)

This means you can:

Prototype with Pandas (familiar, extensive library ecosystem)
Scale with Polars (performance, parallelism, memory efficiency)
Never rewrite your schema definitions

Validation with Pandera

Both backends use Pandera for validation:

Pandas backend uses pandera.pandas
Polars backend uses pandera.polars

The validation logic is identical. Pandera automatically handles backend-specific validation:

class Validated(Schema):
    amount: Col[float] = Field(ge=0, le=1000)
    status: Col[str] = Field(isin=["active", "inactive"])

# Pandera validates with pandas
pandas_df = pd.DataFrame({...})
data_pd = Validated(pandas_df)  # Uses pandera.pandas.DataFrameSchema

# Pandera validates with polars
polars_df = pl.DataFrame({...})
data_pl = Validated(polars_df)  # Uses pandera.polars.DataFrameSchema
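
To make the constraint semantics concrete, here is a plain-Python sketch of what ge, le, and isin mean when applied to column values. It does not use Pandera or FrameRight; check_column and its keyword names are illustrative assumptions that mirror the Field arguments above.

```python
from typing import Any, Iterable, Optional


def check_column(
    values: Iterable[Any],
    *,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    isin: Optional[Iterable[Any]] = None,
) -> list[int]:
    """Return the positions of values that violate any given constraint."""
    allowed = set(isin) if isin is not None else None
    failures = []
    for position, value in enumerate(values):
        if ge is not None and value < ge:
            failures.append(position)
        elif le is not None and value > le:
            failures.append(position)
        elif allowed is not None and value not in allowed:
            failures.append(position)
    return failures


# amount must satisfy 0 <= amount <= 1000
bad_amounts = check_column([5.0, -1.0, 1500.0], ge=0, le=1000)
# status must be one of the allowed labels
bad_status = check_column(["active", "deleted"], isin=["active", "inactive"])
```

A real validator reports richer failure details, but the pass/fail rule per value is the same idea regardless of which DataFrame library holds the column.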

Backend-Specific Operations

For backend-specific operations, use fr_data to access the underlying DataFrame directly:

orders = Orders(df)  # Works with either backend

# Pandas-specific
if isinstance(orders.fr_data, pd.DataFrame):
    result = orders.fr_data.groupby(orders.customer_id).sum()

# Polars-specific
elif isinstance(orders.fr_data, pl.DataFrame):
    result = orders.fr_data.group_by(orders.customer_id).sum()

For backend-agnostic access, use fr_data which returns a narwhals wrapper with full IDE autocomplete:

import narwhals as nw

# These work regardless of backend
orders.fr_data              # Returns nw.DataFrame or nw.LazyFrame
orders.fr_data.columns      # Column names (backend-agnostic)
orders.fr_data.schema       # Narwhals schema
orders.fr_data.to_native()  # Escape to native DataFrame (zero-copy)

Performance Comparison

Rough performance guidelines (results vary by dataset and operation):

Scenario	Pandas	Polars
Small datasets (<100K rows), speed	Similar	Similar
Medium datasets (~1M rows), speed	1x	5-20x
Large datasets (10M+ rows), speed	1x	10-100x
Memory usage	1x	0.3-0.7x
Parallel aggregations	Single core	All cores
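
Because these figures vary by dataset and operation, it is worth measuring on your own workload. The following is a minimal timing helper using only the standard library; best_of is a hypothetical name, and the lambda is a placeholder for your own Pandas or Polars code path.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs of `fn`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder workload; swap in the operation you actually care about.
elapsed = best_of(lambda: sum(range(10_000)))
```

Taking the minimum over several runs reduces noise from other processes; for lazy Polars queries, time the call that includes collect() so the work is actually executed.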

When to use Pandas:

Exploratory data analysis with lots of interactivity
Working with libraries that only support Pandas
Small to medium datasets where performance isn’t critical
When you need the extensive Pandas ecosystem

When to use Polars:

Large datasets (1M+ rows)
Performance-critical production pipelines
Memory-constrained environments
When you can benefit from parallel execution

Migrating Between Backends

Switching backends requires minimal code changes:

# Before (Pandas)
df = pd.read_csv("data.csv")
orders = Orders(df)
result = orders.fr_data.groupby(orders.customer_id).sum()

# After (Polars)
df = pl.read_csv("data.csv")
orders = Orders(df)
result = orders.fr_data.group_by(orders.customer_id).sum()  # Note: group_by vs groupby

The schema definition (Orders) stays exactly the same. Only the DataFrame creation and backend-specific method calls change. For backend-agnostic code, use fr_data — the narwhals API is the same regardless of backend.
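
If the one differing call appears in many places, it can be isolated in a small helper. This is a sketch under the assumption that the native frame exposes either Polars' group_by or Pandas' groupby; group_sum is a hypothetical helper, not part of FrameRight.

```python
from typing import Any


def group_sum(native_df: Any, key: str) -> Any:
    """Group a native DataFrame by `key` and sum each group.

    Polars spells the method `group_by`, Pandas spells it `groupby`,
    so prefer the Polars name and fall back to the Pandas one.
    """
    grouper = getattr(native_df, "group_by", None)
    if grouper is None:
        grouper = native_df.groupby
    return grouper(key).sum()
```

Passing the column name as a string works with both libraries, which keeps the call site identical after a migration.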

Adding a Backend (Advanced)

FrameRight’s backend layer is a simple adapter interface implemented per DataFrame library. Each backend module (frameright.pandas, frameright.polars.eager, etc.) provides its own Schema class with a hardcoded backend adapter instance.

There is no auto-detection or dispatch logic: importing from a specific module gives you that backend. This design is intentionally simple and fast:

from frameright.pandas import Schema       # _fr_backend = PandasBackend()
from frameright.polars.eager import Schema # _fr_backend = PolarsEagerBackend()
from frameright.polars.lazy import Schema  # _fr_backend = PolarsLazyBackend()

If you want to integrate another DataFrame implementation:

Implement a BackendAdapter (see frameright.backends.base.BackendAdapter)
Create a new module with a Schema class that sets _fr_backend

from frameright.backends.base import BackendAdapter
from frameright.core import BaseSchema

class MyBackend(BackendAdapter):
    # Implement required methods...
    pass

class Schema(BaseSchema):
    _fr_backend = MyBackend()

# In user code (a separate module), import your Schema directly
from mypackage import Schema
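
As a rough illustration of the adapter pattern described above, the following self-contained sketch wires a toy backend into a toy schema base class. Everything here is hypothetical: the real required methods are defined by frameright.backends.base.BackendAdapter and may differ, and the method names column_names and get_column are assumptions chosen for the example.

```python
from abc import ABC, abstractmethod
from typing import Any


class ToyBackendAdapter(ABC):
    """Stand-in for an adapter interface (method names are assumptions)."""

    @abstractmethod
    def column_names(self, data: Any) -> list[str]: ...

    @abstractmethod
    def get_column(self, data: Any, name: str) -> Any: ...


class DictBackend(ToyBackendAdapter):
    """Adapter for a 'DataFrame' stored as a dict of column lists."""

    def column_names(self, data: dict[str, list]) -> list[str]:
        return list(data)

    def get_column(self, data: dict[str, list], name: str) -> list:
        return data[name]


class ToySchema:
    """Stand-in for a schema base class with a hardcoded backend instance."""

    _fr_backend: ToyBackendAdapter = DictBackend()

    def __init__(self, data: Any) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails; never treat private names as columns.
        if name.startswith("_"):
            raise AttributeError(name)
        # Delegate column access to the adapter.
        if name in self._fr_backend.column_names(self._data):
            return self._fr_backend.get_column(self._data, name)
        raise AttributeError(name)


orders = ToySchema({"order_id": [1, 2], "revenue": [9.5, 20.0]})
```

The schema class never touches the data container itself, so supporting another library means writing another adapter rather than changing the schema.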

Notes on cuDF

cuDF is a natural candidate because its API is intentionally close to Pandas. That said, there are two separate concerns:

DataFrame operations (get/set columns, filtering, I/O, etc.): cuDF can often be supported with a fairly thin adapter because many method names mirror Pandas.
Runtime validation (Pandera): Schema currently relies on Pandera’s Pandas and Polars backends. If Pandera doesn’t support cuDF validation in your environment, a cuDF adapter would need to do one of the following:
- raise a clear NotImplementedError for fr_validate(),
- validate by materialising to Pandas (acceptable for small/medium data, but defeats GPU benefits), or
- provide an alternative validation implementation.
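
The first two options can be sketched in a few lines. This is an illustration only: validate_or_fallback, to_pandas, and pandas_validator are hypothetical names, and the conversion and validation callables are supplied by the caller rather than taken from cuDF or Pandera.

```python
from typing import Any, Callable, Optional


def validate_or_fallback(
    data: Any,
    *,
    to_pandas: Optional[Callable[[Any], Any]] = None,
    pandas_validator: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Validate by materialising to a Pandas-like frame, or fail loudly.

    If no conversion path is configured, raise NotImplementedError with a
    clear message instead of silently skipping validation.
    """
    if to_pandas is None or pandas_validator is None:
        raise NotImplementedError(
            "Runtime validation is not available for this backend; "
            "provide a conversion to Pandas or skip validation explicitly."
        )
    # Copies data off the GPU: acceptable for small/medium data only.
    pandas_validator(to_pandas(data))
    # Return the original object so downstream code stays on the GPU backend.
    return data
```

Failing loudly keeps an unvalidated frame from being mistaken for a validated one.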

If your primary goal is “typed column access + autocomplete” in production analysis code, cuDF can still be valuable even before full runtime validation is available — but it’s best treated as an experimental backend until the validation story is nailed down.