Backend Support

FrameRight supports multiple DataFrame backends. You can choose your backend using:

  1. Backend-specific classes (recommended for type safety)

  2. The base Schema class (defaults to pandas; pass the backend parameter to select another backend)

Supported Backends

  • Pandas — Mature ecosystem, extensive third-party library support

  • Polars — High-performance, Rust-based, with lazy evaluation

  • Narwhals — Backend-agnostic DataFrame API

Backend Selection

Explicit Module Imports (Required)

Import Schema and Col from backend-specific modules:

from frameright.pandas import Schema as PandasSchema, Col, Field
from frameright.polars.eager import Schema as PolarsSchema, Col as PolarsCol, Field
from frameright.polars.lazy import Schema as LazySchema, Col as LazyCol, Field

# Pandas backend
class OrdersPandas(PandasSchema):
    order_id: Col[int] = Field(unique=True)
    revenue: Col[float] = Field(ge=0)

# Polars eager backend
class OrdersPolars(PolarsSchema):
    order_id: PolarsCol[int] = Field(unique=True)
    revenue: PolarsCol[float] = Field(ge=0)

# Polars lazy backend
class OrdersLazy(LazySchema):
    order_id: LazyCol[int] = Field(unique=True)
    revenue: LazyCol[float] = Field(ge=0)

Backend Auto-Detection

Each backend module’s Schema class is tied to its specific backend. The underlying data type determines validation behavior (e.g., pl.DataFrame uses pandera.polars):

from frameright import Schema, Field
from frameright.typing import Col

class Orders(Schema):  # Defaults to pandas
    order_id: Col[int] = Field(unique=True)
    revenue: Col[float] = Field(ge=0)

# Pandas backend (default)
import pandas as pd
pandas_df = pd.DataFrame({...})
orders_pd = Orders(pandas_df)  # Uses pandas backend

# Polars backend (explicit parameter)
import polars as pl
polars_df = pl.DataFrame({...})
orders_pl = Orders(polars_df, backend="polars")  # Explicitly use polars

Type Safety: Explicit module imports such as frameright.pandas and frameright.polars.eager provide stronger type guarantees and are recommended for production code.

Typing Notes

FrameRight schemas are backend-agnostic, but you can opt into backend-specific typing for a better IDE experience:

  • Pandas: from frameright.typing.pandas import Col (columns can type-check as pd.Series[T] with pandas stubs)

  • Polars eager: from frameright.typing.polars_eager import Col (columns type-check as pl.Series)

  • Polars lazy: from frameright.typing.polars_lazy import Col (columns type-check as pl.Expr for expression chaining)

Note

Polars and Narwhals do not currently expose fully generic Series[T] / Expr[T] types upstream. FrameRight’s Col[T] is still valuable as a schema contract and for IDE autocomplete, but type checkers generally treat the runtime values as unparameterized pl.Series / pl.Expr / nw.Series today.

At runtime, the actual values you get depend on the backend:

  • Pandas: properties return pd.Series

  • Polars eager pl.DataFrame: properties return pl.Series

  • Polars lazy pl.LazyFrame: properties return pl.Expr (lazy expressions)

Python 3.12+ Generic Syntax (PEP 695)

Python 3.12 adds a new generic class syntax that avoids manual TypeVar boilerplate. FrameRight works well with this style and still preserves backend inference:

from frameright import Schema, Field
from frameright.typing import Col

class Sales[T](Schema[T]):
    order_id: Col[int] = Field(unique=True)
    customer: Col[str]
    revenue: Col[float] = Field(ge=0)

Why both Ts?

  • Sales[T] declares the generic parameter.

  • Schema[T] forwards it to the base class so type checkers can infer the backend type from the constructor argument.

This is the shortest syntax that keeps full static typing without defaulting to a specific backend or collapsing to Any.
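For comparison, the manual TypeVar boilerplate that the PEP 695 syntax replaces looks roughly like the sketch below, which also runs on Python versions before 3.12. It uses only the standard library: SchemaStandIn is a hypothetical stand-in for a generic Schema base class, not FrameRight's actual implementation, and the list passed to the constructor stands in for a DataFrame.

```python
from typing import Generic, TypeVar

# Pre-3.12 spelling: the type variable has to be declared by hand.
T = TypeVar("T")


class SchemaStandIn(Generic[T]):
    """Hypothetical stand-in for a generic Schema base class."""

    def __init__(self, data: T) -> None:
        # A type checker binds T to the type of `data` at the call site.
        self.fr_data: T = data


class Sales(SchemaStandIn[T]):
    """Rough equivalent of `class Sales[T](Schema[T])` without PEP 695."""


# T is inferred from the argument (a list of dicts here); with a real
# DataFrame it would be inferred as that DataFrame type instead.
sales = Sales([{"revenue": 10.0}])
print(type(sales.fr_data).__name__)
```

The forwarding step is the same in both spellings: the subclass must pass T on to its base, otherwise the type checker has nothing to bind the constructor argument to.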

Pandas Backend

Installation:

pip install frameright

Pandas comes as a default dependency.

Features:

  • Full validation with pandera.pandas

  • Access to the entire Pandas ecosystem

  • Familiar API for existing Pandas users

  • Great for exploratory analysis and data science workflows

Example:

import pandas as pd

df = pd.read_csv("data.csv")
orders = Orders(df)

# Use fr_data for backend-specific operations
customer_totals = orders.fr_data.groupby(orders.customer_id).sum()

# You can always access the underlying DataFrame directly
print(orders.fr_data.columns)
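The groupby call above passes a column object (a pd.Series on the pandas backend) rather than a column name. As a minimal sketch in plain pandas, independent of FrameRight and using made-up data, the two spellings below give the same result when aggregating a single value column:

```python
import pandas as pd

df = pd.DataFrame(
    {"customer_id": [1, 1, 2], "revenue": [10.0, 5.0, 7.5]}
)

# Group by a Series (the kind of object a schema column property
# returns on the pandas backend).
by_series = df["revenue"].groupby(df["customer_id"]).sum()

# Group by column name.
by_name = df.groupby("customer_id")["revenue"].sum()

print(by_series.equals(by_name))
```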

Polars Backend

Installation:

pip install "frameright[polars]"

Why Polars?

  • Often 10-100x faster than Pandas on large datasets (10M+ rows) and roughly 5-20x faster at around 1M rows; results vary by dataset and operation

  • Parallel execution — uses all CPU cores automatically

  • Lazy evaluation — build optimized query plans

  • Memory efficient — better memory layout and columnar processing

  • Modern API — expressive, consistent, and type-safe

Example:

import polars as pl

df = pl.read_csv("data.csv")
orders = Orders(df)

# Use fr_data for backend-specific operations
customer_totals = orders.fr_data.group_by(orders.customer_id).sum()

# You can always access the underlying DataFrame directly
print(orders.fr_data.columns)

Lazy Evaluation:

Polars supports lazy evaluation for complex query optimization:

# LazyFrame is automatically handled
lazy_df = pl.scan_csv("data.csv")
orders = Orders(lazy_df)  # Schema works with LazyFrames too

# Operations are lazy until you collect()
filtered_df = orders.fr_data.filter(orders.revenue > 1000)
# Execute the full query plan
result = filtered_df.collect()

Backend-Agnostic Schemas

The key benefit: write your schema once, use it with any backend.

class SalesData(Schema):
    """Works with both Pandas and Polars."""
    date: Col[str]
    product: Col[str]
    revenue: Col[float] = Field(ge=0)
    quantity: Col[int] = Field(ge=1)

# Use with Pandas during development
dev_df = pd.read_csv("sample.csv")
dev_data = SalesData(dev_df)

# Switch to Polars in production for better performance
prod_df = pl.read_csv("full_dataset.csv")
prod_data = SalesData(prod_df)

This means you can:

  • Prototype with Pandas (familiar, extensive library ecosystem)

  • Scale with Polars (performance, parallelism, memory efficiency)

  • Never rewrite your schema definitions

Validation with Pandera

Both backends use Pandera for validation:

  • Pandas backend uses pandera.pandas

  • Polars backend uses pandera.polars

The validation logic is identical. Pandera automatically handles backend-specific validation:

class Validated(Schema):
    amount: Col[float] = Field(ge=0, le=1000)
    status: Col[str] = Field(isin=["active", "inactive"])

# Pandera validates with pandas
pandas_df = pd.DataFrame({...})
data_pd = Validated(pandas_df)  # Uses pandera.pandas.DataFrameSchema

# Pandera validates with polars
polars_df = pl.DataFrame({...})
data_pl = Validated(polars_df)  # Uses pandera.polars.DataFrameSchema

Backend-Specific Operations

For backend-specific operations, use fr_data to access the underlying DataFrame directly:

orders = Orders(df)  # Works with either backend

# Pandas-specific
if isinstance(orders.fr_data, pd.DataFrame):
    result = orders.fr_data.groupby(orders.customer_id).sum()

# Polars-specific
elif isinstance(orders.fr_data, pl.DataFrame):
    result = orders.fr_data.group_by(orders.customer_id).sum()

For backend-agnostic access, use fr_data, which returns a narwhals wrapper with full IDE autocomplete:

import narwhals as nw

# These work regardless of backend
orders.fr_data              # Returns nw.DataFrame or nw.LazyFrame
orders.fr_data.columns      # Column names (backend-agnostic)
orders.fr_data.schema       # Narwhals schema
orders.fr_data.to_native()  # Escape to native DataFrame (zero-copy)

Performance Comparison

Rough performance guidelines (results vary by dataset and operation):

Operation                      Pandas         Polars
Small datasets (<100K rows)    Similar        Similar
Medium datasets (1M rows)      1x             5-20x
Large datasets (10M+ rows)     1x             10-100x
Memory usage                   1x             0.3-0.7x
Parallel aggregations          Single core    All cores

When to use Pandas:

  • Exploratory data analysis with lots of interactivity

  • Working with libraries that only support Pandas

  • Small to medium datasets where performance isn’t critical

  • When you need the extensive Pandas ecosystem

When to use Polars:

  • Large datasets (1M+ rows)

  • Performance-critical production pipelines

  • Memory-constrained environments

  • When you can benefit from parallel execution

Migrating Between Backends

Switching backends requires minimal code changes:

# Before (Pandas)
df = pd.read_csv("data.csv")
orders = Orders(df)
result = orders.fr_data.groupby(orders.customer_id).sum()

# After (Polars)
df = pl.read_csv("data.csv")
orders = Orders(df)
result = orders.fr_data.group_by(orders.customer_id).sum()  # Note: group_by vs groupby

The schema definition (Orders) stays exactly the same. Only the DataFrame creation and backend-specific method calls change. For backend-agnostic code, use fr_data — the narwhals API is the same regardless of backend.

Adding a Backend (Advanced)

FrameRight’s backend layer is a simple adapter interface implemented per DataFrame library. Each backend module (frameright.pandas, frameright.polars.eager, etc.) provides its own Schema class with a hardcoded backend adapter instance.

There is no auto-detection or dispatch logic: importing from a specific module gives you that backend. This design is intentionally simple and fast:

from frameright.pandas import Schema       # _fr_backend = PandasBackend()
from frameright.polars.eager import Schema # _fr_backend = PolarsEagerBackend()
from frameright.polars.lazy import Schema  # _fr_backend = PolarsLazyBackend()

If you want to integrate another DataFrame implementation:

  1. Implement a BackendAdapter (see frameright.backends.base.BackendAdapter)

  2. Create a new module with a Schema class that sets _fr_backend

from frameright.backends.base import BackendAdapter
from frameright.core import BaseSchema

class MyBackend(BackendAdapter):
    # Implement required methods...
    pass

class Schema(BaseSchema):
    _fr_backend = MyBackend()

# Users import your Schema directly
from mypackage import Schema
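To make the two steps concrete, here is a self-contained sketch of the adapter pattern using only the standard library. Everything in it is hypothetical: the method names column_names and get_column are assumptions for illustration, and the stand-in classes do not reproduce FrameRight's real BackendAdapter or BaseSchema. The required methods are whatever frameright.backends.base.BackendAdapter actually declares.

```python
from abc import ABC, abstractmethod
from typing import Any


class BackendAdapterStandIn(ABC):
    """Hypothetical adapter interface (method names are assumptions)."""

    @abstractmethod
    def column_names(self, data: Any) -> list[str]: ...

    @abstractmethod
    def get_column(self, data: Any, name: str) -> Any: ...


class DictBackend(BackendAdapterStandIn):
    """Toy backend that treats a dict of lists as a 'DataFrame'."""

    def column_names(self, data: dict[str, list]) -> list[str]:
        return list(data)

    def get_column(self, data: dict[str, list], name: str) -> list:
        return data[name]


class BaseSchemaStandIn:
    """Hypothetical base class: all data access goes through _fr_backend."""

    _fr_backend: BackendAdapterStandIn

    def __init__(self, data: Any) -> None:
        self.fr_data = data

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails; treat the name as a column.
        if name in self._fr_backend.column_names(self.fr_data):
            return self._fr_backend.get_column(self.fr_data, name)
        raise AttributeError(name)


class Schema(BaseSchemaStandIn):
    _fr_backend = DictBackend()  # Hardcoded adapter: no runtime dispatch


orders = Schema({"order_id": [1, 2], "revenue": [9.5, 20.0]})
print(orders.revenue)
```

Because the adapter is a class attribute, choosing a backend is just a matter of which Schema class is imported, which matches the "no dispatch logic" design described above.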

Notes on cuDF

cuDF is a natural candidate because its API is intentionally close to Pandas. That said, there are two separate concerns:

  • DataFrame operations (get/set columns, filtering, I/O, etc.): cuDF can often be supported with a fairly thin adapter because many method names mirror Pandas.

  • Runtime validation (Pandera): Schema currently relies on Pandera’s Pandas and Polars backends. If Pandera doesn’t support cuDF validation in your environment, a cuDF adapter would need to do one of the following:

    • raise a clear NotImplementedError for fr_validate(), or

    • validate by materialising to Pandas (acceptable for small/medium data, but defeats GPU benefits), or

    • provide an alternative validation implementation.
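The first two options can be sketched with the standard library alone. This is an illustration under stated assumptions, not FrameRight code: GpuAdapterStandIn, its to_pandas argument, and the signature given to fr_validate here are all hypothetical, a tuple stands in for GPU-resident data, and a plain function stands in for a Pandera check.

```python
from typing import Any, Callable, Optional


class GpuAdapterStandIn:
    """Hypothetical adapter for a backend without native validation."""

    def __init__(self, to_pandas: Optional[Callable[[Any], Any]] = None) -> None:
        # Optional converter that materialises the data for a CPU validator.
        self._to_pandas = to_pandas

    def fr_validate(self, data: Any, validator: Callable[[Any], None]) -> Any:
        if self._to_pandas is None:
            # Option 1: fail loudly instead of silently skipping validation.
            raise NotImplementedError(
                "Validation is not supported for this backend; "
                "pass a to_pandas converter to validate on the CPU."
            )
        # Option 2: validate a materialised copy, then return the original.
        validator(self._to_pandas(data))
        return data


def non_negative(values: list) -> None:
    """Toy validator standing in for a Pandera check such as ge=0."""
    if any(v < 0 for v in values):
        raise ValueError("values must be >= 0")


revenue = (10.0, 25.0)  # Toy "GPU column"; list() plays the to-pandas role

checked = GpuAdapterStandIn(to_pandas=list).fr_validate(revenue, non_negative)
print(checked is revenue)
```

Returning the original object after validating the copy keeps downstream work on the GPU, at the cost of one transfer per validation call.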

If your primary goal is “typed column access + autocomplete” in production analysis code, cuDF can still be valuable even before full runtime validation is available — but it’s best treated as an experimental backend until the validation story is nailed down.