# Tables.jl Documentation

This guide provides documentation around the powerful tables interfaces in the Tables.jl package. Note that the package, and hence, documentation, are geared towards package and library developers who intend to implement and consume the interfaces. Users, on the other hand, benefit from these other packages that provide useful access to table data in various formats or workflows. While everyone is encouraged to understand the interfaces and the functionality they allow, just note that most users don't need to use Tables.jl directly.

With that said, don't hesitate to open a new issue, even just for a question, or come chat with us on the #data Slack channel with questions, concerns, or clarifications. A list of packages that support the Tables.jl interface can be found in INTEGRATIONS.md.

Please refer to TableOperations.jl for common table operations such as `select`, `transform`, `filter`, and `map`.

## Using the Interface (i.e. consuming Tables.jl-compatible sources)

We start by discussing *usage* of the Tables.jl interface functions, since that can help contextualize *implementing* them for custom table types.

At a high level, Tables.jl provides two powerful APIs for predictably accessing data from any table-like source:

```
# access data of input table `x` row-by-row
# Tables.rows must return a row iterator
rows = Tables.rows(x)
# we can iterate through each row
for row in rows
    # example of getting all values in the row
    # don't worry, there are other ways to more efficiently process rows
    rowvalues = [Tables.getcolumn(row, col) for col in Tables.columnnames(row)]
end

# access data of input table `x` column-by-column
# Tables.columns returns an object where individual, entire columns can be accessed
columns = Tables.columns(x)
# iterate through each column name in table
for col in Tables.columnnames(columns)
    # retrieve entire column by column name
    # a column is an indexable collection
    # with known length (i.e. supports
    # `length(column)` and `column[i]`)
    column = Tables.getcolumn(columns, col)
end
```

So we see two high-level functions here: `Tables.rows` and `Tables.columns`.

`Tables.rows`

— Function`Tables.rows(x) => Row iterator`

Accesses data of input table source `x` row-by-row by returning an `AbstractRow`-compatible iterator. Note that even if the input table source is column-oriented by nature, an efficient generic definition of `Tables.rows` is defined in Tables.jl to return an iterator of row views into the columns of the input.

The `Tables.Schema` of an `AbstractRow` iterator can be queried via `Tables.schema(rows)`, which may return `nothing` if the schema is unknown. Column names can always be queried by calling `Tables.columnnames(row)` on an individual row, and row values can be accessed by calling `Tables.getcolumn(row, i::Int)` or `Tables.getcolumn(row, nm::Symbol)` with a column index or name, respectively.

See also `rowtable` and `namedtupleiterator`.

`Tables.columns`

— Function`Tables.columns(x) => AbstractColumns-compatible object`

Accesses data of input table source `x` by returning an `AbstractColumns`-compatible object, which allows retrieving entire columns by name or index. A retrieved column is a 1-based indexable object that has a known length, i.e. supports `length(col)` and `col[i]` for any `i = 1:length(col)`. Note that even if the input table source is row-oriented by nature, an efficient generic definition of `Tables.columns` is defined in Tables.jl to build an `AbstractColumns`-compatible object from the input rows.

The `Tables.Schema` of an `AbstractColumns` object can be queried via `Tables.schema(columns)`, which may return `nothing` if the schema is unknown. Column names can always be queried by calling `Tables.columnnames(columns)`, and individual columns can be accessed by calling `Tables.getcolumn(columns, i::Int)` or `Tables.getcolumn(columns, nm::Symbol)` with a column index or name, respectively.
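As a small illustration of the two access functions, the sketch below uses two table types that Tables.jl supports out of the box: a `Vector` of `NamedTuple`s (row-oriented) and a `NamedTuple` of `Vector`s (column-oriented). The variable names and data are made up for the example.

```julia
using Tables

# a row-oriented source: Vector of NamedTuples
rowsource = [(a = 1, b = "x"), (a = 2, b = "y")]
# a column-oriented source: NamedTuple of Vectors
colsource = (a = [1, 2], b = ["x", "y"])

# column access works even though `rowsource` is row-oriented
cols = Tables.columns(rowsource)
acol = Tables.getcolumn(cols, :a)

# row access works even though `colsource` is column-oriented
firstrow = first(Tables.rows(colsource))
bval = Tables.getcolumn(firstrow, :b)
rownames = collect(Tables.columnnames(firstrow))
```

Here `acol` holds the whole `a` column, while `bval` is the single value of column `b` in the first row.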

Given these two powerful data access methods, let's walk through real, albeit somewhat simplified versions of how packages actually use these methods.

### `Tables.rows` usage

First up, let's take a look at the SQLite.jl package and how it uses the Tables.jl interface to allow loading of generic table-like data into a sqlite relational table. Here's the code:

```
function load!(table, db::SQLite.DB, tablename)
    # get input table rows
    rows = Tables.rows(table)
    # query for schema of data
    sch = Tables.schema(rows)
    # create table using tablename and schema from input table
    createtable!(db, tablename, sch)
    # build insert statement
    params = chop(repeat("?,", length(sch.names)))
    stmt = Stmt(db, "INSERT INTO $tablename VALUES ($params)")
    # start a transaction for inserting rows
    transaction(db) do
        # iterate over rows in the input table
        for row in rows
            # Tables.jl provides a utility function
            # Tables.eachcolumn, which allows efficiently
            # applying a function to each column value in a row
            # it's called with a schema and row, and applies
            # a user-provided function to the column value `val`, index `i`
            # and column name `nm`. Here, we bind the row values
            # to our parameterized SQL INSERT statement and then
            # call `sqlite3_step` to execute the INSERT statement.
            Tables.eachcolumn(sch, row) do val, i, nm
                bind!(stmt, i, val)
            end
            sqlite3_step(stmt.handle)
            sqlite3_reset(stmt.handle)
        end
    end
    return
end
```

This is pretty straightforward usage: it calls `Tables.rows` on the input table source, and since we need the schema to set up the database table, we query it via `Tables.schema`. We then iterate the rows in our table via `for row in rows`, and use the convenient `Tables.eachcolumn` to efficiently apply a function to each value in the row. Note that we didn't call `Tables.columnnames` or `Tables.getcolumn` at all, since they're utilized by `Tables.eachcolumn` itself. `Tables.eachcolumn` is optimized to be type-stable, and in some cases even allows constant propagation of the column index, name, and type, so that row values can be consumed efficiently.

One wrinkle to consider is the "unknown schema" case; i.e. what if our `Tables.schema` call had returned `nothing` (this can be the case for exotic table sources like lazily mapped transformations over rows in a table):

```
function load!(sch::Nothing, rows, db::SQLite.DB, tablename)
    # sch is nothing === unknown schema
    # start iteration on input table rows
    state = iterate(rows)
    state === nothing && return
    row, st = state
    # query column names of first row
    names = Tables.columnnames(row)
    # partially construct Tables.Schema by at least passing
    # the column names to it
    sch = Tables.Schema(names, nothing)
    # create table if needed
    createtable!(db, tablename, sch)
    # build insert statement
    params = chop(repeat("?,", length(names)))
    stmt = Stmt(db, "INSERT INTO $tablename VALUES ($params)")
    # start a transaction for inserting rows
    transaction(db) do
        while true
            # just like before, we can still use `Tables.eachcolumn`
            # even with our partially constructed Tables.Schema
            # to apply a function to each value in the row
            Tables.eachcolumn(sch, row) do val, i, nm
                bind!(stmt, i, val)
            end
            sqlite3_step(stmt.handle)
            sqlite3_reset(stmt.handle)
            # keep iterating rows until we finish
            state = iterate(rows, st)
            state === nothing && break
            row, st = state
        end
    end
    return tablename
end
```

The strategy taken here is to start iterating the input source and, using the first row as a guide, make a `Tables.Schema` object with just the column names, which we can then still pass to `Tables.eachcolumn` to apply our `bind!` function to each row value.

### `Tables.columns` usage

Ok, now let's take a look at a case utilizing `Tables.columns`. The following code is taken from the DataFrames.jl Tables.jl implementation:

```
getvector(x::AbstractVector) = x
getvector(x) = collect(x)

# note that copycols is ignored in this definition (Tables.CopiedColumns implies copies have already been made)
fromcolumns(x::Tables.CopiedColumns, names; copycols::Bool=true) =
    DataFrame(AbstractVector[getvector(Tables.getcolumn(x, nm)) for nm in names],
              Index(names),
              copycols=false)
fromcolumns(x, names; copycols::Bool=true) =
    DataFrame(AbstractVector[getvector(Tables.getcolumn(x, nm)) for nm in names],
              Index(names),
              copycols=copycols)

function DataFrame(x; copycols::Bool=true)
    # get columns from input table source
    cols = Tables.columns(x)
    # get column names as Vector{Symbol}, which is required
    # by core DataFrame constructor
    names = collect(Symbol, Tables.columnnames(cols))
    return fromcolumns(cols, names; copycols=copycols)
end
```

So here we have a generic `DataFrame` constructor that takes a single, untyped argument, calls `Tables.columns` on it, then `Tables.columnnames` to get the column names. It then passes the `Tables.AbstractColumns`-compatible object to an internal function `fromcolumns`, which dispatches on a special kind of `Tables.AbstractColumns` object called a `Tables.CopiedColumns`. This type wraps any `Tables.AbstractColumns`-compatible object whose columns have already been copied, and which is thus safe for the columns-consumer to assume ownership of (this matters because DataFrames.jl, by default, makes copies of all columns upon construction). In both cases, individual columns are collected into a `Vector{AbstractVector}` by calling `Tables.getcolumn(x, nm)` for each column name. A final note is the call to `getvector` on each column, which ensures each column is materialized as an `AbstractVector`, as is required by the DataFrame constructor.

Note that in both the rows and columns usages, we didn't need to worry about the natural orientation of the input data; we just called `Tables.rows` or `Tables.columns` as was most natural for the table-specific use-case, knowing that it will Just Work™️.
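To make the orientation-independence concrete, here is a minimal sketch of a consumer written only against `Tables.columns`. The function `column_lengths` is a hypothetical helper invented for this example, not part of Tables.jl.

```julia
using Tables

# hypothetical consumer: report the length of every column of any table
function column_lengths(tbl)
    cols = Tables.columns(tbl)
    return Dict(nm => length(Tables.getcolumn(cols, nm))
                for nm in Tables.columnnames(cols))
end

# the same consumer accepts a row-oriented and a column-oriented source
from_rows = column_lengths([(a = 1, b = 2.0), (a = 3, b = 4.0)])
from_cols = column_lengths((a = [1, 3], b = [2.0, 4.0]))
```

Both calls go through the same code path in the consumer; any conversion from rows to columns happens inside `Tables.columns`.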

### Tables.jl Utilities

Before moving on to *implementing* the Tables.jl interfaces, we take a quick break to highlight some useful utility functions provided by Tables.jl:

`Tables.Schema`

— Type`Tables.Schema(names, types)`

Create a `Tables.Schema` object that holds the column names and types for an `AbstractRow` iterator returned from `Tables.rows` or an `AbstractColumns` object returned from `Tables.columns`. `Tables.Schema` is dual-purposed: provide an easy interface for users to query these properties, as well as provide a convenient "structural" type for code generation.

To get a table's schema, one can call `Tables.schema` on the result of `Tables.rows` or `Tables.columns`, but also note that a table may return `nothing`, indicating that its column names and/or column element types are unknown (usually not inferrable). This is similar to the `Base.EltypeUnknown()` trait for iterators when `Base.IteratorEltype` is called. Users should account for the `Tables.schema(tbl) => nothing` case by using the properties of the results of `Tables.rows(x)` and `Tables.columns(x)` directly.

To access the names, one can simply call `sch.names` to return a collection of Symbols (`Tuple` or `Vector`). To access column element types, one can similarly call `sch.types`, which will return a collection of types (like `(Int64, Float64, String)`).

The actual type definition is

```
struct Schema{names, types}
    storednames::Union{Nothing, Vector{Symbol}}
    storedtypes::Union{Nothing, Vector{Type}}
end
```

Where `names` is a tuple of `Symbol`s or `nothing`, and `types` is a tuple *type* of types (like `Tuple{Int64, Float64, String}`) or `nothing`. Encoding the names and types as type parameters allows convenient use of the type in generated functions and other optimization use-cases, but users should note that when `names` and/or `types` are the `nothing` value, the names and/or types are stored in the `storednames` and `storedtypes` fields. This is to account for extremely wide tables with columns in the tens of thousands, where encoding the names/types as type parameters becomes prohibitive to the compiler. So while optimizations can be written on the typed `names`/`types` type parameters, users should also consider handling extremely wide tables by specializing on `Tables.Schema{nothing, nothing}`.
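The following sketch shows how a schema might be queried in practice, using a `NamedTuple` of `Vector`s as the table; the column names and data are illustrative only. It also builds a names-only schema of the kind used when element types are unknown.

```julia
using Tables

tbl = (id = [1, 2, 3], score = [0.5, 0.7, 0.9])
sch = Tables.schema(Tables.columns(tbl))

colnames = sch.names   # tuple of Symbols
coltypes = sch.types   # tuple of column element types

# a names-only schema, for sources whose element types are unknown
partial = Tables.Schema((:id, :score), nothing)
```

Because the names and types are type parameters here, `sch` is an instance of `Tables.Schema{(:id, :score), Tuple{Int, Float64}}`.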

`Tables.schema`

— Function`Tables.schema(x) => Union{Nothing, Tables.Schema}`

Attempt to retrieve the schema of the object returned by `Tables.rows` or `Tables.columns`. If the `AbstractRow` iterator or `AbstractColumns` object can't determine its schema, `nothing` will be returned. Otherwise, a `Tables.Schema` object is returned, with the column names and types available for use.

`Tables.partitions`

— Function`Tables.partitions(x)`

Request a "table" iterator from `x`. Each iterated element must be a "table" in the sense that one may call `Tables.rows` or `Tables.columns` to get a row-iterator or collection of columns. All iterated elements *must* have identical schema, so that users may call `Tables.schema(first_element)` on the first iterated element and know that each subsequent iteration will match the same schema. The default definition is:

`Tables.partitions(x) = (x,)`

So that any input is assumed to be a single "table". This means users should feel free to call `Tables.partitions` anywhere they're currently calling `Tables.columns` or `Tables.rows`, and get back an iterator of those instead. In other words, "sink" functions can use `Tables.partitions` whether or not the user passes a partitionable table, since the default is to treat a single input as a single, non-partitioned table.

`Tables.partitioner(itr)` is a convenience wrapper to provide table partitions from any table iterator; this allows for easy wrapping of a `Vector` or iterator of tables as valid partitions, since by default, they'd be treated as a single table.

A second convenience method is provided with the definition:

`Tables.partitions(x...) = x`

That allows passing vararg tables and they'll be treated as separate partitions. Sink functions may allow vararg table inputs and can "splat them through" to `partitions`.

For convenience, `Tables.partitions(Iterators.partition(...))` is defined for cases where user-controlled partitioning is desired over an applicable input (an input iterator).

`Tables.partitioner`

— Function

```
Tables.partitioner(f, itr)
Tables.partitioner(x)
```

Convenience methods to generate table iterators. The first method takes a "materializer" function `f` and an iterator `itr`, and will call `Tables.LazyTable(f, x) for x in itr` for each iteration. This allows delaying table materialization until `Tables.columns` or `Tables.rows` are called on the `LazyTable` object (which will call `f(x)`). This allows a common desired pattern of materializing and processing a table on a remote process or thread, like:

```
for tbl in Tables.partitions(Tables.partitioner(CSV.File, list_of_csv_files))
    Threads.@spawn begin
        cols = Tables.columns(tbl)
        # do stuff with cols
    end
end
```

The second method is provided because the default behavior of `Tables.partitions(x)` is to treat `x` as a single, non-partitioned table. This method allows users to easily wrap a `Vector` or generator of tables as table partitions to pass to sink functions able to utilize `Tables.partitions`.
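A minimal sketch of both behaviors, using small in-memory `NamedTuple` tables as stand-ins for real partitions (the data is made up for illustration):

```julia
using Tables

# a single table is treated as one partition by default
single = (a = [1, 2],)
nparts = length(collect(Tables.partitions(single)))

# wrap a Vector of tables so each element is treated as its own partition
chunks = [(a = [1, 2],), (a = [3],)]
total_rows = sum(
    length(Tables.getcolumn(Tables.columns(part), :a))
    for part in Tables.partitions(Tables.partitioner(chunks))
)
```

Without the `Tables.partitioner` wrapper, `chunks` would be passed through as one input rather than as two partitions.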

`Tables.rowtable`

— Function`Tables.rowtable(x) => Vector{NamedTuple}`

Take any input table source, and produce a `Vector` of `NamedTuple`s, also known as a "row table". A "row table" is a kind of default table type of sorts, since it satisfies the Tables.jl row interface naturally, i.e. a `Vector` naturally iterates its elements, and `NamedTuple` satisfies the `AbstractRow` interface by default (allows indexing value by index, name, and getting all names).

For a lazy iterator over rows see `rows` and `namedtupleiterator`.

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing `NamedTuple`s that large.

`Tables.columntable`

— Function`Tables.columntable(x) => NamedTuple of Vectors`

Takes any input table source `x` and returns a `NamedTuple` of `Vector`s, also known as a "column table". A "column table" is a kind of default table type of sorts, since it satisfies the Tables.jl column interface naturally.

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing `NamedTuple`s that large.
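As a quick sketch of how the two default table types relate, the example below converts a small, made-up row table to a column table and back:

```julia
using Tables

rt = [(a = 1, b = "x"), (a = 2, b = "y")]

# row table -> column table (NamedTuple of Vectors)
ct = Tables.columntable(rt)

# column table -> row table (Vector of NamedTuples)
roundtrip = Tables.rowtable(ct)
```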

`Tables.dictrowtable`

— Function`Tables.dictrowtable(x) => Tables.DictRowTable`

Take any Tables.jl-compatible source `x` and return a `DictRowTable`, which can be thought of as a `Vector` of `Dict` rows mapping column names as `Symbol`s to values. The order of the input table columns is preserved via the `Tables.schema(::DictRowTable)`.

For "schema-less" input tables, `dictrowtable` employs a "column unioning" behavior, as opposed to inferring the schema from the first row like `Tables.columns`. This means that as rows are iterated, each value from the row is joined into an aggregate final set of columns. This is especially useful when input table rows may not include columns if the value is missing, instead of including an actual value `missing`, which is common in JSON, for example. This results in a performance cost tracking all seen values and inferring the final unioned schemas, so it's recommended to use only when the union behavior is needed.

`Tables.dictcolumntable`

— Function`Tables.dictcolumntable(x) => Tables.DictColumnTable`

Take any Tables.jl-compatible source `x` and return a `DictColumnTable`, which can be thought of as a `Dict` mapping column names as `Symbol`s to `AbstractVector`s. The order of the input table columns is preserved via the `Tables.schema(::DictColumnTable)`.

For "schema-less" input tables, `dictcolumntable` employs a "column unioning" behavior, as opposed to inferring the schema from the first row like `Tables.columns`. This means that as rows are iterated, each value from the row is joined into an aggregate final set of columns. This is especially useful when input table rows may not include columns if the value is missing, instead of including an actual value `missing`, which is common in JSON, for example. This results in a performance cost tracking all seen values and inferring the final unioned schemas, so it's recommended to use only when the union behavior is needed.

`Tables.namedtupleiterator`

— Function`Tables.namedtupleiterator(x)`

Pass any table input source and return a `NamedTuple` iterator.

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing `NamedTuple`s that large.

`Tables.datavaluerows`

— Function`Tables.datavaluerows(x) => NamedTuple iterator`

Takes any table input `x` and returns a `NamedTuple` iterator that will replace missing values with `DataValue`-wrapped values; this allows any table type to satisfy the TableTraits.jl Queryverse integration interface by defining:

`IteratorInterfaceExtensions.getiterator(x::MyTable) = Tables.datavaluerows(x)`

`Tables.nondatavaluerows`

— Function`Tables.nondatavaluerows(x)`

Takes any Queryverse-compatible `NamedTuple` iterator source and converts to a Tables.jl-compatible `AbstractRow` iterator. Will automatically unwrap any `DataValue`s, replacing `NA` with `missing`. Useful for translating Query.jl results back to non-`DataValue`-based tables.

`Tables.table`

— Function`Tables.table(m::AbstractVecOrMat; [header])`

Wrap an `AbstractVecOrMat` (`Matrix`, `Vector`, `Adjoint`, etc.) in a `MatrixTable`, which satisfies the Tables.jl interface. (An `AbstractVector` is treated as a 1-column matrix.) This allows accessing the matrix via `Tables.rows` and `Tables.columns`. An optional keyword argument iterator `header` can be passed which will be converted to a `Vector{Symbol}` to be used as the column names. Note that no copy of the `AbstractVecOrMat` is made.

`Tables.matrix`

— Function`Tables.matrix(table; transpose::Bool=false)`

Materialize any table source input as a new `Matrix`, or in the case of a `MatrixTable`, return the originally wrapped matrix. If the table column element types are not homogeneous, they will be promoted to a common type in the materialized `Matrix`. Note that column names are ignored in the conversion. By default, input table columns will be materialized as corresponding matrix columns; passing `transpose=true` will transpose the input, with input columns as matrix rows, or in the case of a `MatrixTable`, apply `permutedims` to the originally wrapped matrix.
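The sketch below illustrates `Tables.table` and `Tables.matrix` together on small, made-up inputs: wrapping a matrix with a custom header, and materializing a column table with mixed element types.

```julia
using Tables

m = [1 2; 3 4]
tbl = Tables.table(m; header = [:x, :y])
xcol = Tables.getcolumn(Tables.columns(tbl), :x)
unwrapped = Tables.matrix(tbl)

# mixed Int/Float64 columns are promoted to a common element type
mixed = (a = [1, 2], b = [3.0, 4.0])
mat = Tables.matrix(mixed)
mat_t = Tables.matrix(mixed; transpose = true)
```

In `mat`, each table column becomes a matrix column; in `mat_t`, each table column becomes a matrix row.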

`Tables.eachcolumn`

— Function

```
Tables.eachcolumn(f, sch::Tables.Schema{names, types}, x::Union{Tables.AbstractRow, Tables.AbstractColumns})
Tables.eachcolumn(f, sch::Tables.Schema{names, nothing}, x::Union{Tables.AbstractRow, Tables.AbstractColumns})
```

Takes a function `f`, table schema `sch`, and `x`, which is an object that satisfies the `AbstractRow` or `AbstractColumns` interfaces; it generates calls to get the value for each column (`Tables.getcolumn(x, nm)`) and then calls `f(val, index, name)`, where `f` is the user-provided function, `val` is the column value (`AbstractRow`) or entire column (`AbstractColumns`), `index` is the column index as an `Int`, and `name` is the column name as a `Symbol`.

An example using `Tables.eachcolumn` is:

```
rows = Tables.rows(tbl)
sch = Tables.schema(rows)
if sch === nothing
    state = iterate(rows)
    state === nothing && return
    row, st = state
    # build a partial schema from the first row's column names
    sch = Tables.Schema(Tables.columnnames(row), nothing)
    while state !== nothing
        Tables.eachcolumn(sch, row) do val, i, nm
            bind!(stmt, i, val)
        end
        state = iterate(rows, st)
        state === nothing && return
        row, st = state
    end
else
    for row in rows
        Tables.eachcolumn(sch, row) do val, i, nm
            bind!(stmt, i, val)
        end
    end
end
```

Note that in this example we account for the input table potentially returning `nothing` from `Tables.schema(rows)`; in that case, we start iterating the rows, and build a partial schema using the column names from the first row, `sch = Tables.Schema(Tables.columnnames(row), nothing)`, which is valid to pass to `Tables.eachcolumn`.
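For a version that can be run without a database, the sketch below applies `Tables.eachcolumn` to a single `NamedTuple` row with a names-only schema, recording what the callback receives. The row contents and the `seen` accumulator are made up for illustration.

```julia
using Tables

# a NamedTuple satisfies the AbstractRow interface
row = (id = 7, name = "a")

# schema with names only, as in the unknown-schema case
sch = Tables.Schema(Tables.columnnames(row), nothing)

# record (index, name, value) for every column in the row
seen = Tuple{Int, Symbol, Any}[]
Tables.eachcolumn(sch, row) do val, i, nm
    push!(seen, (i, nm, val))
end
```

The callback is invoked once per column, in schema order, with the value first, then the index, then the name.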

`Tables.materializer`

— Function`Tables.materializer(x) => Callable`

For a table input, return the "sink" function or "materializing" function that can take a Tables.jl-compatible table input and make an instance of the table type. This enables "transform" workflows that take table inputs, apply transformations, potentially converting the table to a different form, and end with producing a table of the same type as the original input. The default materializer is `Tables.columntable`, which converts any table input into a `NamedTuple` of `Vector`s.
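A minimal sketch of such a "transform" workflow is shown below. The function `keep_large` and the column name `a` are hypothetical and exist only for this example; the input is a `NamedTuple` of `Vector`s, for which the materializer is `Tables.columntable`.

```julia
using Tables

# hypothetical transform: keep rows with a > 1, then rebuild a table
# of the input's type using the input's materializer
function keep_large(tbl)
    sink = Tables.materializer(tbl)
    kept = filter(r -> r.a > 1, Tables.rowtable(tbl))
    return sink(kept)
end

result = keep_large((a = [1, 2, 3], b = ["x", "y", "z"]))
```

The transform itself works on rows, but the caller gets back a column table because that is what was passed in.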

`Tables.columnindex`

— Function`Tables.columnindex(table, name::Symbol)`

Return the column index (1-based) of a column by `name` in a table with a known schema; returns 0 if `name` doesn't exist in the table.

Given names and a Symbol `name`, compute the index (1-based) of the name in names.

`Tables.columntype`

— Function`Tables.columntype(table, name::Symbol)`

Return the column element type of a column by `name` in a table with a known schema; returns `Union{}` if `name` doesn't exist in the table.

Given a tuple type and a Symbol `name`, compute the type of the name in the tuple's types.
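The two lookups can be sketched on a small column table with a known schema; the table contents and the missing column name `:z` are made up for illustration.

```julia
using Tables

ct = (a = [1, 2], b = ["x", "y"])

idx_b = Tables.columnindex(ct, :b)     # position of an existing column
idx_z = Tables.columnindex(ct, :z)     # a column that does not exist
type_a = Tables.columntype(ct, :a)     # element type of an existing column
type_z = Tables.columntype(ct, :z)     # element type for a missing column
```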

`Tables.rowmerge`

— Function

```
rowmerge(row, other_rows...)
rowmerge(row; fields_to_merge...)
```

Return a `NamedTuple` by merging `row` (an `AbstractRow`-compliant value) with `other_rows` (one or more `AbstractRow`-compliant values) via `Base.merge`. This function is similar to `Base.merge(::NamedTuple, ::NamedTuple...)`, but accepts `AbstractRow`-compliant values instead of `NamedTuple`s.

A convenience method `rowmerge(row; fields_to_merge...) = rowmerge(row, fields_to_merge)` is defined that enables the `fields_to_merge` to be specified as keyword arguments.
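Both forms can be sketched with plain `NamedTuple` rows (the field names are illustrative). As with `Base.merge`, a field that appears in a later row replaces the same field from an earlier one.

```julia
using Tables

# merge two rows; `b` from the second row wins
merged = Tables.rowmerge((a = 1, b = 2), (b = 3, c = 4))

# add or replace fields using keyword arguments
extended = Tables.rowmerge((a = 1,); b = 2)
```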

`Tables.Row`

— Type`Tables.Row(row)`

Convenience type to wrap any `AbstractRow` interface object in a dedicated struct to provide useful default behaviors (allows any `AbstractRow` to be used like a `NamedTuple`):

- Indexing interface defined; i.e. `row[i]` will return the column value at index `i`, `row[nm]` will return column value for column name `nm`
- Property access interface defined; i.e. `row.col1` will retrieve the value for the column named `col1`
- Iteration interface defined; i.e. `for x in row` will iterate each column value in the row
- `AbstractDict` methods defined (`get`, `haskey`, etc.) for checking and retrieving column values

`Tables.Columns`

— Type`Tables.Columns(columns)`

Convenience type to wrap any `AbstractColumns` interface object in a dedicated struct to provide useful default behaviors (allows any `AbstractColumns` to be used like a `NamedTuple` of `Vector`s):

- Indexing interface defined; i.e. `cols[i]` will return the column at index `i`, `cols[nm]` will return column for column name `nm`
- Property access interface defined; i.e. `cols.col1` will retrieve the column named `col1`
- Iteration interface defined; i.e. `for col in cols` will iterate each column in the table
- `AbstractDict` methods defined (`get`, `haskey`, etc.) for checking and retrieving columns
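A short sketch of the two wrappers in use, with `NamedTuple`-based inputs chosen only for illustration:

```julia
using Tables

row = Tables.Row((a = 1, b = "x"))
cols = Tables.Columns((a = [1, 2], b = ["x", "y"]))

# property access, integer indexing, and name indexing on a row
row_a = row.a
row_2 = row[2]
row_b = row[:b]
row_has_a = haskey(row, :a)

# the same operations on columns return entire columns
cols_a = cols.a
cols_2 = cols[2]
cols_b = cols[:b]
```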

## Implementing the Interface (i.e. becoming a Tables.jl source)

Now that we've seen how one *uses* the Tables.jl interface, let's walk through how to implement it; i.e. how can I make my custom type valid for Tables.jl consumers?

The interface to becoming a proper table is straightforward:

| Required Methods | Default Definition | Brief Description |
|---|---|---|
| `Tables.istable(table)` | | Declare that your table type implements the interface |
| **One of:** | | |
| `Tables.rowaccess(table)` | | Declare that your table type defines a `Tables.rows(table)` method |
| `Tables.rows(table)` | | Return a `Tables.AbstractRow`-compatible iterator from your table |
| **Or:** | | |
| `Tables.columnaccess(table)` | | Declare that your table type defines a `Tables.columns(table)` method |
| `Tables.columns(table)` | | Return a `Tables.AbstractColumns`-compatible object from your table |
| **Optional methods** | | |
| `Tables.schema(x)` | `Tables.schema(x) = nothing` | Return a `Tables.Schema` object from your `Tables.AbstractRow` iterator or `Tables.AbstractColumns` object; or `nothing` for unknown schema |
| `Tables.materializer(table)` | `Tables.columntable` | Declare a "materializer" sink function for your table type that can construct an instance of your type from any Tables.jl input |

Based on whether your table type has defined `Tables.rows` or `Tables.columns`, you then ensure that the `Tables.AbstractRow` iterator or `Tables.AbstractColumns` object satisfies the respective interface.

As an additional source of documentation, see this discourse post outlining in detail a walk-through of making a row-oriented table.
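As a minimal sketch of the row-oriented path through the table above, the hypothetical `PointTable` type below (invented for this example) wraps a `Vector` of `NamedTuple`s and declares itself a table with row access. Since `NamedTuple`s already satisfy the `AbstractRow` interface, returning the wrapped vector from `Tables.rows` is enough.

```julia
using Tables

# hypothetical row-oriented table type wrapping a vector of NamedTuple rows
struct PointTable
    points::Vector{NamedTuple{(:x, :y), Tuple{Float64, Float64}}}
end

Tables.istable(::Type{PointTable}) = true
Tables.rowaccess(::Type{PointTable}) = true
# the wrapped vector is itself a valid AbstractRow-compatible iterator
Tables.rows(t::PointTable) = t.points

pts = PointTable([(x = 1.0, y = 2.0), (x = 3.0, y = 4.0)])

# generic consumers now work, including column-oriented ones
as_columns = Tables.columntable(pts)
schema_names = Tables.schema(Tables.rows(pts)).names
```

No `Tables.columns` method was defined for `PointTable`; the generic fallback builds columns from the rows.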

### `Tables.AbstractRow`

`Tables.AbstractRow`

— Type`Tables.AbstractRow`

Abstract interface type representing the expected `eltype` of the iterator returned from `Tables.rows(table)`. `Tables.rows` must return an iterator of elements that satisfy the `Tables.AbstractRow` interface. While `Tables.AbstractRow` is an abstract type that custom "row" types may subtype for useful default behavior (indexing, iteration, property-access, etc.), users should not use it for dispatch, as Tables.jl interface objects **are not required** to subtype, but only implement the required interface methods.

Interface definition:

| Required Methods | Default Definition | Brief Description |
|---|---|---|
| `Tables.getcolumn(row, i::Int)` | `getfield(row, i)` | Retrieve a column value by index |
| `Tables.getcolumn(row, nm::Symbol)` | `getproperty(row, nm)` | Retrieve a column value by name |
| `Tables.columnnames(row)` | `propertynames(row)` | Return column names for a row as an indexable collection |
| **Optional methods** | | |
| `Tables.getcolumn(row, ::Type{T}, i::Int, nm::Symbol)` | `Tables.getcolumn(row, nm)` | Given a column element type `T`, index `i`, and column name `nm`, retrieve the column value. Provides a type-stable or even constant-prop-able mechanism for efficiency. |

Note that subtypes of `Tables.AbstractRow` **must** overload all required methods listed above instead of relying on these methods' default definitions.

While custom row types aren't required to subtype `Tables.AbstractRow`, benefits of doing so include:

- Indexing interface defined (using `getcolumn`); i.e. `row[i]` will return the column value at index `i`
- Property access interface defined (using `columnnames` and `getcolumn`); i.e. `row.col1` will retrieve the value for the column named `col1`
- Iteration interface defined; i.e. `for x in row` will iterate each column value in the row
- `AbstractDict` methods defined (`get`, `haskey`, etc.) for checking and retrieving column values
- A default `show` method

This allows the custom row type to behave as close as possible to a builtin `NamedTuple` object.
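The sketch below shows what subtyping buys you, using a hypothetical `KVRow` type (invented for this example) that stores names and values in parallel vectors. Only the three required methods are defined; indexing, property access, and `haskey` come from the `Tables.AbstractRow` defaults.

```julia
using Tables

# hypothetical row type storing names and values in parallel vectors
struct KVRow <: Tables.AbstractRow
    names::Vector{Symbol}
    values::Vector{Any}
end

# use getfield internally: property access on an AbstractRow means column access
Tables.columnnames(r::KVRow) = getfield(r, :names)
Tables.getcolumn(r::KVRow, i::Int) = getfield(r, :values)[i]
Tables.getcolumn(r::KVRow, nm::Symbol) =
    getfield(r, :values)[findfirst(==(nm), getfield(r, :names))]

r = KVRow([:a, :b], Any[1, "x"])

by_property = r.a      # property access via getcolumn
by_index = r[2]        # integer indexing
by_name = r[:b]        # name indexing
has_b = haskey(r, :b)  # AbstractDict-style lookup
ncols = length(r)      # number of columns in the row
```

Note the use of `getfield` inside the method definitions: because `r.names` would be interpreted as a column lookup, the struct's own fields have to be reached with `getfield`.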

### `Tables.AbstractColumns`

`Tables.AbstractColumns`

— Type`Tables.AbstractColumns`

An interface type defined as an ordered set of columns that support retrieval of individual columns by name or index. A retrieved column must be a 1-based indexable collection with known length, i.e. an object that supports `length(col)` and `col[i]` for any `i = 1:length(col)`. `Tables.columns` must return an object that satisfies the `Tables.AbstractColumns` interface. While `Tables.AbstractColumns` is an abstract type that custom "columns" types may subtype for useful default behavior (indexing, iteration, property-access, etc.), users should not use it for dispatch, as Tables.jl interface objects **are not required** to subtype, but only implement the required interface methods.

Interface definition:

| Required Methods | Default Definition | Brief Description |
|---|---|---|
| `Tables.getcolumn(table, i::Int)` | `getfield(table, i)` | Retrieve a column by index |
| `Tables.getcolumn(table, nm::Symbol)` | `getproperty(table, nm)` | Retrieve a column by name |
| `Tables.columnnames(table)` | `propertynames(table)` | Return column names for a table as an indexable collection |
| **Optional methods** | | |
| `Tables.getcolumn(table, ::Type{T}, i::Int, nm::Symbol)` | `Tables.getcolumn(table, nm)` | Given a column eltype `T`, index `i`, and column name `nm`, retrieve the column. Provides a type-stable or even constant-prop-able mechanism for efficiency. |

Note that subtypes of `Tables.AbstractColumns` **must** overload all required methods listed above instead of relying on these methods' default definitions.

While types aren't required to subtype `Tables.AbstractColumns`, benefits of doing so include:

- Indexing interface defined (using `getcolumn`); i.e. `tbl[i]` will retrieve the column at index `i`
- Property access interface defined (using `columnnames` and `getcolumn`); i.e. `tbl.col1` will retrieve the column named `col1`
- Iteration interface defined; i.e. `for col in table` will iterate each column in the table
- `AbstractDict` methods defined (`get`, `haskey`, etc.) for checking and retrieving columns
- A default `show` method

This allows a custom table type to behave as close as possible to a builtin `NamedTuple` of vectors object.

### Implementation Example

As an extended example, let's take a look at some code defined in Tables.jl for treating `AbstractVecOrMat`s as tables.

First, we define a special `MatrixTable` type that will wrap an `AbstractVecOrMat`, and allow easy overloading for the Tables.jl interface.

```
struct MatrixTable{T <: AbstractVecOrMat} <: Tables.AbstractColumns
    names::Vector{Symbol}
    lookup::Dict{Symbol, Int}
    matrix::T
end
# declare that MatrixTable is a table
Tables.istable(::Type{<:MatrixTable}) = true
# getter methods to avoid getproperty clash
names(m::MatrixTable) = getfield(m, :names)
matrix(m::MatrixTable) = getfield(m, :matrix)
lookup(m::MatrixTable) = getfield(m, :lookup)
# schema is column names and types
Tables.schema(m::MatrixTable{T}) where {T} = Tables.Schema(names(m), fill(eltype(T), size(matrix(m), 2)))
```

Here we defined `Tables.istable` for all `MatrixTable` types, signaling that they implement the Tables.jl interfaces. We also defined `Tables.schema` by pulling out the column names that we stored, and since an `AbstractVecOrMat` has a single `eltype`, we repeat it for each column (the call to `fill`). Note that defining `Tables.schema` is optional on tables; by default, `nothing` is returned and Tables.jl consumers should account for both known and unknown schema cases. Returning a schema when possible lets consumers generate more efficient code when they know the types of all columns upfront (and the number of columns isn't too large).
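To make the "known and unknown schema" point concrete, the following sketch shows one way a consumer might handle both cases. The function `describe_columns` and the wrapper type `Opaque` are hypothetical names introduced for this example; `Opaque` deliberately does not define `Tables.schema`, so the default of `nothing` applies.

```julia
using Tables

# Hypothetical consumer: list each column name with its element type
function describe_columns(tbl)
    sch = Tables.schema(tbl)
    if sch === nothing
        # unknown schema: fall back to inspecting the columns themselves
        cols = Tables.columns(tbl)
        return [nm => eltype(Tables.getcolumn(cols, nm)) for nm in Tables.columnnames(cols)]
    else
        # known schema: names and types are available upfront
        return [nm => T for (nm, T) in zip(sch.names, sch.types)]
    end
end

# Hypothetical table wrapper with no `Tables.schema` definition
struct Opaque
    data::NamedTuple
end
Tables.istable(::Type{Opaque}) = true
Tables.columnaccess(::Type{Opaque}) = true
Tables.columns(o::Opaque) = o.data

ct = (a=[1, 2, 3], b=[4.0, 5.0, 6.0])
known = describe_columns(ct)            # NamedTuple of vectors has a schema
unknown = describe_columns(Opaque(ct))  # schema is `nothing` here
```

Both calls should describe the same columns; the difference is only whether the types were available before touching any data.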

Now, in this example, we're actually going to have `MatrixTable` implement *both* `Tables.rows` and `Tables.columns` methods itself, i.e. it's going to return itself from those functions, so here's first how we make our `MatrixTable` a valid `Tables.AbstractColumns` object:

```
# column interface
Tables.columnaccess(::Type{<:MatrixTable}) = true
Tables.columns(m::MatrixTable) = m
# required Tables.AbstractColumns object methods
Tables.getcolumn(m::MatrixTable, ::Type{T}, col::Int, nm::Symbol) where {T} = matrix(m)[:, col]
Tables.getcolumn(m::MatrixTable, nm::Symbol) = matrix(m)[:, lookup(m)[nm]]
Tables.getcolumn(m::MatrixTable, i::Int) = matrix(m)[:, i]
Tables.columnnames(m::MatrixTable) = names(m)
```

We define `columnaccess` for our type, then `columns` just returns the `MatrixTable` itself, and then we define the three `getcolumn` methods and `columnnames`. Note the use of a `lookup` `Dict` that maps column name to column index so we can figure out which column to return from the matrix. We're also storing the column names in our `names` field so the `columnnames` implementation is trivial. And that's it! Literally! It can now be written out to a CSV file, stored in a SQLite or other database, converted to a DataFrame or JuliaDB table, etc. Pretty fun.

And now for the `Tables.rows`

implementation:

```
# declare that any MatrixTable defines its own `Tables.rows` method
Tables.rowaccess(::Type{<:MatrixTable}) = true
# just return itself, which means MatrixTable must iterate `Tables.AbstractRow`-compatible objects
Tables.rows(m::MatrixTable) = m
# the iteration interface, at a minimum, requires `eltype`, `length`, and `iterate`
# for `MatrixTable` `eltype`, we're going to provide a custom row type
Base.eltype(m::MatrixTable{T}) where {T} = MatrixRow{T}
Base.length(m::MatrixTable) = size(matrix(m), 1)
Base.iterate(m::MatrixTable, st=1) = st > length(m) ? nothing : (MatrixRow(st, m), st + 1)
# a custom row type; acts as a "view" into a row of an AbstractVecOrMat
struct MatrixRow{T} <: Tables.AbstractRow
    row::Int
    source::MatrixTable{T}
end
# required `Tables.AbstractRow` interface methods (same as for `Tables.AbstractColumns` object before)
# but this time, on our custom row type
Tables.getcolumn(m::MatrixRow, ::Type, col::Int, nm::Symbol) =
    getfield(getfield(m, :source), :matrix)[getfield(m, :row), col]
Tables.getcolumn(m::MatrixRow, i::Int) =
    getfield(getfield(m, :source), :matrix)[getfield(m, :row), i]
Tables.getcolumn(m::MatrixRow, nm::Symbol) =
    getfield(getfield(m, :source), :matrix)[getfield(m, :row), getfield(getfield(m, :source), :lookup)[nm]]
Tables.columnnames(m::MatrixRow) = names(getfield(m, :source))
```

Here we start by defining `Tables.rowaccess` and `Tables.rows`, and then the iteration interface methods, since we declared that a `MatrixTable` itself is an iterator of `Tables.AbstractRow`-compatible objects. For `eltype`, we say that a `MatrixTable` iterates our own custom row type, `MatrixRow`. `MatrixRow` subtypes `Tables.AbstractRow`, which provides interface implementations for several useful behaviors (indexing, iteration, property-access, etc.); essentially it makes our custom `MatrixRow` type more convenient to work with.
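As a small sketch of what those `Tables.AbstractRow` conveniences look like in use, the snippet below takes the first row of a matrix wrapped with `Tables.table`. It assumes the row objects produced this way subtype `Tables.AbstractRow`, as the `MatrixRow` in this walkthrough does.

```julia
using Tables

tbl = Tables.table([1 2; 3 4])

# grab the first row through the generic row interface
row = first(Tables.rows(tbl))

by_name = row.Column1                       # property access
by_index = row[2]                           # positional indexing
by_api = Tables.getcolumn(row, :Column2)    # explicit interface call
all_values = collect(row)                   # iteration over the row's values
```

Generic consumers should generally prefer `Tables.getcolumn` and `Tables.columnnames`, since those work on any row; the property and indexing forms are a convenience supplied by the abstract type.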

Implementing the `Tables.AbstractRow` interface is straightforward, and very similar to our implementation of `Tables.AbstractColumns` previously (i.e. the same methods for `getcolumn` and `columnnames`).

And that's it. Our `MatrixTable` type is now a fully fledged, valid Tables.jl source and can be used throughout the ecosystem. Now, this is obviously not a lot of code; but then again, the actual Tables.jl interface implementations tend to be fairly simple, given the other behaviors that are already defined for table types (i.e. table types tend to already have a `getcolumn`-like function defined).

### `Tables.isrowtable`

One option for certain table types is to define `Tables.isrowtable` to automatically satisfy the Tables.jl interface. This can be convenient for "natural" table types that already iterate rows.

`Tables.isrowtable` — Function

`Tables.isrowtable(x) => Bool`

For convenience, some table objects that are naturally "row oriented" can define `Tables.isrowtable(::Type{TableType}) = true` to simplify satisfying the Tables.jl interface. Requirements for defining `isrowtable` include:

- `Tables.rows(x) === x`, i.e. the table object itself is a `Row` iterator
- If the table object is mutable, it should support:
  - `push!(x, row)`: allow pushing a single row onto table
  - `append!(x, rows)`: allow appending set of rows onto table
- If table object is mutable and indexable, it should support:
  - `x[i] = row`: allow replacing of a row with another row by index

A table object that defines `Tables.isrowtable` will have definitions for `Tables.istable`, `Tables.rowaccess`, and `Tables.rows` automatically defined.
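A minimal sketch of this shortcut follows, using a hypothetical immutable type `Readings` that iterates `NamedTuple` rows. Because it is immutable, the `push!`, `append!`, and `setindex!` requirements above do not apply to it.

```julia
using Tables

# Hypothetical row-oriented container, used only for illustration
const ReadingRow = NamedTuple{(:t, :v), Tuple{Int, Float64}}

struct Readings
    rows::Vector{ReadingRow}
end

# the type already iterates rows, so the iteration interface is all it needs
Base.iterate(r::Readings, st...) = iterate(r.rows, st...)
Base.length(r::Readings) = length(r.rows)
Base.eltype(::Type{Readings}) = ReadingRow

# single declaration instead of istable/rowaccess/rows
Tables.isrowtable(::Type{Readings}) = true

r = Readings([(t=1, v=0.5), (t=2, v=1.5)])

is_table = Tables.istable(Readings)
has_rows = Tables.rowaccess(Readings)
same_object = Tables.rows(r) === r
as_columns = Tables.columntable(r)   # consume it like any other table
```

The last line shows the payoff: a generic sink such as `Tables.columntable` can consume the type without any further interface definitions.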

### Testing Tables.jl Implementations

One question that comes up is what the best strategies are for testing a Tables.jl implementation. Continuing with our `MatrixTable` example, let's see some useful ways to test that things are working as expected.

`mat = [1 4.0 "7"; 2 5.0 "8"; 3 6.0 "9"]`

First, we define a matrix literal with three columns of differently typed values.

```
using Tables, Test
# first, create a MatrixTable from our matrix input
mattbl = Tables.table(mat)
# test that the MatrixTable `istable`
@test Tables.istable(typeof(mattbl))
# test that it defines row access
@test Tables.rowaccess(typeof(mattbl))
@test Tables.rows(mattbl) === mattbl
# test that it defines column access
@test Tables.columnaccess(typeof(mattbl))
@test Tables.columns(mattbl) === mattbl
# test that we can access the first "column" of our matrix table by column name
@test mattbl.Column1 == [1,2,3]
# test our `Tables.AbstractColumns` interface methods
@test Tables.getcolumn(mattbl, :Column1) == [1,2,3]
@test Tables.getcolumn(mattbl, 1) == [1,2,3]
@test Tables.columnnames(mattbl) == [:Column1, :Column2, :Column3]
# now let's iterate our MatrixTable to get our first MatrixRow
matrow = first(mattbl)
@test eltype(mattbl) == typeof(matrow)
# now we can test our `Tables.AbstractRow` interface methods on our MatrixRow
@test matrow.Column1 == 1
@test Tables.getcolumn(matrow, :Column1) == 1
@test Tables.getcolumn(matrow, 1) == 1
@test propertynames(mattbl) == propertynames(matrow) == [:Column1, :Column2, :Column3]
```

So, it looks like our `MatrixTable` type is looking good. It's doing everything we'd expect with regard to accessing its rows or columns via the Tables.jl API methods. Testing a table source like this is fairly straightforward since we're really just testing that our interface methods are doing what we expect them to do.

Now, while we didn't go over a "sink" function for matrices in our walkthrough, there does indeed exist a `Tables.matrix` function that allows converting any table input source into a plain Julia `Matrix` object.

Having both Tables.jl "source" and "sink" implementations (i.e. a type that is a Tables.jl-compatible source, as well as a way to *consume* other tables) allows us to do some additional "round trip" testing:

```
rt = [(a=1, b=4.0, c="7"), (a=2, b=5.0, c="8"), (a=3, b=6.0, c="9")]
ct = (a=[1,2,3], b=[4.0, 5.0, 6.0])
```

In addition to our `mat` object earlier, we can define a couple of simple "tables"; in this case `rt` is a kind of default "row table" as a `Vector` of `NamedTuple`s, while `ct` is a default "column table" as a `NamedTuple` of `Vector`s. Notice that they contain mostly the same data as our matrix literal earlier, yet in slightly different storage formats. These default "row" and "column" tables are supported by default in Tables.jl due to their natural table representations, and hence can be excellent tools in testing table integrations.
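Since these two default representations are both valid tables, they can be converted into each other with `Tables.columntable` and `Tables.rowtable`. The short sketch below uses its own two-column data (variable names are illustrative) to show the conversion in both directions.

```julia
using Tables

# a "row table": Vector of NamedTuples
row_tbl = [(a=1, b=4.0), (a=2, b=5.0), (a=3, b=6.0)]

# convert to a "column table": NamedTuple of Vectors
col_tbl = Tables.columntable(row_tbl)

# and back again to a row table
back = Tables.rowtable(col_tbl)
```

Comparing a custom table's output against these plain Julia structures is often the easiest way to write assertions in tests.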

```
# let's turn our row table into a plain Julia Matrix object
mat = Tables.matrix(rt)
# test that our matrix came out like we expected
@test mat[:, 1] == [1, 2, 3]
@test size(mat) == (3, 3)
@test eltype(mat) == Any
# so we successfully consumed a row-oriented table,
# now let's try with a column-oriented table
mat2 = Tables.matrix(ct)
@test eltype(mat2) == Float64
@test mat2[:, 1] == ct.a
# now let's take our matrix input, and make a column table out of it
tbl = Tables.table(mat) |> columntable
@test keys(tbl) == (:Column1, :Column2, :Column3)
@test tbl.Column1 == [1, 2, 3]
# and same for a row table
tbl2 = Tables.table(mat2) |> rowtable
@test length(tbl2) == 3
@test map(x->x.Column1, tbl2) == [1.0, 2.0, 3.0]
```