Anatomy of DTable

DTable is the primary data structure in the dtables package. All data is loaded into and manipulated through a DTable. It is similar to the DataFrame construct in pandas and the data frame in R, only much simpler.

DTable uses columnar data storage: all observations for a column are stored in a single ndarray. By default, numeric data is stored as float64, string data is stored as a fixed-width string data type, and timestamps are stored as numbers (microseconds since the epoch).
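To make the storage model concrete, here is a minimal sketch in plain NumPy. It is not DTable's implementation: the `columns` dict, the `MeasuredAt` column, and the choice of int64 for timestamps are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical stand-in for columnar storage: one ndarray per column.
columns = {
    # Numeric data: NumPy defaults to float64 for Python floats.
    "SepalLengthCm": np.array([5.1, 4.9, 4.7]),
    # String data: fixed-width unicode, sized to the longest value (11 chars).
    "Species": np.array(["Iris-setosa"] * 3),
    # Timestamps: converted to integer microseconds since the epoch.
    # (int64 is an assumption; the text only says "numeric".)
    "MeasuredAt": np.array(
        ["1970-01-01T00:00:01", "1970-01-01T00:00:02", "1970-01-01T00:00:03"],
        dtype="datetime64[us]",
    ).astype(np.int64),
}

for name, values in columns.items():
    print(name, values.dtype, values[:2])

# Caveat of fixed-width strings: longer values are silently truncated
# when assigned into an existing array.
species = columns["Species"].copy()
species[0] = "Iris-versicolor"  # 15 characters into an 11-character slot
print(species[0])
```

One practical consequence of the fixed-width string type is visible in the last lines: a value longer than the column width is cut off on assignment, so the width has to be chosen with the longest expected value in mind.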

DTable Overview

We'll see how you can navigate the DTable to explore data. First, let's load a dataset:

>> iris_dt = datasets.load('iris')

DTable provides some useful methods to help you get a feel for the data:

>> iris_dt.meta.size()
6 variables, 152 observations

>> iris_dt.meta.column_names
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species']

>> iris_dt.meta.column_dtypes
[
    ('Id', 'float64'),
    ('SepalLengthCm', 'float64'),
    ('SepalWidthCm', 'float64'),
    ('PetalLengthCm', 'float64'),
    ('PetalWidthCm', 'float64'),
    ('Species', '<U11')
]

>> iris_dt.head(4)
Id    SepalLengthCm    SepalWidthCm    PetalLengthCm    PetalWidthCm    Species
1     5.1              3.5             1.4              0.2             Iris-setosa
2     4.9              3.0             1.4              0.2             Iris-setosa
3     4.7              3.2             1.3              0.2             Iris-setosa
4     4.6              3.1             1.5              0.2             Iris-setosa
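To clarify what these summaries report, the sketch below reproduces similar output from a plain dict of NumPy arrays. The `table` dict and the helper functions `size`, `column_names`, `column_dtypes`, and `head` are hypothetical and only mirror the names used above; they are not how dtables implements them.

```python
import numpy as np

# Hypothetical columnar table: column name -> ndarray (not a real DTable).
table = {
    "Id": np.array([1.0, 2.0, 3.0, 4.0]),
    "SepalLengthCm": np.array([5.1, 4.9, 4.7, 4.6]),
    "Species": np.array(["Iris-setosa"] * 4),
}

def size(table):
    """Summarize the number of columns (variables) and rows (observations)."""
    n_rows = len(next(iter(table.values())))
    return f"{len(table)} variables, {n_rows} observations"

def column_names(table):
    """Column names in insertion order."""
    return list(table)

def column_dtypes(table):
    """Pairs of (column name, dtype string), one per column."""
    return [(name, str(col.dtype)) for name, col in table.items()]

def head(table, n):
    """First n observations of every column, as slices of each array."""
    return {name: col[:n] for name, col in table.items()}

print(size(table))
print(column_names(table))
print(column_dtypes(table))
print(head(table, 2))
```

Because each column is a single array, `head` only needs to slice every array once, which is one reason columnar layouts make this kind of inspection cheap.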
