Load data

Here are the APIs for loading data from various sources. The data is loaded into a DTable, a data structure similar to a DataFrame in pandas or a data frame in R.

import dtables

dt = dtables.load_csv(path_or_input_stream, has_headers=True, encoding='utf8')
dt = dtables.load_xml(path_or_input_stream)
dt = dtables.load_json(path_or_input_stream)
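
To make the `has_headers` and `encoding` parameters more concrete, here is a minimal standard-library sketch of what a column-oriented CSV load involves. It does not use dtables and is not its implementation. The helper name `read_csv_columns` and the default column names `col0`, `col1`, ... for headerless files are hypothetical.

```python
import csv
import io
import os


def read_csv_columns(source, has_headers=True, encoding="utf8"):
    """Hypothetical helper (not part of dtables): read CSV into {column name: [str values]}.

    `source` is a file path or an already-open *text* stream, loosely mirroring
    the `path_or_input_stream` argument above; `encoding` only applies to paths.
    Assumes every row has at least as many fields as the first row.
    """
    if isinstance(source, (str, os.PathLike)):
        with open(source, newline="", encoding=encoding) as f:
            rows = list(csv.reader(f))
    else:
        rows = list(csv.reader(source))
    if not rows:
        return {}
    if has_headers:
        names, data = rows[0], rows[1:]
    else:
        # Hypothetical naming scheme for files without a header row.
        names = [f"col{i}" for i in range(len(rows[0]))]
        data = rows
    # Column-oriented result: one list of values per column (values stay strings).
    return {name: [row[i] for row in data] for i, name in enumerate(names)}


stream = io.StringIO("col1,col2\n1,a\n2,b\n")
print(read_csv_columns(stream))
# {'col1': ['1', '2'], 'col2': ['a', 'b']}
```

The sketch keeps every value as a string to avoid guessing types; a full loader would likely also infer numeric and other column types.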

You can also create a DTable from a dictionary:

import dtables

dt = dtables.load_dict({
    'col1': [1, 2, 3, 4],
    'col2': ['a', 'b', 'c', 'd']
})
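
A dictionary like the one above is column-oriented: each key is a column name and each list holds that column's values, so the lists presumably need to have the same length. The following standard-library sketch (not dtables code; `rows_from_columns` is a hypothetical name) checks that assumption and shows the same data viewed row by row.

```python
def rows_from_columns(columns):
    """Hypothetical helper: turn {'col': [values...]} into a list of row dicts."""
    lengths = {name: len(values) for name, values in columns.items()}
    # Every column must contribute exactly one value per row.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    names = list(columns)
    # zip(*...) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = rows_from_columns({
    'col1': [1, 2, 3, 4],
    'col2': ['a', 'b', 'c', 'd'],
})
print(rows[0])   # {'col1': 1, 'col2': 'a'}
print(len(rows)) # 4
```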

Additionally, dtables comes with built-in datasets.

>> from dtables import datasets

You can view the list of built-in datasets:

>> datasets.list()
 iris - the classic iris dataset
    tags: common, clean, machine-learning, flowers, popular
    size: 6 variables, 150 observations

 flight-delays - flight delays data in USA for 2016
    tags: flights, large, usa, 2016
    size: 23 variables, 152300 observations
 ...

You can search for datasets by criteria such as their intended purpose and the column types they contain:

>> datasets.search(purpose=['cleaning', 'reshaping'], has_column_types=['timestamp', 'categorical'])
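
The exact matching rules of `datasets.search` are not spelled out here. One plausible reading, sketched below over a hypothetical in-memory catalog, is that a dataset matches only when it covers every requested purpose and every requested column type. The `CatalogEntry` record, its fields, the `search_catalog` function, the placeholder entries `example-a` and `example-b`, and the all-of semantics are all assumptions for illustration, not the library's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    # Hypothetical record; the real dtables catalog format is not documented here.
    name: str
    purposes: set = field(default_factory=set)
    column_types: set = field(default_factory=set)


def search_catalog(catalog, purpose=(), has_column_types=()):
    """Return names of entries that cover ALL requested purposes and column types."""
    wanted_purposes = set(purpose)
    wanted_types = set(has_column_types)
    return [
        entry.name
        for entry in catalog
        # <= on sets is a subset test: every wanted item must be present.
        if wanted_purposes <= entry.purposes and wanted_types <= entry.column_types
    ]


catalog = [
    CatalogEntry("example-a", {"cleaning", "reshaping"}, {"timestamp", "categorical", "numeric"}),
    CatalogEntry("example-b", {"cleaning"}, {"timestamp"}),
]
print(search_catalog(catalog, purpose=["cleaning", "reshaping"],
                     has_column_types=["timestamp", "categorical"]))
# ['example-a']
```

An any-of interpretation (match if at least one criterion overlaps) would replace the subset tests with non-empty intersections; which one dtables uses is not stated in this section.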

You can also view details about a given dataset, including its license, source, and a preview of its rows:

>> datasets.details('iris')
 iris - the classic iris dataset

 tags: common, clean, machine-learning, flowers, popular
 size: 6 variables, 150 observations
 license: Public Domain 
 source: https://archive.ics.uci.edu/ml/datasets/iris

 Id   SepalLengthCm   SepalWidthCm    PetalLengthCm   PetalWidthCm    Species
 1    5.1             3.5             1.4             0.2             Iris-setosa
 2    4.9             3.0             1.4             0.2             Iris-setosa
 3    4.7             3.2             1.3             0.2             Iris-setosa
 4    4.6             3.1             1.5             0.2             Iris-setosa
 ... 146 more entries ...
 <DTable.0x23423a>

And, of course, you can load a built-in dataset into a DTable:

>> iris_dt = datasets.load('iris')
