Load data
These are the APIs for loading data from various sources. Data is loaded into a DTable, a data structure similar to the DataFrame in pandas and the data frame in R.
import dtables
dt = dtables.load_csv(path_or_input_stream, has_headers=True, encoding='utf8')
dt = dtables.load_xml(path_or_input_stream)
dt = dtables.load_json(path_or_input_stream)
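The `path_or_input_stream` parameter suggests that each loader accepts either a file path or an already-open stream. The following is a minimal standard-library sketch of that pattern, not the dtables implementation: `read_csv_columns` is a hypothetical helper, and the `col1`, `col2`, ... fallback names used when `has_headers=False` are an assumption.

```python
import csv
import io
import os
from contextlib import nullcontext


def read_csv_columns(path_or_input_stream, has_headers=True, encoding="utf8"):
    """Read CSV from a path or an open text stream into {column: [values]}."""
    if isinstance(path_or_input_stream, (str, os.PathLike)):
        # A path: open the file ourselves and close it when done.
        source = open(path_or_input_stream, newline="", encoding=encoding)
    else:
        # An already-open stream: use it as-is and leave closing to the caller.
        source = nullcontext(path_or_input_stream)

    with source as stream:
        rows = list(csv.reader(stream))

    if not rows:
        return {}
    if has_headers:
        names, rows = rows[0], rows[1:]
    else:
        # No header row: fall back to positional names (assumed naming scheme).
        names = [f"col{i + 1}" for i in range(len(rows[0]))]
    # Values stay as strings; no type inference is attempted here.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


columns = read_csv_columns(io.StringIO("col1,col2\n1,a\n2,b\n"))
print(columns)
```

Accepting both forms lets callers pass in-memory data (for example in tests) without writing a temporary file.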
You can also create a DTable from a dictionary:
import dtables
dt = dtables.load_dict({
    'col1': [1, 2, 3, 4],
    'col2': ['a', 'b', 'c', 'd']
})
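The dictionary above is column-oriented: each key names a column and each list holds that column's values, one per row. As a rough illustration of that layout (plain Python, not part of dtables), the hypothetical helper `columns_to_rows` below turns such a dictionary into row records. The equal-length check is an assumption about what a tabular structure would need, not documented dtables behavior.

```python
def columns_to_rows(columns):
    """Convert {column: [values]} into a list of {column: value} row dicts."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        # Assumed rule: every column must supply one value per row.
        raise ValueError("all columns must have the same number of values")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


rows = columns_to_rows({
    'col1': [1, 2, 3, 4],
    'col2': ['a', 'b', 'c', 'd'],
})
print(rows[0])
print(len(rows))
```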
Additionally, dtables comes with built-in datasets:
>> from dtables import datasets
You can view the list of built-in datasets:
>> datasets.list()
iris - the classic iris dataset
tags: common, clean, machine-learning, flowers, popular
size: 6 variables, 150 observations
flight-delays - flight delays data in USA for 2016
tags: flights, large, usa, 2016
size: 23 variables, 152300 observations
...
You can search for datasets based on various criteria:
>> datasets.search(purpose=['cleaning', 'reshaping'], has_column_types=['timestamp', 'categorical'])
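The call above does not spell out how the criteria combine. One plausible reading, sketched below in plain Python, is that a dataset matches when it serves any of the listed purposes and contains all of the listed column types. That reading is an assumption, and the `DatasetInfo` class, the standalone `search` function, and the registry entries are hypothetical stand-ins rather than dtables internals.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetInfo:
    name: str
    purposes: set = field(default_factory=set)
    column_types: set = field(default_factory=set)


def search(registry, purpose=(), has_column_types=()):
    """Return names matching any requested purpose and all requested column types."""
    wanted_purposes = set(purpose)
    wanted_types = set(has_column_types)
    return [
        info.name
        for info in registry
        if (not wanted_purposes or wanted_purposes & info.purposes)
        and wanted_types <= info.column_types
    ]


# Hypothetical registry entries, made up for this example.
registry = [
    DatasetInfo("example-a", {"cleaning"}, {"timestamp", "categorical", "numeric"}),
    DatasetInfo("example-b", {"reshaping"}, {"numeric"}),
    DatasetInfo("example-c", {"modeling"}, {"timestamp", "categorical"}),
]

matches = search(
    registry,
    purpose=["cleaning", "reshaping"],
    has_column_types=["timestamp", "categorical"],
)
print(matches)
```

Here only `example-a` satisfies both criteria: `example-b` lacks the requested column types and `example-c` serves neither purpose.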
You can also view more details of a given dataset:
>> datasets.details('iris')
iris - the classic iris dataset
tags: common, clean, machine-learning, flowers, popular
size: 6 variables, 150 observations
license: Public Domain
source: https://archive.ics.uci.edu/ml/datasets/iris
Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
1   5.1            3.5           1.4            0.2           Iris-setosa
2   4.9            3.0           1.4            0.2           Iris-setosa
3   4.7            3.2           1.3            0.2           Iris-setosa
4   4.6            3.1           1.5            0.2           Iris-setosa
... 146 more entries ...
<DTable.0x23423a>
And, of course, you can load a built-in dataset into a DTable:
>> iris_dt = datasets.load('iris')