Append / Join
There are two ways to merge multiple datasets:
- Append - This is vertical concatenation.
- Join - This is the classic SQL style join.
Append
Let's look at .append() first. Here is how it works in simplest form:
>> dt1
+------------+----------+----------+
| date | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90 | 30 |
| 2017-09-02 | 95 | 30 |
| 2017-09-03 | 80 | 35 |
+------------+----------+----------+
>>
>> dt2
+------------+----------+----------+
| date | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-04 | 90 | 30 |
| 2017-09-05 | 95 | 30 |
| 2017-09-06 | 80 | 35 |
+------------+----------+----------+
>>
>> dt1.append(dt2)
+------------+----------+----------+
| date | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90 | 30 |
| 2017-09-02 | 95 | 30 |
| 2017-09-03 | 80 | 35 |
| 2017-09-04 | 90 | 30 |
| 2017-09-05 | 95 | 30 |
| 2017-09-06 | 80 | 35 |
+------------+----------+----------+
You can also provide a list of tuples to append. Each tuple will become a new row. Example:
>> dt1
+------------+----------+----------+
| date | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90 | 30 |
| 2017-09-02 | 95 | 30 |
| 2017-09-03 | 80 | 35 |
+------------+----------+----------+
>>
>> records = [
('2017-09-04', 90, 30),
('2017-09-05', 95, 30),
('2017-09-06', 80, 35)
]
>>
>> dt1.append(records)
+------------+----------+----------+
| date | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90 | 30 |
| 2017-09-02 | 95 | 30 |
| 2017-09-03 | 80 | 35 |
| 2017-09-04 | 90 | 30 |
| 2017-09-05 | 95 | 30 |
| 2017-09-06 | 80 | 35 |
+------------+----------+----------+
If there is a column mismatch, the gaps will be filled with NaN values. Example:
>> dt1
+------------+----------+----------+
| date | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90 | 30 |
| 2017-09-02 | 95 | 30 |
| 2017-09-03 | 80 | 35 |
+------------+----------+----------+
>>
>> dt2
+------------+----------+----------+
| date | max-temp | humidity |
+------------+----------+----------+
| 2017-09-04 | 90 | 90 |
| 2017-09-05 | 95 | 78 |
| 2017-09-06 | 80 | 40 |
+------------+----------+----------+
>>
>> dt1.append(dt2)
+------------+----------+----------+----------+
| date | max-temp | min-temp | humidity |
+------------+----------+----------+----------+
| 2017-09-01 | 90 | 30 | NaN |
| 2017-09-02 | 95 | 30 | NaN |
| 2017-09-03 | 80 | 35 | NaN |
| 2017-09-04 | 90 | NaN | 90 |
| 2017-09-05 | 95 | NaN | 78 |
| 2017-09-06 | 80 | NaN | 40 |
+------------+----------+----------+----------+
Note: if you are appending a list of tuples, it should have exactly same number of columns as the DTable.
Join
Join is a classic SQL style join operation with four flavors (inner, outer, left, right). Here is the signature for join operation:
dt_left.join(dt_right, flavor='inner', columns_left=[], columns_right=[])