Append / Join

There are two ways to merge multiple datasets:

  • Append - This is vertical concatenation.
  • Join - This is the classic SQL style join.

Append

Let's look at .append() first. Here is how it works in simplest form:

>> dt1
+------------+----------+----------+
| date       | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90       | 30       |
| 2017-09-02 | 95       | 30       |
| 2017-09-03 | 80       | 35       |
+------------+----------+----------+
>>
>> dt2
+------------+----------+----------+
| date       | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-04 | 90       | 30       |
| 2017-09-05 | 95       | 30       |
| 2017-09-06 | 80       | 35       |
+------------+----------+----------+
>>
>> dt1.append(dt2)
+------------+----------+----------+
| date       | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90       | 30       |
| 2017-09-02 | 95       | 30       |
| 2017-09-03 | 80       | 35       |
| 2017-09-04 | 90       | 30       |
| 2017-09-05 | 95       | 30       |
| 2017-09-06 | 80       | 35       |
+------------+----------+----------+

You can also provide a list of tuples to append. Each tuple will become a new row. Example:

>> dt1
+------------+----------+----------+
| date       | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90       | 30       |
| 2017-09-02 | 95       | 30       |
| 2017-09-03 | 80       | 35       |
+------------+----------+----------+
>>
>> records = [ 
    ('2017-09-04', 90, 30),
    ('2017-09-05', 95, 30),
    ('2017-09-06', 80, 35)
]
>>
>> dt1.append(records)
+------------+----------+----------+
| date       | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90       | 30       |
| 2017-09-02 | 95       | 30       |
| 2017-09-03 | 80       | 35       |
| 2017-09-04 | 90       | 30       |
| 2017-09-05 | 95       | 30       |
| 2017-09-06 | 80       | 35       |
+------------+----------+----------+

If there is a column mismatch, the gaps will be filled with NaN values. Example:

>> dt1
+------------+----------+----------+
| date       | max-temp | min-temp |
+------------+----------+----------+
| 2017-09-01 | 90       | 30       |
| 2017-09-02 | 95       | 30       |
| 2017-09-03 | 80       | 35       |
+------------+----------+----------+
>>
>> dt2
+------------+----------+----------+
| date       | max-temp | humidity |
+------------+----------+----------+
| 2017-09-04 | 90       | 90       |
| 2017-09-05 | 95       | 78       |
| 2017-09-06 | 80       | 40       |
+------------+----------+----------+
>>
>> dt1.append(dt2)
+------------+----------+----------+----------+
| date       | max-temp | min-temp | humidity |
+------------+----------+----------+----------+
| 2017-09-01 | 90       | 30       | NaN      |
| 2017-09-02 | 95       | 30       | NaN      |
| 2017-09-03 | 80       | 35       | NaN      |
| 2017-09-04 | 90       | NaN      | 90       |
| 2017-09-05 | 95       | NaN      | 78       |
| 2017-09-06 | 80       | NaN      | 40       |
+------------+----------+----------+----------+

Note: if you are appending a list of tuples, it should have exactly same number of columns as the DTable.

Join

Join is a classic SQL style join operation with four flavors (inner, outer, left, right). Here is the signature for join operation:

dt_left.join(dt_right, flavor='inner', columns_left=[], columns_right=[])

results matching ""

    No results matching ""