The LOOKUP function allows the user to combine a selected dataset as a lookup table to the previously selected Base Data Set. To select a dataset to as a lookup table, simply click the “click to select” button” in the upper portion of the “Add a Lookup dataset of” pane. This will open a Data Library window for you to choose a data set to use as a lookup table.
In order for two datasets to be integrated as a single dataset, the files must share some amount of data in common. The connections menu is used to define the in-common column(s) that two datasets share. The application will provide a suggestion for this combining operation based on the number and kind of matches that occur within the columns available in a dataset pair.
The "Current" dataset is whichever dataset was selected as the Base Data Set. Columns from the next dataset are considered to be the Lookup Data Set. The Base Data Set will appear on the left of the combining interface; the Lookup Data Set will appear on the right. As you choose your options for the LOOKUP function, Paxata will display a preview of how the data will appear in the lower portion of the screen.
Join Score Calculation When attaching a lookup dataset to your base dataset, the join score calculation considers "selectivity" (the extent to which each column in the lookup connection is populated with unique values) in addition to "overlap" (the percentage of rows that match in the lookup connection.) The higher the score, the better the connection.
Percentages less than 100% do not indicate a problem with the proposed combine operation. In fact, there are numerous legitimate use cases where the percentage of overlap may be less than 1%! Rather than being an absolute indicator of “right” or “wrong,” the percentage value should serve as a “sanity check” that should match some level of expectation the user has regarding the amount of data the two datasets have in common.
If you are unhappy with the default column selection for the LOOKUP function, you can choose other possible LOOKUP connections found by Paxata in the list below the default. If no connections were found, or none of the found connections are what you are looking for in the LOOKUP function, you can manually select columns to use by clicking in each field.
Options The "Options" menu allows you to specify what type of combine operation and matching options you would like to conduct between the Base Data Set and the Lookup Data Set. The combine options are: Outer Lookup, Inner Lookup, Left Lookup, and Right Lookup. One of the easiest ways to understand these Lookup Types are by using a Venn diagram:
An Outer Lookup will return a combined dataset containing all rows with matching values in the specified join columns, as well as all rows from both datasets that contained no matching values.
An Inner Lookup will return a combined dataset containing only those rows which contain matching values within the specified join columns.
A Left Outer Join will return a combined dataset containing all rows with matching values in the specified join columns, as well as all rows from the left (Base Data Set) table that had no corresponding matching value in the right (Lookup Data Set) table.
A Right Outer Join is similar to a Left Outer Join in that it will return a combined dataset containing all rows with matching values in the specified join columns; however, rather than return the rows with non-matching values from the left (Base Data Set) table, it will return all non-matching rows from the right (Lookup Data Set) table instead.
The color blocks in this column correspond to the ones used by the application to distinguish the datasets elsewhere. For each row, the appearance or absence of a color block provides a quick visual indicator regarding whether or not that particular dataset had values to contribute to the row.