As mentioned in the introduction, we can also use MERGE inside a DATA step, which will be discussed under a separate topic. Joins play a very important role in blending and unifying data according to the requirement.
A vertical join appends dataset B to dataset A, provided both have the same variables in the same order. For example, dataset A holds the sales for Jan'17 and dataset B holds the sales for Feb'17. To create a dataset C that has the sales for both Jan and Feb, we use a vertical join. UNION ALL keeps every row from both tables, whereas a plain UNION would also remove duplicate rows.
PROC SQL;
CREATE TABLE C AS
SELECT *
FROM A
UNION ALL
SELECT *
FROM B;
QUIT;
Dataset C now contains the observations from both A and B, stacked vertically.
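To make the vertical join concrete, here is a minimal sketch with two tiny sales tables. The variable names (CUSTOMERID, MONTH, SALES) and all values are assumptions made up for illustration; they are not from the original datasets. The sketch uses UNION ALL because PROC SQL's plain UNION drops duplicate rows, and both set operators line columns up by position rather than by name.

```sas
/* Hypothetical monthly sales tables; names and values are illustrative only */
DATA A;
    INPUT CUSTOMERID MONTH $ SALES;
    DATALINES;
101 JAN17 250
102 JAN17 400
;
RUN;

DATA B;
    INPUT CUSTOMERID MONTH $ SALES;
    DATALINES;
101 FEB17 300
103 FEB17 150
;
RUN;

PROC SQL;
    /* UNION ALL keeps every row from both tables, including exact duplicates */
    CREATE TABLE C AS
    SELECT * FROM A
    UNION ALL
    SELECT * FROM B;
QUIT;
/* C has four rows: two from A and two from B */
```

If the two tables might list their variables in a different order, OUTER UNION CORR is worth considering instead, since it aligns columns by name.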
An inner join creates a dataset containing only the records whose key values match in both tables. For example, dataset A contains customer information and dataset B contains credit card details. To get the credit card details of the customers in dataset A, let us create dataset C:
PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUMBER
FROM CUSTOMER A, CC_DETAILS B
WHERE A.CUSTOMERID=B.CUSTOMERID;
QUIT;
Dataset C contains only the observations whose CUSTOMERID appears in both datasets.
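The same inner join can also be written with the explicit INNER JOIN ... ON syntax, which keeps the join condition separate from any later WHERE filters. The sketch below is illustrative only: the NAME variable, the CC_NUMBER column name, and all data values are assumptions added so the result is easy to check.

```sas
/* Hypothetical sample data; values are made up for illustration */
DATA CUSTOMER;
    INPUT CUSTOMERID NAME $;
    DATALINES;
1 Asha
2 Ben
3 Chen
;
RUN;

DATA CC_DETAILS;
    INPUT CUSTOMERID CC_NUMBER $;
    DATALINES;
1 CARD-A
3 CARD-C
4 CARD-D
;
RUN;

PROC SQL;
    /* Explicit INNER JOIN ... ON form; same rows as the comma-and-WHERE form */
    CREATE TABLE C AS
    SELECT A.*, B.CC_NUMBER
    FROM CUSTOMER A INNER JOIN CC_DETAILS B
    ON A.CUSTOMERID = B.CUSTOMERID;
QUIT;
/* With this sample data, only CUSTOMERID 1 and 3 appear in C */
```

Customer 2 has no card and card 4 has no customer, so neither survives the inner join.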
A left join returns all the observations from the left dataset regardless of their key values, but only those observations from the right dataset whose key values match. Considering the same example as above:
PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUMBER, B.START_DATE
FROM CUSTOMER A LEFT JOIN CC_DETAILS B
ON A.CUSTOMERID=B.CUSTOMERID;
QUIT;
Dataset C contains all the observations from the left table, plus the matched values from the right table, or missing values where there is no match.
A right join mirrors a left join: it selects all the observations from the right dataset and only the matching records from the left dataset.
PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUMBER, B.START_DATE
FROM CUSTOMER A RIGHT JOIN CC_DETAILS B
ON A.CUSTOMERID=B.CUSTOMERID;
QUIT;
Dataset C contains all the observations from the right table, plus the matched values from the left table, or missing values where there is no match.
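Because a right join is simply a left join seen from the other side, the query can be rewritten by swapping the table order. The sketch below assumes the CUSTOMER and CC_DETAILS tables from the example already exist with the columns used there; it should return the same rows and columns, although the row order is not guaranteed to be identical.

```sas
PROC SQL;
    /* Same rows as the right join: CC_DETAILS is now the left table */
    CREATE TABLE C AS
    SELECT A.*, B.CC_NUMBER, B.START_DATE
    FROM CC_DETAILS B LEFT JOIN CUSTOMER A
    ON A.CUSTOMERID = B.CUSTOMERID;
QUIT;
```

Which form to use is largely a matter of readability; many people find it easier to always keep the "keep everything" table on the left.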
A full join selects all the observations from both datasets. Where a key value is found in only one table, the variables that come from the other table are missing.
PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUMBER, B.START_DATE
FROM CUSTOMER A FULL JOIN CC_DETAILS B
ON A.CUSTOMERID=B.CUSTOMERID;
QUIT;
Dataset C will contain all records from both tables, with missing values (. for numeric variables, blank for character variables) wherever there is no match on either side.
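One consequence of selecting A.* in a full join is that CUSTOMERID itself is missing for rows that exist only in the right table. A common way around this is COALESCE, which returns its first non-missing argument. The sketch below is a minimal illustration; the NAME variable and all data values are assumptions, not part of the original example.

```sas
/* Hypothetical sample data; values are made up for illustration */
DATA CUSTOMER;
    INPUT CUSTOMERID NAME $;
    DATALINES;
1 Asha
2 Ben
;
RUN;

DATA CC_DETAILS;
    INPUT CUSTOMERID CC_NUMBER $;
    DATALINES;
2 CARD-B
3 CARD-C
;
RUN;

PROC SQL;
    /* COALESCE fills the key from whichever table supplied the row */
    CREATE TABLE C AS
    SELECT COALESCE(A.CUSTOMERID, B.CUSTOMERID) AS CUSTOMERID,
           A.NAME,
           B.CC_NUMBER
    FROM CUSTOMER A FULL JOIN CC_DETAILS B
    ON A.CUSTOMERID = B.CUSTOMERID;
QUIT;
/* C has three rows: ID 1 (no card), ID 2 (matched), ID 3 (no name) */
```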
Keyword | Purpose |
---|---|
Proc Sql | Starts the SQL procedure inside SAS |
Create Table | Creates a SAS dataset |
Select | Selects the required variables from the respective datasets |
Where | Specifies the condition that rows must satisfy |
Quit | Ends the procedure |