When we import data from external sources such as Excel or CSV files, the result often contains additional rows that are completely blank. Empty values in a dataset can also distort the desired output, so it is necessary to check for missing cases and handle them accordingly.
Example:
Input: The sample dataset below has four variables: one character (Name) and three numeric (Phys, Chem, Maths). Two of its rows are completely blank. It is used throughout the examples to demonstrate how to remove empty rows.
Name | Phys | Chem | Maths |
---|---|---|---|
Shubhash | 70 | 68 | 66 |
– | | | |
Samar | 55 | | 85 |
Ashutosh | 54 | 78 | 89 |
– | | | |
Varun | 50 | 96 | 85 |
Pratiksha | | 68 | 93 |
Create a SAS dataset
The code below creates the sample dataset used to demonstrate the delete operations.
data outdata;
    length name $12;
    infile datalines missover;
    input name $ phys chem maths;
    /* each lone period below is read as a row with every variable missing */
    datalines;
Shubhash 70 68 66
.
samar 55 . 85
ashutosh 54 78 89
.
varun 50 96 85
pratiksha . 68 93
;
run;
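Before deleting anything, it can help to count how many values are missing in each variable. The following is a minimal sketch, assuming the outdata dataset created above; the column alias name_nmiss is a hypothetical name chosen for illustration.

```sas
/* N = non-missing count, NMISS = missing count, for each numeric variable */
proc means data=outdata n nmiss;
    var phys chem maths;
run;

/* PROC MEANS only analyzes numeric variables, so count the character one separately */
proc sql;
    select sum(missing(name)) as name_nmiss   /* missing() returns 1 for a blank name */
    from outdata;
quit;
```

A variable whose missing count equals the number of blank rows has no partially missing values of its own.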
Method I: Remove rows in which every variable is blank or missing
OPTIONS missing = ' ';
data readin;
    SET outdata;
    IF missing(cats(of _all_)) THEN DELETE;
run;
Note:
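- The MISSING= system option sets the character used to display missing numeric values. options missing = ' '; displays them as a single space rather than the default period (.). This matters here because CATS converts a missing numeric value to that character, so under the default setting the concatenated string would contain periods and would never be blank.
- The CATS function concatenates its arguments and removes leading and trailing blanks from each. cats(of _all_) concatenates all the variables.
- missing(cats(of _all_)) is true only for rows in which every variable is missing.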
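To make the role of each piece visible, the sketch below stores the concatenated string in a helper variable so it can be inspected row by row. It assumes the outdata dataset created above; the names check, combined, and all_blank are hypothetical and introduced only for illustration.

```sas
options missing = ' ';   /* show missing numeric values as a blank */

data check;
    set outdata;
    length combined $ 60;                       /* hypothetical helper variable */
    combined  = cats(name, phys, chem, maths);  /* same idea as cats(of _all_) */
    all_blank = missing(combined);              /* 1 when every variable is missing, else 0 */
run;

proc print data=check;
run;

options missing = '.';   /* restore the default display */
```

The variables are listed explicitly here rather than with _all_ so that the helper variables themselves are not part of the concatenation.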
Method II: Remove rows in which any variable is blank or missing
data readin;
    SET outdata;
    IF cmiss(of _character_) OR nmiss(of _numeric_) > 0 THEN DELETE;
run;
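Here the OR operator checks whether any character variable or any numeric variable has a missing value. With the sample data, 3 observations remain (Shubhash, ashutosh, and varun), because the rows for samar and pratiksha each contain a missing score and the completely blank rows are dropped as well.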
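Because CMISS accepts both character and numeric arguments, the two counts can be collapsed into a single call. The following is a sketch of that variant rather than the method shown above; it assumes the outdata dataset created earlier, and the output name readin_any is hypothetical.

```sas
data readin_any;
    set outdata;
    /* CMISS counts missing values across character and numeric arguments alike */
    if cmiss(of _all_) > 0 then delete;
run;
```

This keeps exactly the rows that have no missing value in any variable, which is the same selection Method II makes.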