Thank you very much for your assistance.

However, my main objective is not simply to eliminate duplicate records. Rather, I want to form every possible pair of records among the 24,000 and then compare selected fields, so as to determine which fields are duplicated between the two records in each pair.

To be specific, let me illustrate with an example. As I said before, there are 24,000 records in total, collected from a survey with 10 fields. One of the 10 fields is a unique reference key that identifies each person (record). To detect cheating, where an interviewer may have copied the answers for the remaining 9 fields from one record to other records, I intend to use SAS to form all possible pairs from the 24,000 records and check each pair for signs of this. Hence, I wonder whether there is any SAS code that would let me draw pairs of records iteratively.

2 Answers

I'm not sure that this is what you're asking, but if you want to see every record combined with every other record, a Cartesian join in SQL will do the trick:

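proc sql;
    /* a and b both stand for the survey data set (two copies or two aliases). */
    /* Columns with the same name in a and b collide in the output table,      */
    /* so in practice the columns from b need to be renamed.                   */
    create table big_table as
    select a.*, b.*
    from a, b
    where a.unique_key ne b.unique_key
    ;
quit;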

This will give you every record combined with every other record.

With 24,000 records, though, this produces 24,000 x 23,999 = 575,976,000 rows, which could take you some time to process.

There are better ways of achieving this with SAS.
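
One way to cut the work down, sketched here under assumptions rather than offered as a definitive solution: join the data set to itself and keep only pairs where the first key is smaller than the second, so each pair appears once instead of twice. The data set name `survey`, the field names `q1` to `q9`, and the threshold of 8 matching fields are all hypothetical; only `unique_key` comes from the query above.

```sas
/* Sketch only: SURVEY, Q1-Q9 and the threshold 8 are assumed names/values. */
proc sql;
    create table suspect_pairs as
    select a.unique_key as key_a,
           b.unique_key as key_b,
           /* Each comparison evaluates to 1 (equal) or 0 (not equal). */
           (a.q1 = b.q1) + (a.q2 = b.q2) + (a.q3 = b.q3) +
           (a.q4 = b.q4) + (a.q5 = b.q5) + (a.q6 = b.q6) +
           (a.q7 = b.q7) + (a.q8 = b.q8) + (a.q9 = b.q9) as n_matching
    from survey as a, survey as b
    /* "<" instead of "ne" keeps each unordered pair exactly once. */
    where a.unique_key < b.unique_key
      and calculated n_matching >= 8
    ;
quit;
```

This still compares 24,000 x 23,999 / 2 = 287,988,000 pairs, but only the suspicious ones are written out. Note that SAS treats two missing values as equal, so records with many unanswered fields will look like matches unless they are handled separately.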


I think you are looking for something like this:

data x;
  input a b c unique_id;
  datalines;
1 2 2 101
4 5 1 102
4 5 2 103
1 2 3 104
2 3 4 105
5 6 4 106
1 2 3 107
1 3 2 108
;
run;

proc sort data=x;
  by a b c;
run;

data y;
  set x;
  by a b c;
  if not (first.c and last.c) then do;
    put "OBS" _n_ "IS NOT UNIQUE     " _all_;
  end;
  else do;
    put "OBS" _n_ "IS UNIQUE         " _all_;
  end;
run;
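
If a table of the repeated records is more useful than messages in the log, the same check can probably be written in PROC SQL. This is a minimal sketch that reuses the data set `x` and the variables `a`, `b`, `c` and `unique_id` from above; the output names `dup_groups` and `n_in_group` are made up for illustration.

```sas
/* Sketch only: DUP_GROUPS and N_IN_GROUP are illustrative names. */
proc sql;
    create table dup_groups as
    select *, count(*) as n_in_group
    from x
    group by a, b, c
    /* Keep only combinations of a, b, c shared by more than one record. */
    having count(*) > 1
    order by a, b, c, unique_id
    ;
quit;
```

SAS remerges the group count back onto the individual rows (it says so in the log), so every record in a repeated group is kept along with its `unique_id`. With the sample data above, that should be the two records 104 and 107.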