This example demonstrates how QGEN might be used to clean the data from a typical marketing research product awareness, trial and usage survey.
A SAMPLE QUESTIONNAIRE ============================================================================= 1A 1B 1C 1D 1E 1F 1G 1H ----- ----- ----- ----- ----- ----- ----- ----- OTHER BRANDS FIRST UN- BOUGHT PRI- BRAND MEN- AIDED AIDED EVER PAST 6 REGU- MARY BOUGHT TION AWARE AWARE BOUGHT MONTHS LARLY BRAND LAST ----- ----- ----- ------ ------ ------ ------ ------ A. Brand A ........ 1 (20) 1 (22) 1 (24) 1 (26) 1 (28) 1 (30) 1 (32) 1 (34) B. Brand B ........ 2 2 2 2 2 2 2 2 C. Brand C ........ 3 3 3 3 3 3 3 3 D. Brand D ........ 4 4 4 4 4 4 4 4 E. Brand E ........ 5 5 5 5 5 5 5 5 F. Brand F ........ 6 6 6 6 6 6 6 6 G. Brand G ........ 7 7 7 7 7 7 7 7 H. Brand H ........ 8 8 8 8 8 8 8 8 I. Brand I ........ 9 9 9 9 9 9 9 9 J. Brand J ........ 0 0 0 0 0 0 0 0 K. Brand K ........ 1 (21) 1 (23) 1 (25) 1 (27) 1 (29) 1 (31) 1 (33) 1 (35) L. Brand L ........ 2 2 2 2 2 2 2 2 M. Brand M ........ 3 3 3 3 3 3 3 3 N. Brand N ........ 4 4 4 4 4 4 4 4 O. Brand O ........ 5 5 5 5 5 5 5 5 P. Brand P ........ 6 6 6 6 6 6 6 6 Q. Brand Q ........ 7 7 7 7 7 7 7 7 R. Brand R ........ 8 8 8 8 8 8 8 8 S. Brand S ........ 9 9 9 9 9 9 9 9 Others__________ X X X X X X X X (List)__________ ============================================================================= 1a) When you think of products that (...) what one brand first comes to mind? (INTERVIEWER: CIRCLE ONLY ONE IN COLUMN 1A.) 1b) What other brands can you think of? (PROBE:) What others come to mind? (CIRCLE AS MANY AS APPLY IN COLUMN 1B.) 1c) (HAND RESPONDENT CARD) Looking at this card, please read me the letters next to each brand you have ever heard of, including any you may have already mentioned. (CIRCLE ALL THAT APPLY IN COLUMN 1C.) 1d) This time, look down the list and read me the letters next to each of the brand you have ever bought. (CIRCLE ALL THAT APPLY IN COLUMN 1D.) 1e) Which of these brands have you purchased in the last 6 months? (CIRCLE ALL THAT APPLY IN COLUMN 1E.) 1f) Which brands do you buy on a regular basis? (RECORD IN COLUMN 1F.) 1g) (IF MORE THAN ONE BRAND MENTIONED IN 1F) Of the brands you buy on a regular basis, which one would you consider to be your primary brand? (CIRCLE ONE BRAND IN COLUMN 1G.) 1h) What one brand did you buy last? (RECORD ONE BRAND IN COLUMN 1H.) =============================================================================
Questions 1B through 1F allow for multiple answers to be circled in each column, so the data will be entered in column-binary format, a common method of representing data in marketing research surveys because it allows one to store multiple responses in a single data column.
Here are some items you might want to check the data for:
In addition to generating an error listing, you might want to force edit in certain responses that were not explicitly asked in the questionnaire itself, for example:
Finally, you might wish to generate some summary codes in the data to make it it easier to tabulate or to get counts that cannot be read directly from marginal counts, such as nets.
Here is a sample of QGEN specifications written to apply the cleaning rules and recodes listed above, extensively annotated to explain exactly what actions are being performed. The actual instructions read by QGEN are shown here in boldface--everything else is a comment in the spec file.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Data directives: Input and output have 80 columns = 160 bytes per case. * Listing file will list specs, ID numbers of all cases * breaking cleaning rules, run summaries and statistics. * Report file prints ID list created in item # 8. * INTAP olddata.dat ( 160 OUTAP newdata.dat ( 160 LISTING cleanrun.txt REPORT userlist.txt * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Address mode is column-binary with card/column addressing. * SETUP ADDRMODE B * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Run options: keywords indicating the location and length of the ID, * requesting sequence checking, and calling for a clear of * cards 1 through 12 between each record. * RUNOPS ID 101B4 SEQ A CLEAR 101,280 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Item 1: loop through all 8 2-column questions, checking that the punches * X and Y do not appear in the first column of each pair, and the punches * 0 and Y do not appear in the second column of each pair. * 000 1-8 LOOP 002 002 ?BAD-PUNCH AND 120'NXY' 121'N0Y' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Item 2: check that question 1A has no more than one answer * ?1A-MULTI SPB 120'1-0' 121'1-9X' * * Item 3: Loop using punch increments to compare questions 1A and 1B on a * punch by punch basis, starting with brand A, and ending with brand S. * check that, if a brand is coded in 1A, it is not coded in 1B. * (Note that the times-to-loop assumes the X and Y punch positions.) * 000 1-21 LOOP PI1 PI1 ?1A-MATCH-1B CAND 120'1' 122'N1' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Item 4: Loop using punch increments to compare questions 1D through 1H * (ORed together) against question 1C on a punch by punch basis, for all * brands (including others). Check that all brands appearing in 1D - 1H * also appear in 1C. * 000 1-23 LOOP PI1 PI1 ?1C-NOT-CODED CAND 126'1'!128'1'!130'1'!132'1'!134'1' 124'1' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Item 5: Tally (count) the number of brands mentioned in question 1F into * counter Z1; if the number is less than 2, copy question 1F into 1G. * Z1 TAL 130'1-0' 131'1-9X' 132 CMOV Z1<'2' 130B2 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Item 6: For each of questions 1E through 1H, NET the results into 1D. * 000 1-4 LOOP 000 002 126 NET 126B2 128B2 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Item 7: code a Y punch in the second column of each question if brands * D, I, N, P or S were mentioned in that question. * 002 1-8 LOOP 002 121'Y' IF 120'49'!121'469' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Item 8: if more than one of the client's brands (D, I, N, P or S) are * coded in question 1F, call a subroutine to clear a character-mode area * convert the ID number to character-mode, list the client's brands used * in columns 130 and 131 as characters, and print the character string. * (Note: column-binary columns use 2 bytes, ASCII characters use 1 byte). * LNKLC MP 130'49' 131'469' (call the subroutine) SKPXX ALL (skip past subroutine) * _LC_ NOP (subroutine begins here) 201C132* ALL (clear area to ASCII) 201 BTA 101B4 (ID to ASCII) 204 MOV 'Uses brands:' (load text constant) 211 CMOV 130'4' 'D ' 212 CMOV 130'9' 'I ' 213 CMOV 131'4' 'N ' 214 CMOV 131'6' 'P ' 215 CMOV 131'9' 'S ' PRT201 ALL (print line) RET ALL (return to program) * _XX_ NOP * END
The first seven items in this spec file illustrate the kind of routine cleaning and recode operations that will be familiar to anyone experienced at cleaning commercial survey questionnaire data.
The eighth item illustrates use of callable subroutines in QGEN and shows how it caan be used as a report generator.