Sort & Icetool

1. Display the number of input or output records

I have some millions of records in my input file and I want to know only the number of records present in that input file. Can DFSORT display just the number of input records in an output file?

Here's a simple DFSORT job that will write just the record count to SORTOUT as an 8-digit number:

//CTRCDS EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=_
//SORTIN DD DSN=... input file
//SORTOUT DD DSN=... output file
//SYSIN DD *
  OPTION COPY
              OUTFIL NODETAIL,REMOVECC,
  TRAILER1=(COUNT=(M11,LENGTH=8))
/*

NODETAIL ensures that the data records are not included in the SORTOUT file. REMOVECC ensures that a '1' for the ANSI eject character is not included in the SORTOUT record. TRAILER1 with COUNT writes one record with the count to the SORTOUT file. M11,LENGTH=8 formats the count as 8 digits with leading zeros. So if the input file had five thousand records, SORTOUT would have:
00005000
Other edit masks could be used to format the count differently, if appropriate. For example, M10,LENGTH=8 formats the count as 8 digits with blanks substituted for leading zeros. So if the input file had five thousand records, SORTOUT would have:
5000

Like that,
            …
//SYSIN            DD *
                        OPTION COPY
                        OUTFIL REMOVECC,  
                        TRAILER1=(‘XXXXXXX’,COUNT=(M11,LENGTH=8),DATENS=(DM4))
/*

whereas, ‘XXXXXXX’ can be any string which we need before displaying the count.
                            DATENS=(DM4) will print the system date in dd/mm/yyyy format.

2. Sample records

An SORT/ICETOOL job that can sample every 100th record using the SAMPLE=n parameter of OUTFIL.

//S1 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=... input file
//SORTOUT DD DSN=... output file
//SYSIN DD *
OPTION COPY
OUTFIL STARTREC=100,SAMPLE=100
/*

In this case STARTREC=100 is used to write record 100 to SORTOUT, and SAMPLE=100 is used to write records 200, 300, and so on to SORTOUT.

3a. Use of STARTREC and ENDREC

STARTREC=n and ENDREC=m can be used to select a sequential range of records to be included in each output data set. STARTREC=n starts processing at the n’th record while ENDREC=m ends processing at the m’th record. Here's an example of STARTREC=n and ENDREC=m:

//RANGE EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=Y897797.INPUT2,DISP=OLD
//FRONT DD DSN=Y897797.RANGE1,DISP=(NEW,CATLG),
//                      SPACE=(CYL,(5,5)),UNIT=SYSDA
//MIDDLE DD DSN=Y897797.RANGE2,DISP=(NEW,CATLG),
//                      SPACE=(CYL,(5,5)),UNIT=SYSDA
//BACK DD DSN=Y897797.RANGE3,DISP=(NEW,CATLG),
//                      SPACE=(CYL,(5,5)),UNIT=SYSDA
//SYSIN DD *
OPTION COPY
OUTFIL FNAMES=FRONT,ENDREC=500
OUTFIL FNAMES=MIDDLE,STARTREC=501,ENDREC=2205
OUTFIL FNAMES=BACK,STARTREC=2206

Input records 1 through input record 500 are written to the FRONT data set. Input record 501 through input record 2205 are written to the MIDDLE data set. Input record 2206 through the last input record are written to the BACK data set.

3b. Use INCLUDE, OMIT and SAVE

INCLUDE/OMIT and SAVE can be used to select specific records to be included in each output data set. The INCLUDE and OMIT operands provide all of the capabilities of the INCLUDE and OMIT statements including substring search and bit logic.  SAVE can be used to select the records that are not selected for any other subset, eliminating the need to specify complex conditions.

//SUBSET EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*_
//SORTIN DD DSN=Y897797.INPUT3,DISP=OLD
//OUT1 DD DSN=Y897797.SUBSET1,DISP=(NEW,CATLG),
//                      SPACE=(CYL,(5,5)),UNIT=SYSDA
//OUT2 DD DSN=Y897797.SUBSET2,DISP=(NEW,CATLG),
//                      SPACE=(CYL,(5,5)),UNIT=SYSDA
//OUT3 DD DSN=Y897797.SUBSET3,DISP=(NEW,CATLG),
//                      SPACE=(CYL,(5,5)),UNIT=SYSDA
//SYSIN DD *
OPTION COPY
OUTFIL INCLUDE=(8,6,CH,EQ,C'ACCTNG'),FNAMES=OUT1
OUTFIL INCLUDE=(8,6,CH,EQ,C'DVPMNT'),FNAMES=OUT2
OUTFIL SAVE,FNAMES=OUT3

Records with ACCTNG in positions 8-13 are included in the OUT1 data set. Records with DVPMNT in positions 8-13 are included in the OUT2 data set. Records without ACCTNG or DVPMNT in positions 8-13 are written to the OUT3 data set.




3c. Use STOPAFT , ENDREC and SKIPREC

For example, to keep the first 10 records,
you can use either:
OPTION COPY,STOPAFT=10
(or)
OPTION COPY
OUTFIL ENDREC=10
Similarly, to skip first 10 records,
OPTION COPY,SKIPREC=10

3d. Keep the last n records

             See last page. 
(Before going to that, you should be aware of Seq. no., so, continue reading and at last go through that.) 

4. Change all zeros / strings in your records to spaces / strings

Here's how you could change all low values (X'00') to spaces (X'40'), in an FB data set with an LRECL of 60:
ALTSEQ CODE=(0040)
OUTREC FIELDS=(1,60,TRAN=ALTSEQ)

Here's how you could change all 'a' (X'81') and 'x' (X'A7') characters to 'b' (X'82') and 'y' (X'A8') characters, respectively, in a VB input data set with any LRECL:
ALTSEQ CODE=(81A7,82A8)
OUTREC FIELDS=(1,4,5,TRAN=ALTSEQ)

5. Using IFTHEN clauses

I have to replace all leading spaces (X'40') in positions 1 to 6 with zeroes (X'F0'). The input file has
RECFM=FB and LRECL=80 and looks like this:

0001XX 20041210
    7 S5
 2044X
   X4407 20041110
X8768978
        25 0003

Non-leading blanks, as for example the fourth character in the 7 S5 record, should not be replaced by zeros.
So the output file should look like this:

0001XX 20041210
007 S5
02044X
00X4407 20041110
X8768978
000025 0003

In this case, we check for 6-1 leading blanks and replace them with 6-1 leading zeros, respectively, as follows:
//SYSIN DD *
            OPTION COPY
            INREC IFTHEN=(WHEN=(1,6,CH,EQ,C’ ’),OVERLAY=(6C‘0’)),
            IFTHEN=(WHEN=(1,5,CH,EQ,C‘ ’),OVERLAY=(5C‘0’)),
            IFTHEN=(WHEN=(1,4,CH,EQ,C‘ ’),OVERLAY=(4C‘0’)),
            IFTHEN=(WHEN=(1,3,CH,EQ,C‘ ’),OVERLAY=(3C‘0’)),
            IFTHEN=(WHEN=(1,2,CH,EQ,C‘ ’),OVERLAY=(2C‘0’)),
            IFTHEN=(WHEN=(1,1,CH,EQ,C‘ ’),OVERLAY=(C‘0’))
/*

6. DISPLAY Operator

                            To print the values or characters of specified numeric or character fields in a separate list data set. Simple, tailored, or sectioned reports can be produced.
e.g.
 DISPLAY FROM(CODASD) LIST(OUT) ON(28,6,PD) ON(18,4,ZD) ON(1,15,CH)
                       
   (28,6,PD)                            (18,4,ZD)                               (1,15,CH)
                        +000000000005200    +000000000000020   Aspen
                        +000000000007351     +000000000000032   Boulder

7. RANGE Operator

            To filter out records which are having the values between 2 extremes?
e.g. 1 - for picking records which are having values X; and X is  -1500 < X < +8000.

RANGE FROM(CADASD) ON(28,6,PD) HIGHER(-1500) LOWER(+8000)
                        ──┬─── ───┬─── ──┬── ──┬──
                             │                   │              │             └── Upper limit for range
                             │                   │              └──────── Lower limit for range
                             │                   └────── ────────location    
                             └────────────────────── dd name of input data set

e.g. 2 - RANGE FROM(ALL) ON(18,4,ZD) LOWER(32) - for giving records which are having values less than 32 at 18th position.

8a. SELECT Operator
  • You can keep only the first record for each value (FIRST operand), only the last record for each value (LAST operand),
  • or only those records with values that occur:
More than once, that is, duplicate values (ALLDUPS operand)
  • Only once, that is, non-duplicate values (NODUPS operand)
  • A specified number of times (EQUAL operand)
  • More than a specified number of times (HIGHER operand)
  • Less than a specified number of times (LOWER operand)

e.g. SELECT FROM(BKIN) TO(BKOUT) ON(106,4,CH) HIGHER(4)
            BKIN is the ddname for the sample input data set.  BKOUT is the ddname of the output data set that will contain the records for each respective field value that occurs more than 4 times.

8b. DISCARD Parameter:
To save the records which are not satisfy as per our condition?
e.g. SELECT FROM(IN1) TO(OUT1) ON(1,60,CH) ALLDUPS DISCARD(OUT2)
and here, whichever records are unique will be pushed to OUT2.

8c. SPLICE Operator

SPLICE is a new ICETOOL operator that helps you to perform various file "join" and "match" operations. SPLICE allows you to create output records in a variety of ways by splicing together fields from records that have the same key, but different information.

e.g. SPLICE FROM(T1) TO(OUT12) ON(1,1,CH) WITH(13,1) USING(CTL3) KEEPNODUPS

SPLICE overlays the WITH fields from the overlay record to the base record using matching ON fields, it's
usually necessary to do some initial setup before using SPLICE, to ensure that:
- the ON fields are in the same positions in the base and overlay records
- the WITH fields in the overlay records are in the positions they will occupy in the base records
- the base records and overlay records are the same length.

(Note: For more detail, refer to document SORT New Enhancements at \\mbpnas\DATA\ACC_T444_DATA\Mainframe\Programming Languages\Mainframes\JCL)


9. Set RC=12 or RC=4 if file is empty, has more than n records, etc
                        I need to check if a data set is empty or not. If it's empty I want to skip all other steps. Is there any way to check for empty data sets?
                                I would like to skip certain steps in my job if a file is empty. This would be easiest if a utility would generate a non-zero RC if the input file is empty.
                                I have several datasets that I need to sort together. How can I terminate if the total count of records in these data sets is greater than 5000.
                                I have a file that always has a header and trailer record and may or may not have data records. Is there anyway to check if there are no data records in the file?

//STEP1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=...
//TOOLIN DD *
 * SET RC=12 IF THE 'IN' DATA SET IS EMPTY, OR
 * SET RC= IF THE 'IN' DATA SET IS NOT EMPTY
COUNT FROM(IN) EMPTY
/*
….
….
….
// IF STEP1.RC =  THEN
//***  STEP2 WILL RUN IF 'IN' IS NOT EMPTY
//***  STEP2 WILL NOT RUN IF 'IN' IS EMPTY
//STEP2 EXEC ...
...
// ENDIF

10. Join fields from two files on a key

File A has the following records:
000 $$$$$
111 AAAAA
222 CCCCC
333 EEEEE
444 GGGGG

and File B has the following records:
111 BBBBB
333 DDDDD
444 FFFFF
555 HHHHH

I want to join the data fields for pairs of records with the same key to get the following output:

111 AAAAA BBBBB
333 EEEEE DDDDD
444 GGGGG FFFFF

Below is an ICETOOL job that can do it. The trick is to reformat the fields of the IN1 and IN2 files so you can join them with the SPLICE operator. SPLICE helps you perform various file join and match operations.

//DFSORT EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN1 DD DSN=... file 1
//IN2 DD DSN=... file 2
//TMP1 DD DSN=&&TEMP1,DISP=(MOD,PASS),SPACE=(TRK,(5,5)),UNIT=SYSDA
//OUT DD DSN=... output file
//TOOLIN DD *
* For this example, the fields (p,m) are as follows:
* IN1: sort key - 1,3
* f1fld - 5,5
* IN2: sort key - 1,3
* f2fld - 5,5
*
* Reformat the IN1 data set so it can be spliced
COPY FROM(IN1) TO(TMP1) USING(CPY1)
* Reformat the IN2 data set so it can be spliced
COPY FROM(IN2) TO(TMP1) USING(CPY2)
* Splice records with matching IN1/IN2 keys
SPLICE FROM(TMP1) TO(OUT) ON(1,3,CH) WITH(11,5)
/*
//CPY1CNTL DD _
* Use OUTREC to create |key |f1fld |blank|
OUTREC FIELDS=(1:1,3,5:5,5,11:5X)
/*
//CPY2CNTL DD _
* Use OUTREC to create:|key |blank |f2fld|
OUTREC FIELDS=(1:1,3,11:5,5)
/*

11. Join fields from two files record-by-record

I want to merge two files laterally record by record. For example:
If FILE1 contains:
AAAAA
BBBBB
CCCCC

and FILE2 contains:
11111
22222
33333

then my output file should contain:
AAAAA11111
BBBBB22222
CCCCC33333

Normally, we're asked how to join files on a key, but in this case there is no key; we just want to join the files record-by-record. Well, no problem, we'll just create a key we can use by adding a sequence number (1, 2, ...) to the records in file1 and a sequence number (1, 2, to the records in file2. Then we can join the fields from the 1 records, from the 2 records, and so on using the same technique we used for joining files on a key.

//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN1 DD DSN=... file1
//IN2 DD DSN=... file2
//TMP1 DD DSN=&&TEMP1,DISP=(MOD,PASS),SPACE=(TRK,(5,5)),UNIT=SYSDA
//OUT DD DSN=... output file
//TOOLIN DD *
* Reformat the IN1 data set so it can be spliced
COPY FROM(IN1) TO(TMP1) USING(CTL1)
* Reformat the IN2 data set so it can be spliced
COPY FROM(IN2) TO(TMP1) USING(CTL2)
* Splice records with matching sequence numbers.
SPLICE FROM(TMP1) TO(OUT) ON(11,8,PD) WITH(6,5) USING(CTL3)
/*
//CTL1CNTL DD _
* Use OUTREC to create: |f1fld|blank|seqnum|
OUTREC FIELDS=(1:1,5,11:SEQNUM,8,PD)
/*
//CTL2CNTL DD _
* Use OUTREC to create: |blank|f2fld|seqnum|
OUTREC FIELDS=(6:1,5,11:SEQNUM,8,PD)
/*
//CTL3CNTL DD _
* Use OUTFIL OUTREC to remove the sequence number
OUTFIL FNAMES=OUT,OUTREC=(1,10)
/*

3d. Keep the last n records

But keeping the last n records of an input data set is a bit more challenging since
DFSORT doesn't have any built-in functions to do that.
The trick is to attach a sequence number to the records and sort them in descending order by the sequence number.


I also recommend reading following documents present at \\mbpnas\DATA\ACC_T444_DATA\Mainframe\Programming Languages\Mainframes\JCL

  1. IBM Storage  Storage Software  DFSORT-MVS  Smart DFSORT Tricks.html
  2. Sort and Icetool
  3. SORT New Enhancements and
  4. Dfsort-application programming guide
  5. DFSORT Tricks

~~~~~ The End ~~~~

Additions…

12. Using SORT we can even do arithmetic operations also. 
For eg. OUTREC FIELDS=(1,4,5:6,6,11:1X,(12,11,ZD,MUL,-1),ZD,LENGTH=11,58X) will multiply number 11 digit number starting from 12 by -1.



Comments

Popular posts from this blog

Date/time constants in JCL

JCL ABeND --

IEBGENER usage examples