6.1.3 Indexed Files - Vision

On VMS machines, indexed files are implemented using RMS indexed files. On MS-DOS, UNIX, and MPE/iX machines, the Vision indexed file system is used by default. The Vision file system is described here.

Vision files may be created with either single or dual file format. With Vision Version 3, files are generated in a single file that contains both the data records and the overhead key information. A separate linked list is maintained which is used to rebuild corrupted files. A linked list of deleted records is also maintained so that new records first occupy old deleted records before expanding the file.

In Vision Version 4, each file occupies at least two disk files--one file, or segment, contains the data records and another segment contains the key information. When any segment approaches the configurable file size limit (up to nearly 2 GB), Vision creates a new data or index file, in which it stores the excess information. Separating data files from index files also provides a high degree of reliability when files must be rebuilt. Permanent data loss due to memory-related problems is less likely to happen to two files as opposed to one file.

Segment Naming for Vision Version 4 Files

File segments generated with Vision Version 4 can be given customized extensions.

The naming rules described here do not apply to the first data segment. Set the name of the first segment as you would for any file. This name is referred to as the "regular" name of the file in this section.

Tip: Turning on file tracing level 3 or higher ("tf 3") in the debugger will display some diagnostics that show which names are being used for the various segments of a Vision Version 4 file. Also, "vutil -info" will display the names of all of the segments of a Vision Version 4 file.

Vision Version 4 employs two different methods to determine the file names of additional index and data segments (after the first segment).

- The first naming method allows the user to specify a "format" from which Vision Version 4 will generate the file names of additional segments.

- The second (default) method generates the file names of additional segments by applying a simple translation to the regular name of the file.

Both methods allow specific generated file names to be overridden.

The first method takes precedence over the second method. That is, if circumstances allow Vision Version 4 to use the first method, it will be used. If not, the second (default) method will be used.

In the following methods, environment variables are used to indicate how Vision Version 4 is to determine the file names of additional segments. You form the names of these environment variables by taking the base name of the file, converting all the alphabetic characters to upper case, leaving numeric characters alone, and converting all other characters to underscores ("_"). Thus, gl42.dat becomes GL42_DAT. This name is typically suffixed with another string, depending upon which method you are using, as described below.

Note that the runtime looks for these variables in the runtime configuration file. Utilities such as vio and vutil look for them in the operating system's environment.
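The conversion rule above can be sketched in a few lines of Python. This is an illustration only, not part of Vision or its utilities; the function name vision_env_name is hypothetical, and the sketch assumes that "alphabetic" and "numeric" mean ASCII letters and digits and that the directory part of a path is dropped before conversion.

```python
import posixpath


def vision_env_name(file_name):
    """Return the environment-variable stem for a file name (sketch)."""
    base = posixpath.basename(file_name)  # assumption: directory part is ignored
    converted = []
    for ch in base:
        if ch.isascii() and ch.isalpha():
            converted.append(ch.upper())   # letters become upper case
        elif ch.isascii() and ch.isdigit():
            converted.append(ch)           # digits are left alone
        else:
            converted.append("_")          # everything else becomes "_"
    return "".join(converted)


print(vision_env_name("gl42.dat"))                   # GL42_DAT
print(vision_env_name("/usr1/gl.dat") + "_DATA_FMT")  # GL_DAT_DATA_FMT
```

The second call shows how a method-specific suffix is appended to the converted name.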

Method One: The Format Method

This method allows the user to specify a format that Vision Version 4 will use to determine the file names of additional segments. Two formats must be specified: a format for data file extensions and a format for index file extensions. The resulting variables have this general look: NAME_EXT_DATA_FMT and NAME_EXT_INDEX_FMT. Each of these variables must be equated with a format code that includes an escape sequence. The valid escape sequences are defined below the next example.

Example for Method One

Suppose that the regular name of your COBOL file is "/usr1/gl.dat". The variables you would use to set the segment naming formats for this file are GL_DAT_DATA_FMT and GL_DAT_INDEX_FMT.

Each of these variables must be set equal to a pattern that shows how to create the segment names. The pattern shows how to form the base name and how to form the extension for each segment. Part of this pattern is a special character (such as %d) that specifies how the segment number should be represented. Choices include %d (decimal segment numbers), %x (lower case hexadecimal numbers), %X (uppercase hexadecimal numbers), and %o (octal numbers).

For example, setting environment variables GL_DAT_DATA_FMT=gl%d.dat and GL_DAT_INDEX_FMT=gl%d.idx would result in data segments named /usr1/gl.dat (remember that the first data segment is not affected), /usr1/gl1.dat, /usr1/gl2.dat, and so forth. The index segments would be named /usr1/gl0.idx, /usr1/gl1.idx, /usr1/gl2.idx, and so forth.

Escape Sequence Definitions

The %d in the values of the NAME_EXT_DATA_FMT and NAME_EXT_INDEX_FMT variables above is a printf-style escape sequence. Most reference books on the C language contain an in-depth explanation of these escape sequences, and UNIX systems typically have a man page ("man printf") that explains them in detail. Here are the basics:

- "%d" expands into the decimal representation of the segment number.

- "%x" expands into the hexadecimal representation (with lower case a-f) of the segment number.

- "%X" expands into the hexadecimal representation (with upper case A-F) of the segment number.

- "%o" expands into the octal representation of the segment number.

- You can add leading zeros to the number (to keep all the file names the same length) by placing a zero and a length digit between the percent sign and the following character. "%02d" would result in "00", "01", "02", and so forth when expanded.

- To embed a literal "%" in the file name, use "%%".

The escape sequence can be positioned anywhere in the file name, including the extension.
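Because the escape sequences are printf-style, their expansion can be previewed with Python's "%" string operator, which accepts the same %d, %x, %X, %o, zero-padding, and %% forms. The following is a rough sketch, not Vision's own code. The function name format_segment_name is hypothetical, and the sketch assumes (based on the example above) that a generated name is placed in the same directory as the regular name.

```python
import posixpath


def format_segment_name(regular_name, fmt, segment_number):
    """Expand a segment-name format for one segment number (sketch)."""
    # Assumption: the generated name lives in the regular file's directory.
    directory = posixpath.dirname(regular_name)
    return posixpath.join(directory, fmt % segment_number)


regular = "/usr1/gl.dat"

# The first data segment keeps the regular name; later ones use the format.
data_segments = [regular] + [
    format_segment_name(regular, "gl%d.dat", n) for n in (1, 2)
]
index_segments = [
    format_segment_name(regular, "gl%d.idx", n) for n in (0, 1, 2)
]

print(data_segments)   # ['/usr1/gl.dat', '/usr1/gl1.dat', '/usr1/gl2.dat']
print(index_segments)  # ['/usr1/gl0.idx', '/usr1/gl1.idx', '/usr1/gl2.idx']
print(format_segment_name(regular, "gl%02x.dat", 10))  # /usr1/gl0a.dat
```

In this sketch, a format that contains no escape sequence makes Python raise a TypeError. How Vision itself treats such a format is not covered here.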

Important Note: The runtime checks for these segment naming variables in the runtime configuration file. Utilities such as vutil and vio check in the operating system's environment.

Method Two: The Default Method

The default method uses the regular file name to determine the file names of additional segments. This method stores the segment number in the extension of the file name. If you use the extension of the file name to distinguish files that are otherwise named the same, you should not use this method.

This method takes the regular file name and removes the extension (if any) from that name. It then adds ".vix" to generate the name for the first index segment. Subsequent index segments are named with ".v01" through ".vff" (hexadecimal representation) for the first 255, and ".v0100" through ".vffff" for segments 256 through 65535. The data segments are named using a similar numbering scheme, but use "d" instead of "v" before the segment number. You should avoid using file name extensions that can be read as "d" or "v" followed by a hexadecimal number. For example, the extension ".dae" is not safe, because that is the name generated for data segment number 174 (hexadecimal "ae"). The common extension ".dat" is safe because "t" is not a hexadecimal digit.
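The default translation can be approximated as follows. This is a sketch under stated assumptions, not Vision's implementation: the function names are hypothetical, segment number 0 is taken to mean the first segment, hexadecimal digits are written in lower case, and the extension check ignores letter case.

```python
import posixpath
import re


def default_segment_name(regular_name, kind, segment_number):
    """Name a segment using the default method (sketch). kind: 'data' or 'index'."""
    if kind not in ("data", "index"):
        raise ValueError("kind must be 'data' or 'index'")
    if not 0 <= segment_number <= 0xFFFF:
        raise ValueError("segment number out of range")
    root, _ext = posixpath.splitext(regular_name)  # remove the extension, if any
    if segment_number == 0:
        # First data segment keeps the regular name; first index segment is .vix
        return regular_name if kind == "data" else root + ".vix"
    letter = "d" if kind == "data" else "v"
    width = 2 if segment_number <= 0xFF else 4     # two hex digits, then four
    return f"{root}.{letter}{segment_number:0{width}x}"


def extension_is_unsafe(extension):
    """True if the extension looks like 'd' or 'v' plus a hexadecimal number."""
    return re.fullmatch(r"\.[dv][0-9a-f]+", extension.lower()) is not None


print(default_segment_name("/usr1/gl.dat", "index", 0))    # /usr1/gl.vix
print(default_segment_name("/usr1/gl.dat", "index", 256))  # /usr1/gl.v0100
print(default_segment_name("/usr1/gl.dat", "data", 174))   # /usr1/gl.dae
print(extension_is_unsafe(".dae"), extension_is_unsafe(".dat"))  # True False
```

The third call shows why ".dae" is unsafe: a file with the regular name gl.dae would collide with a generated data segment name.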

Overriding individual segment names

You can override an individual generated segment file name by setting an environment variable named by the generated file name of the segment (converted as described above) to the full path of the desired file name. As an example, suppose the regular name of your file is /usr1/gl.dat, and you have GL_DAT_DATA_FMT=gl%d.dat set, but you want to place the second data segment on /usr2. Setting GL1_DAT=/usr2/gl1.dat will override the originally determined name. This feature works with both methods. Using the file names generated by the default method as environment variables (converted as described above) works, too.
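The override lookup can be pictured as one more step after a segment name has been generated. The sketch below is illustrative only. The function names are hypothetical, and a plain dictionary stands in for the runtime configuration file or the operating system's environment.

```python
import posixpath


def env_name(file_name):
    """Convert a file's base name to its environment-variable form (sketch)."""
    base = posixpath.basename(file_name)
    return "".join(
        ch.upper() if ch.isascii() and ch.isalpha()
        else ch if ch.isascii() and ch.isdigit()
        else "_"
        for ch in base
    )


def resolve_segment_name(generated_name, variables):
    """Return the override for a generated segment name, if one is set."""
    return variables.get(env_name(generated_name), generated_name)


settings = {"GL1_DAT": "/usr2/gl1.dat"}  # stand-in for the real configuration

print(resolve_segment_name("/usr1/gl1.dat", settings))  # /usr2/gl1.dat
print(resolve_segment_name("/usr1/gl2.dat", settings))  # /usr1/gl2.dat
```

With the default method, the same lookup would use a variable such as GL_D01 for a generated name of /usr1/gl.d01.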

Selecting the Vision Version

Use of the V-VERSION configuration variable sets the default format for new Vision files. New files may be created in Version 4, Version 3, or Version 2 format, depending on the value for this variable. The ?-VERSION variable allows you to set the file format on a file-by-file basis. See Book 4, Appendix H for more details on these configuration variables.

On many (but not all) systems, the runtime system allows an indexed file to be opened for input when the user does not have write-access to the file.

Vision files allow up to 120 keys for sorting. One key is the primary key, all others are alternates. In a record, a key may not occupy physical space that exceeds the minimum record length. Thus, a 10-byte key cannot occupy positions 20-29 in a record with a minimum record length of 28.
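The position rule amounts to a single comparison, shown here as a small sketch. The function name key_fits is hypothetical, and positions are assumed to be counted from 1, as in the example above.

```python
def key_fits(first_position, key_length, min_record_length):
    """True if the key lies entirely within the minimum record length."""
    last_position = first_position + key_length - 1
    return last_position <= min_record_length


print(key_fits(20, 10, 28))  # False: the key would end at position 29
print(key_fits(19, 10, 28))  # True: the key ends at position 28
```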

You can specify whether duplicates are allowed for each alternate key. If duplicates are allowed for a particular key, it is possible to write a record whose key fields contain exactly the same data as the key fields of an existing record. In this case, the records are stored in chronological order. Primary keys do not allow duplicates.

Every alternate key causes significant additional overhead. (Keys have the least amount of overhead when duplicates are allowed to occur.) The keys for a data file are automatically stored in a compressed form.

A key may be a contiguous part of the record, or it may be split into as many as 16 segments. Note that if you are compiling for compatibility with versions earlier than Vision Version 4, you can have no more than six segments. You must use Vision Version 4 if you want more than six segments.

Suppose you have the following record structure in a file called AJAX-SUPPLIES:


01  CUSTOMER-RECORD.

    03  CUSTOMER-NO            PIC 9(6).

    03  CUSTOMER-BALANCE       PIC S9(9)V99.

    03  CUSTOMER-NAME          PIC X(30).

    03  CUSTOMER-CONTACT       PIC X(30).

To use CUSTOMER-NAME as the primary key, you would use the syntax shown in the last line below:


FILE-CONTROL.

     SELECT AJAX-SUPPLIES

     ASSIGN TO DISK "INDEX.DAT"

     RECORD KEY IS CUSTOMER-NAME.

If data elements are contiguous and defined in the order that would be used for sorting, they may be grouped and defined together as a key. For example, suppose you wanted to use CUSTOMER-BALANCE, CUSTOMER-NAME as an alternate key. Because these two fields are contiguous and are defined in the same sequence they will be used for sorting, the most efficient way to define the alternate key is to establish a group item that includes both fields. For example:


01  CUSTOMER-RECORD.

    03  CUSTOMER-NO            PIC 9(6).

    03  CUSTOMER-BALNAME.

        05  CUSTOMER-BALANCE   PIC S9(9)V99.

        05  CUSTOMER-NAME      PIC X(30).

    03  CUSTOMER-CONTACT       PIC X(30).

Then, to define CUSTOMER-BALNAME as an alternate key, you would use the syntax shown in the last line below:


FILE-CONTROL.

     SELECT AJAX-SUPPLIES

     ASSIGN TO DISK "INDEX.DAT"

     RECORD KEY IS CUSTOMER-NAME

     ALTERNATE RECORD KEY IS CUSTOMER-BALNAME.

Suppose now that you want to define a sort sequence that uses fields that are not contiguous, or are defined in a different order from the sorting order. In this case, you could either:

- move the fields around, or duplicate them, so that they are contiguous and are in the same sequence in which they will be used for sorting, or

- define a split key

Split keys allow you to specify up to 16 segments of data elements as the components of a key. (Note that if you compile for compatibility with versions earlier than Vision Version 4, you can have no more than six segments. The 16-segment capability is available in Vision Version 4 only.) The data segments need not be contiguous and need not be listed in the order they appear within the record. The composite length of a split key cannot exceed 250 bytes, and no key can be defined beyond the minimum record length.

For example, to define an alternate key consisting of CUSTOMER-BALANCE, CUSTOMER-NAME, and CUSTOMER-NO, use the syntax shown in the last two lines below:


FILE-CONTROL.

    SELECT AJAX-SUPPLIES

    ASSIGN TO DISK "INDEX.DAT"

    RECORD KEY IS CUSTOMER-NAME

    ALTERNATE RECORD KEY IS CUSTOMER-BALNAME

    ALTERNATE RECORD KEY IS BAL2-KEY =

      CUSTOMER-BALANCE, CUSTOMER-NAME, CUSTOMER-NO.

In this example, BAL2-KEY is a user-defined word and is the name you would use in your READ and START statements. Note that BAL2-KEY is not defined in Working-Storage. This is the only definition of the key.

Vision files have a block size of either 512 or 1024 bytes, depending on the BLOCK CONTAINS clause. If the BLOCK CONTAINS clause is omitted, then files have 512 bytes per block. Indexed files may be assigned only to disk files.

Vision can optionally compress and/or encrypt records. Record compression uses a simple run-length compression algorithm. Encryption uses a byte transformation algorithm that is unique to every byte in the file. Encrypted files may not have records extracted by the Vision utility program vutil. Records are stored internally in the least amount of space required. Furthermore, they are packed together and span block boundaries, so no disk space is wasted.

Vision maintains a user count for each file. This count is normally zero. When a file is opened for update, the user count is incremented; when the file is closed, the user count is decremented. The user count is thus the number of processes currently updating the file. If a program dies catastrophically, however, the user count is not decremented. (For a definition of "catastrophic," see section 6.6, "Exiting from ACUCOBOL-GT Programs.") As a result, the count never returns to zero. Thus, if this value is ever non-zero when there are no users active, it indicates a catastrophic program failure and suggests that corrective action may need to be taken. At the very least, the file should be checked for integrity, but depending on the program that died, more significant action may be needed. In short, a non-zero user count indicates that someone knowledgeable about the system should intervene and ensure that everything is okay. This can be used as an early warning system to head off some problems. Note that a non-zero user count is not a fatal error to Vision. It is used only as an indicator of potential problems.