
6.1.6 Indexed File Considerations

This section describes the impact of ACUCOBOL-GT's special indexed file features. It covers how they work and when they should (and should not) be used.

File compression can be used on indexed files to save substantial amounts of disk storage. The Vision file system supports compression, but not all file systems do. You enable compression for a file by specifying WITH COMPRESSION in the ASSIGN clause of the file's SELECT. Compression must be specified when the file is initially created to have any effect. (Note, however, that "vutil -rebuild" will allow you to apply or remove compression during the file rebuilding process. See section 3.3.3.)

File compression uses a simple run-length compression scheme. This replaces "runs" of identical bytes with a shorter sequence. Files using compression may contain any type of data.
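The idea of run-length compression can be illustrated with a short Python sketch. This is a generic illustration only: the functions rle_encode and rle_decode are hypothetical names, and the (count, byte) pair representation is an assumption made for the example. It is not the encoding that Vision actually writes to disk.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse runs of identical bytes into (count, byte_value) pairs."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][1] == value:
            # Same byte as the previous one: extend the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # A different byte starts a new run of length 1.
            runs.append((1, value))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (count, byte_value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)


# A text field padded with trailing spaces, as is typical of COBOL records.
record = b"SMITH" + b" " * 25
runs = rle_encode(record)
```

In this example the 30-byte field collapses to six runs, because the 25 trailing spaces become a single run. Data with few repeated bytes gains little or nothing from this kind of scheme.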

Obviously, some files will compress better than others. Generally speaking, files that contain text compress the best due to repeated space characters. Results can vary significantly, however. Experimentation is the best way to tell how much space will be saved.

Each compressed record usually retains some extra, unused space for future expansion. This is especially advisable if the records are changed frequently. A compression factor determines how much of the space saved by compression is removed from the record and, therefore, how much is retained to allow for future growth. When no compression factor is specified, WITH COMPRESSION uses the default compression factor (70). The following paragraphs explain how the factor is used.

A compression factor other than the default may be selected via the COMPRESSION CONTROL VALUE IS clause in the SELECT statement. The factor must be a numeric literal from 0 (zero, meaning no compression) to 100 (maximum compression). A compression factor of 1 is equivalent to the default compression.

For factors from 2 through 100, the factor is considered to be a percentage. It specifies how much of the space saved by compression is actually to be removed from the record. For example, suppose an 80-byte record is compressed to 30 bytes. Then the compression factor is used to determine how much of the 50 bytes of saved space is actually to be removed from the record. A compression factor of 70 would mean that 70% of the 50 bytes (35 bytes total) will be removed. This leaves 15 bytes for future expansion, and results in a compressed record size of 45 bytes (30 compressed size plus 15 extra for growth). The larger the compression factor, the more of the saved space is removed. A compression factor of 100 removes all saved space and is advisable only if the file is rarely updated.
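The arithmetic described above can be written out as a small Python sketch. The function name stored_record_size is hypothetical, and the use of integer division when the percentage does not come out to a whole number of bytes is an assumption; the text does not say how Vision rounds.

```python
DEFAULT_FACTOR = 70  # default compression factor stated in the text


def stored_record_size(original: int, compressed: int, factor: int = 1) -> int:
    """Return the record size after applying a compression factor.

    original   -- uncompressed record size in bytes
    compressed -- size after run-length compression, before slack is added
    factor     -- 0 (no compression), 1 (default), or 2-100 (percentage)
    """
    if not 0 <= factor <= 100:
        raise ValueError("compression factor must be from 0 to 100")
    if factor == 0:
        return original          # zero means no compression
    if factor == 1:
        factor = DEFAULT_FACTOR  # 1 selects the default factor
    saved = original - compressed
    removed = saved * factor // 100  # assumption: fractions are rounded down
    return original - removed


# The 80-byte record from the text, compressed to 30 bytes.
size_at_70 = stored_record_size(80, 30, 70)
size_at_100 = stored_record_size(80, 30, 100)
size_at_50 = stored_record_size(80, 30, 50)
```

For the 80-byte record in the example, a factor of 70 gives 45 bytes, a factor of 100 gives 30 bytes, and a factor of 50 gives 55 bytes.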

An alternate way to specify compression is to use the option COMPRESS-FACTOR in your configuration file. You can modify the default compression amount by adding the following line to the file:

COMPRESS-FACTOR value

Value must be a numeric literal from 0 (zero) to 100. (As noted earlier, "1" corresponds to the default compression factor, which is 70.) Note that the compression factor for an individual file is established when the file is created. Subsequent changes to COMPRESS-FACTOR will not affect existing files.

The selection of the compression factor should be based on the amount of updating that the file undergoes. If rewrites and deletes are rarely or never done on the file, then a high compression factor is most efficient. We recommend 100 for files that are rarely updated, 70 for average files, and 50 (or less) for files that are frequently updated.

The MASS-UPDATE option of the OPEN verb can provide significant performance benefits under some circumstances. Several issues come into play, however, when you are deciding whether or not to use MASS-UPDATE. Currently, the MASS-UPDATE clause affects only the systems that use Vision.

Normally, when Vision updates a file, it immediately writes all of the changed information to the disk. This is done for two reasons. One is to allow current information to be accessed by other concurrent processes. The other reason is to ensure that the file will be accurate should the program die suddenly without closing the file. This could happen if the machine's power went out or the operating system crashed. Normally, the only time that a Vision file risks being damaged is during an update to the file.

The MASS-UPDATE option changes this strategy. It allows Vision to retain information in memory until the file is closed. This allows Vision to be much more efficient on MS-DOS machines. It can result in two or three times the normal file performance. Using this option, however, means that the file will be at risk from the time the first update is made until the time the file is closed. Should the machine die during this time, the file will almost certainly be corrupt. However, Vision writes enough information to disk to ensure that the file can be rebuilt using vutil.
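The difference between the two strategies can be pictured with a toy Python model. This is a conceptual sketch only, not Vision's implementation: the class ToyIndexedFile and the function run are hypothetical, each "disk write" is just a counter, and the model does not include the recovery information that Vision still writes so that vutil can rebuild the file.

```python
class ToyIndexedFile:
    """Toy model of immediate versus deferred writes (not Vision itself)."""

    def __init__(self, mass_update: bool = False):
        self.mass_update = mass_update
        self.disk = {}        # what other processes, or a crash, would see
        self.pending = {}     # changes held in memory only
        self.disk_writes = 0  # number of simulated disk writes

    def rewrite(self, key, record):
        if self.mass_update:
            # Deferred: keep the change in memory until close.
            self.pending[key] = record
        else:
            # Normal: write every change to disk immediately.
            self.disk[key] = record
            self.disk_writes += 1

    def close(self):
        # Flush anything still held in memory in one pass.
        if self.pending:
            self.disk.update(self.pending)
            self.pending.clear()
            self.disk_writes += 1


def run(mass_update: bool):
    """Apply 1000 updates to 10 records; report writes and disk states."""
    f = ToyIndexedFile(mass_update)
    for i in range(1000):
        f.rewrite(i % 10, f"balance {i}")
    disk_before_close = dict(f.disk)
    f.close()
    return f.disk_writes, disk_before_close, f.disk


normal_writes, normal_before, normal_after = run(mass_update=False)
mass_writes, mass_before, mass_after = run(mass_update=True)
```

In the toy model both runs end with the same data on disk, but the deferred run performs one simulated write instead of 1000, and its disk copy is out of date until the file is closed. That stale disk copy is the reason the file is at risk and must be locked against other updaters.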

The MASS-UPDATE option also requires that the entire physical file be locked against other updaters because the disk version of the file is not always accurate. This may limit circumstances where MASS-UPDATE can be used.

Generally, programs might use MASS-UPDATE if they heavily update a file. For many such programs, the fact that the file is at greater risk is not really an issue. For example, many posting programs cannot recover from an incomplete run. This is because the program cannot tell where it left off in the process. This is particularly true for programs that update several files at once, because it is usually not clear which file got updated last. For these programs, it is usually necessary to go to a backup of the affected files when the program dies. These programs are obvious candidates for MASS-UPDATE because it doesn't matter if the files are corrupt after a program failure, since they are just going to be restored from backup. Furthermore, these programs benefit the most from MASS-UPDATE because they do a lot of updating.

Interactive programs, however, make poor candidates for MASS-UPDATE. Usually the volume of updates is low (at least for the time frame the program runs in). Furthermore, interactive programs are often killed or left running overnight by their operators, thus increasing both the risk to the file and the inconvenience of the file lock that MASS-UPDATE implies.

To summarize, then, MASS-UPDATE is appropriate for programs where the implied file lock is useful, the volume of updates is large, and a system failure would usually require special attention for recovery (either restoring from backup or rebuilding the files).


Note that for convenience when you are converting programs written with other COBOL compilers, ACUCOBOL-GT can treat files opened WITH LOCK as if they were opened with MASS-UPDATE. This is controlled by the MASS-UPDATE runtime configuration variable. Configuration variables are described in Appendix H.