
3.1.4 Text Compression and File Archives

Much of the information available over the Internet has been collected in archives and compressed for more efficient storage and retrieval. In this section we briefly describe the most common archiving and compression programs. The commands presented here should be enough to decompress and extract most documents, but you should read the manual pages or other documentation if you end up using these programs frequently.

An archive is a set of files stored together in a single file (the archive) so that operations on the whole collection are easier and more efficient. Many file systems allocate disk space in fixed-size blocks, so each small file wastes the unused part of its last block. As a result, several small files take more space than one large file holding the same data, and simply making an archive and deleting the originals is worthwhile if you will not need the individual files for a while.
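The space argument can be made concrete with a little arithmetic. The sketch below assumes a hypothetical file system that hands out space in whole 4096-byte blocks and ignores per-file metadata and any header overhead the archive format itself adds. The helper names `allocated_bytes` and `compare` are illustrative, not part of any archiving tool.

```python
import math


def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Bytes a file occupies on disk when space is handed out in whole blocks."""
    # Assumption: even a 1-byte file consumes a full block; an empty file uses none.
    return math.ceil(file_size / block_size) * block_size


def compare(num_files: int, file_size: int, block_size: int = 4096) -> tuple[int, int]:
    """Return (space for separate files, space for one file holding the same data)."""
    separate = num_files * allocated_bytes(file_size, block_size)
    combined = allocated_bytes(num_files * file_size, block_size)
    return separate, combined


if __name__ == "__main__":
    separate, combined = compare(num_files=100, file_size=500)
    print(f"100 separate files: {separate} bytes")
    print(f"one combined file:  {combined} bytes")
```

With these assumptions, 100 files of 500 bytes each fill 100 blocks (409600 bytes), while the same 50000 bytes in one file need only 13 blocks (53248 bytes). Real block sizes and archive overheads vary, so treat the numbers as an illustration of the effect rather than a measurement.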

A compressed file is a file that has been processed by a data compression algorithm. Most files, particularly text files, contain a lot of redundant information. For example, on most systems each character is represented by 8 bits, so a text file with n characters occupies 8n bits on disk. It is possible to design a code that represents common letters with fewer bits: a coding scheme that uses a variable number of bits per character, with short codes for frequent characters, will take less space. The code for the letter e might be 01, while the code for the backquote character (`) might be 00101000010101001. Even though the latter is much longer, backquotes are so rare in most text files that the savings on common letters outweigh it, and the files become shorter.
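One classic way to build such a variable-length code is Huffman coding. The paragraph above does not name a specific algorithm, so this is our choice of illustration, not a description of any particular program. The sketch below repeatedly merges the two least frequent groups of characters, prefixing their codes with 0 and 1, which yields a prefix-free code where frequent characters get short bit strings. The function names and the sample sentence are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code in which frequent characters get shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs at least one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {char: code built so far}).
    # The tie-breaker keeps heapq from ever comparing the dictionaries.
    heap = [(count, i, {ch: ""}) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; one side gets a leading 0, the other a 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(text: str, codes: dict[str, str]) -> int:
    """Total number of bits needed to encode text with the given code table."""
    return sum(len(codes[ch]) for ch in text)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog; see the `man` page"
    codes = huffman_codes(sample)
    print("fixed 8-bit size:", 8 * len(sample), "bits")
    print("variable-length size:", encoded_bits(sample, codes), "bits")
    print("code for e:", codes["e"], " code for `:", codes["`"])
```

Because no code is a prefix of another, a decoder can read the bit stream left to right and always knows where one character ends and the next begins, which is what makes mixing a short code like 01 with a long one safe.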