Behind the phrase “data archiving” is the basic idea of backing up files or entire directories and storing them in a secure location, often in compressed form. For reasons of data security, archiving became an important factor in server environments early on: originally, server data was stored on tape drives, a backup method that is still used for large data volumes. To make this archiving method as efficient as possible, the packing program tar (short for tape archiver) was developed for Unix systems in 1979. With tar, files and directories can still be packed into a single file today and later restored with their user permissions intact, as long as both source and target support Unix or Linux file permissions.

To save additional storage space, .tar files are often compressed with tools like gzip, bzip2, or lzop. But how do these compression programs and formats differ? And why does tar remain so important on Linux systems today?

The most popular com­pres­sion programs for Linux

There are a number of free compression tools for Linux distributions that all have one thing in common: they can be operated from the command line or terminal. Short commands can quickly compress files, such as HTML documents, to save storage space and bandwidth when sending them over networks or the internet. In addition, there are graphical interfaces for these tools, as well as archive managers that combine several compression programs (which must be installed as well) into a single visual user interface. Running a graphical interface requires additional system resources, which is why the terminal generally remains the best choice for compression.

The main difference between the individual programs is the compression rate, which goes hand in hand with different compression times. In most cases, however, the tool itself offers different modes that favor either the best possible size reduction or the shortest possible compression time. Another feature that differentiates compression software is the output format. Because the programs use different algorithms, compressed files have different pack formats and require specific programs to be unpacked.
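The trade-off between compression modes can be tried out directly. The following is a minimal sketch, assuming gzip is installed; the file names sample.txt, fast.gz, and best.gz are made up for illustration. It compresses the same generated file once with the fastest level and once with the strongest level and prints the resulting sizes.

```shell
#!/bin/sh
# Sketch: compare gzip's fastest (-1) and strongest (-9) level on a
# generated sample file. All file names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Build a repetitive sample file so that compression has something to do.
i=0
while [ "$i" -lt 20000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

# -c writes to standard output, so sample.txt itself is left untouched.
gzip -1 -c sample.txt > fast.gz
gzip -9 -c sample.txt > best.gz

# Print the byte counts of the original and both compressed files.
wc -c sample.txt fast.gz best.gz
```

How far the two results differ depends heavily on the input data, so the numbers printed here should not be generalized.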

gzip

gzip (GNU zip) is one of the most widely used compression tools on Linux and plays an especially important role in web development. It is based on the Deflate algorithm and was originally developed for the GNU project as a successor to the original Unix program compress. Today, the application, which is written in C, can pack and extract files not only on Linux, but also on Windows and macOS. Deflate only searches for repetitions within a window of 32,768 bytes (32 KB), which is small by the standards of modern compression programs.

In terms of speed, the free pack program is still among the top options, which is why common web server software such as Apache, IIS, or NGINX usually implement it in the form of their own modules to answer user queries with com­pressed data packets in the shortest possible time. Ad­di­tion­al in­for­ma­tion about the func­tion­al­i­ty and use of the GPL-licensed com­pres­sion tool can be found in our article on the program.

Benefits | Drawbacks
Fast compression process | Small 32 KB window
Standard in popular web server software | Low compression ratio

bzip2

bzip2, distributed under a BSD-like license, offers lossless, high-ratio compression of files under Linux. The application uses a three-stage compression method: first, the input is split into blocks of up to 900,000 bytes (900 KB), and the Burrows-Wheeler transform sorts the data within each block. The result then undergoes a move-to-front transform. Finally, Huffman coding performs the actual compression. Files packed with bzip2 are given the extension .bz2.

The program, developed by Julian Seward, usually compresses noticeably better than gzip, but takes considerably longer to do so. One of its biggest advantages is that partially damaged .bz2 archives remain usable: with the help of bzip2recover, you can at least extract and unpack all readable blocks. bzip2 is the official successor of bzip, which used arithmetic coding and wasn't developed further for patent reasons.

Benefits | Drawbacks
Strong compression rate | Very slow
Unpacking partially damaged archives possible |

p7zip

p7zip is a port of the free, LGPL-licensed 7-Zip archive program for POSIX platforms. The port is the only solution under Linux that fully supports the .7z format. The packing program is based on the Lempel-Ziv-Markov algorithm (LZMA) developed by Igor Pavlov in 1998, which works with a dictionary method and can, in principle, be regarded as a further development of Deflate (with approximately 50% stronger compression). An archive can be split into as many parts as required and protected with a password; AES-256 encryption, optionally including the archive header, is also available.

LZMA provides excellent results with its high compression rate, and also performs well in terms of speed. But the archiving tool also places very high demands on system performance. A good processor (at least 2 GHz) and sufficient memory (2 GB or more) are basic prerequisites, especially for high compression levels. Aside from use via terminal or an archive manager, the ported 7-Zip application also has its own graphical interface, p7zip-gui.

Benefits | Drawbacks
Excellent ratio of compression and duration | Very high system requirements
Password protection and header encryption possible |

lzop

The compression program lzop (Lempel-Ziv-Oberhumer-Packer) focuses on the speed of the packing and unpacking processes, just like gzip, and on average is even faster than the GNU tool. It's based on its namesake, the Lempel-Ziv-Oberhumer algorithm (LZO), which was published in 1996 under the GNU General Public License (GPL). The resource-efficient compression works according to the dictionary method: repeated strings are replaced by a symbol that points to the entry of the first recorded occurrence of the same string in the dictionary. The files are processed in blocks of 256,000 bytes (256 KB). By default, the original file is kept during compression.

Besides top-level compression speed and compatibility with gzip, the development of lzop treated the portability of the software as a top priority. For this reason, versions exist for virtually all platforms, including macOS and Windows. Compressed files carry the extension .lzo.

Benefits | Drawbacks
Very quick compression | Compression ratio rather low due to the high speed
High portability |

Popular tools and formats: A tabular com­par­i­son

Feature | gzip | bzip2 | p7zip | lzop
Operating systems | Cross-platform | Linux/Unix, Windows | Unix-like | Cross-platform
License | GNU GPL | BSD-like | GNU LGPL | GNU GPL
Compression procedure | Deflate algorithm | Burrows-Wheeler transformation, move-to-front transformation, Huffman coding | LZMA algorithm | LZO algorithm
Data format | .gz | .bz2 | .7z | .lzo
Encryption | none | none | AES-256 | none
Compression mode | 1–9 | 1–9 | 0–9 | 1, 3, 7–9
Strengths | Very fast | Very good compression rate | Superb compression rate, compresses file directories | Very fast, compresses file directories
Weaknesses | Only compresses single files | Moderate speed, only compresses single files | High system performance demands | Weak compression rate

The table makes it apparent that no single compression tool is best for every case; the choice of program depends on the usage scenario. p7zip, for example, has clear advantages, such as its strong compression rate and the option of AES-256 encryption, which is valuable when security plays a large role. Additionally, p7zip and lzop both allow for the compression of entire directories, while gzip and bzip2 can only compress single files. On the other hand, p7zip also places high demands on system performance, making it less suitable for small-scale compression.

How data com­pres­sion works with Linux tools

The mentioned packing programs differ sig­nif­i­cant­ly in terms of com­pres­sion rates and speed. When it comes to the syntax and use of these tools, though, the sim­i­lar­i­ties are no­tice­able. All programs can be used without a specific graphic interface or archive manager, via the command line. Beginners can quickly become ac­cus­tomed to the different pa­ra­me­ters and commands. As an example, we’ll show you how to compress files with bzip2 under Linux and then unpack such files in the .bz2 format.

The general syntax of bzip2 has the following form:

bzip2 [options] file(s)

For the standard compression process it's not necessary to specify options. This is only required if you want to change compression settings, display the help overview, or unpack a .bz2 file. For example, to pack the text document test.txt, you just need to run the command

bzip2 test.txt

to delete the original file and replace it with the compressed file test.txt.bz2. By listing several file names, you can also compress multiple files with a single command; each file is packed into its own .bz2 file:

bzip2 test.txt test2.txt test3.txt

If you want to decompress a packed document, you need to set the corresponding option (-d), as mentioned earlier:

bzip2 -d test.txt.bz2
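The compress and decompress steps described above can be checked with a short round trip. This is a minimal sketch, assuming bzip2 is installed; reference.txt is a hypothetical helper copy used only to verify the result.

```shell
#!/bin/sh
# Sketch: compress a file with bzip2, decompress it again, and verify
# that the content is unchanged. reference.txt is a hypothetical copy.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf 'first line\nsecond line\n' > test.txt
cp test.txt reference.txt

bzip2 test.txt          # replaces test.txt with test.txt.bz2
ls                      # shows reference.txt and test.txt.bz2

bzip2 -d test.txt.bz2   # restores test.txt and removes test.txt.bz2
cmp test.txt reference.txt && echo "round trip OK"
```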

Here’s an overview of some other bzip2 command options:

Option | Description
-1 … -9 | Sets the block size from 100 KB (-1) to 900 KB (-9), where -1 is the weakest setting and -9 the strongest; the default is -9
-f | Starts the compression even if a .bz2 file of the same name already exists; in this case, the existing file is overwritten
-c | Writes the packed data to the standard output (usually the terminal) instead of to a file
-q | Suppresses non-essential bzip2 warning messages
-v | Shows additional information, like the compression rate for all processed files
-t | Checks the integrity of the selected file
-k | Keeps the original file when compressing or decompressing
-h | Displays the help overview
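Several of these options can be combined in practice. The following is a minimal sketch, assuming bzip2 is installed; notes.txt and copy.bz2 are hypothetical file names.

```shell
#!/bin/sh
# Sketch: a few bzip2 options from the table in combination.
# notes.txt and copy.bz2 are hypothetical file names.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Generate some sample content.
i=0
while [ "$i" -lt 2000 ]; do
    echo "note $i: remember to back up the server"
    i=$((i + 1))
done > notes.txt

# -k keeps notes.txt, -9 selects the largest block size, -v prints statistics.
bzip2 -k -9 -v notes.txt

# -t checks the integrity of the compressed file (exit status 0 if intact).
bzip2 -t notes.txt.bz2 && echo "notes.txt.bz2 is intact"

# -c writes the compressed data to standard output, here redirected to a file.
bzip2 -c notes.txt > copy.bz2

# -d together with -c decompresses to standard output for a comparison.
bzip2 -dc copy.bz2 | cmp - notes.txt && echo "copy.bz2 matches notes.txt"
```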

Reasons for high tar demand

The archiving program tar has been in use since 1979 and has hardly lost any of its value. This is partly because the tool archives data while retaining file attributes such as access permissions. Mainly, though, it's because it can pack complete directories. This makes tar the perfect partner for compression tools like gzip and bzip2, which only compress single files.
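Both properties, packing a whole directory and keeping file permissions, can be observed in a small experiment. This is a sketch under assumptions: the directory project and its files are invented for illustration, and the permissions are read from the first ten characters of the ls -l output.

```shell
#!/bin/sh
# Sketch: archive a directory tree and check that a file's permissions
# survive the round trip. All names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p project/bin
printf '#!/bin/sh\necho hello\n' > project/bin/hello.sh
printf 'readme\n' > project/README
chmod 750 project/bin/hello.sh

# Pack the complete directory into one archive.
tar -cf project.tar project

# Extract into a separate directory; -p preserves the stored permissions.
mkdir restored
tar -xpf project.tar -C restored

orig_mode=$(ls -l project/bin/hello.sh | cut -c1-10)
new_mode=$(ls -l restored/project/bin/hello.sh | cut -c1-10)
echo "original: $orig_mode  restored: $new_mode"
```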

In the first step, the packing program combines all files in a selected directory into a single archive file without compressing or removing any of them. In the second step, the archive is compressed as a whole using one of the compression programs. This kind of compression is described as progressive, compact, or solid, and the resulting archives carry combined extensions such as .tar.gz (.tgz for short) or .tar.bz2 (.tbz2 for short). The packing program can also unpack such files again (e.g. file type .tar.gz).
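The two steps can also be carried out by hand. The following is a minimal sketch, assuming tar and gzip are installed; the directory docs and its files are hypothetical.

```shell
#!/bin/sh
# Sketch: bundle a directory with tar, then compress the archive as a
# whole with gzip. The directory "docs" is hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

mkdir docs
printf 'alpha\n' > docs/a.txt
printf 'beta\n' > docs/b.txt

# Step 1: combine the files into one uncompressed archive.
tar -cf docs.tar docs

# Step 2: compress the archive; gzip replaces docs.tar with docs.tar.gz.
gzip docs.tar

# tar can read the result directly when given the gzip option -z.
tar -tzf docs.tar.gz
```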

Tar archive: How to (un)pack .tar.gz and Co. under Linux

The com­bi­na­tion of tar and a com­pres­sion tool isn’t required, so you can also combine files in an archive that you haven’t pre­vi­ous­ly packaged or don’t want to compress. For example, if you want to bundle the un­com­pressed test documents test.txt and test2.txt in the same archive named archive.tar, the following command will suffice:

tar -cf archive.tar test.txt test2.txt

To extract this archive on Linux, replace the -c (create new archive) parameter with -x (extract files from archive). If you want to extract the whole archive rather than individual components, the file name(s) can be omitted:

tar -xf archive.tar

Al­ter­na­tive­ly, if you aim to pack a com­pressed archive – for example, on the basis of the gzip com­pres­sion, including the extended for­mat­ting .tar.gz – then tar also offers cor­re­spond­ing options. Since the program has im­ple­ment­ed options for com­pres­sion and de­com­pres­sion with the bzip2, xz, compress, and gzip pack programs, this is also possible with a single command:

tar -czf archive.tar.gz test.txt test2.txt

The command to unpack .tar.gz differs from the equivalent for uncompressed archives only in the additional parameter for the pack program:

tar -xzf archive.tar.gz
Tip

The parameter -f, which lets you select the respective archive file, must always come last in a group of combined options, because the argument that follows it is always interpreted as the archive's file name.
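The placement of f matters as soon as several option letters are grouped. This is a minimal sketch, assuming tar with gzip support is installed; the names data, backup.tar.gz, and restore are hypothetical. It creates a compressed archive, extracts it into another directory, and compares the result with the source.

```shell
#!/bin/sh
# Sketch: create and extract a .tar.gz archive with f as the last letter
# of the option group. All names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

mkdir data
printf 'one\n' > data/one.txt
printf 'two\n' > data/two.txt

# f is the last letter of the group, so the next word names the archive.
# Writing "tar -cfz backup.tar.gz data" instead would make tar treat "z"
# as the archive name.
tar -czf backup.tar.gz data

# Extract into a separate directory and compare it with the source.
mkdir restore
tar -xzf backup.tar.gz -C restore
diff -r data restore/data && echo "restored copy matches"
```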

The most important commands of the archiving ap­pli­ca­tion

In addition to the pre­vi­ous­ly listed command options for easy file archiving, there are several ad­di­tion­al pa­ra­me­ters to specify the pack or unpack process. These include the com­pres­sion methods already mentioned, options for setting up di­rec­to­ries, as well as options for checking and pre­view­ing the tar archives:

Option | Description
--help | Displays the tar help overview
-c | Creates a new archive
-d | Compares the files in the archive with those in the file system
-f | Specifies the file name of the archive that is written to or read from
-j | Compresses the archive with bzip2 or decompresses such an archive
-J | Compresses the archive with xz or decompresses such an archive
-k | Prevents existing files from being overwritten when they're extracted from an archive
-p | Preserves access permissions during extraction
-r | Appends files to a previously created archive
-t | Lists the contents of the selected archive
-u | Adds only those files to an archive that are newer than their copy in the archive
-x | Extracts files from an archive
-z | Compresses the archive with gzip or decompresses such an archive
-Z | Compresses the archive with compress or decompresses such an archive
-A | Appends the contents of one archive to another archive
-C | Changes to the specified directory before archiving or extracting
-M | Creates, lists, or extracts a multi-part archive
-W | Verifies the archive after it has been written
Tip

Some options, like adding files to an existing archive (-r), don't work with compressed archives. These have to be decompressed first.
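One way to work around this restriction is to decompress the archive, append the file, and compress the archive again. The following is a minimal sketch, assuming tar and gzip are installed; the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: append a file to a gzip-compressed tar archive by
# decompressing, appending, and recompressing. Names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf 'a\n' > test.txt
printf 'b\n' > test2.txt
printf 'c\n' > test3.txt
tar -czf archive.tar.gz test.txt test2.txt

gzip -d archive.tar.gz          # turns archive.tar.gz back into archive.tar
tar -rf archive.tar test3.txt   # append works on the uncompressed archive
gzip archive.tar                # compress again to archive.tar.gz

tar -tzf archive.tar.gz         # list the three members
```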

Examples:

Display the content of an archive

tar -tf archive.tar

Update contents of an archive (doesn’t include sub­di­rec­to­ries!)

tar -uf archive.tar file(s)

Expand contents of an archive

tar -rf archive.tar new_file

Compare contents of an archive with the file system (run in the archive directory!)

tar -dvf archive.tar
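The examples above can be strung together into one short session. This is a sketch under assumptions: it relies on GNU tar (other tar implementations may not offer -d), and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: list, extend, and compare an archive. Assumes GNU tar;
# all file names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf 'v1\n' > test.txt
printf 'v1\n' > test2.txt

tar -cf archive.tar test.txt    # create the archive
tar -rf archive.tar test2.txt   # expand it by one file
tar -tf archive.tar             # display the contents

# Directly after archiving, archive and file system should agree.
tar -df archive.tar && echo "archive matches file system"

# After a change on disk, the comparison reports the difference
# and exits with a non-zero status.
printf 'changed content\n' > test.txt
if tar -df archive.tar; then
    echo "no differences"
else
    echo "differences found"
fi
```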

File Roller: the archive manager for GNOME

File Roller is a graphical user interface for various compression tools and packing programs that are normally operated from the command line. The archive manager is available for the GNOME and Unity desktop environments, and has been distributed under the GNU General Public License since 2001. It lets you view the contents of various archive files and extract, delete, or add files. It is also possible to create new compressed or uncompressed archives, as well as to convert them to another format. For this purpose, the main window of the software offers various buttons and menus alongside a drag-and-drop function. In addition to tar archive formats like tar.gz, File Roller supports the following formats:

  • .7z
  • .tar
  • .gz
  • .bz2
  • .ar
  • .jar
  • .cpio

File Roller is pre­in­stalled on some Linux dis­tri­b­u­tions, such as Ubuntu, by default, but can also be installed manually using the re­spec­tive package manager or from the official homepage. An al­ter­na­tive for the desktop en­vi­ron­ment KDE is Ark.
