Distributed file systems are a solution for storing and managing data that no longer fits onto a typical server. A lack of capacity can stem from more than just data volume. For example, if the data to be stored is unstructured, a classic file system with its file structure will not do.

Saving large volumes of data – GlusterFS and Ceph make it possible

With bulk data, the actual volume of data is often unknown at the beginning of a project. Systems must therefore be easy to expand onto additional servers that are seamlessly integrated into the existing storage system while it is operating. To a user, a so-called “distributed file system” looks like a single conventional file system; they are unaware that individual files, or even a large part of the overall data, might actually reside on several servers, sometimes in different geographical locations. Since GlusterFS and Ceph are already part of the software layers on Linux operating systems, they do not place any special demands on the hardware. Linux runs on every standard server and supports all common types of hard drives.
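
The idea of adding a server to a running system without reshuffling existing data can be illustrated with a small sketch. It uses rendezvous (highest-random-weight) hashing, which is a generic textbook technique chosen here for brevity and not the placement algorithm of GlusterFS or Ceph. The server names, file names, and the helpers `score` and `locate` are hypothetical.

```python
import hashlib

def score(server: str, name: str) -> int:
    """Deterministic pseudo-random weight for a (server, file) pair."""
    digest = hashlib.sha256(f"{server}/{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def locate(name: str, servers: list) -> str:
    """Pick the server with the highest score for this file name."""
    return max(servers, key=lambda s: score(s, name))

files = [f"file-{i}.dat" for i in range(1000)]

# Placement with three servers, then again after a fourth one joins.
before = {f: locate(f, ["node-a", "node-b", "node-c"]) for f in files}
after = {f: locate(f, ["node-a", "node-b", "node-c", "node-d"]) for f in files}

# Files whose location changed; each of them now lives on the new server,
# nothing is shuffled between the servers that already existed.
moved = [f for f in files if before[f] != after[f]]
print(len(moved), "of", len(files), "files moved to the new server")
```

Because every client can compute the location from the file name alone, no lookup table has to be updated when the cluster grows; only the share of files that now belongs to the new server has to be copied.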

High availability is decisive

High availability is an important topic when it comes to distributed file systems. Hardware malfunctions must be avoided as much as possible, and any software required for operation must keep running uninterrupted even while new components are being added. Maintenance work must be possible while the system is operating, and the all-important metadata should not be saved in a single central location. Access to metadata must be decentralized, and data redundancy must be ensured at all times. A server malfunction should never negatively impact the consistency of the entire system. GlusterFS and Ceph are two systems with different approaches that can be expanded to almost any size and can be used to collect and search data from big projects in one system.
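
A toy model can make the combination of redundancy and decentralized lookup more concrete. The sketch below keeps every file on two servers and finds it again after one server is lost, without consulting any central index. It is an in-memory illustration under simplifying assumptions (a fixed replica count of 2, hypothetical `Cluster` and `ranked` helpers, no re-replication after a failure) and does not reflect how GlusterFS or Ceph implement replication.

```python
import hashlib

REPLICAS = 2  # assumed number of copies per file in this sketch

def ranked(name, servers):
    """Order servers by a hash of (server, name); the first ones hold the copies."""
    def key(server):
        return hashlib.sha256(f"{server}/{name}".encode()).hexdigest()
    return sorted(servers, key=key)

class Cluster:
    def __init__(self, servers):
        # Each server is modelled as an in-memory dict: file name -> bytes.
        self.disks = {s: {} for s in servers}

    def put(self, name, data):
        """Write the file to the first REPLICAS servers in hash order."""
        for server in ranked(name, list(self.disks))[:REPLICAS]:
            self.disks[server][name] = data

    def fail(self, server):
        """Simulate losing a server together with everything stored on it."""
        del self.disks[server]

    def get(self, name):
        """Ask the surviving servers in hash order; no central index is used."""
        for server in ranked(name, list(self.disks)):
            if name in self.disks[server]:
                return self.disks[server][name]
        raise KeyError(name)

cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("report.csv", b"42,17,8")

# Take down the server that holds the first copy, then read the file anyway.
primary = ranked("report.csv", list(cluster.disks))[0]
cluster.fail(primary)
print(cluster.get("report.csv"))
```

A production system would additionally restore the lost copy on another server so that the replica count is met again; that step is left out here.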

Fact

The term “big data” refers to very large, complex, and unstructured bulk data collected from scientific sensors (for example, GPS satellites), weather networks, or statistical sources. In addition to storage, efficient search options and the systematization of the data also play a vital role with big data.

A short introduction to GlusterFS

GlusterFS is a distributed file system with a modular design. Various servers are connected to one another over a TCP/IP network. As a POSIX (Portable Operating System Interface)-compatible file system, GlusterFS can easily be integrated into existing Linux server environments. This is also the case for FreeBSD, OpenSolaris, and macOS, which support POSIX. Integration into Windows environments can only be achieved in a roundabout way, by using a Linux server as a gateway.

Functionalities of GlusterFS

In its beginnings, GlusterFS was a classic file-based storage system that later became object-oriented, at which point particular importance was placed on optimal integrability into the well-known open-source cloud solution OpenStack. GlusterFS still operates in the background on a file basis, meaning that each file is assigned an object that is integrated into the file system through a hard link. There are no dedicated servers for users; they have their own interfaces at their disposal for saving their data on GlusterFS, which appears to them as a single complete system.

| Pros | Cons |
| --- | --- |
| Easy integration into Linux systems | Integration into Windows systems can only be done indirectly |
| POSIX compatibility | |
| Supports FUSE (Filesystem in Userspace) | |

Short introduction to Ceph

The distributed open-source storage solution Ceph is an object-oriented storage system that operates using binary objects, thereby eliminating the rigid block structure of classic data carriers. Physically, Ceph also uses hard drives, but it has its own algorithm for managing the binary objects, which can then be distributed among several servers and later reassembled.
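
The principle of cutting data into binary objects, spreading them over servers, and reassembling them can be sketched in a few lines. This is a simplified illustration only: the object size of 4 bytes is chosen to keep the demo readable, the round-robin placement is a stand-in for Ceph's far more elaborate placement algorithm (CRUSH), and the names `split`, `scatter`, `reassemble`, and `osd-0` to `osd-2` are hypothetical.

```python
OBJECT_SIZE = 4  # bytes; tiny value used only to keep the demo readable

def split(name, data, size=OBJECT_SIZE):
    """Cut a byte string into numbered objects of at most `size` bytes."""
    return {
        f"{name}.{index:08d}": data[offset:offset + size]
        for index, offset in enumerate(range(0, len(data), size))
    }

def scatter(objects, servers):
    """Spread objects over servers round-robin (placeholder placement rule)."""
    stores = {server: {} for server in servers}
    for index, (object_id, blob) in enumerate(sorted(objects.items())):
        stores[servers[index % len(servers)]][object_id] = blob
    return stores

def reassemble(name, stores):
    """Collect all objects belonging to `name` and join them in order."""
    found = {
        object_id: blob
        for disk in stores.values()
        for object_id, blob in disk.items()
        if object_id.startswith(name + ".")
    }
    return b"".join(found[object_id] for object_id in sorted(found))

payload = b"distributed object storage"
stores = scatter(split("demo", payload), ["osd-0", "osd-1", "osd-2"])

for server, disk in stores.items():
    print(server, sorted(disk))
print(reassemble("demo", stores) == payload)
```

The zero-padded index in each object name is what allows the pieces to be put back in the right order, regardless of which server returns them first.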

Functionalities of Ceph

Every component is decentralized, and all OSDs (Object-Based Storage Devices) are equal to one another. As such, any number of servers with different hard drives can be connected to create a single storage system. Ceph can be integrated into existing system environments through three major interfaces: CephFS as a Linux file system driver, RADOS Block Devices (RBD) as Linux devices that can be integrated directly, and RADOS Gateway, which is compatible with Swift and Amazon S3.

| Pros | Cons |
| --- | --- |
| Easy integration into all systems, irrespective of the operating system being used | Weaker file system functions |
| Block device for Linux | Higher integration effort needed due to completely new storage structures |
| CephFS file system for Linux | |
| Amazon S3 API | |
| Seamless connection to Keystone authentication | |
| FUSE module (Filesystem in Userspace) to support systems without a CephFS client | |

Comparison: GlusterFS vs. Ceph

Due to the technical differences between GlusterFS and Ceph, there is no clear winner. Ceph is basically object-oriented storage for unstructured data, whereas GlusterFS uses hierarchies of file system trees in block storage. GlusterFS has its origins in a highly efficient, file-based storage system that continues to be developed in a more object-oriented direction. In contrast, Ceph was developed as binary object storage from the start and not as a classic file system, which can lead to weaker performance in standard file system operations.

| GlusterFS | Ceph |
| --- | --- |
| File system strengths | Object storage strengths |
| Quicker storage algorithm | Better performance on simpler hardware |
| No central metadata server necessary | Easy integration into all systems, no matter the operating system being used |
| Lower complexity | Block device for Linux |
| Better suitability for saving larger files (starting at around 4 MB per file) | Easier possibilities to create customer-specific modifications |
| Better suitability for data with sequential access | RADOS compatibility |

When should which system be used?

Because of its diverse APIs, Ceph works well in heterogeneous networks in which other operating systems are used alongside Linux. The strengths of GlusterFS, by contrast, come to the forefront when storing a large quantity of classic and also larger files. Since Ceph was developed as an open-source solution from the very start, it could be integrated in many places earlier than GlusterFS, which only later became open-source. A major application for distributed storage is cloud solutions. In this regard, OpenStack is one of the most important software projects offering architectures for cloud computing. GlusterFS and Ceph both work equally well with OpenStack.
