Ceph is a comprehensive storage solution that includes its own file system, the Ceph File System (CephFS). It distributes data across the components of a network and can keep copies on physically separate storage media. Ceph supports a wide variety of storage devices and scales well.

What you should know about Ceph and its most important features

Ceph was conceived by Sage A. Weil, who developed it while writing his dissertation and published it in 2006. He then led the project with his company Inktank Storage. In 2014, the company was acquired by Red Hat, with Weil staying on as chief architect in charge of the software’s development.

Ceph only runs on Linux systems, for example CentOS, Debian, Fedora, Red Hat/RHEL, openSUSE, and Ubuntu. Windows systems cannot access it directly, but access is possible through iSCSI (Internet Small Computer System Interface). As such, Ceph is particularly suitable for data centers that make their storage space available over servers, and for cloud solutions of any kind that use software to provide storage.

We have compiled a list of the most important features of Ceph:

  • Open-source
  • High scal­a­bil­i­ty
  • Data security through redundant storage
  • High reliability through distributed data storage
  • Software-based increase in avail­abil­i­ty through an in­te­grat­ed algorithm for locating data
  • Con­tin­u­ous memory al­lo­ca­tion
  • Minimal hardware re­quire­ments (set-up possible with 1 GB RAM on a computer with a single-core processor and only a few GB of available storage space, depending on the task in the network)

Ceph func­tion­al­i­ties

Ceph requires several computers that are connected to one another in what is called a cluster. Each connected computer within that network is referred to as a node.

The following tasks must be dis­trib­uted among the nodes within the network:

  • Monitor nodes: Monitor the status of in­di­vid­ual nodes in the cluster, es­pe­cial­ly the managers, object storage devices, and metadata servers (MDS). In order to ensure maximum re­li­a­bil­i­ty, at least three monitor nodes are rec­om­mend­ed.
  • Managers: Manage the status of storage usage, system load, and the capacity of the nodes.
  • Ceph-OSDs (Object Storage Devices): The back­ground ap­pli­ca­tions for the actual data man­age­ment; they are re­spon­si­ble for the storage, du­pli­ca­tion, and restora­tion of data. At least three OSDs are rec­om­mend­ed for a cluster.
  • Metadata servers (MDSs): Store metadata, including storage paths, file names, and time stamps of the files kept in CephFS; keeping this information separate improves performance. They are POSIX-compatible and can be queried using Unix commands such as ls, find, and the like.
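
The recommendation of at least three monitor nodes follows from majority voting: the monitors can only agree on the cluster status while more than half of them are reachable. The arithmetic can be sketched as follows; the function names are illustrative and not part of Ceph.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitors < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors may fail while a majority can still be formed."""
    return monitors - quorum_size(monitors)


for n in (1, 2, 3, 5):
    print(f"{n} monitor(s): quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

With one or two monitors, no failure can be tolerated; three is the smallest number that survives the loss of a single monitor.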

The centerpiece of the data storage is an algorithm called CRUSH (Controlled Replication Under Scalable Hashing). It uses an allocation table called the CRUSH map to determine which OSDs hold a requested file.

Ceph distributes files pseudo-randomly, meaning that they appear to be placed indiscriminately. In fact, CRUSH chooses the most suitable storage location based on fixed criteria, after which the files are duplicated and saved on physically separate media. The network administrator can set these criteria.

Files are organized into placement groups, which are determined from the hash value of the file name. Another organizational property is the number of copies kept of each file.
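
The idea of hashing a name into a placement group and then choosing several storage targets deterministically can be sketched in a few lines of Python. This is a simplified stand-in, not CRUSH itself: it ranks OSDs by a per-OSD hash score (rendezvous hashing), whereas the real algorithm evaluates a hierarchical map with weights and administrator-defined rules. The constants and OSD names are assumptions made for illustration.

```python
import hashlib

PG_COUNT = 8          # assumed number of placement groups
REPLICAS = 3          # assumed number of copies per file
OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]  # hypothetical OSD names


def stable_hash(text: str) -> int:
    """Deterministic integer hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def placement_group(object_name: str) -> int:
    """Map a name to a placement group via its hash value."""
    return stable_hash(object_name) % PG_COUNT


def osds_for_pg(pg: int, osds=OSDS, replicas=REPLICAS) -> list[str]:
    """Pick distinct OSDs for a placement group by ranking per-OSD scores."""
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]


def locate(object_name: str) -> tuple[int, list[str]]:
    """Compute the placement group and target OSDs from the name alone."""
    pg = placement_group(object_name)
    return pg, osds_for_pg(pg)


pg, targets = locate("backup-2024.tar")
print(f"placement group {pg} -> {targets}")
```

The placement looks random, yet every call with the same name returns the same result, which is what allows the location to be calculated rather than looked up.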

Note

Hash values are fixed-length strings of characters that certain computing operations return for a given input. A simple example is a checksum generated from the raw data. In practice, more complex algorithms are used that create a practically unique digital fingerprint from data of any length. The output always has the same compact length and contains no unwanted symbols, which also makes it suitable for processing file names.
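
The fixed output length is easy to observe with Python's standard hashlib module. SHA-256 is used here purely for illustration; it is not meant to represent the hash function Ceph applies to file names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the input as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short = fingerprint(b"a")
long = fingerprint(b"a" * 1_000_000)

print(len(short), len(long))   # both digests have the same length
print(short != long)           # different inputs give different digests
```

Whether the input is one byte or one million bytes, the digest consists of the same number of hexadecimal characters.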

To guarantee data security, journaling is used at the OSD level: every file to be saved is first stored temporarily in a journal until it has been properly written to the intended OSD.
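
The principle behind journaling can be illustrated with a toy example. The class below is hypothetical and keeps everything in memory; it only shows the order of operations (record the write, apply it, clear the record) and how an interrupted write is completed afterwards. It does not reflect how Ceph implements its journal internally.

```python
class JournaledStore:
    """Toy store that records each write in a journal before applying it."""

    def __init__(self):
        self.journal = []   # pending writes as (name, data) tuples
        self.objects = {}   # the final storage location

    def write(self, name: str, data: bytes, crash_before_apply: bool = False) -> None:
        self.journal.append((name, data))      # 1. record the intended write
        if crash_before_apply:
            return                             # simulate a failure mid-write
        self._apply(name, data)                # 2. store the data for good

    def _apply(self, name: str, data: bytes) -> None:
        self.objects[name] = data
        self.journal.remove((name, data))      # 3. drop the journal entry

    def recover(self) -> None:
        """Replay every write that was journaled but never applied."""
        for name, data in list(self.journal):
            self._apply(name, data)


store = JournaledStore()
store.write("a.txt", b"first")
store.write("b.txt", b"second", crash_before_apply=True)
print(sorted(store.objects))   # b.txt is missing after the simulated failure
store.recover()
print(sorted(store.objects))   # b.txt has been restored from the journal
```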

Accessing stored data

The foundation of the Ceph storage architecture is RADOS (Reliable Autonomic Distributed Object Store), a reliable, distributed object store made up of self-healing, self-managing, intelligent storage nodes.

There are several ways to access stored data:

  • librados: Native access is possible through the librados software libraries, which provide APIs for programming and scripting languages such as C/C++, Python, Java, and PHP.
  • radosgw: This gateway allows data to be read or written over HTTP.
  • CephFS: This is Ceph's own POSIX-compatible file system. It offers a kernel module for client computers and also supports FUSE (Filesystem in Userspace, an interface for creating file systems that does not require administrator rights).
  • RADOS Block Device (RBD): Block storage integrated through a kernel module or a virtualization system such as QEMU or KVM.
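
As an illustration of native access, the following sketch uses the Python binding for librados to write one object and read it back. It assumes that the `rados` module is installed, that a cluster is reachable, that its configuration file is located at /etc/ceph/ceph.conf, and that a pool named "mypool" exists; the pool name and object name are hypothetical.

```python
def roundtrip(pool: str = "mypool", conffile: str = "/etc/ceph/ceph.conf") -> bytes:
    """Write one object to a pool and read it back via librados."""
    # Imported here because the binding is only available where Ceph is installed.
    import rados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)   # I/O context bound to one pool
        try:
            ioctx.write_full("greeting", b"hello ceph")
            return ioctx.read("greeting")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling `roundtrip()` on a machine with access to the cluster should return the bytes that were written. The function is only defined here, not executed, because it cannot run without a cluster.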

Al­ter­na­tives to Ceph

The most popular alternative is GlusterFS, which is also developed under Red Hat and can likewise be used at no cost. Gluster follows a similar approach, aggregating distributed storage into a unified storage location within the network. Both solutions have their own pros and cons.

There are other free alternatives, such as XtreemFS and BeeGFS. Microsoft offers commercial, software-based storage solutions for Windows servers, including Storage Spaces Direct (S2D).

Pros and cons of Ceph

Ceph is the best choice in many sit­u­a­tions, but this method for storing data also comes with some dis­ad­van­tages.

Pros of Ceph

Ceph is free and well established, despite its comparatively short development history. A large amount of helpful information on its set-up and maintenance is available online, and the developers have documented the application extensively. The acquisition by Red Hat makes it likely that development will continue for the foreseeable future. Its scalability and integrated redundancy provide data security and flexibility within the network, and the CRUSH algorithm contributes to high availability.

Note

Redundancy in this sense means “surplus.” In computer technology, it designates additional, surplus data. Data redundancy is often created deliberately in order to ensure data security and reliability. This is possible at both the software and hardware levels: on the one hand, data or information required for data reconstruction can be stored several times; on the other hand, physically separate storage components can be made available in several places to compensate for the failure of individual computers.

Cons of Ceph

Because of the variety of components involved, a comprehensive network is required to make full use of all of Ceph’s functionalities. In addition, the set-up is relatively time-consuming, and the user cannot be entirely sure where the data is physically stored.
