What is a hash function? Definition, usage, and examples

Given the massive increase in the amount of data being processed by local and global data networks, computer scientists are always looking for ways to speed up data access and ensure that data can be exchanged securely. One solution they use, alongside other security technologies, is the hash function. This article explains the properties of hash functions and how they are used.

The meaning of the verb “to hash” – to chop or scramble something – provides a clue as to what hash functions do to data. That’s right, they “scramble” data and convert it into a numerical value. And no matter how long the input is, the output value is always of the same length. Hash functions are also referred to as hashing algorithms or message digest functions. They are used across many areas of computer science, for example:

To help secure communication between web servers and browsers, generate session IDs for internet applications, and support data caching
To protect sensitive data such as passwords, web analytics, and payment details
To add digital signatures to emails
To locate identical or similar data sets via lookup functions

Definition

A hash function converts strings of different lengths into fixed-length strings known as hash values or digests. You can use hashing to scramble passwords into strings of permitted characters, for example. The output values cannot feasibly be inverted to produce the original input.

In the hash function example above, all of the different passwords are converted into fixed-length strings before being stored in the database. These output strings cannot be converted back to find out the actual passwords.

What are the properties of hash functions?

Hash functions are designed so that they have the following properties:

One-way

Once a hash value has been generated, it must be computationally infeasible to convert it back into the original data. For instance, in the example above, there must be no practical way of converting “$P$Hv8rpLanTSYSA/2bP1xN.S6Mdk32.Z3” back into “susi_562#alone”.

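Collision-resistant

For a hash function to be completely collision-free, no two input strings could ever map to the same output hash. Because the output has a fixed length while the number of possible inputs is unlimited, collisions cannot be ruled out entirely. Instead, hash functions are designed to be collision-resistant: it must be computationally infeasible to find two inputs that generate the same output. A hash function that is both one-way and collision-resistant is referred to as a cryptographic hash function. In the example hash function above, there are no identical hash values, so there are no “collisions” between the output strings. Algorithms for which collisions have been found, such as MD5 and SHA-1, are no longer considered secure for this purpose.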
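
Collisions are easy to see once the output is made artificially short. The following sketch is for illustration only: it cuts SHA-256 down to its first 4 hexadecimal characters (65,536 possible values) and tries inputs until two of them share the same short hash. The helper name shortHash and the input strings are assumptions made for this example.

```php
<?php
// Shorten SHA-256 to its first 4 hex characters (16 bits) so that collisions become easy to find.
// The truncation is an assumption made for this demonstration, not something real systems do.
function shortHash(string $input): string
{
    return substr(hash('sha256', $input), 0, 4);
}

$seen = [];        // maps each short hash to the first input that produced it
$collision = null; // will hold [first input, second input, shared short hash]

// With only 65,536 possible short hashes, a repeat is guaranteed within 65,537 inputs.
for ($i = 0; $i <= 65536; $i++) {
    $input = "input-$i";
    $digest = shortHash($input);
    if (isset($seen[$digest])) {
        $collision = [$seen[$digest], $input, $digest];
        break;
    }
    $seen[$digest] = $input;
}

echo "{$collision[0]} and {$collision[1]} share the short hash {$collision[2]}\n";
```

With the full 64-character output, the same search would need an astronomically large number of attempts, which is what makes a collision-resistant function usable in practice.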

Lightning-fast

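If it takes too long for a hash function to compute hash values, the procedure is not much use for lookups or integrity checks. General-purpose hash functions must, therefore, be very fast. The exception is password hashing, where algorithms are deliberately slow in order to make brute-force attacks more expensive. In databases, hash values are used as keys in so-called hash tables to ensure fast access.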

What is a hash value?

A hash value is the output string generated by a hash function. No matter the input, all of the output strings generated by a particular hash function are of the same length. The length is defined by the hashing algorithm used. The output strings are created from a set of permitted characters defined in the hash function.

Hash values generated using the SHA256 function are always 256 bits long (64 hexadecimal characters), irrespective of the number and type of characters in the input string.

The hash value is the result calculated by the hash function. Because two different inputs are extremely unlikely to produce the same hash value, hash values are also referred to as “fingerprints”, by analogy with human fingerprints. If you take the lower-case letters “a” to “f” and the digits “0” to “9” and define a hash value length of 64 characters, there are 16^64, or roughly 1.1579209e+77, possible output values. That is a 78-digit number! This shows that even a fairly short fixed-length string offers more than enough distinct values to serve as a fingerprint.

The hash values in the example above can be generated with just a few lines of PHP code:

<?php
// Compute the SHA-256 hash of the string 'apple' and print it as 64 hexadecimal characters
echo hash('sha256', 'apple');
?>

Here, the “sha256” hashing algorithm is being used to hash the input value “apple”. Because hash functions are deterministic, the same input always produces the same hash value or fingerprint, in this case “3a42c503953909637f78dd8c99b3b85ddde362415585afc11901bdefe8349102”.
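
The fixed output length and the number of possible hash values can be checked with a few more lines. This is a minimal sketch using PHP's built-in hash() function; the input strings are arbitrary examples.

```php
<?php
// Inputs of very different lengths, chosen arbitrarily for illustration.
$inputs = ['a', 'apple', str_repeat('apple', 1000)];

foreach ($inputs as $input) {
    $digest = hash('sha256', $input);
    // The digest is 64 characters every time: 256 bits written as hexadecimal characters.
    echo strlen($input), ' characters in, ', strlen($digest), " characters out\n";
}

// Each of the 64 positions holds one of 16 symbols (0-9, a-f), so there are 16^64 = 2^256 values.
$possibleValues = 16 ** 64; // exceeds the integer range, so PHP returns a float
printf("%.7e possible hash values\n", $possibleValues);
```

Whether the input has 1 character or 5,000, the digest always has 64 characters.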

Hash functions and websites

With SSL/TLS-encrypted data transmission, when the web server receives a request, it sends the server certificate to the user’s browser. The browser verifies the certificate, whose digital signature is computed over a hash of the certificate’s contents, and the two sides then agree on the keys for the session. Hash functions are used throughout this handshake, for example to check that the exchanged messages have not been altered. If the handshake succeeds, the encrypted HTTPS connection is established and data can be exchanged. All of the data packets exchanged are also encrypted, so it is very difficult for hackers to read them in transit.

An extract from the certificate for German broadcasting corporation Deutsche Welle, showing the key the server uses to establish a communication session with the user’s browser.

Session IDs can be generated from data relating to a site visit, such as the IP address and time stamp, usually combined with a random value so that they cannot be guessed. One common use of session IDs is to give unique identifiers to people shopping on a website. Nowadays, session IDs are rarely passed as a URL parameter (for example, as something like www.domain.tld/index?sid=d4ccaf2627557c756a0762419a4b6695). Instead, they are stored in a cookie, which the browser sends in the HTTP header of each request.
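
As a rough sketch of the idea, a session ID could be derived by hashing visit data together with random bytes. The IP address, time stamp and variable names below are hypothetical values for illustration. In practice, PHP's built-in session handling (session_start()) generates session IDs itself, so this is not how a production site would normally do it.

```php
<?php
// Hypothetical visit data; a real application would read these from the request.
$ipAddress = '203.0.113.7';          // address from a range reserved for documentation
$timestamp = '2024-01-01 12:00:00';  // fixed value for illustration

// random_bytes() adds unpredictability, because IP address and time alone could be guessed.
$randomPart = bin2hex(random_bytes(16));

// Hash everything into one fixed-length identifier of 64 hexadecimal characters.
$sessionId = hash('sha256', $ipAddress . $timestamp . $randomPart);

echo $sessionId, "\n";
```

Because of the random part, every run produces a different ID, even for the same visitor and time stamp.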

Hash values are also used to protect cached data, so that unauthorized users cannot use the cache to access login and payment details or other information about a site.

Communication between a server and a client using the SFTP protocol, which runs over SSH, also relies on hash functions in a similar way.

Protection of sensitive data

Login details for online accounts are frequently the target of cyber-attacks. Hackers either want to disrupt operation of a website (for example, to reduce income generated by traffic-based ads) or access information about payment methods.

The WordPress Content Management System offers a range of security functions for authenticating registered site users. The keys shown above were generated using various hashing algorithms.

In the WordPress example above, you can see that passwords are always hashed before they are stored. Combined with the session IDs generated in the system, this ensures a high level of security. This is especially important for protection against “brute force attacks”. In this kind of attack, hackers run candidate passwords through the same hash function until one of them produces a stored hash value, which gives them access. Using long passwords with high security standards makes these attacks less likely to succeed, because the amount of computing power required is so high. Remember: Never use simple passwords, and be sure to protect all of your login details and data against unauthorized access.
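
The principle of storing only a hash and checking logins against it can be sketched with PHP's built-in password_hash() and password_verify() functions. Note that this is not WordPress's own hashing routine, which produces the “$P$” values; it is a generic illustration, and the variable names are assumptions.

```php
<?php
$password = 'susi_562#alone'; // sample password, for illustration only

// password_hash() generates a random salt and embeds it, with the algorithm and cost, in the result.
// Only this string would be stored in the database, never the password itself.
$storedHash = password_hash($password, PASSWORD_DEFAULT);

// At login, the submitted password is hashed with the same salt and compared to the stored value.
$loginOk = password_verify('susi_562#alone', $storedHash);
$loginWrong = password_verify('susi_562#alon', $storedHash);

var_dump($loginOk, $loginWrong);
```

Because of the random salt, hashing the same password twice gives two different stored strings, which makes precomputed lookup attacks much less useful.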

Digital signatures

Email traffic is sent via servers that are specially designed to transmit this type of message. Hash values are also used to add a digital signature to messages.

Adding a digital signature to an email is like signing a handwritten letter – you sign once, and your signature is unique.

The steps involved in sending an email with a digital signature are:

Alice (the sender) converts her message into a hash value and encrypts the hash value using her private key. This encrypted hash value is the digital signature.
Alice sends the email and the digital signature to the recipient, Bob.
Bob generates a hash value of the message using the same hash function. He also decrypts the digital signature using Alice’s public key, which recovers the hash value Alice computed, and compares the two hashes.
If the two hash values match, Bob knows that Alice’s message has not been tampered with during transmission and that it was signed with Alice’s private key.
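
The steps above can be sketched with PHP's OpenSSL extension, assuming it is installed and configured. openssl_sign() hashes the message and signs the hash in one call, and openssl_verify() performs Bob's comparison. The throwaway key pair and the message text are made up for this example; real email signing uses standards such as S/MIME or OpenPGP on top of this principle.

```php
<?php
// Generate a throwaway RSA key pair for Alice (illustration only).
$aliceKey = openssl_pkey_new([
    'private_key_bits' => 2048,
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
]);
$alicePublicKey = openssl_pkey_get_details($aliceKey)['key']; // PEM-encoded public key

$message = 'Meet me at noon.';

// Alice: hash the message with SHA-256 and sign the hash with her private key.
openssl_sign($message, $signature, $aliceKey, OPENSSL_ALGO_SHA256);

// Bob: recompute the hash and check it against the signature using Alice's public key.
// openssl_verify() returns 1 if the signature is valid and 0 if it is not.
$valid = openssl_verify($message, $signature, $alicePublicKey, OPENSSL_ALGO_SHA256);

// If even one character of the message changes, verification fails.
$tampered = openssl_verify($message . '!', $signature, $alicePublicKey, OPENSSL_ALGO_SHA256);

var_dump($valid, $tampered);
```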

Please note that a digital signature proves the integrity of a message but does not actually encrypt it. If you’re sending confidential data, it’s therefore best to encrypt it as well as using a digital signature.

How can hash functions be used to perform lookups?

Searching through large quantities of data is a very resource-intensive process. Imagine you’ve got a table listing every inhabitant of a big city, with lots of different fields for each entry (first name, second name, address, etc.). Finding just one entry by comparing every row would be very time-consuming and require a lot of computing power. To simplify the process, each entry in the table can be stored under a hash value computed from its search key, for example the person’s full name. The search term is then converted to a hash value in the same way, and only the entries stored under that hash value have to be compared. This is much more efficient than searching every field that exists in the data table. Note that this works for exact matches only. A search for all first names beginning with “Ann”, for example, needs a different kind of index.
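
A very small hash table illustrates the lookup. In this sketch, the function names bucketIndex and lookup, the number of buckets and the resident data are all hypothetical. PHP's own arrays already work as hash tables internally, so the manual version below only serves to make the mechanism visible.

```php
<?php
// Map a key to one of $bucketCount buckets using the first 7 hex digits of its SHA-256 hash.
function bucketIndex(string $key, int $bucketCount): int
{
    return hexdec(substr(hash('sha256', $key), 0, 7)) % $bucketCount;
}

// Hash the search term, then compare only the entries in that one bucket.
function lookup(array $buckets, string $name): ?string
{
    foreach ($buckets[bucketIndex($name, count($buckets))] as [$key, $address]) {
        if ($key === $name) {
            return $address;
        }
    }
    return null; // no entry with exactly this name
}

$bucketCount = 8;
$buckets = array_fill(0, $bucketCount, []);

// Hypothetical residents, keyed by full name.
$residents = [
    'Anna Schmidt' => '12 Market Street',
    'Ben Okafor'   => '4 River Road',
    'Carla Diaz'   => '77 Hill Lane',
];

// Insert: each record goes into the bucket its key hashes to.
foreach ($residents as $name => $address) {
    $buckets[bucketIndex($name, $bucketCount)][] = [$name, $address];
}

$found = lookup($buckets, 'Ben Okafor');
$missing = lookup($buckets, 'Dora Lee');

var_dump($found, $missing);
```

However many residents the table holds, a lookup only touches the entries in a single bucket, which is why the number of buckets is normally increased as the table grows.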

Summary

Hash functions are used to improve security in electronic communications, and lots of highly sophisticated standards have now been developed. However, hackers are aware of this and are constantly coming up with more advanced hacking techniques.

Reviewer

Christian Heldmaier
Christian Heldmaier is an experienced online marketing and SEO specialist from Karlsruhe. He has been working as an SEO Manager at IONOS since July 2020.
