What is a hash function? Definition, usage, and examples
Given the massive increase in the amount of data being processed by local and global data networks, computer scientists are always looking for ways to speed up data access and ensure that data can be exchanged securely. One solution they use, alongside other security technologies, is the hash function. This article explains the properties of hash functions and how they are used.
The meaning of the verb “to hash” – to chop or scramble something – provides a clue as to what hash functions do to data. That’s right, they “scramble” data and convert it into a numerical value. And no matter how long the input is, the output value is always of the same length. Hash functions are also referred to as hashing algorithms or message digest functions. They are used across many areas of computer science, for example:
- To help secure communication between web servers and browsers, and generate session IDs for internet applications and data caching
- To protect sensitive data such as passwords, web analytics, and payment details
- To add digital signatures to emails
- To locate identical or similar data sets via lookup functions
A hash function converts strings of different length into fixed-length strings known as hash values or digests. You can use hashing to scramble passwords into strings of permitted characters, for example. With a cryptographic hash function, the output values cannot feasibly be inverted to produce the original input.
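The fixed-length property is easy to check in code. The following is a minimal sketch in Python, assuming SHA-256 from the standard `hashlib` module as the hash function; the helper name `sha256_hex` and the sample inputs are made up for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Illustrative inputs of very different lengths (assumed, not from the article)
samples = ["a", "apple", "a much longer input string " * 50]

for sample in samples:
    digest = sha256_hex(sample)
    # Print input length, digest length, and the digest itself
    print(len(sample), len(digest), digest)
```

Whatever the input length, each digest printed here is 64 hexadecimal characters long, because SHA-256 always produces 256 bits of output.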
What are the properties of hash functions?
Hash functions are designed so that they have the following properties:
Hash functions must be one-way: once a hash value has been generated, it must be practically impossible to convert it back into the original data. For instance, in the example above, there must be no feasible way of converting “$P$Hv8rpLanTSYSA/2bP1xN.S6Mdk32.Z3” back into “susi_562#alone”.
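For a hash function to be collision-resistant, it must be practically impossible to find two different strings that map to the same output hash. Because the output has a fixed length while the number of possible inputs is unlimited, collisions must exist in theory; a good hash function makes them so hard to find that, in practice, every input behaves as if it had a unique output. Hash functions that combine this property with one-wayness are referred to as cryptographic hash functions. In the example hash function above, there are no identical hash values, so there are no “collisions” between the output strings. Longer hash values and carefully designed algorithms keep the probability of a collision negligibly small.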
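To see why the length of the hash value matters for collisions, consider a deliberately weak hash. The sketch below is an illustration only: `tiny_hash` is a hypothetical toy function that keeps just the first byte of a SHA-256 digest, so it has only 256 possible outputs. Feeding it 257 distinct inputs is therefore guaranteed to produce at least one collision.

```python
import hashlib

def tiny_hash(text: str) -> str:
    """Toy hash: the first byte (2 hex characters) of the SHA-256 digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:2]

def find_collision(max_inputs: int = 257):
    """Hash distinct inputs until two of them share a tiny_hash value."""
    seen = {}  # maps hash value -> first input that produced it
    for i in range(max_inputs):
        candidate = f"input-{i}"
        value = tiny_hash(candidate)
        if value in seen:
            return seen[value], candidate, value
        seen[value] = candidate
    return None

first, second, shared = find_collision()
print(f"{first!r} and {second!r} both hash to {shared}")
```

With the full 256-bit output of SHA-256, the same search would need an astronomically large number of attempts, which is what collision resistance means in practice.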
If it takes too long for a hash function to compute hash values, the procedure is not much use. Hash functions must, therefore, be very fast. In databases, hash values are used to index entries in so-called hash tables to ensure fast access.
What is a hash value?
A hash value is the output string generated by a hash function. No matter the input, all of the output strings generated by a particular hash function are of the same length. The length is defined by the type of hashing technology used. The output strings are created from a set of permitted characters defined in the hash function.
The hash value is the result calculated by the hash function and algorithm. Because it is extremely unlikely that two different inputs produce the same hash value, hash values are also referred to as “fingerprints”, by analogy with human fingerprints. If you take the lower-case letters “a” to “f” and the digits “0” to “9” and define a hash value length of 64 characters, there are 16^64, or roughly 1.1579209e+77, possible output values. That is a 78-digit number! This shows that even relatively short strings can serve as reliable fingerprints.
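The size of this output space can be checked with a short calculation. The sketch below assumes the 16-character alphabet and the length of 64 described above; the variable names are illustrative.

```python
# Number of distinct 64-character strings over the 16 hexadecimal characters
alphabet_size = 16
length = 64
possible_values = alphabet_size ** length

# Each hexadecimal character encodes 4 bits, so 64 characters encode 256 bits
print(possible_values == 2 ** 256)

# Number of decimal digits in the result
print(len(str(possible_values)))

# The same number in scientific notation
print(f"{possible_values:.7e}")
```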
The hash values in the example above can be generated with just a few lines of PHP code:
<?php echo hash('sha256', 'apple'); ?>
Here, the “sha256” hashing algorithm is being used to hash the input value “apple”. The corresponding hash value or fingerprint is always “3a42c503953909637f78dd8c99b3b85ddde362415585afc11901bdefe8349102”.
Hash functions and websites
With SSL/TLS-encrypted data transmission, when the web server receives a request, it sends the server certificate to the user’s browser. The browser and server then negotiate the keys for the session, and hash functions are used during this handshake to verify that the messages exchanged have not been altered. If the handshake succeeds, the encrypted HTTPS connection is established and data can be exchanged. All of the data packets exchanged are also encrypted, so it is almost impossible for hackers to gain access.
Session IDs can be generated by hashing data relating to a site visit, such as the IP address and time stamp. One common use of session IDs is to give unique identifiers to people shopping on a website. Nowadays, session IDs are rarely passed as a URL parameter (for example, as something like www.domain.tld/index?sid=d4ccaf2627557c756a0762419a4b6695). Instead, they are usually stored in a cookie, which the browser sends back in the HTTP header of each request.
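As a rough illustration, a session ID of this kind could be derived as follows. This is a sketch under stated assumptions, not how any particular web framework does it: the function name `new_session_id`, the cookie name `sid`, and the sample IP address are hypothetical, and the random bytes are an addition made here because an IP address and time stamp on their own would be guessable.

```python
import hashlib
import secrets
import time

def new_session_id(ip_address: str, timestamp: float) -> str:
    """Derive a session ID by hashing visit data plus random bytes."""
    nonce = secrets.token_bytes(16)  # assumption: unpredictable component
    material = f"{ip_address}|{timestamp}".encode("utf-8") + nonce
    return hashlib.sha256(material).hexdigest()

# Example visit data (the IP address is from a range reserved for documentation)
session_id = new_session_id("203.0.113.7", time.time())

# The ID travels in an HTTP header as a cookie rather than in the URL
print(f"Set-Cookie: sid={session_id}; HttpOnly; Secure")
```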
Hash values are also used to protect cached data, so that unauthorized users cannot use the cache to access login and payment details or other information about a site.
Communication between a server and a client using the SFTP protocol, which runs over SSH rather than FTP, also relies on hash functions in a similar way.
Protection of sensitive data
Login details for online accounts are frequently the target of cyber-attacks. Hackers either want to disrupt operation of a website (for example, to reduce income generated by traffic-based ads) or access information about payment methods.
In the WordPress example above, you can see that passwords are always hashed before they are stored. Combined with the session IDs generated in the system, this ensures a high level of security. This is especially important for protection against “brute force attacks”. In this kind of attack, hackers repeatedly hash candidate passwords and compare the results with a stolen hash value, or simply try out combinations at the login form, until they find one that allows them access. Using long passwords with high security standards makes these attacks less likely to succeed, because the amount of computing power required is so high. Remember: Never use simple passwords, and be sure to protect all of your login details and data against unauthorized access.
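The idea of storing only a hash of the password can be sketched in a few lines of Python. Note the assumptions: this is not the scheme WordPress itself uses; it swaps in PBKDF2 from the standard `hashlib` module, and the function names, the iteration count, and the 16-byte random salt are choices made for this illustration. The repeated iterations make every guess in a brute force attack more expensive.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash) for storage; the password itself is never stored."""
    if salt is None:
        salt = secrets.token_bytes(16)  # a fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the results."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

salt, stored = hash_password("susi_562#alone")
print(verify_password("susi_562#alone", salt, stored))  # correct password
print(verify_password("123456", salt, stored))          # wrong guess
```

Because each password gets its own salt, two users with the same password end up with different stored hashes, which prevents an attacker from reusing one precomputed list of hashes against every account.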
Email traffic is sent via servers that are specially designed to transmit this type of message. Hash values are also used to add a digital signature to messages.
The steps involved in sending an email with a digital signature are:
- Alice (the sender) converts her message into a hash value and encrypts the hash value using her private key. This encrypted hash value is the digital signature.
- Alice sends the email and the digital signature to the recipient, Bob.
- Bob generates a hash value of the message using the same hash function. He also decrypts the digital signature using Alice’s public key to recover the hash value that Alice computed, and compares the two hashes.
- If the two hash values match, Bob knows that Alice’s message has not been tampered with during transmission.
Please note that a digital signature proves the integrity of a message but does not actually encrypt it. If you’re sending confidential data, it’s therefore best to encrypt it as well as using a digital signature.
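The signing and verification steps above can be traced in a toy example. Everything here is an assumption made for illustration: the key pair is built from two small, publicly known Mersenne primes, the hash is reduced to fit the tiny modulus, and no padding is applied, so this must never be used for real signatures. It only shows the "hash, encrypt with the private key, decrypt with the public key, compare" flow. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy RSA key pair from two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse of E)

def message_hash(message: str) -> int:
    """Hash the message with SHA-256 and reduce it to fit the toy modulus."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N

def sign(message: str) -> int:
    """Alice: encrypt the message hash with her private key."""
    return pow(message_hash(message), D, N)

def verify(message: str, signature: int) -> bool:
    """Bob: recover the hash with Alice's public key and compare it with his own."""
    return pow(signature, E, N) == message_hash(message)

message = "Meet me at noon."
signature = sign(message)
print(verify(message, signature))             # unchanged message
print(verify("Meet me at nine.", signature))  # tampered message
```

The message itself is passed around in the clear in this sketch, which matches the point made above: the signature protects integrity, not confidentiality.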
How can hash functions be used to perform lookups?
Searching through large quantities of data is a very resource-intensive process. Imagine you’ve got a table listing every inhabitant of a big city, with lots of different fields for each entry (first name, second name, address, etc.). Finding just one term would be very time-consuming and require a lot of computing power. To simplify the process, each entry in the table can be indexed by a hash value computed from the field being searched. The search term is then converted to a hash value in the same way, so only the entries filed under that hash value have to be compared. This limits the number of letters, digits and symbols that have to be compared, which is much more efficient than scanning every field that exists in the data table, as you would have to when looking, for example, for all first names beginning with “Ann”.
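A hash-based lookup of this kind can be sketched as follows. The records, the bucket count, and the function names are hypothetical; the sketch uses SHA-256 only to stay consistent with the rest of this article, whereas real hash tables typically use much faster non-cryptographic hash functions.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 1024  # assumed table size for this sketch

def bucket_for(key: str) -> int:
    """Map a key to a bucket number via its hash value."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

# Hypothetical records: (first name, second name, address)
residents = [
    ("Anna", "Schmidt", "1 Main Street"),
    ("Ben", "Okafor", "22 Park Lane"),
    ("Annabel", "Lee", "5 Harbour Road"),
]

# Build the index once: bucket number -> records stored there
index = defaultdict(list)
for record in residents:
    index[bucket_for(record[0])].append(record)

def find_by_first_name(name: str):
    """Hash the search term and compare only the records in the matching bucket."""
    return [r for r in index.get(bucket_for(name), []) if r[0] == name]

print(find_by_first_name("Anna"))  # exact match found via its bucket
print(find_by_first_name("Ann"))   # no exact match, so the result is empty
```

The second call shows the trade-off: a hash index finds exact matches quickly, but a search for names that merely begin with “Ann” still needs a scan or a different kind of index.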
Hash functions are used to improve security in electronic communications, and lots of highly sophisticated standards have now been developed. However, hackers are aware of this and are constantly coming up with more advanced hacking techniques.