OpenSSL::Digest allows you to compute message digests (sometimes interchangeably called “hashes”) of arbitrary data that are cryptographically secure, i.e. a Digest implements a secure one-way function.
One-way functions offer some useful properties. E.g. given two distinct inputs, it is highly unlikely that both yield the same output. Combined with the fact that every message digest algorithm has a fixed-length output of just a few bytes, digests are often used to create practically unique identifiers for arbitrary data. A common example is the creation of a unique id for binary documents that are stored in a database.
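As a minimal sketch of the identifier use case, the following hashes two nearly identical strings and compares the results. The variable names and sample strings are illustrative assumptions, not part of the library.

```ruby
require 'openssl'

# Two inputs that differ only in a single character (illustrative data).
doc_a = 'The quick brown fox jumps over the lazy dog'
doc_b = 'The quick brown fox jumps over the lazy cog'

# hexdigest returns the digest as a hexadecimal string.
id_a = OpenSSL::Digest::SHA256.new.hexdigest(doc_a)
id_b = OpenSSL::Digest::SHA256.new.hexdigest(doc_b)

# SHA-256 always yields 32 bytes (64 hex characters), whatever the input size.
puts id_a.length
puts id_b.length
puts id_a == id_b
```

Both identifiers have the same fixed length, yet they differ even though the inputs are almost the same.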
Another useful characteristic of one-way functions (and thus the name) is that given a digest there is no indication about the original data that produced it, i.e. the only way to identify the original input is to “brute-force” through every possible combination of inputs.
These characteristics make one-way functions also ideal companions for public key signature algorithms: instead of signing an entire document, first a hash of the document is produced with a considerably faster message digest algorithm and only the few bytes of its output need to be signed using the slower public key algorithm. To validate the integrity of a signed document, it suffices to re-compute the hash and verify that it is equal to that in the signature.
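The hash-then-sign pattern described above might look roughly as follows. This is a sketch that assumes a throwaway RSA key generated on the spot and an illustrative document string; real code would load an existing key.

```ruby
require 'openssl'

# Throwaway key pair for illustration only.
key = OpenSSL::PKey::RSA.new(2048)
document = 'contents of the document to sign'

# sign hashes the document with the given digest and signs only that hash.
signature = key.sign(OpenSSL::Digest.new('SHA256'), document)

# Verification re-computes the hash and checks it against the signature.
public_key = key.public_key
valid = public_key.verify(OpenSSL::Digest.new('SHA256'), signature, document)
tampered = public_key.verify(OpenSSL::Digest.new('SHA256'), signature, document + '!')

puts valid
puts tampered
```

Verification succeeds for the original document and fails once the data has been modified.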
Among the supported message digest algorithms are:
- SHA, SHA1, SHA224, SHA256, SHA384 and SHA512
- MD2, MD4, MDC2 and MD5
- RIPEMD160
- DSS, DSS1 (Pseudo algorithms to be used for DSA signatures. DSS is equal to SHA and DSS1 is equal to SHA1)
For each of these algorithms, there is a sub-class of Digest that can be instantiated as simply as e.g.
digest = OpenSSL::Digest::SHA1.new
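As an alternative sketch, a digest can also be created by passing the algorithm name to the generic OpenSSL::Digest constructor, which is handy when the algorithm is chosen at runtime. The variable names here are illustrative.

```ruby
require 'openssl'

# Instantiate via the dedicated sub-class ...
by_class = OpenSSL::Digest::SHA256.new

# ... or via the generic constructor with the algorithm name.
by_name = OpenSSL::Digest.new('SHA256')

# Both compute the same function.
same = by_class.hexdigest('abc') == by_name.hexdigest('abc')

# Output size of the algorithm in bytes.
length = by_name.digest_length

puts same
puts length
```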
Mapping between Digest class and sn/ln
The sn (short names) and ln (long names) are defined in <openssl/objects.h> and <openssl/obj_mac.h>. They are textual representations of ASN.1 OBJECT IDENTIFIERs. Each supported digest algorithm has an OBJECT IDENTIFIER associated with it, and each OBJECT IDENTIFIER in turn has a short and a long name assigned to it. E.g. the OBJECT IDENTIFIER for SHA-1 is 1.3.14.3.2.26, its sn is “SHA1” and its ln is “sha1”.
MD2
- sn: MD2
- ln: md2
MD4
- sn: MD4
- ln: md4
MD5
- sn: MD5
- ln: md5
SHA
- sn: SHA
- ln: SHA
SHA-1
- sn: SHA1
- ln: sha1
SHA-224
- sn: SHA224
- ln: sha224
SHA-256
- sn: SHA256
- ln: sha256
SHA-384
- sn: SHA384
- ln: sha384
SHA-512
- sn: SHA512
- ln: sha512
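The mapping can be inspected from Ruby as well. The following sketch uses OpenSSL::ASN1::ObjectId to look up the SHA-1 entry by its short name and by its dotted OBJECT IDENTIFIER; the variable names are illustrative.

```ruby
require 'openssl'

# Look up the object by its short name.
oid = OpenSSL::ASN1::ObjectId.new('SHA1')

puts oid.sn   # short name
puts oid.ln   # long name
puts oid.oid  # dotted OBJECT IDENTIFIER

# The same object can be reached through the dotted notation.
from_dotted = OpenSSL::ASN1::ObjectId.new('1.3.14.3.2.26')
puts from_dotted.sn
```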
“Breaking” a message digest algorithm means defying its one-way function characteristics, i.e. producing a collision or recovering the original data by means that are more efficient than brute force. Most of the supported digest algorithms can be considered broken in this sense, even the very popular MD5 and SHA1 algorithms. Should security be your highest concern, then you should probably rely on SHA224, SHA256, SHA384 or SHA512.
Hashing a file
data = File.read('document')
sha256 = OpenSSL::Digest::SHA256.new
digest = sha256.digest(data)
Hashing several pieces of data at once
data1 = File.read('file1')
data2 = File.read('file2')
data3 = File.read('file3')
sha256 = OpenSSL::Digest::SHA256.new
sha256 << data1
sha256 << data2
sha256 << data3
digest = sha256.digest
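Feeding data piece by piece yields the same digest as hashing the concatenation in one go, which makes it possible to hash large files without loading them entirely into memory. The sketch below illustrates both points; file_digest and its chunk_size parameter are hypothetical helpers introduced here, not part of the library.

```ruby
require 'openssl'

parts = ['first piece, ', 'second piece, ', 'third piece']

# Feed the pieces one at a time ...
incremental = OpenSSL::Digest::SHA256.new
parts.each { |part| incremental << part }
incremental_result = incremental.digest

# ... and compare with hashing the concatenated data in one call.
one_shot = OpenSSL::Digest::SHA256.new.digest(parts.join)
puts incremental_result == one_shot

# Hypothetical helper: hash a file in fixed-size chunks.
def file_digest(path, chunk_size = 64 * 1024)
  sha256 = OpenSSL::Digest::SHA256.new
  File.open(path, 'rb') do |io|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = io.read(chunk_size))
      sha256 << chunk
    end
  end
  sha256.digest
end
```

A call such as file_digest('document') would then return the raw SHA-256 digest of that file.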
Reuse a Digest instance
data1 = File.read('file1')
sha256 = OpenSSL::Digest::SHA256.new
digest1 = sha256.digest(data1)
data2 = File.read('file2')
sha256.reset
digest2 = sha256.digest(data2)
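The effect of reset is easiest to see when data is fed with <<, since that accumulates state across calls. The following sketch uses placeholder strings in place of file contents.

```ruby
require 'openssl'

# With reset: earlier input is discarded before the second update.
sha256 = OpenSSL::Digest::SHA256.new
sha256 << 'file1 contents'
sha256.reset
sha256 << 'file2 contents'
after_reset = sha256.digest

# A fresh instance gives the same result as the reset one.
fresh = OpenSSL::Digest::SHA256.new.digest('file2 contents')
puts after_reset == fresh

# Without reset: both inputs contribute to the digest.
without_reset = OpenSSL::Digest::SHA256.new
without_reset << 'file1 contents'
without_reset << 'file2 contents'
accumulated = without_reset.digest
puts accumulated == fresh
```

Without the reset, the result is the digest of both inputs concatenated rather than of the second input alone.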