An MD5 hash value is a short text string of 32 hexadecimal characters that serves as a digital 'signature' of a file, for example "e43b3e2da3be90029bc45f9c7ad061e7" or "7d3bdffed2314b377db21c4148b2be81".


In the eDiscovery process, MD5 hash values are usually used to spot exact duplicates of files, since two identical files always have the same MD5 hash value (and thus the same digital signature).
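
As a rough illustration of how exact duplicates can be spotted this way, the Python sketch below groups the files in a folder by their MD5 hash value. The function names md5_of_file and find_exact_duplicates are hypothetical and chosen only for this example; they are not part of any eDiscovery product.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in blocks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder):
    """Group files under `folder` by MD5; keep only groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            groups[md5_of_file(path)].append(path)
    return {md5: paths for md5, paths in groups.items() if len(paths) > 1}
```

Calling find_exact_duplicates on a folder returns a dictionary that maps each shared MD5 value to the list of files that produced it. Files with unique content do not appear in the result.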


The values are computed from the low-level binary data of a file, not directly from its textual content or metadata. It is therefore possible for two files to have the same visual appearance, textual content, and metadata values but still produce different MD5 hashes if the underlying binary data differs even slightly. In general, any operation that modifies a file changes its MD5 value, including editing an image or the contents of a document.
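
A small sketch of this effect: the two byte sequences below decode to the same text, but one starts with a UTF-8 byte order mark, so their MD5 values differ. The sample text is made up for the illustration.

```python
import hashlib

text = "Quarterly report"

# Same text, two binary representations: plain UTF-8, and UTF-8
# preceded by a byte order mark (three extra bytes at the start).
plain = text.encode("utf-8")
with_bom = b"\xef\xbb\xbf" + plain

hash_plain = hashlib.md5(plain).hexdigest()
hash_bom = hashlib.md5(with_bom).hexdigest()

# Both decode to the same visible text...
same_text = with_bom.decode("utf-8-sig") == plain.decode("utf-8")
# ...but the hashes are computed from the bytes, so they do not match.
same_hash = hash_plain == hash_bom

print("same text:", same_text)
print("same MD5: ", same_hash)
```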


MD5 hash values in productions

Files are often changed during the production process (in non-native productions), which leads to a difference between the MD5 hash value listed in a load file and the actual hash of the final, produced file. Redactions and Bates stamping are examples of visual changes to a document's data. Differences can also arise from non-visual changes that are not user-initiated, such as normalization operations performed by production software. For example, images may be compressed, or PDFs may be modified so that they open and appear consistently in different PDF viewers or to reduce the production's size on disk. In these cases, the MD5 values provided in load files are normally the hash values of the original documents, and not necessarily those of the final produced documents.
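
To see which produced files differ from the hash listed in the load file, a comparison along the following lines may help. This is only a sketch under stated assumptions: it assumes the load file has been exported as a CSV with hypothetical column names "Path" and "MD5", which real load files may not use, and the function name compare_load_file is invented for the example.

```python
import csv
import hashlib
from pathlib import Path


def compare_load_file(load_file, production_root):
    """Return (path, listed_md5, actual_md5, matches) for each load file row.

    Assumes a CSV load file with hypothetical "Path" and "MD5" columns.
    """
    results = []
    with open(load_file, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            listed = row["MD5"].strip().lower()
            produced = Path(production_root) / row["Path"]
            # read_bytes() loads the whole file; fine for a small sketch.
            actual = hashlib.md5(produced.read_bytes()).hexdigest()
            results.append((row["Path"], listed, actual, listed == actual))
    return results
```

A mismatch reported by this comparison does not by itself indicate a problem. For a redacted, stamped, or normalized file it is the expected outcome, because the listed value normally describes the original document.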