Emails in eDiscovery datasets are known for containing duplicates. Improperly configured email clients can cause this, creating a new copy each time you forward or reply to a message, or even Cc a message to a group. During eDiscovery, encountering the same message multiple times adds to your workload. This article explains how to identify duplicate emails in your eDiscovery dataset using GoldFynch's deduplication system.


Deduplication method for finding duplicate emails

You can identify duplicate emails by using the Message-ID deduplication method. This method relies on the unique identifiers that mail servers or email clients assign to email messages. The Message-ID is a header field in each email message that helps track and identify unique messages.


During deduplication, all the emails in your eDiscovery dataset are sorted into groups. Within each group, the system designates one email as the primary, and any matching emails are treated as duplicates.
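The grouping idea can be illustrated with a short Python sketch. This is not GoldFynch's implementation; it is a minimal example that assumes each email is available as raw message text, that the first email seen in a group is treated as the primary, and that emails without a Message-ID are treated as unique. The function names `group_by_message_id` and `split_primary_and_duplicates` are hypothetical.

```python
from collections import defaultdict
from email import message_from_string


def group_by_message_id(raw_emails):
    """Group raw email messages by their Message-ID header.

    Returns a dict mapping a group key to the list of positions
    (indexes into raw_emails) that share that key.
    """
    groups = defaultdict(list)
    for index, raw in enumerate(raw_emails):
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        if message_id:
            groups[message_id].append(index)
        else:
            # Assumption: an email with no Message-ID cannot be matched,
            # so it gets a group of its own.
            groups[("no-message-id", index)].append(index)
    return groups


def split_primary_and_duplicates(groups):
    """Treat the first email in each group as primary, the rest as duplicates."""
    primaries, duplicates = [], []
    for members in groups.values():
        primaries.append(members[0])
        duplicates.extend(members[1:])
    return primaries, duplicates
```

For example, given three messages where the first two share a Message-ID, the first and third would be primaries and the second would be a duplicate.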


Running Message-ID deduplication in GoldFynch

Step 1. Navigate to the "De-dupe" view and click the "+ New De-dupe Session" button.



Step 2. Enter a name for the de-dupe session and click on the "Create" button.




Step 3. Select a de-dupe scope.


  • If you select the Whole Case option, we recommend that you check the "Untag current case-wide DUPEs and start over" checkbox to provide an accurate evaluation of duplicate emails in your GoldFynch case.
  • If you select the Whole Case Vs. Folder or Folder A Vs Folder B options, click on the respective "Browse..." button and select the folder you want to use from your GoldFynch case.


Step 4. Select one of the Message-ID-based strategies from the drop-down. The available strategies are:

  • Message-ID based
  • Message-ID + Subject based
  • Message-ID + Subject + Time based
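To show how the three strategies might differ, the sketch below builds a comparison key from progressively more header fields. This is only an illustration, not GoldFynch's actual matching logic: it assumes that "Subject" and "Time" correspond to the `Subject` and `Date` headers, and the strategy strings and the `dedupe_key` function are hypothetical names.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def dedupe_key(raw_email, strategy="message-id"):
    """Build a comparison key for one raw email under a given strategy.

    Two emails count as duplicates (in this sketch) when their keys are equal.
    """
    msg = message_from_string(raw_email)
    key = [(msg.get("Message-ID") or "").strip()]

    if strategy in ("message-id+subject", "message-id+subject+time"):
        key.append((msg.get("Subject") or "").strip())

    if strategy == "message-id+subject+time":
        sent = None
        date_header = msg.get("Date")
        if date_header:
            try:
                sent = parsedate_to_datetime(date_header)
            except (TypeError, ValueError):
                # Unparseable Date header: leave the time component empty.
                sent = None
        key.append(sent)

    return tuple(key)
```

Under these assumptions, each stricter strategy adds a field to the key, so two emails that match on Message-ID alone may stop matching once the subject or sent time is also compared.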


Step 5. Click on the "Save and Evaluate" button. The system will generate a report of the specified datasets and their duplicate emails. Before proceeding with the DUPE tagging process, you can analyze the duplicate items by downloading the report.



Step 6. Click the "Apply" button to display the "Apply De-dupe" confirmation screen. The confirmation screen shows how the duplicates will be tagged and whether you have chosen to remove the current case-wide DUPE tags before starting the de-dupe process.


Step 7. Click on "Apply" on the confirmation screen to tag the duplicate files.


Note: 

  • If there are no duplicates in the case, you will not be able to proceed further.
  • Click on the "Download Report" button to download a report of the de-dupe evaluation.


Learn more about deduplicating the files in your case here