GoldFynch features a “deduplication” function that identifies whether multiple copies of the same file are present in a case, and flags such files with a special "DUPE" tag.
After running an initial "evaluation," it also displays detailed statistics about the files that fall within your Scope (the specified sets of files to compare) and Strategy (the type of algorithm the system uses to identify duplicates). Scroll down to the end of this article to learn more about defining the Strategy and Scope of a deduplication process.
The system then lets you run the dedupe process to add the special "DUPE" system tag to all duplicate files. It also transfers any tags and tag notes on dupe files to the "primary" file (learn more about "primaries" in the Scope section below).
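The tagging and tag-transfer behavior described above can be pictured with a minimal sketch. This is not GoldFynch's implementation: the `Doc` class and `apply_dedupe` function are hypothetical names, and the sketch assumes that tags are copied (rather than moved) and that an existing note on the primary is kept when both files carry a note for the same tag.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a file in a case; not a GoldFynch type."""
    name: str
    tags: set[str] = field(default_factory=set)
    tag_notes: dict[str, str] = field(default_factory=dict)


def apply_dedupe(primary: Doc, dupes: list[Doc]) -> None:
    """Tag each dupe as "DUPE" and carry its tags and tag notes to the primary."""
    for dupe in dupes:
        # Copy user tags first, so the "DUPE" tag itself never reaches the primary.
        primary.tags |= dupe.tags - {"DUPE"}
        for tag, note in dupe.tag_notes.items():
            # Assumption: the primary's own note wins if both files have one.
            primary.tag_notes.setdefault(tag, note)
        dupe.tags.add("DUPE")


primary = Doc("contract.pdf", {"Hot"})
copy = Doc("contract (1).pdf", {"Privileged"}, {"Privileged": "attorney email"})
apply_dedupe(primary, [copy])
print(sorted(primary.tags))  # ['Hot', 'Privileged']
print(sorted(copy.tags))     # ['DUPE', 'Privileged']
```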
Step 1. Navigate to the "De-dupe" view and click on the "+ New De-dupe Session" button.
Step 2. Enter a name for the de-dupe session and click on the "Create" button.
Step 3. Select a de-dupe Scope and Strategy.
- If you select the Whole Case option, it is recommended that you check the "Untag current case-wide DUPEs and start over" checkbox so that the evaluation accurately reflects the dupes currently present in the case.
- If you have selected the Whole Case vs. Folder or Folder A vs. Folder B options, click on the respective "Browse..." button and select the respective folder in your GoldFynch case.
Step 4. Click on the "Save and Evaluate" button. You will then be presented with a report of the specified datasets along with information about the duplicates present in them.
Step 5. Click on the "Apply..." button to run the final de-dupe process. It will open an "Apply De-dupe" overlay.
- If no duplicates are found then you will not be able to proceed further from here.
- You can download a report of the de-dupe evaluation by clicking on the "Download Report" button.
Step 6. Click on the "Apply" button.
The system scans for conflicts within the selected file set(s) that may affect the de-dupe process and will display warnings in the following scenarios:
Note: the total number of items that are tagged may be higher than the total number of dupes found during the evaluation. This is because the tag is also applied to all attachments of dupe items, and attachments are not considered or counted during the de-dupe evaluation process.
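As a small worked example of the note above, the following sketch shows how the tagged total can exceed the evaluated total. The file names and attachment counts are made up for illustration, and `count_tagged` is a hypothetical helper, not part of GoldFynch.

```python
def count_tagged(attachments_per_dupe: dict[str, int]) -> int:
    """Each dupe receives the tag, and so does each of its attachments."""
    return sum(1 + n for n in attachments_per_dupe.values())


# Hypothetical evaluation result: three dupes, one of which has two attachments.
dupes = {"email_copy.msg": 2, "memo_copy.pdf": 0, "notes_copy.docx": 0}

print(len(dupes))           # dupes counted during evaluation: 3
print(count_tagged(dupes))  # items tagged when the de-dupe is applied: 5
```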
Once the de-dupe process is complete you will see a confirmatory message at the top of your screen with the Scope and Strategy used.
If a more recent de-dupe operation has been performed, the dedupe session will indicate this instead.
Scope
When a de-dupe operation is run, all duplicate documents are collected into groups. Within each group, the system identifies one or more "primary" candidates and designates them as such, with all other documents in the group treated as "dupes."
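The grouping idea can be sketched as follows. This is an illustration only: it groups files by the MD5 hash of their content and, as an assumption made purely for the example, picks a single primary per group by taking the first name in alphabetical order. How GoldFynch actually chooses primaries is not specified here.

```python
import hashlib
from collections import defaultdict


def group_duplicates(files: dict[str, bytes]) -> list[tuple[str, list[str]]]:
    """Return (primary, dupes) pairs for every group of identical files."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        groups[hashlib.md5(content).hexdigest()].append(name)

    result = []
    for names in groups.values():
        if len(names) > 1:  # a group of one has no duplicates
            # Assumption: the alphabetically first file acts as the primary.
            primary, *dupes = sorted(names)
            result.append((primary, dupes))
    return result


case_files = {
    "b.txt": b"same content",
    "a.txt": b"same content",
    "c.txt": b"different content",
}
print(group_duplicates(case_files))  # [('a.txt', ['b.txt'])]
```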
The de-dupe process can be run on specific groups of files as described below:
- Whole Case - All duplicate files in the case are found.
- Whole Case vs. Folder - Compares a single folder against the entire case (i.e. "do any of the files in this folder exist in the case"). The "folder" files will be marked as duplicates.
- Folder A vs. Folder B - Compares one folder – also called a Target – against another folder – also called a Source (i.e. "Are there any duplicates in Folder A for each item in Folder B").
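To make the difference between the folder-based scopes concrete, here is a rough sketch that works on a mapping from file path to content hash. All names are hypothetical and the paths are invented. It assumes that "Whole Case vs. Folder" compares the folder's files against everything outside that folder, and that in "Folder A vs. Folder B" the files reported are those in Folder A that match an item in Folder B.

```python
def in_folder(path: str, folder: str) -> bool:
    """True if path lies inside folder (or one of its subfolders)."""
    return path.startswith(folder.rstrip("/") + "/")


def dupes_against(hashes: dict[str, str], candidates: set[str],
                  reference: set[str]) -> list[str]:
    """Candidates whose hash also occurs among the reference files."""
    reference_hashes = {hashes[p] for p in reference}
    return sorted(p for p in candidates if hashes[p] in reference_hashes)


def whole_case_vs_folder(hashes: dict[str, str], folder: str) -> list[str]:
    inside = {p for p in hashes if in_folder(p, folder)}
    outside = set(hashes) - inside
    return dupes_against(hashes, inside, outside)


def folder_a_vs_folder_b(hashes: dict[str, str], folder_a: str,
                         folder_b: str) -> list[str]:
    a = {p for p in hashes if in_folder(p, folder_a)}
    b = {p for p in hashes if in_folder(p, folder_b)}
    return dupes_against(hashes, a, b)


case = {
    "/prod1/a.pdf": "h1",
    "/prod1/b.pdf": "h2",
    "/prod2/a_copy.pdf": "h1",
    "/prod2/c.pdf": "h3",
    "/misc/b2.pdf": "h2",
}
print(whole_case_vs_folder(case, "/prod2"))           # ['/prod2/a_copy.pdf']
print(folder_a_vs_folder_b(case, "/prod1", "/misc"))  # ['/prod1/b.pdf']
```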
Strategy
- Hash-based Strategies compare the item hashes directly and apply to all kinds of file types (learn more about MD5 hash values here)
- Message-ID based Strategies are used specifically for eml/msg files and look at Email IDs/Message IDs to find dupes. If an item doesn't have a Message ID, it is ignored. The Message-ID-based options listed below compare the following parameters and require them to be the same for files to be flagged as duplicates.
- Message IDs alone
- Message IDs AND Email/Message Subjects
- Message IDs AND Email/Message Subjects AND Time of the Emails/Messages
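The three Message-ID-based options can be thought of as progressively stricter comparison keys. The sketch below illustrates this with Python's standard `email` package on raw .eml text (.msg files would need a separate parser). The `dedupe_key` function and its `level` parameter are hypothetical, and using the `Date` header as the "time" of the message is an assumption.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def dedupe_key(raw_eml: str, level: int):
    """Build a comparison key; two emails are dupes if their keys are equal.

    level 1: Message ID alone
    level 2: Message ID and Subject
    level 3: Message ID, Subject and time (taken from the Date header)
    """
    msg = message_from_string(raw_eml)
    message_id = msg["Message-ID"]
    if message_id is None:
        return None  # items without a Message ID are ignored

    key = [message_id.strip()]
    if level >= 2:
        key.append((msg["Subject"] or "").strip())
    if level >= 3:
        date = msg["Date"]
        key.append(parsedate_to_datetime(date) if date else None)
    return tuple(key)


original = ("Message-ID: <1@example.com>\nSubject: Hello\n"
            "Date: Mon, 01 Jan 2024 10:00:00 +0000\n\nbody A")
retitled = ("Message-ID: <1@example.com>\nSubject: Hello (edited)\n"
            "Date: Mon, 01 Jan 2024 10:00:00 +0000\n\nbody B")

print(dedupe_key(original, 1) == dedupe_key(retitled, 1))  # True
print(dedupe_key(original, 2) == dedupe_key(retitled, 2))  # False
```

In this example the two emails count as duplicates under the first option but not under the second, because their subjects differ.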
Reset case-wide DUPE files
You can easily reset all the duplicate files across your case from the Deduplication view.
1. Navigate to the Dedupe view
2. Click on the Reset case-wide DUPE files button
3. Click on the Yes, reset DUPEs button on the confirmation screen overlay
Once the reset process completes successfully, a message will be displayed on the screen.
Managing Dedupe sessions
You can delete completed or unwanted de-duplication sessions. Note that this only removes the dedupe session; it does not delete any files.
To do so, in the de-dupe view, click on the trashcan icon next to the dedupe session you want to remove, then click on the "Delete" button in the Confirm Deletion screen overlay.