Deduplication Bug in Semgrep JSON Report Causing Mitigation of the Original Finding #11227

farsheedify · 2024-11-09T12:29:51Z

Bug description
In our organization, we are experiencing an issue with SAST scans where changes in code (the position of the vulnerable line) cause DefectDojo to mark the finding as mitigated and create a new finding for the same bug at the new line position. Currently, we are uploading the report with the scan type set to GitLab SAST Report in our CI/CD. As I understand, the default behavior for deduplication of static scans includes the line number in the algorithm.

I noticed that for Semgrep JSON Reports, DefectDojo uses the fingerprint ID provided by Semgrep for deduplication. I decided to test this change in the demo instance of DefectDojo to see if it could resolve our issue with SAST findings being closed and reopened due to line changes.

I scanned a sample repository on GitHub with Semgrep and generated a JSON report. I uploaded the report using the following curl command on the demo server:

curl -X "POST" "https://demo.defectdojo.org/api/v2/import-scan/" -H "accept: application/json" -H "Authorization: Token ***" -H "Content-Type: multipart/form-data" -F [email protected] -F "product_type_name=ptype_name" -F "active=true" -F "verified=true" -F "close_old_findings=true" -F "engagement_name=${GITHUB_RUN_ID}(SAST)" -F "build_id=${GITHUB_RUN_ID}" -F "minimum_severity=Info" -F "close_old_findings_product_scope=true" -F "scan_date=$TODAY" -F "engagement_end_date=$TODAY" -F "commit_hash=${GITHUB_SHA}" -F "product_name=${product_name}" -F "auto_create_context=true" -F "scan_type=Semgrep JSON Report" -F "tags=${GITHUB_REF_NAME}" -F "source_code_management_uri=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}"

Originally, a bug was found on the 26th line of a file. After changing that line of code to the 21st line and rerunning the scan, the original bug was marked as mitigated, and the new bug was marked as a duplicate. Now, the bug has disappeared from the active bugs. The image below summarizes the situation. The bug and the file are the same. It seems that DefectDojo correctly marked the new finding as a duplicate of the previous finding (since only the line changed, but the fingerprints are the same). However, it incorrectly marked the original bug as mitigated, causing it to be removed from the active findings.

Steps to reproduce
Steps to reproduce the behavior:

Upload a Semgrep JSON Report.
Change the line of code for one of the findings.
Rerun the scan.
Upload the new generated report
The finding with the changed line of code disappears from active findings.

Expected behavior
When rerunning the scan on the same file, if there is a change in the line of code, DefectDojo should:

1.Check the Fingerprint First: Use the fingerprint in the Semgrep report to identify findings.
2. Mark New Findings Correctly: Mark newly imported findings with similar instances (with different line numbers) as active and verified, since they can be assumed the latest instances.
3. Handle Original Findings Appropriately: Mark the original findings as duplicates (since they have outdated line of code).

Since the new finding contains the latest, updated line of code, it should be kept as the active finding. Ideally, DefectDojo should update the original finding's line of code without creating a new finding. This approach ensures that:

If the number of duplicates to be kept is set to 0, there will be only one, updated instance per sast finding.
We have the latest line of code for static findings.

farsheedify · 2024-11-26T14:00:41Z

I tested the same scenario, but this time I re-uploaded the same report using the UI (Ad Hoc Import), and the problem did not occur. It seems the issue only arises when importing the report through subsequent API imports, and possibaly related to engagement type, which are different (CI/CD vs Interactive). Here are the steps I followed:

Scan the code and import the report via API: Initially, I scanned the code and imported the report using the API (CI/CD).
Modify a finding and rescan: I changed the line number of one of the findings and rescanned the repository.
Manual upload via UI: Finally, I manually uploaded the report through the UI (Interactive).

I plan to create an engagement with the type "Interactive" and then upload the test report into that engagement. This approach will be used instead of "auto creating context" in the import-scan request to determine if it resolves the issue.

Update:
I used "Interactive" as the engagement type. During the second import-scan, the finding was initially marked as mitigated but then reactivated immediately. As a result, the finding with the changed line of code remained open, although a comment about its auto-closure was added to the notes. However, when I performed a third import-scan, the finding was mitigated and subsequently disappeared from the active findings.

mtesauro · 2024-11-27T23:51:30Z

@farsheedify
So, a couple of things here:

Engagement type isn't relevant. The only real difference between the two is the extra repo related meta-data you can add to a CICD engagement. Engagement type doesn't change how findings are handled besides grouping 1+ test results into something that can be reported on.
You don't mention if you're doing imports or reimports - which you're doing is important. Imports are a one-time thing. Reimports require the scope of what's tested not to change and will automatically diff the current results with the previously ones.
Updating finding status for things like dedup, etc is an async process so things changing is expected.
It appears that line number is important in how you have deduplication configured. Look at the docs here. For Semgrep specifically, this is how dedup is handled

HTH

farsheedify · 2024-11-30T09:24:44Z

Thanks for your reply @mtesauro

As you can see in the curl command I mentioned in the first post, the request is import-scan. Please also look for the flags in that command (e.g., close_old_findings_product_scope=true). Additionally, all subsequent imports are also done with import-scan. The screenshots and status of findings are reviewed after the async deduplication process.
Additionally, since I use the Semgrep JSON Report, based on the documentation you provided, the deduplication is based on the unique id from the tool by default, which is the "fingerprint" parameter in the Semgrep report. Therefore, the line number should not affect the deduplication process, as we are not using the legacy deduplication algorithm that considers line numbers by default.
The original finding that is getting mitigated has the same fingerprint as the duplicated one. This is clearly true, and DD correctly marks it as a duplicate. However, it incorrectly mitigates the original finding, leaving the duplicate finding as "inactive, duplicate". We expect the original finding to remain "active, verified" and not "inactive, verified, mitigated", while the duplicate one, with the different line of code, should be marked as "inactive, duplicate".

I can provide more accurate test data/scenario if it can help.

mtesauro · 2024-12-03T21:47:01Z

@farsheedify This is super interesting.

Is it possible you can add a sanitized scan file that shows this behavior so we can reproduce the issue?

farsheedify · 2024-12-04T11:42:08Z

@farsheedify This is super interesting.

Is it possible you can add a sanitized scan file that shows this behavior so we can reproduce the issue?

Sure.

I ran the scan on some sample files. To reproduce the issue, first upload semgrep-report_Line31.json using the following curl command:

curl -X "POST" "https://demo.defectdojo.org/api/v2/import-scan/" -H "accept: application/json" -H "Authorization: Token ${{ secrets.DEFECTDOJO_TOKEN }}" -H "Content-Type: multipart/form-data" -F [email protected] -F "product_type_name=clubpay" -F "active=true" -F "verified=true" -F "close_old_findings=true" -F "engagement_name=${GITHUB_RUN_ID}(SAST)" -F "build_id=${GITHUB_RUN_ID}" -F "minimum_severity=Info" -F "close_old_findings_product_scope=true" -F "scan_date=$TODAY" -F "engagement_end_date=$TODAY" -F "commit_hash=${GITHUB_SHA}" -F "product_name=${product_name}" -F "auto_create_context=true" -F "scan_type=Semgrep JSON Report" -F "tags=${GITHUB_REF_NAME}" -F "source_code_management_uri=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}"

Next, upload semgrep-report_second_run_Line24.json using the same curl command. After the second upload, view all findings for the product. You will see two findings with the title:

python.lang.security.audit.dangerous-subprocess-use-tainted-env-args.dangerous-subprocess-use-tainted-env-args

The first finding (Original Finding) will be marked as "inactive, verified, mitigated" and is detected on sample/brute.py (Line 31). The second finding will be marked as "inactive, duplicate" and is detected on sample/brute.py (Line 24).
semgrep-report_Line31.json
semgrep-report_second_run_Line24.json

mtesauro · 2024-12-04T20:24:10Z

@farsheedify Thanks this helps a ton.

farsheedify added the bug label Nov 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deduplication Bug in Semgrep JSON Report Causing Mitigation of the Original Finding #11227

Deduplication Bug in Semgrep JSON Report Causing Mitigation of the Original Finding #11227

farsheedify commented Nov 9, 2024 •

edited

Loading

farsheedify commented Nov 26, 2024 •

edited

Loading

mtesauro commented Nov 27, 2024

farsheedify commented Nov 30, 2024 •

edited

Loading

mtesauro commented Dec 3, 2024

farsheedify commented Dec 4, 2024 •

edited

Loading

mtesauro commented Dec 4, 2024

Deduplication Bug in Semgrep JSON Report Causing Mitigation of the Original Finding #11227

Deduplication Bug in Semgrep JSON Report Causing Mitigation of the Original Finding #11227

Comments

farsheedify commented Nov 9, 2024 • edited Loading

farsheedify commented Nov 26, 2024 • edited Loading

mtesauro commented Nov 27, 2024

farsheedify commented Nov 30, 2024 • edited Loading

mtesauro commented Dec 3, 2024

farsheedify commented Dec 4, 2024 • edited Loading

mtesauro commented Dec 4, 2024

farsheedify commented Nov 9, 2024 •

edited

Loading

farsheedify commented Nov 26, 2024 •

edited

Loading

farsheedify commented Nov 30, 2024 •

edited

Loading

farsheedify commented Dec 4, 2024 •

edited

Loading