MINOR - Better PII classification for JSON data #17734

Merged (2 commits), Sep 6, 2024

@pmbrull (Collaborator) commented Sep 5, 2024

Describe your changes:

  1. Prepare a base scanner to keep the code organized
  2. Inspect the data before parsing: if a value is JSON or a list, recursively analyze its contents instead of scanning it as a single big string
  3. Better testing
  4. Rename SSN to US_SSN to match the entity name of the NER model from Presidio when tagging entities in Collate
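A minimal Python sketch of the idea behind item 2, assuming column samples arrive as a list of raw cell values. The helper names (iter_leaf_values, try_parse_json, samples_for_scanning) are hypothetical and are not the scanner code merged in this PR. Each cell is unpacked recursively (dicts by value, lists element by element, strings that look like serialized JSON parsed first) so that every leaf value can be scanned for PII on its own.

```python
import json
from typing import Any, Iterator, List


def try_parse_json(text: str) -> Any:
    """Hypothetical helper: parse text as JSON only if it looks like an object or array."""
    stripped = text.strip()
    if not stripped.startswith(("{", "[")):
        return text
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        # Not valid JSON after all: keep the original string.
        return text


def iter_leaf_values(value: Any) -> Iterator[str]:
    """Yield the scalar leaves of a (possibly nested) JSON-like value as strings."""
    if isinstance(value, dict):
        for nested in value.values():
            yield from iter_leaf_values(nested)
    elif isinstance(value, (list, tuple)):
        for nested in value:
            yield from iter_leaf_values(nested)
    elif isinstance(value, str):
        # A string cell may itself hold serialized JSON; unpack it if so.
        parsed = try_parse_json(value)
        if isinstance(parsed, (dict, list)):
            yield from iter_leaf_values(parsed)
        else:
            yield value
    elif value is not None:
        # Numbers, booleans, etc. are scanned as their string form.
        yield str(value)


def samples_for_scanning(column_values: List[Any]) -> List[str]:
    """Hypothetical entry point: flatten every cell of a column into leaf samples."""
    samples: List[str] = []
    for cell in column_values:
        samples.extend(iter_leaf_values(cell))
    return samples
```

For example, a cell holding '{"email": "a@b.com", "tags": ["x", 1]}' yields the samples "a@b.com", "x" and "1", so a PII detector sees the email address in isolation rather than buried inside the full JSON string.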

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

github-actions bot (Contributor) commented Sep 5, 2024

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

sonarcloud bot commented Sep 5, 2024

@pmbrull pmbrull merged commit 8191202 into open-metadata:main Sep 6, 2024
15 of 16 checks passed
hurongliang added a commit to hurongliang/OpenMetadata that referenced this pull request Sep 6, 2024
* main: (39 commits)
  MINOR - Better PII classification for JSON data (open-metadata#17734)
  New Email Templates (OSS) (open-metadata#17606)
  fix pom. (open-metadata#17682)
  GEN-1333 Add TS validation on DQ and Porfiler data ingestion (open-metadata#17731)
  make cost analysis as collate only (open-metadata#17719)
  Minor: remove unused dependency (open-metadata#17709)
  test: migrate login config to playwright (open-metadata#17740)
  minor(test): fix ingestion related flaky for aut (open-metadata#17727)
  fix expand all operation on terms page (open-metadata#17733)
  Docs: Updating the Image Reference for Bots (open-metadata#17736)
  fix ui freezing due to images in feed changes (open-metadata#17703)
  add links to menus (open-metadata#17659)
  supported followed data in Following widget using search api (open-metadata#17689)
  minor(ui): align dependency version to fix vulnerabilities (open-metadata#17729)
  Fixes some things on the APICollection (open-metadata#17704)
  DOCS - OSS deployment is flagged as Collate False (open-metadata#17722)
  minor: disable image upload support in description editor (open-metadata#17697)
  fix user spec flaky playwright test (open-metadata#17684)
  fetch domains before any widget is loaded (open-metadata#17695)
  minor(test): migrate persona spec (open-metadata#17701)
  ...
pmbrull added a commit that referenced this pull request Sep 6, 2024
* MINOR - Better PII classification for JSON data

* linting
Labels: Ingestion, safe to test (add this label to run secure GitHub workflows on PRs)
2 participants