A package for performing Data Catalog operations on object storage solutions.
- 1. Environment setup
- 2. Create Data Catalog entries based on object storage files
- 3. Delete object storage entries from an entry group
- Disclaimers
git clone https://github.com/mesmacosta/datacatalog-object-storage-processor
cd datacatalog-object-storage-processor
- Data Catalog Admin
- Storage Admin, or a custom role with the storage.buckets.list permission
./credentials/datacatalog-object-storage-processor-sa.json
Using virtualenv is optional, but strongly recommended unless you use Docker.
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
pip install --upgrade --editable .
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-object-storage-processor-sa.json
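Before running any command, it can help to confirm that the exported variable points at a usable key file. The following is a minimal sketch, not part of this package: `check_credentials` and `REQUIRED_KEYS` are hypothetical names, and the field list assumes the usual layout of a Google service account JSON key.

```python
import json
import os
from pathlib import Path

# Fields that a Google service account key file normally contains (assumption).
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_credentials(env=None):
    """Return a list of problems with the configured key file (empty if none)."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    key_file = Path(path)
    if not key_file.is_file():
        return [f"credentials file not found: {key_file}"]
    try:
        data = json.loads(key_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"credentials file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["credentials file does not contain a JSON object"]
    return [f"missing field: {key}" for key in REQUIRED_KEYS if key not in data]


# Print any problems found in the current environment.
for problem in check_credentials():
    print(problem)
```

An empty result only means the file looks structurally plausible; it does not verify that the service account actually holds the required roles.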
Docker may be used as an alternative way to run all the scripts. In that case, skip the virtualenv installation steps.
- python
datacatalog-object-storage-processor \
object-storage create-entries --type cloud-storage \
--project-id my_project \
--entry-group-name my_entry_group_name \
--bucket-prefix my_bucket
- docker
docker build --rm --tag datacatalog-object-storage-processor .
docker run --rm --tty -v your_credentials_folder:/data datacatalog-object-storage-processor \
--type cloud-storage \
--project-id my_project \
--entry-group-name my_entry_group_name \
--bucket-prefix my_bucket
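Docker generally interprets a bare name on the left side of `-v` as a named volume rather than a host directory, so the credentials folder is usually given as an absolute path. The sketch below shows one way to build the same `docker run` invocation with a resolved path. `build_docker_run` is a hypothetical helper, not part of this package, and the flags simply mirror the command shown in this README.

```python
import shlex
from pathlib import Path

IMAGE = "datacatalog-object-storage-processor"  # tag used in the `docker build` step


def build_docker_run(credentials_folder, project_id, entry_group_name, bucket_prefix):
    """Return the `docker run` argument list with an absolute bind-mount path."""
    host_dir = Path(credentials_folder).expanduser().resolve()
    return [
        "docker", "run", "--rm", "--tty",
        "-v", f"{host_dir}:/data",  # mount the credentials folder at /data
        IMAGE,
        "--type", "cloud-storage",
        "--project-id", project_id,
        "--entry-group-name", entry_group_name,
        "--bucket-prefix", bucket_prefix,
    ]


# Show the resulting command line without executing it.
print(shlex.join(build_docker_run("./credentials", "my_project",
                                  "my_entry_group_name", "my_bucket")))
```

The returned list can be passed to `subprocess.run(..., check=True)` once the image has been built.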
Delete the entries in a given entry group:
datacatalog-object-storage-processor \
object-storage delete-entries --type cloud-storage \
--project-id my_project \
--entry-group-name my_entry_group_name
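The create and delete invocations in this README differ only in the subcommand and in the `--bucket-prefix` flag, so a script that needs both can assemble them from one helper. This is a minimal sketch under that assumption: `build_command` is a hypothetical name, and it only covers the flags that appear in this README.

```python
import shlex

CLI = "datacatalog-object-storage-processor"


def build_command(action, project_id, entry_group_name, bucket_prefix=None):
    """Build the argument list for `object-storage create-entries|delete-entries`."""
    if action not in ("create-entries", "delete-entries"):
        raise ValueError(f"unsupported action: {action}")
    args = [
        CLI, "object-storage", action,
        "--type", "cloud-storage",
        "--project-id", project_id,
        "--entry-group-name", entry_group_name,
    ]
    # Only the create example in this README passes --bucket-prefix.
    if action == "create-entries":
        if not bucket_prefix:
            raise ValueError("create-entries needs a bucket prefix in this sketch")
        args += ["--bucket-prefix", bucket_prefix]
    return args


# Show the delete command line without executing it.
print(shlex.join(build_command("delete-entries", "my_project", "my_entry_group_name")))
```

Passing the list to `subprocess.run(..., check=True)` runs the command and raises if the CLI exits with a non-zero status.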
This is not an officially supported Google product.