Loss of images during GC when Postgresql server runs out of connections #19401

Closed
dmitry-g opened this issue Sep 27, 2023 · 8 comments

@dmitry-g

Hello,

This is the second time this month that we have faced image loss during garbage collection. It probably has something to do with error handling, since the issue happens only when our database server runs out of connections.

The retention policy is configured as "For the repositories matching `**`, retain the most recently pushed 30 artifacts with tags matching `**`", but the garbage collector deletes the latest artifact, even though the repository had only 3 artifacts in total.

  1. Garbage collection starts at the scheduled time, 01:00 UTC.

  2. The PostgreSQL database runs out of available connections: `sorry, too many clients already (SQLSTATE 53300)`.

  3. Instead of failing, retrying, or otherwise handling the error, the garbage collector chooses to delete the latest artifact.

Also, the logs below show that the tags of some artifacts were not loaded, even though all of them were actually tagged.

Image loss on 2023-09-06
2023-09-06T01:00:03Z [INFO] [/pkg/retention/job.go:86]: Run retention process.
 Repository: project-name/repository-name/image-name 
 Rule Algorithm: or 
 Dry Run: false
2023-09-06T01:00:03Z [INFO] [/pkg/retention/job.go:101]: Load 3 candidates from repository project-name/repository-name/image-name
2023-09-06T01:00:04Z [ERROR] [/pkg/retention/job.go:126]: Refresh quota error after deleting candidates, error: failed to connect to `host=db-host user=db-user database=db-name`: server error (FATAL: sorry, too many clients already (SQLSTATE 53300))
2023-09-06T01:00:04Z [INFO] [/pkg/retention/job.go:212]: 
|                                 Digest                                  | Tag | Kind  | Labels |     PushedTime      |     PulledTime      |     CreatedTime     | Retention |
|-------------------------------------------------------------------------|-----|-------|--------|---------------------|---------------------|---------------------|-----------|
| sha256:ea8cf3b41ddd07879296245f8fd66892cccf240a7fea6d6bedf01c519b224091 |     | image |        | 2023/09/04 07:14:10 | 2023/09/05 12:50:27 | 2023/09/04 07:14:10 | DEL       |
| sha256:76f8270aafef4cad4470e73a0a374f8367500c7fa0c02a036f17b5effa42dd64 |     | image |        | 2023/04/21 10:20:51 | 2023/09/05 11:15:07 | 2023/04/21 10:20:51 | ERR       |
| sha256:854540d08b4ee1886306408a3c7922e8883ba73d032aaa65639be5ef3062b255 |     | image |        | 2023/02/08 07:19:29 | 2023/09/05 11:40:14 | 2023/02/08 07:19:29 | ERR       |
2023-09-06T01:00:04Z [INFO] [/pkg/retention/job.go:217]: Retention error for artifact image:project-name/repository-name/image-name:sha256:76f8270aafef4cad4470e73a0a374f8367500c7fa0c02a036f17b5effa42dd64 : http error: code 500, message {"errors":[{"code":"UNKNOWN","message":"internal server error"}]}
2023-09-06T01:00:04Z [INFO] [/pkg/retention/job.go:217]: Retention error for artifact image:project-name/repository-name/image-name:sha256:854540d08b4ee1886306408a3c7922e8883ba73d032aaa65639be5ef3062b255 : http error: code 500, message {"errors":[{"code":"UNKNOWN","message":"internal server error"}]}
Image loss on 2023-09-26
2023-09-26T01:00:04Z [INFO] [/pkg/retention/job.go:86]: Run retention process.
 Repository: project-name/repository-name/image-name 
 Rule Algorithm: or 
 Dry Run: false
2023-09-26T01:00:04Z [INFO] [/pkg/retention/job.go:101]: Load 24 candidates from repository project-name/repository-name/image-name
2023-09-26T01:00:04Z [ERROR] [/pkg/retention/job.go:126]: Refresh quota error after deleting candidates, error: failed to connect to `host=db-host user=db-user database=db-name`: server error (FATAL: sorry, too many clients already (SQLSTATE 53300))
2023-09-26T01:00:04Z [INFO] [/pkg/retention/job.go:212]: 
|                                 Digest                                  |                      Tag                      | Kind  | Labels |     PushedTime      |     PulledTime      |     CreatedTime     | Retention |
|-------------------------------------------------------------------------|-----------------------------------------------|-------|--------|---------------------|---------------------|---------------------|-----------|
| sha256:2377acd27aa11388c1d13c046eb5740d1525862fba5f5700394ca38d1f931d30 |                                               | image |        | 2023/09/21 06:28:25 | 2023/09/25 10:35:52 | 2023/09/21 06:28:25 | DEL       |
| sha256:23fc415d40c9c54365429525ec142dc5208528729f1915fef4621050682c5e43 | 20230920123746-6ec444f                        | image |        | 2023/09/20 12:42:51 | 2023/09/20 13:38:02 | 2023/09/20 12:42:50 | RETAIN    |
| sha256:169ffb18477fdc6b904ce3108cc34201f53b60d8d293c4b5709d4e4fd32f9f30 | 20230830111140-09ddf85                        | image |        | 2023/08/30 11:13:44 | 2023/09/12 09:58:53 | 2023/08/30 11:13:43 | RETAIN    |
| sha256:396d366acbc29628a9b251f4273a24807edc80e47b9fa83c02494c8cdecbbef5 | 20230829104129-484ad66                        | image |        | 2023/08/29 10:44:37 | 2023/08/30 11:09:35 | 2023/08/29 10:44:37 | RETAIN    |
| sha256:151a701af2af5746c9e3b1046dbab2e8feef115943b0f9065a194a45f9b4655d | 20230828082509-de0e1c7                        | image |        | 2023/08/28 08:27:57 | 2023/09/12 13:30:09 | 2023/08/28 08:27:57 | RETAIN    |
| sha256:58cd81b3f451d544452171640bd970ce547be4ec8e6ec373e7a9727cb275718c | 20230712114656-bf45ea2                        | image |        | 2023/07/12 11:49:38 | 2023/08/02 08:24:46 | 2023/07/12 11:49:37 | RETAIN    |
| sha256:562f8312eca6fb9850facbb20e722d1cf068504294a87093ea687039108dc161 | 20230627122455-718177a                        | image |        | 2023/06/27 09:27:20 | 2023/08/23 11:14:40 | 2023/06/27 09:27:20 | RETAIN    |
| sha256:def0f6b5963214588744fa353c53bd0f14effdaf7917db44e7ec9de20f53c94c | 20230627090144-402ef95                        | image |        | 2023/06/27 06:03:42 | 2023/06/27 06:28:15 | 2023/06/27 06:03:42 | RETAIN    |
| sha256:b5f151c46887079af57c9c45a3342f343539f52cfe8cc035d1b05f07d07d3417 | 20230626155919-40848ca                        | image |        | 2023/06/26 13:02:00 | 2023/06/26 13:28:29 | 2023/06/26 13:01:59 | RETAIN    |
| sha256:26cacbfe8affc7f6c9ba13968ac63217c13179604a2836680e56bcaa4dfe2f92 | 20230623063723-5811551                        | image |        | 2023/06/23 06:39:55 | 2023/06/23 06:42:28 | 2023/06/23 06:39:54 | RETAIN    |
| sha256:3b27bd57219e74f8905dcb447c7dfac32281e43c4928d3d42c6243c129c7c0d4 | 20230623053542-6e8b9ca                        | image |        | 2023/06/23 05:37:18 | 2023/06/23 05:42:46 | 2023/06/23 05:37:18 | RETAIN    |
| sha256:3abd7b8fb65300334a794f8155c864a2d64474e6f25a4028b2a63d762fc01a5e | 20230622074235-d5dabfc                        | image |        | 2023/06/22 07:45:02 | 2023/06/23 11:56:39 | 2023/06/22 07:45:01 | RETAIN    |
| sha256:9d9ab931eda6855a8e54e2c8e8297c5f5c49b9e4f04e7d0d712b1984c781e192 | 20230621085258-24a3ce4                        | image |        | 2023/06/21 08:54:51 | 2023/06/22 06:59:22 | 2023/06/21 08:54:51 | RETAIN    |
| sha256:49a0fd403fd987ac9646ab67d17caa4611748acb0322af96af6e28457625f6f4 | 20230621070618-b65b952                        | image |        | 2023/06/21 07:08:41 | 2023/06/21 07:15:51 | 2023/06/21 07:08:40 | RETAIN    |
| sha256:44608aba9bebf8e190648ae0177fbf29be6d4475dcc08f901a651f4f9624f761 | 20230523065811-f9135b1                        | image |        | 2023/05/23 07:00:26 | 2023/09/25 10:35:57 | 2023/05/23 07:00:26 | RETAIN    |
| sha256:56d9cdbf1ee20fe3d594b8fd028b602e9b43d989820d609e93b6f0a83f563237 | 20230522131503-8550771                        | image |        | 2023/05/22 13:17:25 | 2023/05/23 06:10:07 | 2023/05/22 13:17:25 | RETAIN    |
| sha256:e91ec458de74878cc1c80e47e078726fc4591cd9f8fbfe34a24cbd869545fbee | 20230320131430-0477029,20230320123518-4c41b79 | image |        | 2023/03/20 13:16:05 | 2023/05/22 11:38:47 | 2023/03/20 12:37:33 | RETAIN    |
| sha256:759ceebb1c009b8ff3bf519591225449a3b4221a683de57aef68d3ca713ffcff | 20230320122819-4c41b79                        | image |        | 2023/03/20 12:30:45 |                     | 2023/03/20 12:30:44 | RETAIN    |
| sha256:9c31c9e15b30e128697481588e27688dfb846a4cc12299e9d7f1aaf4ec900707 | 20221219103658-c1a7d92                        | image |        | 2022/12/19 10:40:35 | 2023/04/27 07:06:58 | 2022/12/19 10:40:35 | RETAIN    |
| sha256:96e767796949fb1e4a204b1216739a9ccd01f3bcfd17d9f232fb19966d88201e | 20221012112122-c2b27a3                        | image |        | 2022/10/12 11:27:50 | 2022/12/15 09:52:25 | 2022/10/12 11:27:49 | RETAIN    |
| sha256:8fdedfb5f75b91f832ff75d5b21560d9261932eafcee6b8231a48d12fc4d8e30 | 20221012110709-833ec56                        | image |        | 2022/10/12 11:13:04 | 2022/10/12 11:18:55 | 2022/10/12 11:13:04 | RETAIN    |
| sha256:1c47d6ec85d9ce6a2f22db9dd2902b013d99b377637f2ec0aed0a82e1f090623 | 20220916060633-7da9b0d                        | image |        | 2022/10/02 12:51:13 | 2022/10/10 15:10:53 | 2022/10/02 12:51:13 | RETAIN    |
| sha256:8eba18556b74384fcfb92cfa112ed155a9058b0e37ac027c022429de64e047d8 | 20220718153133-b94053b                        | image |        | 2022/10/02 12:51:02 | 2023/06/29 11:54:33 | 2022/10/02 12:51:02 | RETAIN    |
| sha256:13da9e1e67fb4a663d3efa1f8e79c3e53592d04157b0d87af3d15ed2ea047de1 | 20220929122703-b9e5049                        | image |        | 2022/09/29 12:33:09 | 2023/04/24 11:29:42 | 2022/09/29 12:33:09 | RETAIN    |

Harbor version v2.8.1-48a2061d

Thank you

@chlins
Member

chlins commented Oct 3, 2023

Did the Postgres error `too many clients already` occur persistently, or only during GC? Tag retention and GC are two different things; GC execution does not rely on the tag retention policy.

@dmitry-g
Author

dmitry-g commented Oct 4, 2023

Hi @chlins, you are correct: I meant retention policies, the ones executed for the repositories of a project. For us, the `too many clients` issue happened only during retention policy execution, but this was due to a misconfiguration on our side. I just wanted to report the image loss during that period; I believe deleting unrelated images is not how errors are supposed to be handled by design. Thanks

@wy65701436
Contributor

Can you share the GC log as well?

@dmitry-g
Author

Hi @wy65701436, unfortunately the GC logs have already been cleaned up.


This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Dec 23, 2023
@wy65701436 wy65701436 removed the Stale label Jan 4, 2024
@chlins
Member

chlins commented Feb 27, 2024

@dmitry-g Hi, did you also check the untagged artifacts checkbox for the retention policy?


This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Apr 27, 2024

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.

@github-actions github-actions bot closed this as not planned May 27, 2024
@Vad1mo Vad1mo mentioned this issue Aug 15, 2024
Projects
Status: Completed
Development

No branches or pull requests

4 participants