Dashboard: Can't delete workspaces which failed to be created due to lack of memory on pod #8274

Open
felladrin opened this issue Feb 17, 2022 · 11 comments
Labels
meta: never-stale This issue can never become stale team: webapp Issue belongs to the WebApp team type: bug Something isn't working

Comments

@felladrin
Contributor

felladrin commented Feb 17, 2022

Bug description

After trying to open the https://github.com/gitpod-io/gitpod repository on Gitpod and failing due to lack of memory, the workspaces that failed to open (it failed two times, as you can see in the screenshot below) got stuck in the dashboard with “Failed” status. Then, when I click the Delete Workspace button, they are not deleted, and the error shown in the screenshot below is triggered.

[Screenshot 2022-02-16 at 11:14:26]

[Image]

[Image]

Steps to reproduce

  • Try reproducing the OutOfMemory error by opening a big repository, like https://github.com/gitpod-io/gitpod
  • After it fails to be created, go to your dashboard, click the three-dots icon, and select "Delete Workspace".
  • You'll see that nothing happens, and an error is displayed in the DevTools console.

Workspaces affected

  1. gitpodio-gitpod-wdvp0x65v3a
  2. gitpodio-gitpod-px58o9iw73v

Note: Both workspaces were automatically deleted by the garbage collector on 2022-03-02. [1]

Expected behavior

The workspace should disappear from my dashboard after clicking the Delete Workspace button.

Example repository

I can't share the workspace either. Since it failed to be created, clicking the Share button triggers the following error:

[Image]

Anything else?

No response

@felladrin felladrin added team: workspace Issue belongs to the Workspace team type: bug Something isn't working labels Feb 17, 2022
@sagor999
Contributor

I think it is up to @gitpod-io/engineering-webapp to handle this case appropriately.
I will remove workspace for now from this issue, but feel free to tag us in if needed.

@sagor999 sagor999 added team: webapp Issue belongs to the WebApp team and removed team: workspace Issue belongs to the Workspace team labels Feb 17, 2022
@geropl
Member

geropl commented Feb 18, 2022

@felladrin How long did you wait between 1) workspace failed and 2) try to delete?
We do not allow any state-changing operation (e.g., delete) on workspaces that are still running. We rely on ws-manager to report the status and to terminate/delete workspaces that failed. For rare cases, for example workspaces that are "stuck in stopping", we have timeouts (1h in this case).
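A minimal sketch of the guard described above, assuming that whether a workspace can be deleted depends only on the phase of its latest instance as last reported by ws-manager. The `InstancePhase` values, the `WorkspaceInstance` shape, and the `isDeletable` function are hypothetical illustrations, not Gitpod's actual server code.

```typescript
// Hypothetical sketch: these types and phase names are illustrative assumptions,
// not Gitpod's actual data model.
type InstancePhase =
  | "pending"
  | "creating"
  | "initializing"
  | "running"
  | "stopping"
  | "stopped";

interface WorkspaceInstance {
  id: string;
  phase: InstancePhase; // last phase reported by ws-manager (assumed)
}

// Returns true if a state-changing operation such as delete may proceed.
function isDeletable(latest: WorkspaceInstance | undefined): boolean {
  // A workspace with no instance never started, so there is nothing to tear down.
  if (latest === undefined) {
    return true;
  }
  // Anything not yet "stopped" still counts as running and is rejected.
  return latest.phase === "stopped";
}

// Usage: a failed start whose instance never left "creating" cannot be deleted.
console.log(isDeletable({ id: "a", phase: "creating" })); // false
console.log(isDeletable({ id: "b", phase: "stopped" })); // true
```

Under a guard like this, an instance whose status updates stop arriving would stay undeletable until something else (such as a timeout) moves it to "stopped".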

I will remove workspace for now from this issue

@sagor999 What did you do? Manipulate the DB? Or remove the pod from the k8s control plane? 🤔

@felladrin
Contributor Author

@felladrin How long did you wait between 1) workspace failed and 2) try to delete?

It failed on Feb 16th at 11:15 AM, and I tried to delete it on Feb 17th at 12:38 PM, so more than 25 hours later.

I will remove workspace for now from this issue
@sagor999 What did you do? Manipulate the DB? Or remove the pod from the k8s control plane? 🤔

I believe @sagor999 was talking about removing the tag "team: workspace" from this issue (I added it when I created the issue) and adding "team: webapp" in its place, since the workspace records are still listed in my dashboard:

[Image]

@geropl
Member

geropl commented Feb 18, 2022

I believe @sagor999 was talking about removing the tag "team: workspace"

Ok. We'll need to investigate. 👍

@JanKoehnlein JanKoehnlein moved this to Scheduled in 🍎 WebApp Team Feb 18, 2022
@JanKoehnlein
Contributor

Scheduled for investigation

@sagor999
Contributor

@geropl I believe this happens when workspace never had a chance to actually start. Due to the out-of-memory error, the pod was scheduled and ws-manager considered the workspace started, but it never actually started.
When that happens, the workspace gets stuck in limbo like this.
For what it's worth, this PR should improve how ws-manager handles such edge cases on the workspace side, but the webapp may need to handle this as well.

@geropl
Member

geropl commented Feb 18, 2022

I believe this happens when workspace never had a chance to actually start

💡 That is indeed the case: so far, the (implicit) contract has been that once StartWorkspace succeeds, we rely on updates from ws-manager. We already have a timeout for such cases; I wonder why it did not kick in (this requires the investigation I mentioned earlier).
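A timeout of the kind mentioned above could work roughly like the periodic sweep below. This is a sketch under the assumption that each instance records when it last received a status update from ws-manager; an instance that has been silent for longer than the timeout is force-marked as stopped, which would make it deletable again. The `TrackedInstance` type, the `lastUpdateMs` field, `sweepStaleInstances`, and the 1h value are all hypothetical.

```typescript
// Hypothetical sketch: names, fields, and the timeout value are assumptions.
type InstancePhase = "creating" | "running" | "stopping" | "stopped";

interface TrackedInstance {
  id: string;
  phase: InstancePhase;
  lastUpdateMs: number; // epoch millis of the last ws-manager status update (assumed field)
}

const STATUS_TIMEOUT_MS = 60 * 60 * 1000; // assumed 1h, echoing the timeout mentioned earlier

// Marks instances that have not heard from ws-manager within the timeout as stopped.
// Returns the IDs of the instances that were changed.
function sweepStaleInstances(instances: TrackedInstance[], nowMs: number): string[] {
  const changed: string[] = [];
  for (const inst of instances) {
    const silentForMs = nowMs - inst.lastUpdateMs;
    if (inst.phase !== "stopped" && silentForMs > STATUS_TIMEOUT_MS) {
      inst.phase = "stopped";
      changed.push(inst.id);
    }
  }
  return changed;
}

// Usage: an instance stuck in "creating" for 25h is swept; a recently updated one is left alone.
const now = Date.now();
const stuck: TrackedInstance = { id: "stuck", phase: "creating", lastUpdateMs: now - 25 * 60 * 60 * 1000 };
const fresh: TrackedInstance = { id: "fresh", phase: "running", lastUpdateMs: now - 60 * 1000 };
console.log(sweepStaleInstances([stuck, fresh], now).join(",")); // stuck
```

If a sweep like this exists but never sees the instance (for example, because the instance was never registered as tracked), the workspace would remain stuck, which is one possible explanation worth checking during the investigation.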

@geropl
Member

geropl commented Apr 21, 2022

This might be more common with an upcoming workspace PR (#9438), so we should prioritize this.

@stale

stale bot commented Aug 11, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale This issue/PR is stale and will be closed soon label Aug 11, 2022
@j-elmer123

This also happened to me.

@NguyenCongVN

Is there any workaround for this, something like a delete with a notice? I experienced this yesterday.

@stale stale bot removed the meta: stale This issue/PR is stale and will be closed soon label Aug 14, 2022
@axonasif axonasif added the meta: never-stale This issue can never become stale label Aug 15, 2022

7 participants