Enhance large scale mc-infra provisioning stability #1874

Merged
merged 3 commits into from
Oct 23, 2024


seokho-son
Member

@seokho-son seokho-son commented Oct 22, 2024

ref #1873

Enhance large scale mc-infra provisioning stability

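  • Separated VM object creation from the VM creation request/result handling module, which previously ran together in parallel, to improve stability.
  • Adjusted the sleep timing for launching 100 VMs concurrently per CSP, taking each CSP's API call limits into account (currently a 1000 ms delay is added between VM creation requests).
  • Miscellaneous cleanup, such as removing duplicated DB handling statements.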
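
The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `vmRequest`, `createVMObjects`, and `requestVMs` are hypothetical names, and applying the delay per CSP group is an assumption about how the stagger is scoped.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// vmRequest is a hypothetical stand-in for a single VM creation request.
type vmRequest struct {
	ID  string
	CSP string
}

// createVMObjects is phase 1: register every VM object up front,
// before any creation request is sent to a CSP.
func createVMObjects(reqs []vmRequest) map[string]string {
	status := make(map[string]string, len(reqs))
	for _, r := range reqs {
		status[r.ID] = "Creating"
	}
	return status
}

// requestVMs is phase 2: send creation requests concurrently, but start
// consecutive requests to the same CSP `delay` apart (assumed scoping).
func requestVMs(reqs []vmRequest, delay time.Duration, create func(vmRequest) error) map[string]error {
	byCSP := make(map[string][]vmRequest)
	for _, r := range reqs {
		byCSP[r.CSP] = append(byCSP[r.CSP], r)
	}

	results := make(map[string]error, len(reqs))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, group := range byCSP {
		wg.Add(1)
		go func(group []vmRequest) {
			defer wg.Done()
			var inner sync.WaitGroup
			for i, r := range group {
				if i > 0 {
					time.Sleep(delay) // stagger requests to respect API call limits
				}
				inner.Add(1)
				go func(r vmRequest) {
					defer inner.Done()
					err := create(r)
					mu.Lock()
					results[r.ID] = err
					mu.Unlock()
				}(r)
			}
			inner.Wait()
		}(group)
	}
	wg.Wait()
	return results
}

func main() {
	reqs := []vmRequest{
		{ID: "vm-1", CSP: "aws"},
		{ID: "vm-2", CSP: "aws"},
		{ID: "vm-3", CSP: "azure"},
	}
	status := createVMObjects(reqs)
	// A short delay keeps the demo fast; the PR description mentions 1000 ms.
	results := requestVMs(reqs, 10*time.Millisecond, func(r vmRequest) error { return nil })
	for id, err := range results {
		if err != nil {
			status[id] = "Failed"
		} else {
			status[id] = "Running"
		}
	}
	fmt.Println(len(status), "VMs processed")
}
```

Keeping object creation out of the concurrent phase means each goroutine only issues a request and records its result, so a failed request cannot leave a half-registered object behind.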

@seokho-son
Member Author

image

An error occurred on Azure.

@seokho-son
Member Author

seokho-son commented Oct 22, 2024

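  • Note: when deleting VMs after large-scale creation, CB-Spider and the DB went out of sync (observed with 100 or more AWS VMs). VMs that were not actually deleted are recognized as deleted, so this needs an in-depth investigation.

  • It appears that VMs that did not terminate normally in CB-Spider were removed from the DB and treated as terminated. The CSP may also have returned an incorrect response. The internal handling logic may need to be reviewed.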

@seokho-son
Member Author

/approve

@github-actions github-actions bot added the approved This PR is approved and will be merged soon. label Oct 23, 2024
@cb-github-robot cb-github-robot merged commit 1d56aec into cloud-barista:main Oct 23, 2024
4 checks passed
@seokho-son
Member Author

cb-tumblebug       | 2:03AM ERR src/core/infra/provisioning.go:1458 > error="Error from Spider while creating VM: [Error from: http://cb-spider:1024/spider/vm] Status code: 500 Internal Server Error, Message: {\"message\":\"Failed to Start VM. err = PUT https://management.azure.com/subscriptions/a20fed83-96bd-4480-92a9-140b8e3b7c3a/resourceGroups/koreacentral/providers/Microsoft.Compute/virtualMachines/csc5fpjhe00e2iiivjq0\\n--------------------------------------------------------------------------------\\nRESPONSE 409: 409 Conflict\\nERROR CODE: OperationNotAllowed\\n--------------------------------------------------------------------------------\\n{\\n  \\\"error\\\": {\\n    \\\"code\\\": \\\"OperationNotAllowed\\\",\\n    \\\"message\\\": \\\"Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: KoreaCentral, Current Limit: 50, Current Usage: 50, Additional Required: 1, (Minimum) New Limit Required: 51. Setup Alerts when Quota reaches threshold. Learn more at https://aka.ms/quotamonitoringalerting . Submit a request for Quota increase at https://aka.ms/ProdportalCRP/#blade/Microsoft_Azure_Capacity/UsageAndQuota.ReactView/Parameters/%7B%22subscriptionId%22:%22a20fed83-96bd-4480-92a9-140b8e3b7c3a%22,%22command%22:%22openQuotaApprovalBlade%22,%22quotas%22:[%7B%22location%22:%22KoreaCentral%22,%22providerId%22:%22Microsoft.Compute%22,%22resourceName%22:%22cores%22,%22quotaRequest%22:%7B%22properties%22:%7B%22limit%22:51,%22unit%22:%22Count%22,%22name%22:%7B%22value%22:%22cores%22%7D%7D%7D%7D]%7D by specifying parameters listed in the ‘Details’ section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-supportability/regional-quota-requests\\\"\\n  }\\n}\\n--------------------------------------------------------------------------------\\n, and Finished to rollback deleting\"}\n"

There is also an Azure quota limit issue.
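
The quota arithmetic in the error above (Current Limit: 50, Current Usage: 50, Additional Required: 1, New Limit Required: 51) can be reproduced with a small pre-flight check. This is only a sketch: the `quota` type and its methods are hypothetical, and how the limit and usage values would be obtained from Azure is left out.

```go
package main

import "fmt"

// quota mirrors the "Current Limit" and "Current Usage" fields
// reported in the Azure error message. Hypothetical type.
type quota struct {
	Limit int
	Usage int
}

// remaining returns how many more cores fit under the current limit.
func (q quota) remaining() int {
	if q.Usage >= q.Limit {
		return 0
	}
	return q.Limit - q.Usage
}

// check reports whether `required` additional cores fit under the quota.
// When they do not, it also returns the minimum new limit that would be needed.
func (q quota) check(required int) (ok bool, minNewLimit int) {
	if required <= q.remaining() {
		return true, q.Limit
	}
	return false, q.Usage + required
}

func main() {
	q := quota{Limit: 50, Usage: 50} // values taken from the log above
	ok, newLimit := q.check(1)
	fmt.Println(ok, newLimit) // prints "false 51"
}
```

Running such a check before sending creation requests would let a large provisioning run fail fast instead of issuing requests that the CSP will reject and roll back.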

@seokho-son
Member Author

image

Azure default quota
