-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add better errors for too large deployments #380
Conversation
The better errors consistent of: * Detecting errors that were likely because of chalice applications that were too large. * Indicating that it occurred during the lambda deployment phase * Indicating suggestions on how to fix it In addition, added a unified ChaliceDeploymentError that can be used to improve the error message when deploying a chalice application for future enhancements.
This was done by checking that message was specifically about not being able to assume and IAM role.
Codecov Report
@@ Coverage Diff @@
## master #380 +/- ##
==========================================
+ Coverage 91.2% 91.62% +0.41%
==========================================
Files 18 18
Lines 2172 2280 +108
Branches 281 294 +13
==========================================
+ Hits 1981 2089 +108
Misses 142 142
Partials 49 49
Continue to review full report at Codecov.
|
Haven't had a chance to look at the code yet, but some questions/feedback on the output:
I'll try to look at the code tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple comments/questions:
- What do these errors look like in a short terminal?
- Can we clean up the request error with the tuple in it a little bit?
Just answering some questions:
If there is no context for the error, it will currently show up as the string of the raised exception for any deployment error:
In terms of write timeouts, receiving that message is entirely determined on if the deployment package is larger than 50MB. If it is, you get an error message similar to the first option. If not you will get something like this:
That's fine with me.
That's fair. I was phrasing it based on the current feature state of chalice as there is no mult-handler support currently. I could get that to work currently, but I would be interested to see how it will all fit in when auth handlers get introduced. There will probably need to be reworking of where the errors get thrown as the deployment errors are raised from the
I was thinking about that too. It is not too bad on the terminal as it wraps around. It just does not look great on GitHub. How would this format sound:
I could even have it print out in red or something to make it easier to understand all of the related information about the exception.
It may be tricky to see if there is any attributes on |
To answer your questions: What do these errors look like in a short terminal? Can we clean up the request error with the tuple in it a little bit? |
@kyleknap I don't think preserving the original exception is useful from a user's perspective. Sure we can keep it in debug logs once we add those, but seeing The multi-line format you suggest seems a bit much. I was just asking you to put line breaks so it's not a super long single line error message. Also, for the first question, it looks like there's a missing gap. Saying there's a write timeout without actually saying what we were writing when we timed out is going to confuse users as see in the issue I linked. |
@jamesls One thing that I can do to break it up is add a newline between the exception and the suggestion on how to fix it. Would that help? There is really only three possible logical components of the error: the location of where the error occurred, the error message, and the suggestion on how to resolve. Currently the location and the error message are combined into one sentence. On the: We've had so many people (not just in chalice) asking about these broken pipe errors that I'm convinced it does more harm than good. Does it make sense to add improved messaging to botocore then? If I was to do this in chalice, I would prefer we improve the error message for any client call by at least adding information on what operation threw the ConnectionError as a ConnectionError technically could happen for any client call and the error message will still be very bad. Also currently, it is only somewhat manageable/understandable for botocore/boto3/awscli because we have logging and tracebacks enabled. |
Please avoid using colors for now. I'm not against it but I don't think we should start with this change. Let's just focus on improving the error messages themselves for now. |
No I don't think so. Going into botocore would require we could generalize a broken pipe error and why it happened. We can only do that here because we know the exact API call and common reasons why it can fail. I strongly suggest to just keep the scope small. This is just about improving error messages, and I'd rather move on to other features once we've done that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the error messages we talked about offline:
ERROR - While sending your chalice handler code to Lambda, received the following error:
Connection aborted, the Lambda service closed the connection before we finished sending all your data.
This is likely because the deployment package is 58.2 MB. Lambda only allows
deployment packages that are 50.0 MB or less in size. To avoid this error,
decrease the size of your chalice application by removing code or removing
dependencies from your chalice application.
ERROR - While sending your chalice handler code to Lambda, received the following error:
Connection aborted, timed out sending your app to Lambda.
chalice/awsclient.py
Outdated
if attempts >= self.LAMBDA_CREATE_ATTEMPTS: | ||
message = e.response['Error'].get('Message', '') | ||
retryable_message = ( | ||
'The role defined for the function cannot be assumed ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to relax the matching here. Either partial string match or regex matching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am planning to just check to see if the string 'role' is included in the error message if that is fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer something more robust than that. If the function name can ever be in the error message and that name contains role
you'll get a false positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the literal error message, you will get for both create and update is:
The role defined for the function cannot be assumed by Lambda.
It does not include the name of the function currently. I can check for 'role' and 'assumed by Lambda'? Or just 'assumed by Lambda'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean in general for InvalidParameterValueException, which is the code block this is in. This a) applies to more than just the role not being able to be assumed and b) has no guarantees about the message changing in the future. Something like role.*cannot be assumed
seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I can do that.
chalice/deploy/deployer.py
Outdated
# type: (Any) -> ChaliceDeploymentError | ||
suggestion = None | ||
where = None | ||
if isinstance(error, DeploymentPackageTooLargeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this out into the ChaliceDeploymentError
class. We should want the deployer to only be about deploying resources and move the logic of helpful error messages out to a separate class.
chalice/awsclient.py
Outdated
# aborts the connection as chalice is still sending it | ||
# data | ||
additional_details = ( | ||
'This is likely because the deployment package is %s. ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets move this up the stack and have stuff in awsclient just collect all the raw data, deployment package size, original exceptions, etc. We'd then have the Exception object assemble the suggestions and pretty print the error messages to the user.
chalice/awsclient.py
Outdated
return self._call_client_method_with_retries( | ||
self._client('lambda').create_function, kwargs)['FunctionArn'] | ||
except Exception as e: | ||
self._raise_large_deployment_related_errors(e, len(zip_contents)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider creating some lambda specific error so pieces higher up the stack know that we failed in the lambda function.
Specifically made the following updates: * Bubble the DeploymentPackageTooLargeError logic to the ChaliceDeploymentError instead of having it in Deployer * Wrap the error text to 79 characters * Indent the exact error message
Specifically did the following: * Created a general LambdaClientError to help indicate where an error takes place * Created a LambdaErrorContext to store information about the error from lambda * Introspect the ConnectionErrors from Lambda to create a more user friendly message about broken pipes and timeouts
@jamesls I updated. I think I got everything we talked about. Should be good to look at again. |
@kyleknap Could you show what the updated sample error messages look like? |
@jamesls Here are some examples: Too big when unzipped:
Connection aborted:
Timeout:
Zip too large (connection not aborted)
|
Cool, looks better. The first sentence is a little confusing, we'll not really sending our code to lambda function '{our function}'. Something like "while sending your chalice handler code to Lambda to create the function "function-name", we received the following error. Or something along those lines. Looking over the code now. |
chalice/awsclient.py
Outdated
@@ -37,6 +42,28 @@ class ResourceDoesNotExistError(Exception): | |||
pass | |||
|
|||
|
|||
class LambdaClientError(Exception): | |||
def __init__(self, original_error, context): | |||
# type: (Any, LambdaErrorContext) -> None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to avoid Any
unless there's really not a better way. Can this not be Exception
or something more specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check but I believe it was mypy complaining on accessing attributes that are not on Exception such as context.
chalice/awsclient.py
Outdated
try: | ||
return self._call_client_method_with_retries( | ||
self._client('lambda').create_function, kwargs)['FunctionArn'] | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think this is right. We're catching all exceptions and casting them down into a more specific exception (LambdaClientError
). If there is actually a bug or something exceptional that happens in the code I'd want the original stack trace to propagate up, not be masked by LambdaClientError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offline: Decided to just catch botocore.exceptions.ClientError and RequestsConnectionError for this.
chalice/awsclient.py
Outdated
# We're assuming that if we receive an | ||
# InvalidParameterValueException, it's because | ||
# the role we just created can't be used by | ||
# Lambda so retry until it can be. | ||
self._sleep(self.DELAY_TIME) | ||
attempts += 1 | ||
if attempts >= self.LAMBDA_CREATE_ATTEMPTS: | ||
message = e.response['Error'].get('Message', '') | ||
# Do not retry if exceeded retries or does not have anything |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a style nitpick, but I have said this several times before. Rather than write a comment explaining what the code is doing, just put it in a separate function and omit the comment:
if self._is_iam_role_related_error(e):
raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
chalice/awsclient.py
Outdated
# data | ||
error_cls = DeploymentPackageTooLargeError | ||
elif isinstance(error, ClientError): | ||
code = error.response['Error'].get('Code') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will default to None
which will crash in your __contains__
call on L169. These should default to str
types, probably just the empty string.
chalice/awsclient.py
Outdated
try: | ||
return_value = lambda_client.update_function_code( | ||
FunctionName=function_name, ZipFile=zip_contents) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing, we should really only have except Exceptions
and the top most layer of the code. Restrict this to the subset of exceptions we care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offline: Catch RequestsConnectionError and botocore.exceptions.ClientError here instead.
chalice/deploy/deployer.py
Outdated
@@ -169,6 +177,104 @@ def _validate_manage_iam_role(config): | |||
) | |||
|
|||
|
|||
class ChaliceDeploymentError(Exception): | |||
def __init__(self, error): | |||
# type: (Any) -> None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to avoid Any
unless it's really not practical. This applies to all the Any
's in this class.
chalice/deploy/deployer.py
Outdated
# is a socket.error that has the message 'Broken pipe' in it. So we | ||
# don't want to be assuming all socket.error are broken pipes so just | ||
# check if the message has 'Broken pipe' in it. | ||
if 'Broken pipe' in str(underlying_error): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems more appropriately handled in compat.py
. Here we actually have a solid isinstance
way of checking for broken pipe in python3, and it seems to make sense to use it if possible rather than rely on __contains__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
chalice/deploy/deployer.py
Outdated
@@ -206,6 +312,13 @@ def deploy(self, config, chalice_stage_name=DEFAULT_STAGE_NAME): | |||
rest api, lambda function, role, etc. | |||
|
|||
""" | |||
try: | |||
return self._do_deploy(config, chalice_stage_name) | |||
except Exception as error: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing about except Exception
. This would be fine in say cli/__init__.py
, but not in the deployer module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offline: Catch LambdaClientError and botocore.exceptions.ClientError here instead.
chalice/deploy/deployer.py
Outdated
msg += self._wrap_text(self._get_error_message(error), indent=' ') | ||
msg += '\n\n' | ||
suggestion = self._get_error_suggestion(error) | ||
if suggestion: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if suggestion is not None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not matter if the check changes here. The output will stay the same, but sure I can update.
tests/functional/test_awsclient.py
Outdated
@@ -475,6 +491,75 @@ def test_can_provide_tags(self, stubbed_session): | |||
handler='app.app') == 'arn:12345:name' | |||
stubbed_session.verify_stubs() | |||
|
|||
def test_raises_large_deployment_error_for_connection_error( | |||
self, mock_lambda_client): | |||
mock_session = mock.Mock(botocore.session.Session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this?? There's an existing way to test this, which is using the StubbedSession.
If the necessary abstractions aren't in place in StubbedSession, I think it's more appropriate to add to them rather than work around them by bypassing the StubbedSession and dropping down to botocore sessions directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offline: Going to subclass this from the botocore.stub.Stubber and add todo to port back to botocore.
Consists of: * Catching more specific exceptions on awsclient and Deployer * Make compat function for checking broken pipes * Be more specific on Any on exception type checks * Other smaller updates
Subclassed botocore.stub.Stubber to allow for raising custom errors that are not ClientErrors.
Include information on whether it was an create or update when the code is being sent to Lambda.
@jamesls I updated based on feedback. It should be good to look at again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
|
||
def is_broken_pipe_error(error): | ||
# type: (Exception) -> bool | ||
return isinstance(error, BrokenPipeError) # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is # noqa
needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pylint on python2 yelled at me for having an unknown variable name of BrokenPipeError as it only exists in Python3. Is there a better way to handle this? It seemed that using #noqa
is the precedent that has been set in previous places for pylint errors due to need for type hinting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that's the case? Pylint doesn't honor # noqa
AFAIK. That's for pyflakes/flake8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you are right. I misspoke. It was for flake8.
The better errors consistent of:
that were too large.
In addition, added a unified ChaliceDeploymentError that can be
used to improve the error message when deploying a chalice
application for future enhancements.
Fixes #246
Here are some examples for the error messages: