Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add better errors for too large deployments #380

Merged
merged 8 commits into from
Jun 21, 2017

Conversation

kyleknap
Copy link
Contributor

The better errors consistent of:

  • Detecting errors that were likely because of chalice applications
    that were too large.
  • Indicating that it occurred during the lambda deployment phase
  • Indicating suggestions on how to fix it

In addition, added a unified ChaliceDeploymentError that can be
used to improve the error message when deploying a chalice
application for future enhancements.

Fixes #246

Here are some examples for the error messages:

  1. Connection getting aborted as chalice is sending data while Lambda cuts off the connection.
DEPLOYMENT ERROR - While sending your chalice handler code to Lambda, received the following error: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')). This is likely because the deployment package is 58.2 MB. Lambda only allows deployment packages that are 50.0 MB or less in size. To avoid this error, decrease the size of your chalice application by removing code or removing dependencies from your chalice application.
  1. Lambda returns an error saying the zipped deployment package is too large.
DEPLOYMENT ERROR - While sending your chalice handler code to Lambda, received the following error: An error occurred (RequestEntityTooLargeException) when calling the CreateFunction operation: Request must be smaller than 69905067 bytes for the CreateFunction operation. To avoid this error, decrease the size of your chalice application by removing code or removing dependencies from your chalice application.
  1. Lambda returns an error saying the contents of the unzipped deployment package is too large.
DEPLOYMENT ERROR - While sending your chalice handler code to Lambda, received the following error: An error occurred (InvalidParameterValueException) when calling the CreateFunction operation: Unzipped size must be smaller than 262144000 bytes. To avoid this error, decrease the size of your chalice application by removing code or removing dependencies from your chalice application.

kyleknap added 2 commits June 14, 2017 15:00
The better errors consistent of:
* Detecting errors that were likely because of chalice applications
  that were too large.
* Indicating that it occurred during the lambda deployment phase
* Indicating suggestions on how to fix it

In addition, added a unified ChaliceDeploymentError that can be
used to improve the error message when deploying a chalice
application for future enhancements.
This was done by checking that message was specifically
about not being able to assume and IAM role.
@kyleknap kyleknap requested review from jamesls and stealthycoin June 14, 2017 22:05
@codecov-io
Copy link

codecov-io commented Jun 14, 2017

Codecov Report

Merging #380 into master will increase coverage by 0.41%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #380      +/-   ##
==========================================
+ Coverage    91.2%   91.62%   +0.41%     
==========================================
  Files          18       18              
  Lines        2172     2280     +108     
  Branches      281      294      +13     
==========================================
+ Hits         1981     2089     +108     
  Misses        142      142              
  Partials       49       49
Impacted Files Coverage Δ
chalice/constants.py 100% <100%> (ø) ⬆️
chalice/deploy/deployer.py 87.98% <100%> (+2.64%) ⬆️
chalice/compat.py 72% <100%> (+7%) ⬆️
chalice/awsclient.py 90.8% <100%> (+1.76%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c947eb0...ce8af7e. Read the comment docs.

@jamesls
Copy link
Member

jamesls commented Jun 14, 2017

Haven't had a chance to look at the code yet, but some questions/feedback on the output:

  1. What do the error messages look like when it's not an exception caused by one of the cases you've explicitly accounted for? For example, write timeouts as shown here: Error on chalice deploy: "The write operation timed out" #330. Do we still say you should reduce your deployment package size?
  2. I think it's more common to just have ERROR - then your message. The deployment context is inferred given you run chalice deploy.
  3. I'd prefer something other than "chalice handler", it's vague to me. Maybe something like "chalice API handler". That way builtin auth could say "chalice auth handler".
  4. The line lengths are really long (you have to scroll horizontally to read the only thing). I'd prefer if we added line breaks and added a nice multi line error message.
  5. Can we clean up the first error message? I realize this comes from requests, but it just seems odd that only the first error message has this tuple format: received the following error: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')).

I'll try to look at the code tomorrow.

Copy link
Contributor

@stealthycoin stealthycoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple comments/questions:

  • What do these errors look like in a short terminal?
  • Can we clean up the request error with the tuple in it a little bit?

@kyleknap
Copy link
Contributor Author

kyleknap commented Jun 15, 2017

Just answering some questions:

What do the error messages look like when it's not an exception caused by one of the cases you've explicitly accounted for? For example, write timeouts as shown here: #330. Do we still say you should reduce your deployment package size?

If there is no context for the error, it will currently show up as the string of the raised exception for any deployment error:

DEPLOYMENT ERROR - <str(Exception)>

In terms of write timeouts, receiving that message is entirely determined on if the deployment package is larger than 50MB. If it is, you get an error message similar to the first option. If not you will get something like this:

DEPLOYMENT ERROR - ('Connection aborted.', timeout('The write operation timed out',))

I think it's more common to just have ERROR - then your message. The deployment context is inferred given you run chalice deploy.

That's fine with me.

I'd prefer something other than "chalice handler", it's vague to me. Maybe something like "chalice API handler". That way builtin auth could say "chalice auth handler".

That's fair. I was phrasing it based on the current feature state of chalice as there is no mult-handler support currently. I could get that to work currently, but I would be interested to see how it will all fit in when auth handlers get introduced. There will probably need to be reworking of where the errors get thrown as the deployment errors are raised from the Deployer.deploy method.

The line lengths are really long (you have to scroll horizontally to read the only thing). I'd prefer if we added line breaks and added a nice multi line error message.

I was thinking about that too. It is not too bad on the terminal as it wraps around. It just does not look great on GitHub. How would this format sound:

ERROR - 
   LOCATION: Deploying the chalice API handler
   EXCEPTION RAISED: <str(Exception)>
   SUGGESTION: To resolve this issue, do the following...

I could even have it print out in red or something to make it easier to understand all of the related information about the exception.

Can we clean up the first error message? I realize this comes from requests, but it just seems odd that only the first error message has this tuple format: received the following error: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')).

It may be tricky to see if there is any attributes on requests.ConnectionError that I can use. The ConnectionError subclasses from an IOError. Based on the Python docs I could maybe inspect the args property and try to pull information out of the tuple but I am not sure how well I will be able to reformat into a better looking looking sentence or if I would be making inappropriate assumptions in what the args will be in the tuple. The second option would be to provide your own error message for ConnectionErrors. I am hesitant to do that now as it is nice to see what the exact error we are getting back so it is possible to trace the source of the issue. If we had a --debug mode, I think it would be easier to make the case for providing your own message for a connection error, but until then I am not sure if we would want to do that.

@kyleknap
Copy link
Contributor Author

@stealthycoin

To answer your questions:

What do these errors look like in a short terminal?
The text just wraps to make a taller paragraph.

Can we clean up the request error with the tuple in it a little bit?
See my last paragraph in the previous response. I feel that if we move to the format that I proposed in the answering 4th question, it might remove the need to clean it up as we would be preserving the exact exception that got raised.

@jamesls
Copy link
Member

jamesls commented Jun 15, 2017

@kyleknap I don't think preserving the original exception is useful from a user's perspective. Sure we can keep it in debug logs once we add those, but seeing ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')). is confusing for end users. We've had so many people (not just in chalice) asking about these broken pipe errors that I'm convinced it does more harm than good.

The multi-line format you suggest seems a bit much. I was just asking you to put line breaks so it's not a super long single line error message.

Also, for the first question, it looks like there's a missing gap. Saying there's a write timeout without actually saying what we were writing when we timed out is going to confuse users as see in the issue I linked.

@kyleknap
Copy link
Contributor Author

@jamesls One thing that I can do to break it up is add a newline between the exception and the suggestion on how to fix it. Would that help? There is really only three possible logical components of the error: the location of where the error occurred, the error message, and the suggestion on how to resolve. Currently the location and the error message are combined into one sentence.

On the: We've had so many people (not just in chalice) asking about these broken pipe errors that I'm convinced it does more harm than good.

Does it make sense to add improved messaging to botocore then? If I was to do this in chalice, I would prefer we improve the error message for any client call by at least adding information on what operation threw the ConnectionError as a ConnectionError technically could happen for any client call and the error message will still be very bad. Also currently, it is only somewhat manageable/understandable for botocore/boto3/awscli because we have logging and tracebacks enabled.

@kyleknap
Copy link
Contributor Author

I'm still looking on how to improve the connection error message part, but how about something like this?
screen shot 2017-06-15 at 12 53 54 pm

@jamesls
Copy link
Member

jamesls commented Jun 16, 2017

Please avoid using colors for now. I'm not against it but I don't think we should start with this change. Let's just focus on improving the error messages themselves for now.

@jamesls
Copy link
Member

jamesls commented Jun 16, 2017

Does it make sense to add improved messaging to botocore then?

No I don't think so. Going into botocore would require we could generalize a broken pipe error and why it happened. We can only do that here because we know the exact API call and common reasons why it can fail.

I strongly suggest to just keep the scope small. This is just about improving error messages, and I'd rather move on to other features once we've done that.

Copy link
Member

@jamesls jamesls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the error messages we talked about offline:

ERROR - While sending your chalice handler code to Lambda, received the following error:

  Connection aborted, the Lambda service closed the connection before we finished sending all your data.

This is likely because the deployment package is 58.2 MB. Lambda only allows
deployment packages that are 50.0 MB or less in size. To avoid this error,
decrease the size of your chalice application by removing code or removing
dependencies from your chalice application.
ERROR - While sending your chalice handler code to Lambda, received the following error:

  Connection aborted, timed out sending your app to Lambda.

if attempts >= self.LAMBDA_CREATE_ATTEMPTS:
message = e.response['Error'].get('Message', '')
retryable_message = (
'The role defined for the function cannot be assumed '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to relax the matching here. Either partial string match or regex matching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning to just check to see if the string 'role' is included in the error message if that is fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer something more robust than that. If the function name can ever be in the error message and that name contains role you'll get a false positive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the literal error message, you will get for both create and update is:

The role defined for the function cannot be assumed by Lambda.

It does not include the name of the function currently. I can check for 'role' and 'assumed by Lambda'? Or just 'assumed by Lambda'?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean in general for InvalidParameterValueException, which is the code block this is in. This a) applies to more than just the role not being able to be assumed and b) has no guarantees about the message changing in the future. Something like role.*cannot be assumed seems reasonable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I can do that.

# type: (Any) -> ChaliceDeploymentError
suggestion = None
where = None
if isinstance(error, DeploymentPackageTooLargeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this out into the ChaliceDeploymentError class. We should want the deployer to only be about deploying resources and move the logic of helpful error messages out to a separate class.

# aborts the connection as chalice is still sending it
# data
additional_details = (
'This is likely because the deployment package is %s. '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move this up the stack and have stuff in awsclient just collect all the raw data, deployment package size, original exceptions, etc. We'd then have the Exception object assemble the suggestions and pretty print the error messages to the user.

return self._call_client_method_with_retries(
self._client('lambda').create_function, kwargs)['FunctionArn']
except Exception as e:
self._raise_large_deployment_related_errors(e, len(zip_contents))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider creating some lambda specific error so pieces higher up the stack know that we failed in the lambda function.

kyleknap added 2 commits June 19, 2017 16:02
Specifically made the following updates:
* Bubble the DeploymentPackageTooLargeError logic to the
  ChaliceDeploymentError instead of having it in Deployer
* Wrap the error text to 79 characters
* Indent the exact error message
Specifically did the following:
* Created a general LambdaClientError to help indicate
  where an error takes place
* Created a LambdaErrorContext to store information about
  the error from lambda
* Introspect the ConnectionErrors from Lambda to create a more
  user friendly message about broken pipes and timeouts
@kyleknap
Copy link
Contributor Author

@jamesls I updated. I think I got everything we talked about. Should be good to look at again.

@jamesls
Copy link
Member

jamesls commented Jun 20, 2017

@kyleknap Could you show what the updated sample error messages look like?

@kyleknap
Copy link
Contributor Author

@jamesls Here are some examples:

Too big when unzipped:

ERROR - While sending your chalice handler code to Lambda function "bigapp-
dev", received the following error:

 An error occurred (InvalidParameterValueException) when calling the 
 CreateFunction operation: Unzipped size must be smaller than 262144000 bytes

To avoid this error, decrease the size of your chalice application by removing 
code or removing dependencies from your chalice application.

Connection aborted:

ERROR - While sending your chalice handler code to Lambda function 
"bigappv2-dev", received the following error:

 Connection aborted. Lambda closed the connection before chalice finished 
 sending all of the data.

This is likely because the deployment package is 58.2 MB. Lambda only allows 
deployment packages that are 50.0 MB or less in size. To avoid this error, 
decrease the size of your chalice application by removing code or removing 
dependencies from your chalice application.

Timeout:

ERROR - While sending your chalice handler code to Lambda function 
"bigappv2-dev", received the following error:

 Connection aborted. Timed out sending your app to Lambda.

This is likely because the deployment package is 58.2 MB. Lambda only allows 
deployment packages that are 50.0 MB or less in size. To avoid this error, 
decrease the size of your chalice application by removing code or removing 
dependencies from your chalice application.

Zip too large (connection not aborted)

ERROR - While sending your chalice handler code to Lambda function 
"bigappv3-dev", received the following error:

 An error occurred (RequestEntityTooLargeException) when calling the 
 CreateFunction operation: Request must be smaller than 69905067 bytes for the 
 CreateFunction operation

This is likely because the deployment package is 50.9 MB. Lambda only allows 
deployment packages that are 50.0 MB or less in size. To avoid this error, 
decrease the size of your chalice application by removing code or removing 
dependencies from your chalice application.

@jamesls
Copy link
Member

jamesls commented Jun 20, 2017

Cool, looks better. The first sentence is a little confusing, we'll not really sending our code to lambda function '{our function}'. Something like "while sending your chalice handler code to Lambda to create the function "function-name", we received the following error. Or something along those lines.

Looking over the code now.

@@ -37,6 +42,28 @@ class ResourceDoesNotExistError(Exception):
pass


class LambdaClientError(Exception):
def __init__(self, original_error, context):
# type: (Any, LambdaErrorContext) -> None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to avoid Any unless there's really not a better way. Can this not be Exception or something more specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check but I believe it was mypy complaining on accessing attributes that are not on Exception such as context.

try:
return self._call_client_method_with_retries(
self._client('lambda').create_function, kwargs)['FunctionArn']
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this is right. We're catching all exceptions and casting them down into a more specific exception (LambdaClientError). If there is actually a bug or something exceptional that happens in the code I'd want the original stack trace to propagate up, not be masked by LambdaClientError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline: Decided to just catch botocore.exceptions.ClientError and RequestsConnectionError for this.

# We're assuming that if we receive an
# InvalidParameterValueException, it's because
# the role we just created can't be used by
# Lambda so retry until it can be.
self._sleep(self.DELAY_TIME)
attempts += 1
if attempts >= self.LAMBDA_CREATE_ATTEMPTS:
message = e.response['Error'].get('Message', '')
# Do not retry if exceeded retries or does not have anything
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a style nitpick, but I have said this several times before. Rather than write a comment explaining what the code is doing, just put it in a separate function and omit the comment:

if self._is_iam_role_related_error(e):
    raise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

# data
error_cls = DeploymentPackageTooLargeError
elif isinstance(error, ClientError):
code = error.response['Error'].get('Code')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will default to None which will crash in your __contains__ call on L169. These should default to str types, probably just the empty string.

try:
return_value = lambda_client.update_function_code(
FunctionName=function_name, ZipFile=zip_contents)
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing, we should really only have except Exceptions and the top most layer of the code. Restrict this to the subset of exceptions we care about.

Copy link
Contributor Author

@kyleknap kyleknap Jun 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline: Catch RequestsConnectionError and botocore.exceptions.ClientError here instead.

@@ -169,6 +177,104 @@ def _validate_manage_iam_role(config):
)


class ChaliceDeploymentError(Exception):
def __init__(self, error):
# type: (Any) -> None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to avoid Any unless it's really not practical. This applies to all the Any's in this class.

# is a socket.error that has the message 'Broken pipe' in it. So we
# don't want to be assuming all socket.error are broken pipes so just
# check if the message has 'Broken pipe' in it.
if 'Broken pipe' in str(underlying_error):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more appropriately handled in compat.py. Here we actually have a solid isinstance way of checking for broken pipe in python3, and it seems to make sense to use it if possible rather than rely on __contains__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@@ -206,6 +312,13 @@ def deploy(self, config, chalice_stage_name=DEFAULT_STAGE_NAME):
rest api, lambda function, role, etc.

"""
try:
return self._do_deploy(config, chalice_stage_name)
except Exception as error:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing about except Exception. This would be fine in say cli/__init__.py, but not in the deployer module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline: Catch LambdaClientError and botocore.exceptions.ClientError here instead.

msg += self._wrap_text(self._get_error_message(error), indent=' ')
msg += '\n\n'
suggestion = self._get_error_suggestion(error)
if suggestion:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if suggestion is not None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not matter if the check changes here. The output will stay the same, but sure I can update.

@@ -475,6 +491,75 @@ def test_can_provide_tags(self, stubbed_session):
handler='app.app') == 'arn:12345:name'
stubbed_session.verify_stubs()

def test_raises_large_deployment_error_for_connection_error(
self, mock_lambda_client):
mock_session = mock.Mock(botocore.session.Session)
Copy link
Member

@jamesls jamesls Jun 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing this?? There's an existing way to test this, which is using the StubbedSession.

If the necessary abstractions aren't in place in StubbedSession, I think it's more appropriate to add to them rather than work around them by bypassing the StubbedSession and dropping down to botocore sessions directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline: Going to subclass this from the botocore.stub.Stubber and add todo to port back to botocore.

kyleknap added 3 commits June 21, 2017 09:02
Consists of:

* Catching more specific exceptions on awsclient and Deployer
* Make compat function for checking broken pipes
* Be more specific on Any on exception type checks
* Other smaller updates
Subclassed botocore.stub.Stubber to allow for raising custom
errors that are not ClientErrors.
Include information on whether it was an create or update when the
code is being sent to Lambda.
@kyleknap
Copy link
Contributor Author

@jamesls I updated based on feedback. It should be good to look at again.

Copy link
Member

@jamesls jamesls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.


def is_broken_pipe_error(error):
# type: (Exception) -> bool
return isinstance(error, BrokenPipeError) # noqa
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is # noqa needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pylint on python2 yelled at me for having an unknown variable name of BrokenPipeError as it only exists in Python3. Is there a better way to handle this? It seemed that using #noqa is the precedent that has been set in previous places for pylint errors due to need for type hinting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that's the case? Pylint doesn't honor # noqa AFAIK. That's for pyflakes/flake8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you are right. I misspoke. It was for flake8.

@kyleknap kyleknap merged commit 9f6705a into aws:master Jun 21, 2017
@kyleknap kyleknap deleted the deploy-err-msg branch June 21, 2017 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'Connection aborted.', error(32, 'Broken pipe' when uploading ImageMagick
4 participants