Add better errors for too large deployments #380

kyleknap · 2017-06-14T22:05:32Z

The better errors consistent of:

Detecting errors that were likely because of chalice applications
that were too large.
Indicating that it occurred during the lambda deployment phase
Indicating suggestions on how to fix it

In addition, added a unified ChaliceDeploymentError that can be
used to improve the error message when deploying a chalice
application for future enhancements.

Fixes #246

Here are some examples for the error messages:

Connection getting aborted as chalice is sending data while Lambda cuts off the connection.

DEPLOYMENT ERROR - While sending your chalice handler code to Lambda, received the following error: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')). This is likely because the deployment package is 58.2 MB. Lambda only allows deployment packages that are 50.0 MB or less in size. To avoid this error, decrease the size of your chalice application by removing code or removing dependencies from your chalice application.

Lambda returns an error saying the zipped deployment package is too large.

DEPLOYMENT ERROR - While sending your chalice handler code to Lambda, received the following error: An error occurred (RequestEntityTooLargeException) when calling the CreateFunction operation: Request must be smaller than 69905067 bytes for the CreateFunction operation. To avoid this error, decrease the size of your chalice application by removing code or removing dependencies from your chalice application.

Lambda returns an error saying the contents of the unzipped deployment package is too large.

DEPLOYMENT ERROR - While sending your chalice handler code to Lambda, received the following error: An error occurred (InvalidParameterValueException) when calling the CreateFunction operation: Unzipped size must be smaller than 262144000 bytes. To avoid this error, decrease the size of your chalice application by removing code or removing dependencies from your chalice application.

The better errors consistent of: * Detecting errors that were likely because of chalice applications that were too large. * Indicating that it occurred during the lambda deployment phase * Indicating suggestions on how to fix it In addition, added a unified ChaliceDeploymentError that can be used to improve the error message when deploying a chalice application for future enhancements.

This was done by checking that message was specifically about not being able to assume and IAM role.

codecov-io · 2017-06-14T22:09:48Z

Codecov Report

Merging #380 into master will increase coverage by 0.41%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #380      +/-   ##
==========================================
+ Coverage    91.2%   91.62%   +0.41%     
==========================================
  Files          18       18              
  Lines        2172     2280     +108     
  Branches      281      294      +13     
==========================================
+ Hits         1981     2089     +108     
  Misses        142      142              
  Partials       49       49

Impacted Files	Coverage Δ
chalice/constants.py	`100% <100%> (ø)`	⬆️
chalice/deploy/deployer.py	`87.98% <100%> (+2.64%)`	⬆️
chalice/compat.py	`72% <100%> (+7%)`	⬆️
chalice/awsclient.py	`90.8% <100%> (+1.76%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c947eb0...ce8af7e. Read the comment docs.

jamesls · 2017-06-14T23:39:00Z

Haven't had a chance to look at the code yet, but some questions/feedback on the output:

What do the error messages look like when it's not an exception caused by one of the cases you've explicitly accounted for? For example, write timeouts as shown here: Error on chalice deploy: "The write operation timed out" #330. Do we still say you should reduce your deployment package size?
I think it's more common to just have ERROR - then your message. The deployment context is inferred given you run chalice deploy.
I'd prefer something other than "chalice handler", it's vague to me. Maybe something like "chalice API handler". That way builtin auth could say "chalice auth handler".
The line lengths are really long (you have to scroll horizontally to read the only thing). I'd prefer if we added line breaks and added a nice multi line error message.
Can we clean up the first error message? I realize this comes from requests, but it just seems odd that only the first error message has this tuple format: received the following error: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')).

I'll try to look at the code tomorrow.

stealthycoin

Just a couple comments/questions:

What do these errors look like in a short terminal?
Can we clean up the request error with the tuple in it a little bit?

kyleknap · 2017-06-15T17:44:32Z

Just answering some questions:

What do the error messages look like when it's not an exception caused by one of the cases you've explicitly accounted for? For example, write timeouts as shown here: #330. Do we still say you should reduce your deployment package size?

If there is no context for the error, it will currently show up as the string of the raised exception for any deployment error:

DEPLOYMENT ERROR - <str(Exception)>

In terms of write timeouts, receiving that message is entirely determined on if the deployment package is larger than 50MB. If it is, you get an error message similar to the first option. If not you will get something like this:

DEPLOYMENT ERROR - ('Connection aborted.', timeout('The write operation timed out',))

I think it's more common to just have ERROR - then your message. The deployment context is inferred given you run chalice deploy.

That's fine with me.

I'd prefer something other than "chalice handler", it's vague to me. Maybe something like "chalice API handler". That way builtin auth could say "chalice auth handler".

That's fair. I was phrasing it based on the current feature state of chalice as there is no mult-handler support currently. I could get that to work currently, but I would be interested to see how it will all fit in when auth handlers get introduced. There will probably need to be reworking of where the errors get thrown as the deployment errors are raised from the Deployer.deploy method.

The line lengths are really long (you have to scroll horizontally to read the only thing). I'd prefer if we added line breaks and added a nice multi line error message.

I was thinking about that too. It is not too bad on the terminal as it wraps around. It just does not look great on GitHub. How would this format sound:

ERROR - 
   LOCATION: Deploying the chalice API handler
   EXCEPTION RAISED: <str(Exception)>
   SUGGESTION: To resolve this issue, do the following...

I could even have it print out in red or something to make it easier to understand all of the related information about the exception.

Can we clean up the first error message? I realize this comes from requests, but it just seems odd that only the first error message has this tuple format: received the following error: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')).

It may be tricky to see if there is any attributes on requests.ConnectionError that I can use. The ConnectionError subclasses from an IOError. Based on the Python docs I could maybe inspect the args property and try to pull information out of the tuple but I am not sure how well I will be able to reformat into a better looking looking sentence or if I would be making inappropriate assumptions in what the args will be in the tuple. The second option would be to provide your own error message for ConnectionErrors. I am hesitant to do that now as it is nice to see what the exact error we are getting back so it is possible to trace the source of the issue. If we had a --debug mode, I think it would be easier to make the case for providing your own message for a connection error, but until then I am not sure if we would want to do that.

kyleknap · 2017-06-15T17:48:09Z

@stealthycoin

To answer your questions:

What do these errors look like in a short terminal?
The text just wraps to make a taller paragraph.

Can we clean up the request error with the tuple in it a little bit?
See my last paragraph in the previous response. I feel that if we move to the format that I proposed in the answering 4th question, it might remove the need to clean it up as we would be preserving the exact exception that got raised.

jamesls · 2017-06-15T17:56:01Z

@kyleknap I don't think preserving the original exception is useful from a user's perspective. Sure we can keep it in debug logs once we add those, but seeing ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')). is confusing for end users. We've had so many people (not just in chalice) asking about these broken pipe errors that I'm convinced it does more harm than good.

The multi-line format you suggest seems a bit much. I was just asking you to put line breaks so it's not a super long single line error message.

Also, for the first question, it looks like there's a missing gap. Saying there's a write timeout without actually saying what we were writing when we timed out is going to confuse users as see in the issue I linked.

kyleknap · 2017-06-15T19:30:57Z

@jamesls One thing that I can do to break it up is add a newline between the exception and the suggestion on how to fix it. Would that help? There is really only three possible logical components of the error: the location of where the error occurred, the error message, and the suggestion on how to resolve. Currently the location and the error message are combined into one sentence.

On the: We've had so many people (not just in chalice) asking about these broken pipe errors that I'm convinced it does more harm than good.

Does it make sense to add improved messaging to botocore then? If I was to do this in chalice, I would prefer we improve the error message for any client call by at least adding information on what operation threw the ConnectionError as a ConnectionError technically could happen for any client call and the error message will still be very bad. Also currently, it is only somewhat manageable/understandable for botocore/boto3/awscli because we have logging and tracebacks enabled.

kyleknap · 2017-06-15T19:57:00Z

I'm still looking on how to improve the connection error message part, but how about something like this?

jamesls · 2017-06-16T00:54:53Z

Please avoid using colors for now. I'm not against it but I don't think we should start with this change. Let's just focus on improving the error messages themselves for now.

jamesls · 2017-06-16T01:00:52Z

Does it make sense to add improved messaging to botocore then?

No I don't think so. Going into botocore would require we could generalize a broken pipe error and why it happened. We can only do that here because we know the exact API call and common reasons why it can fail.

I strongly suggest to just keep the scope small. This is just about improving error messages, and I'd rather move on to other features once we've done that.

jamesls

Some of the error messages we talked about offline:

ERROR - While sending your chalice handler code to Lambda, received the following error:

  Connection aborted, the Lambda service closed the connection before we finished sending all your data.

This is likely because the deployment package is 58.2 MB. Lambda only allows
deployment packages that are 50.0 MB or less in size. To avoid this error,
decrease the size of your chalice application by removing code or removing
dependencies from your chalice application.

ERROR - While sending your chalice handler code to Lambda, received the following error:

  Connection aborted, timed out sending your app to Lambda.

jamesls · 2017-06-16T21:57:26Z

chalice/awsclient.py

-                if attempts >= self.LAMBDA_CREATE_ATTEMPTS:
+                message = e.response['Error'].get('Message', '')
+                retryable_message = (
+                    'The role defined for the function cannot be assumed '


We might want to relax the matching here. Either partial string match or regex matching

I am planning to just check to see if the string 'role' is included in the error message if that is fine?

I'd prefer something more robust than that. If the function name can ever be in the error message and that name contains role you'll get a false positive.

So the literal error message, you will get for both create and update is:

The role defined for the function cannot be assumed by Lambda.

It does not include the name of the function currently. I can check for 'role' and 'assumed by Lambda'? Or just 'assumed by Lambda'?

No, I mean in general for InvalidParameterValueException, which is the code block this is in. This a) applies to more than just the role not being able to be assumed and b) has no guarantees about the message changing in the future. Something like role.*cannot be assumed seems reasonable to me.

Alright I can do that.

jamesls · 2017-06-16T22:11:23Z

chalice/deploy/deployer.py

+        # type: (Any) -> ChaliceDeploymentError
+        suggestion = None
+        where = None
+        if isinstance(error, DeploymentPackageTooLargeError):


Let's move this out into the ChaliceDeploymentError class. We should want the deployer to only be about deploying resources and move the logic of helpful error messages out to a separate class.

jamesls · 2017-06-16T22:15:21Z

chalice/awsclient.py

+            # aborts the connection as chalice is still sending it
+            # data
+            additional_details = (
+                'This is likely because the deployment package is %s. '


Lets move this up the stack and have stuff in awsclient just collect all the raw data, deployment package size, original exceptions, etc. We'd then have the Exception object assemble the suggestions and pretty print the error messages to the user.

jamesls · 2017-06-16T22:19:32Z

chalice/awsclient.py

+            return self._call_client_method_with_retries(
+                self._client('lambda').create_function, kwargs)['FunctionArn']
+        except Exception as e:
+            self._raise_large_deployment_related_errors(e, len(zip_contents))


Consider creating some lambda specific error so pieces higher up the stack know that we failed in the lambda function.

Specifically made the following updates: * Bubble the DeploymentPackageTooLargeError logic to the ChaliceDeploymentError instead of having it in Deployer * Wrap the error text to 79 characters * Indent the exact error message

Specifically did the following: * Created a general LambdaClientError to help indicate where an error takes place * Created a LambdaErrorContext to store information about the error from lambda * Introspect the ConnectionErrors from Lambda to create a more user friendly message about broken pipes and timeouts

kyleknap · 2017-06-20T20:13:06Z

@jamesls I updated. I think I got everything we talked about. Should be good to look at again.

jamesls · 2017-06-20T20:46:31Z

@kyleknap Could you show what the updated sample error messages look like?

kyleknap · 2017-06-20T20:58:48Z

@jamesls Here are some examples:

Too big when unzipped:

ERROR - While sending your chalice handler code to Lambda function "bigapp-
dev", received the following error:

 An error occurred (InvalidParameterValueException) when calling the 
 CreateFunction operation: Unzipped size must be smaller than 262144000 bytes

To avoid this error, decrease the size of your chalice application by removing 
code or removing dependencies from your chalice application.

Connection aborted:

ERROR - While sending your chalice handler code to Lambda function 
"bigappv2-dev", received the following error:

 Connection aborted. Lambda closed the connection before chalice finished 
 sending all of the data.

This is likely because the deployment package is 58.2 MB. Lambda only allows 
deployment packages that are 50.0 MB or less in size. To avoid this error, 
decrease the size of your chalice application by removing code or removing 
dependencies from your chalice application.

Timeout:

ERROR - While sending your chalice handler code to Lambda function 
"bigappv2-dev", received the following error:

 Connection aborted. Timed out sending your app to Lambda.

This is likely because the deployment package is 58.2 MB. Lambda only allows 
deployment packages that are 50.0 MB or less in size. To avoid this error, 
decrease the size of your chalice application by removing code or removing 
dependencies from your chalice application.

Zip too large (connection not aborted)

ERROR - While sending your chalice handler code to Lambda function 
"bigappv3-dev", received the following error:

 An error occurred (RequestEntityTooLargeException) when calling the 
 CreateFunction operation: Request must be smaller than 69905067 bytes for the 
 CreateFunction operation

This is likely because the deployment package is 50.9 MB. Lambda only allows 
deployment packages that are 50.0 MB or less in size. To avoid this error, 
decrease the size of your chalice application by removing code or removing 
dependencies from your chalice application.

jamesls · 2017-06-20T21:39:50Z

Cool, looks better. The first sentence is a little confusing, we'll not really sending our code to lambda function '{our function}'. Something like "while sending your chalice handler code to Lambda to create the function "function-name", we received the following error. Or something along those lines.

Looking over the code now.

jamesls · 2017-06-20T21:41:26Z

chalice/awsclient.py

@@ -37,6 +42,28 @@ class ResourceDoesNotExistError(Exception):
    pass


+class LambdaClientError(Exception):
+    def __init__(self, original_error, context):
+        # type: (Any, LambdaErrorContext) -> None


I try to avoid Any unless there's really not a better way. Can this not be Exception or something more specific?

I will check but I believe it was mypy complaining on accessing attributes that are not on Exception such as context.

jamesls · 2017-06-20T21:43:59Z

chalice/awsclient.py

+        try:
+            return self._call_client_method_with_retries(
+                self._client('lambda').create_function, kwargs)['FunctionArn']
+        except Exception as e:


I still don't think this is right. We're catching all exceptions and casting them down into a more specific exception (LambdaClientError). If there is actually a bug or something exceptional that happens in the code I'd want the original stack trace to propagate up, not be masked by LambdaClientError.

Offline: Decided to just catch botocore.exceptions.ClientError and RequestsConnectionError for this.

jamesls · 2017-06-20T21:45:32Z

chalice/awsclient.py

                # We're assuming that if we receive an
                # InvalidParameterValueException, it's because
                # the role we just created can't be used by
                # Lambda so retry until it can be.
                self._sleep(self.DELAY_TIME)
                attempts += 1
-                if attempts >= self.LAMBDA_CREATE_ATTEMPTS:
+                message = e.response['Error'].get('Message', '')
+                # Do not retry if exceeded retries or does not have anything


So a style nitpick, but I have said this several times before. Rather than write a comment explaining what the code is doing, just put it in a separate function and omit the comment:

if self._is_iam_role_related_error(e): raise

jamesls · 2017-06-20T21:47:39Z

chalice/awsclient.py

+            # data
+            error_cls = DeploymentPackageTooLargeError
+        elif isinstance(error, ClientError):
+            code = error.response['Error'].get('Code')


These will default to None which will crash in your __contains__ call on L169. These should default to str types, probably just the empty string.

jamesls · 2017-06-20T21:48:36Z

chalice/awsclient.py

+        try:
+            return_value = lambda_client.update_function_code(
+                FunctionName=function_name, ZipFile=zip_contents)
+        except Exception as e:


Same thing, we should really only have except Exceptions and the top most layer of the code. Restrict this to the subset of exceptions we care about.

Offline: Catch RequestsConnectionError and botocore.exceptions.ClientError here instead.

jamesls · 2017-06-20T21:49:13Z

chalice/deploy/deployer.py

@@ -169,6 +177,104 @@ def _validate_manage_iam_role(config):
            )


+class ChaliceDeploymentError(Exception):
+    def __init__(self, error):
+        # type: (Any) -> None


Prefer to avoid Any unless it's really not practical. This applies to all the Any's in this class.

jamesls · 2017-06-20T21:51:10Z

chalice/deploy/deployer.py

+        # is a socket.error that has the message 'Broken pipe' in it. So we
+        # don't want to be assuming all socket.error are broken pipes so just
+        # check if the message has 'Broken pipe' in it.
+        if 'Broken pipe' in str(underlying_error):


This seems more appropriately handled in compat.py. Here we actually have a solid isinstance way of checking for broken pipe in python3, and it seems to make sense to use it if possible rather than rely on __contains__.

jamesls · 2017-06-20T21:52:07Z

chalice/deploy/deployer.py

@@ -206,6 +312,13 @@ def deploy(self, config, chalice_stage_name=DEFAULT_STAGE_NAME):
            rest api, lambda function, role, etc.

        """
+        try:
+            return self._do_deploy(config, chalice_stage_name)
+        except Exception as error:


Same thing about except Exception. This would be fine in say cli/__init__.py, but not in the deployer module.

Offline: Catch LambdaClientError and botocore.exceptions.ClientError here instead.

jamesls · 2017-06-20T21:53:49Z

chalice/deploy/deployer.py

+        msg += self._wrap_text(self._get_error_message(error), indent=' ')
+        msg += '\n\n'
+        suggestion = self._get_error_suggestion(error)
+        if suggestion:


if suggestion is not None

It should not matter if the check changes here. The output will stay the same, but sure I can update.

jamesls · 2017-06-20T21:59:25Z

tests/functional/test_awsclient.py

@@ -475,6 +491,75 @@ def test_can_provide_tags(self, stubbed_session):
            handler='app.app') == 'arn:12345:name'
        stubbed_session.verify_stubs()

+    def test_raises_large_deployment_error_for_connection_error(
+            self, mock_lambda_client):
+        mock_session = mock.Mock(botocore.session.Session)


Why are we doing this?? There's an existing way to test this, which is using the StubbedSession.

If the necessary abstractions aren't in place in StubbedSession, I think it's more appropriate to add to them rather than work around them by bypassing the StubbedSession and dropping down to botocore sessions directly.

Offline: Going to subclass this from the botocore.stub.Stubber and add todo to port back to botocore.

Consists of: * Catching more specific exceptions on awsclient and Deployer * Make compat function for checking broken pipes * Be more specific on Any on exception type checks * Other smaller updates

Subclassed botocore.stub.Stubber to allow for raising custom errors that are not ClientErrors.

Include information on whether it was an create or update when the code is being sent to Lambda.

kyleknap · 2017-06-21T16:52:25Z

@jamesls I updated based on feedback. It should be good to look at again.

jamesls

Looks good to me.

jamesls · 2017-06-21T17:37:05Z

chalice/compat.py

+
+    def is_broken_pipe_error(error):
+        # type: (Exception) -> bool
+        return isinstance(error, BrokenPipeError)  # noqa


Why is # noqa needed here?

Pylint on python2 yelled at me for having an unknown variable name of BrokenPipeError as it only exists in Python3. Is there a better way to handle this? It seemed that using #noqa is the precedent that has been set in previous places for pylint errors due to need for type hinting.

Are you sure that's the case? Pylint doesn't honor # noqa AFAIK. That's for pyflakes/flake8.

No you are right. I misspoke. It was for flake8.

kyleknap added 2 commits June 14, 2017 15:00

Make retry on invalid parameter more specific

0b2c74a

This was done by checking that message was specifically about not being able to assume and IAM role.

kyleknap requested review from jamesls and stealthycoin June 14, 2017 22:05

stealthycoin reviewed Jun 15, 2017

View reviewed changes

jamesls reviewed Jun 16, 2017

View reviewed changes

kyleknap added 2 commits June 19, 2017 16:02

Updating error message based on feedback

f237c9d

Specifically made the following updates: * Bubble the DeploymentPackageTooLargeError logic to the ChaliceDeploymentError instead of having it in Deployer * Wrap the error text to 79 characters * Indent the exact error message

kyleknap force-pushed the deploy-err-msg branch from bd36772 to b620d4e Compare June 20, 2017 19:59

jamesls reviewed Jun 20, 2017

View reviewed changes

kyleknap force-pushed the deploy-err-msg branch from e01f353 to 13b3af0 Compare June 21, 2017 15:48

kyleknap added 3 commits June 21, 2017 09:02

Made minor updates based on feedback

d5a4f46

Consists of: * Catching more specific exceptions on awsclient and Deployer * Make compat function for checking broken pipes * Be more specific on Any on exception type checks * Other smaller updates

Add own implementation of Stubber

0acd8b2

Subclassed botocore.stub.Stubber to allow for raising custom errors that are not ClientErrors.

Update error message

f99cd22

Include information on whether it was an create or update when the code is being sent to Lambda.

kyleknap force-pushed the deploy-err-msg branch from 13b3af0 to f99cd22 Compare June 21, 2017 16:02

jamesls approved these changes Jun 21, 2017

View reviewed changes

Add changelog entry

ce8af7e

kyleknap force-pushed the deploy-err-msg branch from d3444f4 to ce8af7e Compare June 21, 2017 19:25

kyleknap merged commit 9f6705a into aws:master Jun 21, 2017

kyleknap deleted the deploy-err-msg branch June 21, 2017 19:31

Add better errors for too large deployments #380

Add better errors for too large deployments #380

Conversation

kyleknap commented Jun 14, 2017

codecov-io commented Jun 14, 2017 • edited Loading

Codecov Report

jamesls commented Jun 14, 2017

stealthycoin left a comment

Choose a reason for hiding this comment

kyleknap commented Jun 15, 2017 • edited by jamesls Loading

kyleknap commented Jun 15, 2017

jamesls commented Jun 15, 2017

kyleknap commented Jun 15, 2017

kyleknap commented Jun 15, 2017

jamesls commented Jun 16, 2017

jamesls commented Jun 16, 2017

jamesls left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kyleknap commented Jun 20, 2017

jamesls commented Jun 20, 2017

kyleknap commented Jun 20, 2017

jamesls commented Jun 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kyleknap Jun 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jamesls Jun 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kyleknap commented Jun 21, 2017

jamesls left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Jun 14, 2017 •

edited

Loading

kyleknap commented Jun 15, 2017 •

edited by jamesls

Loading

kyleknap Jun 20, 2017 •

edited

Loading

jamesls Jun 20, 2017 •

edited

Loading