Capture exceptions outside of spiders #6

Open

maiksprenger opened this issue Dec 7, 2016 · 6 comments

maiksprenger (Contributor) commented Dec 7, 2016

scrapy-sentry currently does not capture exceptions that happen outside the spider's on_error signal, which limits the usefulness of the extension. Luckily, it is pretty easy to fix as well. Instead of configuring the extension, I currently have this in my Scrapy settings.py:

from twisted.python import log
from raven import Client

client = Client(dsn=SENTRY_DSN)  # SENTRY_DSN is defined earlier in settings.py

def log_sentry(dictionary):
    if dictionary.get('isError') and 'failure' in dictionary:
        try:
            # Re-raise the exception wrapped in the Twisted Failure...
            dictionary['failure'].raiseException()
        except Exception:
            # ...so raven can capture it here, with a traceback.
            client.captureException()

log.addObserver(log_sentry)

It should be pretty easy to do the same in the extension, and possibly remove the on_error handling. @llonchj, if you agree, I'm happy to prepare a PR for that.

The above snippet is not my work; it was discovered by @elmkarami in http://stackoverflow.com/questions/25262765/handle-all-exception-in-scrapy-with-sentry and scrapy/scrapy#852.
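For concreteness, here is a minimal sketch of what the extension variant could look like (the class and wiring below are my illustrative assumptions, not scrapy-sentry's actual code):

from raven import Client
from twisted.python import log


class SentryLogObserver(object):
    """Forward Twisted error events to Sentry via raven."""

    def __init__(self, dsn):
        self.client = Client(dsn=dsn)

    @classmethod
    def from_crawler(cls, crawler):
        # Read the DSN from the Scrapy settings and start observing the
        # Twisted log as soon as the extension is loaded.
        ext = cls(dsn=crawler.settings.get('SENTRY_DSN'))
        log.addObserver(ext.observe)
        return ext

    def observe(self, event):
        # Same logic as the settings.py snippet above.
        if event.get('isError') and 'failure' in event:
            try:
                event['failure'].raiseException()
            except Exception:
                self.client.captureException()

It would be enabled through the EXTENSIONS setting like any other Scrapy extension.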

maiksprenger (Contributor, Author) commented Dec 7, 2016

I toyed with this some more and managed to get proper tracebacks working; the original exception is now captured directly instead of being re-raised:

from twisted.python import log
from raven import Client

client = Client(dsn=SENTRY_DSN)  # SENTRY_DSN is defined earlier in settings.py

def log_sentry(dictionary):
    if dictionary.get('isError') and 'failure' in dictionary:
        # Unpack the Twisted Failure and hand the original exception to
        # raven directly, preserving the full traceback.
        failure = dictionary['failure']
        exc_info = (failure.type, failure.value, failure.getTracebackObject())
        client.captureException(exc_info=exc_info)

log.addObserver(log_sentry)

Pretty happy with this now; Sentry shows the expected traceback.

nuklea commented Mar 14, 2017

What about a pull request? Or maybe this is more suitable for the raven documentation?

maiksprenger (Contributor, Author) commented
I wanted to let you decide what to do with it, because it more or less replaces the inner workings of this project. Personally, I think there's value in keeping scrapy-sentry; I'd switch over to something like what I propose, and advertise in the README that all it really provides is some boilerplate to attach to Twisted's logging infrastructure. But it's all up to you :)

vionemc (Contributor) commented Apr 17, 2017

https://github.com/llonchj/scrapy-sentry

It's available on PyPI.

I am curious, though: is it possible to do this without Sentry, i.e. without any client?

Framartin commented
Using sentry-python (named sentry-sdk on pip) instead of the legacy raven, there is a much simpler solution, based on the fact that Scrapy's logging features are built on the stdlib logging module.

I developed a very simple Scrapy extension that catches exceptions and errors inside and outside spiders (including downloader middlewares, item middlewares, etc.): https://gist.github.com/Framartin/4e57b57139ed31f36684cfc514037bf6 (a sketch of the approach follows the list below). By default, sentry-python captures logging events as follows:

  • critical and error levels: sent as Sentry events
  • info and above levels: sent as Sentry breadcrumbs
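For reference, here is a minimal sketch along the lines of the gist (the class name and settings path are illustrative assumptions; see the gist itself for the actual code):

import sentry_sdk
from scrapy.exceptions import NotConfigured


class SentryLogging:
    """Initialize sentry-sdk as early as possible in the crawl."""

    @classmethod
    def from_crawler(cls, crawler):
        dsn = crawler.settings.get('SENTRY_DSN')
        if not dsn:
            raise NotConfigured
        # sentry-sdk's default logging integration then routes stdlib logging
        # records as described above (errors -> events, info -> breadcrumbs).
        sentry_sdk.init(dsn=dsn)
        return cls()

The extension would then be registered with a very low order so it initializes before everything else, e.g.:

EXTENSIONS = {
    'myproject.extensions.SentryLogging': -1,  # hypothetical path; load very early
}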

ric2b commented Mar 28, 2021

Thanks @Framartin, works really well so far!
