
python-gcm does not support connection pooling #89

Closed
alexdej opened this issue Oct 20, 2015 · 7 comments

Comments

@alexdej

alexdej commented Oct 20, 2015

Hello! I've observed that a new SSL connection is created for each request. This was true at least in 0.1.5 and is still true in 0.2. It appears to be a consequence of using requests.post, which creates and closes an internal Session object on every call. In Requests, connection pooling is bound to the Session object (http://docs.python-requests.org/en/latest/user/advanced/#keep-alive).
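To make the pooling point concrete without depending on requests or the GCM endpoint, here is a minimal standard-library sketch (not python-gcm or requests code). A local HTTP/1.1 server counts TCP connections while a client either opens a fresh connection per POST, roughly what a bare requests.post call amounts to, or keeps one connection open for every request, roughly what a Session's pool provides. `CountingHandler` and `count_connections` are hypothetical names.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    """Local test server that counts TCP connections, not requests."""

    protocol_version = "HTTP/1.1"  # keep-alive is the default for HTTP/1.1
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # socketserver creates one handler instance per accepted TCP connection.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = b'{"success": 1}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def _post(conn):
    conn.request("POST", "/send", body=b"{}",
                 headers={"Content-Type": "application/json"})
    conn.getresponse().read()  # drain the body so the socket can be reused


def count_connections(reuse, n_requests=3):
    """Send n_requests POSTs and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # Pooled style: one connection object shared by every request.
            conn = http.client.HTTPConnection(host, port, timeout=5)
            for _ in range(n_requests):
                _post(conn)
            conn.close()
        else:
            # requests.post() style: a fresh connection per request.
            for _ in range(n_requests):
                conn = http.client.HTTPConnection(host, port, timeout=5)
                _post(conn)
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


print(count_connections(reuse=False))  # 3
print(count_connections(reuse=True))   # 1
```

Over TLS, every extra connection also pays for a handshake, which is where the per-request cost described in this issue comes from.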

I tested a monkey-patched version of python-gcm in which gcm.gcm.requests is replaced with a gcm.gcm.requests.Session() object, and observed that connections are reused. From my test server in AWS us-east-1, each GCM POST request took ~140 ms on average without the patch and ~40 ms with it (7 qps vs. 25 qps), which at our volume is a significant help.
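The quoted throughput figures line up with the latencies if you assume a single worker sending requests strictly one after another (the comment does not say how the test was run, so this is an assumption). `sequential_qps` is a hypothetical helper for the arithmetic:

```python
def sequential_qps(avg_latency_ms: float) -> float:
    """Requests per second for one worker that waits for each response before sending the next."""
    return 1000.0 / avg_latency_ms


print(round(sequential_qps(140), 1))  # 7.1  (one new SSL connection per request)
print(round(sequential_qps(40), 1))   # 25.0 (pooled connection)
print(round(sequential_qps(40) / sequential_qps(140), 1))  # 3.5x speedup
```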

It wasn't clear to me how best to integrate connection pooling into python-gcm so I thought I'd post the issue for comment before sending a PR. Also, the thread safety of requests.Session is an open question so that's a trade-off to consider: https://github.com/kennethreitz/requests/issues/1871.
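One common way to sidestep the Session thread-safety question, not something proposed in this thread, is to give each thread its own session so none is ever shared. A minimal sketch, assuming the session factory is any zero-argument callable (with requests installed it would be `requests.Session`); `PerThreadSession` is a hypothetical name:

```python
import threading


class PerThreadSession:
    """Hand each thread its own session object, created lazily by `factory`."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # threading.local gives every thread an independent attribute namespace,
        # so each thread sees only the session it created.
        session = getattr(self._local, "session", None)
        if session is None:
            session = self._factory()
            self._local.session = session
        return session


sessions = PerThreadSession(object)  # object() stands in for requests.Session()
main_session = sessions.get()
print(sessions.get() is main_session)  # True: same thread reuses its session

other = []
worker = threading.Thread(target=lambda: other.append(sessions.get()))
worker.start()
worker.join()
print(other[0] is main_session)  # False: another thread gets its own session
```

The trade-off is one connection pool per thread instead of one shared pool.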

@alibitek
Collaborator

@alexdej Hi! Thanks for raising this issue! Indeed this is a performance problem and we should reuse the underlying TCP connection/socket.

Currently, json_request lets you send push notifications in batches of up to 1000 registration IDs, so if you have 1 million tokens you would open 1000 separate TCP connections to the GCM server.
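Splitting a token list into json_request-sized batches looks roughly like this. `batches` is a hypothetical helper, not part of python-gcm, and the limit of 1000 is the per-request figure quoted above. Without a pooled session, each batch is a separate request and therefore a separate connection.

```python
import math
from typing import Iterator, List, Sequence

GCM_MAX_RECIPIENTS = 1000  # per-request limit mentioned in this thread


def batches(registration_ids: Sequence[str],
            size: int = GCM_MAX_RECIPIENTS) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` registration IDs."""
    for start in range(0, len(registration_ids), size):
        yield list(registration_ids[start:start + size])


ids = [f"token-{i}" for i in range(2500)]
print([len(b) for b in batches(ids)])             # [1000, 1000, 500]
print(math.ceil(1_000_000 / GCM_MAX_RECIPIENTS))  # 1000 requests for 1M tokens
```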

I've just tested something along the lines of https://github.com/mnemonicflow/python-gcm/commit/00a93d69288bc786e30d43df89ef271ac54117ce with a few million notifications. It should be a good starting point, although it forces clients of the library to use a context manager, so existing clients would need to update their code:

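```python
from gcm import GCM

# Relies on the context-manager support from the commit above;
# API_KEY, registration_ids and notification are defined by the caller.
with GCM(API_KEY) as gcm:
    response = gcm.json_request(registration_ids=registration_ids, data=notification,
                                collapse_key='xyz',
                                priority='high',
                                restricted_package_name="com.mycompany.myawesomeapp",
                                delay_while_idle=True,
                                dry_run=False)
```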
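The linked commit is not reproduced here; the following is an independent minimal sketch of how a `with` block can own one session for its whole lifetime. `PooledSender` and `FakeSession` are hypothetical names, and `FakeSession` only records calls. With requests installed you would pass `session_factory=requests.Session` instead.

```python
class FakeSession:
    """Stand-in for requests.Session that records calls instead of sending them."""

    def __init__(self):
        self.posts = []
        self.closed = False

    def post(self, url, json=None, headers=None):
        self.posts.append((url, json))
        return {"status": 200}

    def close(self):
        self.closed = True


class PooledSender:
    """Hypothetical client that keeps one session open for a `with` block."""

    def __init__(self, api_key, session_factory=FakeSession):
        self.api_key = api_key
        self._session_factory = session_factory
        self.session = None

    def __enter__(self):
        self.session = self._session_factory()  # opened once, reused by every send
        return self

    def __exit__(self, exc_type, exc, tb):
        self.session.close()  # release pooled connections, even on error
        self.session = None
        return False  # do not swallow exceptions

    def send(self, url, payload):
        headers = {"Authorization": "key=" + self.api_key}
        return self.session.post(url, json=payload, headers=headers)


with PooledSender("API_KEY") as sender:
    for batch in (["a", "b"], ["c"]):
        sender.send("https://example.invalid/send", {"registration_ids": batch})
    session = sender.session

print(len(session.posts), session.closed)  # 2 True
```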

But I think it would be better to add another level of indirection and run the cleanup code inside the GCM object, by creating a static session object wrapper using the https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager decorator or similar.

Regarding thread safety, I think it should be handled by the client of the library, and maybe we could add an option to the GCM object to specify whether you want connection reuse or not.
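The two ideas above, a `contextlib.contextmanager` wrapper that keeps session handling inside the client and an option to turn connection reuse on, can be sketched together. All names here (`Sender`, `reuse_connections`, `CountingSession`) are hypothetical, and the shared session is per instance rather than static (class-level), which is a simplification. Reuse defaults to off, matching the preference for keeping existing behavior by default.

```python
import contextlib


class CountingSession:
    """Stand-in for requests.Session; counts how many were created."""

    created = 0

    def __init__(self):
        CountingSession.created += 1
        self.closed = False

    def post(self, url, json=None):
        return {"url": url, "json": json}

    def close(self):
        self.closed = True


class Sender:
    """Hypothetical client that manages its session internally; callers need no `with`."""

    def __init__(self, session_factory=CountingSession, reuse_connections=False):
        self._factory = session_factory
        self._reuse = reuse_connections
        self._shared = None

    @contextlib.contextmanager
    def _session(self):
        if self._reuse:
            # Opt-in: lazily create one session and keep it for later calls.
            if self._shared is None:
                self._shared = self._factory()
            yield self._shared
        else:
            # Default: a throwaway session per call, like bare requests.post().
            session = self._factory()
            try:
                yield session
            finally:
                session.close()

    def send(self, url, payload):
        with self._session() as session:
            return session.post(url, json=payload)

    def close(self):
        if self._shared is not None:
            self._shared.close()
            self._shared = None


for reuse in (False, True):
    CountingSession.created = 0
    sender = Sender(reuse_connections=reuse)
    for _ in range(3):
        sender.send("https://example.invalid/send", {"data": {}})
    sender.close()
    print(reuse, CountingSession.created)  # False 3, then True 1
```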

@alexdej
Author

alexdej commented Oct 20, 2015

Great! Your suggestion would work fine for our case, though I agree you might want to make the new behavior optional to avoid breaking existing clients (and to preserve the thread safety of the gcm library by default).

@tgwizard

👍 on reusing connections. @mnemonicflow your suggestion only works if the data is the same for all recipients, right? If the data is unique for everyone you'd have to call gcm.json_request() once per recipient.

@alibitek
Collaborator

@tgwizard Yes, the data is the same for all recipients you pass in the registration_ids parameter. If you want to send different data to different recipients, you have to group the recipients (preferably in batches of 1000) and pass the data specific to each group in a separate json_request call.
The underlying TCP connection is still reused thanks to the session object, but, as the documentation at http://docs.python-requests.org/en/latest/user/advanced/ says, the data is NOT:
"Note, however, that method-level parameters will not be persisted across requests, even if using a session."
Method-level parameters are the arguments passed to the .post, .get, .put, etc. methods of the requests.Session() object.
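The grouping described above (recipients with identical data share a call, each group split into batches of at most 1000) could look like this. `plan_requests` is a hypothetical helper, not part of python-gcm; each `(data, ids)` pair it yields would become one `json_request(registration_ids=ids, data=data)` call, all sent through the same session.

```python
import json
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

MAX_IDS_PER_REQUEST = 1000


def plan_requests(messages: Dict[str, dict]) -> Iterator[Tuple[dict, List[str]]]:
    """Turn {registration_id: data} into (data, ids) pairs, one per json_request call."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for reg_id, data in messages.items():
        # Serialize with sorted keys so equal dicts land in the same group.
        groups[json.dumps(data, sort_keys=True)].append(reg_id)
    for key, ids in groups.items():
        data = json.loads(key)
        for start in range(0, len(ids), MAX_IDS_PER_REQUEST):
            yield data, ids[start:start + MAX_IDS_PER_REQUEST]


messages = {f"id-{i}": {"msg": "sale"} for i in range(1500)}
messages["id-vip"] = {"msg": "vip offer"}
for data, ids in plan_requests(messages):
    print(data, len(ids))
# {'msg': 'sale'} 1000
# {'msg': 'sale'} 500
# {'msg': 'vip offer'} 1
```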

@tgwizard

Thanks for the response @mnemonicflow. Session (and TCP connection) reuse will only be enabled once #96 is merged, since currently there's a call to requests.post, which creates a new session for every call. Or am I missing something?

@alibitek
Collaborator

@tgwizard Yes! #96 got merged into the develop branch: https://github.com/geeknam/python-gcm/tree/develop

@tgwizard

Cool, thank you!


3 participants