Changes to enable threadsafe operation #1064

Open · wants to merge 3 commits into master

Conversation

@oir commented Nov 13, 2017

This PR includes changes to (optionally) enable thread-safe operation of dynet, providing the ability to run multiple dynet models within a single application, or to execute a single dynet model over multiple data instances (computation graphs) concurrently.

This includes:

  • Removing the restriction that only one CG can exist at a time, and disabling the notion of staleness
  • Switching to a dynamic memory pool for tensor allocation, to avoid race conditions when managing memory for tensors/CGs from different threads.

Multithreaded data parallelism without duplicating the model in memory works as follows (a sketch follows at the end of this description):

  • Have a single dynet::ParameterCollection object, shared between threads, containing the (physical) model parameters
  • Have multiple copies of a model object (e.g. LSTMBuilder). Copying the builder also copies its parameter objects, but that is okay because these are just shells that contain pointers to the same physical storage.
  • Each thread runs one copy of these models.

The main motivation was multithreaded inference, but the changes might apply at training time as well, in the spirit of asynchronous SGD (which I did not test).

My implementation is limited to (and tested on) only the SimpleExecutionEngine (so no autobatch) and CPU devices.
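For concreteness, here is a minimal C++ sketch of this usage pattern, assuming a build with the PR's thread-safe option enabled; the dimensions, thread count, dummy input, and the sum_elems readout are illustrative and not part of the PR:

  #include <dynet/dynet.h>
  #include <dynet/expr.h>
  #include <dynet/init.h>
  #include <dynet/lstm.h>

  #include <thread>
  #include <vector>

  int main(int argc, char** argv) {
    dynet::initialize(argc, argv);

    const unsigned layers = 1, input_dim = 8, hidden_dim = 16;

    // One ParameterCollection holds the physical parameters and is shared by all threads.
    dynet::ParameterCollection pc;
    dynet::LSTMBuilder shared_lstm(layers, input_dim, hidden_dim, pc);

    const size_t num_threads = 4;
    std::vector<float> results(num_threads);
    std::vector<std::thread> workers;

    for (size_t t = 0; t < num_threads; ++t) {
      workers.emplace_back([&, t]() {
        // Copying the builder copies only lightweight parameter handles;
        // the underlying storage lives in the shared `pc`.
        dynet::LSTMBuilder local = shared_lstm;

        // Each thread builds its own ComputationGraph (the PR removes the
        // one-live-CG-at-a-time restriction).
        dynet::ComputationGraph cg;
        local.new_graph(cg);
        local.start_new_sequence();

        std::vector<float> x_val(input_dim, 1.0f);  // dummy input
        dynet::Expression x = dynet::input(cg, {input_dim}, x_val);
        dynet::Expression h = local.add_input(x);
        results[t] = dynet::as_scalar(cg.forward(dynet::sum_elems(h)));
      });
    }
    for (auto& w : workers) w.join();
    return 0;
  }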

@neubig (Contributor) commented Dec 20, 2017

Thanks! This is highly appreciated. I haven't been able to make time for a careful review yet, but it's coming.

@danielhers (Collaborator) commented Feb 4, 2018

Multithreading support would be great, but even in a single thread, maintaining multiple computational graphs in parallel would help a lot, since it would enable model ensembling without having to reset the CG between querying different models (see the sketch below). That would be great too!
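A minimal sketch of that ensemble scenario, assuming the single-live-CG restriction is lifted as in this PR; the two linear models, the dimension, and the 0.5/0.5 averaging are illustrative only:

  #include <dynet/dynet.h>
  #include <dynet/expr.h>
  #include <dynet/init.h>

  #include <vector>

  int main(int argc, char** argv) {
    dynet::initialize(argc, argv);

    const unsigned dim = 4;
    // Two independent models, e.g. two members of an ensemble.
    dynet::ParameterCollection pc1, pc2;
    dynet::Parameter w1 = pc1.add_parameters({1, dim});
    dynet::Parameter w2 = pc2.add_parameters({1, dim});

    std::vector<float> x_val(dim, 1.0f);  // the input scored by both members

    // Both graphs stay alive at the same time, so neither model has to be
    // rebuilt after resetting a single global CG.
    dynet::ComputationGraph cg1;
    dynet::Expression s1 = dynet::parameter(cg1, w1) * dynet::input(cg1, {dim}, x_val);

    dynet::ComputationGraph cg2;
    dynet::Expression s2 = dynet::parameter(cg2, w2) * dynet::input(cg2, {dim}, x_val);

    float ensemble = 0.5f * (dynet::as_scalar(cg1.forward(s1)) +
                             dynet::as_scalar(cg2.forward(s2)));
    (void)ensemble;  // use the averaged score downstream
    return 0;
  }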

@neubig (Contributor) commented Mar 23, 2018

Thanks again for contributing, and I'm super-sorry for taking so long to get to this! Here are a few comments/questions:

  1. DynamicCPUMemoryPool: If I understand correctly, this is a direct parallel to InternalMemoryPool, correct? If so, shall we call them DynamicMemoryPool and StaticMemoryPool?
  2. I've been wondering about this for a while: do you know how much going from static to dynamic memory allocation hurts performance?
  3. Has this been tested for compatibility with multi-device support?
  4. Typo: dynet-dynemic-mem.
  5. It seems this is missing checks for unsupported cases: dynamic-mem + autobatch, etc.
  6. Could you rebase to the current master one more time? Things have gotten out of sync. I'll make sure that this can get incorporated before things get out of sync again, so I promise this will be the last rebase!

@neubig (Contributor) commented Mar 30, 2018

@oir FYI: If you're busy and don't have time to handle this, we can pick things up and do the rest on our side. Of course, if you're willing to help, we'll be happy to have you.

@oir (Author) commented Apr 2, 2018

@neubig Hey! Sorry for the late response; I am willing to pick this up. I will go through your comments and address them, and attempt a rebase, hopefully soon.

neubig mentioned this pull request Apr 20, 2018
@oir (Author) commented May 30, 2018

@neubig To keep you updated: this week I am starting to look at this again (possibly alongside NAACL). We have noticed another minor issue with the PR that needs to be fixed (guarding shared parameter pools); the fix will also be part of this PR after I do the rebase.

@neubig (Contributor) commented Jun 15, 2018

@oir Great, thanks!

@kwalcock (Contributor) commented Jun 8, 2020

Has any progress been made on this in the past two years?

@MihaiSurdeanu commented:
Hi @neubig and @oir: can you please let us know the status of this PR? We are very interested in the same issue. Can we help?

@neubig (Contributor) commented Jun 9, 2020

If my comments above could be addressed I'd be happy to merge a PR!

@kwalcock (Contributor) commented Jul 1, 2020

Please see also oir#1. It doesn't appear that the modifications are sufficient.

  for (size_t t = 0; t < 4; ++t) { threads[t].join(); }
  for (size_t t = 0; t < 4; ++t) {
    for (size_t i = 1; i < results[t].size(); ++i) {
      BOOST_CHECK_CLOSE(results[t][0], results[t][i], 0.0001);
This code never runs because results[t].size() is always 1. Thread safety is never tested (unless the test crashes, which it does).

@kwalcock (Contributor) commented Jul 24, 2020

The line

  BOOST_CHECK_CLOSE(results[t][0], results[t][i], 0.0001);

which is supposed to check whether the multithreading functionality works, never seems to be executed. In the line just before it,

  for (size_t i = 1; i < results[t].size(); ++i) {

results[t].size() is always 1, so the loop is never entered.

If the code is changed to

  for (size_t t = 1; t < threadCount; ++t) {
    BOOST_CHECK_CLOSE(results[0][0], results[t][0], 0.0001);
  }

the check will pass when the threads are all processed serially, which is not much of a surprise. When they are processed in parallel, the test crashes and the check is never performed.

The PR contains some promising code, but it does not appear to be usable/correct. It should not be merged.
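For concreteness, here is a self-contained sketch of the cross-thread comparison described above; compute_result, the thread count, and the constant it returns are placeholders for running the actual dynet model in each thread:

  #define BOOST_TEST_MODULE threadsafe_sketch
  #include <boost/test/included/unit_test.hpp>

  #include <thread>
  #include <vector>

  // Placeholder for the real per-thread work: the actual test would build a
  // ComputationGraph, run the copied builder, and return its output here.
  static float compute_result(size_t /*thread_id*/) { return 1.0f; }

  BOOST_AUTO_TEST_CASE(multithreaded_results_agree) {
    const size_t threadCount = 4;
    std::vector<float> results(threadCount);

    std::vector<std::thread> threads;
    for (size_t t = 0; t < threadCount; ++t)
      threads.emplace_back([&results, t]() { results[t] = compute_result(t); });
    for (auto& th : threads) th.join();  // join before checking anything

    // Compare across threads rather than within a single thread's size-1 results.
    for (size_t t = 1; t < threadCount; ++t)
      BOOST_CHECK_CLOSE(results[0], results[t], 0.0001);
  }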
