[CPP] Invented a parallel algorithm based on seidel implementation. #163
base: develop
Conversation
Cool. Thanks for contributing @JS00000. I will try to get around to reviewing the code quickly.
Do you have any numbers/graphs demonstrating the new algorithm?
Not yet. The idea of this parallel optimization is very simple: separate the parts of the serial controllable-set computation that have no dependencies on each other, then process them in parallel.
Thanks for raising this point. I need a bit more information to understand your changes.
- Duplicating the API to achieve parallelization is a bit excessive. The current API does not support parallelization. I would be in favor of updating the API so there is only one API that supports parallelization, rather than changing it opportunistically.
- I may be overlooking something, but Seidel and SeidelParallel seem to have a lot of code in common. Is this code duplication necessary? This is bad because:
  - it makes code review hard because differences are hidden,
  - it makes maintenance harder.

From what I understand, you want to store the result as an attribute of the class. I guess you also removed the reordering of the LP constraints. Am I missing something?
I also made a few minor comments.
initialize();
size_t en = GetCurrentTimeUsec();
printf("time of init: %fms\n", (en - be)/1000.0);
be = en;
Can you replace printf by TOPPRA_LOG_DEBUG, here and everywhere in the code?
Sure.
struct timeval tv;
gettimeofday(&tv, NULL);
return ((unsigned long long)tv.tv_sec * 1000000 + (unsigned long long)tv.tv_usec);
}
This is a C-style way of measuring time. Although valid, I would be in favor of a more C++ style (e.g. https://www.techiedelight.com/measure-elapsed-time-program-chrono-library). I think std::chrono is also more portable.
OK, it's a good suggestion.
return res;
}

constexpr value_type infi = 1e6;
This infinity is definitely too low and I don't think this is actually necessary. What issue did you get that made you set this ?
If I set infi to a very large number, it causes numerical problems. This may be a limitation of the algorithm itself. In fact, the original Seidel algorithm also set an upper limit of 100, which is much smaller than the limit here.
In the C++ implementation, I don't think there is any such limit. Are you talking about the limit discussed in #156?
There is no good reason for this limit, and it has been modifiable by the user since #160.
Through this API, I changed the bound to [ 0, +inf ] in my code and it works well on my use cases.
You are right. On the original develop branch, setting m_initXBound to [ 0, +inf ] works well, but it does not work on this parallel branch. This is also a limitation of the parallel algorithm.
There are two variables, u and x. In principle, the bound on x (m_initXBound on the original develop branch) should be [ 0, +inf ], and the bound on u should be [ -inf, +inf ].
However, after a series of incomplete tests, I found that on this parallel branch the bound on x can be set to [ 0, +inf ], but the bound on u must be [ -1e13, +1e13 ] or tighter; otherwise the parallel algorithm produces large numerical errors.
The reason is that the order of the constraints in the LP is changed in this parallel algorithm. In order to process unrelated constraints in parallel, I put the two constraints corresponding to xNext at the end. This leads to a high degree of linear correlation between the constraints during the first and last stages of parallel processing, and the variables u and x can differ by up to 15 orders of magnitude. That is enough for the rounding error of "double" arithmetic to affect the result.
So I set infi to 1e6. It could be set larger, of course, though a more complete test would be needed. A smaller infi may also speed up the computation, and it is still sufficient for real-world data.
I will try to optimize the code structure, which will take a while.
The API modification can be discussed before you actually make it. I will let @hungpham2511 give his opinion on this.
To be honest, I believe there is not much gain to be had from parallelizing the algorithm. At best, solving the LPs is two times faster. Including the time taken by the other parts, i.e. calculating the coefficients, the total speedup is probably less than that.
After going through the change, I am actually confused about where the parallelism is implemented and how it helps. Could you explain briefly, @JS00000? In particular, is it the Seidel solver that you are improving, or the high-level steps of the TOPPRA algorithm?
I think only by understanding how parallelism is used can we come up with a suitable set of APIs.
cpp/CMakeLists.txt
Outdated
@@ -17,7 +17,7 @@ option(TOPPRA_DEBUG_ON "Set logging mode to on." OFF)
 option(TOPPRA_WARN_ON "Enable warning messages." ON)
 option(OPT_MSGPACK "Serialization using msgpack " OFF)
 # Bindings
-option(PYTHON_BINDINGS "Build bindings for Python" ON)
+option(PYTHON_BINDINGS "Build bindings for Python" OFF)
Please don't change the default flag needlessly. You can always use -DPython_BINDINGS=OFF during compilation.
OK.
@@ -47,6 +47,13 @@ if(BUILD_WITH_GLPK)
   message(STATUS "Found glpk ${GLPK_LIBRARY} ${GLPK_INCLUDE_DIR}")
 endif(BUILD_WITH_GLPK)

+find_package(OpenMP REQUIRED)
It's better to have a BUILD_WITH_OPENMP flag and find OpenMP only if it is switched on. Otherwise this breaks compilation.
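A guarded dependency along these lines could look like the following sketch (the flag name BUILD_WITH_OPENMP is the reviewer's suggestion, not existing toppra code):

```cmake
# Only look for OpenMP when the user opts in, so builds without an
# OpenMP toolchain are unaffected.
option(BUILD_WITH_OPENMP "Enable the OpenMP-parallel Seidel solver" OFF)
if(BUILD_WITH_OPENMP)
  find_package(OpenMP REQUIRED)
endif()
```

The parallel solver sources and the OpenMP::OpenMP_CXX link dependency would then also be wrapped in the same condition.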
OK.
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < m_N; i++) {
  // solver_ret = m_solver->solveStagewiseBatch(i, g_upper, solution_upper[i]);
  solver_ret = m_solver->solveStagewiseBatch(i, g_upper);
This doesn't look right to me. Are you sure that this routine returns the correct controllable sets?
No, it just stores the half-finished solution in the solver.
And this is actually where the parallelism is implemented.
Yes. In my tests, the total speed is at most 2 times faster than the original Seidel solver, but still slower than the qpOASES solver.
Fixes #
PathParametrizationAlgorithm::computeFeasibleSets() now can pass the test.
Changes in this PR:
Checklists: