About rolling_online_management again #572

Closed
yieldbook opened this issue Aug 19, 2021 · 11 comments
Labels
question Further information is requested stale

Comments

@yieldbook

❓ Questions and Help

If I set rolling_step = 20, there are more than 100 tasks to run, and initializing the data takes more than 10 minutes per task (some processors are really time-consuming). Do you have any suggestions for speeding up the data processing? Can these tasks run concurrently?
Thanks a lot!
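
To make the scale concrete, here is a minimal sketch of how a rolling step turns a calendar into a list of tasks. It is not Qlib's rolling generator: `rolling_windows`, the calendar length, and the train/test lengths are all hypothetical values chosen for illustration.

```python
def rolling_windows(n_days, train_len, test_len, step):
    """Return (train_start, train_end, test_start, test_end) index tuples.

    Indices are positions in a trading calendar; all ends are inclusive.
    """
    windows = []
    train_start = 0
    # Stop once a full train + test window no longer fits in the calendar.
    while train_start + train_len + test_len <= n_days:
        train_end = train_start + train_len - 1
        test_start = train_end + 1
        test_end = test_start + test_len - 1
        windows.append((train_start, train_end, test_start, test_end))
        train_start += step
    return windows


# Hypothetical sizes: about ten years of trading days, rolled every 20 days.
windows = rolling_windows(n_days=2520, train_len=500, test_len=20, step=20)
print(len(windows))  # 101
```

With these assumed sizes, 101 tasks at ten minutes of data initialization each would cost roughly 17 hours when run one after another, which is why both caching and concurrency matter here.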


@yieldbook yieldbook added the question Further information is requested label Aug 19, 2021
@you-n-g
Collaborator

you-n-g commented Aug 22, 2021

For speeding up the data processing, you can refer to this issue

@yieldbook
Author

Thank you for the suggestion.
But from time to time I need to change the processors or use different columns, and then I have to train the models again. That takes a long time when there are many models. I noticed some code related to multiprocessing, such as "force_release" and the worker() function. What should I do to make these models run concurrently?

@you-n-g
Collaborator

you-n-g commented Aug 29, 2021

@yieldbook
We have developed a task management module; you can refer to the docs.
With it, you can create a task pool and run multiple workers on different machines.
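
The pool-and-workers idea can be sketched on a single machine with the Python standard library. This is only an illustration of the pattern, not Qlib's task management API: `train_one`, `run_tasks`, and the task dictionaries are hypothetical names.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed


def train_one(task):
    """Hypothetical stand-in for training a single rolling task."""
    return {"task_id": task["task_id"], "status": "done"}


def run_tasks(tasks, worker, executor: Executor):
    """Submit every task to the executor and collect results by task id."""
    futures = {executor.submit(worker, task): task["task_id"] for task in tasks}
    results = {}
    for future in as_completed(futures):
        # future.result() re-raises any exception raised inside the worker.
        results[futures[future]] = future.result()
    return results


tasks = [{"task_id": i} for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_tasks(tasks, train_one, pool)
print(sorted(results))
```

Because `run_tasks` accepts any `concurrent.futures.Executor`, a `ProcessPoolExecutor` can be passed instead when the work is CPU-bound. Spreading workers across machines needs a shared task store, which is what the task management module mentioned above is for.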

@yieldbook
Author

> For speeding up the data processing, you can refer to this issue

I'm not good at coding, so please excuse the basic questions.
Ideally, the processors should process the data just once, and every rolling task should then use the same processed data with updated segments.
I used the to_pickle function to save the dataset in the first loop, but how can I update the segments of the dataset in the next loop?
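
The "process once, re-segment per loop" idea can be sketched in plain Python. This does not use Qlib's dataset classes: `expensive_process`, `select_segments`, and the toy data are hypothetical, and the processing is assumed not to depend on the segment boundaries.

```python
from datetime import date


def expensive_process(raw):
    """Hypothetical stand-in for the slow processors: runs once over all rows."""
    return {day: value * 2.0 for day, value in raw.items()}


def select_segments(processed, segments):
    """Slice already-processed data by named (start, end) date ranges, inclusive."""
    return {
        name: {day: v for day, v in processed.items() if start <= day <= end}
        for name, (start, end) in segments.items()
    }


raw = {date(2021, 1, d): float(d) for d in range(1, 11)}
processed = expensive_process(raw)  # done once, outside the rolling loop

rolling_segments = [
    {"train": (date(2021, 1, 1), date(2021, 1, 4)), "test": (date(2021, 1, 5), date(2021, 1, 6))},
    {"train": (date(2021, 1, 3), date(2021, 1, 6)), "test": (date(2021, 1, 7), date(2021, 1, 8))},
]
for segments in rolling_segments:
    data = select_segments(processed, segments)  # cheap: only slicing
    print(len(data["train"]), len(data["test"]))
```

This shortcut only holds when the processing is the same for every window. A processor that is fitted on the training segment, such as a normalizer, would still have to be refitted for each task.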

@you-n-g you-n-g self-assigned this Sep 15, 2021
@you-n-g
Collaborator

you-n-g commented Sep 17, 2021

@yieldbook We are drafting a demo that shows how to dump the processed data to disk to avoid duplicated data processing: #606
Please check whether the demo answers your question, and help review it.
Thanks :)
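
For readers without access to the demo, the general dump-to-disk pattern looks roughly like this. It is a sketch with the standard `pickle` module, not the code from #606: `load_or_build` and `build_processed_data` are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the cached object if the file exists, else build and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    obj = build()
    with cache_path.open("wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return obj


calls = []  # records how many times the slow step actually ran


def build_processed_data():
    """Hypothetical stand-in for the slow data processing step."""
    calls.append(1)
    return {"feature": [1.0, 2.0, 3.0]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "processed.pkl"
    first = load_or_build(path, build_processed_data)   # builds, then writes
    second = load_or_build(path, build_processed_data)  # reads from disk
```

The cache file has to be deleted or renamed whenever the processors or columns change, since the path alone does not capture the configuration.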

@yieldbook
Author

@you-n-g Thanks a lot for the demo. It's much faster now, but dumping the handler to disk and loading it again is still too slow, especially when the handler is huge. It would be better to keep the handler in memory and update the segments in every loop; that should be really efficient.

@Wangwuyi123
Collaborator

@yieldbook We updated the demo to show how to keep the processed data in memory to reduce disk I/O: #606

Please check whether the demo answers your question, and help review it.
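
Keeping the processed object in memory can be sketched as a small cache keyed by configuration. Again, this is not the code from #606: `MemoryCache`, `build_handler`, and the config fields are hypothetical.

```python
import json


class MemoryCache:
    """Keep built objects in a dict, keyed by a canonical form of their config."""

    def __init__(self):
        self._store = {}

    def get(self, config, build):
        # Sorting keys makes equal configs map to the same cache entry.
        key = json.dumps(config, sort_keys=True)
        if key not in self._store:
            self._store[key] = build(config)
        return self._store[key]


build_calls = []  # records each time the expensive build actually ran


def build_handler(config):
    """Hypothetical stand-in for constructing an expensive data handler."""
    build_calls.append(config)
    return {"columns": list(config["cols"])}


cache = MemoryCache()
handler_config = {"cols": ["open", "close"], "processors": ["fillna"]}
for _ in range(3):  # e.g. three rolling tasks sharing one handler
    handler = cache.get(handler_config, build_handler)
print(len(build_calls))  # 1
```

Keying on the configuration means a change of processors or columns triggers a rebuild automatically. The cache lives inside one process, so workers on other machines would each build their own copy.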

@yieldbook
Author

@Wangwuyi123 Thanks a lot. It's very helpful. I have a follow-up question. In the old backtest function, I could pass the pred_score generated from the rolling tasks directly to the backtest, but the new backtest function no longer accepts pred_score. How can I backtest the rolling tasks?
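
Whatever interface the backtest exposes, the per-task predictions usually have to be combined into one signal first. The sketch below shows one way to do that under stated assumptions: predictions are plain dicts keyed by (date, instrument), and a later task wins where test periods overlap. `merge_predictions` and the sample scores are hypothetical, and no Qlib call is involved.

```python
def merge_predictions(rolling_preds):
    """Combine per-task {(date, instrument): score} dicts into one signal.

    Later tasks overwrite earlier ones where their test periods overlap.
    """
    merged = {}
    for preds in rolling_preds:
        merged.update(preds)
    # Sort by (date, instrument) so the signal is in chronological order.
    return dict(sorted(merged.items()))


task_1 = {("2021-01-04", "AAA"): 0.10, ("2021-01-05", "AAA"): 0.20}
task_2 = {("2021-01-05", "AAA"): 0.25, ("2021-01-06", "AAA"): 0.30}
signal = merge_predictions([task_1, task_2])
print(signal)
```

Letting the later task win is only one possible policy; averaging overlapping scores or rejecting overlaps outright are equally reasonable choices.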

@you-n-g
Collaborator

you-n-g commented Oct 8, 2021

@yieldbook
We will add some more user-friendly functions to the new backtest function soon.

@github-actions

github-actions bot commented Jan 6, 2022

This issue is stale because it has been open for three months with no activity. Remove the stale label or comment on the issue otherwise this will be closed in 5 days

@github-actions github-actions bot added the stale label Jan 6, 2022
@you-n-g
Collaborator

you-n-g commented Jan 7, 2022

@yieldbook
The new version of backtesting has a more user-friendly interface:
https://qlib.readthedocs.io/en/latest/component/strategy.html#running-backtest

@you-n-g you-n-g closed this as completed Jan 7, 2022