
Added cache to store the yearly schedule #8

Closed
wants to merge 1 commit into from

Conversation

jamontes
Contributor

Hi,

This PR adds a cache for the yearly XML that is fetched on every add-on run. After trying several options, from using cPickle to store the ElementTree object to writing the raw XML data directly to disk, I decided to split the whole XML on a daily basis and write the raw serialized XML to one cache file per day. Since each resulting XML file is a little less than half the size of the whole year schedule, reading and re-parsing the raw XML for a specific day from disk is really fast, so in the end I didn't need the cPickle library for the cache feature.

As there is one cache file per day, I had to adapt the functions to load the data by day instead of by year, and the tree searches now take the day element as the root of the tree to parse.
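The per-day split described above could look roughly like the sketch below (file names, element names, and the cache location are assumptions for illustration, not the PR's actual code): each `<day>` element of the yearly schedule is serialized to its own file, and later runs re-parse only the requested day, with that day as the tree root.

```python
# Hypothetical sketch of the per-day cache: split the yearly schedule
# into one raw XML file per day, then parse only the requested day later.
import os
import xml.etree.ElementTree as ET

CACHE_DIR = "cache"  # hypothetical cache location

def write_day_caches(schedule_xml):
    """Serialize each <day> element of the yearly schedule to its own file."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    root = ET.fromstring(schedule_xml)
    for day in root.iter("day"):
        path = os.path.join(CACHE_DIR, "day-%s.xml" % day.get("index"))
        with open(path, "wb") as f:
            f.write(ET.tostring(day))

def load_day(index):
    """Re-parse one day's cached XML; the <day> element becomes the tree root."""
    path = os.path.join(CACHE_DIR, "day-%s.xml" % index)
    with open(path, "rb") as f:
        return ET.fromstring(f.read())
```

Because each file holds only one day, the parse cost on every navigation step drops to roughly half of re-parsing the whole year.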

The rest of the modifications in this PR, besides adding the year to the info tags and stripping some HTML tags from the abstract description in the show_room function, focus on improving the performance of loading the navigation menus throughout the add-on, which is especially noticeable on small hardware devices like the Raspberry Pi and other widespread SoCs.

The most astonishing thing for me, after making the cache changes and testing them on my Raspberry Pi model 2, was that the add-on still took around 10 seconds to load the first page with the years list, even though that page performs no operation to fetch the year schedule from the internet or from the cache files! At first I blamed the limited resources of the Raspberry Pi after recently upgrading to the latest Kodi release, but after comparing with other add-ons' behaviour I realized it had nothing to do with that: it was the time the add-on takes to load all its library dependencies before running. When I replaced the requests library with urllib2, the first page (which never fetches the info to show from the internet or from disk) loaded within one second.

With that idea in mind, I've tried to reduce the imported dependencies to the strictly required ones, and refactored the contains_videos function to return as soon as it finds a playable video, instead of letting it iterate down the whole tree until it has found all the video URLs. Since this function is called for every room and for every event inside each room, the change also improved the response time of parsing the XML. I'm very sorry to replace your really nice functional-style one-liner with a more primitive and less elegant version, but the time saved by this modification is impressive, and that's why I modified that function as well.
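The early-return idea could be sketched as follows (a hedged illustration only: the add-on's real element names and the set of video file extensions are assumptions). Because `iter()` walks the subtree lazily, returning on the first match skips the rest of the traversal entirely.

```python
# Hypothetical sketch of an early-return contains_videos: stop walking
# the subtree as soon as one playable video link is found, instead of
# collecting every video URL first and then testing the result.
import xml.etree.ElementTree as ET

def contains_videos(node):
    """Return True as soon as any descendant <link> points at a video file."""
    for link in node.iter("link"):           # lazy traversal of the subtree
        href = link.get("href", "")
        if href.endswith((".mp4", ".webm")):
            return True                      # bail out early
    return False
```

Called once per room and once per event, the difference between "first match" and "all matches" adds up quickly on slow hardware.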

As with the other PR, please do not hesitate to review the code and point out anything that should be modified to match your coding conventions and requirements.

Once again, thanks a lot for your effort and time, and best regards,

jamontes.

@jelly
Owner

jelly commented May 16, 2019

It would be nice if these were separate commits, making it easier to review. I wondered if there was an official mechanism for caching in Kodi.

@jelly
Owner

jelly commented May 16, 2019

Important note about caching btw: https://kodi.wiki/view/Caches_explained

@jelly
Owner

jelly commented May 16, 2019

@dagwieers
Contributor

dagwieers commented May 16, 2019

So we recently went through the same process when our plugin felt really slow. We started profiling our code to find out what was going on: add-ons/plugin.video.vrt.nu#169

We discovered that importing Requests took 4 seconds, and because of how Kodi plugins work this would be 4 seconds for every interaction.

From this profiling we also discovered that the use of additional libraries is costly for some of them, and importing them for everything (i.e. at the top of your plugin or libraries) when they are only used in very specific cases may not be worth the cost. So in the end we settled on the following changes (in order of importance):

  1. Enable reuselanguageinvoker in your addon.xml
  2. Reduce the number of libraries (especially the ones you can easily replace or that take longer to load)
  3. Import libraries where they are needed, not as boilerplate
  4. Replace Requests with urllib2/urllib (beware of Python 2/3 compatibility)

You can find some rationale (in Dutch) at: https://bitbucket.org/basrieter/xbmc-online-tv/issues/1159/plugin-traag-op-raspberry-pi#comment-51989436
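As a sketch of point 1 (the library file name and `provides` value here are examples, not this add-on's actual addon.xml), the invoker-reuse flag is a child of the plugin's extension point in addon.xml:

```xml
<!-- addon.xml: reuse the Python interpreter between plugin invocations -->
<extension point="xbmc.python.pluginsource" library="addon.py">
  <provides>video</provides>
  <reuselanguageinvoker>true</reuselanguageinvoker>
</extension>
```

With this enabled, Kodi keeps the Python invoker alive between calls, so the one-time import cost is not paid again on every interaction.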

In the meantime, the urllib3 issue has not been fixed (3 weeks now), as the maintainer is waiting for a partial fix rather than downgrading that library. (Even after the fix, importing urllib3 will still take 500 ms to 700 ms.) So moving away from urllib3/Requests was a good choice (only 250 ms). Unfortunately, all Raspberry Pi users are affected by most add-ons.

If you would rather keep using Requests (to keep things simple), you can opt to ship an older urllib3 yourself for the time being (as Retrospect opted to do). However, in that case I would probably bundle Requests together with urllib3.

@jamontes
Contributor Author

jamontes commented May 17, 2019

Hi dagwieers,

I totally agree with you about the additional time it takes on every add-on invocation to import all the libraries, especially requests. I made a lot of trials and modifications to find the hot spots and reduce the run time, and to find the best option. That's the reason I finally discarded importing the cPickle cache library in favour of storing the serialized XML data directly on disk, and that works fairly well, as you can see by testing directly on a limited device like the Raspberry Pi. On a modest PC (mine is 11 years old) loading a cached day menu is almost instantaneous, so even if an additional cache library could shave off some milliseconds, the current version of this PR avoids further dependence on future changes to those libraries (which have their own maturing life cycles), as you rightly pointed out in your comment.

To @jelly: please give the code a try as it is now and tell me your conclusions. This PR is just an evaluation proposal, so don't hesitate to make any modifications or amendments you wish until you are happy with them and consider them OK to merge, if you finally agree. Whatever you decide will be fine with me, don't worry about that.

Thanks to you both for the feedback, and best regards,

jamontes.

@jelly
Owner

jelly commented May 27, 2019

Thanks, I will look into it. It will take some time, but I still have this on my mind. I would probably want to split it up a bit and take some time to see if I can set up my old RPi 3 and test it out a bit.

@dagwieers
Contributor

We recently added local HTTP caching in our plugin too: add-ons/plugin.video.vrt.nu#258

This is twofold: we use it to improve performance, but also as a fallback mechanism when we get HTTP errors or temporary failures. Since we have to use web scraping for certain information gathering, and web scraping is prone to breaking when the website changes, if scraping fails we use the cached data indefinitely (i.e. without invalidating the cache).

It's turned on by default, but we allow users to disable it, or invalidate the current cache.
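The TTL-plus-fallback behaviour described above can be sketched roughly like this (the function name, file handling, and error policy are assumptions for illustration, not the plugin's real API): serve the cached copy while it is younger than its TTL, try the network once it has expired, and fall back to the stale copy if the download fails.

```python
# Hypothetical sketch of HTTP caching with a TTL and a stale-cache fallback.
import os
import time

def fetch_with_cache(path, ttl, download):
    """`download` is a callable returning bytes; it may raise on failure."""
    age = time.time() - os.path.getmtime(path) if os.path.exists(path) else None
    if age is not None and age < ttl:
        with open(path, "rb") as f:
            return f.read()          # cache hit: still within its TTL
    try:
        data = download()
    except Exception:
        if age is None:
            raise                    # no cached copy to fall back on
        with open(path, "rb") as f:
            return f.read()          # expired but still usable as a fallback
    with open(path, "wb") as f:
        f.write(data)                # refresh the cache on success
    return data
```

Keeping the expired file around instead of deleting it is exactly what makes the fallback possible.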

@jelly
Owner

jelly commented May 28, 2019

Interesting, does it also clean up its cache? That seems to be a major concern for Kodi developers.

@dagwieers
Contributor

@jelly What do you mean by "clean up its cache"? There is a TTL that we configure per type of file, so if the TTL has been exceeded a new version is downloaded when requested.

If you don't use the add-on, or you don't make any request that could be cached, the cache stays behind. We don't have a service that actively removes outdated files, because then they wouldn't be available as a fallback.

What would be the major concern of not actively removing caches? I cannot think of any good reason that is not already intrinsic to Kodi.

@jamontes
Contributor Author

About cleaning up the cache files, I would propose three options:

  • The simplest: leave the cache files as they are right now. The two cache files (one per day) are overwritten whenever a FOSDEM year is chosen; the add-on keeps exactly two files, one per day, regardless of which years the user selects, so they do not accumulate over the years watched. Together (i.e. the whole year schedule) they total under 2 MB, which does not starve resources even on very small systems like the RPi, and they definitely won't eat up disk space as the add-on is used over the years, so this could be a fair option.

  • The second option is to remove any existing cache files the first time the add-on runs (i.e. whenever it lists the last five years of FOSDEM events). Although it doesn't clean up the files when the user has finished using the add-on (i.e. there is no expiration time), it does some housekeeping anyway. With this option there is no need for a settings entry to remove the cache files on demand.

  • The third option is to set up a helper cleaning service that removes the cache files upon Kodi restart or a change of user profile. This would clean the cache files on a regular basis even if the add-on is not used for a while. The cleaning service is declared in the addon.xml file as an additional extension point of type "xbmc.service", which Kodi launches automatically by loading a Python file from the add-on directory, the same way it currently loads addon.py. How it works is described here: Service add-ons.
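The third option's declaration could be sketched in addon.xml like this (the service file name is hypothetical):

```xml
<!-- addon.xml: run service.py automatically alongside the plugin -->
<extension point="xbmc.service" library="service.py" />
```

Kodi would then start the service script at startup (and on profile changes), where it could remove any leftover cache files.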

@jelly: I would be happy to help with either of the latter two solutions if you like.

Best regards,

jamontes.

@dagwieers
Contributor

dagwieers commented May 31, 2019

Again, I don't see the point in removing the caches: a TTL is fine, and keeping the last working file as cache works as a fallback in case there is an access or parsing issue. That is what we do. Keep it simple.

If people don't want caches, they can disable it.

@jelly
Owner

jelly commented Aug 30, 2019

I should have fixed the main performance issue with release 0.0.4, which fixes the bug where requests tries to figure out the encoding of the XML document. This is now available as 0.0.4.

@dagwieers
Contributor

We also added a Refresh context menu so the user can refresh a listing on request; this bypasses the cache and updates it.

@jelly
Owner

jelly commented Aug 31, 2019

In 0.0.5 I've used reuselanguageinvoker and cached the XML fetched from FOSDEM's website. Fetching is also faster, and requests is no longer used, so it should take ~2 seconds to load the year-specific data. More optimisations are possible, but so far I think this is a pretty good improvement. Thanks @dagwieers and @jamontes for raising the issue and for the improvements.

@jelly jelly closed this Aug 31, 2019