--mtime slow (70s) for 400k messages #4

Open
yeled opened this issue Aug 28, 2016 · 7 comments

@yeled

yeled commented Aug 28, 2016

23:50 YELP-CHARLIE~[[email protected]] % keywsync -m /Users/charlie/Maildir/ -q 'not path:spodder/**' --mtime 1472423730 -k --no-replace-chars -d
** keyword <-> tag sync
replace chars: false
=> db: /Users/charlie/Maildir
=> direction: keyword-to-tag
=> query: not path:spodder/**
=> note: dryrun!
=> remove double x-keywords header: 1
mtime: only operating messages with mtime newer than: 2016-Aug-28 22:35:30
* db: current revision: 11562648
*  messages to check: 409777
*  query time: 654.258 ms.
=> done, checked: 409777 messages and changed: 0 messages (skipped: 409777) in 57847 ms [cpu], 74.9375 s [real time].

The large pause is after query time and then it completes.

The count of messages almost matches up:

0:09 YELP-CHARLIE~[[email protected]] % notmuch count 'not path:spodder/**'
409442

I run this regularly, and I have not modified 409,777 messages in the last 5 minutes.

@gauteh
Owner

gauteh commented Aug 29, 2016

Charlie Allom writes on august 29, 2016 1:15:

I run this regularly, and I have not modified 409,777 messages in the last 5 minutes.

Yes, I believe so. The skipped count is the number of messages that were
not processed because of the mtime check. I have less email than you and spend:

 => done, checked: 67008 messages and changed: 4 messages (skipped: 67001) in 3586.48 ms [cpu], 9.49658 s [real time].

for keywords -> tags

and

 => done, checked: 7 messages and changed: 3 messages (skipped: 0) in 14.887 ms [cpu], 0.0724021 s [real time].

for tags -> keywords.

I am not sure this can be fixed unless we can do some file query where
messages are sorted by mtime. Otherwise, we are not guaranteed that all
message changes are caught.
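
As a rough check on the figures quoted in this thread, the per-message cost of the two keyword-to-tag runs can be compared. This is only a sketch: the numbers are copied from the log lines above, and `ms_per_message` is a hypothetical helper, not part of keywsync.

```python
def ms_per_message(real_seconds, messages):
    """Average wall-clock cost per message, in milliseconds."""
    return real_seconds * 1000.0 / messages


# (real time in seconds, messages checked), taken from the logs in this thread
runs = {
    "yeled, keyword-to-tag": (74.9375, 409777),
    "gauteh, keywords -> tags": (9.49658, 67008),
}

for label, (seconds, count) in runs.items():
    print(f"{label}: {ms_per_message(seconds, count):.3f} ms per message")
```

Both runs land in the same range (under 0.2 ms per message), which is consistent with the run time growing with the total number of messages visited rather than with the number actually modified.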

@gauteh
Owner

gauteh commented Aug 29, 2016

ls uses readdir, stores all the files and sorts them. This might be
faster than going through notmuch and get_filename. But then we'd have to
use a directory and not a query to find the files that have been modified.
I guess this wouldn't be a big deal for most people.

Perhaps allow query to be replaced with --directory for keyword -> tag sync.
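
The directory-based idea could look roughly like the sketch below. It is an illustration in Python, not keywsync code: `files_modified_since` is a hypothetical name, and it assumes that every regular file under the given directory is a message file. It still stats every file, but it avoids a database lookup per message.

```python
import os


def files_modified_since(root, since):
    """Return sorted paths under root whose mtime is >= since (epoch seconds)."""
    modified = []
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Descend into subdirectories such as cur/ and new/
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # One stat per file; no notmuch query involved
                    if entry.stat(follow_symlinks=False).st_mtime >= since:
                        modified.append(entry.path)
    return sorted(modified)
```

With the values from the original report, a call would be `files_modified_since("/Users/charlie/Maildir", 1472423730)`. Only the returned files would then need their keywords compared against the tags in the database.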

@yeled
Author

yeled commented Aug 29, 2016

ah, so the changed: X messages is what has matched the >= mtime?

@gauteh
Owner

gauteh commented Aug 29, 2016

Charlie Allom writes on august 29, 2016 12:45:

ah, so the changed: X messages is what has matched the >= mtime?

Correct.

@gauteh
Owner

gauteh commented Aug 29, 2016

Charlie Allom writes on august 29, 2016 12:45:

ah, so the changed: X messages is what has matched the >= mtime?

No, sorry. The 'checked:' is the ones that matched. 'changed:' is the
checked ones that needed modification.

@gauteh changed the title from "is --mtime really working?" to "--mtime slow (70s) for 400k messages" on Aug 29, 2016
@gauteh
Owner

gauteh commented Mar 3, 2017 via email

@gauteh
Owner

gauteh commented Mar 7, 2017

https://github.com/gauteh/gmailieer (brand-new experimental) might do this better
