This change adds a new cleanup mode that avoids having cleanup
re-traverse the directories the index pass just looked at.
Additionally, it queries the Xapian database more efficiently by
walking the term list instead of doing many point-wise path lookups.

I'd noticed that most of the time in mu's cleanup pass went to B-tree
lookups in Xapian (one 8KB pread64 at a time). Each point lookup
forced Xapian to descend from the root of the B-tree to a leaf, once
for every single message. On top of that, to join on the message path,
we had to do *another* B-tree traversal after locating each message
term. Now we just walk the terms in order, which is much more
efficient because we touch each B-tree node only once.
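
The ordered walk can be sketched with a sorted std::map standing in for
the B-tree. Positioning at the first matching term costs one descent.
After that, each term is reached by advancing the iterator, and each
entry yields the path and its document together, so no second lookup
is needed to join them. The "P:" prefix, the TermList type, and the
assumption that each path term belongs to exactly one message are
hypothetical. Xapian's Database::allterms_begin(prefix) provides a
similar prefix-bounded, in-order iteration over a real database.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical term list: each path term maps to the id of the single
// message that carries it. std::map keeps terms sorted, like B-tree leaves.
using TermList = std::map<std::string, unsigned>;

// Visit every term that starts with `prefix`, in sorted order: one
// lower_bound to position, then sequential iteration instead of a fresh
// root-to-leaf lookup per message.
std::vector<std::pair<std::string, unsigned>>
walk_prefix(const TermList& terms, const std::string& prefix) {
    std::vector<std::pair<std::string, unsigned>> out;
    for (auto it = terms.lower_bound(prefix);
         it != terms.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        // Strip the prefix so the caller gets the bare path and its docid.
        out.emplace_back(it->first.substr(prefix.size()), it->second);
    }
    return out;
}
```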

On my system, with 1,371,861 messages in total, a full mu index run
(no lazy check) takes:

  --nocleanup:          3.6s
  incremental cleanup:  4.2s (0.6s in cleanup)
  legacy cleanup:       5.2s (1.6s in cleanup)

With the new mode, cleanup drops from 1.6s to 0.6s, saving 1.0s, so
cleanup is ~63% faster.

The incremental cleanup works even better with lazy checking. If I
enable --lazy-check, dirty only my INBOX (360,778 messages), and run
mu index, I get:

  --nocleanup:          0.9s
  incremental cleanup:  1.1s (0.2s in cleanup)
  legacy cleanup:       2.5s (1.6s in cleanup)

Here we save 1.4s of the 1.6s cleanup, an ~88% reduction.
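
The percentages in both runs come from the same arithmetic: each
cleanup cost is the total index time minus the --nocleanup time, and
the saving is expressed as a fraction of the legacy cleanup cost.
cleanup_savings is a hypothetical helper written only to make that
calculation explicit.

```cpp
// Fraction of the legacy cleanup time that the incremental mode saves.
// Each cleanup cost is (total index time) - (--nocleanup time).
double cleanup_savings(double nocleanup, double incremental, double legacy) {
    const double legacy_cleanup = legacy - nocleanup;
    const double incremental_cleanup = incremental - nocleanup;
    return (legacy_cleanup - incremental_cleanup) / legacy_cleanup;
}
```

For the two runs above, cleanup_savings(3.6, 4.2, 5.2) is about 0.625
and cleanup_savings(0.9, 1.1, 2.5) is about 0.875, i.e. the ~63% and
~88% figures.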

This change also fixes a timestamp bug: we should store the *start*
time of the index pass in the metadata, not the end time, so that the
next index pass notices messages that arrived between the two times.
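
The failure mode can be shown with a toy model, assuming the next pass
treats anything that arrived at or after the stored timestamp as new.
PassTimes, stored_timestamp and next_pass_notices are hypothetical
names, and times are plain integers. A message that lands at t=105
during a pass running from t=100 to t=110 is assumed to have been
missed by that pass; only the start-time variant lets the next pass
pick it up.

```cpp
#include <cstdint>

using Time = std::int64_t;

// Hypothetical record of one index pass.
struct PassTimes {
    Time start;  // when the pass began scanning
    Time end;    // when the pass finished
};

// Value persisted in the metadata once a pass completes. Storing `end`
// (the old behaviour) hides anything that arrived while the pass ran.
Time stored_timestamp(const PassTimes& pass, bool use_start) {
    return use_start ? pass.start : pass.end;
}

// Will the next pass treat a message that arrived at `arrival` as new?
bool next_pass_notices(Time arrival, Time stored) {
    return arrival >= stored;
}
```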

All tests pass. You can set the environment variable
MU_NO_INCREMENTAL_CLEANUP to use the legacy cleanup path instead.
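
A sketch of how such an opt-out can be checked, assuming that merely
setting MU_NO_INCREMENTAL_CLEANUP (to any value) selects the legacy
path; the actual check in mu may differ. CleanupMode and
choose_cleanup_mode are hypothetical names.

```cpp
#include <cstdlib>

// Which cleanup path to run after indexing.
enum class CleanupMode { Incremental, Legacy };

// Assumption: presence of the variable, regardless of its value,
// selects the legacy cleanup; otherwise the incremental one is used.
CleanupMode choose_cleanup_mode() {
    return std::getenv("MU_NO_INCREMENTAL_CLEANUP") != nullptr
               ? CleanupMode::Legacy
               : CleanupMode::Incremental;
}
```

Under that assumption, a one-off legacy run from the shell would look
like: MU_NO_INCREMENTAL_CLEANUP=1 mu index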