MongoDB 3 Compression Options #1099

robodude666 · 2015-08-13T23:09:16Z

One of the new features added in MongoDB 3.0 is compression. Could these compression options please be supported, especially combined with GridFS for storing text files?

From the article on MongoDB's blog, it appears the WiredTiger storage engine is required for this functionality; I'm not sure whether MongoEngine supports it yet.

iici-gli · 2015-10-11T03:12:48Z

The compression can be configured in MongoDB start up options. Mongoengine does not need to do anything.
For example, in your config file:
storage:
  dbPath: "C:/mongodb3/db"
  engine: "wiredTiger"

lafrech · 2016-02-22T08:37:45Z

I'm not familiar with this, so I may be wrong, but these compression options can be set at the collection level, as described in the article linked by @robodude666. To use them, one needs to pass specific options through kwargs (see also this SO question) in pymongo's create_collection.
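
For illustration, here is a minimal sketch of what passing those options through pymongo might look like. It assumes the `storageEngine` option of the MongoDB `create` command with a WiredTiger `configString`; the helper names `wiredtiger_storage_options` and `create_compressed_collection` are hypothetical, and the list of compressors reflects what I believe MongoDB 3.0 shipped with (`none`, `snappy`, `zlib`).

```python
def wiredtiger_storage_options(compressor="zlib"):
    """Build the `storageEngine` option for the MongoDB `create` command.

    Hypothetical helper; the allowed set is an assumption based on the
    block compressors available in MongoDB 3.0.
    """
    allowed = {"none", "snappy", "zlib"}
    if compressor not in allowed:
        raise ValueError("unsupported block compressor: %r" % (compressor,))
    return {"wiredTiger": {"configString": "block_compressor=" + compressor}}


def create_compressed_collection(db, name, compressor="zlib"):
    """Create a collection with a per-collection block compressor.

    `db` is expected to be a pymongo Database. Extra keyword arguments to
    `create_collection` are forwarded as options of the `create` command.
    """
    return db.create_collection(
        name, storageEngine=wiredtiger_storage_options(compressor)
    )
```

Usage would be something like `create_compressed_collection(client["mydb"], "articles")`, where `client` is a `pymongo.MongoClient` connected to a server running WiredTiger. The setting only applies to a collection at creation time, so an existing collection would need to be recreated.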

It would make sense to expose these options in MongoEngine and from a quick glance the meta attribute of the collection seems like an appropriate place. If anyone is willing to propose some code, I think it would be an interesting feature indeed.

amcgregor · 2019-07-11T20:24:08Z

You don't have control over the construction of GridFS collections, either the file-tracking one or the one containing the actual chunks. That leaves such configuration to manual effort or server-wide configuration, as was previously pointed out. Additionally, MongoDB's in-database compression defaults to Snappy, for performance reasons, or lets you use fast zlib, neither of which offers worthwhile compression ratios. (Zlib is a typical dictionary-based Huffman coder; Snappy uses no entropy coding at all, relying instead on repetitions described by relative references in the output stream, so at worst it's literally 100% worse than gzip. More akin to RLE. ;)

On Lewis Carroll's "Through the Looking Glass" (Project Gutenberg txt edition), which should be highly compressible, "fast gzip" (-2) reduces the 168K source material to 70K. A 58.2% reduction is nothing to sneeze at. (Snappy, using gross estimates and comparisons to gzip, would get around 23% reduction on this file.)

Compare that to something a bit more… modern… like xz… given room to work (compression level -9). Same source material: 54K result. A 68% reduction.
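
To reproduce this kind of comparison on your own files, a rough sketch using only the Python standard library could look like the following. It assumes `zlib.compress(data, 2)` is a fair stand-in for "fast gzip" (same DEFLATE algorithm, slightly different container) and `lzma.compress(data, preset=9)` for `xz -9`; Snappy is not in the standard library, so it is left out. The function names are made up for this example.

```python
import lzma
import zlib


def reduction_percent(original_size, compressed_size):
    """Percentage by which compression shrank the input."""
    return 100.0 * (1 - compressed_size / original_size)


def compare_codecs(data):
    """Return the size reduction (in percent) per codec for `data` (bytes)."""
    sizes = {
        "deflate level 2": len(zlib.compress(data, 2)),
        "xz preset 9": len(lzma.compress(data, preset=9)),
    }
    return {
        name: reduction_percent(len(data), size)
        for name, size in sizes.items()
    }
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `compare_codecs` gives figures comparable to the ones quoted above; exact numbers will differ a little from the command-line tools because of header overhead.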

Conclusion: compress material before archiving it into GridFS; WiredTiger compression is intended for absolute speed and data mutability, not efficient archival. This is doubly important if you store mixed content in GridFS, such as including images, audio, or video alongside the text content. Any form of in-database compression would actually increase the size of the stored data, if it's already extremely tightly entropy coded as sound and video are.
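
A sketch of that approach, under some assumptions: `fs` is a `gridfs.GridFS` instance, xz via the standard `lzma` module is the chosen codec, and `compression` is a custom metadata field name invented for this example (GridFS stores extra keyword arguments to `put` on the file document). Content that doesn't shrink, such as already-encoded media, is stored unchanged.

```python
import lzma


def pack_for_archive(data):
    """Compress `data` (bytes) with xz, unless that would not save space.

    Returns (payload, compression), where compression is "xz" or "none".
    """
    packed = lzma.compress(data, preset=9)
    if len(packed) < len(data):
        return packed, "xz"
    # Already tightly coded input (images, audio, video) tends to grow
    # when compressed again, so keep the original bytes.
    return data, "none"


def unpack_from_archive(payload, compression):
    """Reverse `pack_for_archive` given the stored compression marker."""
    if compression == "xz":
        return lzma.decompress(payload)
    return payload


def archive(fs, data, filename):
    """Store `data` in GridFS, recording how it was packed.

    The `compression` keyword is a hypothetical custom field, not a
    GridFS feature; readers must check it before unpacking.
    """
    payload, compression = pack_for_archive(data)
    return fs.put(payload, filename=filename, compression=compression)
```

When reading back, the stored file's `compression` attribute tells you whether to call `lzma.decompress` on the bytes returned by `read()`.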

lafrech added the Enhancement label Feb 22, 2016

amcgregor mentioned this issue Jun 23, 2016

Migrate away from MongoEngine. marrow/contentment#12

Open
