MongoDB – Internals & Performance

Storage Engine

Starting with version 3.0, MongoDB adopted a pluggable architecture that lets you choose the storage engine. A storage engine is the part of a database responsible for managing how data is stored, both in memory and on disk.

Different engines perform better for specific workloads: one storage engine might offer better performance for read-heavy workloads, while another might support higher throughput for write operations.

MMAPv1 – the original MongoDB storage engine and the default storage engine for MongoDB versions before 3.2. It maps the data files directly into virtual memory, allowing the operating system to do most of the work of the storage engine.

WiredTiger – default storage engine starting in MongoDB 3.2. Details can be found here.

The storage engine determines:

  • data format – different storage engines can implement different types of compression and different ways of storing the BSON for MongoDB
  • format of indexes – indexes are controlled by the storage engine. MongoDB uses B-trees. With MongoDB 3.0, WiredTiger uses B+ trees, with other formats expected to come in later releases.

To retrieve the current storage engine, run db.serverStatus().storageEngine in the mongo shell.

MMAPv1

MMAPv1 uses the mmap Unix system call, which maps files on disk directly into the virtual memory space. This lets the data files be treated as if they were already in memory.
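To make the idea concrete, here is a minimal sketch using Python's standard mmap module. It is not MMAPv1 code; the file, the offset and the helper name patch_file_via_mmap are made up for illustration. It only shows that once a file is mapped, changing it looks like an ordinary memory write, and the operating system takes care of getting the bytes to disk.

```python
import mmap
import os
import tempfile

def patch_file_via_mmap(path, offset, payload):
    """Overwrite bytes in a file through a memory mapping instead of write()."""
    with open(path, "r+b") as f:
        # Map the whole file into this process's virtual address space.
        with mmap.mmap(f.fileno(), 0) as mapped:
            # An ordinary slice assignment: no explicit file I/O call here.
            mapped[offset:offset + len(payload)] = payload
            # Ask the OS to write the dirty pages back to the file.
            mapped.flush()

# Hypothetical 4 KiB "data file" filled with zero bytes.
fd, data_path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)
os.close(fd)

patch_file_via_mmap(data_path, 128, b"hello")

# Read the file back through the normal file API to confirm the change landed.
with open(data_path, "rb") as f:
    f.seek(128)
    patched = f.read(5)
os.remove(data_path)
```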

MMAPv1 provides collection-level locking starting with MongoDB 3.0, compared to database-level locking in v2.2 through v2.6. MongoDB implements multiple-reader, single-writer locks.

By using a journal (a write-ahead log), MongoDB ensures consistency of the data. With a journal, you first write down what you are about to do, then you do it. So if a failure occurs while data is being flushed to disk with fsync(), the storage engine is not left with a half-applied update: on restart it can use the journal to return to a consistent state.
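The "write what you are about to do, then do it" idea can be sketched in a few lines of Python. This is a toy key-value store, not MongoDB's journal format: the class name JournaledStore, the JSON-lines layout and the file name are all assumptions made for the example.

```python
import json
import os
import tempfile

class JournaledStore:
    """Toy key-value store: log the intent durably, then apply it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, re-apply every complete journal entry.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, "r", encoding="utf-8") as journal:
            for line in journal:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final entry left by a crash: ignore it
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # 1. Write what we are about to do and force it to disk.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        # 2. Only then do it.
        self.data[key] = value

journal_file = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(journal_file)
store.put("a", 1)
store.put("b", 2)

# Simulate a crash: the in-memory state is lost, the journal survives.
recovered = JournaledStore(journal_file)
```

After the simulated crash, recovered.data holds both writes again, because each one reached the journal before it was applied.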

See Journaling for more information about the journal in MongoDB.

By default, MongoDB uses Power of 2 Sized Allocations so that every document in MongoDB is stored in a record which contains the document itself and extra space, or padding. Padding allows the document to grow as the result of updates while minimizing the likelihood of reallocation.
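As a rough illustration of how the padding comes about, the sketch below rounds a document size up to the next power of two. The function names and the 32-byte minimum are assumptions for the example, and it ignores record headers and any special handling of very large documents, so treat the numbers as indicative only.

```python
def record_size(document_bytes, minimum=32):
    """Round a document size up to the next power of two (at least `minimum`)."""
    size = minimum  # assumed smallest record size, for illustration
    while size < document_bytes:
        size *= 2
    return size

def padding(document_bytes):
    """Extra space left in the record for the document to grow in place."""
    return record_size(document_bytes) - document_bytes

for doc in (20, 100, 1000, 1025):
    print(doc, record_size(doc), padding(doc))
```

A 1000-byte document gets a 1024-byte record and can grow by 24 bytes in place, while a 1025-byte document jumps to a 2048-byte record.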

WiredTiger

The WiredTiger storage engine is the first pluggable storage engine and brings a few new features to MongoDB:

  • Document-level locking – a good concurrency protocol. Writes can technically proceed without lock contention and scale with the number of threads (assuming no concurrent updates to the same document, and with the number of threads limited to the number of cores).
  • Compression
  • It avoids some pitfalls of MMAPv1.
  • Big performance gains

To switch MongoDB to WiredTiger, simply start mongod with the --storageEngine wiredTiger option.

Please be aware that the data directory (/data/db/ by default) must not contain any existing MMAPv1 databases.

WiredTiger stores data on disk in B-trees, similar to the B-trees that MMAPv1 uses for indexes. New writes are initially kept separate, written to unused regions of the files, and incorporated later in the background.

During an update, WiredTiger writes a new version of the document rather than overwriting existing data. So you don’t need to worry about document moves or the padding factor.

WiredTiger provides two caches:

  • WiredTiger Cache (WT Cache) –  half of your RAM (default)
  • File System Cache (FS Cache)

Checkpoints – act as recovery points and handle the “transfer” of data from the WT Cache to the FS Cache and then to disk. WiredTiger initiates a new checkpoint 60 seconds after the end of the last checkpoint. Each checkpoint is a consistent snapshot of your data. During the write of a new checkpoint, the previous checkpoint is still valid. As such, even if MongoDB terminates or encounters an error while writing a new checkpoint, upon restart, MongoDB can recover from the last valid checkpoint.

Compression – since WiredTiger has its own cache, and since the data in the WT Cache doesn’t have to be in the same format as in the FS Cache, WiredTiger allows three compression options:
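The property that the previous checkpoint stays valid while a new one is being written can be illustrated with a small Python sketch. This is not how WiredTiger implements checkpoints internally; it uses a common write-to-a-temp-file-then-rename technique as a stand-in, and the function names and JSON format are assumptions for the example.

```python
import json
import os
import tempfile

def write_checkpoint(state, checkpoint_path):
    """Persist a snapshot without ever invalidating the previous one."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Atomic swap: a reader sees either the old or the new checkpoint.
    os.replace(tmp_path, checkpoint_path)

def recover(checkpoint_path):
    """Load the last checkpoint that was completely written."""
    with open(checkpoint_path, "r", encoding="utf-8") as f:
        return json.load(f)

checkpoint_file = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
write_checkpoint({"x": 1}, checkpoint_file)

# Simulate a crash partway through the next checkpoint:
# only the temporary file has been touched, and it is incomplete.
with open(checkpoint_file + ".tmp", "w", encoding="utf-8") as f:
    f.write('{"x": 2, "y"')

recovered_state = recover(checkpoint_file)
```

Recovery ignores the half-written file and returns the state from the last complete checkpoint.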

  • Snappy (default) – fast
  • zlib – more compression
  • none
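To get a feel for the speed versus size trade-off, here is a small sketch using Python's standard zlib module. Snappy is not in the Python standard library, so a low zlib level stands in for the "fast" end of the scale; that substitution, and the sample documents, are assumptions for the example and say nothing about the ratios WiredTiger achieves on real data.

```python
import json
import zlib

# Hypothetical, repetitive documents standing in for a collection's data.
docs = [{"user": "user%d" % i, "status": "active", "country": "US"}
        for i in range(1000)]
raw = json.dumps(docs).encode("utf-8")

fast = zlib.compress(raw, 1)   # low effort: quicker, typically larger output
small = zlib.compress(raw, 9)  # high effort: slower, typically smaller output

# Uncompressed size, then the two compressed sizes, in bytes.
print(len(raw), len(fast), len(small))
```

Repetitive field names and values, which are common in document collections, are exactly what makes this kind of data compress well.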

Additional links:

  1. https://docs.mongodb.org/manual/storage/
  2. http://www.wiredtiger.com/