Pages

Jun 28, 2021 pattern bucket paging mongo

Purpose

The purpose of this document is to learn about MongoDB pagination techniques for serving data faster, based on the bucket-pattern series on the MongoDB blog (see References).

Use Case

Suppose you need a page that shows the stocks a person has bought and sold.

| Ticker | Type   | Quantity | Date                     |
| ------ | ------ | ---------|--------------------------|
| AAPL   | Buy    | 50       | 2021-06-25 11:01:38.451Z |
| MDB    | Buy    | 21       | 2021-06-25 11:03:28.421Z |
| MSFT   | Buy    | 100      | 2018-06-25 11:06:38.491Z |
...and 5000 more rows

Iteration 1: Simple Mongo Data Model
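
Here's a minimal sketch of the model, one document per transaction (field names are assumed to match the bucket example further down), plus the compound index the queries below rely on:

{
    "customerId": 700000,
    "ticker": "AAPL",
    "type": "buy",
    "qty": 50,
    "date": ISODate("2021-06-25T11:01:38.451Z")
}

db.history.createIndex({ customerId: 1, date: 1 }) // without this, every query below is a collection scan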

Pros:

It is easy to insert, update, and delete, as the collection stores just one document per transaction.

Cons:

Data fetching: suppose we want to show 50 transactions per page. Here are the queries for pages 1, 2, and 3:

  1. db.history.find({"customerId":700000}).sort({date:1}).skip(0).limit(50)
  2. db.history.find({"customerId":700000}).sort({date:1}).skip(50).limit(50)
  3. db.history.find({"customerId":700000}).sort({date:1}).skip(100).limit(50)

Let's break that down:

  1. The first query uses the index on customerId and date to find the first 50 documents.
  2. The second query uses the index to find 100, skips 50, and returns the next 50.
  3. The third query uses the index to find 150, skips 100, and returns the next 50.
  4. Wait, what happens on the 1000th query? Does the DB find 50,000, skip 49,950, and return 50? Damn right. This is a problem.

I've used 50 transactions per page as an example; you could use 1000 just as well, as long as the limits allow.
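
You can see the cost for yourself with explain (a sketch; the exact plan output varies by server version):

db.history.find({"customerId":700000})
          .sort({date:1})
          .skip(49950)
          .limit(50)
          .explain("executionStats")
// totalKeysExamined comes back around 50,000: the index walk
// covers everything that skip throws away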

Iteration 2: Store page date and apply $gt on it

When the customer asks for the first 50 entries:

db.history.find({"customerId":700000})
          .sort({date:1})
          .limit(50)

For the next page, use the date of the 50th transaction and run a $gt query:

db.history.find({"customerId":700000, date: {$gt: ISODate("2020-08-16T00:20:09.000Z"}})
          .sort({date:1})
          .limit(50)

And similarly for the remaining pages.
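
A minimal sketch of how that cursor value is carried between page loads (shell syntax; this assumes dates are unique per customer, otherwise a tie-breaker such as _id is needed):

var page = db.history.find({"customerId":700000})
                     .sort({date:1})
                     .limit(50)
                     .toArray();
var lastDate = page[page.length - 1].date; // hand this back to the client

// next request: resume where the previous page ended
db.history.find({"customerId":700000, date: {$gt: lastDate}})
          .sort({date:1})
          .limit(50)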

Pros:

  1. Faster than the first approach.
  2. The DB isn't walking through every earlier transaction just to fetch the next 50; $gt lets it seek straight to the right spot in the index.

Cons:

  1. To fetch page 1000 directly, we need the last transaction date from page 999, which in turn needs page 998, and so on.
  2. This makes jumping to higher pages time-consuming. Problem.

Iteration 3: Use a bucket pattern!

The idea is to group 50 date-sorted transactions into a single document (a "bucket"), one bucket per page.

a. This is how it looks:

{
    "_id": "7000000_1540568823",
    "customerId": 7000000,
    "count": 2,
    "history": [{
        "type": "buy",
        "ticker": "MDB",
        "qty": 419,
        "date": ISODate("2018-10-26T15:47:03.434Z")
    }, {
        "type": "sell",
        "ticker": "MDB",
        "qty": 29,
        "date": ISODate("2018-10-30T09:32:57.765Z")
    }]
}

b. This is how to query:

db.history.find({"_id":/^700000_/})
          .sort({_id:1})
          .skip(0) // increment for next pages
          .limit(1)
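
So page n is just skip(n - 1). As a sketch (fetchPage is a hypothetical helper; sorting on the string _id works because the epoch-seconds suffix has a fixed width):

function fetchPage(customerId, n) {
    return db.history.find({ "_id": new RegExp("^" + customerId + "_") })
                     .sort({ _id: 1 })
                     .skip(n - 1)
                     .limit(1)
                     .next(); // the single bucket holding that page
}

fetchPage(7000000, 20) // transactions 951-1000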

c. This is how to update/insert:

db.history.updateOne({
        "_id": /^7000000_/,
        "count": {
            $lt: 50
        }
    }, {
        "$push": { // appends the new transaction to the history array
            "history": {
                "type": "buy",
                "ticker": "MDB",
                "qty": 25,
                "date": ISODate("2018-11-02T14:43:10")
            }
        },
        "$inc": { // increments the count
            "count": 1
        },
        "$setOnInsert": { // sets the _id only when a new bucket is inserted
            "_id": "7000000_1541184190"
        }
    }, {
        upsert: true // tells mongo that both insert and update are possible
    })
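
Putting the pieces together as a helper (recordTransaction is a hypothetical name, and the epoch-seconds _id suffix is an assumption based on the sample _ids above): the filter matches the customer's unfilled bucket; if none exists, the upsert opens a new one.

function recordTransaction(customerId, txn) {
    db.history.updateOne(
        { "_id": new RegExp("^" + customerId + "_"), "count": { $lt: 50 } },
        {
            "$push": { "history": txn },
            "$inc": { "count": 1 },
            // only takes effect when no unfilled bucket matched and a new one is created
            "$setOnInsert": {
                "_id": customerId + "_" + Math.floor(txn.date.getTime() / 1000),
                "customerId": customerId
            }
        },
        { upsert: true }
    );
}

recordTransaction(7000000, { type: "sell", ticker: "MDB", qty: 10, date: new Date() })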

Pros:

  1. Fewer DB Calls: Since each document has 50 entries, only one document needs to be fetched to display a page.
  2. Skip and limit are still used, but they now count buckets rather than transactions: page n skips n - 1 documents instead of 50 * (n - 1), which is a lot more performant.

Cons:

  1. Deleting a transaction: if a transaction is deleted, the next insert will land in a bucket that is no longer date-sorted. Some solutions (one is sketched after this list):
    1. Keep start/end date fields on each bucket and update based on those.
    2. Use the more expressive update syntax introduced in MongoDB 4.2.
  2. Inserting a transaction later on: to keep buckets date-sorted, we would need to shift transactions from one bucket into the next.
  3. Record size: MongoDB caps a document at 16MB - a bucket of 50 entries needs to fit within that.
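
One mitigation that predates 4.2 is the $sort modifier on $push, which re-sorts the bucket's array on every insert. A sketch - it keeps a single bucket date-sorted but doesn't handle the cross-bucket case:

db.history.updateOne(
    { "_id": /^7000000_/, "count": { $lt: 50 } },
    {
        "$push": {
            "history": {
                // $each is required in order to use $sort; the array is
                // re-sorted by date after the new transaction is appended
                "$each": [{ "type": "buy", "ticker": "MDB", "qty": 25, "date": new Date() }],
                "$sort": { "date": 1 }
            }
        },
        "$inc": { "count": 1 }
    }
)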

Even with these cons, a lot of use-cases can be solved with the bucket pattern approach.

Shoutouts to

  1. docker mongo - quickly spin up a container running mongo.

    Create a docker-compose.yml file with the contents below, run docker-compose up, and docker ps will show your mongo container.

    version: '3.7'
    services:
      mongo:
        image: mongo
        ports:
          - 27017:27017
    
  2. mgeneratejs - quickly create fake mongo data.

    for i in `seq 1 100`; 
    do mgeneratejs '{"ticker": "MDB", "customerId": 700000, "type":"buy", "qty": {"$integer": {"min": 0, "max": 5}}, "date": {"$date": {"min": "2015-01-01", "max": "2021-12-31T23:59:59.999Z"}}}' -n 100 > a;
    docker cp a 03a699a968c3:/; 
    docker exec -it 03a699a968c3 bash -c "cat a | mongoimport --uri mongodb://localhost:27017/stocks --collection history --mode insert";
    done
    
  3. robo3t - a GUI client for querying mongo.

Conclusion

We discussed two approaches to the pagination problem: date-based $gt paging and the bucket pattern. Both have their pros and cons, like any engineering/life challenge. Pick a solution based on your use case and apply it.

Thanks for reading along! I'm @anirudhonezero on Twitter.

References

  1. https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-1
  2. https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-2