$or query with projection significantly slower than in MongoDB distribution

Hi there,

my team is currently migrating from MongoDB Atlas (Community Edition) to a self-hosted cluster running a Percona MongoDB Distribution. While in general we are extremely excited to benefit from things like AWS IAM Authentication etc., we just discovered a performance shortcoming in Percona when comparing it to MongoDB for the following query (the content of the document is probably not further relevant, but as you will see it’s about an $or query that queries either for a time range with recurrence set to null, or a document in any timerange with a recurrence set):

{
  "$or": [
    {
      "end": {
        "$gte": ISODate('2023-08-29T09:42:30.064Z')
      },
      "start": {
        "$lte": ISODate('2023-08-29T10:42:30.064Z')
      },
      "recurrence": null,
      "resource": {
        "$in": [
          ObjectId("6074742a035dda4a37174065")
        ]
      }
    },
    {
      "recurrence": {
        "$type": "object"
      },
      "resource": {
        "$in": [
          ObjectId("6074742a035dda4a37174065")
        ]
      }
    }
  ]
}

The tricky part comes, when one wants to project the result to a certain set of properties only, e.g. like so: { _id: 1, recurrence: 1, start: 1, end: 1, resources: 1 }: when recurrence: 1 is included, the query becomes extremely slow - taking over a minute on our (admittedly not so small) collection of 13.5 million docs. If recurrence is omitted, it’s as fast as one second again.
Note that we of course have indices applied to that collection, but in all cases the explain query plan looks equal. The indexes being used are on resource,end,start for the first $or clause and on resource,recurrence for the second clause.

We are still on version 4.4.22 as of right now. Is this maybe a known issue that has already been improved in a more recent version?
As a workaround, we will split our $or queries where possible and adapt the projections to a minimum.

Looking forward to an answer.
Best regards,
Alex

Hi @thing-it-agd !

Happy to help you. To provide you with precise assistance, I’d need to see your query execution plan and know what indexes you have created.

// Slow query explain
db.collection.find({
  "$or": [ ... ]
}).project({ _id: 1, recurrence: 1, start: 1, end: 1, resources: 1 }).explain("executionStats")

// Fast query explain
db.collection.find({
  "$or": [ ... ]
}).project({ _id: 1, start: 1, end: 1, resources: 1 }).explain("executionStats")

Anyway, in your case the filter needs to check recurrence: null (which requires a FETCH) for every single document that matches the index scan part of the first $or clause, just to retrieve the value of the recurrence field. Given your 13.5M collection, if even a fraction of those are scanned by the index, this results in a huge number of document lookups, causing the query time to explode. Most likely in Atlas it was cached or there was a faster disk.

What you can do is:

  • Create a covering index:
    db.collection.createIndex({ resource: 1, end: 1, start: 1, recurrence: 1 })
  • Upgrade to a newer version. The query planner and execution engine have received substantial improvements in versions 5.0, 6.0, and 7.0. Specifically, there have been many enhancements to how $or queries are handled, index usage, and the cost-based query planner has become much more sophisticated. It is highly probable that a newer version would handle this specific query pattern more intelligently, even without the perfect covering index.