Unlike Global Entities and User Entities, the Custom Entity service stores each app's custom entity collection in its own MongoDB collection, used by that app alone. This in turn allows the developer to define custom indexes specific to that collection, improving both scalability and performance.

This article will give an overview of how MongoDB indexes work and how they should be defined.

---

Background

Before we begin, let's look at how an owned entity is stored, using the Henchman collection as an example:

{
    "_id" : ObjectId("1234512341abcdef12346589"),
    "entityId" : "aaaaaaaa-bbbb-cccc-dddd-111111111111",
    "version" : 1,
    "acl" : {
        "other" : 0
    },
    "ownerId" : "fffffffff-hhhh-iiii-jjjj-kkkkkkkkkkkk",
    "expiresAt" : null,
    "timeToLive" : null,
    "createdAt" : 1652713414210,
    "updatedAt" : 1652713414210,
    "data" : {
        "characterName" : "Star Ralph",
        "class" : "pirate",
        "level" : 2,
        "health": 12,
        "attack": 5,
        "defence": 5,
        "active": true
    }
}

And here is an example unowned entity - the ShipRef collection:

{
    "_id" : ObjectId("1234512341abcdef12346589"),
    "entityId" : "aaaaaaaa-bbbb-cccc-dddd-2222222222222",
    "version" : 1,
    "acl" : {
        "other" : 1
    },
    "expiresAt" : null,
    "timeToLive" : null,
    "createdAt" : 1652713414210,
    "updatedAt" : 1652713414210,
    "data" : {
        "shipType" : "fighter",
        "shipId" : "starFalcon",
        "level" : 2,
        "hyperspace": true,
        "lasers": true,
        "shields": true,
        "attack": 10,
        "defence": 10,
        "maxHealth": 5,
        "available": true
    }
}

Let's quickly go through the fields:

the _id field is an internal MongoDB field - and brainCloud never exposes it
entityId is the unique id (a GUID) for this entity in this specific collection
version is the field brainCloud uses to control edits and ensure data integrity. brainCloud does not store multiple versions of the object. The version is merely an incrementing number that controls edits.
acl stores the permissions for this object
ownerId identifies the Profile (i.e. user account) that owns this entity. Entities in unowned collections do not have owners.
timeToLive and expiresAt are used by the Time-to-Live feature to automatically delete an entity at the appropriate time
createdAt and updatedAt are used to keep track of when the object was created and last edited
Finally, data is where all of the developer's custom data for the object (i.e. all the important stuff) goes.

brainCloud automatically creates a number of indexes for the core fields of the entity, but it doesn't create any for the fields in the data section, since those are totally custom to your app and we don't know what they are!

brainCloud automatically creates the following indexes:

{ "entityId": 1 } - the primary key for edits, deletes, etc.
{ "expiresAt": 1 } - this index is used with MongoDB's expiry index feature to automatically clean up objects using the Time-to-Live feature
{ "ownerId": "hashed" } - this index is only used in Owned Custom Entity Collections, and is a sharding-compatible index used to quickly find objects owned by the User

---

So - should you create your own custom indexes?

The answer to this question is usually YES -- but there are a few exceptions.

You may not need custom indexes for your collection if:

It is an Owned Custom Entity collection accessed via the Singleton API. If this is the case, then the ownerId index already present in the collection should be sufficient.
It is an Owned Custom Entity collection accessed via a Non-Sys API, and each user account will only have a few of the entities. Once again, the ownerId index should ensure that queries are only looking at a few objects at a time (based on the profileId filter) - so you can still get away without custom indexes.
Your app simply uses the object's entityId to retrieve the object. This may be the case where the objects in this collection are being retrieved via a direct reference from another collection.

In all other cases, you are probably looking up the objects via custom fields in the data portion of the object - and you should define one or more custom indexes for that purpose!

---

Auditing queries

Ideally MongoDB performs its queries on indexes (which are fast), and then returns the results from the document (i.e. object) collections. If an appropriate index is not available (or not complete), a portion of the query filtering will have to read the documents themselves, which can be very slow if lots of documents must be inspected.

The first step to deciding what indexes to create involves performing an audit of the sort of queries your app is performing. These would be the custom field criteria (and sorting fields) that your app sends to the following Custom Entity methods:

Standard calls

These calls can be made directly from the client or from cloud-code scripts. They adhere to ACL permissions. Important: they prepend an ownerId field to all queries for owned collections.

GetEntityPage() / GetEntityPageOffset() <- responsible for most query traffic
DeleteEntities()
GetCount()
GetRandomEntitiesMatching()

Sys calls

Sys calls are cloud-code only and ignore ACL permissions. They do not prepend an ownerId field to queries.

SysGetEntityPage() / SysGetEntityPageOffset()
SysDeleteEntities()
SysGetCount()
SysGetRandomEntitiesMatching()
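
To make the difference between the two families concrete, here is a minimal Python sketch of the filter that effectively reaches MongoDB in each case. The function name effective_query and its parameters are hypothetical and exist only for illustration; they are not part of the brainCloud API, and the real service may assemble its filter differently.

```python
def effective_query(where, profile_id=None, owned=True, sys_call=False):
    """Return the filter assumed to reach MongoDB (illustrative model only)."""
    if owned and not sys_call:
        # Standard calls on owned collections: ownerId is prepended,
        # so the automatic ownerId index can narrow the search first.
        return {"ownerId": profile_id, **where}
    # Sys calls (and unowned collections): the filter is used as written.
    return dict(where)


where = {"data.class": "pirate"}
standard = effective_query(where, profile_id="some-profile-id")
sys_style = effective_query(where, sys_call=True)

print(standard)   # ownerId comes first, followed by the custom criteria
print(sys_style)  # only the custom criteria
```

When auditing, this suggests treating Sys queries on large owned collections with extra care, since they cannot lean on the ownerId index unless the script adds that field itself.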

---

Defining Indexes

Indexes should be defined for each of the major queries - especially if you expect there to be more than a hundred or so objects that will need to be examined to satisfy the query.

Note that MongoDB can use an index even if all the fields in the index are not in the query -- as long as the fields that *are* in the query are at the *beginning* of the index.

So - for example, if MongoDB has this index:

{ "a": 1, "b": 1, "c": 1 }

It can satisfy the following queries:

{ "a": 5, "b": "hello", "c": {"$gt": 3.2}} <- i.e. "a" == 5 AND "b" == "hello" AND "c" > 3.2
{ "a": 5 } <- i.e. "a" == 5 <- works because "a" is the first field
{ "a": 5, "b": "hello" } <- i.e. "a" == 5 AND "b" == "hello" <- works because "a" and "b" are the first two fields

Note that the order of the fields in the query doesn't necessarily matter - so the index above can just as easily support query { "b": "hello", "a": 5 }

It cannot, however, satisfy the following queries:

{ "c": {"$gt": 3.2} } <- "c" is not the first field in the index
{ "b": "hello", "c": {"$gt": 3.2} } <- "b" and "c" are not the first two fields in the index
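
The prefix rule above can be expressed as a small check. This is a simplified Python sketch of the rule as stated in this article, not a model of MongoDB's actual query planner, which is more nuanced. The function name index_supports is hypothetical.

```python
def index_supports(index_keys, query_fields):
    """Simplified prefix rule: the query's fields must be exactly the
    first N fields of the index, in any order."""
    wanted = set(query_fields)
    if not wanted:
        return False  # an empty filter has nothing to look up in the index
    prefix = set(index_keys[:len(wanted)])
    return wanted == prefix


index = ["a", "b", "c"]  # stands for { "a": 1, "b": 1, "c": 1 }

print(index_supports(index, ["a", "b", "c"]))  # all three fields
print(index_supports(index, ["a"]))            # first field only
print(index_supports(index, ["b", "a"]))       # order in the query is irrelevant
print(index_supports(index, ["c"]))            # not a prefix
print(index_supports(index, ["b", "c"]))       # not a prefix
```

The first three checks return True and the last two return False, matching the lists above.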

Important - when defining your indexes, be sure to add the "data." prefix for any of the custom fields you are indexing on. See examples at the bottom of this page.
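
As a small illustration of the prefix requirement, the Python sketch below builds an index key document from bare field names, adding "data." to anything that is not one of the core fields described earlier. The helper name to_index_keys is hypothetical; it is only a convenience for drafting index definitions, not something brainCloud provides.

```python
# Core (top-level) entity fields listed in the Background section.
CORE_FIELDS = {
    "entityId", "version", "acl", "ownerId",
    "expiresAt", "timeToLive", "createdAt", "updatedAt",
}


def to_index_keys(fields):
    """Build an index key document from (field, direction) pairs,
    prefixing custom fields with "data." when the prefix is missing."""
    keys = {}
    for name, direction in fields:
        top_level = name.split(".")[0]
        if top_level != "data" and top_level not in CORE_FIELDS:
            name = "data." + name
        keys[name] = direction
    return keys


print(to_index_keys([("shipType", 1), ("level", 1)]))
# {'data.shipType': 1, 'data.level': 1}
```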

---

What about index options?

Most of the time your indexes will not need to use index options. Some options, like partial index filters, can be used to create smaller more targeted indexes.

For more information on index options, see the MongoDB documentation link at the bottom of this article.

---

Compound indexes vs. multiple single indexes (i.e. index intersections)

Although the MongoDB documentation states that the database can take advantage of the intersection of multiple indexes in a query, our experience is that the database rarely chooses to do so at runtime.

We therefore recommend that devs create compound (i.e. multi-field) indexes to cover their high traffic queries.

---

Naming indexes

Note that when naming indexes, you normally just name based on the fields in the index.

So for example, for an index with { "data.a": 1, "data.b": 1 } - you might name it dataA_dataB.
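
If you want to apply this naming convention consistently, it is easy to derive the name from the index keys. The Python helper below is a hypothetical sketch of that convention (camel-casing each dotted path and joining fields with underscores); the name index_name is an assumption, and you are free to name indexes however you like.

```python
def index_name(keys):
    """Derive a name such as dataA_dataB from an index key document."""
    parts = []
    for field in keys:
        head, *rest = field.split(".")
        # Capitalize the first letter of each nested path segment.
        parts.append(head + "".join(p[:1].upper() + p[1:] for p in rest))
    return "_".join(parts)


print(index_name({"data.a": 1, "data.b": 1}))
# dataA_dataB
print(index_name({"data.shipType": 1, "data.hyperspace": 1}))
# dataShipType_dataHyperspace
```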

---

Simple example

Let's try creating some indexes for the sample ShipRef object introduced at the top of the page. Repeating the object here again for convenience:

{
    "_id" : ObjectId("1234512341abcdef12346589"),
    "entityId" : "aaaaaaaa-bbbb-cccc-dddd-111111111111",
    "version" : 1,
    "acl" : {
        "other" : 2
    },
    "expiresAt" : null,
    "timeToLive" : null,
    "createdAt" : 1652713414210,
    "updatedAt" : 1652713414210,
    "data" : {
        "shipType" : "fighter",
        "shipId" : "starFalcon",
        "level" : 2,
        "hyperspace": true,
        "lasers": true,
        "shields": true,
        "attack": 10,
        "defence": 10,
        "maxHealth": 5,
        "damage": 0,
        "active": true
    }
}

For our example object, the following queries might be common:

{} <- list all ships (no matter what type, etc.)
{ "data.level": { "$lte": 2 }} <- list all ships of level 2 or less
{ "data.shipType": "fighter" } <- list all "fighter" ships
{ "data.shipType": "fighter", "data.hyperspace": true } <- list all "fighter" ships with hyperspace capability

Given the above sets of queries, the following indexes would be appropriate:

{ "data.level": 1 }
{ "data.shipType": 1, "data.hyperspace": 1 } <- will satisfy both queries 3 and 4

---

Advanced index optimization

Note that the order of the fields in an index is significant beyond determining which subset queries can use the index. Certain queries will perform faster with indexes where the fields are in the optimal order.

For more information - see MongoDB's documentation regarding the ESR (Equality, Sort, Range) Rule.
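
In brief, the ESR rule says to place fields matched by equality first, then fields used for sorting, then fields filtered by a range. The Python sketch below orders the fields of a compound index accordingly. The helper name esr_order and the example query (fighters of level 2 or less, sorted by attack descending) are assumptions made for illustration.

```python
def esr_order(equality=(), sort=(), range_=()):
    """Order compound index keys as Equality, then Sort, then Range.

    equality and range_ are field names; sort is (field, direction) pairs.
    """
    keys = {}
    for field in equality:
        keys.setdefault(field, 1)
    for field, direction in sort:
        keys.setdefault(field, direction)
    for field in range_:
        keys.setdefault(field, 1)
    return keys


# Hypothetical query: { "data.shipType": "fighter", "data.level": { "$lte": 2 } }
# sorted by "data.attack" descending.
print(esr_order(
    equality=["data.shipType"],
    sort=[("data.attack", -1)],
    range_=["data.level"],
))
# {'data.shipType': 1, 'data.attack': -1, 'data.level': 1}
```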

---

For more information

For more information on MongoDB indexes, see their documentation

And if you have more questions - reach out to brainCloud support!

Defining and optimizing custom entity indexes