Aggregation Pipeline Stages in MongoDB – Set 2

Last Updated : 17 Oct, 2022

In MongoDB, the aggregation method takes the stages which work in the sequence. In this, the data go through multiple stages and return the resulting output. There are different types of stages in the MongoDB called aggregation pipeline stages.

Syntax:

db.<collection_name>.aggregate(pipeline, options)

or

db.<collection_name>.aggregate( [ { <stage1> }, {<stage2>},… ] )

A pipeline is a sequence of aggregation operations and stages. It is of array type. If the pipeline is not mentioned then it returns all the documents from the collection.
Aggregation Stages always work in sequence means stage1 will execute first and pass the result to stage2 and so on.
Options are optional and only used when you specify pipeline as an array.
Aggregation is used to enhance performance by optimizing the query.
Use variable and field names with a dollar sign($) and string without a dollar sign.

$group Stage

$group stage is used to group together the documents on the basis of specified _id expression.

Syntax:

db.<collection_name>.aggregate(
[
{ $group:
{
_id: <expression>,
<field1>: { <accumulator1> : <expression1> }, …
}
}
]);

The following are the fields of the $group stage

_id is the field in the input document which is used to group the documents together.
field is optional. It is calculated by the accumulator operator. Accumulation operators are $count, $avg, $first, $last, $max etc.

Example:

In the following example, the database name is ‘company’ and we are using employee_details as a collection. The Collection ’employee_details’ contains 6 documents. The following aggregation operation uses $unwind and $group stages. In the begin with, we are using the $unwind stage to deconstruct that document on the basis of ‘skills’ field. In the next stage, we are using the $group stage where we group the coming documents from the prior stage on the basis of the ‘skills’ field. The count is used to count the number of employees who have specific skills fields.

db.employee_details.aggregate(
[
{$unwind : “$skills”},
{$group : {_id :”$skills”, count:{$count:{}}}}
]);

The above operation returns the following document. Here, 3 employees have Java skills, and 6 employees have MongoDB skill.

{ “_id” : “Java”, “count” : 3 }
{ “_id” : “MongoDB”, “count” : 6 }
{ “_id” : “AWS”, “count” : 1 }
{ “_id” : “Javascript”, “count” : 5 }
{ “_id” : “React”, “count” : 3 }

$bucket Stage

$bucket stage is used to group together the documents on the basis of specified boundaries. If a document did not contain the field or that field is outside the buckets, it would be placed in the default bucket. Instead of considering the exact value of the field like in the $group stage, we use the boundaries of the field in $bucket.

Syntax:

db.<collection_name>.aggregate(
[
{
$bucket: {
groupBy: <expression>,
boundaries: [ <lowerbound1>, <lowerbound2>, … ],
default: <literal>,
output: {
<output1>: { <$accumulator expression> },
<output2>: { <$accumulator expression> }, …
}
}
}
]);

The following are the fields of the $bucket stage:

groupBy is the field name from which we will group together the documents. Example: ‘salary’ is the field name.
boundaries take the arrays of values of the field. Example: Suppose boundaries values for ‘salary’ are [30000, 50000, 70000, 90000, 100000] so, 30000-50000 (include lowerbound 30000 and exclude upperbound 50000), 50000-70000, 70000-90000 and 90000-100000 are the buckets.
default is optional. It is a string literal that specifies the _id when the field name value does not fall into the bucket specified by boundaries. Example: Suppose the value of default: “Others”, ‘salary’ which does not fall
in the above-mentioned buckets has ‘_id’ : “Others”.
output is optional. It tells which fields to include with ‘_id’ in the output document.

Example:

In the following example, the database name is ‘company’ and we are using employee_details as a collection. The Collection ’employee_details’ contains 6 documents. The following aggregation operation uses the $bucket stage. We are using the $bucket stage to group together the documents on the basis of ‘salary’ field boundaries. Here, boundaries values for ‘salary’ field are [30000, 50000, 70000, 90000, 100000]. So, 30000-50000 (include lowerbound 30000 and exclude upperbound 50000), 50000-70000, 70000-90000, and 90000-100000 are the buckets.

db.employee_details.aggregate([
{
$bucket: {
groupBy: “$salary”,
boundaries: [30000, 50000, 70000, 90000, 100000],
}
}
]);

The above operation returns the following documents

{ “_id” : 30000, “count” : 2 }
{ “_id” : 50000, “count” : 1 }
{ “_id” : 70000, “count” : 3 }

$unionWith Stage

$unionWith stage is used to combine the fields of two or more collections and passes the result to the next stage.
This operation also includes duplicate entries.

Syntax:

db.<collection_name_1>.aggregate(
[
{ $unionWith: { coll: “<collection_name_2>”, pipeline: [ <stage1>, <stage2>,.. ] } }
]);

The following are the fields of the $unionWith stage

coll field is used to mention another collection.
pipeline is the aggregation pipeline that we apply to another collection mentioned by the coll field.

Example:

In the following example, the database name is ‘company’ and we are using two collections: ’employee_details’ and ’employee_details_contact’. The Collection ’employee_details’ contains 6 documents and employee_details_contact contains 4 documents. The following aggregation operation uses the $project stage and $unionWith stage. Firstly, we are using the $project stage to select the ‘address.city’ field from ’employee_details’ and pass the result to the next stage. In the next stage, we are using the $unionWith stage where ’employee_details_contact’ is the coll field and pipeline has $project stage to select the ‘city’ field from the ’employee_details_contact’ collection. 1 and 0 denote inclusion and exclusion respectively.

db.employee_details.aggregate([
{
$project : {
“address.city”: 1, _id: 0
}
},
{
$unionWith:{ coll: “employee_details_contact”,
pipeline: [{
$project : {
“city”: 1, _id: 0
}
}]
}}
]);

The following is the output of the above operation. Here, the $unionWith stage combines the two collections and returns a total of 10 documents. The First 6 documents belong to the ’employee_details’ collection and the next 4 belong to the ’employee_details_contact’ collection.

{ “address” : { “city” : “Mumbai” } }
{ “address” : { “city” : “Dehradun” } }
{ “address” : { “city” : “Gurgaon” } }
{ “address” : { “city” : “Dehradun” } }
{ “address” : { “city” : “Delhi” } }
{ “address” : { “city” : “Hisar” } }
{ “city” : “Hisar” }
{ “city” : “Gurgaon” }
{ “city” : “Mumbai” }
{ “city” : “Dehradun” }

$lookup Stage

The $lookup stage is used to add a new array field whose elements are the matching documents from another collection. The $lookup stage passes the result documents to the next stage.

Syntax:

db.<collection_name>.aggregate([
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the “from” collection>,
as: <output array field>
}
}]

The following are the fields of the $lookup stage

from is used to mention another collection.
localField is used to mention the field from the input documents of the <collection_name> collection. If the documents of the <collection_name> collection do not contain the localField then $lookup treats it as null.
foreignField is used to mention the field from the documents which belong to from collection that matches the localField. If the documents of the from collection do not contain the foreignField then $lookup treats it as null.
as is the name of the new array which contains the matching documents of the from collection

Example:

Firstly, we are using the $skip stage to skip the first 4 documents and pass the remaining 2 documents to the next stage.
In the next stage, we are using the $lookup stage where ’employee_details_contact’ is the from collection. ‘address.city’ is the localField that belongs to the ’employee_details’ collection and ‘city’ is the foreignField that belongs to the ’employee_details_contact’ and matches the localField. ’employee_complete_details’ is the ‘as’ field that is the array which contains the matching documents of the ‘ from‘ collection.

db.employee_details.aggregate([
{
$skip : 4
},
{
$lookup:
{
from: “employee_details_contact”,
localField: “address.city”,
foreignField: “city”,
as: “employee_complete_details”
}
}])

$match stage

$match stage is used to filter out the documents on the basis of specified conditions and passes the result to the next stage in the pipeline.

Syntax:

db.<collection_name>.aggregate([
{
$match: { <filter> }
}
])

Example:

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $match stage. {“name.firstName” : “Romal”} is the condition on the basis of which we filter out the documents from the collection.

db.employee_details.aggregate([
{
$match: {“name.firstName” : “Romal”}
}
])

$replaceWith stage

$replaceWith stage is used to replace the input documents of the collection or documents coming from the prior stage in the pipeline with the replacement document. It replaces all the fields of the input document including the _id field.

Note: $replaceWith is the new stage in version 4.2 and it is the alias for $replaceRoot.

Syntax:

db.<collection_name>.aggregate([
{ $replaceWith: <replacementDocument> }
])

Here, replacementDocument is the document that replaces the input documents completely even the _id field.

Example:

{“full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},”salary” : “$salary” } is the replacementDocument.
In this example, we are replacing the input documents completely with the replacementDocument.

db.employee_details.aggregate([
{ $replaceWith: {
“full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
“salary” : “$salary”
} }
])

$sample stage

$sample stage is used to randomly select the documents from the collection or documents coming from the prior stage in the pipeline.

Syntax:

db.<collection_name>.aggregate([
{
$sample: { size: <positive integer N> }
}
])

Example:

In the following example, the database name is ‘company’ and we are using ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $sample stage:

size is the field of $sample stage which takes a positive integer.
In this example, we are using 2 as a size. So, the $Sample stage will give 2 random documents from the input documents.

db.employee_details.aggregate([
{
$sample: { size: 2}
}
])

$out stage

$out stage is used to write the documents coming from the previous stages in the pipeline in different databases and collections or in the same database but in different collections. It must be the last stage in the pipeline.

Note: $out stage is new staging in version 2.6.

Syntax:

db.<collection_name>.aggregate([
{ $out: <output-collection> or { db: “<output-db>”, coll: “<output-collection>” } }
])

$out stage takes either a string that represents the collection or a document with two fields db and coll representing the database and collection respectively as a value.

Example:

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses $replaceWith and $out stage:

Here, $out stage is taking a document { db: “my_db”, coll: “employee_salary_info” } as a value where my_db is the new database and employee_salary_info is the new collection.
It will create a new database with the name my_db and a collection with the name employee_salary_info on the MongoDB server.
The output document comes from the $replaceWith stage written in the employee_salary_info collection.

db.employee_details.aggregate([
{ $replaceWith: {
“full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
“salary” : “$salary”
}},
{$out: { db: “my_db”, coll: “employee_salary_info” }}
])

$merge stage

$merge stage is used to write the documents coming from the previous stages in the pipeline in different databases and collections or in the same database but in different collections. It must be the last stage in the pipeline.

Note: $out stage is new staging in version 4.2.

Syntax:

db.<collection_name>.aggregate([
{ $merge: {
into: <collection> or { db: <db>, coll: <collection> }
} }
])

into takes either a string that represents the collection or a document with two fields db and coll representing the database and collection respectively.

Example:

Here, $merge stage is taking a document as a value {into : { db: “my_merge_db”, coll: “employee_salary_info” }}
‘into’ is the field of the $merge stage which takes document { db: “my_merge_db”, coll: “employee_salary_info” } as a value where my_merge_db is the new database and employee_salary_info is the new collection.
It will create a new database with the name my_merge_db and collection with the name employee_salary_info on the MongoDB server.
The output document comes from the $replaceWith stage written in the employee_salary_info collection.

db.employee_details.aggregate([
{ $replaceWith: {
“full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
“salary” : “$salary”
}},
{$merge: {into : { db: “my_merge_db”, coll: “employee_salary_info” }}}
])

$facet stage

$facet stage is used to apply multiple aggregation stages in a single pipeline stage. It returns an array of documents.

Syntax:

db.<collection_name>.aggregate([
{ $facet:
{
<outputField1>: [ <stage1>, <stage2>, … ],
<outputField2>: [ <stage1>, <stage2>, … ],
…
}
}
])

Example:

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $facet stage.

$facet stage takes sub-pipelines. Here, we are taking two sub-pipelines i.e categorizedBySkills and categorizedBySalary. Both sub-pipelines do some operations on the same input documents.
categorizedBySkills sub-pipeline is using $unwind and $group stages.
categorizedBySalary is using $replacewith stage.

db.employee_details.aggregate([
{
$facet:{
categorizedBySkills :[
{$unwind : “$skills”},
{$group : {_id :”$skills”, count:{$count:{}}}}
],
categorizedBySalary : [
{ $replaceWith: {
“full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
“salary” : “$salary”
}},
]
}
}
])