Open In App

Aggregation Pipeline Stages in MongoDB – Set 2

In MongoDB, the aggregation method takes the stages which work in the sequence. In this, the data go through multiple stages and return the resulting output. There are different types of stages in the MongoDB called aggregation pipeline stages.

Syntax:



db.<collection_name>.aggregate(pipeline, options) 

or 



db.<collection_name>.aggregate( [ { <stage1> }, {<stage2>},… ] )

$group Stage

$group stage is used to group together the documents on the basis of specified _id expression.

Syntax:

db.<collection_name>.aggregate(
 [ 
    { $group:
              {
                    _id: <expression>, 
                   <field1>: { <accumulator1> : <expression1> }, …
              }
    }
]);

The following are the fields of the $group stage

Example:

In the following example, the database name is ‘company’ and we are using employee_details as a collection. The Collection ’employee_details’ contains 6 documents. The following aggregation operation uses $unwind and $group stages. In the begin with, we are using the $unwind stage to deconstruct that document on the basis of ‘skills’ field. In the next stage, we are using the $group stage where we group the coming documents from the prior stage on the basis of the ‘skills’ field. The count is used to count the number of employees who have specific skills fields.

  db.employee_details.aggregate(
  [
     {$unwind : “$skills”},
     {$group : {_id :”$skills”, count:{$count:{}}}}
 ]);

The above operation returns the following document. Here, 3 employees have Java skills, and 6 employees have MongoDB skill.

{ “_id” : “Java”, “count” : 3 }
{ “_id” : “MongoDB”, “count” : 6 }
{ “_id” : “AWS”, “count” : 1 }
{ “_id” : “Javascript”, “count” : 5 }
{ “_id” : “React”, “count” : 3 }

 

$bucket Stage

$bucket stage is used to group together the documents on the basis of specified boundaries. If a document did not contain the field or that field is outside the buckets, it would be placed in the default bucket. Instead of considering the exact value of the field like in the $group stage, we use the boundaries of the field in $bucket.

Syntax: 

db.<collection_name>.aggregate(
[   
    {
      $bucket: {
          groupBy: <expression>,
          boundaries: [ <lowerbound1>, <lowerbound2>, … ],
          default: <literal>,
          output: {
               <output1>: { <$accumulator expression> },
               <output2>: { <$accumulator expression> }, …
          }
       }
    } 
]);

The following are the fields of the $bucket stage:

Example: 

In the following example, the database name is ‘company’ and we are using employee_details as a collection. The Collection ’employee_details’ contains 6 documents. The following aggregation operation uses the $bucket stage. We are using the $bucket stage to group together the documents on the basis of ‘salary’ field boundaries. Here, boundaries values for ‘salary’ field are [30000, 50000, 70000, 90000, 100000]. So, 30000-50000 (include lowerbound 30000 and exclude upperbound 50000), 50000-70000, 70000-90000, and 90000-100000 are the buckets.

db.employee_details.aggregate([
   {
     $bucket: {
       groupBy: “$salary”,
       boundaries: [30000, 50000, 70000, 90000, 100000],
     }
   }
]);

The above operation returns the following documents

{ “_id” : 30000, “count” : 2 }
{ “_id” : 50000, “count” : 1 }
{ “_id” : 70000, “count” : 3 }

 

$unionWith Stage

$unionWith stage is used to combine the fields of two or more collections and passes the result to the next stage.
This operation also includes duplicate entries.

Syntax:

db.<collection_name_1>.aggregate(
[  
    { $unionWith: { coll: “<collection_name_2>”, pipeline: [ <stage1>, <stage2>,.. ] } }
]);

The following are the fields of the $unionWith stage 

Example:

In the following example, the database name is ‘company’ and we are using two collections: ’employee_details’ and ’employee_details_contact’. The Collection ’employee_details’ contains 6 documents and employee_details_contact contains 4 documents. The following aggregation operation uses the $project stage and $unionWith stage. Firstly, we are using the $project stage to select the ‘address.city’ field from ’employee_details’ and pass the result to the next stage. In the next stage, we are using the $unionWith stage where  ’employee_details_contact’ is the coll field and pipeline has $project stage to select the ‘city’ field from the ’employee_details_contact’ collection. 1 and 0 denote inclusion and exclusion respectively.

db.employee_details.aggregate([  
   {
       $project : {
           “address.city”: 1, _id: 0
       }
   },
   { 
     $unionWith:{  coll: “employee_details_contact”,
     pipeline: [{
           $project : {
               “city”: 1, _id: 0
           }
        }] 
   }}
]);

The following is the output of the above operation. Here, the $unionWith stage combines the two collections and returns a total of 10 documents. The First 6 documents belong to the ’employee_details’ collection and the next 4 belong to the ’employee_details_contact’ collection. 

{ “address” : { “city” : “Mumbai” } }
{ “address” : { “city” : “Dehradun” } }
{ “address” : { “city” : “Gurgaon” } }
{ “address” : { “city” : “Dehradun” } }
{ “address” : { “city” : “Delhi” } }
{ “address” : { “city” : “Hisar” } }
{ “city” : “Hisar” }
{ “city” : “Gurgaon” }
{ “city” : “Mumbai” }
{ “city” : “Dehradun” }

 

$lookup Stage

The $lookup stage is used to add a new array field whose elements are the matching documents from another collection. The $lookup stage passes the result documents to the next stage.

Syntax: 

db.<collection_name>.aggregate([ 
  {
     $lookup:
      {
        from: <collection to join>,
        localField: <field from the input documents>,
        foreignField: <field from the documents of the “from” collection>,
        as: <output array field>
     }
  }]

The following are the fields of the $lookup stage 

Example: 

In the following example, the database name is ‘company’ and we are using two collections: ’employee_details’ and ’employee_details_contact’. The Collection ’employee_details’ contains 6 documents and employee_details_contact contains 4 documents. The following aggregation operation uses the $skip a stage and $lookup stage:

db.employee_details.aggregate([ 
 {
    $skip : 4
 },
 {
    $lookup:
     {
       from: “employee_details_contact”,
       localField: “address.city”,
       foreignField: “city”,
       as: “employee_complete_details”
    }
 }])

 

$match stage

$match stage is used to filter out the documents on the basis of specified conditions and passes the result to the next stage in the pipeline.

Syntax: 

db.<collection_name>.aggregate([ 
    { 
        $match: { <filter> } 
    }
])
 

Example: 

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $match stage. {“name.firstName” : “Romal”} is the condition on the basis of which we filter out the documents from the collection.

db.employee_details.aggregate([ 
   { 
       $match: {“name.firstName” : “Romal”} 
   }
])

 

$replaceWith stage

$replaceWith stage is used to replace the input documents of the collection or documents coming from the prior stage in the pipeline with the replacement document. It replaces all the fields of the input document including the _id field.

Note: $replaceWith is the new stage in version 4.2 and it is the alias for $replaceRoot.

Syntax:

db.<collection_name>.aggregate([ 
      { $replaceWith: <replacementDocument> }
])

Here, replacementDocument is the document that replaces the input documents completely even the _id field.

Example:

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $replaceWith stage.

db.employee_details.aggregate([ 
     { $replaceWith: {
           “full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
           “salary” : “$salary”
     } }
])

 

$sample stage

$sample stage is used to randomly select the documents from the collection or documents coming from the prior stage in the pipeline.

Syntax:

db.<collection_name>.aggregate([ 
      {    
         $sample: { size: <positive integer N> }
     }
])

Example:

In the following example, the database name is ‘company’ and we are using ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $sample stage:

db.employee_details.aggregate([ 
   {    
      $sample: { size: 2}
  }
])

 

$out stage

$out stage is used to write the documents coming from the previous stages in the pipeline in different databases and collections or in the same database but in different collections. It must be the last stage in the pipeline.

Note: $out stage is new staging in version 2.6.

Syntax:

db.<collection_name>.aggregate([ 
  { $out: <output-collection> or  { db: “<output-db>”, coll: “<output-collection>” } }
])

$out stage takes either a string that represents the collection or a document with two fields db and coll representing the database and collection respectively as a value.

Example:

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses $replaceWith and $out stage:

db.employee_details.aggregate([ 
   { $replaceWith: {
          “full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
          “salary” : “$salary”
    }},
   {$out: { db: “my_db”, coll: “employee_salary_info” }}
])

 

$merge stage 

$merge stage is used to write the documents coming from the previous stages in the pipeline in different databases and collections or in the same database but in different collections. It must be the last stage in the pipeline.

Note: $out stage is new staging in version 4.2.

Syntax:

db.<collection_name>.aggregate([ 
   { $merge: {
       into: <collection> or { db: <db>, coll: <collection> }              
  } }
])

into takes either a string that represents the collection or a document with two fields db and coll representing the database and collection respectively.

Example:

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $replaceWith and $merge stage:

db.employee_details.aggregate([ 
   { $replaceWith: {
          “full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
          “salary” : “$salary”
    }},
   {$merge: {into : { db: “my_merge_db”, coll: “employee_salary_info” }}}
])

 

$facet stage

$facet stage is used to apply multiple aggregation stages in a single pipeline stage. It returns an array of documents. 

Syntax:

db.<collection_name>.aggregate([ 
     { $facet:
        {
            <outputField1>: [ <stage1>, <stage2>, … ],
            <outputField2>: [ <stage1>, <stage2>, … ],
             …
       }
    }
])

Example:

In the following example, the database name is ‘company’ and we are using the ’employee_details’ collection that has 6 documents. The following aggregation operation uses the $facet stage.

db.employee_details.aggregate([ 
 {
   $facet:{
       categorizedBySkills :[
           {$unwind : “$skills”},
           {$group : {_id :”$skills”, count:{$count:{}}}}
       ],
       categorizedBySalary : [
           { $replaceWith: {
               “full_name”: {$concat : [ “$name.firstName”,’ ‘, “$name.lastName”]},
               “salary” : “$salary”
           }},
       ]
    } 
  }
])

 


Article Tags :