AWS DynamoDB – Working with Scans
Amazon DynamoDB is a managed NoSQL database that stores semi-structured data such as key-value pairs and documents. Creating a table in DynamoDB does not require a full schema; only the primary key (a partition key, optionally combined with a sort key) must be defined. DynamoDB tables store data as items, and each item consists of attributes, which are key-value pairs. The primary key is what distinguishes one item from another.
"Name": "The Shawshank Redemption",
In this article, we will discuss how to scan items in a table. A Scan operation in Amazon DynamoDB reads every item in a table or a secondary index. By default, it returns all the data attributes of every item. A Scan always returns a result set; if no matching items are found, the result set is empty. A single Scan request can retrieve at most 1 MB of data. DynamoDB provides several features for scanning items. The approach to scanning items is given below:
- Create a table and add items: To perform a scan, first create a table in DynamoDB, say, Movies with MoviesID as the partition key, and add items to the table. See the image below:
- Perform scanning on data: To scan data in a table, DynamoDB provides the following functionality:
- Filter: Filters refine the results of a scan. If no filter is provided, all the items are returned. In a filter, we specify an attribute and the value to match. The filter is applied after the items have been read, so it reduces the size of the result set but not the read capacity consumed. In the example below, we have selected the attribute ‘Director’ and its value as ‘Christopher Nolan’. See the image below:
- Limit: Another feature of Scan is the limit. Set the Limit parameter of the Scan request (available through the AWS CLI and the SDKs) to the maximum number of items that the operation should evaluate. The limit is applied prior to filter expression evaluation, so a request that also uses a filter may return fewer items than the limit.
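- Paging: DynamoDB paginates the results of Scan operations. When the data retrieved exceeds 1 MB, the result is divided into pages, each containing up to 1 MB of data. For instance, if 2 MB of data is retrieved, there will be at least 2 pages. When more pages remain, the response includes a LastEvaluatedKey value; pass it as ExclusiveStartKey in the next request to retrieve the following page. The AWS CLI (Command Line Interface) handles this pagination automatically by default.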
- Capacity Units Consumed: A read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. By default, a Scan operation does not return any data on the read capacity units it consumed. However, you can specify the ReturnConsumedCapacity parameter in a Scan request to obtain this information. The provisioned read capacity of a table can be changed in the Capacity tab of the table. See the image below:
- Read Consistency: By default, a Scan operation performs eventually consistent reads. This means the scan results might not include changes from recently completed PutItem or UpdateItem requests. If you require strongly consistent reads, as of the time that the Scan begins, set the ConsistentRead parameter to true in the Scan request. This ensures that all the write operations that completed before the Scan began are included in the Scan result set.
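- Parallel Scan: By default, a Scan reads the table sequentially. A parallel scan logically divides a table or secondary index into multiple segments, and multiple application workers scan the segments in parallel. Each worker can be an operating system process or a thread (in programming languages that support multithreading). Each worker issues its own Scan request, with the Segment parameter set to the segment it scans and TotalSegments set to the total number of segments.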
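The scan features listed above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation. The helper names `scan_pages`, `scan_all` and `parallel_scan` are hypothetical. The `scan` argument is assumed to be a callable that behaves like the `scan` method of a boto3 DynamoDB `Table` resource: it accepts keyword parameters such as `Limit`, `FilterExpression`, `ConsistentRead`, `ExclusiveStartKey`, `ReturnConsumedCapacity`, `Segment` and `TotalSegments`, and returns a dictionary with `Items` and, when more pages remain, `LastEvaluatedKey`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Assumption: a ScanFn behaves like boto3's Table.scan (keyword arguments in,
# response dictionary out). Nothing here imports boto3 directly.
ScanFn = Callable[..., Dict[str, Any]]


def scan_pages(scan: ScanFn, **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield one response per page, following LastEvaluatedKey until it is absent."""
    params = dict(params)
    while True:
        page = scan(**params)
        yield page
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        # Resume the next request where the previous page stopped.
        params["ExclusiveStartKey"] = last_key


def scan_all(scan: ScanFn, **params: Any) -> Tuple[List[Dict[str, Any]], float]:
    """Collect the items of every page and sum the reported read capacity units."""
    # Ask for consumed capacity unless the caller already chose a setting.
    params.setdefault("ReturnConsumedCapacity", "TOTAL")
    items: List[Dict[str, Any]] = []
    consumed = 0.0
    for page in scan_pages(scan, **params):
        items.extend(page.get("Items", []))
        consumed += float(page.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0))
    return items, consumed


def parallel_scan(scan: ScanFn, total_segments: int, **params: Any) -> List[Dict[str, Any]]:
    """Scan each segment in its own thread and merge the per-segment results.

    Assumption: `scan` is safe to call from several threads. boto3 resources are
    documented as not thread-safe, so real workers would each create their own.
    """
    def worker(segment: int) -> List[Dict[str, Any]]:
        items, _ = scan_all(
            scan, Segment=segment, TotalSegments=total_segments, **params
        )
        return items

    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        per_segment = list(pool.map(worker, range(total_segments)))
    return [item for items in per_segment for item in items]
```

With boto3, one would pass `boto3.resource("dynamodb").Table("Movies").scan` as `scan`. The filter from the example above could then be written as `FilterExpression=Attr("Director").eq("Christopher Nolan")`, with `Attr` imported from `boto3.dynamodb.conditions`, and a strongly consistent scan requested with `ConsistentRead=True`. Taking the scan callable as a parameter keeps the paging logic separate from the AWS client, so it can be exercised against a stand-in function without network access.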