Amazon DynamoDB: NoSQL Database Design and Optimization
- Sujeet Prajapati

- Oct 17
- 10 min read
Publication Week: Week 7
Amazon DynamoDB stands as one of the most powerful NoSQL database services in the cloud computing landscape. As a fully managed service, it promises single-digit millisecond performance at any scale, making it the go-to choice for modern applications that demand high performance and seamless scalability. Whether you're building a mobile app, web application, gaming platform, or IoT solution, understanding DynamoDB's design principles and optimization techniques is crucial for success.
Understanding DynamoDB Fundamentals
DynamoDB is a key-value and document database that delivers consistent performance regardless of scale. Unlike traditional relational databases, DynamoDB uses a different approach to data organization and retrieval that requires a shift in thinking from SQL-based design patterns.
Key Characteristics
Fully Managed: No server management, patching, or maintenance required
Multi-Region: Global tables provide multi-master replication across AWS regions
Flexible Schema: Store structured, semi-structured, or unstructured data
Built-in Security: Encryption at rest and in transit with fine-grained access control
Event-Driven: Native integration with AWS Lambda through DynamoDB Streams
DynamoDB Table Design Principles
Successful DynamoDB implementation starts with proper table design. The fundamental principle is to design your tables around your application's access patterns rather than normalizing data like you would in a relational database.
Single Table Design
One of the most important concepts in DynamoDB is the single table design pattern. Instead of creating multiple tables for different entities, you store all your application's data in a single table. This approach:
Reduces the number of requests needed to fetch related data
Minimizes cross-table operations
Optimizes for DynamoDB's pricing model
Simplifies backup and recovery operations
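To make this concrete, here is a small sketch of what single table design looks like in practice. The entity names and key formats below are illustrative, not from any particular schema: a user profile and that user's orders share one partition key, so a single Query on `PK` returns the profile and all related orders together.

```python
def user_item(user_id, name):
    # Profile item: PK identifies the user, SK marks it as the profile record
    return {"PK": f"USER#{user_id}", "SK": "PROFILE", "entity_type": "User", "name": name}

def order_item(user_id, order_id, total):
    # Orders share the user's partition key, so one Query on
    # PK = "USER#<id>" fetches the profile and every order in one request
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}", "entity_type": "Order", "total": total}

# Both entity types live in the same table, distinguished by SK prefix
items = [user_item("42", "Ada"), order_item("42", "001", 19.99)]
```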
Access Pattern First Design
Before creating any table, document all your application's access patterns:
Query Patterns: How will you retrieve data?
Update Patterns: How will data be modified?
Delete Patterns: How will data be removed?
Frequency: How often will each pattern be used?
This information guides your key design and indexing strategy.
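One lightweight way to capture this documentation is a simple worksheet mapping each pattern to the keys or index that will serve it, plus its expected frequency. The entries below are a hypothetical example for a social-app-style schema, not a prescribed format:

```python
# Hypothetical access-pattern worksheet: pattern -> key condition, index, frequency
ACCESS_PATTERNS = [
    {"pattern": "Get user by id",       "keys": "PK = USER#<id>, SK = USER#<id>",       "index": "table", "frequency": "high"},
    {"pattern": "List a user's posts",  "keys": "PK = USER#<id>, SK begins_with POST#", "index": "table", "frequency": "high"},
    {"pattern": "Look up user by name", "keys": "GSI1PK = USERNAME#<name>",             "index": "GSI1",  "frequency": "medium"},
]

def patterns_for_index(index):
    # Which documented patterns does a given table/index need to serve?
    return [p["pattern"] for p in ACCESS_PATTERNS if p["index"] == index]
```

Reviewing a worksheet like this before creating the table tells you immediately which patterns the base table covers and which ones force a GSI.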
Partition Key and Sort Key Selection
The foundation of DynamoDB performance lies in choosing the right primary key structure. DynamoDB supports two types of primary keys:
Simple Primary Key (Partition Key Only)
Use when you need to access items by a single attribute:
{
  "TableName": "Users",
  "KeySchema": [
    { "AttributeName": "userId", "KeyType": "HASH" }
  ]
}
Composite Primary Key (Partition Key + Sort Key)
Use when you need hierarchical data access:
{
  "TableName": "UserPosts",
  "KeySchema": [
    { "AttributeName": "userId", "KeyType": "HASH" },
    { "AttributeName": "postTimestamp", "KeyType": "RANGE" }
  ]
}
Best Practices for Key Selection
Partition Key Guidelines:
Ensure high cardinality to distribute data evenly
Avoid hot partitions by choosing attributes with uniform access patterns
Consider using composite attributes when natural keys don't provide good distribution
Sort Key Guidelines:
Choose attributes that support your query patterns
Use sort keys to enable range queries and sorting
Consider hierarchical patterns for related data grouping
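One way to apply the hierarchical guideline is to encode the levels into the sort key so that `begins_with()` prefixes select progressively wider groups. A sketch (the delimiter and level names here are arbitrary):

```python
def hierarchical_sort_key(*levels):
    # e.g. ("COUNTRY", "US", "CITY", "Seattle") -> "COUNTRY#US#CITY#Seattle"
    return "#".join(levels)

def prefix_for(*levels):
    # Prefix for a begins_with() condition matching everything under these levels
    return hierarchical_sort_key(*levels) + "#"

# A query with begins_with(prefix_for("COUNTRY", "US")) matches all US items
sk = hierarchical_sort_key("COUNTRY", "US", "CITY", "Seattle")
```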
Common Anti-Patterns to Avoid
Sequential Partition Keys: Using timestamps or auto-incrementing IDs as partition keys
Hot Partitions: Concentrating too much traffic on specific partition key values
Large Items: Storing items larger than 400KB
Sparse Indexes: Creating GSIs on attributes that are rarely populated
Global Secondary Indexes (GSI)
GSIs provide additional query flexibility by allowing you to query data using different partition and sort key combinations than your main table.
When to Use GSIs
Query data using attributes other than the primary key
Support multiple access patterns from the same dataset
Enable efficient queries without scanning the entire table
GSI Design Considerations
Projection Types:
KEYS_ONLY: Only key attributes are projected
INCLUDE: Key attributes plus specified non-key attributes
ALL: All attributes from the base table
Example GSI Configuration:
{
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "GSI1",
      "KeySchema": [
        { "AttributeName": "GSI1PK", "KeyType": "HASH" },
        { "AttributeName": "GSI1SK", "KeyType": "RANGE" }
      ],
      "Projection": {
        "ProjectionType": "INCLUDE",
        "NonKeyAttributes": ["attribute1", "attribute2"]
      }
    }
  ]
}
GSI Best Practices
Minimize GSI Count: Each GSI adds cost and complexity
Project Only Necessary Attributes: Reduce storage costs and improve performance
Plan for Sparse Indexes: Not all items need to have GSI attributes
Consider Write Costs: GSI updates consume additional write capacity
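Sparse indexes work because DynamoDB only writes an item into a GSI when the item actually carries the index key attributes. As a sketch (the `flagged` attribute is made up for illustration), only flagged posts receive GSI1 keys, so GSI1 stays small and cheap to query for exactly those posts:

```python
def post_item(user_id, post_id, flagged=False):
    item = {
        "PK": f"USER#{user_id}",
        "SK": f"POST#{post_id}",
        "entity_type": "Post",
    }
    if flagged:
        # Only flagged posts get GSI keys, so only they appear in GSI1
        item["GSI1PK"] = "FLAGGED"
        item["GSI1SK"] = f"POST#{post_id}"
    return item
```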
DynamoDB Streams and Triggers
DynamoDB Streams capture real-time changes to your table data, enabling event-driven architectures and real-time analytics.
Stream Types
KEYS_ONLY: Only key attributes of modified items
NEW_IMAGE: Entire item after modification
OLD_IMAGE: Entire item before modification
NEW_AND_OLD_IMAGES: Both new and old images
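Streams are enabled per table through a `StreamSpecification` passed to `create_table` or `update_table`. A small helper that validates the view type before building that argument (a sketch; in practice you would pass the result to boto3's `client.update_table(TableName=..., StreamSpecification=...)`):

```python
VALID_VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}

def stream_spec(view_type="NEW_AND_OLD_IMAGES"):
    # Builds the StreamSpecification argument for create_table / update_table
    if view_type not in VALID_VIEW_TYPES:
        raise ValueError(f"unknown stream view type: {view_type}")
    return {"StreamEnabled": True, "StreamViewType": view_type}
```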
Common Use Cases
Real-time Analytics:
def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            # Process a newly inserted item
            new_item = record['dynamodb']['NewImage']
            process_new_user(new_item)
        elif record['eventName'] == 'MODIFY':
            # Process an updated item
            old_item = record['dynamodb']['OldImage']
            new_item = record['dynamodb']['NewImage']
            process_user_update(old_item, new_item)
Data Synchronization:
Replicate data to other databases
Update search indexes (Elasticsearch, OpenSearch)
Trigger downstream processing workflows
Audit and Compliance:
Track all changes to sensitive data
Maintain change history for compliance requirements
Generate audit reports
Auto Scaling and On-Demand Billing
DynamoDB offers two capacity modes to handle varying workload demands:
Provisioned Capacity Mode
Define read and write capacity units (RCUs and WCUs) with auto-scaling:
{
  "BillingMode": "PROVISIONED",
  "ProvisionedThroughput": {
    "ReadCapacityUnits": 5,
    "WriteCapacityUnits": 5
  }
}
Auto Scaling Configuration:
Note that auto scaling is not part of the CreateTable API; it is configured separately through Application Auto Scaling (RegisterScalableTarget and PutScalingPolicy). Conceptually, you define a target utilization and capacity bounds:
{
  "TableName": "MyTable",
  "TargetUtilization": 70,
  "MinCapacity": 5,
  "MaxCapacity": 1000
}
On-Demand Billing Mode
Pay per request without capacity planning:
{
  "BillingMode": "PAY_PER_REQUEST"
}
Choosing the Right Billing Mode
Use Provisioned Mode When:
Predictable workload patterns
Consistent traffic levels
Cost optimization is critical
You can forecast capacity needs
Use On-Demand Mode When:
Unpredictable workloads
Serverless applications
Development and testing environments
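An existing table can be switched between the two modes with UpdateTable (DynamoDB limits how often you can switch, currently once per 24 hours). A sketch of the arguments for each direction, which you would pass as `client.update_table(**billing_mode_update(...))`:

```python
def billing_mode_update(table_name, on_demand=True, rcu=5, wcu=5):
    # Build UpdateTable arguments for switching billing modes
    if on_demand:
        return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}
    # Switching back to provisioned mode requires explicit throughput
    return {
        "TableName": table_name,
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }
```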
Performance Optimization Techniques
Hot Partition Mitigation
Distribute Write Load:
import random

def generate_distributed_key(base_key):
    # Add a random suffix to spread writes across partitions
    suffix = random.randint(0, 9)
    return f"{base_key}#{suffix}"

def write_with_distribution(table, item):
    distributed_key = generate_distributed_key(item['original_key'])
    item['partition_key'] = distributed_key
    table.put_item(Item=item)
Use Composite Keys:
# Instead of using a timestamp as the partition key
partition_key = "2024-01-15T10:30:00Z"
# Use a composite approach
partition_key = f"USER#{user_id}"
sort_key = "2024-01-15T10:30:00Z"
Query Optimization
Use Query Instead of Scan:
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserPosts')

# Efficient query
response = table.query(
    KeyConditionExpression=Key('userId').eq('123') & Key('postTimestamp').begins_with('2024-01')
)

# Avoid full table scans
# response = table.scan()  # Don't do this on large tables
Implement Pagination:
def get_all_user_posts(table, user_id):
    posts = []
    last_evaluated_key = None
    while True:
        if last_evaluated_key:
            response = table.query(
                KeyConditionExpression=Key('userId').eq(user_id),
                ExclusiveStartKey=last_evaluated_key
            )
        else:
            response = table.query(
                KeyConditionExpression=Key('userId').eq(user_id)
            )
        posts.extend(response['Items'])
        last_evaluated_key = response.get('LastEvaluatedKey')
        if not last_evaluated_key:
            break
    return posts
Hands-on: Building a DynamoDB Application
Let's build a social media application backend that demonstrates DynamoDB best practices.
Application Requirements
Our application needs to support:
User profiles
Posts by users
Comments on posts
Following relationships
Timeline generation
Table Design
We'll use a single table design with the following structure:
import boto3
from boto3.dynamodb.conditions import Key, Attr
from botocore.exceptions import ClientError
import uuid
from datetime import datetime, timezone

class SocialMediaApp:
    def __init__(self):
        self.dynamodb = boto3.resource('dynamodb')
        self.table_name = 'SocialMediaApp'
        self.table = None

    def create_table(self):
        table = self.dynamodb.create_table(
            TableName=self.table_name,
            KeySchema=[
                {'AttributeName': 'PK', 'KeyType': 'HASH'},
                {'AttributeName': 'SK', 'KeyType': 'RANGE'}
            ],
            AttributeDefinitions=[
                {'AttributeName': 'PK', 'AttributeType': 'S'},
                {'AttributeName': 'SK', 'AttributeType': 'S'},
                {'AttributeName': 'GSI1PK', 'AttributeType': 'S'},
                {'AttributeName': 'GSI1SK', 'AttributeType': 'S'},
            ],
            GlobalSecondaryIndexes=[
                {
                    'IndexName': 'GSI1',
                    'KeySchema': [
                        {'AttributeName': 'GSI1PK', 'KeyType': 'HASH'},
                        {'AttributeName': 'GSI1SK', 'KeyType': 'RANGE'}
                    ],
                    'Projection': {'ProjectionType': 'ALL'},
                    'ProvisionedThroughput': {
                        'ReadCapacityUnits': 5,
                        'WriteCapacityUnits': 5
                    }
                }
            ],
            ProvisionedThroughput={
                'ReadCapacityUnits': 5,
                'WriteCapacityUnits': 5
            }
        )
        # Wait for the table to be created
        table.wait_until_exists()
        self.table = table
        return table
Implementing Core Functionality
User Management:
def create_user(self, username, email, full_name):
    user_id = str(uuid.uuid4())
    timestamp = datetime.now(timezone.utc).isoformat()
    user_item = {
        'PK': f'USER#{user_id}',
        'SK': f'USER#{user_id}',
        'GSI1PK': f'USERNAME#{username}',
        'GSI1SK': f'USER#{user_id}',
        'entity_type': 'User',
        'user_id': user_id,
        'username': username,
        'email': email,
        'full_name': full_name,
        'created_at': timestamp,
        'follower_count': 0,
        'following_count': 0,
        'post_count': 0
    }
    try:
        self.table.put_item(
            Item=user_item,
            ConditionExpression=Attr('PK').not_exists()
        )
        return user_item
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            raise ValueError("User already exists")
        raise

def get_user_by_username(self, username):
    response = self.table.query(
        IndexName='GSI1',
        KeyConditionExpression=Key('GSI1PK').eq(f'USERNAME#{username}')
    )
    if response['Items']:
        return response['Items'][0]
    return None
Post Management:
def create_post(self, user_id, content, image_url=None):
    from boto3.dynamodb.types import TypeSerializer  # typed values for the low-level client
    post_id = str(uuid.uuid4())
    timestamp = datetime.now(timezone.utc).isoformat()
    post_item = {
        'PK': f'USER#{user_id}',
        'SK': f'POST#{timestamp}#{post_id}',
        'GSI1PK': f'POST#{post_id}',
        'GSI1SK': f'POST#{timestamp}',
        'entity_type': 'Post',
        'user_id': user_id,
        'post_id': post_id,
        'content': content,
        'image_url': image_url,
        'created_at': timestamp,
        'like_count': 0,
        'comment_count': 0
    }
    serializer = TypeSerializer()
    # Use a transaction to write the post and update the post count atomically
    self.dynamodb.meta.client.transact_write_items(
        TransactItems=[
            {
                'Put': {
                    'TableName': self.table_name,
                    # transact_write_items goes through the low-level client,
                    # so the item must use typed attribute values
                    'Item': {k: serializer.serialize(v) for k, v in post_item.items()}
                }
            },
            {
                'Update': {
                    'TableName': self.table_name,
                    'Key': {
                        'PK': {'S': f'USER#{user_id}'},
                        'SK': {'S': f'USER#{user_id}'}
                    },
                    'UpdateExpression': 'ADD post_count :inc',
                    'ExpressionAttributeValues': {
                        ':inc': {'N': '1'}
                    }
                }
            }
        ]
    )
    return post_item

def get_user_posts(self, user_id, limit=20):
    response = self.table.query(
        KeyConditionExpression=Key('PK').eq(f'USER#{user_id}') &
                               Key('SK').begins_with('POST#'),
        ScanIndexForward=False,  # Latest posts first
        Limit=limit
    )
    return response['Items']
Following Relationships:
def follow_user(self, follower_id, followed_id):
    from boto3.dynamodb.types import TypeSerializer  # typed values for the low-level client
    timestamp = datetime.now(timezone.utc).isoformat()
    # Create the following relationship
    following_item = {
        'PK': f'USER#{follower_id}',
        'SK': f'FOLLOWING#{followed_id}',
        'GSI1PK': f'USER#{followed_id}',
        'GSI1SK': f'FOLLOWER#{follower_id}',
        'entity_type': 'Following',
        'follower_id': follower_id,
        'followed_id': followed_id,
        'created_at': timestamp
    }
    serializer = TypeSerializer()
    # Use a transaction to create the relationship and update both counts atomically
    self.dynamodb.meta.client.transact_write_items(
        TransactItems=[
            {
                'Put': {
                    'TableName': self.table_name,
                    'Item': {k: serializer.serialize(v) for k, v in following_item.items()},
                    'ConditionExpression': 'attribute_not_exists(PK)'
                }
            },
            {
                'Update': {
                    'TableName': self.table_name,
                    'Key': {
                        'PK': {'S': f'USER#{follower_id}'},
                        'SK': {'S': f'USER#{follower_id}'}
                    },
                    'UpdateExpression': 'ADD following_count :inc',
                    'ExpressionAttributeValues': {
                        ':inc': {'N': '1'}
                    }
                }
            },
            {
                'Update': {
                    'TableName': self.table_name,
                    'Key': {
                        'PK': {'S': f'USER#{followed_id}'},
                        'SK': {'S': f'USER#{followed_id}'}
                    },
                    'UpdateExpression': 'ADD follower_count :inc',
                    'ExpressionAttributeValues': {
                        ':inc': {'N': '1'}
                    }
                }
            }
        ]
    )
Monitoring and Optimization
CloudWatch Metrics to Monitor:
ConsumedReadCapacityUnits
ConsumedWriteCapacityUnits
ThrottledRequests
UserErrors
SystemErrors
Performance Monitoring Code:
import boto3
from datetime import datetime, timedelta

def monitor_table_performance(table_name):
    cloudwatch = boto3.client('cloudwatch')
    # Get throttle metrics for the last hour
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/DynamoDB',
        MetricName='ThrottledRequests',
        Dimensions=[
            {
                'Name': 'TableName',
                'Value': table_name
            }
        ],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=['Sum']
    )
    throttle_count = sum(point['Sum'] for point in response['Datapoints'])
    if throttle_count > 0:
        print(f"Warning: {throttle_count} throttled requests in the last hour")
        # Implement auto-scaling or alerting logic
Security Best Practices
IAM Policies
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:region:account-id:table/SocialMediaApp",
        "arn:aws:dynamodb:region:account-id:table/SocialMediaApp/index/*"
      ],
      "Condition": {
        "ForAllValues:StringLike": {
          "dynamodb:LeadingKeys": [
            "USER#${cognito-identity.amazonaws.com:sub}"
          ]
        }
      }
    }
  ]
}
Encryption Configuration
def create_encrypted_table(self):
    table = self.dynamodb.create_table(
        TableName=self.table_name,
        # ... other configuration
        SSESpecification={
            'Enabled': True,
            'SSEType': 'KMS',
            'KMSMasterKeyId': 'alias/dynamodb-key'
        }
    )
Cost Optimization Strategies
Reserved Capacity
For predictable workloads, purchasing reserved capacity can reduce provisioned-throughput costs by up to 76%. Note that reserved capacity is purchased through the AWS Management Console (under the DynamoDB reserved capacity settings), not through an SDK call, so there is no code to write here; factor the commitment into your capacity planning instead.
Capacity Planning
import math
import boto3
from datetime import datetime, timedelta

def analyze_capacity_usage():
    cloudwatch = boto3.client('cloudwatch')
    # Analyze read capacity utilization over the last week
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/DynamoDB',
        MetricName='ConsumedReadCapacityUnits',
        Dimensions=[{'Name': 'TableName', 'Value': 'SocialMediaApp'}],
        StartTime=datetime.utcnow() - timedelta(days=7),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=['Average', 'Maximum']
    )
    # Recommend capacity based on peak consumption plus a 20% buffer
    max_consumed = max(point['Maximum'] for point in response['Datapoints'])
    recommended_capacity = math.ceil(max_consumed * 1.2)
    return recommended_capacity
Advanced Topics
DynamoDB Accelerator (DAX)
For applications requiring microsecond read latency, DAX provides a write-through, in-memory cache in front of DynamoDB. Note that the boto3 'dax' client only manages clusters; data-plane reads and writes go through the dedicated DAX client library (the amazon-dax-client package for Python), pointed at your cluster endpoint:
# pip install amazon-dax-client
from amazondax import AmazonDaxClient

def setup_dax_client(cluster_endpoint):
    # cluster_endpoint comes from the DAX console, e.g.
    # "my-cluster.xxxxxx.dax-clusters.us-east-1.amazonaws.com:8111"
    dax = AmazonDaxClient.resource(endpoint_url=cluster_endpoint)
    return dax  # Use like a boto3 resource: dax.Table('SocialMediaApp')
Global Tables
For multi-region applications:
def enable_global_tables(table_name, regions):
    # create_global_table is called once with the full replication group
    # (this is the original 2017 global tables API; newer-version tables
    # add replicas via update_table with ReplicaUpdates instead)
    client = boto3.client('dynamodb')
    client.create_global_table(
        GlobalTableName=table_name,
        ReplicationGroup=[{'RegionName': region} for region in regions]
    )
Conclusion
Amazon DynamoDB offers incredible performance and scalability when designed and implemented correctly. The key to success lies in understanding your access patterns, designing appropriate partition and sort keys, and leveraging features like GSIs and DynamoDB Streams effectively.
Remember these critical points:
Design tables around access patterns, not entities
Choose partition keys that distribute data evenly
Use single table design for related data
Monitor performance metrics and optimize continuously
Implement proper security and cost optimization strategies
By following these principles and best practices, you'll build DynamoDB applications that scale seamlessly and perform exceptionally well under any load. The hands-on example we built demonstrates how these concepts come together in a real-world application, providing a solid foundation for your own DynamoDB projects.
As you continue your DynamoDB journey, keep experimenting with different design patterns and stay updated with the new features and capabilities that AWS continues to add to this powerful NoSQL database service.

