Jaime Elso

AWS Solutions Architect

Automated deployment of static web pages from a repository to an S3 bucket on AWS

In this article, I present the automated deployment of a static website from a git repository to an S3 bucket on AWS. The implementation is secure, scalable, and optimized, with deployments triggered automatically from a git repository in CodeCommit. A CloudFormation template deploys the infrastructure on AWS, and a Lambda function keeps the repository and the S3 bucket in sync. Check out the project GitHub repository.
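
If you want to deploy the template yourself, you can create the stack from the console, the CLI, or any SDK. Below is a minimal sketch using boto3; the stack name, file name, and parameter values are placeholders, but the parameter keys match the !Ref parameters used throughout the template, and CAPABILITY_NAMED_IAM is needed because the template creates named IAM resources.

# Minimal sketch: create the CloudFormation stack with boto3 (names and values are placeholders)
import boto3

cloudformation = boto3.client('cloudformation')

with open('template.yaml') as template:
	cloudformation.create_stack(
		StackName='my-website-hosting',
		TemplateBody=template.read(),
		Parameters=[
			{'ParameterKey': 'DomainName', 'ParameterValue': 'example.com'},
			{'ParameterKey': 'CertificateArn', 'ParameterValue': 'arn:aws:acm:us-east-1:111122223333:certificate/example'},
			{'ParameterKey': 'SubscriptionEndpoint', 'ParameterValue': 'me@example.com'},
			# ... remaining template parameters (BranchName, LogRetention, LambdaS3Bucket, LambdaS3Key, ...)
		],
		# Required because the template creates IAM resources with fixed names
		Capabilities=['CAPABILITY_NAMED_IAM']
	)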

MyWebsite hosting diagram

Hosting a static web page on AWS

To host a static web page on AWS, Amazon S3 is the go-to service, with the option to enable static website hosting on the bucket. To enhance website performance, an Amazon CloudFront distribution is deployed with the S3 bucket as the origin. However, manually uploading every change to the website on S3 was cumbersome, so I looked for an automated solution that synchronizes the S3 bucket with the git repository on every commit.

My initial attempt was to configure an AWS CodePipeline pipeline with an Amazon EventBridge rule that triggered whenever a new commit was made to the repository. However, this solution had two drawbacks: it uploaded every file in the repository, regardless of whether it had been modified, and it did not remove deleted files from the S3 bucket.

AWS Amplify Hosting was another potential solution, but it was not ideal because it reduces visibility and control over the resources used for web hosting. The solution that worked best was to invoke an AWS Lambda function on each new commit to the repository, analyze the changes, and synchronize them with the S3 bucket.

Amazon S3 bucket for log storage

To keep a log and trace of all calls made to the files on our website (both requests coming from the internet and actions performed by the synchronization Lambda function), we will use an Amazon S3 bucket to store these logs.

BucketLogs:
	Type: AWS::S3::Bucket
	Properties:
		BucketName: !Sub '${DomainName}-logs'
		AccessControl: LogDeliveryWrite
		# Encryption configuration with S3 managed keys
		BucketEncryption:
			ServerSideEncryptionConfiguration:
				- ServerSideEncryptionByDefault:
						SSEAlgorithm: AES256
		# Lifecycle configuration to expire logs after days defined in LogRetention parameter
		LifecycleConfiguration:
			Rules:
				- Id: DeleteLogsAfterTwoMonths
					Status: Enabled
					Prefix: hosting/
					ExpirationInDays: !Ref LogRetention

To secure the bucket, we encrypt it with Amazon S3 managed keys (SSE-S3) and set the AccessControl to LogDeliveryWrite, which grants the S3 log delivery group permission to write log files to the bucket. To prevent logs from being stored indefinitely and filling the bucket with irrelevant information, I implemented a lifecycle rule that automatically deletes the logs after the retention period defined in the LogRetention parameter (two months in my case).

Amazon S3 bucket for web hosting

Using an object storage service as the hosting for my static website is an interesting choice because I don't require server-side execution. The service simply returns files as they are to the user's browser upon request.

Bucket:
	Type: AWS::S3::Bucket
	Properties:
		BucketName: !Ref DomainName
		# Encryption configuration with S3 managed keys
		BucketEncryption:
			ServerSideEncryptionConfiguration:
				- ServerSideEncryptionByDefault:
						SSEAlgorithm: AES256
		# Logs configuration
		LoggingConfiguration:
			DestinationBucketName: !Ref BucketLogs
			LogFilePrefix: hosting/

The only particularities of configuring this bucket are to encrypt it with Amazon S3 managed keys (SSE-S3) and to enable server access logging, specifying the log bucket created in the previous step as the destination.

Bucket Policy

The S3 bucket is not accessible by default, which means we cannot serve the content of our website to incoming requests. To solve this issue, we need to establish a bucket policy that grants read access to the CloudFront distribution we will deploy soon.

BucketPolicy:
	Type: AWS::S3::BucketPolicy
	Properties:
		Bucket: !Ref Bucket
		PolicyDocument:
			Version: '2012-10-17'
			Statement:
				- Effect: Allow
					Action:
						- s3:GetObject
					Resource: !Sub '${Bucket.Arn}/*'
					Principal:
						CanonicalUser: !GetAtt CloudFrontOriginAccessIdentity.S3CanonicalUserId

Amazon CloudFront for secure and fast delivery of web content

Merely using Amazon S3 does not provide us with the necessary tools to configure fundamental aspects of our website, including a custom domain name or HTTPS. Fortunately, Amazon CloudFront can help us with these issues. To use a custom domain for our CloudFront distribution and serve content securely via HTTPS, it's essential to have an SSL certificate deployed in the N. Virginia region (us-east-1) using AWS Certificate Manager.
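
If you don't have a certificate in us-east-1 yet, you can request one there before deploying the stack. Here's a minimal sketch with boto3; the domain name is a placeholder and DNS validation is assumed, so you still need to create the validation record before the certificate is issued.

# Minimal sketch: request the certificate in us-east-1, where CloudFront expects it (placeholder domain)
import boto3

acm = boto3.client('acm', region_name='us-east-1')

response = acm.request_certificate(
	DomainName='example.com',
	ValidationMethod='DNS'
)
# The returned ARN is the value to pass as the CertificateArn template parameter
print(response['CertificateArn'])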

CloudFront:
	Type: AWS::CloudFront::Distribution
	Properties:
		DistributionConfig:
			# Custom domain name
			Aliases:
				- !Ref DomainName
			# A name for the distribution
			Comment: !Sub 'CloudFront distribution for ${DomainName}'
			DefaultCacheBehavior:
				# Content will be compressed before it is cached (gzip), unless specifically instructed otherwise
				Compress: true
				# Cache content for 1 day
				DefaultTTL: 86400
				# Pass query strings to the origin
				ForwardedValues:
					QueryString: true
				# Max cache content for 1 year
				MaxTTL: 31536000
				# Only allow HTTPS
				ViewerProtocolPolicy: redirect-to-https
				# Set the response headers policy
				ResponseHeadersPolicyId: !Ref ResponseHeadersPolicy
				TargetOriginId: !Sub 'S3Bucket-${AWS::StackName}'
			DefaultRootObject: !Ref RootDocumentPath
			CustomErrorResponses:
				- ErrorCachingMinTTL: 300
					ErrorCode: 404
					ResponseCode: 200
					ResponsePagePath: !Ref ErrorDocumentPath
				- ErrorCachingMinTTL: 300
					ErrorCode: 403
					ResponseCode: 200
					ResponsePagePath: !Ref ErrorDocumentPath
			IPV6Enabled: true
			Enabled: true
			HttpVersion: http2and3
			# Set the origin to the S3 bucket and specify the origin access identity
			Origins:
				- DomainName: !GetAtt Bucket.DomainName
					Id: !Sub 'S3Bucket-${AWS::StackName}'
					S3OriginConfig:
						OriginAccessIdentity:
							!Join ['', ['origin-access-identity/cloudfront/', !Ref CloudFrontOriginAccessIdentity]]
			# Allow CloudFront to use all edge locations
			PriceClass: 'PriceClass_All'
			# Set the certificate
			ViewerCertificate:
				AcmCertificateArn: !Ref CertificateArn
				MinimumProtocolVersion: 'TLSv1.1_2016'
				SslSupportMethod: 'sni-only'

Incorporating a content delivery network (CDN) on top of our web hosting speeds up content delivery to end users worldwide. CloudFront has many edge locations that cache the static files of our website, so when a user visits the webpage, the content is served from the edge location closest to them, avoiding a round trip to the origin. Within the distribution, we set the root object of our webpage and specify the path to which requests are rerouted when 403 and 404 errors occur.

Response header policy

CloudFront enables custom header configuration in the HTTP response. In this instance, we have established several policies to prevent potential cross-site scripting attacks (XSS), script injection attacks, and malicious code execution, among other security vulnerabilities.

ResponseHeadersPolicy:
	Type: AWS::CloudFront::ResponseHeadersPolicy
	Properties:
		ResponseHeadersPolicyConfig:
			# A name for the ResponseHeadersPolicy
			Name: !Sub "${AWS::StackName}-static-site-security-headers"
			# Specifies the security headers configuration
			SecurityHeadersConfig:
				# Specifies the Strict Transport Security (HSTS) header
				StrictTransportSecurity:
					# Specifies the maximum age (in seconds) for which the browser should cache the HSTS policy
					AccessControlMaxAgeSec: 63072000
					# Indicates whether the HSTS policy should apply to all subdomains
					IncludeSubdomains: true
					# Specifies whether to override an existing HSTS policy
					Override: true
					# Specifies whether to preload the HSTS policy in supported browsers
					Preload: true
				# Specifies the Content Security Policy (CSP) header
				ContentSecurityPolicy:
					# Specifies the CSP header value
					ContentSecurityPolicy: !Ref CSPHeader
					# Specifies whether to override an existing CSP policy
					Override: true
				# Specifies the X-Content-Type-Options header
				ContentTypeOptions:
					# Specifies whether to override an existing X-Content-Type-Options policy
					Override: true
				# Specifies the X-Frame-Options header
				FrameOptions:
					# Specifies the value of the X-Frame-Options header
					FrameOption: DENY
					# Specifies whether to override an existing X-Frame-Options policy
					Override: true
				# Specifies the Referrer-Policy header
				ReferrerPolicy:
					# Specifies the value of the Referrer-Policy header
					ReferrerPolicy: "same-origin"
					# Specifies whether to override an existing Referrer-Policy policy
					Override: true
				# Specifies the X-XSS-Protection header
				XSSProtection:
					# Specifies whether to block pages from loading when they detect reflected cross-site scripting (XSS) attacks
					ModeBlock: true
					# Specifies whether to override an existing X-XSS-Protection policy
					Override: true
					# Specifies whether to enable the XSS Protection policy
					Protection: true

Origin Access Identity (OAI)

CloudFront Origin Access Identity (OAI) is a method to authenticate and authorize requests between Amazon CloudFront and an Amazon S3 origin resource. CloudFront OAI enables the creation of an origin access identity that acts as an intermediary between CloudFront and S3. CloudFront uses this access identity to request S3 objects on behalf of users, which means that the objects are no longer directly accessible in S3 but only through CloudFront. This provides greater control and security over access to objects stored in S3 and helps protect against potential attacks.

CloudFrontOriginAccessIdentity:
	Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
	Properties:
		CloudFrontOriginAccessIdentityConfig:
			Comment: !Sub 'CloudFront OAI for ${DomainName}'

AWS CodeCommit as the git repository for our web code

To maintain control over the code we write for our website, it's necessary to have it in a git repository. Taking advantage of working in AWS, I'll be using AWS CodeCommit for this purpose. This repository will be the starting point for pushing our code to production every time a new commit is made. To detect when a commit occurs, we'll configure a trigger in CodeCommit that reacts to new commits on a specific branch. This trigger will asynchronously invoke a Lambda function.

Repository:
	Type: AWS::CodeCommit::Repository
	Properties:
		RepositoryName: MyWebsite
		RepositoryDescription: CodeCommit repository for MyWebsite
		# For each new commit on the master branch, a new deployment on S3 will be triggered through a Lambda function
		Triggers:
			- Events:
					- updateReference
				DestinationArn: !GetAtt SyncCodeCommitWithS3Function.Arn
				Name: SyncCodeCommitWithS3Trigger
				Branches:
					- !Ref BranchName

Amazon SNS to notify in case of synchronization errors

Since our Lambda function runs asynchronously every time a new commit is made to the repository, we are not monitoring the result of this execution. Therefore, it is important for us to be notified when, for whatever reason, the Lambda function fails to perform its task. To receive these notifications, we will create a new Amazon SNS topic.

SyncCodeCommitWithS3Topic:
	Type: AWS::SNS::Topic
	Properties:
		TopicName: SyncCodeCommitWithS3
		DisplayName: SyncCodeCommitWithS3

Once our topic is created, we need to subscribe to it. In my case, I want to receive those failure notifications via email. We will create a new subscription and use email as the protocol, adding our email address.

SyncCodeCommitWithS3EmailSubscription:
	Type: AWS::SNS::Subscription
	Properties:
		TopicArn: !Ref SyncCodeCommitWithS3Topic
		Protocol: email
		Endpoint: !Ref SubscriptionEndpoint

Using AWS Lambda to synchronize a repository with a bucket

The goal of our Lambda function, once invoked asynchronously by the CodeCommit trigger, is to obtain the information contained in the new commit, differentiating between files that have been modified or added and files that have been deleted. The files that have been modified or added will be uploaded to the S3 bucket, while the files that have been deleted in the repository will also be deleted from the bucket. Once this synchronization is done, we will create a cache invalidation in CloudFront for the files that have been affected in this new commit. This way, we will force the CDN to update the content of its cache with the new versions of the files in S3.

Lambda function deployment

Let's first take a look at how to configure the function on AWS before we move on to the code. To ensure that our Lambda function can execute certain actions with other AWS services it needs to interact with, it is important to assign it a role with sufficient permissions. For this reason, we will set up a restrictive permission policy that grants it only the necessary permissions to complete its task successfully.

SyncCodeCommitWithS3FunctionRole:
	Type: AWS::IAM::Role
	Properties:
		RoleName: SyncCodeCommitWithS3FunctionRole
		Description: Lambda role to write logs, sync files from CodeCommit with an S3 bucket, and invalidate the CloudFront distribution cache for those files.
		AssumeRolePolicyDocument:
			Version: "2012-10-17"
			Statement:
				- Effect: Allow
					Principal:
						Service:
							- lambda.amazonaws.com
					Action:
						- 'sts:AssumeRole'

SyncCodeCommitWithS3FunctionRolePolicy:
	Type: AWS::IAM::Policy
	Properties:
		PolicyName: SyncCodeCommitWithS3FunctionRolePolicy
		PolicyDocument:
			Version: '2012-10-17'
			Statement:
				- Effect: Allow
					Action:
						- logs:CreateLogGroup
					Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:*"
				- Effect: Allow
					Action:
						- logs:CreateLogStream
						- logs:PutLogEvents
					Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${SyncCodeCommitWithS3Function}:*"
				- Effect: Allow
					Action:
						- codecommit:GetCommit
						- codecommit:GetDifferences
						- codecommit:GetFile
					Resource: !GetAtt Repository.Arn
				- Effect: Allow
					Action:
						- s3:PutObject
						- s3:DeleteObject
					Resource: !Sub '${Bucket.Arn}/*'
				- Effect: Allow
					Action:
						- cloudfront:CreateInvalidation
					Resource: !Sub 'arn:aws:cloudfront::${AWS::AccountId}:distribution/${CloudFront}'
				- Effect: Allow
					Action:
						- sns:Publish
					Resource: !Ref SyncCodeCommitWithS3Topic
		Roles:
			- !Ref SyncCodeCommitWithS3FunctionRole

To ensure that the trigger we configured in CodeCommit can successfully invoke the function, keep in mind that CodeCommit does not invoke the function by assuming an IAM role. Instead, we need to attach a resource-based policy to the function that authorizes the CodeCommit service to invoke it.

SyncCodeCommitWithS3FunctionResourcePolicy:
	Type: AWS::Lambda::Permission
	Properties:
		Action: lambda:InvokeFunction
		FunctionName: !Ref SyncCodeCommitWithS3Function
		Principal: codecommit.amazonaws.com
		SourceArn: !GetAtt Repository.Arn

Once we have established all the permissions, we deploy our Lambda function. Although I usually feel more comfortable programming in JavaScript, in this case I developed the function in Python, specifically version 3.10, the most recent Python runtime available on AWS Lambda at the time of writing.

The timeout parameter is crucial for our function, since it will require more or less execution time depending on the number of files modified in the last commit. My advice is to adapt this parameter to your specific needs. The function's code must be compressed into a ZIP file and uploaded to an S3 bucket so that CloudFormation can access it when deploying the function.
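
As a reference, this is roughly how the code can be packaged and uploaded before deploying the stack. It is just a minimal sketch: the handler is assumed to live in index.py, and the bucket name and key are placeholders that should match the LambdaS3Bucket and LambdaS3Key parameters.

# Minimal sketch: zip the function code and upload it to the bucket CloudFormation reads from
import zipfile
import boto3

# The ZIP file must contain index.py at its root, since the handler is index.lambda_handler
with zipfile.ZipFile('function.zip', 'w', zipfile.ZIP_DEFLATED) as archive:
	archive.write('index.py')

# Placeholder bucket and key: use the values passed as LambdaS3Bucket and LambdaS3Key
s3 = boto3.client('s3')
s3.upload_file('function.zip', 'my-lambda-artifacts-bucket', 'sync/function.zip')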

SyncCodeCommitWithS3Function:
	Type: AWS::Lambda::Function
	Properties:
		FunctionName: SyncCodeCommitWithS3
		Description: Function that syncs files from CodeCommit with an S3 bucket and invalidates the CloudFront distribution cache for those files.
		Runtime: python3.10
		Architectures:
			- x86_64
		MemorySize: 128
		# Timeout in seconds. Increase if you have commits with a lot of files to sync
		Timeout: !Ref LambdaTimeout
		# Place Lambda function code in S3 and reference it here. Zip file must contain the index.py file
		Code:
			S3Bucket: !Ref LambdaS3Bucket
			S3Key: !Ref LambdaS3Key
		PackageType: Zip
		Handler: index.lambda_handler
		Role: !GetAtt SyncCodeCommitWithS3FunctionRole.Arn
		# Define environment variables to know the bucket name and the CloudFront distribution ID
		Environment:
			Variables:
				bucketName: !Ref Bucket
				distributionId: !Ref CloudFront
				topicArn: !Ref SyncCodeCommitWithS3Topic

To let our Lambda function know which bucket to synchronize, which SNS topic to send error notifications to, and which CloudFront distribution to create invalidations in, I define three environment variables with this information so the code can read them at runtime.

Lambda function code

To start our function, we import the AWS SDK to interact with other AWS services from our code. Additionally, we import the time library to get the current time and use it as the invalidation caller reference, and the os library to retrieve the values of our environment variables. This code runs only when the Lambda execution environment is initialized, not on every invocation, so the variables declared at this point are not redeclared and reassigned each time the function is invoked. To make our code more efficient, we therefore initialize the AWS SDK clients we are going to use here and read the values of the environment variables. Finally, we declare a dictionary of MIME types so that later in the code we can look up a file's MIME type from its extension.

# Boto3 is the official AWS library for Python
import boto3
# Import the time library to get the current time
import time
# Import the os library to get environment variables
import os

# Create an instance of the CodeCommit, S3, SNS and CloudFront client
codecommit = boto3.client('codecommit')
s3 = boto3.client('s3')
cloudfront = boto3.client('cloudfront')
sns = boto3.client('sns')

# Get the bucket name and distribution id from the environment variables
try:
	bucketName = os.environ['bucketName']
	distributionId = os.environ['distributionId']
	topicArn = os.environ['topicArn']
except Exception as e:
	raise Exception('Missing environment variable: ' + str(e))

# All the possible MIME types
contentTypes = {
	'3dm': 'x-world/x-3dmf',
	'3dmf': 'x-world/x-3dmf',
	'3g2': 'video/3gpp2',
	...
}

Once the code that will be executed upon starting our function is done, we are ready to develop the handler. The handler is the Python function that is executed each time the Lambda function is invoked.

Our handler receives the event as an argument. This variable contains a dictionary with information about the event that is invoking the function. From this dictionary, we obtain data such as the repository name and the ID of the commit that has been made. With these two pieces of information and by using the CodeCommit API call get_commit, we can also retrieve the ID of the commit prior to the last one.
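
For reference, the relevant part of a CodeCommit trigger event looks roughly like this; the values are illustrative, and the handler only relies on the eventSourceARN and on the first commit reference.

# Illustrative CodeCommit trigger event, trimmed to the fields the handler uses
event = {
	'Records': [
		{
			'eventSourceARN': 'arn:aws:codecommit:eu-west-1:111122223333:MyWebsite',
			'codecommit': {
				'references': [
					{'commit': '<40-character commit ID>', 'ref': 'refs/heads/master'}
				]
			}
		}
	]
}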

The CodeCommit API call get_differences, when given two commit IDs and a repository name, returns an array with the differences found between both commits. This allows us to check which changes have been made in the latest commit and, therefore, which actions we need to take to synchronize the bucket with the repository.
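
Each entry in that array describes one changed file through its before and after blobs. An illustrative response, trimmed to the fields the code relies on, with paths and blob ids as placeholders:

# Illustrative get_differences response, trimmed to the fields the handler uses
commitChanges = {
	'differences': [
		# Added or modified file: afterBlob carries the new path
		{'beforeBlob': {'path': 'index.html', 'blobId': '<blob id>'}, 'afterBlob': {'path': 'index.html', 'blobId': '<blob id>'}, 'changeType': 'M'},
		# Deleted file: only beforeBlob is present
		{'beforeBlob': {'path': 'old-page.html', 'blobId': '<blob id>'}, 'changeType': 'D'}
	]
}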

By using list comprehensions, we populate arrays differentiating between updated or added files and files that have been deleted, and we iterate over them to perform actions. In the case of updating or creating a new file, we retrieve it from the repository. Since it is returned in base64 and loses its metadata, we use its extension to determine the corresponding MIME type. Additionally, we check if the file is HTML, and if so, we remove the file extension to ensure it doesn't appear in the URL. Finally, we use the S3 client API to upload the file to the bucket.

In the case of needing to delete a file, we remove the HTML extension if it has one and add the path to the cache invalidation array. We perform this action for each file we upload to the bucket in the previous step as well. Once that is done, we delete the file from the S3 bucket using the API call. After we have deleted and uploaded the corresponding files, we create a cache invalidation in CloudFront to force the CDN to update these files and not serve outdated content.

# Handler function
def lambda_handler(event, context):
	
	# Get the repository name and the id of the last commit made
	repoName = event['Records'][0]['eventSourceARN'].split(':')[-1]
	lastCommitId = event['Records'][0]['codecommit']['references'][0]['commit']

	try:
		# Get commit information to get the id of the commit prior to the last commit made
		commitInfo = codecommit.get_commit(
			repositoryName = repoName,
			commitId = lastCommitId
		)
	except Exception as e:
		sns.publish(TopicArn = topicArn, Message = 'Error getting commit information: ' + str(e), Subject = 'Sync Error')
		return {
			'statusCode': 500,
			'body': 'Error getting commit information: ' + str(e)
		}

	# Get the id of the commit prior to the last commit made
	parentCommitId = commitInfo['commit']['parents'][0] if commitInfo['commit']['parents'] else None

	try:
		# Get the changes that have occurred in the last commit made
		if parentCommitId:
			commitChanges = codecommit.get_differences(
				repositoryName = repoName,
				afterCommitSpecifier = lastCommitId,
				beforeCommitSpecifier = parentCommitId
			)
		else:
			commitChanges = codecommit.get_differences(
				repositoryName = repoName,
				afterCommitSpecifier = lastCommitId
			)
	except Exception as e:
		sns.publish(TopicArn = topicArn, Message = 'Error getting commit changes: ' + str(e), Subject = 'Sync Error')
		return {
			'statusCode': 500,
			'body': 'Error getting commit changes: ' + str(e)
		}

	# Declare three arrays to group the updated, deleted and invalidated files
	# Populate the arrays using list comprehension
	updatedFiles = [difference['afterBlob']['path'] for difference in commitChanges['differences'] if difference.get('afterBlob') and difference['afterBlob'].get('path')]
	deletedFiles = [difference['beforeBlob']['path'] for difference in commitChanges['differences'] if difference.get('beforeBlob') and difference['beforeBlob'].get('path') and difference['beforeBlob']['path'] not in updatedFiles]
	invalidateFiles = []

	# Iterate over the updated files and upload them to the S3 bucket
	for filePath in updatedFiles:
		try:
			# Get the file content from CodeCommit
			codecommitFile = codecommit.get_file(
				repositoryName = repoName,
				commitSpecifier = lastCommitId,
				filePath = filePath
			)
		except Exception as e:
			sns.publish(TopicArn = topicArn, Message = 'Error getting file content from CodeCommit: ' + str(e), Subject = 'Sync Error')
			return {
				'statusCode': 500,
				'body': 'Error getting file content from CodeCommit: ' + str(e)
			}

		# Set Content-Type based on file extension
		contentType = contentTypes.get(filePath.split('.')[-1], 'text/plain')
		# For HTML files, remove the .html extension
		filePath = filePath.replace('.html', '')
		# Add the file to the list of files to invalidate
		invalidateFiles.append('/' + filePath)

		try:
			# Use the put_object method of S3 to upload the file to the bucket
			s3.put_object(Bucket = bucketName, Key = filePath, Body = codecommitFile['fileContent'], ContentType = contentType)
		except Exception as e:
			sns.publish(TopicArn = topicArn, Message = 'Error uploading file to S3: ' + str(e), Subject = 'Sync Error')
			return {
				'statusCode': 500,
				'body': 'Error uploading file to S3: ' + str(e)
			}

	# Iterate over the deleted files and delete them from the S3 bucket
	for filePath in deletedFiles:
		# For HTML files, remove the .html extension
		filePath = filePath.replace('.html', '')
		# Add the file to the list of files to invalidate
		invalidateFiles.append('/' + filePath)

		try:
			# Use the delete_object method of S3 to delete the file from the bucket
			s3.delete_object(Bucket = bucketName, Key = filePath)
		except Exception as e:
			sns.publish(TopicArn = topicArn, Message = 'Error deleting file from S3: ' + str(e), Subject = 'Sync Error')
			return {
				'statusCode': 500,
				'body': 'Error deleting file from S3: ' + str(e)
			}

	try:
		# Invalidate the modified files in CloudFront cache
		cloudfront.create_invalidation(
			DistributionId = distributionId,
			InvalidationBatch = {
				'Paths': {
					'Quantity': len(invalidateFiles),
					'Items': invalidateFiles
				},
				'CallerReference': str(time.time()).replace('.', '')
			}
		)
	except Exception as e:
		sns.publish(TopicArn = topicArn, Message = 'Error invalidating files in CloudFront: ' + str(e), Subject = 'Sync Error')
		return {
			'statusCode': 500,
			'body': 'Error invalidating files in CloudFront: ' + str(e)
		}

	# Return a 200 response
	return {
		'statusCode': 200,
		'body': 'All correct!'
	}

Thanks for reading! Catch you in the next one.