I find using SSH in Terraform quite problematic: you need to distribute a private SSH key to anybody who will run the Terraform code, including your CI/CD system. This is a no-go for me: it adds the complexity of managing SSH keys, including their rotation. There is a huge issue on the Terraform GitHub repository about this functionality, and the most upvoted solution is indeed to connect via SSH and run a check:
provisioner "remote-exec" {
  inline = [
    "cloud-init status --wait"
  ]
}
AWS Systems Manager Run Command
The idea of using cloud-init status --wait is indeed quite good. The only problem is how to ask Terraform to run such a command. Luckily for us, AWS has a service, AWS SSM Run Command, that allows us to run commands on an EC2 instance through AWS APIs! This way, our CI/CD system only needs an appropriate IAM role and a way to invoke AWS APIs. I use the AWS CLI in the examples below, but you can adapt them to any language you prefer.
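To make this concrete, here is a rough sketch of the flow with the AWS CLI, assuming an instance that is already managed by Systems Manager (the instance id is a placeholder) and using the AWS-managed AWS-RunShellScript document; further below we will wrap the same calls in a Terraform provisioner:

# Run the check on a managed instance (the id is a placeholder)
command_id=$(aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids "i-0123456789abcdef0" \
  --parameters 'commands=["cloud-init status --wait"]' \
  --output text --query "Command.CommandId")

# Block until the command has finished; the wait fails if the command failed
aws ssm wait command-executed \
  --command-id "$command_id" \
  --instance-id "i-0123456789abcdef0"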
Prerequisites
There are some prerequisites to using AWS SSM Run Command: we need to have the AWS SSM Agent installed on our instance. It is preinstalled on Amazon Linux 2 and on Ubuntu 16.04, 18.04, and 20.04. For any other OS, we need to install it manually: it is supported on Linux, macOS, and Windows.
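As a quick check (and assuming the instance also has an IAM instance profile that allows the agent to register with Systems Manager), the instance should appear in the list of managed instances:

# Lists managed instances; PingStatus should be "Online" once the agent
# is running and able to reach the Systems Manager endpoints.
aws ssm describe-instance-information \
  --query "InstanceInformationList[].{Id:InstanceId,Ping:PingStatus}" \
  --output table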
The user or the role that executes the Terraform code needs to be able to create, update, and read AWS SSM Documents, and to run SSM commands. A possible policy could look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1629387563127",
      "Action": [
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:DescribeDocument",
        "ssm:DescribeDocumentParameters",
        "ssm:DescribeDocumentPermission",
        "ssm:GetDocument",
        "ssm:ListDocuments",
        "ssm:SendCommand",
        "ssm:UpdateDocument",
        "ssm:UpdateDocumentDefaultVersion",
        "ssm:UpdateDocumentMetadata"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
If we already know the names of the documents, or the instances where we want to run the commands, it is better to lock down the policy by specifying those resources, according to the principle of least privilege.
Last but not least, we need to have the AWS CLI installed on the system that will execute Terraform.
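A trivial check that is easy to forget in CI images (any reasonably recent version of the CLI should do):

# Prints the installed AWS CLI version
aws --version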
The Terraform code
After setting up the prerequisites above, we need two different Terraform resources. The first will create the AWS SSM Document with the command we want to execute on the instance. The second one will execute that command while provisioning the EC2 instance.
The AWS SSM Document code will look like this:
resource "aws_ssm_document" "cloud_init_wait" {
  name            = "cloud-init-wait"
  document_type   = "Command"
  document_format = "YAML"
  content         = <<-DOC
    schemaVersion: '2.2'
    description: Wait for cloud init to finish
    mainSteps:
      - action: aws:runShellScript
        name: StopOnLinux
        precondition:
          StringEquals:
            - platformType
            - Linux
        inputs:
          runCommand:
            - cloud-init status --wait
  DOC
}
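Once the document exists (for example after a first terraform apply), we can also try it out by hand against any managed instance; the instance id here is a placeholder:

# Invoke the custom document manually and print the command id
aws ssm send-command \
  --document-name "cloud-init-wait" \
  --instance-ids "i-0123456789abcdef0" \
  --output text --query "Command.CommandId"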
We can refer to this document from within our EC2 instance resource, with a local-exec provisioner:
resource "aws_instance" "example" {
  ami           = "my-ami"
  instance_type = "t3.micro"

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]

    command = <<-EOF
    set -Ee -o pipefail
    export AWS_DEFAULT_REGION=${data.aws_region.current.name}

    command_id=$(aws ssm send-command --document-name ${aws_ssm_document.cloud_init_wait.arn} --instance-ids ${self.id} --output text --query "Command.CommandId")
    if ! aws ssm wait command-executed --command-id $command_id --instance-id ${self.id}; then
      echo "Failed to start services on instance ${self.id}!";
      echo "stdout:";
      aws ssm get-command-invocation --command-id $command_id --instance-id ${self.id} --query StandardOutputContent;
      echo "stderr:";
      aws ssm get-command-invocation --command-id $command_id --instance-id ${self.id} --query StandardErrorContent;
      exit 1;
    fi;
    echo "Services started successfully on the new instance with id ${self.id}!"
    EOF
  }
}
From now on, Terraform will wait for cloud-init to complete before marking the instance ready. Note that the provisioner above references data.aws_region.current, so the configuration also needs the corresponding aws_region data source declared somewhere.
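If the provisioner fails and the inline error output is not enough, the Run Command history for the instance can also be inspected with the CLI (again, the instance id is a placeholder):

# Lists recent Run Command executions for the instance and their status
aws ssm list-commands \
  --instance-id "i-0123456789abcdef0" \
  --query "Commands[].{Id:CommandId,Doc:DocumentName,Status:Status}" \
  --output table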
Conclusion
AWS Session Manager, AWS Run Command, and the other tools in the AWS Systems Manager family are quite powerful, and in my experience they are not widely used. I find them extremely useful: for example, they also allow connecting via SSH to instances without having any port open, not even port 22! Basically, they allow managing and running commands inside instances purely through AWS APIs, with a lot of benefits, as AWS explains:
Session Manager provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys. Session Manager also allows you to comply with corporate policies that require controlled access to instances, strict security practices, and fully auditable logs with instance access details, while still providing end users with simple one-click cross-platform access to your managed instances.
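For example, with the Session Manager plugin for the AWS CLI installed, opening a shell on an instance is a single call, with no inbound port required (the instance id is a placeholder):

# Starts an interactive Session Manager session on the instance
aws ssm start-session --target "i-0123456789abcdef0"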
Do you have any questions, feedback, criticism, or requests for support? Leave a comment below, or drop me an email at [email protected].
Ciao,
R.