Ok, so this post is waaay overdue. It was meant to be a quick follow-up to part one, which was posted months ago. Sometimes life just gets in the way, but it's here now!
Installs
I briefly covered the prerequisites in the last post, but let's list exactly what you should have installed so it's all smooth sailing:
- Terraform Install Guide
- AWS CLI Install Guide
- Ansible Install Guide
AWS Setup
IAM Account
Terraform will need an account to manage resources, so in the AWS console go to the IAM dashboard and select User groups from the left-hand menu. Create a new group and name it however you like; I went with "dfir-users". Scroll down to the permissions policies section and grant the following:
- AmazonEC2FullAccess
- AmazonElasticFileSystemFullAccess
- AmazonRoute53FullAccess
With the group and permissions set, go to Users in the left-hand menu and then click the Add users button. Again, choose whatever name you wish; I went with "dfir-terraform". Click next and then select the group you just created. Continue through to the review screen and click Create user.
Now go back into the user you just created and go to the Security Credentials tab. Scroll down to Access keys and click the create access key button. Choose "Command Line Interface" and continue.
Now in your terminal, configure the AWS CLI to use the access keys you just generated for authentication.
aws configure
Go through the prompts entering the relevant information and you're done!
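If you want to sanity-check that the CLI is actually picking up the new credentials, a quick identity call should come back with the ARN of the dfir-terraform user:
aws sts get-caller-identity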
Static Components
As noted in the part 1 blog post, some components of the deployment cost nothing and make sense to leave standing permanently. So let's see how to create them manually in the AWS console.
In all of these, make sure you're setting them up in the same region!
Virtual Private Cloud (and others)
Go to the VPC dashboard and click Create VPC. In the menu select the "VPC and more" option and go through the inputs. Choose whatever name tag and CIDR blocks you wish; just make sure you select at least 1 public subnet, otherwise an Internet Gateway won't be set up.
This takes care of the VPC, the Subnet, the Internet Gateway, the Route Table and the Security Group.
Route53 Hosted Zone
In the Route 53 dashboard, go to Hosted zones in the left-hand menu. Click Create hosted zone and go through the setup steps - you will need your own domain for this and the ability to change its name servers. If you don't have a domain you can still follow along, but you'll need to make minor changes at some points to use the public IP of the EC2 instance rather than a domain.
Terraform
I originally combined the Terraform and Ansible scripts by having Terraform's remote-exec provisioner run the Ansible script on the server once all the infrastructure had been stood up, but I've since leaned toward keeping them separate. The user first runs the Terraform script to create all the infrastructure, and then runs the Ansible script to configure the server. It's an extra action required on behalf of the user, which may be heresy in the automation world, but it feels more comfortable to me to have each stage separated... and it's my script so there. You can easily change this back to how I originally had it with a bit of documentation reading anyway.
Before getting started on the bulk of the code, create some top-level files which we'll be adding to as we go along.
touch main.tf
touch outputs.tf
touch variables.tf
touch terraform.tfvars
You can probably guess the purpose of the first two files. For the latter two: variables.tf declares all the variables that will be fed into the script, along with their types and default values. terraform.tfvars contains a subset of those variables - basically any variables the user wants to set themselves to override the defaults. Think along the lines of a username or subdomain name.
⚠️Note:⚠️ I recommend adding terraform.tfvars to the .gitignore file if you're going to be creating a public repo for this, just in case you put any sensitive info in there and push it!
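For reference, a minimal .gitignore along these lines covers the tfvars file plus the other bits Terraform generates locally (state files and the .ssh directory we'll write a private key into later):
# .gitignore
terraform.tfvars
*.tfstate
*.tfstate.backup
.terraform/
.ssh/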
Now the first thing to do is to declare in main.tf that we will be using the official AWS provider and initialise it with your chosen region.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}
Notice that we've used a variable when setting the region in the provider setup. We will need to declare this variable in variables.tf.
variable "aws_region" {
description = "The region within AWS that resources are to be deployed to"
type = string
default = "eu-west-2"
}
In case we ever want multiple deployments on the go at the same time, we'll also set a deployment name variable and use it to tag resources.
variable "deployment_name" {
description = "Unique name/id to use for this deployment"
type = string
}
deployment_name = "demo"
We'll be splitting the bulk of the script code into modules to keep things neat, so before continuing just create the directory structure and empty files.
mkdir -p modules/app
mkdir modules/networking
mkdir modules/routing
for i in $(ls -1 modules); do touch modules/$i/main.tf; touch modules/$i/outputs.tf; touch modules/$i/variables.tf; done
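If everything went to plan, the project layout should now look roughly like this:
.
├── main.tf
├── outputs.tf
├── terraform.tfvars
├── variables.tf
└── modules
    ├── app
    │   ├── main.tf
    │   ├── outputs.tf
    │   └── variables.tf
    ├── networking
    │   ├── main.tf
    │   ├── outputs.tf
    │   └── variables.tf
    └── routing
        ├── main.tf
        ├── outputs.tf
        └── variables.tf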
Networking
Ok, let's begin! This is the baby module. Since we are leaving the VPC, subnet etc. up permanently, we don't have to create them here, which is what this module would ordinarily do. Instead we're just going to grab a handle to those components so we can add things to them.
1. The AWS provider can grab handles for these components using their name tags, so we will be passing those in through variables. Declare variables for the VPC and Subnet name tags in both the main variables file and the networking module variables file.
variable "vpc_name_tag" {
description = "The name tag of the VPC"
type = string
default = "dfir-vpc"
}
variable "subnet_name_tag" {
description = "The name tag of the Subnet"
type = string
default = "dfir-subnet-public1-eu-west-2a"
}
variable "vpc_name_tag" {
description = "The name tag of the VPC"
type = string
}
variable "subnet_name_tag" {
description = "The name tag of the Subnet"
type = string
}
Notice that only the main variables.tf has default values for these. There is no need to set defaults in the networking module file since the values will be passed in. If you're not sure where these values came from, you should have seen them when you set the components up earlier. You can also see them in the VPC dashboard, or pull them from the CLI as shown below.
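If you'd rather query these from the terminal than click around the console, something along these lines should list the IDs and Name tags (assuming the AWS CLI is configured for the same region):
aws ec2 describe-vpcs --query "Vpcs[].[VpcId, Tags[?Key=='Name'].Value | [0]]" --output table
aws ec2 describe-subnets --query "Subnets[].[SubnetId, Tags[?Key=='Name'].Value | [0]]" --output table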
2. In the top-level main.tf, call the networking module and pass in the variables we just created.
module "networking" {
source = "./modules/networking"
vpc_name_tag = var.vpc_name_tag
subnet_name_tag = var.subnet_name_tag
}
3. In modules/networking/main.tf create two data sources which will point to the VPC and Subnet.
data "aws_vpc" "dfir-vpc" {
tags = {
Name = "${var.vpc_name_tag}"
}
}
data "aws_subnet" "dfir-subnet" {
tags = {
Name = "${var.subnet_name_tag}"
}
}
A quick note on syntax here: "data" declares a data source, the string after it is the type of data source (consult the AWS provider documentation for the full list), and the third part is the name we give to the data source, which lets us reference it from the rest of the script.
4. Finally, declare some outputs in modules/networking/outputs.tf. These essentially create extra values that we can pass into other parts of the script.
output "dfir_vpc_id" {
value = data.aws_vpc.dfir-vpc.id
}
output "dfir_subnet_id" {
value = data.aws_subnet.dfir-subnet.id
}
Great, this module is finished! We've retrieved handles to the VPC and Subnet and are ready to use them when creating new resources.
App
The big one. In this module we will create the EFS (Elastic File System) that will store all the Velociraptor data, the EC2 instance that Velociraptor will run on, and the security group rules which limit which IPs can connect to the EC2.
1. First we need to decide on the size of EC2 instance to provision. There are loads to choose from, for all kinds of architectures and use-cases. Have a look through the instance types documentation if you're interested and want to pick your own. If not, I've found that t3a.large works well. You could get away with a less powerful one, but I've noticed bits of lag on smaller instances, and since I never leave it up for long the difference in cost doesn't add up to much. Once the decision is made, go ahead and put it in a variable (there's also a quick CLI spec check sketched after the variable blocks below).
variable "ec2_size" {
description = "The size of the EC2 instance to provision"
type = string
default = "t3a.large" # 2 core, 8gb ram, AMD EPYC 7000 series
}
variable "ec2_size" {
description = "The size of the EC2 instance to provision"
type = string
}
2. Second, decide on the AMI you will be using. The AMI is the machine image of the server - basically the operating system. The official help docs can guide you through this, or you can copy me; I am using Ubuntu 22.04 LTS amd64. Two things to note if picking your own: make sure the architecture matches your chosen EC2 instance type, and that the AMI ID is for your chosen region. Now... you guessed it - it's variable-making time (there's also an optional data-source alternative sketched after the blocks below).
variable "ec2_ami" {
description = "The AMI to install on the EC2"
type = string
default = "ami-09744628bed84e434" # Ubuntu 22.04 LTS amd64 (eu-west-2), 20230325
}
variable "ec2_ami" {
description = "The AMI to install on the EC2"
type = string
}
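As an optional alternative to hard-coding an AMI ID per region, the AWS provider's aws_ami data source can look up the latest matching image for you. A sketch (the name ubuntu_2204 is my own; it would live in the app module, and you'd reference data.aws_ami.ubuntu_2204.id in place of var.ec2_ami):
data "aws_ami" "ubuntu_2204" {
  most_recent = true
  owners = ["099720109477"] # Canonical's AWS account

  filter {
    name = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}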
3. We're also going to be passing in the VPC and Subnet IDs we grabbed in the networking module, as well as the deployment name, so we need to declare variables for these in modules/app/variables.tf.
variable "dfir_vpc_id" {
description = "The id of the VPC to attach resources to"
type = string
}
variable "dfir_subnet_id" {
description = "The id of the Subnet to place resources in"
type = string
}
4. Now, just as we did with the networking module, we need to tell the top-level main.tf to call the app module and pass in values for these variables.
module "app" {
source = "./modules/app"
ec2_size = var.ec2_size
ec2_ami = var.ec2_ami
dfir_vpc_id = module.networking.dfir_vpc_id
dfir_subnet_id = module.networking.dfir_subnet_id
deployment_name = var.deployment_name
}
5. The first thing we'll create here is not the EC2 but a new SSH key pair. This will allow access to the server over SSH. I have the key written out into a .ssh directory inside the project folder (and .gitignore has an entry to ignore this dir, just in case), but you can change it to put the key wherever you like.
Security concerns: you may not like the idea of generating a private key in this way. This is understandable, especially if you're doing this in a professional environment. Feel free to generate a key manually and use that instead. This will require some modification to the script but I'm sure you can read the docs and figure it out.
resource "tls_private_key" "dfir-priv-key" {
algorithm = "ED25519"
}
resource "aws_key_pair" "dfir-pub-key" {
key_name = "dfir-${var.deployment_name}-"
public_key = tls_private_key.dfir-priv-key.public_key_openssh
}
resource "local_sensitive_file" "priv-key-out" {
filename = "./.ssh/${var.deployment_name}.pem"
file_permission = "0600"
content = tls_private_key.dfir-priv-key.private_key_openssh
}
6. Let's deal with some more security. We do not want our server to be open to the whole internet, so we'll use security group rules to allow administrative access only from approved IPs. We'll also restrict the callback listener port to approved IPs - these will be the public IPs of the systems we'll install clients on, i.e. the targets of the investigation.
6.1 Declare variables to hold the approved IPs (take care to set your own IPs here, not the placeholder values). You can specify multiple ranges in a comma-separated list - see the example after the tfvars lines below.
variable "admin_ips" {
description = "IP ranges for administrative access"
type = list(string)
}
variable "client_ips" {
description = "IP ranges client connections will originate from"
type = list(string)
}
variable "admin_ips" {
description = "IP ranges for administrative access"
type = list(string)
}
variable "client_ips" {
description = "IP ranges client connections will originate from"
type = list(string)
}
client_ips = ["0.0.0.0/0"]
admin_ips = ["127.0.0.1/32"]
6.2 Pass the variables into the app module and create the security group and rules
# add these lines to the app module block
admin_ips = var.admin_ips
client_ips = var.client_ips
resource "aws_security_group" "dfir-ec2-secgrp" {
vpc_id = var.dfir_vpc_id
tags = {
Name = "dfir-${var.deployment_name}-EC2-SG"
}
egress {
description = "Allow all outbound traffic from EC2"
protocol = "-1"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
ingress {
description = "Allow inbound SSH from admin IPs"
protocol = "tcp"
from_port = 22
to_port = 22
cidr_blocks = var.admin_ips
}
ingress {
description = "Allow inbound Velociraptor GUI access from admin IPs"
protocol = "tcp"
from_port = 9500
to_port = 9500
cidr_blocks = var.admin_ips
}
ingress {
description = "Allow inbound Velociraptor client connections from client IPs"
protocol = "tcp"
from_port = 9501
to_port = 9501
cidr_blocks = var.client_ips
}
}
Feel free to modify the ruleset to fit your purposes.
7. Create a second security group for the EFS that will prevent anything other than the EC2 from touching it.
resource "aws_security_group" "dfir-efs-secgrp" {
vpc_id = var.dfir_vpc_id
tags = {
Name = "dfir-${var.deployment_name}-EFS-SG"
}
ingress {
description = "Allow inbound NFS traffic from members of the EC2 security group"
protocol = "tcp"
from_port = 2049
to_port = 2049
security_groups = [aws_security_group.dfir-ec2-secgrp.id]
}
}
8. The last step before creating the actual EC2: we now need to create the EFS that will hold all the Velociraptor data. This is pretty straightforward - we create an EFS resource and then a mount target associated with it, which we later use to mount the EFS on the EC2 (there's a sketch of what that mount looks like after the code below).
resource "aws_efs_file_system" "dfir-efs" {
encrypted = true
tags = {
Name = "dfir-${var.deployment_name}-EFS"
}
}
resource "aws_efs_mount_target" "dfir-efs-mount" {
file_system_id = aws_efs_file_system.dfir-efs.id
subnet_id = var.dfir_subnet_id
security_groups = [aws_security_group.dfir-efs-secgrp.id]
}
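Just so the mount target doesn't feel too abstract: once the EC2 is up, mounting the EFS from the instance looks roughly like the commands below. Ansible will take care of this for us in part 3; the /mnt/velociraptor path is just an illustrative mount point, and <efs_dns_name> is the value we'll output from this module shortly.
# on the EC2, with the nfs-common package installed
sudo mkdir -p /mnt/velociraptor
sudo mount -t nfs4 -o nfsvers=4.1 <efs_dns_name>:/ /mnt/velociraptor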
9. Now we can bring it all together by creating the EC2 resource! Note that at the top of this code block we use the depends_on property to tell Terraform to wait until the EFS mount target is ready before attempting to create the EC2. Usually Terraform can figure out the order in which things need to be created, but for resources that are created quickly yet take a while to become fully ready, it sometimes needs some help.
resource "aws_instance" "dfir-ec2" {
depends_on = [
aws_efs_mount_target.dfir-efs-mount
]
tags = {
Name = "dfir-${var.deployment_name}-EC2"
}
ami = var.ec2_ami
instance_type = var.ec2_size
subnet_id = var.dfir_subnet_id
vpc_security_group_ids = [aws_security_group.dfir-ec2-secgrp.id]
key_name = aws_key_pair.dfir-pub-key.key_name
associate_public_ip_address = true
}
10. Ok, so that's the infrastructure set up, but we'll want to think about users and configure access. By default AWS gives us a user called 'ubuntu' (on an Ubuntu VM anyway) that we can access with our SSH key. If you don't care about changing this to a custom username, skip to step 12; otherwise follow along. First create a new file, modules/app/users.yaml, with the following contents.
#cloud-config
# Add user to the system and populate authorized_keys
users:
  - name: velociraptor
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ${dfir-pub-key}
11. Now we can go back to the EC2 resource and use the templatefile function to pass this file in to the user_data property. Add the following line to your EC2 creation block.
user_data = templatefile("${path.module}/users.yaml", { dfir-pub-key = tls_private_key.dfir-priv-key.public_key_openssh})
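Once the instance has booted and cloud-init has run, you should be able to log in as the new user with the generated key - something like this, using the demo deployment name and whatever public IP or domain the run spits out:
ssh -i .ssh/demo.pem velociraptor@<EC2 public IP or domain>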
12. Finally, we need to define some outputs, which will be used to configure a DNS record and to help Ansible mount the EFS on the EC2.
output "ec2_public_ip" {
value = aws_instance.dfir-ec2.public_ip
}
output "efs_dns_name" {
value = aws_efs_file_system.dfir-efs.dns_name
}
Thus the App module is complete! We've generated a unique key pair, defined traffic rules, created an encrypted auto-scaling file store and stood up a server that will run Velociraptor.
Routing
The final module of our Terraform script. It's quite small - all it does is take the public IP of the EC2 instance and create a DNS record in our hosted zone pointing to it.
1. Create yet another variable to hold the domain of your hosted zone in Route 53.
variable "dfir_domain" {
description = "The base domain name to be used for deployment (must have a hosted zone in Route53). A subdomain will be created pointing to the EC2 server"
type = string
}
variable "dfir_domain" {
description = "The base domain name to be used for deployment (must have a hosted zone in Route53). A subdomain will be created pointing to the EC2 server"
type = string
}
dfir_domain = "yourdomain.com"
2. Declare two extra variables in the routing module to receive a subdomain and the EC2 public IP.
variable "dfir_subdomain" {
description = "The subdomain to point at the EC2"
type = string
}
variable "ec2_public_ip" {
description = "The public IPv4 address of the EC2"
type = string
}
3. In the top-level main.tf, combine the deployment name with the domain name to create the subdomain string. Then call the routing module and pass in the necessary variables.
locals {
  dfir_subdomain = "dfir-${var.deployment_name}.${var.dfir_domain}"
}

module "routing" {
  source = "./modules/routing"
  dfir_domain = var.dfir_domain
  dfir_subdomain = local.dfir_subdomain
  ec2_public_ip = module.app.ec2_public_ip
}
4. Inside the routing module's main.tf, declare a data source pointing at the Route53 hosted zone, then create a DNS record associating the subdomain with the public IP.
data "aws_route53_zone" "dfir-zone" {
name = "${var.dfir_domain}"
}
resource "aws_route53_record" "dfir-dns" {
zone_id = data.aws_route53_zone.dfir-zone.zone_id
name = "${var.dfir_subdomain}"
type = "A"
ttl = "300"
records = [var.ec2_public_ip]
}
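After an apply, you can confirm the record resolves with a quick dig (this uses the demo deployment name and placeholder domain; give DNS a minute or two to propagate):
dig +short A dfir-demo.yourdomain.com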
Outputs and Execution
All of our Terraform code is now complete! Well, except for outputting some useful information, both for the user and for Ansible (the next stage). Make two final edits, this time to the top-level main.tf and outputs.tf.
resource "local_file" "ansible-vars" {
filename = "./ansible/vars/from_terraform.yaml"
file_permission = "0664"
content = "domain: ${local.dfir_subdomain}\nefs_dns_name: ${module.app.efs_dns_name}"
}
resource "local_file" "ansible-inventory" {
filename = "./ansible/hosts"
file_permission = "0664"
content = "${local.dfir_subdomain}"
}
output "Velociraptor_Public_IP" {
value = module.app.ec2_public_ip
}
output "Velociraptor_Server_Domain" {
value = local.dfir_subdomain
}
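For illustration, with the demo values used in this post the generated Ansible inputs would end up looking something like this (the EFS file system ID here is made up):
# ./ansible/vars/from_terraform.yaml
domain: dfir-demo.yourdomain.com
efs_dns_name: fs-0123456789abcdef0.efs.eu-west-2.amazonaws.com

# ./ansible/hosts
dfir-demo.yourdomain.com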
Now go ahead and run the script! It takes about 80-90 seconds to complete, although most of that is waiting for the EFS mount target to be ready! It's like magic.
terraform init
terraform apply
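If you need the public IP or server domain again later without re-running an apply, Terraform can re-print the saved outputs:
terraform output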
To decommission all of the resources again, it's just as easy.
terraform destroy
This is the end of part two. I will leave the Ansible script for actually installing Velociraptor for part 3, since this was getting quite long already.