Saturday, January 11, 2020

Deploying Splunk Multi-Site using Google Cloud Platform.....Using Terraform!

Use Case

In my earlier post, I talked about how to set up a Splunk multi-site configuration. Building on the same premise, I'm going to set up a Terraform template. The template allows one to set up and tear down the whole infrastructure on an as-needed basis, which makes it an ideal way to test before actually moving to PROD. So let's get started!

Preparation

In my earlier post, I used the Splunk image as the base for the Cluster Master, Search Head and Indexers. I will be using the same image again, but to help automate the setup of the infrastructure I need to make some changes.

In my earlier post I had to accept the license and then create the user id and password for the Splunk instance. This can be automated by creating the user-seed.conf file as detailed here. In this case I've created the following $SPLUNK_HOME/etc/system/local/user-seed.conf file -


[user_info]
USERNAME = admin
PASSWORD = passw0rd
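
When baking the image, the seed file can also be generated by a script rather than written by hand. The following is a minimal sketch; the function name write_user_seed and its arguments are hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): write user-seed.conf
# under the given Splunk home directory.
# Usage: write_user_seed /opt/splunk admin 'passw0rd'
write_user_seed() {
    splunk_home=$1
    seed_user=$2
    seed_pass=$3
    conf_dir="$splunk_home/etc/system/local"

    mkdir -p "$conf_dir" || return 1
    # umask 077 makes a newly created file readable by its owner only,
    # since the file holds a clear-text password.
    ( umask 077
      printf '[user_info]\nUSERNAME = %s\nPASSWORD = %s\n' \
          "$seed_user" "$seed_pass" > "$conf_dir/user-seed.conf" )
}
```

Calling write_user_seed /opt/splunk admin 'passw0rd' would produce the same three lines shown above.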

To start the Splunk instance without having to stop and accept the license agreement, we need to run the following command -


splunk start --accept-license

We will add this to the startSplunk.sh script, which runs as the startup script for the VM instances. The script will look something like this, and I will explain what most of it is used for -


FILE=/root/conf/checkStatus
boCheck=true
CONFFILE=/root/conf/ClusMstr.conf

if test -f "$FILE"; then
    /opt/splunk/bin/splunk start
else
    while $boCheck ; do
        sleep 10
        if test -f "$CONFFILE"; then
            boCheck=false
            ZONE=`cat /root/conf/ClusMstr.conf |grep Zone | awk -F':' '{print $2}'` ;echo $ZONE
            PROJECT=`cat /root/conf/ClusMstr.conf |grep Project | awk -F':' '{print $2}'` ;echo $PROJECT

            sed -i "s/clus-mstr-0/clus-mstr-0.$ZONE.c.$PROJECT.internal/g" /root/conf/Indexer.txt
            /opt/splunk/bin/splunk start --accept-license;
            echo "Splunk started" > /root/conf/checkStatus;
            if [[ "$HOSTNAME" == *"idx-"* ]]; then
                cat /root/conf/Indexer.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"sh-"* ]]; then
                cat /root/conf/SearchHead.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"clus-"* ]]; then
                cat /root/conf/ClusterMstr.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"east"* ]]; then
                sed -i '/serverName/a site = site2' /opt/splunk/etc/system/local/server.conf
            else
                sed -i '/serverName/a site = site1' /opt/splunk/etc/system/local/server.conf
            fi
            echo "Splunk started" > /root/conf/checkStatus;
        fi
    done
fi

Now let's walk through what has been created here.


FILE=/root/conf/checkStatus
boCheck=true
CONFFILE=/root/conf/ClusMstr.conf

The FILE variable points to the checkStatus marker file. After the initial configuration is done we don't need to walk through the whole script again: the script checks whether the marker file exists and, if so, just runs the splunk start command. The boCheck variable controls the loop that waits for CONFFILE to be copied from the local machine, where Terraform is run, to the instances being created. During my initial runs I noticed that CONFFILE was copied only after the machine had been created and the startup script had already begun, and this led to missteps.


if test -f "$FILE"; then
    /opt/splunk/bin/splunk start
else
    # first-boot configuration, walked through below
fi

This if block checks whether checkStatus has been created. That is the case once Terraform has created the infrastructure and I'm restarting the VMs, so the script only needs to start Splunk.
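
The marker-file idea can be isolated into a small helper. This is only a sketch of the pattern; run_once is a hypothetical name, and unlike the script in this post it writes the marker only when the command succeeds.

```shell
#!/bin/sh
# run_once MARKER CMD [ARGS...]
# Hypothetical helper: run CMD only if MARKER does not exist yet,
# and create MARKER once CMD has succeeded.
run_once() {
    marker=$1
    shift
    if [ -f "$marker" ]; then
        return 0          # already configured on an earlier boot
    fi
    "$@" || return 1      # leave no marker behind if the command failed
    echo "done" > "$marker"
}
```

For example, run_once /root/conf/checkStatus some_first_boot_function would run the first-boot steps a single time across reboots, where some_first_boot_function stands in for whatever performs the configuration.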


while $boCheck ; do
    sleep 10
    if test -f "$CONFFILE"; then
        boCheck=false

    fi
done

This while block works with the if block to check whether CONFFILE has been copied to the VM. The while loop lets the script wait for the file: if CONFFILE is not there yet, it sleeps for 10 seconds and checks again. This is necessary because I did not want to stop and start the VM after the file is copied, and the SSH connection can be unreliable. Once the if block finds the file, it sets the boolean to false so that the while loop does not run again.
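
One limitation of the loop above is that it waits forever if the file never arrives. A possible variation, sketched here with a hypothetical wait_for_file function, gives up after a fixed number of retries; the default values are assumptions chosen for illustration.

```shell
#!/bin/sh
# wait_for_file PATH [INTERVAL_SECONDS] [MAX_TRIES]
# Hypothetical helper: poll until PATH exists. Returns 0 once the file is
# there, or 1 after sleeping MAX_TRIES times without seeing it.
wait_for_file() {
    path=$1
    interval=${2:-10}     # seconds between checks (10 matches the script above)
    max_tries=${3:-60}    # assumed limit, not from the original script
    tries=0
    while [ ! -f "$path" ]; do
        tries=$((tries + 1))
        if [ "$tries" -gt "$max_tries" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

A caller could then write wait_for_file /root/conf/ClusMstr.conf 10 60 || exit 1 and log the failure instead of looping indefinitely.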

The contents of CONFFILE would look like -


Zone:us-central1-b
Project:splunk-261402
ClusterMaster:clus-mstr-0

Zone is the zone the Cluster Master runs in, Project is the GCP project and ClusterMaster is the hostname of the Cluster Master.
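
To illustrate how these three values combine into the internal DNS name of the Cluster Master, here is a small sketch. The function names conf_value and cluster_master_fqdn are hypothetical; the exact key match is a slightly stricter variation on the grep used in the script.

```shell
#!/bin/sh
# conf_value KEY FILE
# Print the value of an exact KEY from a "Key:Value" file such as ClusMstr.conf.
conf_value() {
    awk -F':' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# cluster_master_fqdn FILE
# Build "<host>.<zone>.c.<project>.internal" from the values in FILE.
cluster_master_fqdn() {
    printf '%s.%s.c.%s.internal\n' \
        "$(conf_value ClusterMaster "$1")" \
        "$(conf_value Zone "$1")" \
        "$(conf_value Project "$1")"
}
```

With the file contents shown above, cluster_master_fqdn /root/conf/ClusMstr.conf prints clus-mstr-0.us-central1-b.c.splunk-261402.internal.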


ZONE=`cat /root/conf/ClusMstr.conf |grep Zone | awk -F':' '{print $2}'` ;echo $ZONE
PROJECT=`cat /root/conf/ClusMstr.conf |grep Project | awk -F':' '{print $2}'` ;echo $PROJECT

sed -i "s/clus-mstr-0/clus-mstr-0.$ZONE.c.$PROJECT.internal/g" /root/conf/Indexer.txt
/opt/splunk/bin/splunk start --accept-license;
echo "Splunk started" > /root/conf/checkStatus;
if [[ "$HOSTNAME" == *"idx-"* ]]; then
    cat /root/conf/Indexer.txt >> /opt/splunk/etc/system/local/server.conf
fi
if [[ "$HOSTNAME" == *"sh-"* ]]; then
    cat /root/conf/SearchHead.txt >> /opt/splunk/etc/system/local/server.conf
fi
if [[ "$HOSTNAME" == *"clus-"* ]]; then
    cat /root/conf/ClusterMstr.txt >> /opt/splunk/etc/system/local/server.conf
fi
if [[ "$HOSTNAME" == *"east"* ]]; then
    sed -i '/serverName/a site = site2' /opt/splunk/etc/system/local/server.conf
else
    sed -i '/serverName/a site = site1' /opt/splunk/etc/system/local/server.conf
fi
echo "Splunk started" > /root/conf/checkStatus;

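The first lines read the zone and project from CONFFILE and rewrite the Cluster Master hostname in Indexer.txt to its internal DNS name (clus-mstr-0.$ZONE.c.$PROJECT.internal). The subsequent lines append the configuration for the machine's role (Indexer, Search Head or Cluster Master, based on the hostname) to server.conf and set the site: hosts with 'east' in their name go to site2, all others to site1.
Finally, the script writes "Splunk started" to the checkStatus file to denote that all is done, so that subsequent runs of the startup script only run the 'splunk start' command.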
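
The hostname-based decisions can also be expressed with case statements, which keeps the mapping in one place. This is a sketch with hypothetical function names. Note one difference: the script in this post uses independent if blocks, so a hostname matching several patterns would get several configurations, whereas a case statement picks only the first match.

```shell
#!/bin/sh
# role_for_host HOSTNAME
# Map a hostname to the Splunk role, using the same substrings as the script.
role_for_host() {
    case "$1" in
        *idx-*)  echo indexer ;;
        *sh-*)   echo search_head ;;
        *clus-*) echo cluster_master ;;
        *)       echo unknown ;;
    esac
}

# site_for_host HOSTNAME
# Hosts with "east" in the name belong to site2, everything else to site1.
site_for_host() {
    case "$1" in
        *east*) echo site2 ;;
        *)      echo site1 ;;
    esac
}
```

For the VMs created later in this post, role_for_host idx-east-0 prints indexer and site_for_host idx-east-0 prints site2.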

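The complete startSplunk.sh script

#!/bin/bash
# startSplunk.sh - startup script for the Cluster Master, Search Head and Indexer VMs.
# bash is required because of the [[ ... ]] pattern matches below.

FILE=/root/conf/checkStatus          # marker written after the first-boot configuration
boCheck=true                         # keeps the wait loop running until CONFFILE arrives
CONFFILE=/root/conf/ClusMstr.conf    # copied onto the VM by the Terraform file provisioner

if test -f "$FILE"; then
    # Already configured on an earlier boot: just start Splunk
    /opt/splunk/bin/splunk start
else
    while $boCheck ; do
        sleep 10
        if test -f "$CONFFILE"; then
            boCheck=false
            # Read the Cluster Master zone and the GCP project from CONFFILE
            ZONE=$(grep Zone "$CONFFILE" | awk -F':' '{print $2}'); echo "$ZONE"
            PROJECT=$(grep Project "$CONFFILE" | awk -F':' '{print $2}'); echo "$PROJECT"

            # Point the Indexer configuration at the Cluster Master's internal DNS name
            sed -i "s/clus-mstr-0/clus-mstr-0.$ZONE.c.$PROJECT.internal/g" /root/conf/Indexer.txt
            /opt/splunk/bin/splunk start --accept-license
            echo "Splunk started" > "$FILE"

            # Append the role-specific settings, chosen by hostname.
            # Splunk is already running, so they are picked up on the next start.
            if [[ "$HOSTNAME" == *"idx-"* ]]; then
                cat /root/conf/Indexer.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"sh-"* ]]; then
                cat /root/conf/SearchHead.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"clus-"* ]]; then
                cat /root/conf/ClusterMstr.txt >> /opt/splunk/etc/system/local/server.conf
            fi

            # Add the site after the serverName line: "east" hosts are site2, the rest site1
            if [[ "$HOSTNAME" == *"east"* ]]; then
                sed -i '/serverName/a site = site2' /opt/splunk/etc/system/local/server.conf
            else
                sed -i '/serverName/a site = site1' /opt/splunk/etc/system/local/server.conf
            fi
            echo "Splunk started" > "$FILE"
        fi
    done
fi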

Terraform Script, Template and Directory structure

Before we go any further, I'd recommend you read through this Qwiklabs lab about Terraform. I've used this lab as the reference to create this script and project. I have created the tfnet folder and the instance folder inside it, so the directory structure looks like this -
[Figure: directory structure]

The main.tf file would look something like this - 

main.tf

variable "instance_name" {}
variable "instance_zone" {}
variable "instance_type" { default = "n1-standard-1" }
variable "instance_subnetwork" {}
variable "disk_size" {}
variable "instance_startup_script" {}
variable "import_file" {}
variable "import_dest" {}
variable "instance_count" {}

resource "google_compute_instance" "vm_instance" {
  count = "${var.instance_count}"
  name  = "${var.instance_name}-${count.index}"
  #RESOURCE properties go here
  zone         = "${var.instance_zone}"
  machine_type = "${var.instance_type}"

  tags = ["splunk"]

  boot_disk {
    initialize_params {
      image = "splunk-image"
      size  = "${var.disk_size}"
    }
  }

  network_interface {
    subnetwork = "${var.instance_subnetwork}"
    access_config {
      # Allocate a one-to-one NAT IP to the instance
    }
  }

  metadata = {
    ssh-keys = "root:${file("/home/user/.ssh/root-ssh-key.pub")}"
  }

  provisioner "file" {
    source      = "${var.import_file}"
    destination = "${var.import_dest}"
    connection {
      type        = "ssh"
      user        = "root"
      private_key = "${file("/home/user/.ssh/root-ssh-key-insecure")}"
      host        = "${self.network_interface.0.access_config.0.nat_ip}"
      agent       = false
    }
  }
  metadata_startup_script = "${var.instance_startup_script}"
}

This main.tf is a template for creating a VM. The count dictates how many VMs of the same configuration are created; the other variables set the name of the VM, the zone it will reside in and the machine type. The instance has the 'splunk' tag hard-coded, but it would be good practice to pass it in as a variable. The boot disk size is dictated by the variable passed in. The subnetwork for the network interface is passed in from management.tf.
Finally, the last bit is the file provisioner, which copies ClusMstr.conf from the local directory to the VM that is created. During this exercise I came across issues with connecting to the VM and transferring the file. The major issue was that the private key had to be unencrypted, so I had to use openssl to remove the passphrase.
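
The exact openssl invocation is not shown in this post, so the following is only a sketch of one way to do it. It assumes a PEM-formatted RSA private key; the function name strip_passphrase is hypothetical, and the encrypted key file name root-ssh-key is an assumption inferred from the .pub file name used in main.tf.

```shell
#!/bin/sh
# strip_passphrase ENCRYPTED_KEY PLAIN_KEY [PASSIN]
# Hypothetical helper: write an unencrypted copy of a PEM RSA private key.
# PASSIN is an openssl password source such as "pass:secret" or "env:KEY_PASS";
# without it, openssl prompts for the passphrase interactively.
strip_passphrase() {
    enc_key=$1
    plain_key=$2
    passin=${3:-}
    if [ -n "$passin" ]; then
        ( umask 077; openssl rsa -in "$enc_key" -out "$plain_key" -passin "$passin" )
    else
        ( umask 077; openssl rsa -in "$enc_key" -out "$plain_key" )
    fi
}
```

For example, strip_passphrase ~/.ssh/root-ssh-key ~/.ssh/root-ssh-key-insecure would prompt for the passphrase and write the unencrypted key that main.tf reads. Keys stored in the newer OpenSSH format may instead need ssh-keygen -p with an empty new passphrase, applied to a copy of the key. Either way, an unencrypted private key should be kept only as long as it is needed.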

management.tf

# Create the splunk network
resource "google_compute_network" "splunk" {
  name                    = "splunk"
  auto_create_subnetworks = false
}

# Create the us-central1 and us-east1 subnetworks
resource "google_compute_subnetwork" "splunk-subnet-us-central" {
  name          = "splunk-subnet-us-central"
  region        = "us-central1"
  network       = "${google_compute_network.splunk.self_link}"
  ip_cidr_range = "10.130.0.0/20"
}

resource "google_compute_subnetwork" "splunk-subnet-us-east" {
  name          = "splunk-subnet-us-east"
  region        = "us-east1"
  network       = "${google_compute_network.splunk.self_link}"
  ip_cidr_range = "10.140.0.0/20"
}

resource "google_compute_firewall" "splunk-firewall" {
  name    = "splunk-firewall"
  network = google_compute_network.splunk.name

  allow {
    protocol = "icmp"
  }

  allow {
    protocol = "tcp"
    ports    = ["80", "8080", "22", "8089", "8000" ]
  }
  source_ranges = ["0.0.0.0/0"]
  target_tags = ["splunk"]
}

# Add the VM instances: Indexers for both sites, the Search Head and the Cluster Master
module "idx-ctr-us-vm" {
  source              = "./instance"
  instance_name       = "idx-ctr"
  instance_zone       = "us-central1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-central.self_link}"
  disk_size           = 25
  instance_count      = 3
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
}

module "idx-east-us-vm" {
  source              = "./instance"
  instance_name       = "idx-east"
  instance_zone       = "us-east1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-east.self_link}"
  disk_size           = 25
  instance_count      = 3
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
}

module "sh-ctr-us-vm" {
  source              = "./instance"
  instance_name       = "sh-ctr"
  instance_zone       = "us-central1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-central.self_link}"
  disk_size           = 25
  instance_count      = 1
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
}

module "clus-mstr-us-vm" {
  source              = "./instance"
  instance_name       = "clus-mstr"
  instance_zone       = "us-central1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-central.self_link}"
  disk_size           = 25
  instance_count      = 1
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
}

Coming to management.tf, we are creating the 'splunk' network with one subnet in us-central1 and one in us-east1. We set up the firewall rule to allow access to the Splunk ports (8000, 8089 and 8080), along with SSH (22), HTTP (80) and ICMP, from everywhere.

The module blocks use the instance folder's main.tf as the template to create the VMs. As part of this we are also copying the ClusMstr.conf file, which contains the Cluster Master details, from the local machine where Terraform is executed to the VMs, so that the startup script can set up the server.conf file for the Indexers.

Execution...

Now that all of our configurations are ready, let's see how it turns out. It starts with executing the init command -


terraform init

The output should be similar to -
[Figure: terraform init output]

Follow that up by running the following command to initiate the plan phase -
terraform plan


[Figure: terraform plan created]

Execute the apply command to start the creation of the resources-
terraform apply

[Figure: waiting for input for terraform apply]

Type 'yes' and press Return. Terraform will then start creating the resources -
[Figure: terraform apply finished]

The VMs are created along with all the complementary resources.

[Figure: all the resources are created and up]
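
The init, plan and apply steps can also be chained in a small wrapper. This is a sketch rather than part of the original project: the function names and the DRY_RUN switch are hypothetical. It saves the plan to a file and applies that file, so the apply step runs exactly what was planned and does not stop to ask for 'yes'.

```shell
#!/bin/sh
# run_step CMD [ARGS...]
# Print the command when DRY_RUN=1, otherwise execute it.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# deploy [PLAN_FILE]
# Run init, write the plan to PLAN_FILE (default: tfplan), then apply that plan.
# Each step runs only if the previous one succeeded.
deploy() {
    plan_file=${1:-tfplan}
    run_step terraform init &&
    run_step terraform plan -out="$plan_file" &&
    run_step terraform apply "$plan_file"
}
```

Running DRY_RUN=1 deploy from the tfnet folder prints the three commands without touching any infrastructure, which is a cheap way to check the wrapper before using it for real.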

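Checking the indexer clustering page on the Cluster Master at this point may not show the cluster.

[Figure: first time around it does not work]

The startup script appends the clustering settings to server.conf only after Splunk has already been started, so they are not picked up until the next start. If the cluster does not show up the first time around, restart all the VMs we have created.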
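
The restart can be done from the console, or scripted. The sketch below only prints the gcloud commands so they can be reviewed first; the function names are hypothetical, while the VM names, counts and zones are taken from management.tf above.

```shell
#!/bin/sh
# vm_names PREFIX COUNT
# Print "PREFIX-0 PREFIX-1 ..." the way the Terraform template names the VMs.
vm_names() {
    i=0
    names=""
    while [ "$i" -lt "$2" ]; do
        names="$names $1-$i"
        i=$((i + 1))
    done
    echo "${names# }"
}

# restart_cmds
# Print (not run) the commands that stop and then start every VM, per zone.
restart_cmds() {
    central="$(vm_names idx-ctr 3) $(vm_names sh-ctr 1) $(vm_names clus-mstr 1)"
    east="$(vm_names idx-east 3)"
    for action in stop start; do
        echo "gcloud compute instances $action $central --zone=us-central1-b"
        echo "gcloud compute instances $action $east --zone=us-east1-b"
    done
}
```

After reviewing the output, restart_cmds | sh would execute the four commands, assuming gcloud is installed and configured for the project. Because the VMs use ephemeral external IPs, the addresses can change after a stop and start.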

[Figure: after restart use the new External IP]

Now that the VMs have been restarted, use the newly assigned IP address for the Cluster Master and let's see how it looks!

[Figure: all running as required]

To close out you can destroy the infrastructure using the following command - 

terraform destroy


Summary

A couple of items of note: having a firewall rule that allows connectivity from 0.0.0.0/0 is not recommended and, as noted in the earlier post, it would be good to use a separate persistent disk rather than the boot disk.
As we have seen in this post, we can use Terraform to stand up the complete Splunk infrastructure.