
Saturday, January 11, 2020

Deploying Splunk Multi-Site using Google Cloud Platform.....Using Terraform!

Use Case

In my earlier post, I talked about how to set up a Splunk multi-site configuration. Taking it forward with the same premise, I'm going to build a Terraform template that lets you stand up and tear down the whole infrastructure on an as-needed basis. It's an ideal way to test the setup before actually moving it to PROD. So let's get started!

Preparation

In my earlier post, I used the Splunk image as the base for the Cluster Master, Search Head and Indexers. I will be using the same image again, but to automate the setup of the infrastructure I need to make some changes.

In my earlier post I had to accept the license and then create the user id and password for the Splunk instance manually. This can be automated by creating the user-seed.conf file as detailed here. In this case I've created the following $SPLUNK_HOME/etc/system/local/user-seed.conf file -


[user_info]
USERNAME = admin
PASSWORD = passw0rd

To start the Splunk instance without stopping to accept the agreement, we use the following command -


splunk start --accept-license
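As an aside, if you want the very first start to be completely hands-off (for example while baking an image), Splunk's CLI also accepts a couple of extra flags. This isn't what the script below uses, just an option:

/opt/splunk/bin/splunk start --accept-license --answer-yes --no-prompt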

We will add this start command to the startSplunk.sh script, which is used as the startup script for the VM instances. The full startup script looks something like this; I will explain what most of it is used for -


#!/bin/bash
FILE=/root/conf/checkStatus
boCheck=true
CONFFILE=/root/conf/ClusMstr.conf

if test -f "$FILE"; then
    /opt/splunk/bin/splunk start
else
    while $boCheck; do
        sleep 10
        if test -f "$CONFFILE"; then
            boCheck=false
            ZONE=`cat /root/conf/ClusMstr.conf | grep Zone | awk -F':' '{print $2}'`; echo $ZONE
            PROJECT=`cat /root/conf/ClusMstr.conf | grep Project | awk -F':' '{print $2}'`; echo $PROJECT

            sed -i "s/clus-mstr-0/clus-mstr-0.$ZONE.c.$PROJECT.internal/g" /root/conf/Indexer.txt
            /opt/splunk/bin/splunk start --accept-license

            if [[ "$HOSTNAME" == *"idx-"* ]]; then
                cat /root/conf/Indexer.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"sh-"* ]]; then
                cat /root/conf/SearchHead.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"clus-"* ]]; then
                cat /root/conf/ClusterMstr.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"east"* ]]; then
                sed -i '/serverName/a site = site2' /opt/splunk/etc/system/local/server.conf
            else
                sed -i '/serverName/a site = site1' /opt/splunk/etc/system/local/server.conf
            fi
            echo "Splunk started" > /root/conf/checkStatus
        fi
    done
fi

Now let's walk through what's in the script.


FILE=/root/conf/checkStatus
boCheck=true
CONFFILE=/root/conf/ClusMstr.conf

The FILE variable points to the checkStatus file. Once the initial configuration is done we don't need to run through the whole script again; the script checks whether the file exists and, if so, just runs the splunk start command. The boCheck variable is used to check whether the CONFFILE has been copied from the local machine, where Terraform runs, to the newly created instances. During my initial runs I noticed that the CONFFILE would only arrive some time after the machine had been created, and this would lead to missteps.


if test -f "$FILE"; then
    /opt/splunk/bin/splunk start
else

fi
This if block checks whether checkStatus has already been created, which is the case when the infrastructure has already been built by Terraform and I'm simply restarting the VMs.


while $boCheck ; do
    sleep 10
    if test -f "$CONFFILE"; then
        boCheck=false

    fi
done

This while block works with the if block to check whether the CONFFILE has been copied to the VM. I used a while loop so the script can wait for the CONFFILE to arrive; if it hasn't been copied yet, the script sleeps for 10 seconds and checks again. This is necessary because I did not want to stop and start the VM after the file is copied, and SSH can be unreliable right after boot. Once the if test finds the file, it sets the boolean to false so the while loop does not run again.

The contents of the CONFFILE look like this -


Zone:us-central1-b
Project:splunk-261402
ClusterMaster:clus-mstr-0

The Zone denotes the Cluster Master's location, Project is the name of the GCP project, and ClusterMaster holds the Cluster Master's hostname. The script uses the first two to build the instance's internal DNS name, so clus-mstr-0 becomes clus-mstr-0.us-central1-b.c.splunk-261402.internal.


ZONE=`cat /root/conf/ClusMstr.conf | grep Zone | awk -F':' '{print $2}'`; echo $ZONE
PROJECT=`cat /root/conf/ClusMstr.conf | grep Project | awk -F':' '{print $2}'`; echo $PROJECT

sed -i "s/clus-mstr-0/clus-mstr-0.$ZONE.c.$PROJECT.internal/g" /root/conf/Indexer.txt
/opt/splunk/bin/splunk start --accept-license

if [[ "$HOSTNAME" == *"idx-"* ]]; then
    cat /root/conf/Indexer.txt >> /opt/splunk/etc/system/local/server.conf
fi
if [[ "$HOSTNAME" == *"sh-"* ]]; then
    cat /root/conf/SearchHead.txt >> /opt/splunk/etc/system/local/server.conf
fi
if [[ "$HOSTNAME" == *"clus-"* ]]; then
    cat /root/conf/ClusterMstr.txt >> /opt/splunk/etc/system/local/server.conf
fi
if [[ "$HOSTNAME" == *"east"* ]]; then
    sed -i '/serverName/a site = site2' /opt/splunk/etc/system/local/server.conf
else
    sed -i '/serverName/a site = site1' /opt/splunk/etc/system/local/server.conf
fi
echo "Splunk started" > /root/conf/checkStatus

The subsequent lines prepare the configuration that goes onto each machine, depending on whether it is designated as a Cluster Master, Search Head or Indexer.
Finally, the script writes "Splunk started" to the checkStatus file to denote that everything is done, so subsequent runs of the startup script only execute the 'splunk start' command.
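The contents of Indexer.txt, SearchHead.txt and ClusterMstr.txt aren't shown above, so here is a rough sketch of what Indexer.txt would contain (my assumption based on Splunk's multisite clustering settings; the pass4SymmKey is a placeholder). It holds the peer-side stanzas that the script appends to server.conf, and the clus-mstr-0 reference is what the sed command rewrites to the full internal DNS name:

# Hypothetical /root/conf/Indexer.txt, created here with a heredoc for illustration.
cat > /root/conf/Indexer.txt <<'EOF'
[replication_port://8080]

[clustering]
mode = slave
multisite = true
master_uri = https://clus-mstr-0:8089
pass4SymmKey = changeme-shared-secret
EOF

SearchHead.txt would be similar but with mode = searchhead, and ClusterMstr.txt would carry mode = master along with multisite settings such as available_sites, site_replication_factor and site_search_factor.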

The complete startSplunk.sh script
#!/bin/bash
# Startup script: on first boot, wait for ClusMstr.conf to be copied over,
# configure server.conf for this node's role and start Splunk. On later
# boots (checkStatus exists), just start Splunk.
FILE=/root/conf/checkStatus
boCheck=true
CONFFILE=/root/conf/ClusMstr.conf

if test -f "$FILE"; then
    # Already configured on a previous boot.
    /opt/splunk/bin/splunk start
else
    # Wait until Terraform's file provisioner has copied ClusMstr.conf.
    while $boCheck; do
        sleep 10
        if test -f "$CONFFILE"; then
            boCheck=false
            ZONE=`cat /root/conf/ClusMstr.conf | grep Zone | awk -F':' '{print $2}'`; echo $ZONE
            PROJECT=`cat /root/conf/ClusMstr.conf | grep Project | awk -F':' '{print $2}'`; echo $PROJECT

            # Point the indexer config at the Cluster Master's internal DNS name.
            sed -i "s/clus-mstr-0/clus-mstr-0.$ZONE.c.$PROJECT.internal/g" /root/conf/Indexer.txt
            /opt/splunk/bin/splunk start --accept-license

            # Append the role-specific clustering settings based on the hostname.
            if [[ "$HOSTNAME" == *"idx-"* ]]; then
                cat /root/conf/Indexer.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"sh-"* ]]; then
                cat /root/conf/SearchHead.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            if [[ "$HOSTNAME" == *"clus-"* ]]; then
                cat /root/conf/ClusterMstr.txt >> /opt/splunk/etc/system/local/server.conf
            fi
            # Assign the site based on which region the host lives in.
            if [[ "$HOSTNAME" == *"east"* ]]; then
                sed -i '/serverName/a site = site2' /opt/splunk/etc/system/local/server.conf
            else
                sed -i '/serverName/a site = site1' /opt/splunk/etc/system/local/server.conf
            fi
            echo "Splunk started" > /root/conf/checkStatus
        fi
    done
fi

Terraform Script, Template and Directory structure

Before we go any further, I'd recommend reading through this Qwiklabs lab about Terraform. I've used the lab as the reference to create this script and project. I created the tfnet folder with an instance folder inside it, so the directory structure looks like this -
directory structure

The main.tf file would look something like this - 

main.tf

variable "instance_name" {}
variable "instance_zone" {}
variable "instance_type" { default = "n1-standard-1" }
variable "instance_subnetwork" {}
variable "disk_size" {}
variable "instance_startup_script" {}
variable "import_file" {}
variable "import_dest" {}
variable "instance_count" {}

resource "google_compute_instance" "vm_instance" {
  count = "${var.instance_count}"
  name = "${var.instance_name}-${count.index}"
  #RESOURCE properties go here
  zone         = "${var.instance_zone}"
  machine_type = "${var.instance_type}"

  tags = ["splunk"]

  boot_disk {
    initialize_params {
        image = "splunk-image"
        size  = "${var.disk_size}"
        }
  }

  network_interface {
    subnetwork = "${var.instance_subnetwork}"
    access_config {
      # Allocate a one-to-one NAT IP to the instance
    }
  }

  metadata = {
    ssh-keys = "root:${file("/home/user/.ssh/root-ssh-key.pub")}"
  }

  provisioner "file" {
    source = "${var.import_file}"
    destination = "${var.import_dest}"
    connection {
        type = "ssh"
        user = "root"
        private_key = "${file("/home/user/.ssh/root-ssh-key-insecure")}"
        host = "${self.network_interface.0.access_config.0.nat_ip}"
        agent = false
      }
    }
  metadata_startup_script = "${var.instance_startup_script}"
}

This main.tf is the template for creating a VM. The count dictates how many VMs of the same configuration are created, along with the name of the VM, the zone they will reside in and the machine type. The instances get the 'splunk' tag assigned; it would be good practice to make that a variable as well. The boot disk size is dictated by the variable passed in, and the network interface (subnetwork) is supplied by management.tf.
Finally, the last bit uses a file provisioner to copy ClusMstr.conf from the local directory onto the VM once it is created. During this exercise I ran into issues connecting to the VM and transferring the file; the main one was that the private key had to be unencrypted, so I used openssl to create an unencrypted copy.
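I didn't capture the exact commands, but conceptually it looks like this (a sketch; the file names match the ones referenced in main.tf, and the openssl step only applies to passphrase-protected keys in the traditional PEM format):

# Generate an RSA key pair in PEM format for the root user; the .pub file feeds
# the ssh-keys metadata entry in main.tf.
ssh-keygen -t rsa -b 4096 -m PEM -C root -f /home/user/.ssh/root-ssh-key

# Terraform's file provisioner needs an unencrypted private key, so if the key
# has a passphrase, write out an unencrypted copy for the connection block.
openssl rsa -in /home/user/.ssh/root-ssh-key -out /home/user/.ssh/root-ssh-key-insecure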

management.tf

# Create managementnet network
resource "google_compute_network" "splunk" {
  name                    = "splunk"
  auto_create_subnetworks = false
}

# Create managementsubnet-us subnetwork
resource "google_compute_subnetwork" "splunk-subnet-us-central" {
  name          = "splunk-subnet-us-central"
  region        = "us-central1"
  network       = "${google_compute_network.splunk.self_link}"
  ip_cidr_range = "10.130.0.0/20"
}

resource "google_compute_subnetwork" "splunk-subnet-us-east" {
  name          = "splunk-subnet-us-east"
  region        = "us-east1"
  network       = "${google_compute_network.splunk.self_link}"
  ip_cidr_range = "10.140.0.0/20"
}

resource "google_compute_firewall" "splunk-firewall" {
  name    = "splunk-firewall"
  network = google_compute_network.splunk.name

  allow {
    protocol = "icmp"
  }

  allow {
    protocol = "tcp"
    ports    = ["80", "8080", "22", "8089", "8000" ]
  }
  source_ranges = ["0.0.0.0/0"]
  target_tags = ["splunk"]
}

# Splunk VM instance modules
module "idx-ctr-us-vm" {
  source              = "./instance"
  instance_name       = "idx-ctr"
  instance_zone       = "us-central1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-central.self_link}"
  disk_size           = 25
  instance_count      = 3
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
}

module "idx-east-us-vm" {
  source              = "./instance"
  instance_name       = "idx-east"
  instance_zone       = "us-east1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-east.self_link}"
  disk_size           = 25
  instance_count      = 3
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
}

 module "sh-ctr-us-vm" {
  source              = "./instance"
  instance_name       = "sh-ctr"
  instance_zone       = "us-central1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-central.self_link}"
  disk_size           = 25
  instance_count      = 1
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
}

 module "clus-mstr-us-vm" {
  source              = "./instance"
  instance_name       = "clus-mstr"
  instance_zone       = "us-central1-b"
  instance_subnetwork = "${google_compute_subnetwork.splunk-subnet-us-central.self_link}"
  disk_size           = 25
  instance_count      = 1
  instance_startup_script = "/root/startSplunk.sh"
  import_file = "/home/user/tfnet/ClusMstr.conf"
  import_dest = "/root/conf/ClusMstr.conf"
 }

Coming to management.tf, we create the 'splunk' network with subnets in us-central1 and us-east1. We also set up a firewall rule that allows access to the Splunk ports (8000, 8089 and 8080), along with SSH (22) and ICMP, from everywhere.
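Once this is applied, a quick sanity check from the gcloud CLI (assuming it is configured for the same project) shows the network, the subnets and the firewall rule:

gcloud compute networks list
gcloud compute networks subnets list
gcloud compute firewall-rules list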

The module blocks use the instance folder's main.tf as the template to create the VMs. As part of this we also copy the ClusMstr.conf file, containing the Cluster Master details, from the local machine where Terraform is executed to each VM, so the startup script can build the server.conf entries for the Indexers.

Execution...

Now that all of our configuration is ready, let's see how it turns out, starting with the init command -


terraform init

The output should be similar to -
terraform init output

Follow that up by running the following command to initiate the plan phase -
terraform plan


terraform plan created

Execute the apply command to start the creation of the resources-
terraform apply

Waiting for input for terraform apply

Enter 'yes' and hit return. Terraform will start creating the resources -
terraform apply finished

The VMs are created along with all the accompanying resources.

All the resources are created and up

When you first check the indexer clustering page on the Cluster Master, the peers may not show up.

First time around it does not work



If the indexer cluster does not show up the first time around, restart all the VMs we have created; one way to do that from the gcloud CLI is sketched below.
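This is a sketch; the instance names follow the "<instance_name>-<count.index>" pattern from main.tf, so adjust if yours differ:

# Stop and start the site1 VMs...
for vm in clus-mstr-0 sh-ctr-0 idx-ctr-0 idx-ctr-1 idx-ctr-2; do
  gcloud compute instances stop "$vm" --zone=us-central1-b
  gcloud compute instances start "$vm" --zone=us-central1-b
done
# ...and the site2 indexers.
for vm in idx-east-0 idx-east-1 idx-east-2; do
  gcloud compute instances stop "$vm" --zone=us-east1-b
  gcloud compute instances start "$vm" --zone=us-east1-b
done
# A stop/start cycle releases the ephemeral external IP, which is why a new
# external IP is used below.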

After restart use the new External IP

Now that the VMs have been restarted, use the Cluster Master's newly assigned external IP address and let's see how it looks!

All running as required

To close out you can destroy the infrastructure using the following command - 

terraform destroy


Summary

A couple of items of note: a firewall rule that allows connectivity from 0.0.0.0/0 is not recommended, and as noted in the earlier post it would be better to use a separate persistent disk rather than the boot disk.
As we have seen in this post, we can use Terraform to stand up the complete Splunk infrastructure.

Tuesday, December 31, 2019

Deploying Splunk Multi-Site using Google Cloud Platform

Use Case

We have been using Splunk as our enterprise monitoring tool for some time, running it as a multisite setup that gives us redundancy as well as scalability. With the advent and recent adoption of public cloud, I was keen to find out how it would work if we were asked to move this on-premise infrastructure to the cloud. This blog post is an effort to document the outcome for a wider audience.
I used my personal Google account to set up the solution in Google Cloud. Google offers a $300 credit that lasts up to one year (whichever runs out first) when you register for Google Cloud as an individual.

Preparation

To replicate our architecture, I used us-central1 as site1, housing the Cluster Master, Search Head and Indexers for that site, while us-east1 serves as site2, the secondary site. I used Ubuntu 19.10 as the base OS with Splunk Enterprise installed in the /opt directory. To maintain consistency, I created a custom image which is used as the base image for all the Splunk servers.
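Creating such an image boils down to something like this (a sketch; splunk-base is a hypothetical name for the VM and boot disk that were prepared with Ubuntu and Splunk under /opt):

# Stop the prepared VM, then create a reusable image from its boot disk.
gcloud compute instances stop splunk-base --zone=us-central1-b
gcloud compute images create splunk-image \
    --source-disk=splunk-base --source-disk-zone=us-central1-b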

To run this PoC, I created a separate Splunk VPC that would house the whole architecture. The command to create the VPC is -


gcloud compute networks create splunk \
    --subnet-mode=auto \
    --bgp-routing-mode=global

It will appear in VPC network as follows -

VPC Networks for Splunk

To manage the infrastructure from Splunk's management console (port 8000) and to allow Universal Forwarders (port 9997) to send data, the firewall needs to be opened. The VMs will be managed over SSH (port 22). Just to be safe, I've also opened the replication port (8080) and the Cluster Master management port (8089).
The command to open these ports is-

gcloud compute firewall-rules create splunk-allow \
    --project splunk-261402 --network splunk \
    --allow tcp:22,tcp:8089,tcp:8080,tcp:9997,tcp:8000,icmp \
    --source-ranges 0.0.0.0/0 --target-tags=splunk 

Google Doc Reference to create firewall

The rule will appear in firewalls as -

Firewall Rules for Splunk VPC

 

Creating the Cluster Master, Indexers and Search Head for Splunk on site1 and site2

With our VPC available, the next step is to create the VMs that will function as the Search Head, Cluster Master and Indexers across the two sites.

Create the Search Head in the us-central1 region, zone us-central1-b

gcloud compute instances create sh-ctr \
 --image-project splunk-261402 --zone=us-central1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard

Create the Cluster Master for the Indexers in the us-central1 region, zone us-central1-b

gcloud compute instances create idx-mstr \
 --image-project splunk-261402 --zone=us-central1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard

Create the three Indexers in the us-east1 region, zone us-east1-b


gcloud compute instances create idx-east-1 \
 --image-project splunk-261402 --zone=us-east1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard

gcloud compute instances create idx-east-2 \
 --image-project splunk-261402 --zone=us-east1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard

gcloud compute instances create idx-east-3 \
 --image-project splunk-261402 --zone=us-east1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard

Create the three Indexers in the us-central1 region, zone us-central1-b

gcloud compute instances create idx-ctr-1 \
 --image-project splunk-261402 --zone=us-central1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard

gcloud compute instances create idx-ctr-2 \
 --image-project splunk-261402 --zone=us-central1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard

gcloud compute instances create idx-ctr-3 \
 --image-project splunk-261402 --zone=us-central1-b \
 --image=splunk-image --subnet=splunk \
 --boot-disk-size=30 --boot-disk-type=pd-standard
Google Doc Reference to create VMs

After running all of the VM creation commands above, we should have our Search Head, Cluster Master and Indexers in site1 and site2, as follows -

Splunk VMs


Make a note of the Cluster Master's internal IP address as we will be using it to configure the cluster. If we try to connect to the VMs via SSH now, we won't be able to, because they aren't yet tied to a firewall rule that allows that access.

No Network tags as yet.


To fix that, we will assign the 'splunk' tag to them, which ties them to the firewall rule we created earlier.


gcloud compute instances add-tags sh-ctr \
    --zone us-central1-b \
    --tags splunk
gcloud compute instances add-tags idx-mstr \
    --zone us-central1-b \
    --tags splunk
gcloud compute instances add-tags idx-ctr-1 \
    --zone us-central1-b \
    --tags splunk
gcloud compute instances add-tags idx-ctr-2 \
    --zone us-central1-b \
    --tags splunk
gcloud compute instances add-tags idx-ctr-3 \
    --zone us-central1-b \
    --tags splunk
gcloud compute instances add-tags idx-east-1 \
    --zone us-east1-b \
    --tags splunk
gcloud compute instances add-tags idx-east-2 \
    --zone us-east1-b \
    --tags splunk
gcloud compute instances add-tags idx-east-3 \
    --zone us-east1-b \
    --tags splunk

Google Doc Reference to create Tags


After a successful execution of the commands, each of the machines will have the 'splunk' tag associated with it and we should be able to connect to them via SSH -

Network tags assigned and access available now

 Putting it all together! 

Now that we have all of our VMs created, we need to put them together so they work as a cluster. This involves changing the server.conf file in the /opt/splunk/etc/system/local directory so each instance knows how to communicate with the Cluster Master and Search Head. The server.conf file is not created until the Splunk instance is started and the license agreement is accepted, so go ahead and start it and accept the agreement. Splunk will ask for a user and a password for the management console and administration. To keep it simple, I've created the user admin with the password passw0rd on all the Splunk instances.
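On each VM, that first start looks roughly like this (a sketch; the admin credentials are whatever you enter at the prompt):

cd /opt/splunk/bin
./splunk start --accept-license   # prompts for an administrator username and password on first start
./splunk restart                  # run later to pick up server.conf changes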

In the general section, add a line to indicate the site at which the Splunk instance is running. In this PoC we are running the Cluster Master, the Search Head and three Indexers (idx-ctr-1, idx-ctr-2 and idx-ctr-3) in site1 from us-central1-b, and three Indexers (idx-east-1, idx-east-2 and idx-east-3) in site2 from us-east1-b.

How to set this up is covered in detail on Splunk's website here.

The Cluster Master's server.conf contains the main details of how the cluster is set up.

Cluster-Master server.conf

Splunk Doc Reference for Cluster Master server.conf
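The screenshot isn't reproduced here, so as a rough sketch (values such as the replication/search factors and pass4SymmKey are illustrative assumptions, not copied from the screenshot), the multisite settings on the Cluster Master look roughly like this, appended to server.conf:

# Illustrative multisite master settings; the site assignment itself sits
# under the existing [general] stanza (site = site1).
cat >> /opt/splunk/etc/system/local/server.conf <<'EOF'
[clustering]
mode = master
multisite = true
available_sites = site1,site2
site_replication_factor = origin:1,total:2
site_search_factor = origin:1,total:2
pass4SymmKey = changeme-shared-secret
EOF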

Since we are running the Search Head in site1 the general section will indicate it as site1.
Search Head server.conf

Splunk Doc Reference for Search Head server.conf

Since we are running the Indexers across two sites, the ones in us-central1-b will have site1 in their general section.
site1 Indexer server.conf


And the Indexers on us-east1-b will have the general section as site2.
site2 Indexer server.conf
Indexers View

Now that we have everything configured, restart the instances and they will connect automatically. To check the status, use the Cluster Master's external IP address and access it on the management port 8000. Log in with the user admin and password passw0rd and check the monitoring console; the Indexers will list themselves under Peers and we should see six of them.
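A quick way to find the current external IPs (including the Cluster Master's) is the gcloud CLI:

gcloud compute instances list   # the EXTERNAL_IP column has the address to browse to on port 8000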



The indexes created in the cluster will show up in the Indexes tab -


Indexes View



Search Head and Cluster Master view
The Search Head and Cluster Master will be visible in the Search Heads tab.


There you have it: we are all set and our Splunk infrastructure is now set up in the cloud and ready to ingest data!

Additional Items

While building this architecture, I had to start the splunkd instance multiple times because the machines would get rebooted. To make this easier, I created a script, startSplunk.sh, placed it in root's home directory and made it executable. Use this script as a startup script by adding it to the VM instance's metadata; splunkd will then start whenever the VM boots.

The content of the script-
Startup script
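The script itself is only shown as a screenshot above; at its simplest it is just a wrapper that starts splunkd on boot, something along these lines (my reconstruction, not the exact file):

#!/bin/bash
# Start Splunk when the VM boots; the license was already accepted during the
# initial manual setup, so no extra flags are needed here.
/opt/splunk/bin/splunk start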








The commands to assign the startup script to the VMs -


gcloud compute instances add-metadata idx-mstr \
    --zone us-central1-b \
    --metadata startup-script=/root/startSplunk.sh
gcloud compute instances add-metadata sh-ctr \
    --zone us-central1-b \
    --metadata startup-script=/root/startSplunk.sh
gcloud compute instances add-metadata idx-ctr-1 \
    --zone us-central1-b \
    --metadata startup-script=/root/startSplunk.sh
gcloud compute instances add-metadata idx-ctr-2 \
    --zone us-central1-b \
    --metadata startup-script=/root/startSplunk.sh
gcloud compute instances add-metadata idx-ctr-3 \
    --zone us-central1-b \
    --metadata startup-script=/root/startSplunk.sh
gcloud compute instances add-metadata idx-east-1 \
    --zone us-east1-b \
    --metadata startup-script=/root/startSplunk.sh
gcloud compute instances add-metadata idx-east-2 \
    --zone us-east1-b \
    --metadata startup-script=/root/startSplunk.sh
gcloud compute instances add-metadata idx-east-3 \
    --zone us-east1-b \
    --metadata startup-script=/root/startSplunk.sh
Google Doc Reference to Startup Script

Summary

This particular PoC showcases how we can run a Splunk multisite environment in Google Cloud. Some things to keep in mind should one prepare to move this to Google Cloud -

  • In this scenario I have used the boot disk attached to each VM for storing the indexed data. Ideally, in a PROD environment the VMs would have separate persistent disks holding the Splunk indexed data; this allows for redundancy, since the disks can be detached from one instance and attached to another (see the sketch after this list).
  • I have used a VPC as a substitute for the on-premise network. In a real-world scenario, one would look at creating a VPN so the current on-prem environment can scale out to the cloud; over time the on-premise architecture would slowly diminish and most of the Splunk architecture would run out of the cloud.
  • For brevity, I have not included the deployment server, which might be part of the architecture in some cases. If needed, assume a VM in either site that functions as a Deployment Server.
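As a rough sketch of the persistent-disk approach from the first point (disk name, size and type are illustrative):

# Create a data disk and attach it to one of the indexers.
gcloud compute disks create idx-ctr-1-data --size=200GB --type=pd-ssd --zone=us-central1-b
gcloud compute instances attach-disk idx-ctr-1 --disk=idx-ctr-1-data --zone=us-central1-b
# Format and mount the disk on the VM, then point SPLUNK_DB (or the index
# definitions) at the mount point.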


To actually run this solution in Google Cloud, I'd suggest reading this white paper from Splunk before starting out. The paper suggests machine types according to the scale of the deployment.