Google Cloud Platform example data flow using GCE and GCS

This article shows a basic data flow example on Google Cloud Platform (GCP). It uses GCE (Google Compute Engine) and GCS (Google Cloud Storage), and a startup script automates all the processes on the machine. This data flow does the following:

Set up the template for creating a GCE VM instance (Debian Linux):
- The startup script for the instance to use
- The name of the storage bucket
- Scopes/permissions
- The service account
When the template is ready, create the VM instance in GCE.
When the machine starts, it will:
- Run the startup script
  - The startup script downloads the Stackdriver agent
  - It installs the Stackdriver agent
  - Stackdriver gathers data from syslog
- Get the SSH keys for SSH access from the metadata service

So the workflow looks like this:

1. Browser -> Connect to the GCP Console, where you can access:

Startup script using Cloud Shell -> Editor -> Get the script file. Copy it so it is ready to paste.

2. Create a project in GCP, then select that project. This enables:

GCS: Google Cloud Storage
SD: Stackdriver (this will NOT enable GCE)

3. GCS -> Create a new bucket and enter a bucket name.

4. Enable GCE: GCP Console -> Go to VM Instances -> this enables the GCE API.

5. Create an instance in GCE, then configure it (see screenshot below for example):

Use the default service account under “Identity and API access” -> “Service accounts”
Paste the script under “Management” -> “Automation” -> “Startup script”
Pass the reference to the bucket that we created under “Metadata” -> Key and Value. Key = lab-logs-bucket (the startup script reads this same key) and value = “gs://mybucketname/”
Set the access scopes to allow writing to GCS
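
If you prefer the command line over the console, steps 3 to 5 can be sketched roughly as below from Cloud Shell. This is only a sketch and not how this article did it. The instance name, zone, image family (debian-12), and the file name startup.sh are placeholders I am assuming, and print_setup_commands is a made-up helper. It only prints the commands so you can review them before running anything.

```shell
#!/bin/bash
# Sketch only: print the CLI commands for steps 3 to 5 without running them.
# Every name passed in below is an assumed placeholder, not a value from this article
# (except the lab-logs-bucket metadata key, which the startup script expects).

print_setup_commands() {
  local bucket="$1"       # bucket name, without the gs:// prefix
  local instance="$2"     # VM instance name
  local zone="$3"         # GCE zone
  local script_file="$4"  # local path of the startup script

  # Step 3: create the bucket.
  echo "gsutil mb gs://${bucket}/"

  # Step 4: enable the Compute Engine API.
  echo "gcloud services enable compute.googleapis.com"

  # Step 5: create a Debian VM with the default service account,
  # the bucket reference in metadata, the startup script, and write access to GCS.
  echo "gcloud compute instances create ${instance}" \
    "--zone=${zone}" \
    "--image-family=debian-12" \
    "--image-project=debian-cloud" \
    "--metadata=lab-logs-bucket=gs://${bucket}/" \
    "--metadata-from-file=startup-script=${script_file}" \
    "--scopes=storage-rw,logging-write"
}

print_setup_commands mybucketname my-lab-vm us-central1-a startup.sh
```

Leaving out a service account flag means the instance uses the default service account, the same as in the console steps. The logging-write scope is included because passing scopes explicitly replaces the default ones, and the logging agent needs it.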

Startup script:

#! /bin/bash

#
# Echo commands as they are run, to make debugging easier.
# GCE startup script output shows up in "/var/log/syslog" .
#
set -x


#
# Stop apt-get calls from trying to bring up UI.
#
export DEBIAN_FRONTEND=noninteractive


#
# Make sure installed packages are up to date with all security patches.
#
apt-get -yq update
apt-get -yq upgrade


#
# Install Google's Stackdriver logging agent, as per
# https://cloud.google.com/logging/docs/agent/installation
#
curl -sSO https://dl.google.com/cloudagents/install-logging-agent.sh
bash install-logging-agent.sh


#
# Install and run the "stress" tool to max the CPU load for a while.
#
apt-get -yq install stress
stress -c 8 -t 120


#
# Report that we're done.
#

# Metadata should be set in the "lab-logs-bucket" attribute using the "gs://mybucketname/" format.
log_bucket_metadata_name=lab-logs-bucket
log_bucket_metadata_url="http://metadata.google.internal/computeMetadata/v1/instance/attributes/${log_bucket_metadata_name}"
worker_log_bucket=$(curl -H "Metadata-Flavor: Google" "${log_bucket_metadata_url}")

# We write a file named after this machine.
worker_log_file="machine-$(hostname)-finished.txt"
echo "Phew!  Work completed at $(date)" >"${worker_log_file}"

# And we copy that file to the bucket specified in the metadata.
echo "Copying the log file to the bucket..."
gsutil cp "${worker_log_file}" "${worker_log_bucket}"
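
One thing the script above does not do is check the value it gets back from the metadata server. If the lab-logs-bucket key is missing or mistyped, the copy at the end will fail. A minimal sketch of a check is below. is_bucket_url is a hypothetical helper of mine, not part of gsutil or the original script, and it is deliberately strict: it only accepts a bare bucket like "gs://mybucketname/", with or without the trailing slash.

```shell
#!/bin/bash
# Sketch only: check that a value looks like "gs://bucket-name/" before using it.
# is_bucket_url is a hypothetical helper, not part of gsutil or the original script.

is_bucket_url() {
  local name="${1#gs://}"              # strip the gs:// scheme if present
  [ "$name" != "$1" ] || return 1      # nothing stripped: no gs:// prefix
  name="${name%/}"                     # drop one optional trailing slash
  # Valid only if a name remains and it contains no further slashes.
  [ -n "$name" ] && [ "${name#*/}" = "$name" ]
}

# In the real script this value comes from the metadata server.
worker_log_bucket="gs://mybucketname/"

if is_bucket_url "${worker_log_bucket}"; then
  echo "Bucket value looks valid: ${worker_log_bucket}"
else
  echo "Metadata attribute lab-logs-bucket is missing or malformed" >&2
fi
```

After the machine has finished, running gsutil ls gs://mybucketname/ from Cloud Shell should list the machine-<hostname>-finished.txt file the script copied.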

John Mark Causing

[email protected]

Bacolod City, Philippines

+639393497018
