Writing a basic script

This guide walks through writing a basic transform script using the gluegun package

Setup the environment

Once we enter the Jupyter workspace, create a new transformation script by opening the etl folder and creating an etl.ipynb notebook

Read the data

For this example, we're going to grab the Accounts from Salesforce, rename them to Customers and add two columns: a Customer Address and SyncSource

To start, we will import pandas and the gluegun Python package

import pandas as pd
import gluegun

Now let's setup where our data is coming from and where we'd like to export it - the file structure for hotglue transformations is described in more detail in the Transformations section.

# This is where input from Salesforce will be
baseInputDir = ROOT_DIR + '/sync-output'
# This is where hotglue expects finished output from the job
outputCsv = ROOT_DIR + '/etl-output/OutputData.csv'
# Keep track of output dataframes here
outputDataFrames = {}

Now, let's read our input into a variable inputDataFrames. The gluestick package has a utility for doing this

# Input dataframes will contain data as {"Opportunity": data, "OpportunityLineItem": data}
inputDataFrames = gluegun.read_csv_folder(baseInputDir)

Here's an idea of what the data looks like,

Basic transformations

Let's start by renaming our Accountsfrom Salesforce to Customers and filtering out the deleted accounts

# We'll start by filtering out deleted Accounts (selecting where IsDeleted == False)
inputDataFrames['Account'] = inputDataFrames['Account'][(inputDataFrames['Account']['IsDeleted'] == False)]
# We will use this to make our customer object
outputDataFrames['Customer'] = inputDataFrames['Account']
customer = outputDataFrames['Customer']

From here, we can now add our new field Customer Address

# Add a field we need "Customer Address" by combining the following
customer['Customer Address'] = customer['BillingStreet'].map(str) + ', ' \
+ customer['BillingCity'].map(str) + ', ' \
+ customer['BillingState'].map(str) + ', ' \
+ customer['BillingPostalCode'].map(lambda x: x if type(x) is str else "{:.0f}".format(x)) + ', ' \
+ customer['BillingCountry'].map(str)

And then set our other new field SyncSource to Salesforce

# Add a field so our system knows this customer was synced from Salesforce
outputDataFrames['Customer']['Sync Source'] = 'SalesForce'

Write the output

Finally, we can save our output using the pandas built-in to_csv function.

# Write output file now, ready for our application to consume

Deploy the script

Now that our transformation script is ready, we will deploy it through Jupyter by selecting the Deploy ETL button

Confirm you want to push your new script to production

Once your script is deployed, you will see a success message similar to the following