Debugging a script

You should read this guide if either of the following applies:

  1. A job failed with the status ETL_FAILED and you're not sure why

  2. You're customizing the Python script for a source, and aren't sure where to start

Access the job logs

If you're debugging a specific failed job, the logs are the best place to learn what caused the failure.

To start, navigate to the job that failed (this could be under a specific tenant).

Jobs for tenant, with failed job highlighted

From here, you should click the job to access the logs:

Job logs

These logs are updated in real time as a job runs, and can be downloaded as a txt file by selecting the download icon in the top right.
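Once you have downloaded the log file, a short script can help you find the relevant lines in a long log. The sketch below is only an illustration: the file name job_logs.txt and the keyword list are assumptions, not something hotglue defines, so adjust both to match your download.

```python
from pathlib import Path


def find_error_lines(log_path, keywords=("Traceback", "Error", "Exception")):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if any(keyword in line for keyword in keywords):
            hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Hypothetical file name: use whatever name your downloaded log has.
    log_file = Path("job_logs.txt")
    if log_file.exists():
        for number, line in find_error_lines(log_file):
            print(f"{number}: {line}")
```

The matching is case-sensitive and purely textual, so treat the hits as starting points for reading the surrounding log lines rather than as a complete diagnosis.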

Launch Jupyter

Before you can debug the script, you have to open it! In hotglue, we manage virtual JupyterLab workspaces for you to edit and update your Python scripts.

To do this, follow the guide: Start a Jupyter workspace

Clone a job

Now that you've launched Jupyter, we can clone a job. This is especially useful if you're debugging a specific job that failed.

Before cloning a job, make sure you've launched this Jupyter workspace under the correct tenant. Generally, you should launch the Jupyter workspace in the admin view, unless you wish to have a specific script for one user.

Inside of Jupyter, select the hotglue tab and press clone job.

You'll see a list of jobs. Press select on the job whose data you want to clone.

Once the data is cloned, you'll see a success message as below, and the data will be populated in the sync-output folder.

You can preview the data that was cloned in Jupyter by opening the sync-output folder.
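If you prefer to check the cloned data from a notebook cell, a few lines of standard-library Python can list what landed in the folder. This is a minimal sketch that only assumes the sync-output folder sits in the notebook's working directory; the helper name list_data_files is made up for this example and is not part of hotglue.

```python
from pathlib import Path


def list_data_files(folder):
    """Return (relative_path, size_in_bytes) for every file under folder, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        # Nothing cloned yet, or the notebook runs from a different directory.
        return []
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    for name, size in list_data_files("sync-output"):
        print(f"{name}: {size} bytes")
```

An empty result usually means the clone did not finish or the notebook is running from a different directory than the one that holds sync-output.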

Upload job data

Alternatively, if you'd rather upload a specific job from the hotglue interface (for example, from a specific user), you can download the job data, and upload that to the Jupyter workspace.

Download job data for failed job

In the screenshot above, there is a job that failed with status ETL_FAILED. Pressing "Download job data" downloads the data that job used as a ZIP file. You can then upload the sync-output folder it contains to Jupyter to debug the script.
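If uploading the ZIP file itself is easier than uploading the folder, you could unpack it from a notebook cell instead. The sketch below uses Python's standard zipfile module; the archive name job_data.zip and the helper name extract_job_data are hypothetical, and it assumes the archive contains a sync-output folder as described above.

```python
import zipfile
from pathlib import Path


def extract_job_data(zip_path, target_dir="."):
    """Extract the archive into target_dir and return any sync-output folders found there."""
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Look for the folder by name, since the archive layout is an assumption.
    return sorted(path for path in target.rglob("sync-output") if path.is_dir())


if __name__ == "__main__":
    # Hypothetical archive name: use the name of the file you downloaded.
    archive_path = Path("job_data.zip")
    if archive_path.exists():
        for folder in extract_job_data(archive_path):
            print(f"Found job data in {folder}")
```

If the folder is extracted into a subdirectory, move it next to your notebook so the script can find it.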

Test the script

Now that your data is cloned, you can run your transform script normally and it should pick up the data automatically.

If your script fails, you will see the error in the notebook (etl.ipynb). From there, you can change your script and run against the data again.
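To make the edit-and-rerun loop concrete, here is a minimal sketch of a transform step that reads from sync-output and writes to etl-output. It is not hotglue's script template: it assumes the cloned data is stored as CSV files, and transform_row and run are hypothetical names for this example. It records a failure per input file so one bad file does not hide problems in the others.

```python
import csv
from pathlib import Path

INPUT_DIR = Path("sync-output")  # folder the cloned job data is placed in
OUTPUT_DIR = Path("etl-output")  # folder the transformed files are written to


def transform_row(row):
    """Hypothetical transform: strip surrounding whitespace from every value."""
    return {key: (value or "").strip() for key, value in row.items()}


def run(input_dir=INPUT_DIR, output_dir=OUTPUT_DIR):
    """Transform every CSV in input_dir; return {file_name: error} for files that failed."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    failures = {}
    for source in sorted(Path(input_dir).glob("*.csv")):
        try:
            with source.open(newline="", encoding="utf-8") as handle:
                reader = csv.DictReader(handle)
                rows = [transform_row(row) for row in reader]
                fieldnames = reader.fieldnames or []
            with (output_dir / source.name).open("w", newline="", encoding="utf-8") as handle:
                writer = csv.DictWriter(handle, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(rows)
        except Exception as error:
            # Keep going so every failing file is reported in one run.
            failures[source.name] = f"{type(error).__name__}: {error}"
    return failures


if __name__ == "__main__":
    if INPUT_DIR.is_dir():
        for name, message in run().items():
            print(f"{name} failed: {message}")
```

After each change to transform_row, rerun the cell against the same cloned data until the returned dictionary is empty.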

When your script works correctly, you should see the output files in the etl-output folder:

File system after running script against a cloned job

If you open the etl-output folder, you'll see the files your transform script produced.

That's all there is to testing your script in hotglue!