Introducing the HubSpot connector for AWS Glue

December 2, 2024

Most companies have adopted a diverse set of software as a service (SaaS) platforms to support various applications. The rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows.

More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights on their data. Using AWS Glue, a serverless data integration service, companies can streamline this process, integrating data from internal and external sources into a centralized AWS data lake. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.

This post introduces the new HubSpot managed connector for AWS Glue, and demonstrates how you can integrate HubSpot data into your existing data lake on AWS. By consolidating HubSpot data with data from your AWS accounts and from other SaaS services, you can enhance, analyze, and optionally write the data back to HubSpot, creating a seamless and integrated data experience.

Solution overview

In this example, we use AWS Glue to extract, transform, and load (ETL) data from your HubSpot account into a transactional data lake on Amazon Simple Storage Service (Amazon S3), using Apache Iceberg format. We register the schema in the AWS Glue Data Catalog to make your data discoverable. Subsequently, we use Amazon Athena to validate that the HubSpot data has been successfully loaded to Amazon S3. The following diagram illustrates the solution architecture.

The following are key components and steps in the integration:

Configure your HubSpot account and app to enable access to your HubSpot data.
Prepare for data movement by securely storing your HubSpot OAuth credentials in AWS Secrets Manager, creating an S3 bucket to store your ingested data, and creating an AWS Identity and Access Management (IAM) role for AWS Glue.
Create an AWS Glue job to extract and load data from HubSpot to Amazon S3. AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports the ability to apply complex data transformations, enabling efficient data integration and preparation to meet your needs.
Schema and other metadata will be registered in the AWS Glue Data Catalog, a centralized metadata repository for all your data assets. This helps simplify schema management, and also makes the data discoverable by other services.
Run the AWS Glue job to extract data from HubSpot and write it to Amazon S3 using Iceberg format. Apache Iceberg is an open source, high-performance open table format designed for large-scale analytics, providing transactional consistency and seamless schema evolution. Although we use Iceberg in this example, AWS Glue provides robust support for various data formats, including other transactional formats such as Apache Hudi and Delta Lake.
The data loaded to Amazon S3 will be organized into partitioned folders to optimize for query performance and management. Amazon S3 will also store the AWS Glue scripts, logs, and other temporary data required during the ETL process.
Finally, Amazon Athena will be used to query the data loaded from HubSpot to Amazon S3, validating that all changes in the source system have been captured successfully.
Optionally, you can continuously synchronize HubSpot data to Amazon S3 and analyze data updates over time.

Set up your HubSpot account

This example requires you to create a HubSpot public app for AWS Glue in a HubSpot Developer account, and connect it to an associated HubSpot account. A HubSpot public app is a type of integration that can be installed in your HubSpot accounts or listed in the HubSpot Marketplace. In this example, you create a HubSpot app for the AWS Glue integration, and install it in a new test account. Although HubSpot calls it a public app, it will not be listed in their Marketplace and will only have access to your test account.

If you don’t already have one, sign up for a free HubSpot developer account.
Log in to your HubSpot developer account, where you’ll see options to create apps and test accounts.
Choose Create a test account and follow the instructions.

HubSpot test accounts have Enterprise versions of the HubSpot Marketing, Sales, and Service Hubs along with sample data, so you can test most HubSpot tools, create CRM data, and access it through APIs with AWS Glue. For more information about creating a test account, refer to Create a developer test account.

Create a HubSpot app

Complete the following steps to create a HubSpot app:

Switch back to your HubSpot developer account, and choose Create an app.
Fill in the App Info section with the name AWS Glue and a short description.
Choose the Auth tab.
For Redirect URLs, enter the redirect URL for AWS Glue in the form: https://<region>.console.aws.amazon.com/gluestudio/oauth.

Make sure to replace <region> with the AWS Region where you run AWS Glue. For instance, the code for the US East (N. Virginia) Region is us-east-1, so the AWS Glue redirect URL is https://us-east-1.console.aws.amazon.com/gluestudio/oauth.
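
If you set up the app in several Regions, the redirect URL can be derived from the Region code rather than typed by hand. The following is a minimal sketch: the function name and the Region-code pattern are illustrative assumptions, and only the URL form comes from the step above.

```python
import re

# Simplified pattern for Region codes such as us-east-1 or us-gov-west-1.
# This is an assumption for basic input checking, not an official definition.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def glue_redirect_url(region: str) -> str:
    """Return the AWS Glue OAuth redirect URL for the given Region code."""
    if not _REGION_RE.match(region):
        raise ValueError(f"not a Region code: {region!r}")
    return f"https://{region}.console.aws.amazon.com/gluestudio/oauth"


if __name__ == "__main__":
    print(glue_redirect_url("us-east-1"))
```

Rejecting values such as "US East" early avoids registering a redirect URL that HubSpot would accept but AWS Glue would never call back.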

In the Scopes section, choose Add new scope and select the following permissions:
- automation
- content
- crm.lists.read
- crm.lists.write
- crm.objects.companies.read
- crm.objects.companies.write
- crm.objects.contacts.read
- crm.objects.contacts.write
- crm.objects.custom.read
- crm.objects.custom.write
- crm.objects.deals.read
- crm.objects.deals.write
- crm.objects.owners.read
- crm.schemas.custom.read
- e-commerce
- forms
- oauth
- sales-email-read
- tickets
Review the Scopes and Redirect URL settings, then choose Create app.
Navigate back to your app Auth tab.
Take note of the values for Client ID, Client secret, and Install URL (OAuth). You will need these later to connect your AWS Glue instance.

Select or create an Amazon S3 bucket where your HubSpot data will reside

Select an existing Amazon S3 bucket in your account, or create a new bucket to store your HubSpot data, as well as scripts, logs, and so on. For this example, the bucket name will follow the format aws-glue-hubspot-<account>-<region>, where <account> is the AWS account number and <region> is the Region you are working in. The bucket will be configured with all defaults: public access disabled, versioning disabled, and server-side encryption with Amazon S3 managed keys (SSE-S3).

If you use AWSGlueServiceRole in your IAM role as shown in this example, it will provide access to S3 buckets with names starting with aws-glue-.
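
The naming convention and the aws-glue- prefix rule can be checked before you create the bucket. This is a small sketch; the helper names are hypothetical, and the account number shown is a placeholder.

```python
GLUE_BUCKET_PREFIX = "aws-glue-"


def hubspot_bucket_name(account_id: str, region: str) -> str:
    """Build a bucket name of the form aws-glue-hubspot-<account>-<region>."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account numbers are 12 digits")
    name = f"aws-glue-hubspot-{account_id}-{region}"
    # S3 bucket names must be between 3 and 63 characters long.
    if not 3 <= len(name) <= 63:
        raise ValueError(f"bucket name has invalid length: {name}")
    return name


def covered_by_glue_service_role(bucket_name: str) -> bool:
    """True if the name matches the aws-glue- prefix that AWSGlueServiceRole allows."""
    return bucket_name.startswith(GLUE_BUCKET_PREFIX)


if __name__ == "__main__":
    bucket = hubspot_bucket_name("123456789012", "us-east-1")  # placeholder account
    print(bucket, covered_by_glue_service_role(bucket))
```

If you reuse an existing bucket whose name does not start with aws-glue-, the role needs an additional policy that grants access to that bucket.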

Create an IAM role for AWS Glue

Create an IAM role with permissions for the AWS Glue job. AWS Glue will assume this role when calling other services on your behalf.

On the IAM console, choose Roles in the navigation pane.
Choose Create role.
For Trusted entity type, select AWS service.
For Use case, choose Glue.
Add the following AWS managed policies to the role:
1. AWSGlueServiceRole for accessing related services such as Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudWatch, and IAM. This policy allows access to S3 buckets with names starting with aws-glue-.
2. SecretsManagerReadWrite for read/write access to AWS Secrets Manager.
Give the role a name, for example AWSGlueServiceRole_blog.

For more information, see Getting started with AWS Glue and Create an IAM role for AWS Glue.
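
If you prefer to script the role instead of using the console, the steps above could look roughly like the following. This is a sketch under assumptions: it uses the boto3 IAM client, the function name is hypothetical, and the trust policy shown is the standard form for letting the AWS Glue service assume a role. Verify the policy ARNs in your account before relying on them.

```python
import json

# Trust policy that lets the AWS Glue service assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The two AWS managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]


def create_glue_role(role_name: str = "AWSGlueServiceRole_blog") -> None:
    """Create the role and attach the managed policies (requires AWS credentials)."""
    import boto3  # imported lazily so the constants above work without boto3

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)


if __name__ == "__main__":
    print(json.dumps(GLUE_TRUST_POLICY, indent=2))
```

SecretsManagerReadWrite is broad. For anything beyond a test setup, a policy scoped to the single HubSpot secret is likely a better fit.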

Create an AWS Secrets Manager secret

AWS Secrets Manager is used to securely store your HubSpot OAuth credentials. Complete the following steps to create a secret:

On the AWS Secrets Manager console, choose Secrets in the navigation pane.
Choose Store a new secret.
For Secret type, select Other type of secret.
Under Key/value pairs, enter the HubSpot client secret with the key USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET.
Choose Next.

Enter the secret name, such as HubSpot-Blog, a description, and continue.
Leave the secret rotation as default, and choose Next.
Review the secret configuration, and choose Store.
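
The same secret can be created programmatically. The sketch below assumes the boto3 Secrets Manager client; the helper names are hypothetical, and the only detail taken from the steps above is the key name, which the connection expects to find in the secret.

```python
import json

SECRET_KEY = "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET"


def build_secret_string(client_secret: str) -> str:
    """Return the JSON key/value payload that holds the HubSpot client secret."""
    if not client_secret:
        raise ValueError("client_secret must not be empty")
    return json.dumps({SECRET_KEY: client_secret})


def store_secret(name: str, client_secret: str, description: str = "") -> dict:
    """Create the secret in AWS Secrets Manager (requires AWS credentials)."""
    import boto3  # imported lazily so build_secret_string works without boto3

    kwargs = {"Name": name, "SecretString": build_secret_string(client_secret)}
    if description:
        kwargs["Description"] = description
    return boto3.client("secretsmanager").create_secret(**kwargs)


if __name__ == "__main__":
    # Placeholder value; never hard-code a real client secret in source code.
    print(build_secret_string("example-client-secret"))
```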

Create an AWS Glue connection

Complete the following steps to create an AWS Glue connection to your HubSpot account:

On the AWS Glue console, choose Data connections in the navigation pane.
Choose Create connection.
For Data sources, search for and select HubSpot.
Choose Next.

On the Configure connection page, fill in the required information:
1. For IAM service role, choose the service role created previously. In this example, we use the role AWSGlueServiceRole_blog.
2. For Authentication URL, leave as default.
3. For User Managed Client Application ClientId, enter the OAuth client ID from HubSpot.
4. For AWS Secret, choose the name of the OAuth client secret configured previously in AWS Secrets Manager.
5. Choose Next.

Choose Test Connection to validate the connection to HubSpot.
This will bring up a new HubSpot connection window. Make sure to select your HubSpot test account (not your developer account) to test the connection.
If this is your first connection attempt, you will be redirected to another page where you’re asked to confirm the access level granted to AWS Glue. Choose Connect app.

If successful, the HubSpot window will close and your AWS connection window will say Connection test successful.

Under Set properties, for Name, enter a name (for example, HubSpot_Connection_blog).
Choose Next.
Under Review and create, review your settings and then create the connection.

Create a database in AWS Glue Data Catalog

Complete the following steps to create a database in AWS Glue Data Catalog to organize your HubSpot data:

On the AWS Glue console, choose Databases in the navigation pane.
Create a new database.
Enter a name (for example, hubspot).
You can leave the location field blank.
Choose Create database.

Create an AWS Glue ETL job

Now that you have an AWS Glue data connection to your HubSpot account, you can create an AWS Glue ETL job to ingest HubSpot data into your AWS data lake. AWS Glue provides both visual and code-based interfaces to simplify data integration, depending on your expertise. In this example, we use the Script interface to ingest HubSpot data into the Amazon S3 location. Complete the following steps:

On the AWS Glue console, choose ETL jobs in the navigation pane.
Choose the Script editor.
Choose Spark as the engine, and add the following script.

The AWS Glue Spark job reads the HubSpot data and merges it into the S3 bucket in Iceberg format.
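
As a rough illustration of the merge step only, the following sketch builds an Iceberg MERGE INTO statement of the kind a Spark job could pass to spark.sql(). It is not the job script itself. The source view name hubspot_updates and the key column id are assumptions made for the example; glue_catalog matches the catalog name used in the job's --conf parameter.

```python
import re

# Accept only simple identifiers to avoid building malformed SQL.
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_merge_sql(
    db_name: str,
    table_name: str,
    source_view: str = "hubspot_updates",  # hypothetical temp view of fresh HubSpot rows
    key: str = "id",                       # assumed unique record key
    catalog: str = "glue_catalog",
) -> str:
    """Return an upsert statement for an Iceberg table in Spark SQL."""
    for ident in (db_name, table_name, source_view, key, catalog):
        if not _IDENT_RE.match(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    target = f"{catalog}.{db_name}.{table_name}"
    return (
        f"MERGE INTO {target} t "
        f"USING {source_view} s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    print(build_merge_sql("hubspot", "company"))
```

A statement of this shape updates rows that already exist in the table and inserts new ones. It does not remove rows that were deleted in HubSpot, which would need an extra clause or a separate step.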

On the Job details tab, provide the following information:
For Name, enter a name, such as HubSpot_to_S3_blog.
For Description, enter a meaningful description of the job.
For IAM Role, choose the IAM role you created previously (for this post, AWSGlueServiceRole_blog).

Expand Advanced properties.
Under Connections, enter your HubSpot connection from the previous section (for this post, HubSpot_Connection_blog).

Under Job parameters, enter the following parameters:

- For --conf, enter spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.warehouse=file:///tmp/spark-warehouse
- For --datalake-formats, enter iceberg
- For --db_name, enter the AWS Glue database to store your data lake (for this post, hubspot)
- For --table_name, enter the HubSpot table to be ingested (for this post, company)
- For --s3_bucket_name, enter the bucket where the ingested Iceberg table is stored, in this case aws-glue-hubspot-<account>-<region>
- For --connection_name, enter the name of the AWS Glue connection you created, in this case HubSpot_Connection_blog

Choose Save to save the job, then choose Run.

Depending on the amount of data in your HubSpot account, the job can take a few minutes to complete. After a successful job run, you can choose Run details to see the job specifications and logs.
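
The --conf value is easy to mistype because several Spark settings are chained into one string. The sketch below assembles the job parameters listed above from a dictionary. The helper names are hypothetical; the setting names and values are the ones given in the parameter list.

```python
# Spark settings for the Iceberg catalog, in the order used in the --conf value.
ICEBERG_CONF = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.warehouse": "file:///tmp/spark-warehouse",
}


def build_conf_value(conf: dict) -> str:
    """Chain key=value pairs; every pair after the first is prefixed with --conf."""
    return " --conf ".join(f"{key}={value}" for key, value in conf.items())


def build_job_parameters(db_name, table_name, s3_bucket_name, connection_name) -> dict:
    """Return the job parameters as a mapping of parameter name to value."""
    return {
        "--conf": build_conf_value(ICEBERG_CONF),
        "--datalake-formats": "iceberg",
        "--db_name": db_name,
        "--table_name": table_name,
        "--s3_bucket_name": s3_bucket_name,
        "--connection_name": connection_name,
    }


if __name__ == "__main__":
    params = build_job_parameters(
        "hubspot",
        "company",
        "aws-glue-hubspot-123456789012-us-east-1",  # placeholder account and Region
        "HubSpot_Connection_blog",
    )
    for name, value in params.items():
        print(name, "=", value)
```

Keeping the settings in a dictionary makes it simpler to review each one on its own line and to change the catalog name in a single place.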

Use Athena to query data

Athena is an interactive and serverless query service that makes it simple to analyze data directly in Amazon S3 using standard SQL. In this example, we query the results of the HubSpot data ingested into Amazon S3.

On the Athena console, choose Query editor.
For Database, choose hubspot, and you should see your company table.
Select entries from the hubspot.company table to view the data captured from HubSpot.

You can try various queries on the HubSpot data, such as:

-- get a sample of the dataset
SELECT * FROM "hubspot"."company" LIMIT 10;

-- get companies with revenue
SELECT * FROM "hubspot"."company" A
WHERE A.annualrevenue IS NOT NULL;

-- get the number of companies with revenue
SELECT COUNT(*) AS companies_count FROM "hubspot"."company" A
WHERE A.annualrevenue IS NOT NULL;
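
The same queries can be submitted outside the console. The following sketch assumes the boto3 Athena client and a results location that you choose yourself; the helper names and the athena-results/ prefix are hypothetical.

```python
COUNT_QUERY = (
    'SELECT COUNT(*) AS companies_count FROM "hubspot"."company" A '
    "WHERE A.annualrevenue IS NOT NULL"
)


def build_athena_request(sql: str, output_location: str, database: str = "hubspot") -> dict:
    """Build the arguments for the Athena StartQueryExecution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(request: dict) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    import boto3  # imported lazily so build_athena_request works without boto3

    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Placeholder bucket; replace the account number and Region with your own.
    print(build_athena_request(
        COUNT_QUERY, "s3://aws-glue-hubspot-123456789012-us-east-1/athena-results/"
    ))
```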

Over time, your HubSpot data may change. You can rerun your ETL job periodically, and the Iceberg data lake table will capture your changes. To verify, add, remove, and change companies in your HubSpot database, and then rerun the ETL job. Your data lake should match your latest HubSpot data. With this capability, you can schedule the ETL job to run as often as you need.

Extending the HubSpot connector with AWS services

The HubSpot connector for AWS Glue provides a powerful foundation for building comprehensive data pipelines and analytics workflows. By integrating HubSpot data into your AWS environment, you can use additional services like Amazon Redshift, Amazon QuickSight, and Amazon SageMaker to further process, transform, and analyze the data. This lets you construct sophisticated, end-to-end data architectures that unlock the full value of your HubSpot data, without the need to manage complex infrastructure. The seamless integration between these AWS services makes it simple to build scalable analytics pipelines tailored to your specific requirements.

Considerations

You can set up AWS Glue job triggers to run the ETL jobs on a schedule, so that the data is continuously synchronized between HubSpot and Amazon S3. You can also integrate the ETL jobs with other AWS services, including AWS Step Functions, Amazon MWAA (Amazon Managed Workflows for Apache Airflow), AWS Lambda, Amazon EventBridge, and Amazon Bedrock to create a more advanced data processing pipeline.
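
A scheduled trigger can be defined in a few lines. The sketch below assumes the boto3 Glue client; the trigger name and the daily 02:00 UTC schedule are illustrative choices, and the job name is the one used in this post.

```python
def build_schedule_trigger(
    job_name: str,
    trigger_name: str,
    schedule: str = "cron(0 2 * * ? *)",  # every day at 02:00 UTC (illustrative)
) -> dict:
    """Build the arguments for a scheduled AWS Glue trigger."""
    if not (schedule.startswith("cron(") and schedule.endswith(")")):
        raise ValueError("schedule must be a cron(...) expression")
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_trigger(request: dict) -> dict:
    """Create the trigger in AWS Glue (requires AWS credentials)."""
    import boto3  # imported lazily so build_schedule_trigger works without boto3

    return boto3.client("glue").create_trigger(**request)


if __name__ == "__main__":
    print(build_schedule_trigger("HubSpot_to_S3_blog", "hubspot-daily-sync"))
```

Pick an interval that matches how quickly your HubSpot data changes, because each run consumes AWS Glue resources and HubSpot API calls.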

By default, the HubSpot connector doesn’t import deleted records. However, you can set the IMPORT_DELETED_RECORDS option to true to import all records, including the deleted ones.
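
In a job script, this option would typically sit alongside the other connection options for the HubSpot source. The sketch below shows one way to assemble them. Only IMPORT_DELETED_RECORDS comes from this post; the other key names (connectionName, ENTITY_NAME, API_VERSION), the v3 default, and passing the flag as the string "true" are assumptions to verify against the AWS Glue HubSpot connector documentation.

```python
def hubspot_read_options(
    connection_name: str,
    entity_name: str,
    import_deleted: bool = False,
    api_version: str = "v3",  # assumed default; check the connector documentation
) -> dict:
    """Assemble connection options for reading a HubSpot entity."""
    options = {
        "connectionName": connection_name,  # assumed key name
        "ENTITY_NAME": entity_name,         # assumed key name
        "API_VERSION": api_version,         # assumed key name
    }
    if import_deleted:
        # Deleted records are skipped unless this option is set.
        options["IMPORT_DELETED_RECORDS"] = "true"
    return options


if __name__ == "__main__":
    print(hubspot_read_options("HubSpot_Connection_blog", "company", import_deleted=True))
```

In an AWS Glue Spark job, a dictionary like this would be passed as connection_options to glueContext.create_dynamic_frame.from_options with connection_type set to "hubspot".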

Clean up

To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, HubSpot connection, AWS Secrets Manager secret, IAM role, and Amazon S3 bucket.

Conclusion

With the introduction of the AWS Glue connector for HubSpot, integrating HubSpot data with data from other sources has become more streamlined than ever. This feature lets you set up ongoing data integration from HubSpot to AWS, providing a unified view of data from across platforms and enabling more comprehensive analytics. The serverless nature of AWS Glue means there is no infrastructure to manage, and you only pay for the resources consumed. By following the steps outlined in this post, you can make sure that up-to-date data from HubSpot is captured in your data lake, allowing teams to make faster data-driven decisions and uncover complex insights from across data sources.

To learn more about the AWS Glue connector for HubSpot, refer to Connecting to HubSpot in AWS Glue. This guide walks through the entire process, from setting up the connection to running the data transfer flow. For more information on AWS Glue, visit AWS Glue.

About the Authors

Eric Bomarsi is a Senior Solutions Architect in the ISV group at AWS, where he focuses on building scalable solutions for large customers. As a member of the AWS analytics community, he helps customers get strategic insights from their data. Outside of work, he enjoys playing ice hockey and traveling with his family.

Annie Nelson is a Senior Solutions Architect at AWS. She is a data enthusiast who enjoys problem solving and tackling complex architectural challenges with customers.

Kartikay Khator is a Solutions Architect within Global Life Sciences at AWS, where he dedicates his efforts to developing innovative and scalable solutions that cater to the evolving needs of customers. His expertise lies in harnessing the capabilities of AWS analytics services. Extending beyond his professional pursuits, he finds joy and fulfillment in the world of running and hiking. Having already completed several marathons, he is currently preparing for his next marathon challenge.

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, Amazon MWAA and AWS Glue ETL expert. He’s on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest Amazon MWAA and AWS Glue features and news!