Amazon DynamoDB, a serverless NoSQL database, has been a go-to solution for over one million customers to build low-latency and high-scale applications. As data grows, organizations are constantly seeking ways to extract valuable insights from operational data, which is often stored in DynamoDB. However, to make the most of this data in Amazon DynamoDB for analytics and machine learning (ML) use cases, customers often build custom data pipelines, a time-consuming infrastructure task that adds little unique value to their core business.
Starting today, you can use Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse to run analytics and ML workloads in just a few clicks without consuming your DynamoDB table capacity. Amazon SageMaker Lakehouse unifies all your data across Amazon S3 data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data.
Zero-ETL is a set of integrations that eliminates or minimizes the need to build ETL data pipelines. This zero-ETL integration reduces the engineering effort required to build and maintain data pipelines, so users can run analytics and ML workloads on operational data in Amazon DynamoDB without impacting production workflows.
Let’s get started
For the following demo, I need to set up zero-ETL integration for my data in Amazon DynamoDB with an Amazon Simple Storage Service (Amazon S3) data lake managed by Amazon SageMaker Lakehouse. Before setting up the zero-ETL integration, there are prerequisites to complete. To learn more about how to set them up, refer to this Amazon DynamoDB documentation page.
With all the prerequisites completed, I can get started with this integration. I navigate to the AWS Glue console and select Zero-ETL integrations under Data Integration and ETL. Then, I choose Create zero-ETL integration.
Here, I have options to select my data source. I choose Amazon DynamoDB and choose Next.
Next, I need to configure the source and target details. In the Source details section, I select my Amazon DynamoDB table. In the Target details section, I specify the S3 bucket that I’ve set up in the AWS Glue Data Catalog.
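The same source and target choice can be sketched outside the console. The snippet below is a minimal sketch that only builds the request values: the ARN helpers, the sample names (`orders`, `lakehouse_db`, account `111122223333`), and the assumption that the target is addressed by an AWS Glue database ARN are mine, not taken from the demo. I believe the resulting dictionary matches the parameters of the AWS Glue `CreateIntegration` API, but verify that against the current API reference before using it.

```python
from typing import Optional


def dynamodb_table_arn(region: str, account_id: str, table_name: str) -> str:
    """Build the ARN of a DynamoDB table (the integration source)."""
    return f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"


def glue_database_arn(region: str, account_id: str, database_name: str) -> str:
    """Build the ARN of an AWS Glue Data Catalog database (assumed target)."""
    return f"arn:aws:glue:{region}:{account_id}:database/{database_name}"


def build_integration_request(
    name: str,
    source_arn: str,
    target_arn: str,
    kms_key_id: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for a CreateIntegration-style call.

    The key names are an assumption based on the Glue API; check them
    against the API reference before sending the request.
    """
    request = {
        "IntegrationName": name,
        "SourceArn": source_arn,
        "TargetArn": target_arn,
    }
    if kms_key_id is not None:
        # Only include the key when a customer managed key is wanted.
        request["KmsKeyId"] = kms_key_id
    return request


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    source = dynamodb_table_arn("us-east-1", "111122223333", "orders")
    target = glue_database_arn("us-east-1", "111122223333", "lakehouse_db")
    print(build_integration_request("orders-zero-etl", source, target))
    # With boto3 installed and credentials configured, the dictionary
    # could be passed on (assumption, verify first):
    #   boto3.client("glue").create_integration(**request)
```

Keeping the request construction separate from the AWS call makes it easy to review the values before anything is created.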
To set up this integration, I need an IAM role that grants AWS Glue the necessary permissions. For guidance on configuring IAM permissions, visit the Amazon DynamoDB documentation page. Also, if I haven’t configured a resource policy for my AWS Glue Data Catalog, I can select Fix it for me to automatically add the required resource policies.
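For readers who prefer to review the Data Catalog resource policy rather than rely on Fix it for me, here is a rough sketch of what such a policy document might look like. Everything in it is an assumption on my part: the two action names (`glue:CreateInboundIntegration`, `glue:AuthorizeInboundIntegration`), the principals, and the conditions reflect my reading of the AWS Glue zero-ETL documentation, and the sample names are hypothetical. Compare it with the policy the console generates before applying anything.

```python
import json


def build_catalog_resource_policy(
    region: str, account_id: str, database_name: str, source_table_arn: str
) -> dict:
    """Build an illustrative Data Catalog resource policy as a dict.

    ASSUMPTION: action names, principals, and conditions are a sketch,
    not the exact policy that "Fix it for me" writes.
    """
    resources = [
        f"arn:aws:glue:{region}:{account_id}:catalog",
        f"arn:aws:glue:{region}:{account_id}:database/{database_name}",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets principals in the account create the integration.
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "glue:CreateInboundIntegration",
                "Resource": resources,
                "Condition": {"StringLike": {"aws:SourceArn": source_table_arn}},
            },
            {
                # Lets the AWS Glue service authorize the integration.
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "glue:AuthorizeInboundIntegration",
                "Resource": resources,
                "Condition": {"StringEquals": {"aws:SourceArn": source_table_arn}},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical account, database, and table names.
    policy = build_catalog_resource_policy(
        "us-east-1",
        "111122223333",
        "lakehouse_db",
        "arn:aws:dynamodb:us-east-1:111122223333:table/orders",
    )
    print(json.dumps(policy, indent=2))
```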
Here, I have options to configure the output. Under Data partitioning, I can either use DynamoDB table keys for partitioning or specify custom partition keys. After completing the configuration, I choose Next.
Because I selected the Fix it for me checkbox, I need to review the required changes and choose Continue before I can proceed to the next step.
On the next page, I have the flexibility to configure data encryption. I can use AWS Key Management Service (AWS KMS) or a custom encryption key. Then, I assign a name to the integration and choose Next.
On the last step, I need to review the configurations. When I’m happy with them, I choose Next to create the zero-ETL integration.
After the initial data ingestion completes, my zero-ETL integration will be ready for use. The completion time varies depending on the size of my source DynamoDB table.
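Because the initial load can take a while, a script that depends on the integration may want to wait for it. The following is a minimal polling sketch under stated assumptions: the status names (`CREATING`, `ACTIVE`, `FAILED`, `NEEDS_ATTENTION`) are what I understand AWS Glue to report for integrations, and `get_status` is a hypothetical callable you would supply, for example one that wraps a describe call and returns the status string.

```python
import time
from typing import Callable

# ASSUMPTION: status names as I understand the Glue integration API.
READY_STATUS = "ACTIVE"
FAILURE_STATUSES = {"FAILED", "NEEDS_ATTENTION"}


def wait_until_active(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the integration is ready.

    Raises RuntimeError on a failure status and TimeoutError when the
    total time slept reaches timeout_seconds.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == READY_STATUS:
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"integration entered status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated status sequence; no AWS call is made here.
    fake = iter(["CREATING", "CREATING", "ACTIVE"])
    print(wait_until_active(lambda: next(fake), poll_seconds=0.0))
```

Passing the status lookup in as a function keeps the loop independent of any particular SDK call and easy to exercise with simulated statuses.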
If I navigate to Tables under Data Catalog in the left navigation panel, I can observe more details, including the schema. Under the hood, this zero-ETL integration uses Apache Iceberg to transform the format and structure of my DynamoDB data as it is written to Amazon S3.
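The same table details can be read programmatically. As a sketch, the helper below summarizes a response shaped like the one returned by the AWS Glue `GetTable` API (a `Table` entry with `StorageDescriptor` and `Parameters`). The sample response and its names are invented for illustration, and treating `table_type` equal to `ICEBERG` as the Iceberg marker is an assumption based on how Iceberg tables usually appear in the Data Catalog.

```python
def summarize_table(response: dict) -> dict:
    """Extract name, location, columns, and an Iceberg flag from a
    GetTable-style response dictionary."""
    table = response["Table"]
    descriptor = table.get("StorageDescriptor", {})
    parameters = table.get("Parameters", {})
    return {
        "name": table["Name"],
        "location": descriptor.get("Location"),
        # ASSUMPTION: Iceberg tables carry table_type=ICEBERG.
        "is_iceberg": parameters.get("table_type", "").upper() == "ICEBERG",
        "columns": [
            (column["Name"], column.get("Type"))
            for column in descriptor.get("Columns", [])
        ],
    }


if __name__ == "__main__":
    # Hypothetical response; with boto3 this might come from
    # boto3.client("glue").get_table(DatabaseName=..., Name=...).
    sample = {
        "Table": {
            "Name": "orders",
            "StorageDescriptor": {
                "Location": "s3://example-bucket/orders",
                "Columns": [
                    {"Name": "order_id", "Type": "string"},
                    {"Name": "total", "Type": "double"},
                ],
            },
            "Parameters": {"table_type": "ICEBERG"},
        }
    }
    print(summarize_table(sample))
```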
Lastly, I can confirm that all my data is available in my S3 bucket.
This zero-ETL integration significantly reduces the complexity and operational burden of data movement, so I can focus on extracting insights rather than managing pipelines.
Available now
This new zero-ETL capability is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Hong Kong, Singapore, Sydney, Tokyo), Europe (Frankfurt, Ireland, Stockholm).
Explore how to streamline your data analytics workflows using Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse. Learn more about how to get started on the Amazon DynamoDB documentation page.
Happy building!
— Donnie