Scaling RISE with SAP data and AWS Glue

November 29, 2024


Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP. The SAP OData connector supports both on-premises and cloud-hosted (native and SAP RISE) deployments. By using the AWS Glue OData connector for SAP, you can work seamlessly with your data on AWS Glue and Apache Spark in a distributed fashion for efficient processing. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

The AWS Glue OData connector for SAP uses the SAP ODP framework and the OData protocol for data extraction. This framework acts in a provider-subscriber model to enable data transfers between SAP systems and non-SAP data targets. The ODP framework supports full data extraction and change data capture through the Operational Delta Queues (ODQ) mechanism. As a source for data extraction from SAP, you can use SAP data extractors, ABAP CDS views, SAP BW or BW/4HANA sources, HANA information views in SAP ABAP sources, or any ODP-enabled data sources.

SAP source systems can hold historical data and can receive constant updates. For this reason, it's important to enable incremental processing of source changes. This blog post details how you can extract data from SAP and implement incremental data transfer from your SAP source using the SAP ODP OData framework with source delta tokens.
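To make the full-versus-delta idea concrete, here is a toy, in-memory model of a delta queue. It is an illustration under simplifying assumptions, not how SAP implements ODQ: the class name ToyDeltaQueue is hypothetical, the "token" is just a position in an append-only change log, a request without a token returns a full snapshot, and a request with a token returns only the changes recorded after it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaQueue:
    """Hypothetical stand-in for an ODP delta queue (not SAP's implementation)."""

    changes: list = field(default_factory=list)   # append-only log of (key, value, status)
    snapshot: dict = field(default_factory=dict)  # current state of the source

    def record(self, key, value=None, deleted=False):
        # Apply a change to the source and append it to the change log.
        if deleted:
            self.snapshot.pop(key, None)
            self.changes.append((key, None, "DELETED"))
        else:
            self.snapshot[key] = value
            self.changes.append((key, value, "UPDATED"))

    def extract(self, token=None):
        """Full load when token is None, otherwise only changes after the token.

        Returns (rows, new_token); the new token marks the end of the log so the
        next request picks up where this one stopped.
        """
        if token is None:
            rows = [(k, self.snapshot[k], "UPDATED") for k in sorted(self.snapshot)]
        else:
            rows = self.changes[token:]
        return rows, len(self.changes)
```

The first call acts like an initial load; every later call that passes back the previous token behaves like a delta load, which is the pattern the rest of this post implements with real ODQ delta tokens.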

Solution overview

Example Corp wants to analyze the product data stored in their SAP source system. They want to understand their current product offering, specifically the number of products that they have in each of their material groups. This requires joining data from the SAP material master and material group data sources in their SAP system. The material master data is available for incremental extraction, whereas the material group is only available as a full load. These data sources need to be combined and made available to query for analysis.

Prerequisites

To complete the solution presented in the post, start by completing the following prerequisite steps:

Configure operational data provisioning (ODP) data sources for extraction in the SAP Gateway of your SAP system.
Create an Amazon Simple Storage Service (Amazon S3) bucket to store your SAP data.
In an AWS Glue Data Catalog, create a database called sapgluedatabase.

Create an AWS Identity and Access Management (IAM) role for the AWS Glue extract, transform, and load (ETL) job to use. The role must grant access to all resources used by the job, including Amazon S3 and AWS Secrets Manager. For the solution in this post, name the role GlueServiceRoleforSAP. Use the following policies:

AWS managed policies:

Inline policy:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:GetObjectAttributes",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<S3-BUCKET-NAME>",
                "arn:aws:s3:::<S3-BUCKET-NAME>/*"
            ]
        }
    ]
}
```
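If you prefer to generate the inline policy from code rather than paste it, the following minimal sketch builds the same document for a given bucket name. The function name glue_s3_inline_policy and the bucket name my-sap-bucket are hypothetical. The resulting JSON string could then be attached to GlueServiceRoleforSAP, for example with the boto3 IAM client's put_role_policy call (RoleName, PolicyName, PolicyDocument).

```python
import json


def glue_s3_inline_policy(bucket_name: str) -> dict:
    """Build the inline S3 policy for the AWS Glue job role (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:GetObjectAttributes",
                    "s3:ListBucket",
                    "s3:DeleteObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket-level actions such as s3:ListBucket
                    f"arn:aws:s3:::{bucket_name}/*",  # object-level actions such as s3:GetObject
                ],
            }
        ],
    }


# Print the policy as JSON, ready to paste or pass as PolicyDocument.
print(json.dumps(glue_s3_inline_policy("my-sap-bucket"), indent=2))
```

Both ARNs are needed because s3:ListBucket applies to the bucket itself, while the object actions apply to keys inside it.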

Create the AWS Glue connection for SAP

The SAP connector supports both CUSTOM (this is SAP BASIC authentication) and OAUTH authentication methods. For this example, you will be connecting with BASIC authentication.

Use the AWS Management Console for AWS Secrets Manager to create a secret called ODataGlueSecret for your SAP source. The details in AWS Secrets Manager should include the elements in the following code. You will need to enter your SAP system username in place of <your SAP username> and its password in place of <your SAP username password>.
```
{
   "basicAuthUsername": "<your SAP username>",
   "basicAuthPassword": "<your SAP username password>",
   "basicAuthDisableSSO": "True",
   "customAuthenticationType": "CustomBasicAuth"
}
```
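The same secret can be created programmatically. The following sketch assumes a boto3 Secrets Manager client (boto3.client("secretsmanager")) and uses its create_secret call with Name and SecretString; the helper name create_sap_odata_secret is hypothetical. Because the client is passed in, any object with a compatible create_secret method works, which keeps the sketch testable without AWS credentials.

```python
import json


def create_sap_odata_secret(client, username: str, password: str,
                            name: str = "ODataGlueSecret") -> dict:
    """Create the secret used by the AWS Glue SAP OData connection (hypothetical helper).

    client is expected to be a boto3 Secrets Manager client.
    """
    payload = {
        "basicAuthUsername": username,
        "basicAuthPassword": password,
        "basicAuthDisableSSO": "True",
        "customAuthenticationType": "CustomBasicAuth",
    }
    # SecretString must be a string, so serialize the key/value pairs as JSON.
    return client.create_secret(Name=name, SecretString=json.dumps(payload))
```

In practice you would read the password from a secure prompt or another secret store rather than hard-coding it in a script.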
Create the AWS Glue connection GlueSAPOData for your SAP system by selecting the new SAP OData data source.
Configure the connection with the appropriate values for your SAP source.
1. Application host URL: The host must have the SSL certificates for the authentication and validation of your SAP host name.
2. Application service path: /sap/opu/odata/iwfnd/catalogservice;v=2;
3. Port number: Port number of your SAP source system.
4. Client number: Client number of your SAP source system.
5. Logon language: Logon language of your SAP source system.
In the Authentication section, select CUSTOM as the Authentication Type.
Select the AWS secret created in the preceding steps: ODataGlueSecret.
In the Network Options section, enter the VPC, subnet and security group used for the connection to your SAP system. For more information on connecting to your SAP system, see Configure a VPC for your ETL job.
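Before saving the connection, it can help to check the values by calling the catalog service yourself, for example from a host in the same VPC. The sketch below shows one way the connection fields fit together into a URL. The helper name catalog_service_url and the example host are hypothetical; sap-client and sap-language are standard SAP URL parameters, but whether the AWS Glue connector builds its requests exactly this way is an assumption.

```python
from urllib.parse import urlencode


def catalog_service_url(host_url: str, port: int, service_path: str,
                        client: str, language: str) -> str:
    """Combine the connection fields into an SAP Gateway catalog service URL (hypothetical helper)."""
    base = f"{host_url.rstrip('/')}:{port}{service_path}"
    # sap-client and sap-language select the SAP client and logon language.
    return f"{base}?{urlencode({'sap-client': client, 'sap-language': language})}"
```

For example, catalog_service_url("https://sap.example.com", 443, "/sap/opu/odata/iwfnd/catalogservice;v=2;", "100", "EN") gives a URL you could open with your SAP credentials to confirm that the host certificate, port, client, and language are accepted.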

Create an ETL job to ingest data from SAP

In the AWS Glue console, create a new Visual Editor AWS Glue job.

Go to the AWS Glue console.
In the navigation pane under ETL Jobs, choose Visual ETL.
Choose Visual ETL to create a job in the Visual Editor.
For this post, edit the default name to be Material Master Job and choose Save.

On your Visual Editor canvas, select your SAP sources.

Choose the Visual tab, then choose the plus sign to open the Add nodes menu. Search for SAP and add the SAP OData Source.
Choose the node you just added and name it Material Master Attributes.
1. For SAP OData connection, select the GlueSAPOData connection.
2. Select the material attributes service and entity set from your SAP source.
3. For Entity Name and Sub Entity Name, select the SAP OData entity from your SAP source.
4. From the Fields, select Material, Created on, Material Group, Material Type, Old Matl number, GLUE_FETCH_SQ, DELTA_TOKEN and DML_STATUS.
5. Enter limit 100 in the filter section to limit the data for design time.

Note that this service supports delta extraction, so Incremental transfer is the default selected option.

After the AWS Glue service role details have been selected, the data preview is available. You can adjust the preview to include the three new available fields, which are:

glue_fetch_sq: This is a sequence field, generated from the epoch timestamp in the order the record was received, and is unique for each record. It can be used if you need to know or establish the order of changes in the source system.
delta_token: All records will have this field blank, except for the last passed record, which will contain the value of the ODQ token used to capture any changed records (CDC). This record is not a transactional record from the source and is only there to pass the delta token value.
dml_status: This will show UPDATED for all newly inserted and updated records from the source and DELETED for records that have been deleted from the source.

For delta-enabled extraction, the last record passed will contain the value DELTA_TOKEN, and the delta_token field will be filled as described above.
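The field semantics above can be sketched in plain Python. This assumes the preview rows arrive as dictionaries with the lowercase field names shown, and uses the material number field matnr from the generated script's SELECTED_FIELDS as the key. The hypothetical split_delta_token helper separates the non-transactional token record from the data and orders the rest by glue_fetch_sq; the hypothetical apply_changes helper replays UPDATED and DELETED statuses onto a current-state view.

```python
def split_delta_token(rows):
    """Separate data records from the delta-token record (hypothetical helper).

    Assumes only the token record has a non-blank delta_token value.
    """
    data, token = [], None
    for row in rows:
        if row.get("delta_token"):
            # The token record is not transactional, so keep it out of the data.
            token = row["delta_token"]
        else:
            data.append(row)
    # glue_fetch_sq reflects arrival order, so sort on it to replay changes in sequence.
    data.sort(key=lambda r: r["glue_fetch_sq"])
    return data, token


def apply_changes(current, changes, key="matnr"):
    """Replay UPDATED/DELETED changes onto a dict keyed by material number."""
    for row in changes:
        if row["dml_status"] == "DELETED":
            current.pop(row[key], None)
        else:
            current[row[key]] = row
    return current
```

Sorting by glue_fetch_sq before replaying matters when the same material is changed more than once in a single delta, because the last change must win.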

Add another SAP OData source connection to your canvas, and name this node Material Group Text.
1. Select the material group service and entity set from your SAP source.
2. For Entity Name and Sub Entity Name, select the SAP OData entity from your SAP source.

Note that this service supports full extraction, so Full transfer is the default selected option. You can also preview this dataset.

When previewing the data, notice the language key. SAP passes all languages, so add a filter of SPRAS = 'E' to extract only English. Note that this uses the SAP internal value of the field.
Add a Change Schema transform node to the canvas after the Material Group Text node.
- Under Target key, rename the material group field to matkl2, so it differs from the field in your first source.
- Under Drop, select spras, odq_changemode, odq_entitycntr, dml_status, delta_token and glue_fetch_sq.
Add a Join transform to your canvas, bringing together both source datasets.
1. Make sure the node parents Material Master Attributes and Change Schema are both selected.
2. Select the Join type Left join.
3. Select the key fields from each source as the join conditions:
  - Under Material Master Attributes, select matkl
  - Under Change Schema, select matkl2
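The Change Schema and Join steps can be mimicked on plain Python rows to check the expected output shape. This is a hedged sketch, not AWS Glue's implementation: the hypothetical apply_mapping keeps only mapped fields and renames them using the same (source, source type, target, target type) tuples that the generated ApplyMapping call uses, ignoring types; the hypothetical left_join keeps every material row and fills the text fields with None when no group matches. Renaming matkl to matkl2 first avoids two columns named matkl after the join.

```python
def apply_mapping(rows, mappings):
    """Keep only mapped fields, renaming source -> target (types ignored in this sketch)."""
    return [{target: row.get(source) for source, _, target, _ in mappings} for row in rows]


def left_join(left, right, left_key, right_key):
    """Left join: every left row is kept; unmatched rows get None for right-side fields."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    right_fields = {k for r in right for k in r}
    out = []
    for row in left:
        matches = index.get(row[left_key]) or [dict.fromkeys(right_fields)]
        for match in matches:
            out.append({**row, **match})
    return out
```

A left join is used so that materials whose group has no English text still appear in the result instead of being silently dropped.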

You can preview the output to make sure the correct data is being returned. Now you're ready to store the result.

Add the S3 bucket target to your canvas.
1. Make sure the node parent is Join.
2. For Format, select Parquet.
3. For S3 Target Location, browse to the S3 bucket you created in the prerequisites and add materialmaster/ to the S3 target location.
4. For the Data Catalog update options, select Create a table in the Data Catalog and on subsequent runs, update the schema and add new partitions.
5. For Database, select the name of the AWS Glue database created earlier, sapgluedatabase.
6. For Table name, enter materialmaster.
Choose Save to save your job. Your job should look like the following figure.

Clone your ETL job and make it incremental

After your ETL job has been created, it's ready to be cloned and extended with incremental data handling using delta tokens.

To do this, you will need to modify the job script directly. You will modify the script to add a statement that retrieves the last delta token (stored in the job tags) and adds the delta token value to the request (or execution of the job), which enables the delta-enabled SAP OData service to be used when retrieving the data on the next job run.

The first execution of the job will not have a delta token value in the tags; therefore, the call will be an initial run, and the delta token will then be saved in the tags for future executions.
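The token round trip described above can be modelled without AWS. The class below is a hypothetical, in-memory stand-in for the sap_odata_state_management library used later in this post: job tags are a plain dict, and the DELTA_TOKEN option name it returns is an assumption for illustration, not the library's documented behavior. It shows why the first run is a full load (no stored token, so no extra options) and the second run is a delta load.

```python
class InMemoryJobTagStateManager:
    """Hypothetical stand-in for the state manager; tags live in a dict, not on the Glue job."""

    def __init__(self, tags=None):
        self.tags = tags if tags is not None else {}

    def get_connector_options(self, key):
        token = self.tags.get(key)
        # No stored token: add no options, so the run is an initial (full) load.
        return {"DELTA_TOKEN": token} if token else {}

    def update_state(self, key, rows):
        # Persist the token carried by the last token-bearing record, if any.
        tokens = [r["delta_token"] for r in rows if r.get("delta_token")]
        if tokens:
            self.tags[key] = tokens[-1]
```

Usage mirrors the job script: merge get_connector_options into the connection options before reading, then call update_state with the extracted rows before committing the job.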

Go to the AWS Glue console.
In the navigation pane under ETL Jobs, choose Visual ETL.
Select the Material Master Job, choose Actions and select Clone job.
Change the name of the job to Material Master Job Delta, then choose the Script tab.
You will need to add an additional Python library that manages storing and retrieving the delta tokens for each job execution. To do this, navigate to the Job details tab, scroll down and expand the Advanced properties section. In the Python library path, add the following path:
s3://aws-blogs-artifacts-public/artifacts/BDB-4789/sap_odata_state_management.zip

Now choose the Script tab and choose Edit script in the top right corner. Choose Confirm to confirm that your job will be script-only.

Apply the following changes to the script to enable the delta token.

7. Import the SAP OData state management library classes you added in step 5 above by adding the following code at line 8.
```
from sap_odata_state_management.state_manager import StateManagerFactory, StateManagerType, StateType
```
The next few steps retrieve and persist the delta token in the job tags so that it can be accessed by the subsequent job execution. The delta token is added to the request back to the SAP source, so that only the incremental changes are extracted. If no token is passed, the load runs as an initial load and the token is persisted for the next run, which will then be a delta load.

8. To initialize the sap_odata_state_management library, extract the connection options into a variable and update them using the state manager. Do this by adding the following code at line 16 (after the job.init statement).

You can find the <key of MaterialMasterAttributes node> and the <entityName for Material Attribute> in the current generated script under # Script generated for node Material Master Attributes. Make sure to replace them with the appropriate values.

```
# Key of the source node; the delta token is stored in the job tags under this key
key = "<key of MaterialMasterAttributes node>"
state_manager = StateManagerFactory.create_manager(
    manager_type=StateManagerType.JOB_TAG, state_type=StateType.DELTA_TOKEN, options={"job_name": args['JOB_NAME'], "logger": glueContext.get_logger()}
)
options = {
    "connectionName": "GlueSAPOData",
    "entityName": "<entityName for Material Attribute>",
    "ENABLE_CDC": "true"
}
# Merge the stored delta token (if any) into the connection options
connector_options = state_manager.get_connector_options(key)
options.update(connector_options)
```

9. Comment out the existing script generated for node Material Master Attributes by adding a #, and add the following replacement snippet.

```
<key of MaterialMasterAttributes node> = glueContext.create_dynamic_frame.from_options(connection_type="sapodata", connection_options=options, transformation_ctx="<key of MaterialMasterAttributes node>")
```

10. To extract the delta token from the dynamic frame and persist it in the job tags, add the following code snippet just above the last line of your script (before job.commit()).
```
state_manager.update_state(key, <key of MaterialMasterAttributes node>.toDF())
```

This is what your final script should look like:

```
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from sap_odata_state_management.state_manager import StateManagerFactory, StateManagerType, StateType

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Retrieve the last delta token (if any) from the job tags and add it to the connection options
key = "MaterialMasterAttributes_node1730873953236"
state_manager = StateManagerFactory.create_manager(
    manager_type=StateManagerType.JOB_TAG, state_type=StateType.DELTA_TOKEN, options={"job_name": args['JOB_NAME'], "logger": glueContext.get_logger()}
)
options = {
    "connectionName": "GlueSAPOData",
    "entityName": "/sap/opu/odata/sap/ZMATERIAL_ATTR_SRV/EntityOf0MATERIAL_ATTR",
    "ENABLE_CDC": "true"
}
connector_options = state_manager.get_connector_options(key)
options.update(connector_options)

# Script generated for node Material Group Text
MaterialGroupText_node1730874412841 = glueContext.create_dynamic_frame.from_options(connection_type="sapodata", connection_options={"ENABLE_CDC": "false", "connectionName": "GlueSAPOData", "FILTER_PREDICATE": "SPRAS = 'E'", "ENTITY_NAME": "/sap/opu/odata/sap/ZMATL_GROUP_SRV/EntityOf0MATL_GROUP_TEXT"}, transformation_ctx="MaterialGroupText_node1730874412841")

# Script generated for node Material Master Attributes
#MaterialMasterAttributes_node1730873953236 = glueContext.create_dynamic_frame.from_options(connection_type="sapodata", connection_options={"ENABLE_CDC": "true", "connectionName": "GlueSAPOData", "FILTER_PREDICATE": "limit 100", "SELECTED_FIELDS": "MATNR,MTART,MATKL,BISMT,ERSDA,DML_STATUS,DELTA_TOKEN,GLUE_FETCH_SQ", "ENTITY_NAME": "/sap/opu/odata/sap/ZMATERIAL_ATTR_SRV/EntityOf0MATERIAL_ATTR"}, transformation_ctx="MaterialMasterAttributes_node1732755261264")
MaterialMasterAttributes_node1730873953236 = glueContext.create_dynamic_frame.from_options(connection_type="sapodata", connection_options=options, transformation_ctx="MaterialMasterAttributes_node1730873953236")

# Script generated for node Change Schema
ChangeSchema_node1730875214894 = ApplyMapping.apply(frame=MaterialGroupText_node1730874412841, mappings=[("matkl", "string", "matkl2", "string"), ("txtsh", "string", "txtsh", "string")], transformation_ctx="ChangeSchema_node1730875214894")

# Script generated for node Join
MaterialMasterAttributes_node1730873953236DF = MaterialMasterAttributes_node1730873953236.toDF()
ChangeSchema_node1730875214894DF = ChangeSchema_node1730875214894.toDF()
Join_node1730874996674 = DynamicFrame.fromDF(MaterialMasterAttributes_node1730873953236DF.join(ChangeSchema_node1730875214894DF, (MaterialMasterAttributes_node1730873953236DF['matkl'] == ChangeSchema_node1730875214894DF['matkl2']), "left"), glueContext, "Join_node1730874996674")

# Script generated for node Amazon S3: write Parquet and update the materialmaster table in the Data Catalog
AmazonS3_node1730875848117 = glueContext.getSink(path="s3://sapglueodatabucket/materialmaster/", connection_type="s3", updateBehavior="UPDATE_IN_DATABASE", partitionKeys=[], enableUpdateCatalog=True, transformation_ctx="AmazonS3_node1730875848117")
AmazonS3_node1730875848117.setCatalogInfo(catalogDatabase="sapgluedatabase", catalogTableName="materialmaster")
AmazonS3_node1730875848117.setFormat("glueparquet", compression="snappy")
AmazonS3_node1730875848117.writeFrame(Join_node1730874996674)

# Persist the new delta token from the extracted records in the job tags for the next run
state_manager.update_state(key, MaterialMasterAttributes_node1730873953236.toDF())
job.commit()
```

Choose Save to save your changes.
Choose Run to run your job. Note that there are currently no tags in your job details.
Wait for your job run to complete successfully. You can see the status on the Runs tab.
After your job run is complete, you will notice on the Job details tab that a tag has been added. The next job run will read this token and run a delta load.
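You can also confirm the stored token outside the console. The sketch below assumes a boto3 AWS Glue client (boto3.client("glue")), whose get_tags call takes a resource ARN and returns a Tags mapping; the helper name read_job_tags is hypothetical, and which tag key the state-management library writes is not documented in this post, so the sketch returns all tags.

```python
def read_job_tags(glue_client, region: str, account_id: str, job_name: str) -> dict:
    """Return the tags on an AWS Glue job (hypothetical helper).

    glue_client is expected to be a boto3 AWS Glue client.
    """
    # AWS Glue job ARNs follow arn:aws:glue:<region>:<account-id>:job/<job-name>.
    arn = f"arn:aws:glue:{region}:{account_id}:job/{job_name}"
    return glue_client.get_tags(ResourceArn=arn).get("Tags", {})
```

Running this before and after the first job run should show the tag appearing, matching what the Job details tab displays.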

Query your SAP source data

The AWS Glue job run has created an entry in the Data Catalog, enabling you to query the data immediately.

Go to the Amazon Athena console.
Choose Launch Query Editor.
Make sure you have an appropriate workgroup assigned, or create a workgroup if required.
Select the sapgluedatabase and run a query (such as the following) to start analyzing your data.
```
select matkl, txtsh, count(*)
from materialmaster
group by 1, 2
order by 1, 2;
```
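To sanity-check query results against a small sample you already know, the aggregation can be reproduced in plain Python. This is a hedged sketch with a hypothetical helper name: it counts rows per (matkl, txtsh) and orders by both columns, treating None as sorting last, which is intended to match how an ascending ORDER BY places NULLs by default in Athena. Materials without matching group text (from the left join) therefore appear at the end of each group.

```python
from collections import Counter


def _nulls_last(value):
    # Sort key that places None after every non-None value.
    return (value is None, value if value is not None else "")


def count_by_group(rows):
    """Count rows per (matkl, txtsh), ordered by matkl then txtsh (hypothetical helper)."""
    counts = Counter((r["matkl"], r["txtsh"]) for r in rows)
    return sorted(
        ((matkl, txtsh, n) for (matkl, txtsh), n in counts.items()),
        key=lambda row: (_nulls_last(row[0]), _nulls_last(row[1])),
    )
```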

Clean up

To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, SAP OData connection, Glue Data Catalog entry, Secrets Manager secret, IAM role, the contents of the S3 bucket, and the S3 bucket.
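If you prefer to script part of the cleanup, the sketch below assumes boto3 clients for AWS Glue and Secrets Manager and uses their delete_job, delete_connection, delete_table, delete_database, and delete_secret calls. The helper name clean_up is hypothetical. It deliberately leaves the IAM role and the S3 bucket to the console (or their own clients), because the bucket must be emptied before it can be deleted.

```python
def clean_up(glue, secrets, *, jobs, connection_name, database, table, secret_id):
    """Delete the AWS Glue and Secrets Manager resources from this post (hypothetical helper).

    glue and secrets are expected to be boto3 clients for "glue" and "secretsmanager".
    """
    for job in jobs:
        glue.delete_job(JobName=job)
    glue.delete_connection(ConnectionName=connection_name)
    glue.delete_table(DatabaseName=database, Name=table)
    glue.delete_database(Name=database)
    # ForceDeleteWithoutRecovery skips the recovery window; omit it to keep the default window.
    secrets.delete_secret(SecretId=secret_id, ForceDeleteWithoutRecovery=True)
```

For the resources in this post, the call would be clean_up(glue, secrets, jobs=["Material Master Job", "Material Master Job Delta"], connection_name="GlueSAPOData", database="sapgluedatabase", table="materialmaster", secret_id="ODataGlueSecret").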

Conclusion

In this post, we showed you how to create a serverless incremental data load process for multiple SAP data sources. The approach used AWS Glue to incrementally load the data from an SAP source using SAP ODP delta tokens and then load the data into Amazon S3.

The serverless nature of AWS Glue means that there is no infrastructure to manage, and you pay only for the resources consumed while your jobs are running (plus storage costs for outputs). As organizations become increasingly data driven, this SAP connector can provide an efficient, cost-effective, performant, and secure way to include SAP source data in your big data and analytic outcomes. For more information, see AWS Glue.

About the authors

Allison Quinn is a Sr. ANZ Analytics Specialist Solutions Architect for Data and AI based in Melbourne, Australia, working closely with Financial Services customers in the region. Allison worked for over 15 years with SAP products before concentrating her analytics technical specialty on AWS native services. She's very passionate about all things data, and about democratizing it so that customers of all types can drive business benefit.

Pavol is an Innovation Solution Architect at AWS, specializing in SAP cloud adoption across EMEA. With over 20 years of experience, he helps global customers migrate and optimize SAP systems on AWS. Pavol develops tailored strategies to transition SAP environments to the cloud, leveraging AWS's agility, resiliency, and performance. He assists clients in modernizing their SAP landscapes using AWS's AI/ML, data analytics, and application services to enhance intelligence, automation, and performance.

Partha Pratim Sanyal is a Software Development Engineer with AWS Glue in Vancouver, Canada, specializing in data integration, analytics, and connectivity. With extensive backend development expertise, he is dedicated to crafting impactful, customer-centric solutions. His work focuses on building features that empower users to effortlessly analyze and understand their data. Partha's commitment to addressing complex user needs drives him to create intuitive and value-driven experiences that elevate data accessibility and insights for customers.

Diego is an experienced Enterprise Solutions Architect with over 20 years' experience across SAP technologies, specializing in SAP innovation and data and analytics. He has worked both as a partner and as a customer, giving him a complete perspective on what it takes to sell, implement, and run systems and organizations. He is passionate about technology and innovation, focusing on customer outcomes and delivering business value.

Luis Alberto Herrera Gomez is a Software Development Engineer with AWS Glue in Vancouver, specializing in backend engineering, microservices, and cloud computing. With 7-8 years of experience, including roles as a backend and full-stack developer for multiple startups before joining Amazon and AWS, Luis focuses on developing scalable and efficient cloud-based applications. His expertise in AWS technologies enables him to design high-performance systems that handle complex data processing tasks. Luis is passionate about leveraging cloud computing to solve challenging business problems.
