Migrating your data warehouse workloads is one of the most challenging yet important tasks for any organization. Whether the motivation is the growth of your business and its scalability requirements, or reducing the high license and hardware costs of your existing legacy systems, migrating is not as simple as moving files. At Databricks, our Professional Services (PS) team has worked with hundreds of customers and partners on migration projects and has a rich record of successful migrations. This blog post will explore best practices and lessons learned that any data professional should consider when scoping, designing, building, and executing a migration.
Five phases for a successful migration
At Databricks, we have developed a five-phase process for our migration projects based on our experience and expertise.
Before starting any migration project, we begin with the discovery phase. During this phase, we aim to understand the reasons behind the migration and the challenges of the current legacy system. We also highlight the benefits of migrating workloads to the Databricks Data Intelligence Platform. The discovery phase involves collaborative Q&A sessions and architectural discussions with key stakeholders from the customer and Databricks. Additionally, we use an automated discovery profiler to gain insights into the legacy workloads and estimate the consumption costs of the Databricks Platform to calculate the TCO reduction.
After completing the discovery phase, we move on to a more in-depth assessment. During this stage, we use automated analyzers to evaluate the complexity of the existing code and obtain a high-level estimate of the effort and cost required. This process provides valuable insights into the architecture of the current data platform and the applications it supports. It also helps us refine the scope of the migration, eliminate outdated tables, pipelines, and jobs, and begin considering the target architecture.
In the migration strategy and design phase, we finalize the details of the target architecture and the detailed design for data migration, ETL, stored procedure code translation, and report and BI modernization. At this stage, we also map out the technology between the source and target estates. Once we have finalized the migration strategy, including the target architecture, migration patterns, tooling, and chosen delivery partners, Databricks PS, together with the chosen SI partner, will prepare a migration Statement of Work (SOW) for the pilot (Phase I) or multiple phases of the project. Databricks has several certified Migration Brickbuilder SI partners who provide automated tooling to ensure successful migrations. Additionally, Databricks Professional Services can provide migration assurance services alongside an SI partner.
After the Statement of Work (SOW) is signed, Databricks Professional Services (PS) or the chosen delivery partner carries out a production pilot phase. In this phase, a clearly defined end-to-end use case is migrated from the legacy platform to Databricks. The data, code, and reports are modernized on Databricks using automated tools and code converter accelerators. Best practices are documented, and a sprint retrospective captures all the lessons learned to identify areas for improvement. A Databricks onboarding guide is created to serve as the blueprint for the remaining phases, which are typically executed in parallel sprints by agile Scrum teams.
Finally, we progress to the full-fledged migration execution phase. We repeat our pilot execution approach, incorporating all the lessons learned. This helps establish a Databricks Center of Excellence (CoE) within the organization and scale the teams by collaborating with customer teams, certified SI partners, and our Professional Services team to ensure migration expertise and success.
Lessons learned
Think Big, Start Small
It is crucial during the strategy phase to thoroughly understand your business's data landscape. Equally important is to test a few specific end-to-end use cases during the production pilot phase. No matter how well you plan, some issues will only surface during implementation, and it is better to face them early and find solutions. A great way to choose a pilot use case is to start from the end goal: for example, pick a reporting dashboard that is critical to your business, identify the data and processes needed to create it, and then try recreating the same dashboard on your target platform as a test. This will give you a good idea of what the migration process will involve.
Automate the discovery phase
We begin by using questionnaires and interviewing the database administrators to understand the scope of the migration. Additionally, our automated platform profilers scan through the data dictionaries of databases and Hadoop system metadata to give us actual, data-driven numbers on CPU utilization, the split between ETL and BI usage, and usage patterns across users and service principals. This information is very helpful in estimating Databricks costs and the resulting TCO savings. Code complexity analyzers are also useful, as they give us the number of DDLs, DMLs, stored procedures, and other ETL jobs to be migrated, along with their complexity classification. This helps us determine migration costs and timelines.
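As a concrete illustration, here is a minimal PySpark sketch of the kind of classification such a profiler performs; the `legacy_audit.query_log` table and its columns (`statement_type`, `user_name`, `cpu_seconds`) are hypothetical placeholders for a query log exported from your legacy system's data dictionary.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical export of the legacy warehouse's query log.
log = spark.read.table("legacy_audit.query_log")

# Classify each statement as ETL (data-changing) or BI (read-only).
classified = log.withColumn(
    "workload_type",
    F.when(
        F.upper(F.col("statement_type")).isin("INSERT", "UPDATE", "DELETE", "MERGE"),
        "ETL",
    ).otherwise("BI"),
)

# CPU seconds and query counts per workload type and user: the raw inputs
# for sizing Databricks consumption and estimating TCO savings.
summary = classified.groupBy("workload_type", "user_name").agg(
    F.sum("cpu_seconds").alias("total_cpu_seconds"),
    F.count("*").alias("query_count"),
)
summary.show()
```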
Leverage Automated Code Converters
Using automated code conversion tools is essential to expedite the migration and reduce costs. These tools help convert legacy code, such as stored procedures or ETL, to Databricks SQL. This ensures that no business rules or functions implemented in the legacy code are overlooked due to a lack of documentation. Moreover, the conversion process typically saves developers over 80% of their development time, enabling them to promptly review the converted code, make necessary adjustments, and focus on unit testing. It is crucial to ensure that the automated tooling can convert not only the database code but also the ETL code from legacy GUI-based platforms.
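As a simplified, hypothetical illustration of what a converter produces, the snippet below shows a common translation target: a legacy stored procedure's UPDATE-then-INSERT upsert logic rewritten as a single MERGE on Databricks. Table and column names are invented for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Legacy pattern: an UPDATE followed by an INSERT inside a stored procedure.
# Converted pattern: one atomic MERGE into a Delta table.
spark.sql("""
    MERGE INTO sales.dim_customer AS t
    USING staging.customer_updates AS s
      ON t.customer_id = s.customer_id
    WHEN MATCHED THEN
      UPDATE SET t.email = s.email, t.updated_at = s.updated_at
    WHEN NOT MATCHED THEN
      INSERT (customer_id, email, updated_at)
      VALUES (s.customer_id, s.email, s.updated_at)
""")
```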
Beyond Code Conversion – Data Matters Too
Migrations often create the misleading impression of a clearly defined project. When we think about migration, we usually focus on converting code from the source engine to the target. However, it is important not to overlook other details that are crucial to making the new platform usable.
For example, it is crucial to finalize the approach for data migration, just as you do for code migration and conversion. Data migration can be effectively achieved by using Databricks LakeFlow Connect where applicable or by choosing one of our CDC ingestion partner tools. Initially, during the development phase, it may be necessary to carry out historical and catch-up loads from the legacy EDW while concurrently building the data ingestion from the actual sources into Databricks. Additionally, it is important to have a well-defined orchestration strategy using Databricks Workflows, Delta Live Tables, or similar tools. Finally, your migrated data platform should align with your software development and CI/CD practices before the migration is considered complete.
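As one possible shape for the catch-up load described above, here is a minimal Delta Live Tables sketch; the landing path and table names are hypothetical placeholders.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw catch-up load landed from the legacy EDW export.")
def bronze_orders():
    # Auto Loader incrementally picks up files as the historical export lands.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("/Volumes/raw/legacy_edw/orders/")  # hypothetical landing path
    )

@dlt.table(comment="Cleansed orders, ready for validation against the legacy system.")
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .where(F.col("order_id").isNotNull())
        .withColumn("ingested_at", F.current_timestamp())
    )
```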
Don't ignore governance and security
Governance and security are other elements that are often overlooked when designing and scoping a migration. Regardless of your existing governance practices, we recommend using Unity Catalog in Databricks as your single source of truth for centralized access control, auditing, lineage, and data discovery capabilities. Migrating to and enabling Unity Catalog adds to the effort required for the overall migration. Also, explore the unique capabilities that some of our governance partners provide.
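To give a flavor of what centralized access control looks like once Unity Catalog is in place, here is a minimal sketch; the catalog, schema, and group names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grants are expressed once, centrally, instead of per-workspace or per-cluster.
spark.sql("GRANT USE CATALOG ON CATALOG finance TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA finance.reporting TO `analysts`")
spark.sql("GRANT SELECT ON SCHEMA finance.reporting TO `analysts`")

# Auditing comes via the system tables (an admin must grant access
# to system.access.audit before this query will run).
spark.sql(
    "SELECT event_time, user_identity, action_name "
    "FROM system.access.audit LIMIT 10"
).show()
```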
Data Validation and User Testing are essential for a successful migration
It is crucial for the success of the project to have proper data validation and active participation from business Subject Matter Experts (SMEs) during the User Acceptance Testing phase. The Databricks migration team and our certified System Integrators (SIs) use parallel testing and data reconciliation tools to ensure that the data meets all data quality standards without any discrepancies. Strong alignment with executives ensures timely and focused participation of business SMEs during user acceptance testing, facilitating a quick transition to production and agreement on decommissioning older systems and reports once the new system is in place.
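Here is a minimal sketch of the parallel-testing idea, comparing row counts and a content checksum between a legacy extract and the migrated Delta table. The table names are placeholders, and real reconciliation tools go much further (column-level statistics, sampling, tolerance rules).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

legacy = spark.read.table("validation.legacy_orders_extract")   # placeholder
migrated = spark.read.table("sales.orders")                     # placeholder

def profile(df):
    # Row count plus an order-independent checksum over all columns.
    return df.agg(
        F.count("*").alias("row_count"),
        F.sum(F.hash(*df.columns).cast("long")).alias("content_checksum"),
    ).first()

src, tgt = profile(legacy), profile(migrated)
assert src.row_count == tgt.row_count, "Row counts diverge"
assert src.content_checksum == tgt.content_checksum, "Content checksums diverge"
```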
Make It Real – operationalize and track your migration
Implement good operational best practices, such as data quality frameworks, exception handling, reprocessing, and data pipeline observability controls, to capture and report process metrics. This will help identify and report any deviations or delays, allowing for quick corrective actions. Databricks features like Lakehouse Monitoring and our system billing tables help with observability and FinOps monitoring.
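For example, a first FinOps report can be a simple query over the `system.billing.usage` system table; a sketch follows (grouping by SKU is an assumption, so adapt the window and grouping to your own tagging strategy).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Daily DBU consumption by SKU over the last 30 days: a starting point
# for tracking migration spend against the TCO estimate from discovery.
spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS dbus_consumed
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY usage_date, sku_name
    ORDER BY usage_date, dbus_consumed DESC
""").show()
```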
Trust the experts
Migrations can be challenging. There will always be tradeoffs to balance and unexpected issues and delays to manage. You need proven partners and solutions for the people, process, and technology aspects of the migration. We recommend trusting the experts at Databricks Professional Services and our certified migration partners, who have extensive experience in delivering high-quality migration solutions in a timely manner. Reach out to get your migration assessment started.