In Cloudera deployments on public cloud, one of the key configuration elements is DNS. Get it wrong and your deployment may become wholly unusable, with users unable to access and use the Cloudera data services. If DNS is set up less than ideally, connectivity and performance issues may arise. In this blog, we'll take you through our tried-and-tested best practices for setting up your DNS for use with Cloudera on Azure.
To get started and give you a feel for the DNS dependencies, these are the Azure managed services used in an Azure deployment of Cloudera:
- AKS cluster: Data Warehouse, Data Engineering, Machine Learning, and DataFlow
- MySQL database: Data Engineering
- Storage account: all services
- Azure Database for PostgreSQL: Data Lake and Data Hub clusters
- Key Vault: all services
Typical customer governance restrictions and their impact
Most Azure users use private networks with a firewall for egress control. Most users also restrict wildcard rules on their firewalls. Because Cloudera resources are created on the fly, they would need wildcard rules, and those may be declined by the security team.
Most Azure users use a hub-spoke network topology. DNS servers are usually deployed in the hub virtual network or in an on-premises data center instead of in the Cloudera VNET. This means that if DNS isn't configured correctly, the deployment will fail.
Most Cloudera customers deploying on Azure allow the use of service endpoints; a smaller set of organizations does not. A service endpoint is a simpler way to allow resources on a private network to access managed services on Azure Cloud. If service endpoints are not allowed, the two remaining options are the firewall and private endpoints. Most cloud users don't like opening firewall rules, because that introduces the risk of exposing private data on the internet. That leaves private endpoints as the only option, which also introduces additional DNS configuration for the private endpoints.
Connectivity from private network to Azure managed services
Firewall to Internet
Route traffic from the firewall directly to the Azure managed service endpoint on the internet.
Service endpoint
Azure provides service endpoints so that resources on private networks can access managed services on the internet without going through the firewall. Service endpoints are configured at the subnet level. Since Cloudera resources are deployed in different subnets, this configuration must be enabled on all of them.
The DNS records of managed services accessed through service endpoints are on the internet and managed by Microsoft. The IP address of such a service is a public IP that is routable from the subnet. Please refer to the Microsoft documentation for details.
Not all managed services support service endpoints. In a Cloudera deployment scenario, only storage accounts, PostgreSQL DB, and Key Vault support them.
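Because service endpoints are enabled per subnet, it is easy to miss one. The following Python sketch checks a subnet inventory against the service endpoints needed by the managed services listed above. It assumes you have already exported each subnet's enabled service endpoints into a plain dictionary; the subnet names and service identifiers are hypothetical. Microsoft.Storage and Microsoft.KeyVault are the Azure service endpoint names for storage and Key Vault; mapping PostgreSQL to Microsoft.Sql is an assumption that holds for Azure Database for PostgreSQL single server, so check it against the database offering you actually use.

```python
# Service endpoint name per managed service, following the text above.
# None means no service endpoint in a Cloudera deployment, so a private
# endpoint (or firewall egress) is needed instead.
SERVICE_ENDPOINT_FOR = {
    "storage_account": "Microsoft.Storage",
    "key_vault": "Microsoft.KeyVault",
    "postgresql": "Microsoft.Sql",  # assumption: single-server style endpoint
    "aks": None,
    "mysql": None,
}


def required_endpoints(services):
    """Return the set of service endpoint names needed by the given services."""
    return {SERVICE_ENDPOINT_FOR[s] for s in services if SERVICE_ENDPOINT_FOR[s]}


def needs_private_endpoint(services):
    """Return the services that cannot be reached through a service endpoint."""
    return sorted(s for s in services if SERVICE_ENDPOINT_FOR[s] is None)


def missing_endpoints(subnets, services):
    """Map each subnet to the endpoints it still lacks; complete subnets are omitted."""
    needed = required_endpoints(services)
    gaps = {}
    for name, enabled in subnets.items():
        missing = needed - set(enabled)
        if missing:
            gaps[name] = sorted(missing)
    return gaps


if __name__ == "__main__":
    # Hypothetical subnet inventory for a Cloudera VNET.
    subnets = {
        "cdp-subnet-1": ["Microsoft.Storage", "Microsoft.KeyVault", "Microsoft.Sql"],
        "cdp-subnet-2": ["Microsoft.Storage"],
    }
    services = ["storage_account", "key_vault", "postgresql", "aks"]
    print(missing_endpoints(subnets, services))
    # {'cdp-subnet-2': ['Microsoft.KeyVault', 'Microsoft.Sql']}
    print(needs_private_endpoint(services))
    # ['aks']
```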
Fortunately, most users allow service endpoints. If a customer doesn't allow service endpoints, they need to go with a private endpoint, which requires the configuration described in the following sections.
Private Endpoint
Creating a private endpoint creates a network interface with a private IP address, and a private link service is associated with that network interface, so that other resources in the private network can access the service through that private IP address.
The key here is for private resources to find a DNS record that resolves to that private IP address. There are two options for storing the DNS record:
- Azure-managed public DNS zones: these always exist, but the type of IP address they store depends on the service behind the private endpoint. For example:
  - Storage account private endpoint: the public DNS zone stores the public IP address of the service.
  - AKS API server private endpoint: the public DNS zone stores the private IP address of the service.
- Azure private DNS zone: the DNS records are synchronized to the Azure default DNS of linked VNETs.
Private endpoints are available for all Azure managed services used in Cloudera deployments.
As a consequence, for storage accounts, users either use service endpoints or private endpoints. Because the public DNS zone always returns a public IP, a private DNS zone becomes a mandatory configuration when private endpoints are used.
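A quick way to tell which DNS path a client is actually using is to resolve the service's FQDN from a machine inside the Cloudera VNET and check whether the answer is a private or a public address. This is a minimal sketch using only the Python standard library; the function names are our own. Following the description above, a storage account reached through a private endpoint with a working private DNS zone should resolve to a private address, while a public answer means the client is still getting the public record.

```python
import ipaddress
import socket


def classify(ip: str) -> str:
    """Label an address 'private' (RFC 1918 and similar ranges) or 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def resolve_and_classify(fqdn: str) -> dict:
    """Resolve fqdn with the host's configured DNS and classify each address."""
    infos = socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return {ip: classify(ip) for ip in addresses}
```

For example, calling resolve_and_classify("<account>.blob.core.windows.net") on a VM in the Cloudera VNET, with <account> replaced by your storage account name, shows which case applies. Note that is_private also treats some non-RFC 1918 ranges (such as loopback and link-local) as private, which does not matter for this check.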
For AKS, both DNS alternatives are suitable. The challenges of private DNS zones are discussed next.
Challenges of private DNS zones on an Azure private network
Important Assumptions
As mentioned above for the typical scenario, most Azure users use a hub-and-spoke network architecture and deploy custom private DNS on the hub VNET.
The DNS records of a private DNS zone are synchronized to the Azure default DNS of linked VNETs.
Simple Architecture Use Cases
One-VNET scenario with a private DNS zone:
When a private endpoint is created, Cloudera on Azure registers the private endpoint in the private DNS zone. The DNS record is synchronized to the Azure default DNS of the linked VNET.
If users use custom private DNS, they can configure a conditional forward to Azure default DNS for the domain suffix of the FQDN.
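Conditional forwarding comes down to matching the queried name against configured domain suffixes and handing it to the matching upstream resolver. The sketch below models only that decision; it is not a DNS server. 168.63.129.16 is the Azure-provided DNS virtual IP, while the other addresses and the rule set are hypothetical. Because Azure default DNS answers from the point of view of the VNET the query comes from, the forwarding custom DNS server needs to sit in a VNET that is linked to the private DNS zone.

```python
AZURE_DEFAULT_DNS = "168.63.129.16"  # Azure-provided DNS virtual IP

# Hypothetical conditional forwarding rules on a custom DNS server:
# domain suffix -> upstream resolver.
FORWARD_RULES = {
    "blob.core.windows.net": AZURE_DEFAULT_DNS,
    "privatelink.blob.core.windows.net": AZURE_DEFAULT_DNS,
    "corp.example.com": "10.100.0.10",  # hypothetical on-premises DNS
}
DEFAULT_UPSTREAM = "10.0.0.53"  # hypothetical resolver for everything else


def matches_suffix(fqdn: str, suffix: str) -> bool:
    """True if fqdn is the suffix itself or a subdomain of it (whole labels only)."""
    fqdn = fqdn.rstrip(".").lower()
    suffix = suffix.rstrip(".").lower()
    return fqdn == suffix or fqdn.endswith("." + suffix)


def pick_upstream(fqdn: str, rules=FORWARD_RULES, default=DEFAULT_UPSTREAM) -> str:
    """Pick the upstream for fqdn using the longest matching suffix rule."""
    candidates = [suffix for suffix in rules if matches_suffix(fqdn, suffix)]
    if not candidates:
        return default
    return rules[max(candidates, key=len)]
```

Matching on whole labels matters: a name such as xblob.core.windows.net must not be caught by the blob.core.windows.net rule.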
Hub-and-spoke VNET with Azure default DNS:
With hub-spoke VNETs and Azure default DNS, the setup is still acceptable. The only problem is that resources on unlinked VNETs will not be able to access the AKS cluster. But since the AKS cluster is used by Cloudera, that doesn't pose any major issues.
The Challenging Part
The most popular network architecture among Azure users is a hub-spoke network with custom private DNS servers deployed either on the hub VNET or on the on-premises network.
Since the hub VNET is not linked to the private DNS zone, the zone's records are not synchronized to the Azure default DNS of the hub VNET, and the custom private DNS server cannot find the DNS record for the private endpoint. Because the Cloudera VNET uses the custom private DNS server on the hub VNET, Cloudera resources on the Cloudera VNET send their DNS queries for the FQDN of the private endpoint to that server. The lookup fails, and so does provisioning.
With the DNS server deployed in the on-premises network, there is no Azure default DNS associated with that network, so the DNS server cannot find the DNS record for the FQDN of the private endpoint either.
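To see why provisioning fails, it helps to model which DNS servers can see the private zone's records. The following toy model (not an Azure API) represents Azure default DNS per VNET and returns private-zone records only to VNETs linked to the zone; all VNET names, host names, and addresses are hypothetical.

```python
from typing import Dict, Optional, Set

# Toy data (hypothetical): one A record in a private DNS zone.
PRIVATE_ZONE_RECORDS: Dict[str, str] = {
    "mystorage.privatelink.blob.core.windows.net": "10.1.2.4",
}


def azure_default_dns(vnet: str, fqdn: str, linked_vnets: Set[str]) -> Optional[str]:
    """Model of Azure default DNS: private-zone records are visible only to linked VNETs."""
    if vnet in linked_vnets:
        return PRIVATE_ZONE_RECORDS.get(fqdn)
    return None


def resolve_via_custom_dns(dns_location: str, fqdn: str, linked_vnets: Set[str]) -> Optional[str]:
    """Model of a custom DNS server forwarding to the Azure default DNS of its own VNET.

    An on-premises DNS server has no Azure default DNS behind it, so it finds nothing.
    """
    if dns_location == "on-prem":
        return None
    return azure_default_dns(dns_location, fqdn, linked_vnets)


if __name__ == "__main__":
    fqdn = "mystorage.privatelink.blob.core.windows.net"
    # Zone linked only to the Cloudera VNET, custom DNS in the hub VNET.
    print(resolve_via_custom_dns("hub-vnet", fqdn, {"cloudera-vnet"}))  # None
    # Custom DNS on-premises: nothing to forward to.
    print(resolve_via_custom_dns("on-prem", fqdn, {"cloudera-vnet"}))  # None
```

In this model, adding "hub-vnet" to the linked VNETs makes the hub lookup return 10.1.2.4, while the on-premises case still returns nothing.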
Configuration best practices
Against this background, there are four options.
Option 1: Disable Private DNS Zone
Use the Azure-managed public DNS zone instead of a private DNS zone.
- For Data Warehouse: create data warehouses through the Cloudera command line interface with the parameter "privateDNSZoneAKS" set to "None".
- For Liftie-based data services: the entitlement "LIFTIE_AKS_DISABLE_PRIVATE_DNS_ZONE" must be set. Customers can request this entitlement either through a JIRA ticket or by asking their Cloudera solution engineer to make the request on their behalf.
The sole drawback of this option is that it doesn't apply to Data Engineering, since that data service creates and uses a MySQL private DNS zone on the fly. There is currently no option to disable private DNS zones for Data Engineering.
Option 2: Pre-create Private DNS Zones
Pre-create private DNS zones and link both the Cloudera and hub VNETs to them.
The advantage of this approach is that both Data Warehouse and Liftie-based data services support pre-created private DNS zones. However, there are also a few drawbacks:
- For Liftie, the private DNS zone must be configured when registering the environment. Once past the environment registration stage, it cannot be configured.
- Data Engineering (DE) needs a private DNS zone for MySQL, and it doesn't support pre-configured private DNS zones.
- On-premises networks can't be linked to a private DNS zone. If the DNS server is on an on-premises network, this option does not provide a workable solution.
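When following Option 2, a simple pre-flight check can confirm that every pre-created zone is linked to both VNETs before the environment is registered. The sketch below is a hypothetical helper that works on a plain dictionary of zone name to linked VNETs; privatelink.blob.core.windows.net is the standard private DNS zone name for blob storage, and the VNET names are placeholders. It also flags the on-premises case, for which this option does not work.

```python
from typing import Dict, Iterable, List, Set


def check_zone_links(zone_links: Dict[str, Iterable[str]],
                     required_vnets: Set[str],
                     dns_location: str) -> List[str]:
    """Return problems with an Option 2 zone-link plan (an empty list means OK)."""
    problems: List[str] = []
    if dns_location == "on-prem":
        # On-premises networks cannot be linked to an Azure private DNS zone.
        problems.append("DNS server is on-premises and cannot be linked to a private DNS zone")
    for zone in sorted(zone_links):
        for vnet in sorted(required_vnets - set(zone_links[zone])):
            problems.append(f"{zone}: not linked to {vnet}")
    return problems


if __name__ == "__main__":
    plan = {"privatelink.blob.core.windows.net": ["cloudera-vnet"]}
    for problem in check_zone_links(plan, {"cloudera-vnet", "hub-vnet"}, "hub-vnet"):
        print(problem)
```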
Option 3: Create DNS Server as a Forwarder
Create a couple of DNS servers (for high availability) behind a load balancer in the Cloudera VNET, and configure conditional forwarding from them to the Azure default DNS of the Cloudera VNET. Then configure conditional forwarding from the company's custom private DNS server to the DNS servers in the Cloudera VNET.
The drawback of this option is that additional DNS servers are required, which means more management overhead for the DNS team.
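The resulting lookup path for Option 3 can be traced with a small model: the corporate DNS server forwards the storage suffix to the forwarder in the Cloudera VNET, which forwards it to Azure default DNS, so the final query reaches Azure from the Cloudera VNET, which is linked to the private DNS zone. Server names and the forwarding table are hypothetical; only 168.63.129.16 (the Azure-provided DNS virtual IP) is a real address.

```python
from typing import Dict, List, Optional, Tuple

AZURE_DEFAULT_DNS = "168.63.129.16"  # Azure-provided DNS virtual IP

# Hypothetical Option 3 topology: per DNS server, (domain suffix, next hop) rules.
FORWARDERS: Dict[str, List[Tuple[str, str]]] = {
    "hub/corp-dns": [("blob.core.windows.net", "cloudera/dns-forwarder-lb")],
    "cloudera/dns-forwarder-lb": [("blob.core.windows.net", AZURE_DEFAULT_DNS)],
}


def next_hop(server: str, fqdn: str) -> Optional[str]:
    """Return the upstream that server forwards fqdn to, or None if no rule matches."""
    for suffix, target in FORWARDERS.get(server, []):
        if fqdn == suffix or fqdn.endswith("." + suffix):
            return target
    return None


def trace(fqdn: str, start: str) -> List[str]:
    """Follow conditional forwarders from start and return every server visited."""
    hops = [start]
    hop = next_hop(start, fqdn)
    while hop is not None and hop not in hops:  # stop on no match or a loop
        hops.append(hop)
        hop = next_hop(hop, fqdn)
    return hops


if __name__ == "__main__":
    print(" -> ".join(trace("mystorage.blob.core.windows.net", "hub/corp-dns")))
```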
Option 4: Azure-Managed DNS Resolver
Create a dedicated /28 subnet in the Cloudera VNET for the Azure DNS Private Resolver inbound endpoint. Configure conditional forwarding from the custom private DNS to the Azure DNS Private Resolver inbound endpoint.
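Sizing the dedicated /28 is simple arithmetic: a /28 holds 2^(32-28) = 16 addresses, and Azure reserves five addresses in every subnet (the network address, the default gateway, two for Azure DNS, and the broadcast address), which leaves 11 usable. The sketch below checks that and finds the first free /28 in a VNET address space; the address ranges are placeholders, and carve_inbound_subnet is a hypothetical helper, not part of any Azure tooling.

```python
import ipaddress
from typing import Iterable, Optional

# Azure reserves five addresses in every subnet: network, default gateway,
# two for Azure DNS, and broadcast.
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Addresses left for resources in an Azure subnet of the given size."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


def carve_inbound_subnet(vnet_cidr: str, used: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: first /28 in vnet_cidr that overlaps no used range."""
    vnet = ipaddress.ip_network(vnet_cidr)
    taken = [ipaddress.ip_network(cidr) for cidr in used]
    for candidate in vnet.subnets(new_prefix=28):
        if not any(candidate.overlaps(net) for net in taken):
            return str(candidate)
    return None


if __name__ == "__main__":
    print(usable_addresses("10.10.255.0/28"))  # 11
    print(carve_inbound_subnet("10.10.0.0/16", ["10.10.0.0/24", "10.10.1.0/24"]))  # 10.10.2.0/28
```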
Summary
Bringing everything together, consider these best practices for setting up your DNS with Cloudera on Azure:
- For the storage account, Key Vault, and PostgreSQL DB:
  - Use service endpoints as the first choice.
  - If service endpoints aren't allowed, pre-create private DNS zones and link them to the VNET where the DNS server is deployed. Configure conditional forwards from the custom private DNS to Azure default DNS.
  - If the custom private DNS is deployed in the on-premises network, use Azure DNS Private Resolver or another DNS server as a DNS forwarder on the Cloudera VNET. Conditionally forward DNS lookups from the private DNS to the resolver endpoint.
- For the Data Warehouse, DataFlow, or Machine Learning data services:
  - Disable the private DNS zone and use the public DNS zone instead.
- For the Data Engineering data service:
  - Configure Azure DNS Private Resolver or another DNS server as a DNS forwarder on the Cloudera VNET. Conditionally forward DNS lookups from the private DNS to the resolver endpoint. Please refer to the Microsoft documentation for details on setting up an Azure DNS Private Resolver.
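The summary can also be read as a small decision table. The function below encodes it as written above; the service identifiers, location labels, and returned strings are our own naming, for illustration only.

```python
SERVICE_ENDPOINT = "use service endpoints"
PRE_CREATED_ZONE = ("pre-create private DNS zones linked to the DNS server's VNET; "
                    "conditionally forward to Azure default DNS")
FORWARDER = ("run Azure DNS Private Resolver or another DNS forwarder in the Cloudera VNET; "
             "conditionally forward from the private DNS to it")
PUBLIC_ZONE = "disable the private DNS zone and use the public DNS zone"

# Hypothetical identifiers for the services covered by the summary.
ENDPOINT_CAPABLE = {"storage_account", "key_vault", "postgresql"}
AKS_BASED = {"data_warehouse", "dataflow", "machine_learning"}


def recommend(service: str, service_endpoints_allowed: bool = True,
              dns_server_location: str = "hub-vnet") -> str:
    """Return the DNS approach the summary suggests for one service."""
    if service in ENDPOINT_CAPABLE:
        if service_endpoints_allowed:
            return SERVICE_ENDPOINT
        if dns_server_location == "on-prem":
            return FORWARDER
        return PRE_CREATED_ZONE
    if service in AKS_BASED:
        return PUBLIC_ZONE
    if service == "data_engineering":
        return FORWARDER
    raise ValueError(f"unknown service: {service!r}")


if __name__ == "__main__":
    print(recommend("storage_account", service_endpoints_allowed=False,
                    dns_server_location="on-prem"))
```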
For more background reading on network and DNS specifics for Azure, check out our documentation for the various data services: DataFlow, Data Engineering, Data Warehouse, and Machine Learning. We're also happy to discuss your specific needs; in that case, please reach out to your Cloudera account manager or get in touch.