Innovation and Business Driven Outcomes

Utilizing Azure HDI as the Gateway to Cloud Development
A Low-Friction Way to Build Out Cloud-Based Workflows Within the Big Data Ecosystem
by Chris Herrera

Supporting Active Data Pipelines and New Application Development


Consider a situation in which you have an active, on-premises data pipeline, possibly built on Kafka, HBase, and Spark. You also have development teams that want to use the data in that pipeline to build new applications: web front ends, mobile apps, and ML-based classification algorithms.


Your infrastructure team could stand up a Kubernetes cluster and allow the app dev teams to develop their own microservices, but then you would likely need to stand up an API gateway, a DevOps pipeline, and a monitoring solution.

Budget and Resources Are Always a Factor


By this point, you might start wondering what all of this is going to cost and how you are going to onboard the resources needed to make it happen. Not to mention, how do you give the app dev teams the autonomy to make their own technical choices without causing chaos across your overall infrastructure?


Enter Azure and HDI


In our experience, Azure with HDInsight (HDI) is the lowest-friction way for an enterprise to adopt a culture of rapid innovation without stretching the infrastructure or ops teams to their limits, especially when the enterprise's app dev teams have a long history of Windows-based development.


I’m not saying that you have to migrate, whole hog, to "the cloud" all in one go. Instead, you can stand up a Kafka cluster in HDI and treat it as Infrastructure as Code (IaC) by defining it in ARM templates that are deployed through Azure Resource Manager. This allows you to define a Kafka cluster with, for example, an OMS agent preinstalled and MirrorMaker installed via a custom script extension.
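
To make the IaC idea concrete, here is a minimal sketch in Python that builds the skeleton of an ARM template for a Kafka cluster on HDI and prints it as JSON. Treat it as an illustration rather than a deployable template: the function name, the script action names and URIs, the VM sizes, the cluster version, and the API version are all assumptions chosen for the example, and a real template would also need storage, VNET, and disk settings appropriate to your environment.

```python
import json


def build_kafka_hdi_template(cluster_name, script_actions):
    """Return a skeleton ARM template (as a dict) for a Kafka cluster on HDInsight.

    script_actions is a list of {"name", "uri", "parameters"} dicts that are
    attached to every node role (hypothetical scripts, e.g. one that installs
    a monitoring agent and one that configures MirrorMaker).
    """

    def role(name, count, vm_size):
        # One compute role (head, worker, or ZooKeeper nodes).
        return {
            "name": name,
            "targetInstanceCount": count,
            "hardwareProfile": {"vmSize": vm_size},  # sizes are placeholders
            "osProfile": {
                "linuxOperatingSystemProfile": {
                    "username": "[parameters('sshUserName')]",
                    "password": "[parameters('sshPassword')]",
                }
            },
            "scriptActions": list(script_actions),
        }

    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "sshUserName": {"type": "string"},
            "sshPassword": {"type": "securestring"},
            "clusterLoginUserName": {"type": "string"},
            "clusterLoginPassword": {"type": "securestring"},
        },
        "resources": [
            {
                "type": "Microsoft.HDInsight/clusters",
                "apiVersion": "2018-06-01-preview",  # assumption: pick the version you target
                "name": cluster_name,
                "location": "[resourceGroup().location]",
                "properties": {
                    "clusterVersion": "4.0",  # placeholder
                    "osType": "Linux",
                    "clusterDefinition": {
                        "kind": "kafka",
                        "configurations": {
                            "gateway": {
                                "restAuthCredential.isEnabled": True,
                                "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
                                "restAuthCredential.password": "[parameters('clusterLoginPassword')]",
                            }
                        },
                    },
                    # Storage and virtual network profiles are omitted in this sketch.
                    "computeProfile": {
                        "roles": [
                            role("headnode", 2, "Standard_D4_v2"),
                            role("workernode", 3, "Standard_D4_v2"),
                            role("zookeepernode", 3, "Standard_A4_v2"),
                        ]
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical script locations; replace with scripts you host yourself.
    actions = [
        {"name": "install-oms-agent", "uri": "https://example.invalid/oms.sh", "parameters": ""},
        {"name": "install-mirrormaker", "uri": "https://example.invalid/mm.sh", "parameters": ""},
    ]
    print(json.dumps(build_kafka_hdi_template("demo-kafka", actions), indent=2))
```

Generating the template from code is only one option; the same structure can live in a hand-written JSON file under source control, which is what makes the cluster definition reviewable and repeatable.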

Low Impact Rapid Innovation on Your Data Pipeline


Once this is up and running, you can take an extended view of your real-time data pipeline. Your app dev teams can then start working with App Service, Azure ML, and other fit-for-purpose solutions such as Time Series Insights.


Utilizing the same architectural principles and several of the same systems that your ops/infrastructure teams are already using, you can safely provide a mechanism for innovation on your data pipeline, without the concern of managing additional infrastructure.


Some Additional Considerations


While this all seems great, there are a few additional items to take into consideration if you want to block your cluster from the Internet. For example, you will want a script that uses either the az CLI or PowerShell to enumerate your NICs, find the head node for your cluster, and then determine the internal FQDN of the head node that is running Ambari. If you are not blocking Internet access (meaning the cluster is exposed through the HDI gateway over HTTPS), then a lot of this becomes easier.
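
As a rough illustration of the NIC lookup described above, the following Python sketch parses the JSON produced by `az network nic list --resource-group <group> --output json` and pulls out the internal FQDNs of the head nodes. It assumes that head node NIC names contain the string "headnode" and that each NIC exposes its name under `dnsSettings.internalFqdn`; verify both against your own cluster. The sample data and the helper names are made up for the example, and the sketch does not determine which head node is currently active.

```python
import json


def head_node_fqdns(nic_list_json, name_marker="headnode"):
    """Return the internal FQDNs of NICs whose name contains name_marker.

    nic_list_json is the JSON text emitted by `az network nic list -o json`.
    """
    fqdns = []
    for nic in json.loads(nic_list_json):
        if name_marker not in nic.get("name", ""):
            continue  # skip worker, ZooKeeper, and gateway NICs
        fqdn = (nic.get("dnsSettings") or {}).get("internalFqdn")
        if fqdn:
            fqdns.append(fqdn)
    return sorted(fqdns)


def ambari_url(fqdn, port=8080):
    """Build the in-VNET Ambari URL; port 8080 is an assumption to verify."""
    return f"http://{fqdn}:{port}"


# Made-up sample input shaped like the az CLI output (illustrative values only).
SAMPLE_NIC_JSON = json.dumps([
    {"name": "nic-workernode-0-abc", "dnsSettings": {"internalFqdn": "wn0-demo.x.internal.cloudapp.net"}},
    {"name": "nic-headnode-1-abc", "dnsSettings": {"internalFqdn": "hn1-demo.x.internal.cloudapp.net"}},
    {"name": "nic-headnode-0-abc", "dnsSettings": {"internalFqdn": "hn0-demo.x.internal.cloudapp.net"}},
    {"name": "nic-gateway-0-abc", "dnsSettings": None},
])

if __name__ == "__main__":
    for fqdn in head_node_fqdns(SAMPLE_NIC_JSON):
        print(ambari_url(fqdn))
```

In practice you would feed the function the real CLI output (for example by piping it in or capturing it with `subprocess`), then try each candidate head node until Ambari responds.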


Additionally, at this moment you are giving up certain features, such as a schema registry (Azure does not currently offer one), and not all services can be domain joined. However, if neither of these is an issue for you, HDI is an easy way to onboard new development efforts.


Azure and Kubernetes


Azure is also adopting Kubernetes as its container orchestrator of choice via AKS. While AKS is still in initial preview, Microsoft is clearly aiming to remove the headache of managing the Kubernetes master while still allowing for all the goodness that K8s offers developers and ops engineers alike.


At the time of this writing, I would caution against wide adoption of AKS in the enterprise, as it is not yet GA and is missing some critical features, such as the ability to be deployed into an existing VNET (a non-starter for many large organizations).


Enabling Infrastructure Teams


Additionally, you can peer on-premises networks and Azure VNETs (via VPN or ExpressRoute), which allows your infrastructure teams to manage the network endpoints in much the same way as they do today, so the work is a natural extension of what they already do. This becomes a little murkier in the world of managed services, where certain network security group rules do need to be defined in order to allow management of the managed service.


Ease Into a Full DevOps Pipeline


Last but certainly not least, the DevOps capabilities provided by VSTS, with its built-in CI and release management, offer a great way for developers, especially those with experience in TFS, to be onboarded into a full DevOps pipeline.


On the monitoring side, OMS provides a suitable solution for the ops/infra/dev teams to ensure their apps/systems/clusters are still functional and operating at peak performance. The developer experience is also second to none, with HDInsight Tools available in Visual Studio, IntelliJ, and Eclipse. This allows your development team to use the environment with which they are comfortable, without sacrificing efficiency.

A Low-Friction Way to Build Out Cloud-Based Workflows within the Big Data Ecosystem


Overall, no solution will ever be perfect. But as stated, for enterprises that have a history of development on Microsoft platforms (especially TFS and Visual Studio) and admins with experience in Active Directory and basic network engineering knowledge, Azure provides a low-friction way to build out your cloud-based workflows without disrupting ongoing operations. This is especially the case if your solution also includes Big Data ecosystem components such as Kafka, Hive, HBase/Phoenix, and Spark.

Need Help with HDI and DevOps?

If you’d like additional assistance in this area, we offer a range of enablement workshops and consulting service packages and would be glad to work through your specifics.


Chris Herrera is a Senior Enterprise Architect at Hashmap working across industries with a group of innovative technologists and domain experts accelerating high value business outcomes for our customers. You can follow Chris on Twitter @cherrera2001 and connect with him on LinkedIn at


© 2020 by Hashmap, Inc.