
HASHMAP SOLUTION PROFILE
NATIONAL PETROLEUM & EXPLORATION COMPANY
Migrating Hadoop to GCP BigQuery to Lower Costs and Provide Self-Serve Data Consumption
CHALLENGE
- Migrate the Hadoop-based data warehouse to the company-standard data warehouse platform, GCP and BigQuery
- Replicate the existing Hive environment in GCP/BigQuery while keeping the data pipelines sustainable and minimizing rework
- Minimize the use of cloud-vendor-specific tooling (be cloud-native)
- Provide end-user self-service consumption via Spotfire hosted in GCP
- Lower overall costs for the data platform and services
- Leverage in-house GCP and BigQuery skills
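Replicating the Hive environment in BigQuery largely means recreating table definitions with BigQuery types. The profile does not say how this was done. The sketch below assumes a simple lookup from Hive primitive types to BigQuery standard SQL types and refuses to guess for complex types, which usually need hand design. The mapping, the function names, and the example table are illustrative assumptions.

```python
import re

# Assumed mapping from common Hive primitive types to BigQuery standard SQL types.
# Review case by case: Hive TIMESTAMP carries no time zone, so DATETIME may be a
# closer match, and very wide DECIMALs may need BIGNUMERIC instead of NUMERIC.
HIVE_TO_BQ = {
    "TINYINT": "INT64",
    "SMALLINT": "INT64",
    "INT": "INT64",
    "INTEGER": "INT64",
    "BIGINT": "INT64",
    "FLOAT": "FLOAT64",
    "DOUBLE": "FLOAT64",
    "DECIMAL": "NUMERIC",
    "STRING": "STRING",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "BOOLEAN": "BOOL",
    "BINARY": "BYTES",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def map_hive_type(hive_type: str) -> str:
    """Map a Hive column type such as 'varchar(20)' or 'decimal(10,2)' to BigQuery."""
    # Strip length/precision arguments and generic parameters, e.g. 'decimal(10,2)' -> 'DECIMAL'.
    base = re.split(r"[(<]", hive_type.strip(), maxsplit=1)[0].upper()
    if base not in HIVE_TO_BQ:
        # Complex types (ARRAY, MAP, STRUCT) need a manual BigQuery design.
        raise ValueError(f"no automatic mapping for Hive type {hive_type!r}")
    return HIVE_TO_BQ[base]


def hive_columns_to_bq_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Render a BigQuery CREATE TABLE statement from (name, hive_type) pairs."""
    cols = ",\n".join(f"  {name} {map_hive_type(t)}" for name, t in columns)
    return f"CREATE TABLE {table} (\n{cols}\n)"


if __name__ == "__main__":
    # Hypothetical dataset and table, for illustration only.
    print(hive_columns_to_bq_ddl(
        "analytics.wells",
        [("well_id", "bigint"), ("name", "varchar(64)"), ("depth_m", "decimal(10,2)")],
    ))
```

Failing loudly on unmapped types keeps such a migration script from silently producing schemas that differ from the Hive originals.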
APPROACH
- Developed a Kubernetes (GKE)-focused solution using:
  - a Hashmap custom Kubernetes operator for data ingestion
  - Kubernetes-hosted dbt (data build tool) jobs to replace legacy Spark ETL transformations
- Migrated on-premises Spotfire to GCP as IaaS (GCE)
- Managed infrastructure using Terraform (IaC)
- Used GitLab for source control and CI/CD
- Provided mentoring and coaching during the migration, featuring hands-on demos and quick micro POCs across a variety of dimensions showcasing GCP and BigQuery
- Delivered complete documentation
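The profile states that dbt ran as Kubernetes-hosted jobs on GKE but does not show the job definitions. The following is a minimal sketch, assuming a container image with dbt and its BigQuery adapter installed. It expresses one dbt invocation as a Kubernetes `batch/v1` Job manifest. The image path, namespace, profiles directory, and function name are hypothetical.

```python
import json


def dbt_job_manifest(
    name: str,
    image: str,
    dbt_args: list[str],
    namespace: str = "dbt",          # hypothetical namespace
    backoff_limit: int = 1,
) -> dict:
    """Build a batch/v1 Job that runs a single dbt command in one container.

    Assumes the image has dbt and the BigQuery adapter installed and that
    profiles.yml is available at /dbt/profiles (hypothetical path).
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Number of pod retries before Kubernetes marks the Job as failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # Jobs require Never or OnFailure; Never leaves failed pods for inspection.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "dbt",
                            "image": image,
                            "command": ["dbt"],
                            "args": dbt_args + ["--profiles-dir", "/dbt/profiles"],
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = dbt_job_manifest(
        name="dbt-run-nightly",
        image="gcr.io/example-project/dbt-bigquery:latest",  # hypothetical image path
        dbt_args=["run", "--target", "prod"],
    )
    # kubectl accepts JSON manifests, e.g. `python this_script.py | kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Treating each dbt command as a short-lived Job fits a CI/CD pipeline, which can build the image, push it to Container Registry, and then submit the manifest. How the actual project scheduled these jobs is not stated in the profile.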
OUTCOME
- Utilized modern compute solution designs, DataOps principles and practices, and a hybrid of legacy Hadoop tooling (where necessary) and modern computing solutions
- Provided an end-to-end solution featuring:
  - resource segregation along project-level trust boundaries
  - automated testing and deployment
  - infrastructure managed as code with Terraform
  - GCP managed compute resources (BigQuery and GKE) for data warehousing, and for data movement and transformation using dbt
  - data storage on GCS
  - hosting of third-party enterprise BI tooling (Spotfire)
- Stackdriver handled all monitoring and alerting, and Cloud SQL handled all metadata management
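Data movement on GKE relied on Hashmap's custom Kubernetes operator for ingestion, whose internals the profile does not describe. Kubernetes operators generally follow a reconcile pattern: they compare the desired state declared in custom resources with the observed state and act on the difference. The plain-Python sketch below illustrates only that planning step. The `IngestionSpec` fields, the `reconcile` function, and the example names are hypothetical and are not Hashmap's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestionSpec:
    """Desired state from a hypothetical ingestion custom resource."""
    source_table: str   # e.g. a legacy Hive table
    target_table: str   # e.g. a BigQuery table
    schedule: str       # cron expression for the ingestion job


def reconcile(
    desired: dict[str, IngestionSpec],
    observed: dict[str, IngestionSpec],
) -> list[tuple[str, str]]:
    """Return the (action, name) pairs needed to move observed state to desired state.

    A real operator would run this on each watch event and then call the Kubernetes
    API to create, update, or delete workloads; this sketch only plans the actions.
    """
    actions: list[tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    # Anything running that is no longer declared should be removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    wells = IngestionSpec("hive_db.wells", "proj.ds.wells", "0 2 * * *")
    logs = IngestionSpec("hive_db.logs", "proj.ds.logs", "0 3 * * *")
    stale = IngestionSpec("hive_db.wells", "proj.ds.wells", "0 1 * * *")
    print(reconcile({"wells": wells, "logs": logs}, {"wells": stale, "old": wells}))
```

Because `reconcile` is idempotent (an already-converged state yields no actions), it is safe to rerun on every event. That property is the main reason operators are built around this loop.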
SOLUTION
- Google BigQuery
- Google Kubernetes Engine
- Google Compute Engine
- Google Cloud Storage
- Stackdriver
- Cloud SQL
- Google Container Registry
- dbt
- Docker
- Spotfire
- Terraform
- GitLab
- Hashmap Consulting Services: Assessment, Design, and Architecture Enablement Services; Cloud Modernization and Migration Services; Data Engineering and DataOps Enablement