True MLOps on GCP Is Mostly Not Vertex AI
- Production MLOps on GCP rarely relies solely on Vertex AI; teams use core GCP services like Cloud Composer, Cloud Build, and BigQuery for robust workflows.
- Vertex AI is strong at managed training, pipelines, the model registry, endpoints, and evaluations, but it does not cover CI/CD, governance, fine-grained security, or cost control on its own (see the SDK sketch after this list).
- The real stack adds Cloud Build (CI/CD), Artifact Registry (container images), Terraform (IaC), Secret Manager (credentials), Cloud Monitoring/Logging (observability), and BigQuery (metadata and drift metrics; see the BigQuery sketch after this list).
- Architecture layers: Source control → CI/CD → Data (BigQuery/GCS) → Vertex AI execution (pipeline sketch after this list) → Deployment (Cloud Run/GKE) → Observability.
- Reasons to avoid going all-in on Vertex AI: vendor lock-in, opaque costs, and limited flexibility for custom authentication, traffic control, and multi-cloud deployments.
- Alternatives for portability and cost efficiency: Cloud Composer (orchestration), Compute Engine or Cloud Batch (training), GCS + MLflow (model registry), and Cloud Run (serving); a Composer DAG is sketched below.
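
To make the Vertex AI bullet concrete, here is a minimal sketch of the pieces it does handle well, using the `google-cloud-aiplatform` SDK: a managed custom training job, a model registry entry, and an online endpoint. The project ID, bucket, and image URIs are placeholders, not values from the article.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",                     # placeholder project ID
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",  # placeholder staging bucket
)

# Managed custom training: Vertex AI runs the training container and, because a
# serving image is supplied, registers the resulting model automatically.
job = aiplatform.CustomContainerTrainingJob(
    display_name="churn-train",
    container_uri="us-docker.pkg.dev/my-project/ml/train:latest",  # placeholder image
    model_serving_container_image_uri=(
        # Illustrative prebuilt serving image; swap in whatever your model needs.
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
    ),
)
model = job.run(
    model_display_name="churn-model",
    machine_type="n1-standard-4",
    replica_count=1,
)

# Model registry entry -> managed online endpoint.
endpoint = model.deploy(machine_type="n1-standard-2", min_replica_count=1)
print(endpoint.resource_name)
```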
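
Using BigQuery as the metadata/drift store usually amounts to writing check results into a dedicated table. The sketch below assumes a hypothetical `ml_metadata.drift_metrics` table and schema; only the `google-cloud-bigquery` streaming-insert call is standard.

```python
from datetime import datetime, timezone

from google.cloud import bigquery

client = bigquery.Client(project="my-project")          # placeholder project ID
table_id = "my-project.ml_metadata.drift_metrics"        # hypothetical table

rows = [{
    "model_name": "churn-model",
    "feature": "account_age_days",
    "psi": 0.27,                                          # drift score from a batch check
    "checked_at": datetime.now(timezone.utc).isoformat(),
}]

# Streaming insert; returns a list of per-row errors (empty on success).
errors = client.insert_rows_json(table_id, rows)
if errors:
    raise RuntimeError(f"BigQuery insert failed: {errors}")
```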
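
The "Vertex AI execution" layer in the architecture bullet is typically a KFP pipeline compiled in CI and submitted as a Vertex AI PipelineJob. This is a sketch only: the component bodies, table names, and GCS paths are hypothetical stand-ins.

```python
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def validate_data(source_table: str) -> str:
    # Placeholder check; a real component would run data-quality queries in BigQuery.
    return source_table


@dsl.component(base_image="python:3.11")
def train_model(validated_table: str) -> str:
    # Placeholder training step; returns a hypothetical GCS URI for the model artifact.
    return "gs://my-bucket/models/churn/1"


@dsl.pipeline(name="churn-training")
def churn_training(source_table: str = "my-project.analytics.churn_features"):
    validated = validate_data(source_table=source_table)
    train_model(validated_table=validated.output)


# Compile to a spec that CI (e.g. Cloud Build) can submit as a Vertex AI PipelineJob.
compiler.Compiler().compile(churn_training, package_path="churn_training.json")
```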
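
And the Composer alternative is just an Airflow DAG; the one below hands training off to Cloud Batch via the `gcloud` CLI instead of a Vertex AI training job. The DAG id, schedule, job config path, and command arguments are illustrative assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="churn_training",
    schedule="0 3 * * *",            # nightly retrain (Airflow 2.4+ keyword)
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # Submit the training container to Cloud Batch; the job spec lives in the
    # Composer-mounted data folder.
    submit_training = BashOperator(
        task_id="submit_training",
        bash_command=(
            "gcloud batch jobs submit churn-train-{{ ds_nodash }} "
            "--location=us-central1 "
            "--config=/home/airflow/gcs/data/train_job.json"
        ),
    )
```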
