
Finding the Needle in the Cost Haystack: Anomaly Detection with BQML
Cloud cost anomalies can be elusive. Unexpected spikes in usage, misconfigurations, and billing changes often blend into the noise of normal spending patterns. Detecting these anomalies before they turn into expensive surprises is a challenge for Cloud Cost Management (CCM).
At Harness, we leverage Google BigQuery ML (BQML) to automatically detect cloud cost anomalies using time-series forecasting models. Unlike traditional anomaly detection methods that require external ML pipelines, BQML enables in-database machine learning, allowing us to run anomaly detection directly on cloud cost data stored in BigQuery.
Why Use BQML for Cost Anomaly Detection?
Traditional anomaly detection approaches often require moving data out of the warehouse into external ML pipelines built on libraries such as Prophet or on ARIMA/SARIMA implementations. With BQML, we run the models natively in Google BigQuery, which removes that data transfer overhead. We currently use Prophet for anomaly detection, which means reshaping cost data into the format the Python library expects. BQML removes that formatting step as well.
Additionally, BQML runs entirely inside BigQuery, so it needs none of the external setup, dependencies, or infrastructure that Prophet requires. This makes BQML a more efficient and scalable choice for in-database anomaly detection.
Key Advantages of BQML for Anomaly Detection
- SQL-Based Machine Learning — Train and deploy ML models using standard SQL queries
- No Data Movement — Analyze cost anomalies directly in BigQuery
- Scalable for Large Datasets — Optimized for handling millions of cost records efficiently
- Automated Forecasting & Detection — Supports scheduled model retraining for continuous monitoring

[Figure: BQML workflow]
How BQML Detects Cost Anomalies
1. Preparing Cloud Cost Data in BigQuery
At Harness Cloud Cost Management, we have built a comprehensive cost tracking infrastructure that captures daily cloud spending across AWS, GCP, and Azure at the resource level. Our system aggregates expenses by cloud provider, account, and specific services, creating a unified view of your entire cloud footprint. This granular approach not only powers our anomaly detection engine but also enables deeper spending analysis. By consolidating usage data at the resource level, we’ve streamlined cost monitoring, making it easier to spot unusual patterns and take proactive steps to optimize your cloud investments.
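As a rough illustration of the kind of daily, resource-level series described above, the aggregation could be sketched in BigQuery SQL as follows. The table `project.dataset.unified_billing` and all column names are hypothetical placeholders, not the actual Harness schema.

```sql
-- Sketch only: table and column names are assumptions for illustration.
-- Produces one cost value per resource per day, the shape a
-- time-series model needs as training input.
SELECT
  usage_date,
  cloud_provider,
  account_id,
  service_name,
  resource_id,
  SUM(cost) AS daily_cost
FROM `project.dataset.unified_billing`
GROUP BY
  usage_date,
  cloud_provider,
  account_id,
  service_name,
  resource_id;
```

Each distinct resource then forms its own time series, which is what allows anomalies to be attributed to a specific resource rather than to an account as a whole.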
2. Training the ARIMA_PLUS Model
We use BQML’s ARIMA_PLUS model, which is specifically designed for time-series forecasting and anomaly detection. After evaluating multiple BQML model options, we found ARIMA_PLUS to be the most effective for cloud cost anomaly detection.
Why ARIMA_PLUS Is the Right Model for Daily Cloud Cost Data
ARIMA_PLUS excels at handling the unique characteristics of cloud cost data:
- Seasonal Patterns — Cloud costs often follow regular patterns (daily, weekly, monthly billing cycles) that ARIMA_PLUS can automatically detect and incorporate into its predictions.
- Trend Components — Many cloud services show gradual increases or decreases in cost over time as usage patterns evolve. ARIMA_PLUS captures these trends effectively.
- Irregular Spikes — The model can distinguish between expected variations (like monthly billing) and truly anomalous cost events.
- Automated Parameter Selection — ARIMA_PLUS automatically determines the optimal parameters for your specific cost patterns, reducing the need for manual tuning.
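A training statement for this kind of model might look like the sketch below. The model name, source table, and column names are hypothetical. The options themselves (`model_type`, `time_series_timestamp_col`, `time_series_data_col`, `time_series_id_col`, `data_frequency`, `auto_arima`, `auto_arima_max_order`) are standard BQML options for ARIMA_PLUS, but the exact values used in production are not stated in this article, apart from max_order = 2.

```sql
-- Sketch only: model, table, and column names are assumptions.
CREATE OR REPLACE MODEL `project.dataset.cost_anomaly_model`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'usage_date',  -- one point per day
  time_series_data_col = 'daily_cost',       -- value to model
  time_series_id_col = 'resource_id',        -- one series per resource
  data_frequency = 'DAILY',
  auto_arima = TRUE,                         -- automated parameter selection
  auto_arima_max_order = 2                   -- caps the ARIMA search space
) AS
SELECT
  usage_date,
  resource_id,
  daily_cost
FROM `project.dataset.daily_resource_cost`;
```

Using `time_series_id_col` lets a single statement fit many independent series at once, which suits resource-level cost data.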
Comparison with Other BQML Time Series Models

For our cloud cost anomaly detection needs, ARIMA_PLUS trained on 16 months of data provides the best balance of accuracy, automation, and interpretability.
Understanding ARIMA_PLUS and max_order in BQML
ARIMA Model Components in ARIMA_PLUS
BQML’s ARIMA_PLUS model applies a mix of autoregression (AR), differencing (I), and moving averages (MA) to detect anomalies in time-series data. The max_order parameter plays a crucial role in controlling the model's complexity.
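When you set max_order = 2 (the auto_arima_max_order training option in BQML), you cap the sum of the non-seasonal p and q orders at 2. This keeps the search space small and the model efficient while still capturing key cost patterns.
Breaking Down ARIMA Parameters
- p (autoregressive order): how many past values the model uses to predict the current value
- d (differencing order): how many times the series is differenced to remove trend and make it stationary
- q (moving-average order): how many past forecast errors the model uses to correct the current prediction
How max_order = 2 Affects ARIMA_PLUS in BQML
Setting max_order = 2 allows only certain combinations of ARIMA models:
- p and q must satisfy p + q ≤ 2, so each can only be 0, 1, or 2
- d is chosen automatically during training and is not limited by max_order
- Reduces overfitting risks by limiting model complexity
- Helps the model generalize to new cost data trends
Example Configurations with max_order = 2
- ARIMA(1,1,1): one autoregressive term and one moving-average term on a once-differenced series
- ARIMA(2,0,0): two autoregressive terms with no differencing
Seasonal patterns such as weekly usage cycles or monthly billing are handled separately by ARIMA_PLUS's seasonal decomposition, not through the p, d, and q orders.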
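To see which orders the automated search actually selected for each series, the trained model can be inspected with ML.ARIMA_EVALUATE. This is a minimal sketch: the model name is a placeholder, and `resource_id` appears only if the model was trained with that column as its time series ID, which is an assumption here.

```sql
-- Sketch only: model name and resource_id column are assumptions.
SELECT
  resource_id,
  non_seasonal_p,
  non_seasonal_d,
  non_seasonal_q,
  has_drift,
  seasonal_periods,  -- seasonal cycles detected for this series
  AIC
FROM ML.ARIMA_EVALUATE(
  MODEL `project.dataset.cost_anomaly_model`,
  STRUCT(FALSE AS show_all_candidate_models)  -- only the selected model
);
```

Setting `show_all_candidate_models` to TRUE instead lists every candidate the search evaluated, which can help when judging whether a tighter or looser order limit is worthwhile.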
Eliminating False Positives with Seasonality Handling
One of the biggest challenges in anomaly detection is false positives — cases where the model flags expected cost fluctuations as anomalies.
Example: Monthly Billing Spikes
Consider a cloud service that is billed at the start of every month.
- Without seasonality detection, the model might incorrectly flag this expected spike as an anomaly every month.
- With seasonality detection enabled in ARIMA_PLUS, the model learns the recurring billing pattern and distinguishes normal fluctuations from true anomalies.
This ensures that regular monthly charges are recognized as expected behavior, reducing false positives and improving anomaly detection accuracy.
How Long Does It Take to Learn Normal Spikes?
BQML’s ARIMA_PLUS model typically requires 2–3 full seasonal cycles (e.g., 2–3 months for monthly patterns) to accurately distinguish normal vs. abnormal cost fluctuations. For instance, with a service billed at the start of every month, the model needs to observe this pattern for about 2–3 months before it can reliably identify it as expected behavior rather than an anomaly.
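One way to check whether a recurring spike has been learned is to look at the decomposed components of the series with ML.EXPLAIN_FORECAST. The sketch below uses a placeholder model name and reuses the horizon and confidence level quoted elsewhere in this article. It relies on the model having been trained with time series decomposition enabled, which is the BQML default.

```sql
-- Sketch only: model name is an assumption.
-- Seasonal component columns are NULL when that seasonality
-- was not detected for the series.
SELECT
  time_series_timestamp,
  time_series_type,        -- 'history' or 'forecast'
  time_series_data,
  trend,
  seasonal_period_weekly,
  seasonal_period_monthly,
  spikes_and_dips
FROM ML.EXPLAIN_FORECAST(
  MODEL `project.dataset.cost_anomaly_model`,
  STRUCT(15 AS horizon, 0.98 AS confidence_level)
)
ORDER BY time_series_timestamp;
```

If the start-of-month charge shows up in a seasonal component rather than in `spikes_and_dips`, that suggests the model is treating it as expected behavior.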
3. Detecting Anomalies in Cost Data
Once the model is trained, we use ML.DETECT_ANOMALIES to identify suspicious cost spikes.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `ccm-play.BillingReport.cost_anomaly_model`
)
WHERE is_anomaly = TRUE;

Detected anomalies are processed through customer-configured Anomaly Preferences to minimize duplication and ensure accurate tracking.
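The sensitivity of the detection query can be adjusted with the optional `anomaly_prob_threshold` setting of ML.DETECT_ANOMALIES. The sketch below uses a placeholder model name and a threshold of 0.95, which is the BQML default. The threshold used in production is not stated in this article.

```sql
-- Sketch only: model name and threshold value are assumptions.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `project.dataset.cost_anomaly_model`,
  STRUCT(0.95 AS anomaly_prob_threshold)
)
WHERE is_anomaly = TRUE
ORDER BY anomaly_probability DESC;  -- most likely anomalies first
```

A higher threshold flags fewer, more extreme points. The output also includes `lower_bound` and `upper_bound`, which show the expected range the actual cost was compared against.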
Anomaly Preferences
A newly detected anomaly will only surface if no other anomaly has occurred in the last N days. If an anomaly exists within the last N days, the system applies customer-defined thresholds to determine whether the new anomaly is distinct:
- % Change: Must exceed X%.
- Absolute $ Increase: Must be at least $Y.
If these thresholds are met, the anomaly is logged as a new anomaly. If the thresholds are not met, the system updates the duration of the existing anomaly instead of creating a duplicate.
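The preference logic above could be sketched in SQL roughly as follows. Everything here is illustrative: the two tables, their columns, and the default values for N, X, and Y are hypothetical. The sketch also assumes that both thresholds must be met and that the change is measured against the cost of the most recent existing anomaly, neither of which is spelled out above.

```sql
-- Sketch only: tables, columns, and threshold values are assumptions.
DECLARE lookback_days INT64 DEFAULT 7;          -- N
DECLARE min_pct_change FLOAT64 DEFAULT 20.0;    -- X (percent)
DECLARE min_abs_increase FLOAT64 DEFAULT 50.0;  -- Y (dollars)

SELECT
  n.resource_id,
  n.anomaly_date,
  n.actual_cost,
  CASE
    -- No earlier anomaly in the window: always surface it.
    WHEN e.resource_id IS NULL THEN 'NEW_ANOMALY'
    -- Earlier anomaly exists: both thresholds must be met to be distinct.
    WHEN SAFE_DIVIDE(n.actual_cost - e.actual_cost, e.actual_cost) * 100
           > min_pct_change
      AND n.actual_cost - e.actual_cost >= min_abs_increase
      THEN 'NEW_ANOMALY'
    -- Otherwise extend the duration of the existing anomaly.
    ELSE 'EXTEND_EXISTING'
  END AS action
FROM `project.dataset.new_anomalies` AS n
LEFT JOIN `project.dataset.existing_anomalies` AS e
  ON e.resource_id = n.resource_id
  AND e.anomaly_date BETWEEN DATE_SUB(n.anomaly_date, INTERVAL lookback_days DAY)
                         AND DATE_SUB(n.anomaly_date, INTERVAL 1 DAY)
WHERE TRUE
-- Compare each new anomaly against only the most recent earlier one.
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY n.resource_id, n.anomaly_date
  ORDER BY e.anomaly_date DESC
) = 1;
```

Rows marked `EXTEND_EXISTING` would update the existing anomaly's duration rather than create a duplicate record.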
4. Forecasting the Cost
We use ML.FORECAST on the model to predict costs for the upcoming days:
SELECT forecast_timestamp, forecast_value, prediction_interval_lower_bound, prediction_interval_upper_bound
FROM ML.FORECAST(
  MODEL `project.dataset.anomaly_model`,
  STRUCT(15 AS horizon, 0.98 AS confidence_level)
);

- horizon = 15 → Forecasts the next 15 time points, which is 15 days for daily cost data.
- confidence_level = 0.98 → Sets a 98% prediction interval: each future value is expected to fall between prediction_interval_lower_bound and prediction_interval_upper_bound 98% of the time.
5. Retraining
BQML does not support incremental training. Instead, we retrain the model every Sunday using the last 487 days of data, which ensures it stays updated with the latest daily cost trends.
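Because the model is rebuilt from scratch each time, a retraining job can be a single CREATE OR REPLACE MODEL statement over a rolling window. The sketch below uses hypothetical model, table, and column names. Running it weekly through a BigQuery scheduled query is one possible setup and is an assumption, not a description of the production pipeline.

```sql
-- Sketch only: names are assumptions. Intended to run once a week.
CREATE OR REPLACE MODEL `project.dataset.cost_anomaly_model`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'usage_date',
  time_series_data_col = 'daily_cost',
  time_series_id_col = 'resource_id',
  auto_arima_max_order = 2
) AS
SELECT
  usage_date,
  resource_id,
  daily_cost
FROM `project.dataset.daily_resource_cost`
-- Rolling window: the last 487 days, roughly 16 months.
WHERE usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 487 DAY);
```

Replacing the model in place keeps the model name stable, so downstream detection and forecast queries do not need to change after each retrain.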
Performance & Real-World Results
Training Time vs Data Size
We tested the worst-case scenario with a 35GB dataset of cost data to evaluate model training performance:
- 30-day model → ~2 minutes to train
- 487-day model → ~17 minutes, but significantly improves anomaly detection accuracy
On average, training took only a few minutes for normal-sized cost data.
Detected Anomalies
In production, the model accurately flagged cost spikes across many resources.

Cost of Creating and Training the Model
BQML on-demand pricing is $312.50 per TiB. In the worst-case scenario, we train the model using 16 months of data, where the dataset size is 35.39 GB.
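Cost Calculation: 35.39 GiB / 1,024 GiB per TiB * $312.50 per TiB = ~$10.80. This treats the 35.39 GB figure as binary gigabytes (GiB). Read as decimal gigabytes instead, 35,390,000,000 bytes * 312.5 / 1,099,511,627,776 comes to about $10.06.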
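The arithmetic can be reproduced directly in BigQuery. This sketch computes the cost under both readings of the dataset size, since the result depends on whether 35.39 GB means binary gigabytes (GiB) or decimal gigabytes. The column aliases are illustrative.

```sql
-- $312.50 per TiB, where 1 TiB = 1024 GiB = 1024^4 bytes.
SELECT
  -- 35.39 treated as GiB
  ROUND(35.39 / 1024 * 312.5, 2) AS cost_if_gib_usd,
  -- 35.39 treated as decimal GB (35.39 * 10^9 bytes)
  ROUND(35.39e9 / POW(1024, 4) * 312.5, 2) AS cost_if_decimal_gb_usd;
```

The binary reading gives roughly $10.80 and the decimal reading roughly $10.06, so the worst-case training run costs on the order of ten dollars either way.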
BQML training costs are automatically labeled within Google Cloud Billing, allowing for cost tracking and analysis. These labels help identify and attribute expenses associated with BQML model training within your Google Cloud project.
BQML Cost Tagging Details: The cost associated with BQML is tagged with the following predefined labels:
- Key: bigquery.googleapis.com/bqml
- Value: bqml_arima_plus_training
Conclusion: Smarter Cost Monitoring with BQML
With BQML’s ARIMA_PLUS, we can efficiently detect cost anomalies while minimizing false positives.
- Seasonality handling improves accuracy for cyclical cost patterns
- max_order = 2 balances model complexity and performance
- Fully automated anomaly detection with BigQuery SQL
Next Steps
We are continuously improving our anomaly detection pipeline by:
- Comparing BQML vs Prophet models under a feature flag
- Enhancing detection for Kubernetes cost anomalies using BQML
- Detecting anomalies in real time as cost data is ingested

