December 31, 2024

Finding the Needle in the Cost Haystack: Anomaly Detection with BQML

Cloud cost anomalies can be elusive. Unexpected spikes in usage, misconfigurations, and billing changes often blend into the noise of normal spending patterns. Detecting these anomalies before they turn into expensive surprises is a challenge for Cloud Cost Management (CCM).

At Harness, we leverage Google BigQuery ML (BQML) to automatically detect cloud cost anomalies using time-series forecasting models. Unlike traditional anomaly detection methods that require external ML pipelines, BQML enables in-database machine learning, allowing us to run anomaly detection directly on cloud cost data stored in BigQuery.

Why Use BQML for Cost Anomaly Detection?

Traditional anomaly detection approaches often require moving cost data out of the warehouse into external pipelines built around models such as Prophet, ARIMA, or SARIMA. With BQML, we simplify this process by running machine learning models natively in Google BigQuery, eliminating unnecessary data transfer. Our current anomaly detection uses Prophet; BQML removes both that data movement and the data formatting overhead that Python-based ML frameworks like Prophet require.

Additionally, BQML integrates seamlessly within BigQuery, eliminating the need for external setup, dependencies, or infrastructure — a requirement when using Prophet. This makes BQML a more efficient and scalable choice for in-database anomaly detection.

Key Advantages of BQML for Anomaly Detection

  • SQL-Based Machine Learning — Train and deploy ML models using standard SQL queries
  • No Data Movement — Analyze cost anomalies directly in BigQuery
  • Scalable for Large Datasets — Optimized for handling millions of cost records efficiently
  • Automated Forecasting & Detection — Supports scheduled model retraining for continuous monitoring

BQML workflow

How BQML Detects Cost Anomalies

1. Preparing Cloud Cost Data in BigQuery

At Harness Cloud Cost Management, we have built a comprehensive cost tracking infrastructure that captures daily cloud spending across AWS, GCP, and Azure at the resource level. Our system aggregates expenses by cloud provider, account, and specific services, creating a unified view of your entire cloud footprint. This granular approach not only powers our anomaly detection engine but also enables deeper spending analysis. By consolidating usage data at the resource level, we’ve streamlined cost monitoring, making it easier to spot unusual patterns and take proactive steps to optimize your cloud investments.
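
To make this concrete, the per-day, per-resource series the model consumes can be produced by an aggregation along these lines. The table and column names below are illustrative placeholders, not our actual schema:

SELECT
  usage_date,                               -- one row per day per resource
  cloud_provider,
  account_id,
  service,
  resource_id,
  SUM(cost) AS daily_cost                   -- daily spend aggregated at the resource level
FROM `project.dataset.raw_billing_data`     -- hypothetical unified billing table
GROUP BY usage_date, cloud_provider, account_id, service, resource_id;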

2. Training the ARIMA_PLUS Model

We use BQML’s ARIMA_PLUS model, which is specifically designed for time-series forecasting and anomaly detection. After evaluating multiple BQML model options, we found ARIMA_PLUS to be the most effective for cloud cost anomaly detection.
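
A minimal training statement looks roughly like the following. Model, table, and column names are illustrative; the options are standard BQML ARIMA_PLUS options:

CREATE OR REPLACE MODEL `project.dataset.cost_anomaly_model`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'usage_date',  -- the daily timestamp column
  time_series_data_col = 'daily_cost',       -- the value being modeled
  time_series_id_col = 'resource_id',        -- one time series per resource
  data_frequency = 'DAILY',
  auto_arima = TRUE,                         -- let BQML search the ARIMA order space
  auto_arima_max_order = 2                   -- the max_order setting discussed below
) AS
SELECT usage_date, resource_id, daily_cost
FROM `project.dataset.daily_costs`;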

Why Is ARIMA_PLUS the Right Model for Daily Cloud Cost Data?

ARIMA_PLUS excels at handling the unique characteristics of cloud cost data:

  • Seasonal Patterns — Cloud costs often follow regular patterns (daily, weekly, monthly billing cycles) that ARIMA_PLUS can automatically detect and incorporate into its predictions.
  • Trend Components — Many cloud services show gradual increases or decreases in cost over time as usage patterns evolve. ARIMA_PLUS captures these trends effectively.
  • Irregular Spikes — The model can distinguish between expected variations (like monthly billing) and truly anomalous cost events.
  • Automated Parameter Selection — ARIMA_PLUS automatically determines the optimal parameters for your specific cost patterns, reducing the need for manual tuning.

Comparison with Other BQML Time Series Models

For our cloud cost anomaly detection needs, ARIMA_PLUS, trained on 16 months of data, provides the best balance of accuracy, automation, and interpretability.

Understanding ARIMA_PLUS and max_order in BQML

ARIMA Model Components in ARIMA_PLUS

BQML’s ARIMA_PLUS model combines autoregression (AR), differencing (I), and moving averages (MA) to model time-series cost data; points that fall outside the model's prediction interval are flagged as anomalies. The max_order parameter plays a crucial role in controlling the model's complexity.

When you set max_order = 2, you're defining an upper limit on ARIMA's p, d, and q parameters to keep the model efficient while still capturing key cost patterns.

Breaking Down ARIMA Parameters

  • p (autoregressive order) — how many past cost values the model looks back on to predict the next point
  • d (differencing order) — how many times the series is differenced to remove trend and make it stationary
  • q (moving-average order) — how many past forecast errors the model uses to correct its predictions

How max_order = 2 Affects ARIMA_PLUS in BQML

Setting max_order = 2 allows only certain combinations of ARIMA models:

  • p, d, q values are restricted to {0, 1, or 2}
  • Reduces overfitting risks by limiting model complexity
  • Ensures the model generalizes well to new cost data trends

Example Configurations with max_order = 2

  • ARIMA(1,1,2) with Seasonality = 12 (Good for monthly billing cycles)
  • ARIMA(2,0,1) with Seasonality = 4 (Works for quarterly financial reporting)
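
After training, ML.ARIMA_EVALUATE can be used to inspect which orders the automatic search actually selected for each series and which seasonal periods were detected (model name as in the illustrative training statement above):

SELECT
  resource_id,          -- the time_series_id_col used at training time
  non_seasonal_p,
  non_seasonal_d,
  non_seasonal_q,
  seasonal_periods,     -- e.g. WEEKLY or YEARLY patterns the model found
  AIC
FROM ML.ARIMA_EVALUATE(MODEL `project.dataset.cost_anomaly_model`);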

Eliminating False Positives with Seasonality Handling

One of the biggest challenges in anomaly detection is false positives — cases where the model flags expected cost fluctuations as anomalies.

Example: Monthly Billing Spikes

Consider a cloud service that is billed at the start of every month.

  • Without seasonality detection, the model might incorrectly flag this expected spike as an anomaly every month.
  • With seasonality detection enabled in ARIMA_PLUS, the model learns the recurring billing pattern and distinguishes normal fluctuations from true anomalies.

This ensures that regular monthly charges are recognized as expected behavior, reducing false positives and improving anomaly detection accuracy.
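
The learned components can be inspected directly with ML.EXPLAIN_FORECAST, which is a useful sanity check that a recurring billing spike has been absorbed into a seasonal term rather than left in the residual. The column selection below assumes the illustrative model from earlier:

SELECT
  time_series_timestamp,
  time_series_type,          -- 'history' or 'forecast'
  trend,
  seasonal_period_weekly,    -- recurring weekly component learned by the model
  seasonal_period_monthly,   -- recurring monthly component (e.g. billing cycles)
  spikes_and_dips,
  step_changes
FROM ML.EXPLAIN_FORECAST(
  MODEL `project.dataset.cost_anomaly_model`,
  STRUCT(15 AS horizon, 0.98 AS confidence_level)
);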

How Long Does It Take to Learn Normal Spikes?

BQML’s ARIMA_PLUS model typically requires 2–3 full seasonal cycles (e.g., 2–3 months for monthly patterns) to accurately distinguish normal vs. abnormal cost fluctuations. For instance, with a service billed at the start of every month, the model needs to observe this pattern for about 2–3 months before it can reliably identify it as expected behavior rather than an anomaly.

3. Detecting Anomalies in Cost Data

Once the model is trained, we use ML.DETECT_ANOMALIES to identify suspicious cost spikes.

SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `ccm-play.BillingReport.cost_anomaly_model`
)
WHERE is_anomaly = TRUE;
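
Called this way, with no input data, the function scores the historical training data. It also accepts an explicit probability threshold and a query of newly ingested rows to score; the threshold value, table, and columns below are illustrative:

SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `ccm-play.BillingReport.cost_anomaly_model`,
  STRUCT(0.98 AS anomaly_prob_threshold),        -- probability cut-off for flagging a point
  (SELECT usage_date, resource_id, daily_cost
   FROM `project.dataset.daily_costs`
   WHERE usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
)
WHERE is_anomaly = TRUE;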

Detected anomalies are processed through customer-configured Anomaly Preferences to minimize duplication and ensure accurate tracking.

Anomaly Preferences

A newly detected anomaly surfaces immediately if no other anomaly has occurred in the last N days. If an anomaly does exist within that window, the system applies customer-defined thresholds to determine whether the new anomaly is distinct:

  • % Change: Must exceed X%.
  • Absolute $ Increase: Must be at least $Y.

If these thresholds are met, the anomaly is logged as a new anomaly. If the thresholds are not met, the system updates the duration of the existing anomaly instead of creating a duplicate.
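
A simplified sketch of that decision logic in SQL, with placeholder tables and example values for N, X, and Y (the real values are customer-configured):

SELECT
  d.resource_id,
  d.anomaly_date,
  CASE
    WHEN e.anomaly_date IS NULL THEN 'NEW'              -- nothing seen in the last N days
    WHEN SAFE_DIVIDE(d.cost - e.cost, e.cost) > 0.25    -- % change threshold (X = 25%)
         AND d.cost - e.cost >= 100                     -- absolute $ increase threshold (Y = $100)
      THEN 'NEW'
    ELSE 'EXTEND_EXISTING'                              -- update the duration of the existing anomaly
  END AS action
FROM detected_anomalies AS d
LEFT JOIN logged_anomalies AS e
  ON e.resource_id = d.resource_id
 AND e.anomaly_date >= DATE_SUB(d.anomaly_date, INTERVAL 7 DAY);  -- look-back window (N = 7 days)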

4. Forecasting the Cost

We use ML.FORECAST on the model to predict costs for a given day:

SELECT
  forecast_timestamp,
  forecast_value,
  prediction_interval_lower_bound,
  prediction_interval_upper_bound
FROM ML.FORECAST(
  MODEL `project.dataset.anomaly_model`,
  STRUCT(15 AS horizon, 0.98 AS confidence_level)
);

  • horizon = 15 → Predicts values 15 days into the future.
  • confidence_level = 0.98 → Sets a 98% confidence interval, meaning predictions are expected to fall within this range 98% of the time.
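
Once actual costs for those dates arrive, a point is suspicious when it lands outside that prediction interval. A sketch of the comparison, reusing the illustrative cost table and assuming the model was trained with a resource_id series column:

SELECT
  a.usage_date,
  a.resource_id,
  a.daily_cost,
  f.prediction_interval_lower_bound,
  f.prediction_interval_upper_bound,
  (a.daily_cost < f.prediction_interval_lower_bound
   OR a.daily_cost > f.prediction_interval_upper_bound) AS outside_interval
FROM `project.dataset.daily_costs` AS a
JOIN ML.FORECAST(
       MODEL `project.dataset.anomaly_model`,
       STRUCT(15 AS horizon, 0.98 AS confidence_level)) AS f
  ON f.resource_id = a.resource_id
 AND f.forecast_timestamp = TIMESTAMP(a.usage_date);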

5. Retraining

BQML does not support incremental training. Instead, we retrain the model every Sunday using the last 487 days of data, which ensures it stays updated with the latest daily cost trends.
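
Conceptually, the weekly retrain is just a full CREATE OR REPLACE MODEL run as a scheduled query over a sliding 487-day window (names and options mirror the illustrative training statement above):

CREATE OR REPLACE MODEL `project.dataset.cost_anomaly_model`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'usage_date',
  time_series_data_col = 'daily_cost',
  time_series_id_col = 'resource_id',
  data_frequency = 'DAILY',
  auto_arima_max_order = 2
) AS
SELECT usage_date, resource_id, daily_cost
FROM `project.dataset.daily_costs`
WHERE usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 487 DAY);   -- last 487 days only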

Performance & Real-World Results

Training Time vs Data Size

We tested the worst-case scenario with a 35GB dataset of cost data to evaluate model training performance:

  • 30-day model → ~2 minutes to train
  • 487-day model → ~17 minutes, but significantly improves anomaly detection accuracy

On average, training took only a few minutes for normal-sized cost data.

Detected Anomalies

In production, the model has accurately flagged cost spikes across many resources.

Cost of Creating and Training the Model

BQML on-demand pricing is $312.50 per TiB. In the worst-case scenario, we train the model using 16 months of data, where the dataset size is 35.39 GB.

Cost Calculation: 35,390,000,000 bytes ÷ 1,099,511,627,776 bytes per TiB × $312.50 per TiB ≈ $10.06

BQML training costs are automatically labeled within Google Cloud Billing, allowing for cost tracking and analysis. These labels help identify and attribute expenses associated with BQML model training within your Google Cloud project.

BQML Cost Tagging Details: The cost associated with BQML is tagged with the following predefined labels:

  • Key: bigquery.googleapis.com/bqml
  • Value: bqml_arima_plus_training
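
Assuming the standard Cloud Billing export to BigQuery, training spend can then be isolated by filtering on that label. The export table name below is a placeholder, and depending on the export the label may surface under system_labels rather than labels:

SELECT
  DATE(usage_start_time) AS usage_day,
  SUM(cost) AS bqml_training_cost
FROM `project.billing_dataset.gcp_billing_export_v1_XXXXXX` AS b,
     UNNEST(b.labels) AS label
WHERE label.key = 'bigquery.googleapis.com/bqml'
  AND label.value = 'bqml_arima_plus_training'
GROUP BY usage_day
ORDER BY usage_day;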

Conclusion: Smarter Cost Monitoring with BQML

With BQML’s ARIMA_PLUS, we can efficiently detect cost anomalies while minimizing false positives.

  • Seasonality handling improves accuracy for cyclical cost patterns
  • max_order = 2 balances model complexity and performance
  • Fully automated anomaly detection with BigQuery SQL

Next Steps

We are continuously improving our anomaly detection pipeline by:

  • Comparing BQML vs Prophet models under a feature flag
  • Enhancing detection for Kubernetes cost anomalies using BQML
  • Detecting anomalies in real time as cost data is ingested

Chandra Mulpuri

Chandra Mulpuri is a Software Engineering professional specializing in cloud-based communication and collaboration platforms. He has extensive experience in building virtual meeting, calling, and telephony applications with a strong focus on high availability and enterprise scalability. Chandra is skilled in Unified Communications, Call Control, Java, C++, and Cloud Applications, and is a VMware Certified Professional (VCP:73237).
