Arpeely is a data science and AI prediction beast. As a top Google AdX EMEA advertiser and Microsoft partner, we handle over 1 million real-time bidding (RTB) requests every second
with sub-100ms response times. If you’re not from this industry, that means millions of server-to-server (S2S) requests per second, each of which needs to run heavy AI/ML inference, fast.
Managing cloud costs at this scale is challenging, and optimizing our GCP spend was a major priority. To maintain efficiency without compromising performance, we optimized resource commitments, storage, machine types and HPA configuration — cutting our GCP bill by $102K monthly. Here’s how we did it:
1. Committed Use Discounts
While most of our services are suitable for running on spot instances, some still ran on on-demand instances because of their high availability requirements. To achieve savings on these machines as well, we made commitments for these services, ensuring stable and predictable costs. This covered core compute resources like CPU, memory, and SSD storage. Additionally, we committed to Cloud SQL and Memorystore (GCP’s managed Redis) to ensure cost efficiency for these essential services. By leveraging committed use discounts (CUDs) for these critical components, we achieved a nearly 60% discount while maintaining reliability and performance.
We followed two key rules of thumb:
1. Optimize first, commit later — Before making commitments, we focused on reducing our actual resource needs. This included optimizing Kubernetes pod requests & limits and adjusting node sizes to avoid committing to excess capacity.
2. Commit no more than 70% — To maintain flexibility and account for future fluctuations, we ensured that our commitments never exceeded 70% of our expected usage.
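The two rules above can be sketched as simple arithmetic. This is an illustrative model, not our actual planning tool; the vCPU counts, prices, and the ~57% effective discount are made-up numbers for the example:

```python
COMMIT_CAP = 0.70    # rule 2: never commit more than 70% of expected usage
CUD_DISCOUNT = 0.57  # assumed effective CUD discount (close to the ~60% we saw)

def plan_commitment(expected_vcpus: float, on_demand_price_per_vcpu: float) -> dict:
    """Return how many vCPUs to commit and the estimated monthly savings."""
    committed = expected_vcpus * COMMIT_CAP        # discounted portion
    on_demand = expected_vcpus - committed         # stays flexible at list price
    baseline_cost = expected_vcpus * on_demand_price_per_vcpu
    optimized_cost = (committed * on_demand_price_per_vcpu * (1 - CUD_DISCOUNT)
                      + on_demand * on_demand_price_per_vcpu)
    return {
        "committed_vcpus": committed,
        "monthly_savings": baseline_cost - optimized_cost,
    }

# Hypothetical fleet: 1,000 vCPUs at $25/vCPU/month on demand
plan = plan_commitment(expected_vcpus=1000, on_demand_price_per_vcpu=25.0)
```

The point of the 70% cap is visible here: the uncommitted 30% costs full price, but it also means a future usage drop doesn't leave you paying for idle commitments.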



2. Optimizing Storage Efficiency — BigQuery Archiving
To maximize flexibility for data scientists and production deployment speed, we store our data in BQ tables. This works great, until you receive the invoice for table storage. As a data-driven company, we quickly delete PII data, but ad performance data accumulates over time, clogging BQ storage. Instead of letting it pile up, we built an internal service that automatically archives BigQuery tables older than a configurable threshold, moving them to GCS Archive storage. The cost difference is significant: storing 1GB in BigQuery costs ~$0.02 per month, while in GCS Archive it’s only ~$0.0012, making it over 16x cheaper. This saved us around $15,000 per month while preserving data accessibility.
The service is fully configurable via our private developer portal, allowing data science teams and compliance managers to adjust retention policies and automate archiving without manual intervention. In case of a disaster, we made it easy to extract needed data back from GCS, ensuring quick recovery. More details on exporting BigQuery data can be found in GCP’s documentation.
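The core of such a service is the selection step: decide which tables have aged past the retention threshold. Here's a minimal sketch of that logic; the function names, table names, and 365-day threshold are illustrative assumptions, not our internal implementation. In production this would list tables via the google-cloud-bigquery client and export each candidate to a GCS Archive-class bucket with an extract job:

```python
from datetime import datetime, timedelta, timezone

def tables_to_archive(tables: dict, retention_days: int, now=None) -> list:
    """Return table IDs whose last modification is older than the threshold.

    `tables` maps table_id -> last_modified (timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return sorted(t for t, modified in tables.items() if modified < cutoff)

# Illustrative data: one stale table, one recent one
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
tables = {
    "ad_perf_2023": datetime(2023, 6, 1, tzinfo=timezone.utc),
    "ad_perf_2024q4": datetime(2024, 12, 20, tzinfo=timezone.utc),
}
stale = tables_to_archive(tables, retention_days=365, now=now)  # ["ad_perf_2023"]
```

Keeping the threshold a parameter is what lets retention policies live in the developer portal rather than in code.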


3. HPA — The Cost-Effective Way to Scale
In dynamic, large-scale systems with fluctuating workloads, such as those found in HFT (high-frequency trading), applications must be capable of scaling dynamically by 10x or even 20x within seconds. However, scaling in response to these sudden, unpredictable spikes can prevent the application from properly warming up, leading to instability and inefficiency. For a workload that slowly increases and decreases throughout the day, a well-tuned HPA (Horizontal Pod Autoscaler) can save a lot.
Through multiple iterations, we fine-tuned our HPA to optimize scaling while maintaining stability. Key changes included:
Scale Down Optimization
Percent-based: 10% → 30%, allowing faster but controlled downsizing.
Pod-based: 5 → 10, enabling removal of more pods at once.
Period: 60s → 120s, smoothing transitions.
Scale Up Optimization
Stabilization Window: 600s, preventing overreaction to spikes.
Percent-based: 100% → 30%, avoiding sudden over-provisioning.
Pod-based: 4 → 20, handling load surges better.
Period: 15s → 120s, reducing thrashing impact.
These optimizations reduced our average pod count by 30%, cutting costs and avoiding sudden scale spikes while maintaining performance and stability.
scaleDown:
  stabilizationWindowSeconds: 300
  policies:
  - type: Percent
    value: 30          # was 10: scale down at most 30% of current replicas at once
    periodSeconds: 120 # was 60
  - type: Pods
    value: 10          # was 5: scale down at most 10 pods at once
    periodSeconds: 120 # was 60
  selectPolicy: Min
scaleUp:
  stabilizationWindowSeconds: 600
  policies:
  - type: Percent
    value: 30          # was 100: scale up at most 30% of current replicas at once
    periodSeconds: 120 # was 15
  - type: Pods
    value: 20          # was 4: scale up at most 20 pods at once
    periodSeconds: 120 # was 15
  selectPolicy: Max
While it may seem like a simple and slightly arbitrary change, we implemented around 15 adjustments, testing each over a 24-hour period until we found the optimal configuration.
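To see why the Percent and Pods policies are paired, here's a simplified sketch of how the HPA combines them under our settings (this mirrors the Kubernetes behavior.scaleUp/scaleDown semantics but ignores the period and stabilization-window bookkeeping):

```python
def max_scale_down(current: int) -> int:
    percent_limit = int(current * 0.30)    # Percent policy: 30% of replicas
    pods_limit = 10                        # Pods policy: 10 pods
    return min(percent_limit, pods_limit)  # selectPolicy: Min (conservative)

def max_scale_up(current: int) -> int:
    percent_limit = int(current * 0.30)    # Percent policy: 30% of replicas
    pods_limit = 20                        # Pods policy: 20 pods
    return max(percent_limit, pods_limit)  # selectPolicy: Max (permissive)
```

With 100 replicas, scale-down is capped at 10 pods per period (the Pods policy wins under Min), while scale-up allows 30 (the Percent policy wins under Max); at small replica counts the Pods policy guarantees scale-up never crawls.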



Final Thoughts
Cost savings at scale require the right mix of automation, smart planning, and continuous optimizations. Any company looking to cut cloud costs without sacrificing performance can benefit from these strategies.
This was just the beginning. Our cost optimization journey ultimately led to an additional $40K monthly reduction on top of the $102K, uncovering even more ways to streamline efficiency, like smarter data retention, leveraging BQ and GCS storage classes, and machine types tailored to workloads.
Stay tuned for our next blog, where we’ll dive deeper into the strategies that made it happen, none of which would have been possible without Arpeely’s incredible DevOps team.

P.S. We’re Hiring 😉