Datadog Cost Reduction Efforts
When we reviewed Zaxby’s Datadog usage, we found there was a significant opportunity to reduce costs without compromising visibility or operational reliability. Our goal was straightforward: optimize spend while maintaining confidence in monitoring and observability. After analyzing usage patterns and high-cost areas, we successfully reduced Datadog spend by approximately 50%.
Key outcomes:
- Reduced indexed log volumes while preserving critical logs for troubleshooting and compliance
- Optimized RUM retention policies to maintain visibility into essential user behavior
- Reassessed serverless monitoring to retain valuable insights while no longer tracking non-critical invocations
- Achieved an overall reduction of approximately 50% in Datadog spend
Indexed Logs
Log indexing was the largest contributor to overall spend. As we reviewed pipelines and indexes, we discovered that a significant portion of logs were not adding meaningful value. By refining which logs were indexed and improving tagging for searchability, we were able to preserve the logs that truly mattered while reducing unnecessary volume.
Key actions included:
- Auditing current log indexes and pipelines to see what was being excluded (if anything)
- Analyzing high-volume logs by service to understand major contributors
- Creating and refining log exclusion filters to ensure only essential logs were indexed
- Updating pipelines for improved tagging and searchability, attaching services and environments to specific logs
- Using Metrics Explorer to track trends and adjust filters and index configurations to balance observability with cost efficiency
This process clarified which datasets were genuinely valuable for operations and alerting.
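The effect of exclusion filters on indexed volume can be estimated before any change is rolled out. The sketch below is a minimal model of that reasoning, not the configuration used for Zaxby's: the `ExclusionFilter` class, the service names, the exclusion rates, and the daily volumes are all hypothetical, and it assumes that only the first matching filter applies to a given log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExclusionFilter:
    """Simplified, hypothetical stand-in for a log exclusion filter."""
    name: str
    service: str         # service tag the filter matches
    status: str          # log status the filter matches, e.g. "debug"
    exclude_rate: float  # fraction of matching logs dropped from the index (0.0 to 1.0)


def estimate_indexed(volumes, filters):
    """Estimate logs still indexed per service once the filters apply.

    volumes maps (service, status) -> daily log count.
    Assumption: filters are evaluated in order and the first match wins.
    """
    indexed = {}
    for (service, status), count in volumes.items():
        rate = 0.0
        for f in filters:
            if f.service == service and f.status == status:
                rate = f.exclude_rate
                break
        kept = round(count * (1.0 - rate))
        indexed[service] = indexed.get(service, 0) + kept
    return indexed


# Illustrative numbers only, not taken from the engagement.
daily_volumes = {
    ("checkout", "error"): 20_000,
    ("checkout", "debug"): 900_000,
    ("menu-api", "info"): 400_000,
}
filters = [
    ExclusionFilter("drop-checkout-debug", "checkout", "debug", 1.0),
    ExclusionFilter("sample-menu-info", "menu-api", "info", 0.9),
]

indexed = estimate_indexed(daily_volumes, filters)
total_before = sum(daily_volumes.values())
total_after = sum(indexed.values())
print(indexed, f"reduction: {1 - total_after / total_before:.0%}")
```

Error logs are left untouched in this model, which reflects the goal of preserving the logs needed for troubleshooting while trimming high-volume, low-value ones.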
RUM Retention
Real User Monitoring (RUM) provided valuable insights, but we realized that we were collecting and analyzing every session. After reviewing retention policies across applications, we adjusted settings based on usage frequency and business criticality.
Steps we took:
- Reviewed all RUM applications and their retention periods
- Adjusted retention policies based on usage frequency and the criticality of monitored experiences
- Validated dashboards and alerts to ensure no key insights were lost during optimization
The dashboards and alerts continued to function as expected, while the storage footprint was significantly reduced. The result was a leaner, more purposeful collection of RUM data that maintained full visibility into user behavior.
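One way to make this kind of adjustment repeatable is to write the rule down as a small function. The following is a sketch under stated assumptions: the `RumApp` fields, the application names, the thresholds, and the percentages are hypothetical and only illustrate tiering by usage frequency and business criticality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RumApp:
    name: str
    weekly_dashboard_views: int  # proxy for how often the data is actually used
    business_critical: bool


def retained_session_percent(app: RumApp) -> int:
    """Return the share of sessions to keep for an app (hypothetical tiers)."""
    if app.business_critical:
        return 100  # keep every session for critical experiences
    if app.weekly_dashboard_views >= 10:
        return 50   # regularly consulted, keep a large sample
    return 10       # rarely consulted, keep a small sample


# Hypothetical applications, for illustration only.
apps = [
    RumApp("ordering-web", weekly_dashboard_views=40, business_critical=True),
    RumApp("store-locator", weekly_dashboard_views=12, business_critical=False),
    RumApp("careers-site", weekly_dashboard_views=1, business_critical=False),
]
policy = {app.name: retained_session_percent(app) for app in apps}
print(policy)
```

Keeping the rule in one place makes it easier to revisit the tiers when an application's usage or importance changes.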
Serverless Invocation Audit
Finally, we reviewed serverless monitoring. A few AWS Lambda functions were generating millions of invocations weekly, and we were logging every call. By filtering non-critical invocations, we maintained the integrity of key metrics without any impact on performance.
This adjustment provided a clearer view of Lambda activity and demonstrated how targeted changes can achieve meaningful cost reductions without compromising observability.
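The filtering idea can be sketched as a simple predicate over invocation records. This is an illustrative example rather than the filter that was deployed: the record fields (`error`, `cold_start`, `duration_ms`), the function names, and the 3000 ms threshold are assumptions chosen for the example.

```python
def should_keep(invocation: dict, slow_ms: int = 3000) -> bool:
    """Keep invocations worth inspecting and drop routine successes."""
    if invocation.get("error"):
        return True  # failures are always kept
    if invocation.get("cold_start"):
        return True  # cold starts help explain latency spikes
    return invocation.get("duration_ms", 0) >= slow_ms  # keep slow calls


# Hypothetical invocation records, for illustration only.
invocations = [
    {"function": "order-sync", "duration_ms": 120, "error": False, "cold_start": False},
    {"function": "order-sync", "duration_ms": 4500, "error": False, "cold_start": False},
    {"function": "order-sync", "duration_ms": 90, "error": True, "cold_start": False},
    {"function": "menu-cache", "duration_ms": 800, "error": False, "cold_start": True},
]
kept = [inv for inv in invocations if should_keep(inv)]
print(f"kept {len(kept)} of {len(invocations)} invocations")
```

Aggregate metrics such as invocation counts and error rates can still be computed from all calls; only the detailed per-invocation records are reduced in this sketch.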
Conclusion
Through this cost optimization initiative, Arbory Digital reduced Datadog expenses for Zaxby’s by approximately 50% while maintaining visibility and operational reliability. By refining log indexing, adjusting RUM retention, and auditing serverless invocations, we established a sustainable system that balances cost and functionality.
While significant progress has been made, we continue to monitor and refine usage patterns to identify additional efficiencies and ensure that Zaxby’s maintains an optimal observability model.
For more insights, see our other cost-saving efforts in Zaxby’s Customer Spotlight – Arbory Digital.