Monitoring and Observability with AWS ElastiCache
- Patrick Freeman
- Nov 7, 2023
- 1 min read
Updated: Nov 8, 2023
Amazon ElastiCache is a managed in-memory data store and cache service provided by AWS. We want to build alerting around our Amazon EKS cluster, which uses ElastiCache with Redis. Day1Ops takes a proactive approach here: when an outage occurs, we want to be notified right away so we can troubleshoot what went wrong with Redis and resolve it as soon as possible.
Solution
The approach we architected for this particular client will send an alert when urgent issues have been detected, or when resource limits are about to be reached.
Alerts & Notifications
It’s necessary to call attention to our metrics even when the system is performing adequately. We are looking to implement the following types of alerts to help mitigate our risk of downtime. The best way to triage these issues is to assign a priority to each kind of alert we receive from Grafana:
soft-alerts (low to moderate severity): Posting a message in a messaging application such as Slack or Google Chat gives the alert high visibility without disrupting anyone in the middle of the workday. It lets us know, for example, when we are about to run out of disk space or when memory or CPU usage is climbing. This gives us time to diagnose the situation before something more severe occurs.
high-alerts (high severity): These are also posted as notifications in the same chat room, but they require immediate attention. Recipients prioritize the alert, share their findings, resolve the issue, and validate that things are functioning as designed.
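To make the two tiers concrete, here is a minimal Python sketch of how a metric reading could be mapped to a soft-alert or a high-alert. It is an illustration only, not the Grafana configuration used for this client: the metric names, the threshold values, and the helper functions `classify` and `format_message` are all hypothetical and would need to be tuned to the actual workload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    soft: float  # percent at which a soft-alert is raised
    high: float  # percent at which a high-alert is raised


# Hypothetical metric names and limits, chosen only for illustration.
THRESHOLDS = {
    "memory_usage_percent": Threshold(soft=75.0, high=90.0),
    "cpu_usage_percent": Threshold(soft=70.0, high=90.0),
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return 'high-alert', 'soft-alert', or None for one metric reading."""
    threshold = THRESHOLDS.get(metric)
    if threshold is None:
        return None  # metrics without a configured threshold are ignored here
    if value >= threshold.high:
        return "high-alert"
    if value >= threshold.soft:
        return "soft-alert"
    return None


def format_message(metric: str, value: float) -> Optional[str]:
    """Build the chat message for a reading, or None if no alert is needed."""
    severity = classify(metric, value)
    if severity is None:
        return None
    if severity == "high-alert":
        action = "immediate attention required"
    else:
        action = "investigate before it gets worse"
    return f"[{severity}] {metric}={value:.1f}%: {action}"
```

Both tiers end up in the same chat room, so the severity tag at the start of the message is what tells recipients whether they can finish their current task or need to respond right away.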