Monitoring is an important aspect of any application and infrastructure. With proper monitoring you get visibility into all your services and can act promptly when needed. In the Linux, cloud, and microservices space, Grafana and Prometheus are among the most widely chosen tools for building a monitoring setup.
In this short guide we will look at some of the best Prometheus and Grafana monitoring books you can buy to help you master both tools. The books are not ranked in any particular order. Feel free to check other users’ comments and reviews if you find it hard to choose one to purchase.
1. Prometheus: Up & Running: Infrastructure and Application Performance Monitoring 1st Edition
This practical guide provides application developers, sysadmins, and DevOps practitioners with a hands-on introduction to the most important aspects of Prometheus, including dashboarding and alerting, direct code instrumentation, and metric collection from third-party systems with exporters.
With this book you’ll be able to:
- Know where and how much to apply instrumentation to your application code
- Identify metrics with labels using unique key-value pairs
- Get an introduction to Grafana, a popular tool for building dashboards
- Learn how to use the Node Exporter to monitor your infrastructure
- Use service discovery to provide different views of your machines and services
- Use Prometheus with Kubernetes and examine exporters you can use with containers
- Convert data from other monitoring systems into the Prometheus format
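To make the idea of labels as key-value pairs a little more concrete, here is a minimal sketch of how a single labelled sample looks in the Prometheus text exposition format. The helper names `escape_label_value` and `format_sample`, and the metric and label names, are hypothetical and only for illustration. In a real application you would normally rely on an official client library rather than formatting samples by hand.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample as: name{key="value",...} value (illustrative helper)."""
    if not labels:
        return f"{name} {value}"
    # Sort the labels so the output is stable from run to run.
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


# Hypothetical metric and labels, chosen only for the example.
print(format_sample("http_requests_total", {"method": "GET", "status": "200"}, 1027))
# prints: http_requests_total{method="GET",status="200"} 1027
```

Each distinct combination of label values identifies its own time series, which is why the books spend time on how to choose labels.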
2. Hands-On Infrastructure Monitoring with Prometheus
This book covers the fundamental concepts of monitoring and explores Prometheus architecture, its data model, and how metric aggregation works. Multiple test environments are included to help explore different configuration scenarios, such as the use of various exporters and integrations.
By the end of this book, you’ll be able to implement and scale Prometheus as a full monitoring system on-premises, in cloud environments, in standalone instances, or using container orchestration with Kubernetes.
What you will learn
- Grasp monitoring fundamentals and implement them using Prometheus
- Discover how to extract metrics from common infrastructure services
- Find out how to take full advantage of PromQL
- Design a highly available, resilient, and scalable Prometheus stack
- Explore the power of Kubernetes Prometheus Operator
- Understand concepts such as federation and cross-shard aggregation
- Unlock seamless global views and long-term retention in cloud-native apps with Thanos
3. Monitoring Microservices and Containerized Applications
This book is a good starting point for developers, architects, and administrators who want to learn about monitoring and management of cloud native and microservices containerized applications.
In this book you’ll:
- Examine the fundamentals of container monitoring
- Get an overview of the architecture for Prometheus and Alertmanager
- Enable Prometheus monitoring for containers
- Monitor containers using Wavefront
- Use the guidelines on container monitoring with enterprise solutions AppDynamics and Wavefront
4. Practical Monitoring: Effective Strategies for the Real World
Practical Monitoring takes a vendor-neutral approach to monitoring. Rather than discussing how to implement specific tools, author Mike Julian teaches the principles and underlying mechanics behind monitoring so you can apply the lessons in any tool.
Practical Monitoring covers such topics as:
- Monitoring antipatterns
- Principles of monitoring design
- How to build an effective on-call rotation
- Getting metrics and logs out of your application
5. Learn Grafana 7.0: A beginner’s guide
In this book you’ll:
- Find out how to visualize data using Grafana
- Understand how to work with the major components of the Graph panel
- Explore mixed data sources, query inspector, and time interval settings
- Discover advanced dashboard features such as annotations, templating with variables, dashboard linking, and dashboard sharing techniques
- Connect user authentication to Google, GitHub, and a variety of external services
- Find out how Grafana can provide monitoring support for cloud service infrastructures
6. Site Reliability Engineering: How Google Runs Production Systems
In this book, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world.
You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient—lessons directly applicable to your organization.
This book is divided into the following four sections:
- Introduction – Learn what site reliability engineering is and why it differs from conventional IT industry practices
- Principles – Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
- Practices – Understand the theory and practice of an SRE’s day to day work: building and operating large distributed computing systems
- Management – Explore Google’s best practices for training, communication, and meetings that your organization can use
7. The Site Reliability Workbook: Practical Ways to Implement SRE
This workbook provides practical examples from Google’s experience, along with case studies from Google Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
You’ll learn:
- How to run reliable services in environments you don’t completely control—like cloud
- Practical applications of how to create, monitor, and run your services via Service Level Objectives
- How to convert existing ops teams to SRE—including how to dig out of operational overload
- Methods for starting SRE from either greenfield or brownfield
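As a small taste of the arithmetic behind Service Level Objectives, the sketch below turns an availability target into the amount of downtime it tolerates over a time window. The function name `allowed_downtime_minutes`, the 99.9% target, and the 30-day window are assumptions made up for this example and are not taken from the book.

```python
def allowed_downtime_minutes(slo: float, window_days: int) -> float:
    """Minutes of full downtime an availability SLO tolerates in the window.

    slo is a fraction such as 0.999 for "99.9% available".
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in the range (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


# Hypothetical target: 99.9% availability over a 30-day window.
budget = allowed_downtime_minutes(0.999, 30)
print(f"{budget:.1f} minutes")  # prints: 43.2 minutes
```

A tighter target shrinks this allowance quickly, which is one reason the choice of objective deserves careful thought.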
We hope you find a book that helps you monitor your applications and implement better design patterns to keep your services highly available for business continuity and growth.