Successfully Navigating the Complex State of Network Monitoring
By Robert Walden, CIO, Epsilon
The primary purpose of any network monitoring solution is to detect and prevent issues before they become service-impacting events. The secondary (and less desirable) purpose is to detect issues that have already caused a service-impacting event, so they can be addressed as quickly as possible. A tertiary yet key purpose of these monitoring activities is the output information used to make continual improvements to network environments, identify trends, ensure compliance, and forecast future capacity. While this sounds simple enough, there are few topics in the realm of information technology as complex, rapidly evolving, and fundamentally important as network management and monitoring.
Every IT manager reading this likely has a network monitoring solution (or solutions) of some kind, running in environments that they must ensure operate at an expected level of service. At whatever level of maturity or sophistication those monitoring solutions are functioning, they are probably being assessed and analyzed to determine how they can be improved. The problem is that the landscape and expectations change faster than most organizations can keep up with, and managing and monitoring networks will only continue to get more complex. Ultimately, the most important goal of network monitoring is to provide our clients, customers, and stakeholders with the highest quality, most secure services possible in increasingly complex environments. Of course, this all has to be done while maximizing investments. Addressing the following four challenge areas will help you build the foundation to meet that goal now and into the future.
As basic as it seems, many organizations do not specifically define an integrated monitoring strategy. Often, monitoring is assumed if a service is in production. The problem here is that those assumptions are almost always incorrect or insufficient. What is being monitored is not clear, and who is doing the monitoring is not always clearly defined either. As the shift from traditional on-premises hosting to cloud hosting continues, the need to formalize an integrated monitoring plan is even greater. Create written plans for how you’ll manage the monitoring environment and determine how frequently to update them. Your plans should articulate where you are today and, directionally, where you expect to be. Assess what tools and processes your organization employs currently, determine where you expect to be in the next two to three years, outline how you expect to organize and train your teams, and prioritize the steps needed to get there. It is crucial that the plan is created collaboratively with key constituents and team members.
Discussing network monitoring in a vacuum is a fruitless endeavor because there is no single definition for what network monitoring encompasses. It is no longer clear where network monitoring stops, and where infrastructure monitoring, security monitoring, and application monitoring begin. While Simple Network Management Protocol (SNMP) and ping are still important elements of a complete monitoring solution, they are insufficient in the broader context. Adding to the confusion, many vendors in the monitoring space offer sprawling monitoring solutions that claim to do it all, from network to security monitoring, and application to end-user monitoring. Additionally, our stakeholders and customers expect their services to function as expected 100 percent of the time, and from their perspective everything is the network and the network is everything. So, while there is realistically no single vendor solution that will do it all, from an IT perspective we must design an integrated solution and process that does. We have to attack the challenge from the top, with the broader end-services that are being consumed, and from the bottom, by addressing the breakdown of individual component areas. We must ignore legacy boundaries and embrace the convergence of end-to-end services and the integrated monitoring solutions that are required to effectively provide support into the future.
It is not news to anyone in technology who has a pulse that the landscape is changing rapidly. The majority of today’s networks and solutions are far more complex than they were just five years ago. As we leverage more public IaaS and PaaS cloud services, deploy internal cloud environments, run hybrid solutions, continue to maintain traditional data centers, ingest rapidly increasing amounts of data from more sources, and interconnect and monitor the data networks across all of these environments, the challenges can quickly become overwhelming. The security implications alone are often difficult to digest. However, there are ways to manage the complexity. Focus on jointly defining and communicating monitoring requirements, standards, and guidelines with other internal teams, stakeholders, and partners. These requirements should be loose enough to allow for flexibility but clear enough to allow processes to flow seamlessly. Integrate and leverage existing network monitoring processes and tools where possible, and create new processes where they don’t exist. Finally, it is necessary to create a cross-functional governance construct that provides a communication and validation conduit.
Roles & Responsibilities
As the technical lines of monitoring are blurred and complexity increases, the challenge of assigning organizational and functional ownership of monitoring a service area (and defining the related roles and responsibilities) grows as well. The organizational and functional responsibilities of traditional infrastructure operations and engineering teams are also blurring, as engineering teams take on more infrastructure-type activities via DevOps structures, especially in third-party public cloud hosting environments. Still, someone (or something) needs to define monitoring thresholds, receive monitoring notifications, and establish escalation handoffs. Roles and responsibilities should be clearly defined through a detailed RACI matrix of monitoring activities, which can vary based on the operational models required to meet certain solution types. Also, the types of workloads and various environments that are being monitored should be profiled, so ownership is clear. Lastly, clearly communicate the agreed-upon assignment of responsibilities to all involved.
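As a rough illustration of what a RACI matrix of monitoring activities might look like in practice, the sketch below records the three activities named above and checks each for gaps in ownership. The team names and assignments are hypothetical, and the rule it enforces (exactly one Accountable and at least one Responsible party per activity) is a common RACI convention assumed here, not something this article prescribes.

```python
from collections import Counter

# Hypothetical teams and assignments, for illustration only.
# R = Responsible, A = Accountable, C = Consulted, I = Informed.
RACI = {
    "define monitoring thresholds": {
        "network_ops": "A", "engineering": "R", "security": "C", "service_desk": "I",
    },
    "receive monitoring notifications": {
        "network_ops": "R", "engineering": "I", "service_desk": "A",
    },
    "establish escalation handoffs": {
        "network_ops": "R", "engineering": "C", "service_desk": "R",
    },
}


def validate_raci(matrix):
    """Return a list of ownership problems found in the matrix.

    Assumed convention: every activity needs exactly one Accountable
    party and at least one Responsible party.
    """
    problems = []
    for activity, assignments in matrix.items():
        counts = Counter(assignments.values())
        if counts["A"] != 1:
            problems.append(
                f"{activity}: expected exactly one Accountable, found {counts['A']}"
            )
        if counts["R"] < 1:
            problems.append(f"{activity}: no Responsible party assigned")
    return problems


if __name__ == "__main__":
    for problem in validate_raci(RACI):
        print(problem)
```

In this made-up example, the escalation handoff activity has two Responsible teams but no Accountable owner, which is the kind of gap a written matrix is meant to surface before an incident does.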
IT organizations are under more pressure than ever to deliver higher quality services at lower cost, in increasingly complex and distributed environments. In today’s vastly different universe, our stakeholders, our customers and our shareholders expect that we are making the right decisions and investments to position our organizations for growth and success. Addressing the four challenges outlined above will help ensure you’ve accounted for the evolution of today’s technology organization and established the foundation for making your clients, and network monitoring solutions, successful.