AUTOMATED NETWORK TROUBLESHOOTING
Introduction
This document outlines the implementation of a troubleshooting automation project for managing a Linux-based network infrastructure. The project aimed to streamline the troubleshooting of escalated network events with a Python script that opens the relevant web pages and dashboards with time filters already applied, saving engineers time and supporting thorough investigation during on-call rotations.
Objectives
The objectives of the project are to:
- Simplify the process of accessing multiple web pages and dashboards during troubleshooting escalations
- Automate time filtering to reduce manual effort and ensure accurate investigation
- Improve efficiency and accuracy in handling and resolving network escalations
Approach and Implementation
The project uses Python to build a CLI tool for troubleshooting high-severity network events. The tool takes command-line arguments that let an engineer specify the data center or site, the event start and end times, and the ticket ID. The script then performs the following actions:
- Argument Parsing: Using Python's standard-library "argparse" module, the script parses the command-line arguments to retrieve the required information, such as data center, time range, and ticket ID.
- Time Handling: Using Python's "time" and "datetime" modules, the script validates the start and end times entered by the user against the accepted time formats and converts them into datetime objects for later use.
- Link Generation: The script generates the links for the various web pages and dashboards. It concatenates each tool's base URL with the provided arguments, such as data center and time range, producing fully functional links that query specific resources within each tool.
- Array Data Structure: The script collects the generated links in an array (a Python list), which is easy to iterate over and pass to the browser in one step.
- Opening the Browser: Using Python's "subprocess" module, the script launches the web browser as a subprocess and passes the list of links as command-line arguments, so all the required web pages and dashboards open automatically for investigation.
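The argument-parsing, time-handling, link-generation, and array steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the flag names (`--site`, `--start`, `--end`, `--ticket`), the `YYYY-MM-DD HH:MM` time format, the `example.com` base URLs, and the `from`/`to` epoch-millisecond query parameters are all hypothetical placeholders for the real tools' conventions.

```python
import argparse
from datetime import datetime
from urllib.parse import urlencode

# Hypothetical accepted time format; the real tool's format is not documented.
TIME_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical base URLs standing in for the team's real dashboards.
BASE_URLS = {
    "metrics": "https://metrics.example.com/d/network",
    "logs": "https://logs.example.com/search",
    "tickets": "https://tickets.example.com/view",
}


def parse_time(value):
    """Validate a time string and convert it to a datetime object."""
    try:
        return datetime.strptime(value, TIME_FORMAT)
    except ValueError:
        # argparse reports ArgumentTypeError as a clean usage error (exit code 2).
        raise argparse.ArgumentTypeError(
            f"invalid time {value!r}; expected format {TIME_FORMAT}"
        ) from None


def parse_args(argv=None):
    """Parse site, time range, and ticket ID (hypothetical flag names)."""
    parser = argparse.ArgumentParser(
        description="Open troubleshooting dashboards for a network event."
    )
    parser.add_argument("--site", required=True, help="data center or site")
    parser.add_argument("--start", required=True, type=parse_time, help="event start time")
    parser.add_argument("--end", required=True, type=parse_time, help="event end time")
    parser.add_argument("--ticket", required=True, help="ticket ID")
    args = parser.parse_args(argv)
    if args.end <= args.start:
        parser.error("--end must be later than --start")  # exits with code 2
    return args


def build_links(site, start, end, ticket):
    """Return a list of URLs, one per tool, with site and time window pre-filled."""
    # Assumption: the dashboards accept epoch milliseconds as time filters.
    window = {
        "site": site,
        "from": int(start.timestamp() * 1000),
        "to": int(end.timestamp() * 1000),
    }
    # urlencode escapes special characters instead of raw string concatenation.
    return [
        f"{BASE_URLS['metrics']}?{urlencode(window)}",
        f"{BASE_URLS['logs']}?{urlencode(window)}",
        f"{BASE_URLS['tickets']}?{urlencode({'id': ticket})}",
    ]


def main(argv=None):
    """Parse arguments, print the generated links, and return them."""
    args = parse_args(argv)
    links = build_links(args.site, args.start, args.end, args.ticket)
    for link in links:
        print(link)
    return links
```

A script would call `main()` from its `if __name__ == "__main__":` block. Because the times contain a space, they need quoting on the command line, for example `python troubleshoot.py --site dc1 --start "2024-05-01 10:00" --end "2024-05-01 11:30" --ticket NET-123` (the script name is also hypothetical).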
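The browser-opening step can be sketched with `subprocess` as below. The choice of `firefox` as the default browser is an assumption, since the original does not name the browser; Firefox, for example, opens each URL passed on its command line. Passing each link as its own argument list element, rather than building a shell string, avoids quoting problems with characters such as `&` in query strings.

```python
import shutil
import subprocess

# Hypothetical default; the original does not say which browser is launched.
DEFAULT_BROWSER = "firefox"


def build_browser_command(links, browser=DEFAULT_BROWSER):
    """Return an argv list that opens every link in the given browser."""
    # One argv element per URL: no shell is involved, so '&' and '?' need no quoting.
    return [browser, *links]


def open_links(links, browser=DEFAULT_BROWSER):
    """Launch the browser with all links and return without waiting for it."""
    if shutil.which(browser) is None:
        raise FileNotFoundError(f"browser executable {browser!r} not found on PATH")
    # Popen returns immediately, so the CLI does not block while the tabs are open.
    return subprocess.Popen(build_browser_command(links, browser))
```

When no particular browser is required, Python's standard `webbrowser` module (`webbrowser.open_new_tab(url)` per link) is a portable alternative to calling a browser executable directly.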
Results and Benefits
The implementation of the troubleshooting automation project yielded the following results and benefits:
- Improved Efficiency: The automated process reduces the time and effort required to access multiple web pages and dashboards, eliminating manual navigation and repetitive searches.
- Comprehensive Investigation: Because the time filters are pre-configured in the links, each investigation stays focused on the specified time range, reducing the chance of missing critical information.
- Time Savings: The automation significantly reduces the time spent opening and navigating multiple tools, allowing engineers to focus on analyzing and resolving network escalations promptly.
- Consistency and Accuracy: The predefined links give engineers consistent access to the required resources, so the same data and metrics are examined across different troubleshooting scenarios.
- Enhanced Collaboration: The streamlined process enables engineers to share investigation links and collaborate more effectively during on-call rotations, improving overall team productivity.
Conclusion
The troubleshooting automation project successfully addressed the challenge of accessing multiple web pages and dashboards during network escalations. By developing a Python script that generates and opens the necessary links with pre-configured time filters, we improved efficiency, accuracy, and collaboration within the network operations team. The automation saves valuable time, enables comprehensive investigation, and improves the overall troubleshooting experience for engineers.