In my search for reasons behind the slow adoption of network automation, I was most fascinated to speak to the teams involved in building and operating networks. In essence these can be divided into three groups: the network architecture & design team, the engineering team and the network operations team.
The first impression many people have is that network automation belongs to the engineering or operations team, as these are the teams with the ongoing responsibility to make operational changes in the network, perform OS upgrades on the devices, activate new services, etc. The engineers typically step in when things get more complex or when engineering or troubleshooting skills are required. The engineers’ job is more project based: planning migrations, configuring and testing new services, etc.
Notably, the architects & designers are not involved in ‘running’ the network. And this is where things get interesting, as the best discussions arise when speaking to both engineers and designers. In short, the discussion looks like this: the designers believe that networks can be automated as long as networks are deployed “as designed”, while engineers (and operations) argue that there are too many exceptions and specifics that need to be taken care of in a live network. In their view, these can’t be thought of in advance and therefore networks can’t be automated. I found this most fascinating, as most network designers have a strong background in engineering and are typically the most senior networking guys in the company.
I decided to take a closer look at the whole “design-build-operate” process and figure out what the real problem is. To summarize, the process typically looks like this. It starts with a network architect or designer who identifies the business requirements and produces the network design. The hardware vendor is selected, and a network diagram and detailed design document are produced, typically a Visio drawing and a 20- to 200-page detailed (Word) document.
These design documents are then “handed over” to a network engineer, who starts building a lab network and tests the desired functionality. The outcomes of the engineering process are the so-called “golden configurations”. These are to be ‘re-used’ by other engineers and network operations people to roll out the network, new services, locations, racks, etc. By ‘re-use’ I mean taking the generic golden config and making minor changes to it, or copy-pasting missing parameters into it from other sources. These are typically manual activities, such as finding free IP addresses in an Excel sheet, deciding ‘on the spot’ what VLAN numbering to use, what port functions to give to specific ports, what service parameters to use, etc.
Once rolled out and operational, (other?) network operations & engineering people continue to work on the deployed network, making ongoing changes in the ‘running’ network throughout its entire life cycle. Over time the running configs are no longer the ones originally deployed, and changes are made outside the original design criteria using non-standardized parameters. As a result, the network grows into spaghetti rather than remaining lasagna.
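To make this kind of drift concrete, here is a minimal sketch of comparing a ‘designed’ config with the ‘running’ one. It uses Python’s standard difflib module; the function name and the sample config lines are my own illustrative assumptions, not taken from any particular platform or vendor.

```python
import difflib


def config_drift(designed: list[str], running: list[str]) -> list[str]:
    """Return the lines removed from ("- ") or added to ("+ ") the designed config."""
    diff = difflib.ndiff(designed, running)
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); keep only real changes.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical sample data: generic, vendor-neutral config lines.
DESIGNED = ["hostname edge-1", "vlan 110", "ntp server 192.0.2.1"]
RUNNING = ["hostname edge-1", "vlan 999", "ntp server 192.0.2.1", "snmp community temp"]

if __name__ == "__main__":
    for change in config_drift(DESIGNED, RUNNING):
        print(change)
```

An empty result means the running config still matches the design; anything else is a change that was made outside of it.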
The overall conclusion is that these processes, from design to operations, are neither aligned nor enforced, leaving too much room for individual interpretations and ways of dealing with situations. Speaking to the companies that have successfully implemented network automation, this is exactly what they have resolved. They have managed to bridge the gap between design and engineering by enforcing standardized parameters and design rules all the way down to the operational change processes. Typically this starts with home-grown scripts and matures into a commercial platform that integrates the company’s designs, configuration templates and operational processes. With such a system in place one can generate ‘design driven configurations’, fully align the different groups and enforce changes ‘as designed’.
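As a rough illustration of what a ‘design driven configuration’ could look like in one of those early home-grown scripts, the sketch below renders a port config from a template and rejects any parameter that falls outside the design rules. Everything in it is an assumption made for the example: the VLAN ranges, the service names, the template syntax and the function names are hypothetical and not tied to a specific vendor or product.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical design rules: the VLAN range each service is allowed to use.
DESIGN_RULES = {
    "voice": range(100, 200),
    "data": range(200, 300),
}

# Hypothetical golden template in a generic, vendor-neutral syntax.
ACCESS_PORT_TEMPLATE = Template(
    "interface $port\n"
    " description $service access port\n"
    " switchport access vlan $vlan\n"
)


@dataclass(frozen=True)
class PortRequest:
    port: str
    service: str
    vlan: int


def render_access_port(req: PortRequest) -> str:
    """Render a port config, refusing parameters that are not 'as designed'."""
    allowed = DESIGN_RULES.get(req.service)
    if allowed is None:
        raise ValueError(f"service {req.service!r} is not part of the design")
    if req.vlan not in allowed:
        raise ValueError(
            f"VLAN {req.vlan} is outside the designed range for {req.service!r}"
        )
    return ACCESS_PORT_TEMPLATE.substitute(
        port=req.port, service=req.service, vlan=req.vlan
    )


if __name__ == "__main__":
    print(render_access_port(PortRequest(port="eth1", service="voice", vlan=110)))
```

The point of the design choice is that the ‘on the spot’ decision disappears: an engineer can still request a change, but a VLAN number that the design does not allow never reaches the device.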
In the latter process, the live network becomes a representation of what has been designed, prepared and scheduled in the automation platform. The result is that changes can be made with high certainty and accuracy. I have seen up to 99% first-time-right changes on a large scale, even unattended. The companies that master automation most successfully use an additional capability, namely ‘scenarios’. These are basically scripts that have been ‘designed’ to deal with those jobs that are not first time right or that carry a higher risk. A scenario can, for example, define what to do when a change job is not successful: should the process stop with a notification to the operator, or should the system automatically roll back to the original situation? These scripts are typically designed by (lead) engineers.
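A scenario like the one described above could be sketched as follows. This is only an assumed shape, not how any specific automation platform implements it: the policy names and the callbacks (apply, verify, roll_back, notify) are hypothetical placeholders that a real system would wire to its own device and ticketing integrations.

```python
from enum import Enum
from typing import Callable


class OnFailure(Enum):
    """Hypothetical failure policies a lead engineer could attach to a change job."""
    STOP_AND_NOTIFY = "stop_and_notify"
    ROLL_BACK = "roll_back"


def run_change(
    apply: Callable[[], None],
    verify: Callable[[], bool],
    roll_back: Callable[[], None],
    notify: Callable[[str], None],
    on_failure: OnFailure,
) -> str:
    """Apply a change, verify it, and follow the scenario if verification fails."""
    apply()
    if verify():
        return "success"
    if on_failure is OnFailure.ROLL_BACK:
        # Restore the original situation automatically, then inform the operator.
        roll_back()
        notify("change failed verification; rolled back")
        return "rolled_back"
    # Leave the network as it is and hand over to the operator.
    notify("change failed verification; stopped for operator")
    return "stopped"
```

A low-risk job might be scheduled with OnFailure.ROLL_BACK so that it can run unattended, while a riskier migration would use OnFailure.STOP_AND_NOTIFY and wait for an engineer.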
My overall conclusion is that network automation belongs either to the management team responsible for all three disciplines or to a strong architecture & design team. I have seen several successful implementations, so it can be done! But too often, people who are afraid to change their current way of working slow the process down. I have come to realize that it takes innovative professionals with in-depth knowledge, persuasive capabilities and good management skills to bridge the gap between all teams, as there are many questions and objections along the way. In my next blog I’ll touch on some of these questions when I talk about myth #4: tools will do the job.