Increasing Automation and Resilience with DevOps and Cloud Computing
By Jan Pilbauer, Executive Director of Modernization & CIO, and Andrew McCormack, Senior Director, Technology, Payments Canada
DevOps Adds Resilience
IT resilience is the ability to quickly recover and continue to operate in the event of a disruption. It is the product of intelligent IT architecture, one important facet of which is automation. Automation matters for two reasons: it leaves less manual work to do and less chance for error. For Payments Canada, resilience involves DevOps, an agile relationship between development engineers and IT operations that has them participating together in the entire service lifecycle, from design to development, production and support.
What DevOps boils down to is developers embracing the operational end-state and understanding the production environment in a way that they previously might not have. This leads to efficiencies. The ideal result is a codification of everything in your environment, including the infrastructure, such that it forms part of an information release pipeline so automated that it virtually eliminates the possibility of human error. The closer you get to this end state, the less likely it is that you would ever have to engage your “stand-alone” DR plan.
DevOps is very much a pillar in a technology strategy to modernize our infrastructure operations, application development and software engineering practice. It helps with development, operations, and the business of the organization itself, as well as improving the interplay between them. Its primary goal is to deliver a higher-quality end-user experience and new services more quickly. As we become more agile and efficient, IT productivity will rise and operating expenses will fall.
Added resilience is part and parcel of DevOps. You can still benefit from deployment automation (DevOps) processes even if you do not introduce any functional change whatsoever. Consider one of your software applications, for example. The source code is at the top of the stack, and when you unpack that stack there is likely a web server and other dependencies: services required to make the application work. If one of those internal services has a vulnerability, your DevOps process can flag it and automatically upgrade to a non-vulnerable version. Since the test and release processes are automated, they enable you to fix vulnerability issues in a matter of hours instead of weeks or months. That’s a vision we all should have for our IT organizations.
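To make the "flag it and upgrade" step concrete, here is a minimal sketch in Python. It is an illustration only, not a description of Payments Canada's tooling: the component names, the version numbers and the `Advisory` record are all hypothetical, and a real pipeline would pull advisories from a vulnerability feed rather than a hard-coded list.

```python
from typing import Dict, List, NamedTuple, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


class Advisory(NamedTuple):
    """Hypothetical advisory record: a component and its first safe version."""
    component: str
    fixed_in: str


def plan_upgrades(installed: Dict[str, str],
                  advisories: List[Advisory]) -> Dict[str, str]:
    """Return {component: target version} for every vulnerable dependency."""
    upgrades: Dict[str, str] = {}
    for advisory in advisories:
        current = installed.get(advisory.component)
        if current is None:
            continue  # the advisory is about something we do not run
        if parse_version(current) >= parse_version(advisory.fixed_in):
            continue  # already on a non-vulnerable version
        # If several advisories hit the same component, keep the highest fix.
        previous = upgrades.get(advisory.component)
        if previous is None or parse_version(previous) < parse_version(advisory.fixed_in):
            upgrades[advisory.component] = advisory.fixed_in
    return upgrades


# Hypothetical application stack and advisories, for illustration only.
installed = {"webserver": "2.4.1", "tls-lib": "1.1.0", "json-parser": "3.0.0"}
advisories = [
    Advisory("webserver", "2.4.3"),
    Advisory("tls-lib", "1.0.9"),
    Advisory("webserver", "2.4.2"),
    Advisory("message-queue", "5.0.0"),
]
upgrade_plan = plan_upgrades(installed, advisories)
print(upgrade_plan)
```

In an automated pipeline, a non-empty plan like this would trigger a rebuild with the upgraded versions, followed by the same automated test and release steps as any other change.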
We are in the midst of a major modernization effort aimed at enhancing our payments infrastructure.
Moving to Low-risk, Incremental Changes
With DevOps in place, you can change the way you modify platforms, services, and applications. A functional automation pipeline enables you to scrap the once-a-year upgrade approach and make very small, lower-risk incremental changes instead. The important part is the process. When you fine-tune constantly, you can get really good at managing releases, upgrades and other changes to your environment.
Payments Canada is three-quarters of the way through our first iteration of DevOps implementation. To begin, we have focused on smaller applications for which we can control the entire technology stack. With the two applications that we are working on at the moment, we have fully automated the build and test processes so that we can deploy the applications into the staging environment.
From DevOps to Containerization
Staging is fundamental to containerization, our next step in IT modernization. Containerization essentially redesigns the way that software artifacts are created and put into production. Traditional software engineering involves development, testing, then rebuilding for the real production environment and hoping that it works. Containers, in a nutshell, fully describe your entire environment. What gets tested in your development pipeline is what gets put into production. In other words, once you have tested and certified a software artifact, the work is complete.
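One way to picture "what gets tested is what gets put into production" is to identify each artifact by a content digest and to allow only certified digests to be deployed. The Python sketch below illustrates that idea under simplifying assumptions: `ReleaseRegistry` and its methods are hypothetical names, and the byte strings stand in for real container images.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Content digest of an artifact; any change to the bytes changes it."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


class ReleaseRegistry:
    """Hypothetical record of artifacts that passed the test pipeline."""

    def __init__(self) -> None:
        self._certified = set()

    def certify(self, artifact: bytes) -> str:
        """Mark an artifact as tested and return its digest."""
        artifact_digest = digest(artifact)
        self._certified.add(artifact_digest)
        return artifact_digest

    def can_deploy(self, artifact: bytes) -> bool:
        """Only the exact bytes that were certified may go to production."""
        return digest(artifact) in self._certified


registry = ReleaseRegistry()
tested_image = b"application image, build 1"  # stand-in for a container image
registry.certify(tested_image)

print(registry.can_deploy(tested_image))                      # same bytes as tested
print(registry.can_deploy(b"application image, rebuilt"))     # rebuilt after testing
```

The second check fails because a rebuilt artifact is, by definition, not the one that was tested, which is exactly the gap the traditional rebuild-for-production approach leaves open.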
Containerization is a concept that has revolutionized software engineering. It is a game-changer even from the resilience perspective. Our next milestone is to containerize the two applications that we are working on and make the process a permanent part of our pipeline in 2018.
Cloud for Agility
As DevOps builds agility, so does cloud computing. In the past few years, the cloud environment has really matured. Larger, risk-averse organizations used to shy away from it; now it’s ubiquitous. Cloud computing today consists of powerful technology companies with world-class IaaS and PaaS offerings. Their capabilities are tremendous. You can build a system more resilient in the cloud than you could ever hope to in-house. You can run at-scale transactions across multiple databases and maintain transaction integrity, literally around the world, in real time.
To become an agile, proficient operator of applications and services, Payments Canada is looking at how the advanced technology companies are approaching things. This points to cloud computing architecture very, very quickly. As an organization responsible for the clearing and settlement infrastructure, processes and legal framework supporting financial transactions in Canada, we must of course tread carefully. There are many stakeholders who need to be part of our future-state technology architecture discussions, and we have an appetite to go far.
DRP End Game
The end state that we are aiming at is services and systems architected in such a way that resilience is inherent. DevOps, containerization and cloud allow us to eliminate manual processing and human error, detect and respond automatically to failure conditions, and re-create full production-quality environments quickly and efficiently.
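Detecting and responding automatically to failure conditions often comes down to a simple loop: check each running instance, and re-create any that fail from the same automated definition. The Python sketch below shows the shape of that loop. The instance names, the health check and the `recreate` callback are hypothetical placeholders for whatever a real platform provides.

```python
from typing import Callable, Dict, List


def reconcile(instances: Dict[str, str],
              is_healthy: Callable[[str], bool],
              recreate: Callable[[str], str]) -> List[str]:
    """Replace every unhealthy instance and return the names that were replaced.

    `instances` maps a service name to the ID of the instance serving it.
    """
    replaced: List[str] = []
    for name in list(instances):
        if not is_healthy(instances[name]):
            # No manual intervention: rebuild from the automated definition.
            instances[name] = recreate(name)
            replaced.append(name)
    return replaced


# Hypothetical environment: one of the two instances has failed.
running = {"web-1": "instance-a", "web-2": "instance-b"}
failed_ids = {"instance-b"}

replaced = reconcile(
    running,
    is_healthy=lambda instance_id: instance_id not in failed_ids,
    recreate=lambda name: name + "-replacement",
)
print(replaced, running)
```

Run on a schedule, a loop of this kind turns many would-be outages into routine, unattended replacements.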
So while DRP will obviously remain part of Payments Canada’s operating procedures, our goal is to make a disaster recovery scenario an even more remote possibility than it is today. This allows DRP work to focus on true disasters or very sophisticated scenarios rather than preventable outages associated with day-to-day operations.