Resilience Implementation Structure for Complex IT Systems



By Arvind Mundra

Philadelphia, PA, USA


The range of domains in which the principles of project management can be applied covers almost the entire workspace of the modern world. When someone refers to project management in typical IT projects, most of us presume that the project is about developing and delivering exciting new features to delight the customer base. From time to time, we come across projects where technology infrastructure is the core focus. The specific goals of such projects vary, but they generally fall into the broader categories of improving infrastructure efficiency and aligning infrastructure with a new solution architecture. Then comes a third type of project, where the goals are much more nuanced and contextual (though no less critical), such as security, reliability and resilience.

These projects are created to strengthen a particular area of operating complex IT systems so that the value delivered by typical software development and infrastructure projects reaches the customer base with minimal risks and challenges. Resilience Implementation is one such area for project management, given the new world of impatient customers who can use social media to take down any company’s hard-earned reputation in minutes.

Improving Resilience has been a key focus of most (if not all) large organizations with complex IT systems. Transforming the tenets of Resilience science into value-adding Resilience Engineering requires a comprehensive and adaptive structure. The structure must be comprehensive because, without considering all parts of the system and all dimensions (past, present and future; IT, human and organizational; strategic, tactical and operational; and so on), it will be difficult to achieve meaningful resilience. It must be adaptive because only an adaptive resilience framework can sustain itself in an ever-changing technological and organizational landscape.

The proposed resilience implementation structure is meant to be one of many ways an organization can structure its focus and efforts to embark upon a resilience journey.

Triangle of Resilience

Accounting of the past

The past can sometimes be cruel. That is true of the human story, and it is true of IT systems. Technical debt is the broader, often-used term, which encompasses not only resilience challenges (arising from both unintentional and intentional causes, such as not understanding system-level behaviors or compressing the time to market) but also security, operability, scaling and other categories.

The fundamental question that we ask of an IT system handed down to us as heritage is:

How recoverable are our IT systems in the event of the unexpected?

This can be difficult to measure, as many IT systems come with a stack of legacy infrastructure that does not lend itself well to measurement or cannot accommodate monitoring probes (or, in some cases, the monitoring itself affects the measurements because it becomes an extra process for the system to manage).

It is also challenging to decide which measurements to choose and how to give meaning to the resulting quantification. If we take into consideration the myriad groups and teams that support a large, complex IT system, a common understanding of, and association with, these measures is no longer a nice-to-have but a must-have.

It is only rational that this yardstick remains simple: it should apply to different components, sub-systems and systems with equal effectiveness, and it should be easy to keep everyone aligned in their understanding of it. The measurement of resilience of a system in the hands of customers needs to be free of external measures such as customer traffic, security threats, number of features, in-house development versus vendor integration, new versus old technology stacks, age of the product or service, and the nature of a hacking attack. One or more of these parameters may certainly influence the resilience of the system from one week to the next (and from one month to another), but they cannot define it.

In the broader context of a complex IT system, the following can be a good working definition of resilience:


To read the entire article, see the link in the citation below.


How to cite this article: Mundra, A. (2019).  Resilience Implementation Structure for Complex IT Systems, PM World Journal, Vol. VIII, Issue I (January). Available online at https://pmworldjournal.net/wp-content/uploads/2019/01/pmwj78-Jan2019-Mundra-resilience-implementation-structure-for-complex-it-systems.pdf


About the Author

Arvind Mundra

Philadelphia, PA, USA




Arvind Mundra has worked in Agile Project Management for the last 20 years in various roles. After spending 6 years as a C++ developer, Arvind became interested in aligning project management with maximum value creation from a software development team’s work. For many years, Arvind has focused on training software development teams in Agile Project Management and coaching them to build a self-sustaining discipline.

Arvind has earned multiple certifications from both PMI and Scrum Alliance, contributed to the pilot program of the PMI-ACP certification, and has been active in local Agile user groups.

To view other works by Arvind Mundra, visit his author showcase in the PM World Library at https://pmworldlibrary.net/authors/arvind-mundra/