Launching and Scaling a Transformation Organization

Shahir A. Daya
Mar 9, 2020

“How do we start?” is a question I get asked a lot when it comes to Digital Transformation. I spend most of my time on client engagements working to solve hard problems at some of the best companies. Having recently had the opportunity to work on a couple of significant transformation projects, I wanted to share the approach that has evolved out of them and the lessons we have learned.

There are many workstreams that make up a typical digital transformation program. My recent client engagements have involved moving from traditional monolithic architectures to modern Cloud Native, Microservices, and Event-Driven Architectures, along with organizational change and new ways of working. This article focuses on those areas of a digital transformation program.

I am not suggesting that the approach I describe is the best one or that it is sure to work for you. Clients differ in their organizational structures, their cultures, and the forces acting on them, so your mileage may vary. If you have your own learnings, please share them in the comments on this post. I am always learning and trying new ways, and would love to hear about your experiences.

IBM Garage Method for Cloud

At IBM, the Garage Method is our approach to enable business, development, and operations to continuously design, deliver, and validate new solutions. The practices and workflows cover the entire product lifecycle from inception through capturing and responding to customer feedback and market changes.

This methodology drives Enterprise Design Thinking at scale, is built on agile principles for co-located and distributed teams, leverages DevSecOps tools and techniques for continuous delivery and operations, fosters digital talent, and enables Site Reliability Engineering (SRE). See Google's Site Reliability Engineering site [1] for more on SRE.

The outside-in approach of the IBM Garage Methodology focuses first on business outcomes, not technology, and guides your solution journey from inception to delivery while de-risking innovation.

The following are the core tenets of the Method:

  1. Understand Business Outcomes
  2. Define an MVP using Design Thinking
  3. Test-Driven Development (TDD)
  4. Pair Programming
  5. Co-Creation
  6. Code in 1-day bits
  7. Deliver continuously
  8. Hypothesis-driven development
  9. Hold playbacks
  10. Automate monitoring

To summarize, the IBM Garage Method for Cloud is an integrated method of industry-leading practices to help you scale and transform your industry. It is based on:

For more details on the IBM Garage Method for Cloud see “IBM Garage Methodology” [4] and “IBM Garage Method for Cloud” [5]. The Method is publicly available for anyone to leverage.

The illustration in Figure 1 shows the organizational structure of the software engineering functions. Functions such as program management are not shown.

Figure 1: Organization Structure

Walkthrough of the Organizational Structure

Let’s walk through this organizational structure to get a better feel for the roles of the different squads and their relationships. It is also important to understand the sequence in which parts of the organization need to be formed as a transformation program starts up.

Architecture and Platform Engineering Squads

The Architecture Squad comprises architects who work on the transformation program full time, including a Chief Architect. You could have two Chief Architects working as a pair; I have done this a few times, and it works well to change behavior. Once the Build Squads are stood up, every architect is assigned to one or two of them and provides the architecture support those squads require.

As a transformation program starts up, a reference architecture and a reference implementation need to be established and built. The Architecture Squad, like every other squad, works from a backlog of prioritized stories; its stories focus on the reference architecture and reference implementation. The architects design these stories and pass them to the Platform Engineering Squad.

The Platform Engineering Squad is a Build Squad that builds out the reference architecture and the reference implementation that the product-focused Build Squads will leverage. The Architecture Squad plays the role of Product Owner for the Platform Engineering Squad.

The architects create the initial set of architecture stories. Once the Build Squads are operational, they can also submit architecture stories to the Architecture Squad's backlog for consideration. If a story is cross-cutting, strategic, or likely to recur frequently, the Architecture Squad prioritizes it. If not, the Architecture Squad returns it to the Build Squad that submitted it to implement.
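The routing rule for architecture stories submitted by Build Squads is essentially a three-way OR. A minimal sketch in Python follows; the `ArchitectureStory` type, its boolean fields, and the `triage` function are hypothetical names chosen for illustration, not part of the Garage Method or any tool.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ARCHITECTURE_BACKLOG = "prioritized on the Architecture Squad's backlog"
    RETURN_TO_SUBMITTER = "returned to the submitting Build Squad"


@dataclass
class ArchitectureStory:
    # Hypothetical fields mirroring the three criteria described in the text.
    title: str
    submitted_by: str
    cross_cutting: bool = False
    strategic: bool = False
    recurring: bool = False  # "will occur frequently"


def triage(story: ArchitectureStory) -> Route:
    """Route a Build Squad's architecture story using the rule in the text."""
    if story.cross_cutting or story.strategic or story.recurring:
        return Route.ARCHITECTURE_BACKLOG
    # Otherwise the squad that raised it implements it itself.
    return Route.RETURN_TO_SUBMITTER


if __name__ == "__main__":
    shared = ArchitectureStory("Common logging approach", "Payments", cross_cutting=True)
    local = ArchitectureStory("Cache a reference-data lookup", "Onboarding")
    print(triage(shared).name)  # ARCHITECTURE_BACKLOG
    print(triage(local).name)   # RETURN_TO_SUBMITTER
```

In practice these criteria are judgment calls made while grooming the backlog rather than booleans; the sketch only makes the routing explicit.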

The architecture backlog is jointly groomed and prioritized by the Chief Architects, the Squad Leads of all the Build Squads, and all the Product Owners.

The outputs of the Architecture and Platform Engineering Squads include:

  • Code / Templates
  • Architectural Decisions
  • Leading Practices
  • Reference Architecture
  • Reference Implementation
  • Developer Playbook

Build Squads leverage all of these artifacts to hit the ground running. Because the primary consumers are developers, it makes sense to keep these artifacts somewhere developers visit frequently, such as a source code repository. Kyle Gene Brown covers this in his recent article on GitArchitecture [6].

Build Squads

Build Squads are product development squads. They are full-stack squads with a high degree of autonomy and end-to-end ownership of their product: they own the epics that make up that product from birth until retirement.

Structure of the Build Squad

Figure 2 illustrates the key full-time roles in every Build Squad. Some roles, such as the user experience designer and the architect, are part-time. See “Roles in a squad” [7] and “Build effective squads” [8] for additional perspectives.

Figure 2: Build Squad Structure

The sections that follow detail the responsibilities of these key roles. Some of the lists are long, but I wanted to leave you with a fairly complete set that you can leverage in your own work.

Product Owner (PO)

Every product has 1 backlog and 1 product owner. The responsibilities of the product owner include:

  • Represent the needs of the business (interface), including site performance and reliability (operational excellence) and measuring the success of functionality (e.g., analytics)
  • Create and groom user stories
  • Prioritize the backlog
  • Accept stories
  • Manage and optimize business outcomes across relevant business stakeholders
  • Represent the voice of the customer and the product P&L
  • Plan and manage to release dates using delivery velocity information, balancing scope and time to achieve target dates. If additional capacity is needed, break the business domain down into smaller logical ‘P&Ls’ so that additional POs and squads can be added, delivering higher aggregate velocity
  • Provide business knowledge, including market sizing and segmentation, target customer/user definition, competitive positioning, industry trends, etc.
  • Update and track the defects database
  • Interact with UX Team
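Planning to release dates from delivery velocity comes down to simple division: remaining scope divided by velocity gives the number of iterations left. The sketch below assumes story points, a steady velocity, and one-week iterations; the numbers and the function names are hypothetical. `squads_needed` illustrates the last part of the velocity bullet, under the optimistic assumption that velocity adds up across squads.

```python
import math
from datetime import date, timedelta
from typing import Tuple


def forecast_release(remaining_points: int, velocity: float, start: date,
                     iteration_days: int = 7) -> Tuple[int, date]:
    """Iterations needed at a steady velocity, and the projected finish date."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    iterations = math.ceil(remaining_points / velocity)
    return iterations, start + timedelta(days=iterations * iteration_days)


def squads_needed(remaining_points: int, velocity_per_squad: float,
                  iterations_available: int) -> int:
    """Squads of similar velocity needed to meet a fixed date.

    Assumes velocity adds up linearly across squads, which is optimistic:
    splitting the domain into smaller P&Ls carries coordination overhead.
    """
    return math.ceil(remaining_points / (velocity_per_squad * iterations_available))


if __name__ == "__main__":
    print(forecast_release(120, 20, date(2020, 3, 9)))  # (6, datetime.date(2020, 4, 20))
    print(squads_needed(120, 20, 3))                    # 2
```

With 120 points remaining at 20 points per iteration, the sketch projects six iterations; hitting the date in three iterations would take two squads of similar velocity. Real forecasts usually work from a velocity range rather than a single number.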

Squad Lead

The Squad Lead is a pivotal role. They act as an anchor developer and as an Agile coach in a player/coach role. They pair with development team members and mentor the entire team in leading practices. There is 1 full-time squad lead for every squad. The responsibilities of the squad lead include:

  • Deep hands-on software engineering experience
  • Agile expert
  • Ability to assess and adjust squad talent to improve quality and velocity
  • Hands-on senior software developer and experienced DevSecOps consultant
  • Strong management and communication skills
  • Ability to estimate the level of effort at the squad and microservice (mS) levels
  • Accountable for delivery including velocity, continuous improvement, practices, and principles.
  • Leads Garage Method ceremonies including Inception, Iteration Planning Meetings, standups.
  • Leads team in evolving agile and technical practices.
  • Supports the Product Owner in running playbacks.
  • Defines squad skill requirements with input from the architect and in conjunction with other Squad Leads.
  • Interviews and selects developers and SREs.
  • Helps articulate and raise team blockers.
  • Identifies and leads strategies for technical interdependencies with other teams/products/services working with the architect.
  • Leads the squad's design and coding decisions; engages Tribe Architects for architectural decisions and UX Designers for user experience decisions.
  • Serves as technical lead to the developers and makes key technical decisions. Responsible for code quality, test coverage, and adherence to standards.
  • Partners with the PO to lead the work of the squad, serving as the definitive technical and process leader.

Full Stack Developers

We aim to staff full-stack developers but have found them difficult to source; most have specialized in either front-end or back-end development. We have found that pairing a front-end developer with a back-end developer helps move both closer to being full-stack. Each Build Squad has about three pairs of full-stack developers. The responsibilities of the full-stack developers include:

  • Ability to communicate, consult and develop high-quality software in a team setting
  • Ability to code in a microservices architecture
  • Proficiency in the preferred front-end and back-end coding languages
  • Aware of DevSecOps leading practices in code design
  • Aware of security leading practices in code design
  • Writes production-quality code that meets user story requirements. Does not write code for future requirements.
  • Applies leading practices and patterns for design and code. Writes code by pairing with other developers.
  • Writes automated tests at all levels of the Testing Pyramid (unit, functional/integration, end-to-end, etc.) before writing implementation code
  • Adheres to coding standards and architectural decisions including meeting operational requirements.
  • Delivers code using Continuous Integration (CI).
  • Provides production support for their product/offering.
  • Minimizes technical debt by aligning development to business value and continuously refactoring code as new stories are fulfilled.
  • Articulates and raises blockers in a timely manner to Squad Lead and Product Owner.
  • Proactively develops resiliency and depth of expertise across the team through collaborative pairing.
  • Learns any new technologies used on the project and works independently to fill skill gaps, with the advice/recommendations of the Squad Lead.
  • Actively contributes to Squad interactions and ceremonies.
  • Recognizes the value of continuous improvement and seeks opportunities to improve.
  • Assists with interviews for potential developers
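As a small illustration of writing tests before implementation code, assume a hypothetical story: “display an account number with all but the last four digits masked.” The pair writes the tests first, watches them fail, then writes the simplest code that makes them pass. The story and the `mask_account` function are invented for this example; the test functions follow pytest conventions but also run directly.

```python
# Step 1: the pair writes the tests first, from the story's acceptance criteria.
# Run at this point, they fail with NameError because mask_account doesn't exist.
def test_masks_all_but_last_four_digits():
    assert mask_account("1234567890") == "******7890"


def test_short_numbers_are_left_as_is():
    assert mask_account("123") == "123"


# Step 2: the simplest implementation that makes both tests pass.
# Nothing is added for requirements the story doesn't state.
def mask_account(number: str) -> str:
    visible = number[-4:]
    return "*" * (len(number) - len(visible)) + visible


if __name__ == "__main__":
    test_masks_all_but_last_four_digits()
    test_short_numbers_are_left_as_is()
    print("all tests pass")
```

The implementation handles only what the tests require, in line with the bullet about not writing code for future requirements.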

Site Reliability Engineers (SREs)

The SREs are responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s). The SRE focus is on the Ops part of the DevSecOps strategy while the full-stack developers focus on the Dev part. There is 1 full-time pair of SREs in every build squad. The responsibilities of the SRE include:

  • Develops the initial on-call playbook and keeps it updated as the product evolves and as features are added, removed, or changed
  • Serves as the key resource for change management on the team, developing actionable plans to minimize outage risks tied to changes by focusing on progressive rollouts, quick and accurate troubleshooting, and efficient, reliable rollback of changes when required
  • Forecasts demand and plans for adequate (optimal) capacity to satisfy natural product usage cycles and surge demand from events such as marketing or promotional campaigns, without exceeding the computational budget agreed for the team
  • Ensures load-shifting is completed as required to address usage variations and scheduled maintenance windows.
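The capacity-forecasting responsibility can be sketched as a back-of-the-envelope calculation: size for the expected peak times a surge multiplier, add headroom, and check the result against the agreed budget. All parameters and numbers here are hypothetical; real SRE capacity planning typically rests on load testing and historical usage data rather than a single formula.

```python
import math


def plan_capacity(peak_rps: float, surge_multiplier: float,
                  rps_per_instance: float, headroom: float,
                  max_instances: int) -> int:
    """Instances needed to absorb a surge with headroom, within budget.

    Raises ValueError when the plan exceeds the agreed computational budget,
    the cue to renegotiate the budget or shift load instead.
    """
    expected_rps = peak_rps * surge_multiplier
    needed = math.ceil(expected_rps * (1 + headroom) / rps_per_instance)
    if needed > max_instances:
        raise ValueError(
            f"plan needs {needed} instances; budget allows {max_instances}")
    return needed


if __name__ == "__main__":
    # Hypothetical: 800 req/s peak, 1.5x campaign surge, 25% headroom,
    # 200 req/s per instance, budget of 10 instances.
    print(plan_capacity(800, 1.5, 200, 0.25, 10))  # 8
```

Under these assumptions a 1.5x campaign surge needs 8 instances, while a 2.5x surge would need 13 and exceed the 10-instance budget, which is the point at which the squad would renegotiate the budget or plan load-shifting.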

Some common questions

Who does Application Maintenance such as production bug fixes?

A Build Squad stays with its product. Once the squad is done building the product, it is responsible for adaptive maintenance and fixing issues. There is one product backlog, and production issues are added to it. The Product Owner prioritizes the squad's work, so a critical production issue can be moved to the top of the backlog.

To keep developers from getting bored, they can move from one Build Squad to another to gain experience with a different product. We have to be cautious when moving developers, because wholesale changes to a squad hurt its performance. A Build Squad could also be responsible for more than one product to ensure it is well utilized.

Who does the Ops in DevSecOps?

In many regulated industries, the dev and ops teams are separate teams to achieve a level of separation of duties. In some cases that is necessary. Many of my clients are seeing the benefit of bringing the dev and ops teams closer together and blurring the boundaries.

One of my clients has Build Squads push to production. Because the architecture is microservices-based and Cloud Native, with several resiliency patterns implemented, the blast radius is contained should something go wrong. They also use canary deployments, so only a small number of clients would be affected by a bad release, and blue-green deployments make a rollback a non-event. Finally, they follow the “you build it, you run it” approach: the Build Squads are on the PagerDuty list when things fail in production.
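The release flow described above, a canary ramp followed by a blue-green cut-over that can be reversed, can be sketched as a toy simulation. This is one possible way to combine the two techniques under stated assumptions: the `Router` class, the traffic steps, and the 1% error-rate threshold are hypothetical and stand in for whatever the client's platform (for example, a load balancer or service mesh) actually provides; the article does not say which tooling they use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Router:
    """Toy traffic router over two environments, 'blue' and 'green'."""
    live: str = "blue"       # environment serving most traffic
    canary_percent: int = 0  # share of traffic sent to the standby environment

    @property
    def standby(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def set_canary(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be in [0, 100]")
        self.canary_percent = percent

    def cut_over(self) -> None:
        # Blue-green switch: standby becomes live in one step. The previous
        # environment stays deployed, so calling this again is the rollback.
        self.live = self.standby
        self.canary_percent = 0


def canary_release(router: Router,
                   error_rate_at: Callable[[int], float],
                   steps: Sequence[int] = (5, 25, 50),
                   max_error_rate: float = 0.01) -> bool:
    """Ramp traffic to the new version (assumed already deployed to standby)."""
    for percent in steps:
        router.set_canary(percent)
        if error_rate_at(percent) > max_error_rate:
            router.set_canary(0)  # abort: only the canary slice was exposed
            return False
    router.cut_over()
    return True


if __name__ == "__main__":
    router = Router()
    healthy = canary_release(router, error_rate_at=lambda pct: 0.002)
    print(healthy, router.live)  # True green
    router.cut_over()            # roll back: blue is live again
    print(router.live)           # blue
```

The `error_rate_at` callback stands in for production monitoring. If the new version misbehaves at any step, only the canary slice has seen it; if a problem shows up after cut-over, calling `cut_over()` again returns traffic to the previous environment, which is still deployed.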

Lessons Learned

Here are three key lessons learned.

  1. Start with the Architecture and Platform Engineering Squads — in one of my engagements we started the Architecture, Platform Engineering, and Build Squads all at the same time. We found that the Architecture Squad was always struggling to keep up and never really managed to catch up. You want to start the Architecture and Platform Engineering Squads first and give them enough time to build out the Reference Architecture.
  2. Designing the squad is just as important as designing the product — we found that the structure of the squad was extremely important. Keep build squads small — ideally 8–10 people. When squads were larger everything took longer and productivity went down. Start with 3 pairs of developers and increase to 4 only once the squad matures. Also, have an SRE pair as part of the squad.
  3. Give Build Squads end-to-end ownership and accountability. The more ownership they feel and the more control they have the better the product they ship.
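As a rough check of the sizing guidance in the second lesson, the sketch below counts only the full-time roles described earlier (one Product Owner, one Squad Lead, the developer pairs, and one SRE pair) and leaves out the part-time architect and UX designer. Whether the Product Owner counts toward the 8–10 figure is not stated, so treat the totals as approximate; the function name is hypothetical.

```python
def squad_size(dev_pairs: int, sre_pairs: int = 1,
               product_owners: int = 1, squad_leads: int = 1) -> int:
    """Full-time headcount of a Build Squad; part-time roles are excluded."""
    return 2 * (dev_pairs + sre_pairs) + product_owners + squad_leads


if __name__ == "__main__":
    for pairs in (3, 4):
        size = squad_size(pairs)
        # Flag whether this count falls inside the 8-10 range from the text.
        print(pairs, size, 8 <= size <= 10)
```

On this count, three developer pairs give a squad of 10, the top of the recommended range, and a fourth pair brings it to 12.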

References

  1. “Google — Site Reliability Engineering”, Landing.google.com, 2019. [Online]. Available: https://landing.google.com/sre/. [Accessed: 29- Nov- 2019]
  2. T. Fernandes, “Spotify Squad framework — Part I”, Medium, 2019. [Online]. Available: https://medium.com/productmanagement101/spotify-squad-framework-part-i-8f74bcfcd761. [Accessed: 29- Nov- 2019]
  3. J. Kamer, “How to Build Your Own Spotify Model”, Medium, 2019. [Online]. Available: https://medium.com/the-ready/how-to-build-your-own-spotify-model-dce98025d32f. [Accessed: 29- Nov- 2019]
  4. “IBM Garage Methodology”, Ibm.com, 2019. [Online]. Available: https://www.ibm.com/garage/method. [Accessed: 29- Nov- 2019]
  5. “IBM Garage Method for Cloud”, Ibm.com, 2019. [Online]. Available: https://www.ibm.com/garage/method/cloud. [Accessed: 29- Nov- 2019]
  6. K. Brown, “GitArchitecture — a better way to capture Architectural decisions”, Medium, 2019. [Online]. Available: https://medium.com/@kylegenebrown/gitarchitecture-a-better-way-to-capture-architectural-decisions-b3574a3d604. [Accessed: 29- Nov- 2019]
  7. K. Brown, “Roles in a squad”, Ibm.com, 2019. [Online]. Available: https://www.ibm.com/cloud/garage/content/culture/practice-roles-in-a-squad/. [Accessed: 28- Feb- 2020]
  8. K. Brown and C. Vo, “Build effective squads”, Ibm.com, 2019. [Online]. Available: https://www.ibm.com/cloud/garage/content/culture/practice-building-effective-squads/. [Accessed: 28- Feb- 2020]

Thanks to Kyle Gene Brown and Dhruv Rajput for their helpful review of this article.

Written by Shahir A. Daya

Shahir Daya is CTO at Zafin and Former IBM Distinguished Engineer.
