Posts by drtooraj

I love systems

Pseudo Gravity’s First Department

Megan was very excited that she had built the first resource in their organization, the IoT hub. She also wanted to follow a pattern that could easily be extended as new departments were added to the organization. As mentioned earlier, she wanted to build three tools that would simplify creating new resources. At the same time, she wanted to make sure she spent her time building tools that directly helped build the floating car prototype.

The next morning she met John to show him the progress she’d made setting up the hub, which pleased him a lot. He also demoed the progress he’d made installing 1000 wireless sensors on the Tesla, and he was ready to connect them to the IoT hub. They spent a few hours looking at different ways to connect the sensors to the hub and made several decisions. Just before they left to grab lunch, Megan explained to John what she had in mind for extensibility. John liked the idea but, like Megan, wanted both of them to stay focused on building the prototype, so they decided to add Megan’s friend Ali to the team. Ali, a USC graduate, had been Megan’s coworker at Microsoft, had strong DevOps skills, and was a certified Azure architect. Megan called Ali and asked if he could join them for lunch. Ali was able to meet them, and it did not take long to convince him to join PG and lead the building of the internal tools they needed. They decided to establish a department called “internal” and made Ali director of software development in charge of it.

The next afternoon Ali and Megan met at a nearby coffee shop to brainstorm the internal tools. After a few hours, they came up with the following rules to apply to all departments:

Governance Rules

These rules specify who can do what and are enforced by creating specific groups with strict permissions in VSTS.

  1. Keep a single VSTS project called ITAC so that the groups created for different departments can be shared.
  2. Create a separate repo for each department/asset combination.
  3. For each new department, create four groups to manage development, testing, project management, and ownership of the resources. Groups are named using the pattern [Department][Role]. For example, the developers in the internal department belong to the InternalDev group.
  4. Apply these relationships among the groups:
    1. [ParentDept][Role] is a member of [ChildDept][Role]. This ensures that the parent department has all the permissions granted to its children. For example, OrgOwner is a member of InternalOwner, giving Megan ownership of all resources in the internal department.
    2. [Department]Owner is a member of all the other groups within the same department, so that the owner has all the permissions the other members of the department have. For example, Ali, as a member of InternalOwner, can write code (inherited from InternalDev), modify the release definition (inherited from InternalPM), and approve pushing a release to Prod (inherited from InternalQA).
  5. A member of [Department]DEV must manually start a deployment to DEV (there is no CD for IAC), and the deployment must be approved by a member of [Department]PM. A member of [Department]QA must pre-approve a QA deployment (to confirm she is fine with replacing the existing infrastructure that might be under test). Members of [Department]QA, [Department]PM, and [Department]Owner must all pre-approve a PROD deployment (so that the IAC has been vetted by QA and the owner agrees to push to production, which requires coordinating with other departments ahead of deployment).
  6. The following rights are defined for members of the four groups:
    • DEV: write access to the repo
    • QA: read access to the repo, creating test runs, co-pre-approving QA and PROD deployments
    • PM: managing work items, creating build and release definitions, co-pre-approving PROD deployments
    • Owner: full permission for all VSTS actions on department repos, co-pre-approving PROD deployments
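
Rule 4 is effectively a small graph problem: a group inherits the permissions of every group it belongs to, directly or transitively. The Python sketch below models that under a few assumptions: the department tree is a hypothetical dict mapping each department to its parent, the helper names are made up for illustration, and it does not reflect how VSTS evaluates permissions internally (in VSTS the nesting itself would be configured as group membership).

```python
from typing import Dict, Optional, Set

# The four roles from rule 3; group names follow [Department][Role].
ROLES = ("Dev", "QA", "PM", "Owner")


def department_groups(dept: str) -> Set[str]:
    """Return the four group names for a department, e.g. InternalDev."""
    return {f"{dept}{role}" for role in ROLES}


def build_memberships(parent_of: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """Map each group to the groups that are its direct members.

    Rule 4.1: [ParentDept][Role] is a member of [ChildDept][Role].
    Rule 4.2: [Department]Owner is a member of every other group in its department.
    """
    members: Dict[str, Set[str]] = {}
    for dept, parent in parent_of.items():
        for role in ROLES:
            group_members = members.setdefault(f"{dept}{role}", set())
            if parent is not None:
                group_members.add(f"{parent}{role}")
            if role != "Owner":
                group_members.add(f"{dept}Owner")
    return members


def effective_groups(group: str, members: Dict[str, Set[str]]) -> Set[str]:
    """Return every group whose permissions `group` inherits, including itself."""
    # Invert "group -> its members" into "member -> groups it belongs to".
    member_of: Dict[str, Set[str]] = {}
    for container, direct_members in members.items():
        for member in direct_members:
            member_of.setdefault(member, set()).add(container)

    # Follow memberships transitively with a depth-first walk.
    seen = {group}
    stack = [group]
    while stack:
        current = stack.pop()
        for container in member_of.get(current, set()):
            if container not in seen:
                seen.add(container)
                stack.append(container)
    return seen


if __name__ == "__main__":
    # Hypothetical tree: Internal is a child department of Org.
    memberships = build_memberships({"Org": None, "Internal": "Org"})
    print(sorted(effective_groups("InternalOwner", memberships)))
    print(sorted(effective_groups("OrgOwner", memberships)))
```

With Internal as a child of Org, InternalOwner (Ali) ends up with the permissions of all four internal groups, while OrgOwner (Megan) reaches all eight Org and Internal groups.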

The figure below shows the VSTS groups and permissions:

ITAC Rules

These rules define how to name, build, and release infrastructure resources and applications.

  1. The IAC file follows this naming convention: [Department].[Solution].Arm. For example, the infrastructure for CloudOrg is called Internal.CloudOrg.Arm.
  2. IAC provides the same topology for all environments while allowing for variations in resource size. For example, a web app could use a basic edition of a database in DEV and a standard edition in PROD. To allow this, they decided to have a separate parameter file for each environment (similar to having separate configuration files for applications). Parameter files are named [ResourceName].parameters.[ENV].json. For example, the web app parameter file in DEV is called WebSiteSQLDatabase.parameters.DEV.json.
  3. Both AAC and IAC can reside in the same solution.
  4. Infrastructure and applications have separate release definitions, since they are released at different frequencies (infrastructure is deployed much less frequently than the application). Release definitions are named [Department]-[Solution]-[Type], where Type is either ARM (for infrastructure) or APP (for the application).
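
Because these names are purely mechanical, they are easy to generate rather than type by hand. Here is a minimal Python sketch of the three conventions; the function names and the DEV/QA/PROD environment list are assumptions for illustration, not part of any VSTS or Azure API.

```python
# Environments and release types named in the ITAC rules (assumed fixed lists).
ENVIRONMENTS = ("DEV", "QA", "PROD")
RELEASE_TYPES = ("ARM", "APP")


def iac_name(department: str, solution: str) -> str:
    """Rule 1: [Department].[Solution].Arm"""
    return f"{department}.{solution}.Arm"


def parameter_file_name(resource_name: str, env: str) -> str:
    """Rule 2: [ResourceName].parameters.[ENV].json, one file per environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{resource_name}.parameters.{env}.json"


def release_definition_name(department: str, solution: str, release_type: str) -> str:
    """Rule 4: [Department]-[Solution]-[Type], where Type is ARM or APP."""
    if release_type not in RELEASE_TYPES:
        raise ValueError(f"unknown release type: {release_type!r}")
    return f"{department}-{solution}-{release_type}"


if __name__ == "__main__":
    print(iac_name("Internal", "CloudOrg"))
    for env in ENVIRONMENTS:
        print(parameter_file_name("WebSiteSQLDatabase", env))
    for release_type in RELEASE_TYPES:
        print(release_definition_name("Internal", "CloudOrg", release_type))
```

Rejecting unknown environments and release types keeps a misspelled name from silently producing a file or definition that nothing else refers to.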

The following figures show the ARM templates and parameter files, along with the corresponding release definition:


Ali started applying these rules in VSTS and promised to have a first version of an app they could use to visualize their IT organization in the cloud. Megan suggested calling the app CloudOrg, since it could be used to visualize their entire organization. Ali liked the name and drew some wireframes to show what he had in mind for CloudOrg. They spent some time discussing various options. Around 8 PM, Megan felt a bit tired and said, “It has been a long day for me and I am about to leave. How about you?” Ali said, “Well, it is too early for me to stop working!” He then rolled up his sleeves and began coding CloudOrg. Megan giggled and said, “Hasta mañana.”


Startup ITAC

PG needed to build the car-balancing model as soon as possible so they could give a demo to investors who were impatiently waiting to see a real-world example. John and Megan had picked Megan’s car, a Tesla Model X, for the demo. Therefore, the first IT assets Megan decided to build were a set of PaaS IoT services that could collect data from the wireless sensors attached to the Tesla.

At this stage of development, since no infrastructure is involved, one could easily go to the cloud portal and provision the required resources with a few clicks. But our goal is to build ITAC from the ground up, which means doing everything according to the golden standard we have defined: everything as code. The only exception to this golden rule is setting up the subscription itself, which has to be done manually. For BizSpark specifically, the person who applied receives an email with instructions for setting up the subscription, and once that is done she becomes the global admin. The rest of the employees are added as regular users as needed.

Subscription Management

Megan went ahead and set up the subscription and became the global admin. She then did the following:

  1. Registered PG’s own domain, pseudogravity.io, which she had already bought from GoDaddy.com.
  2. Added herself as megan@pseudogravity.io and made her both the global admin and the subscription owner.
  3. Added John as a user, john@pseudogravity.io.

DevOps

With the users set up in Azure, she began building PG’s ITAC. She started by creating a free VSTS account called pseudogravity.visualstudio.com. Since she was the global admin of the Azure subscription, the VSTS account was automatically connected to the Azure subscription.

She then added the first project to the VSTS account and called it ITAC; it would hold the entire definition of PG’s IT for years to come.

Under ITAC she created a repository called Org to contain the top-level assets belonging to the CTO. To control access to this repo, she added the following groups and, as the only user, added herself to all of them.

  • OrgOwner: Has full access to all resources within the organization.
  • OrgDEV: Can update the ARM templates.
  • OrgQA: Can approve a release to QA and co-approve a release to Prod.

So far, VSTS looked like this:


The next step was to set up the release pipeline, which normally includes three environments: DEV, QA, and Prod. Whenever the ARM templates are updated, a DEV release is triggered automatically. Deploying to QA requires approval from OrgQA, to make sure they are ready to test, and deploying to Prod requires approval from both OrgQA and OrgOwner. At this point, that meant Megan would play all of these roles herself.
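
In VSTS these gates would be configured as pre-deployment approvals on each environment of the release definition. As a way to reason about the rules, the Python sketch below checks whether a deployment may proceed; the data layout and the `can_deploy` function are hypothetical, and the group names follow the Org groups listed above.

```python
from typing import Dict, Iterable, Set, Tuple

# Groups whose approval each environment needs before deployment.
REQUIRED_APPROVALS: Dict[str, Set[str]] = {
    "DEV": set(),                   # triggered automatically when ARM templates change
    "QA": {"OrgQA"},                # QA confirms they are ready to test
    "Prod": {"OrgQA", "OrgOwner"},  # both groups must approve
}


def can_deploy(
    environment: str,
    approvals: Iterable[Tuple[str, str]],
    memberships: Dict[str, Set[str]],
) -> bool:
    """Return True if every group required for `environment` has a valid approval.

    `approvals` holds (user, group) pairs; an approval only counts if the user
    actually belongs to the group they approved on behalf of.
    """
    satisfied = {
        group for user, group in approvals if group in memberships.get(user, set())
    }
    return REQUIRED_APPROVALS[environment] <= satisfied


if __name__ == "__main__":
    # At this stage Megan is the only user and belongs to every Org group.
    members = {"megan": {"OrgOwner", "OrgDEV", "OrgQA"}}
    print(can_deploy("DEV", [], members))                                         # True
    print(can_deploy("Prod", [("megan", "OrgQA")], members))                      # False
    print(can_deploy("Prod", [("megan", "OrgQA"), ("megan", "OrgOwner")], members))  # True
```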

Once Megan finished this, she began adding the ARM templates for the IoT hub that would collect data from the sensors. She did some reverse engineering here: she first configured an Azure IoT hub in the Azure portal, then copied the generated template and parameters JSON files into a new ARM project she added in Visual Studio. She decided to use the following conventions to organize ARM projects and release definitions:

  1. The solution is called Org.
  2. Each resource group will have a corresponding project in Org. For example, all the resources built for collecting and processing balancing data from sensors will be inside a project called balancingData.
  3. There will be a separate release task for each resource group.
  4. If a resource group contains multiple resources, each resource will be added to a separate template file. A master template, used by the release task, references the other templates via URLs pointing to their location in the VSTS repo. (This was still to be tested once she added more resources, since at this point she had only a single resource, the IoT hub.)
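
ARM supports this pattern through linked templates: the master template contains a nested deployment resource of type Microsoft.Resources/deployments whose templateLink.uri points to a child template. The Python sketch below builds such a master template as a dictionary and prints it as JSON; the deployment name, the example.com URL, and the apiVersion value are assumptions, so check them against the current ARM template reference before relying on them.

```python
import json
from typing import Any, Dict

# Schema URL used by ARM deployment templates.
SCHEMA = "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#"


def master_template(linked_templates: Dict[str, str]) -> Dict[str, Any]:
    """Build a master ARM template that deploys each linked template by URI.

    `linked_templates` maps a deployment name to the URL of a resource template.
    Each entry becomes a nested deployment (Microsoft.Resources/deployments)
    whose templateLink points at that URL.
    """
    resources = [
        {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2017-05-10",  # assumed API version; newer ones exist
            "name": name,
            "properties": {
                "mode": "Incremental",
                "templateLink": {"uri": uri, "contentVersion": "1.0.0.0"},
            },
        }
        for name, uri in linked_templates.items()
    ]
    return {"$schema": SCHEMA, "contentVersion": "1.0.0.0", "resources": resources}


if __name__ == "__main__":
    # Hypothetical URL; Resource Manager must be able to fetch it over HTTP(S).
    template = master_template({"iotHub": "https://example.com/balancingData/IoTHub.json"})
    print(json.dumps(template, indent=2))
```

Resource Manager fetches linked templates itself over HTTP(S), so the URL must be reachable from Azure. A private repo URL may not be directly usable; copying the templates to a storage account and linking them with a SAS token is a common workaround.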

The figure below shows what Megan had achieved so far:


Megan felt very content that she had built the first piece of what would soon be extended to define their entire IT. But before going to John to break the great news, she thought about adding three more things:

  1. Thinking about how to extend what she had accomplished for Org to the departments they would add in the future, she decided to research how to automate building all the necessary pieces for new projects, including provisioning the repository, adding the necessary groups, and creating the release pipeline, since all of them needed to follow the same pattern as Org.
  2. She thought of creating a web portal that provided a hierarchical view of their entire organization. What she had in mind was an org-chart tree where she could start at the top (the Org) and drill down into departments and sub-departments to view their allocated resources and associated groups, which together represent their ITAC. She wanted to build this by extracting metadata from both Azure and VSTS.
  3. To have a single point of management, she thought of creating an Azure dashboard per department showing resource consumption costs, status (working, alerts, or potential failures), and the department’s manager.
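
The org-chart portal in item 2 is, at its core, a tree walk. The sketch below hard-codes a hypothetical two-level tree instead of extracting metadata from Azure and VSTS, and assumes each department exposes four groups named with the [Department][Role] pattern; the class, field, and resource group names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List

# Role suffixes assumed for the [Department][Role] group naming pattern.
ROLES = ("Dev", "QA", "PM", "Owner")


@dataclass
class Department:
    """One node of the org chart (hypothetical model, hard-coded here)."""

    name: str
    resource_groups: List[str] = field(default_factory=list)
    children: List["Department"] = field(default_factory=list)

    def groups(self) -> List[str]:
        return [f"{self.name}{role}" for role in ROLES]


def render(dept: Department, depth: int = 0) -> List[str]:
    """Depth-first walk that returns one indented line per piece of metadata."""
    indent = "  " * depth
    lines = [f"{indent}{dept.name}", f"{indent}  groups: {', '.join(dept.groups())}"]
    if dept.resource_groups:
        lines.append(f"{indent}  resource groups: {', '.join(dept.resource_groups)}")
    for child in dept.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    org = Department("Org", ["balancingData"], [Department("Internal", ["cloudOrg"])])
    print("\n".join(render(org)))
```

A real portal would populate the same nodes from Azure and VSTS metadata and render them as an expandable tree rather than indented text.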


ITAC at PseudoGravity

To follow the evolution of ITAC, I have decided to use a fictitious story about a startup called PseudoGravity. These days Azure and AWS are very generous in supporting promising startups, and both grant free multi-year subscriptions to them, so it makes perfect sense for an early-stage startup to build its IT in the cloud rather than burn personal funds on physical hardware.

Let’s begin by telling the exciting story of PseudoGravity. John was one of those people who locked himself in the lab day and night, going through pounds of chalk writing loads of mathematical equations about gravitational waves on a huge blackboard. For a PhD student in MIT’s physics department, this was not considered abnormal by any means. However, what made John unique was the experiment he did that cold early morning in his lab in Cambridge. That night it had snowed heavily, turning the entire city white and shutting down the T’s Red Line, which John used to take home. When John finally left his lab around 6 AM, wondering how to get home without public transportation, he could not believe what he saw: there was no snow within a 50-foot radius of his lab. All the snow was floating in the air instead. He almost fainted when he realized what had happened: he had discovered how to generate pseudogravity waves in his lab, and they had kept the snow in the air. For the next few months he was overwhelmed by the sheer amount of interest and intrusion he was exposed to from all around the world.

Finally, a year later, when things had quieted down and he had successfully defended his dissertation on anti-gravitational waves, he met his best undergrad friend Megan at Philz Coffee in San Francisco on a beautiful day in early August. Megan had started working at Microsoft Research after graduating from Stanford. After his first sip of coffee, John opened the conversation with his grand vision: “I want to build a city floating in the sky that could save humanity from natural disasters like earthquakes and floods forever.” Megan smiled, held John’s hand firmly, and said, “Let’s do it.” The following week, John and Megan started working at their startup, PseudoGravity, or PG for short. John, the CEO, was in charge of building a prototype that could hold a car in the sky, while Megan, the CTO, was in charge of programming and IT. They needed to perform a massive amount of data analysis to build an accurate mathematical model for keeping objects balanced in the sky. They also needed a lot of wireless sensors attached to the floating objects to control their position in the air. The stream of data collected from the sensors would be fed to a big data server through an IoT solution. Finally, they needed to design a central system, which they called “the brain”, to calculate the positions of objects and avoid collisions. Given the amount of hardware needed to run all the components, Megan decided to build their IT in the cloud. A few weeks after applying to BizSpark, Microsoft’s startup program, PG was granted a free 5-year subscription on Azure.

Such an inspiring story already tempts me to leave ITAC completely aside and just focus on finishing the PG story. However, I am going to work on both at the same time. We will see how ITAC evolves to define PG’s IT department as it grows from a two-person startup to an organization with more than 10,000 employees and 20 departments all over the world. As the startup grows and requires more IT assets to support that growth, I expect ITAC to evolve smoothly in parallel. As we move along, I will also try to build tools that can be used to build the organization in the cloud and transition it to the next stage as it evolves.

Although not as exciting or grandiose as John’s, my vision is also a long shot: I am trying to build a theory, backed by tangible assets, for systematically designing, building, and running an IT department of any size in the cloud.

Cloud-Organization aka IT-as-Code

Coming from a systems engineering background and having worked as a software engineer and manager for 15 years, I have always been interested in how to systematically run the IT departments of the largest enterprises in the world. Public clouds have made it possible to systematically run an IT department of any size by allocating resources via code and defining governance in an exact and tangible manner. These two adjectives, exact and tangible, are of the utmost importance. I will briefly talk about each in this post, but first I will state my vision for this series of cloud-organization blogs.

In the past, an IT department’s assets, the software and the hardware, were treated as different types of resources and managed separately; in the cloud world, they have converged into the same type of resource. These days allocating a set of servers or a cluster of databases is no different from developing a set of enterprise applications. We can use Infrastructure-as-Code (IAC) to define the infrastructure, including all the security rules and policies, which are also defined as code, aka Security-as-Code (SAC). IAC, SAC, and applications (I call the latter Application-as-Code or AAC to be consistent and avoid confusion from now on) are all kept in a code repository (GitHub, for example), built and tested using the continuous integration (CI) pipeline, and deployed to various environments (DEV, QA, PROD) using the continuous deployment (CD) pipeline. A modern IT organization treats all of its assets as code. This new way of conceiving IT is revolutionary and makes administration tasks like business continuity and disaster recovery (BCDR) as simple as deploying the latest version of an application to production.

I can now specify my vision:

  1. To code the entire IT department (I call this IT as Code or ITAC) and
  2. To specify hierarchies of IT staff each in charge of architecting, developing, maintaining, and releasing a specific level of the ITAC.

Based on this vision, the CTO’s role is specifically to set up governance, that is, to determine who is in charge of which portion of ITAC. This definition is exact, since all the responsibilities of a given unit head are coded (I call this Unit-as-Code or UAC), and tangible, since releasing a UAC produces hardware and software assets that are managed and owned by the unit. Management is also precisely and consistently defined by IT best practices such as agile or continuous software delivery. This way, we can say the head of each unit is in fact the delivery manager of his or her UAC.

How the entire ITAC is split among managers and how their efforts are coordinated is what I will explore in this series of blogs.