What is your design budget?

We all want good long-term software architecture: build it right the first time. But some organizations fall into the trap of chasing a perfect design before they start building.

They include more stakeholders, try to plan for contingencies, and develop a high-availability strategy.

Those things are all good to have. But you don’t need them before you have users. If you have no users you shouldn’t be thinking about high availability. If your product doesn’t work yet you don’t need scalability.

Users > working product > scaling limitations > high availability

Another problem is having a committee to approve architecture proposals. One or two people is fine. But having three or more people who can block the start of the project is a recipe for pointless delays.

We switched to agile methodologies because it’s hard to know what the difficulties will be before you start building the software.

Be careful to structure your organization such that design doesn’t become more important than delivery.

A working product with customers is always more valuable than a product that doesn’t have customers but is highly available.

A design budget is a way to avoid falling into this trap. Set aside a fixed amount of time for up-front design: a week, a month, whatever fits the project. At the end of that time, enforce a hard stop on further design. No more committee reviews or deliberation. If the architecture isn’t developed enough to start building, do a spike on a smaller scale.

Polyglot programming is a bad goal 

People have stretched the goal of building microservice systems, so that we aren’t locked into one language, into a new goal: every repository should be able to use any language.

It’s phrased as ‘shouldn’t we use the best language for the job?’ But the implication is that we should be able to pick a language for this new project independent of the rest of our existing software stack.

The problem is that it’s not a free choice. Even if you’re in an ideal microservices environment where everything is API based, logging and metrics are handled by the platform, and shared libraries are banned, there are costs to having more languages in play.

The most basic cost is expertise splintering. If you only use one language and everyone knows it, then everyone can read and extend all the code. But as you start throwing in more languages and paradigms, that slowly becomes impossible.

You quickly reach a point where transferring people to a new team is expensive. Transferring from one Java project to another in the same stack is easy. Switching from Java to async JavaScript is trickier. Switching from Java to Haskell takes a while.

Polyglot makes away team work more difficult. And it does it slowly so you won’t really notice until it’s too late. 

Shared libraries are another area where polyglot sucks. You can’t always turn a library into an API.

Maybe you can’t externalize metrics and logging, or you have high-performance business logic that needs to run locally. Either way, you will often need to duplicate shared libraries across languages.

Polyglot is nice in a lot of ways. I’d rather use Pandas for data science when I have the opportunity, and we don’t need to use Java for everything. But when it comes to application code, I’d rather everyone used the same language, framework, and RPC schemas. It doesn’t make sense to write some microservices in Java, others in Haskell, some in Python, and a few in TypeScript. If they are all RPC or event-based services, you are just using four languages to do the same job instead of one language to do one job.

Beyond that, go ahead and write sidecars in Go or Rust, frontends in TypeScript, and data science code in Python. Those are all different use cases with different trade-offs.

Away team work

Away team work is a critical component of high performance software organizations. It is a way for high priority teams to work around other teams’ manpower constraints to deliver software. Without a well established culture of away team project work your organization will default to a standard of ‘shut up and wait while your item is in the backlog’. 

What is away team work exactly? Away team work is when your team implements a feature or integration in another team’s codebase. Strictly speaking in away team work the host team does not review or approve pull requests. Your team’s senior engineers will get approval at an architectural level then handle pull request review themselves. 

The away team process allows your team to unblock itself when dependencies don’t have manpower to spare. The key is that your team, not the dependency’s team, does the work: implementation, testing, and code review. Some effort is obviously required of the host team, but it should be minimized.

Away team work acts as a release valve on the conflict between the host team’s priorities and a client team’s goal of delivering features to customers. If the host doesn’t have manpower available, your team can provide the engineers to make it happen.

Without away team work, your organization will have to make more tradeoffs on the scheduling side. More often than not, you’ll have to pick a migration over the new feature, when away team work would have enabled your organization as a whole to deliver both.

What do you need to make away team work happen?

The key thing is that you have to make a formal process that specifies the maximum requirements a host team can impose. You can’t allow teams to be squeamish about it. If the requirement is that two senior engineers approve, that’s fine. Or maybe only senior engineers can do away team work in your company. That’s fine too, but you need a formal standard to avoid negotiations happening on a per-project basis.

The core problem of microservices: how many features can you fit through the pipeline before it breaks down?

At Amazon I had the chance to watch a monolithic service reach the point where it had to be split up into microservices. When I started we had about thirty software engineers contributing to our service. We had the great idea of developing a framework to speed up feature delivery for everyone in Alexa Shopping. The framework ended up working and the number of people contributing code to our service shot up over the next two years. As of July 2021 we had around 200-300 people in our partner support Slack channel.

What happened is that we gradually spent more and more time supporting tests in our CD pipeline. First we had an oncall who did operational support and pipeline release support. Then once that person got swamped and complaints about release frequency got louder we added a second person to operational support. Then we had one person doing releases and outages with a second doing partner pull requests and office hours. 

Growth continued during this time and we attempted a number of changes to federate features and split out responsibility. We split end-to-end tests into separate suites so it would be easier to find out who knew what a feature was supposed to be doing. This helped a lot. Prior to federating the test suites, our oncalls would spend a lot of time deep diving end-to-end tests to figure out what was going on. Afterwards it was a lot easier to find out who we could ask for help.
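As a rough illustration of what federated suites buy the oncall, here is a minimal Python sketch that groups failing end-to-end tests by the team that owns the suite. The suite names, team names, and the failures_by_owner helper are all hypothetical; this is not the tooling we actually used.

```python
from collections import defaultdict

# Hypothetical ownership map: end-to-end suite name -> owning team.
SUITE_OWNERS = {
    "checkout_e2e": "team-a",
    "reorder_e2e": "team-b",
    "wishlist_e2e": "team-c",
}


def failures_by_owner(failed_tests, suite_owners):
    """Group failed (suite, test_name) pairs by the team that owns the suite."""
    grouped = defaultdict(list)
    for suite, test_name in failed_tests:
        # Suites missing from the map land in an "unowned" bucket for triage.
        owner = suite_owners.get(suite, "unowned")
        grouped[owner].append(f"{suite}::{test_name}")
    return dict(grouped)
```

With a map like this, the oncall can go straight from a red pipeline run to the list of teams to ask, instead of deep diving each failing test.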

One thing that was a huge failure was expecting other teams to debug and fix their end-to-end tests in the pipeline. Typically, team A has a launch deadline and is asking us to deploy now. We get their code to the staging environment and we see test failures from teams B, C, and D. Teams B, C, and D do not have a launch deadline, so they are not prioritizing fixing their end-to-end tests.

Another big failure was splitting each team’s rules into a different repository. We ended up with ten repositories with one relevant file in each. It could have just been a folder in the original project. Plus it was a lot harder to figure out where to put things with ten different projects. The nail in the coffin from my perspective was that the rules were still deployed together.

One final issue was integration tests. We wanted people to write integration tests, but everybody (including myself) avoided it as much as possible. The reality was that the DSL for our end-to-end tests was significantly better than the one for integration tests. It was hard to reason about how you were testing your specific feature at the API level, while all our feature changes were directly testable at the end-to-end level. It was just a lot quicker and easier to write an end-to-end test than an API-level test.

Finally, we reached the point where we had to split up the monolith purely because the pipeline was blocking too many teams. It was a risk because in the short term we knew it would increase operational overhead and delay needed upgrades. Unfortunately, the team lost about 50% of its people, especially the most experienced ones. I was one of them, so I don’t know how things ended up.

What is Web3? It’s about owning your user identity.

There is a lot of discussion of Web3 lately. Is it a scam? Is decentralized finance Web3? Will Web3 destroy Web2? In this blog post we will talk about what Web3 is now and what Web3 allows us to build tomorrow. 

What is Web3 now? 

Right now Web3 is a technology paradigm which supports two main types of services: Decentralized Finance (Defi) and Non-Fungible Tokens (NFTs). A lot of online discussion about Web3 focuses on whether NFTs are valuable or if Defi is a scam. People want Web3 to be a scam. In my opinion that is the same as looking at pets.com in 2000 and concluding that Web 1.0 was a scam.

We are just starting to build software using Web3, and it’s going to be a while before someone creates the Amazon.com of the Web3 era. Right now we have NFTs and Defi. NFTs are an interesting way to sell ownership of digital goods. An NFT is not a JPEG on the blockchain. Strictly speaking, an NFT is a unique identifier on the blockchain. Through metadata you can attach any rights you want to that NFT. Is that useful? Are all NFTs worth $60 million? I don’t know, but some people think they are valuable. We will see what products come out of it over the rest of the decade.
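To make "a unique identifier plus metadata" concrete, here is a toy in-memory sketch in Python. The ToyLedger class, its method names, and the example addresses are all made up for illustration; a real NFT lives in a smart contract on a blockchain, not in a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLedger:
    """A toy stand-in for an on-chain NFT registry (hypothetical, in-memory)."""

    owners: dict = field(default_factory=dict)    # token_id -> owner address
    metadata: dict = field(default_factory=dict)  # token_id -> attached metadata

    def mint(self, token_id, owner, metadata):
        # The identifier must be unique: that is the "non-fungible" part.
        if token_id in self.owners:
            raise ValueError("token id already exists")
        self.owners[token_id] = owner
        self.metadata[token_id] = metadata

    def transfer(self, token_id, sender, recipient):
        # Only the current owner may hand the token to someone else.
        if self.owners.get(token_id) != sender:
            raise PermissionError("only the current owner can transfer")
        self.owners[token_id] = recipient

    def tokens_of(self, owner):
        # Any marketplace can answer this from the ledger alone.
        return sorted(t for t, o in self.owners.items() if o == owner)
```

The point of the sketch is that the image itself never appears: the ledger only records which identifier belongs to which address, and the metadata says what that identifier entitles you to.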

How about Defi then? There are two main Defi products I know about: yield farming and token swapping. Yield farming is the idea that you give your tokens to a smart contract, and then the smart contract gives you interest in return. In my opinion yield farms are mostly scams. Yield farms are evolving to combine the characteristics of gambling and Ponzi schemes. Does anyone really believe they can earn 10% interest per day by investing their money into Drip? I don’t think there is any real product that yield farms produce. Some claim they provide liquidity, but liquidity does not double dumb money every year.

However, token swapping is a real product. Decentralized exchanges allow you to swap your tokens for any of the myriad cryptocurrencies out there, in seconds and with relatively low fees. That is useful in a world with incredibly large numbers of tokens.

How is Web3 different from Web2? 

Web2 is probably best exemplified by Facebook. You have a Facebook account, and using that account you can post on Facebook and Instagram, message your Facebook friends, sell things on the Facebook marketplace, etc. Facebook owns your account data. If you want to rebuild your friend graph on another platform, it’s going to take a ton of effort. And if you don’t follow Facebook’s rules, they can simply delete your account. In a core way you have created a digital identity using Facebook’s platform, and Facebook owns it.

The key difference between Web2 and Web3 is that in Web3 you own your digital identity. Your cryptocurrency wallet is now your digital identity. Specifically, your wallet’s public key is now your digital identity. This is very different from Web2. In Web2 you use other people’s services and your identity lives in those services. You create a Facebook account using your Gmail account. Your root identity is just an entry in somebody else’s database. 

Why does owning your online identity matter? 

First of all, getting banned is much less of a concern. You control your online identity via ownership of your private key. If you buy NFTs via opensea.io and are subsequently banned from opensea.io, all the NFTs you bought remain your property. In contrast, if you bought League of Legends skins and were banned, you would lose your entire investment. Does it matter if NFTs and League of Legends skins are really valuable? No, but you definitely spent money on them. And you would be pissed if your $300 JPEG was lost.

Web3 is the difference between being a guest on Facebook’s servers and being a citizen of the internet. Being a guest is nice because Facebook takes care of everything. But guests don’t get a say in how they are treated.

It is hard to tell how big of a deal owning your digital identity will be in the future. So far we have Defi and NFTs as examples. What’s good in the Defi and NFT world? My favorite is how user accounts work. You may never have used opensea.io; maybe you bought an NFT from another platform. But you can still log in to their platform. They will ask for your email, but it’s not required. All they need is for you to prove ownership of your public key and they can populate your account data from the blockchain. Don’t like the UI or the fees? Log out of opensea.io and find another NFT marketplace. All your NFTs will be there too.

The process of logging in and out of websites is totally different because you own your digital identity. You don’t need to use an email to prove you are a real person. You don’t need a password because you just sign a message with your private key to prove your identity. The biggest losers of Web3 may very well be password managers. 
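To show what "sign a message with your private key" means in practice, here is a toy challenge-response login in Python. It uses a simplified Schnorr-style signature over plain modular arithmetic so that it runs with only the standard library. That is my substitution for illustration: real wallets use elliptic-curve signatures (Ethereum, for example, uses ECDSA over secp256k1), and the parameters and function names below are hypothetical and not secure.

```python
import hashlib
import secrets

# Toy parameters for illustration only. NOT secure, not what real wallets use.
P = 2**255 - 19  # a well-known prime modulus
N = P - 1        # exponents can be reduced mod P - 1 (Fermat's little theorem)
G = 2            # base element


def _challenge_hash(r, message):
    """Hash the commitment r together with the message into an exponent."""
    data = r.to_bytes(32, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def generate_keypair():
    """Return (private_key, public_key); the public key is the identity."""
    private_key = secrets.randbelow(N - 1) + 1
    return private_key, pow(G, private_key, P)


def sign(private_key, message):
    """Produce a Schnorr-style signature (r, s) over the message bytes."""
    k = secrets.randbelow(N - 1) + 1  # fresh random nonce for every signature
    r = pow(G, k, P)
    e = _challenge_hash(r, message)
    s = (k + e * private_key) % N
    return r, s


def verify(public_key, message, signature):
    """Check G^s == r * public_key^e (mod P) without knowing the private key."""
    r, s = signature
    if not 0 < r < P:
        return False
    e = _challenge_hash(r, message)
    return pow(G, s, P) == (r * pow(public_key, e, P)) % P


def new_login_challenge():
    """The site sends a random one-time message for the wallet to sign."""
    return ("login:" + secrets.token_hex(16)).encode()
```

The site never stores a password. It issues a fresh challenge, the wallet signs it, and the site checks the signature against the public key the user claims to own.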

Conclusion

The interesting part of Web3 is that the users own their digital identities. It will be a paradigm shift in how the internet works, with some control moving from software platform providers (Facebook, Google, etc.) to the end users. It’s not a scam: you really will be able to control your digital identity. But Web2 is not going anywhere anytime soon. Email isn’t going to stop relying on someone else’s machine. But you might finally be able to encrypt your emails because you know your friends’ public keys.