Seviipay – cryptocurrency payments for everyone

I’m working on a new project. Seviipay is a software-as-a-service product that I hope will make blockchain integration easy for everyone. It is in free alpha right now, and you can check it out at www.seviipay.com.

Why blockchain integration? And why now?

I first got involved with Ethereum back in 2015, before the network went live. It was a fun toy then, and I spent some time getting the test version running on my gaming desktop in college. Then I got bored of it and stopped paying attention until the 2017/18 bubble. The story would have ended there, except one of my friends did an NFT project this spring and I decided to buy one to support him. That led me to the UIs of opensea.io, veefriends.com, and other web3 applications. Let me tell you, the user experience has come a long way in the last six years.

Web3 applications have serious advantages over the traditional web. Digital identity is a billion-dollar business on the traditional web; in web3, everyone has identity built in. There are no accounts and no signups in web3. Your Ethereum wallet is your account on every web3 site. You don’t need to remember passwords for hundreds of websites; you just use your wallet to verify your identity and you are done.
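To make that concrete, here is a minimal sketch of wallet-based login. This is not Seviipay code, just the standard pattern against the provider (EIP-1193) that browser wallets like MetaMask inject as `window.ethereum`; the challenge message and site name are invented for illustration:

```typescript
// Hypothetical sketch: sign-in with an injected browser wallet.
const ethereum = (window as any).ethereum;

async function signIn(): Promise<{ address: string; signature: string }> {
  // Ask the wallet for the user's account; the wallet shows a consent prompt.
  const [address]: string[] = await ethereum.request({
    method: 'eth_requestAccounts',
  });

  // Have the user sign a challenge. A server can recover the signing address
  // from this signature, proving ownership without any password.
  const challenge = `Sign in to example.com at ${new Date().toISOString()}`;
  const signature: string = await ethereum.request({
    method: 'personal_sign',
    params: [challenge, address],
  });

  return { address, signature };
}
```

No signup form, no password database; the signature itself is the proof of identity.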

Payments in web3 are comparable in ease of use to Apple Pay, possibly superior. Your browser wallet can authorize payments instantly. There is no need to enter your address and worry that you typed something wrong. Additionally, you never need to worry about your credit card number being stolen after you place an order at a random website. Making a payment via the Ethereum network does not leak secret payment information.
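The payment flow is similarly small. Here is a hedged sketch of what a one-click crypto payment looks like with an injected wallet. Again, this is illustrative rather than Seviipay’s actual integration, and the recipient address and price are placeholders:

```typescript
// Hypothetical sketch: a one-click payment through an injected browser wallet.
const ethereum = (window as any).ethereum;

async function pay(): Promise<string> {
  const [from]: string[] = await ethereum.request({
    method: 'eth_requestAccounts',
  });

  // 0.01 ETH expressed in wei, as a hex string per the Ethereum JSON-RPC spec.
  const valueWei = '0x' + (10n ** 16n).toString(16);

  // The wallet shows a confirmation dialog; no card number, billing address,
  // or other secret is ever typed into the merchant's page.
  const txHash: string = await ethereum.request({
    method: 'eth_sendTransaction',
    params: [
      { from, to: '0x1111111111111111111111111111111111111111', value: valueWei },
    ],
  });
  return txHash; // the merchant can watch the chain for this transaction
}
```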

I’m super excited about the potential of web3 to improve identity and payments on the internet. The idea behind Seviipay is to make top-tier web3 UX easy to access while also providing ways to link actions on the blockchain to the rest of the world.

You can use Seviipay to set up a payment button on your website and get an email whenever anyone places an order. I’m working on a Zapier integration so that you can get a Slack message or automatically add someone to a Teachable course when they make a purchase.

A few things I learned after ~3 years at Amazon

“Two is better than zero” is definitely harmful.

The phrase “two is better than zero” (a Jeff Bezos line) drives a lot of the technical dysfunction inside Amazon. I typically explain it to people by saying “Amazon has three internal tools for every job, with one tool’s worth of documentation.” Which is basically true: for every internal problem we had somewhere between zero and three-plus tools to tackle it. When our team came upon an issue we would review the internal solutions; typically one would be deprecated, another would have poor documentation, a third would only sort of fit our use case, and the AWS service would have too high latency.

I’ll describe a few cases to help convey the insanity. 

I worked in Alexa, which has the concept of a ‘Prompt’: some text that Alexa reads aloud to customers. There is a central tool that governs prompts, but it expects you to manually promote prompts from dev to prod. So our team built a ‘prompt pipeline’ service that automates prompt promotion via the central service’s API. One of the engineers on our team got promoted off of that project. About six months later, I got an email congratulating another engineer on his promotion, with the comment that he had built a prompt pipeline for his team. It isn’t like our teams were far apart from each other; the second team actually contributed code to the service my team operated. There was basically zero reason to build another prompt pipeline. We should have added the features to the central service for everyone.

The test accounts situation inside Amazon is another area of craziness. There are at least three different services that can create accounts; only one of them actually works, so we depended on that one. Everything was great: we automatically created accounts and ran end-to-end tests. Months later, we started hitting throttling limits on our account creation. We did some research, and it turned out our accounts were being flagged for fraud and terminated. I reached out to a few fraud teams to find out what was happening. Well, our test accounts looked a lot like accounts using stolen credit cards and were getting flagged for that reason. I asked if they could whitelist accounts made by the standard test account service we used. They said they had no support for that and recommended manually curating test accounts.

Amazon doesn’t have a tool like Splunk internally. You can’t search across all your services and get logs that way. The industry standard is to have great log search, to the point that free tools like Kibana support it. Inside Amazon we have ~5 different tools that provide various levels of log viewing and searching. About half of them exist to facilitate using grep to find errors. There is at least one service that supports log searching, but it is honestly rather difficult to figure out and often doesn’t work.

Amazon has multiple infrastructure-as-code tools. We have CloudFormation and the CDK (Cloud Development Kit), which are AWS-sponsored public tools. But we also have a Ruby-based tool (no, not Chef) that does infrastructure as code. Then we have another tool based on YAML. By the end of my time at Amazon, my team supported services with components defined across four different infrastructure-as-code tools.
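For anyone who hasn’t touched infrastructure as code, here is roughly what the CDK flavor looks like. This is a minimal sketch using the public aws-cdk-lib, not any internal Amazon tooling, and the stack and bucket names are arbitrary:

```typescript
// Minimal AWS CDK app: declares an S3 bucket in TypeScript, then synthesizes
// it into a CloudFormation template. Sketch only; names are made up.
import { App, Stack } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';

const app = new App();
const stack = new Stack(app, 'ExampleServiceStack');

// Infrastructure expressed as ordinary code, reviewable and versioned
// like everything else in the repository.
new Bucket(stack, 'ServiceLogsBucket', { versioned: true });

app.synth(); // emits the CloudFormation template to cdk.out/
```

Now imagine four different tools like this, each with its own syntax and deployment story, in one team’s services.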

The tooling situation inside Amazon isn’t great, mostly due to this “two is better than zero” philosophy. There are cases where it is awesome to just be able to build what you need, and my team took advantage of that. But over the years, when every tool you use sucks in multiple easily fixable ways, you start to realize the cost you are paying.

Document-driven meetings are great

Amazon takes document-driven meetings seriously. There is always a ‘doc’, hosted in Quip, a tool like Google Docs. Everyone on the team can read and comment on the document easily. The first 15-20 minutes of a meeting are devoted to reading the document; the rest is spent going through the comments and discussing it. It is great, and you never have to worry about getting people to read the document before the meeting. There are never any PowerPoints or scenarios where you have to listen to someone describe something at one-fifth the speed you could have read it. Another benefit is that you end up with some documentation for every architectural change or new project. If your organization relies on brainstorming and whiteboarding for architectural changes, you may find those decisions never get documented. At Amazon that isn’t a problem, because you have to have a document if you want to share your new approach.

While great overall, Amazon’s document-driven culture has some shortcomings. The first is that there is no standard way to capture meeting notes. If agreements were made during the meeting, the task of recording the decision usually fell to whoever wrote the document.

Another shortcoming is that no one figured out how to make ‘Agile’ meetings document-driven. So we ended up doing sprint planning, retrospectives, and backlog grooming the old, inefficient way.

A big benefit, in my opinion, is that all quarterly and yearly organization goals are captured in a document, which your director and project managers will share with you. So you actually get to see written goals for your organization, as opposed to hoping the CEO or VP tells you what their latest idea was.

How to know when you really need microservices.

You need microservices when you have too many developers to fit software through the deployment pipeline consistently. If you can’t manage a weekly deployment, there are too many cooks in the kitchen. My team hit a couple of inflection points along the way from 30 engineers on the service to somewhere around 200. We had 300+ people in our support Slack channel, but there is no way to know if they all contributed code every release.

One inflection point was when our on-call rotation also became the ‘release engineer’ rotation. At that point we had a minimum of one person assigned to ‘operations’ at all times. Next, we reached a point where we had to split our end-to-end tests into separate packages, because our team couldn’t keep track of how all the different features in the service were supposed to work.

Then we started keeping two people on ‘operations’ at all times, because our ticket queue kept growing despite regular ‘bug bashes’. The last level we reached was the one where having two people assigned full time to operations was no longer enough. We were falling behind on deployments, mandatory migrations, and our ticket queue, so we started outsourcing release management to another team. Then we started losing all of our best and most experienced people, which is how I ended up here.

Bad architectural solutions can ‘curse’ your organization for years. 

Architecture is really important. Seemingly simple decisions can have long-term effects that aren’t obvious. In Alexa Shopping, where I worked at Amazon, we operated inside a dynamic workflow engine. To make that happen, every API in our department of 1,200 people shared the same schema. To handle the needs of various components, the schema included an ‘Envelope’ type, which was an array of arbitrary JSON objects.

There was a lot of tooling built around specifying what would be passed into a node. But the long-run effect was that every API in our service shared the same schema, in the form of a single Java type, and each API customized that type in arbitrary ways via the envelope. This converted a strictly typed interface into a dynamically typed one. The types were still strict, but the rules were enforced by configuration in another service, not by Java.
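To make the problem concrete, here is a hedged sketch of the pattern, in TypeScript rather than the actual Java, with invented names. Every API takes the same outer type, and everything specific to an API rides in the untyped envelope:

```typescript
// Hypothetical sketch of the shared-schema pattern described above.
interface WorkflowRequest {
  apiName: string;
  // Arbitrary JSON objects; the real "types" live in config in another service.
  envelope: unknown[];
}

// Each API digs its expected entities back out of the envelope at runtime.
interface ShoppingCartEntity {
  entityType: 'ShoppingCart';
  items: string[];
}

function handleAddToCart(request: WorkflowRequest): void {
  const cart = request.envelope.find(
    (e): e is ShoppingCartEntity =>
      (e as { entityType?: string })?.entityType === 'ShoppingCart'
  );
  if (!cart) {
    // The compiler never saw this requirement; callers find out at runtime.
    throw new Error('expected a ShoppingCart entity in the envelope');
  }
  // ... statically typed on the outside, dynamically typed on the inside
}
```

The signature tells you nothing about what the API actually needs; that knowledge lives in another service’s configuration.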

The envelopes solved a lot of problems, but one thing they made harder was integration tests. Integration or functional tests in this context mean tests against your service with the requests mocked. We found that software engineers relied heavily on end-to-end tests, but almost no one bothered with, or trusted, the integration tests. The reason came down to the DSL (domain-specific language) used to write our end-to-end tests versus our integration tests.

The DSL for end-to-end tests used natural language to trigger Alexa functionality. You would just write what the customer would say to Alexa, then declare what responses you wanted back. It took a lot of work to support, but our end-to-end tests were easy to write and caught a lot of bugs. They were hard to debug in many cases, but still easier than our integration tests.

The DSL for integration tests, on the other hand, was much harder to understand. The sticking point was all the work required to create the inputs and expected outputs for our APIs. As mentioned above, each API input included an arbitrary JSON array whose types were defined in configuration in another service. And since our APIs were very large, it was very hard to figure out what you needed in your input request for an integration test.

Our APIs also supported multiple flows through the workflow engine, each of which would receive a different subset of input types. So one functional test against your API would receive a certain subset, while the next test would need a different subset of types to trigger a different piece of functionality. For the average software engineer or partner, it was very difficult to figure out what exactly you needed to do differently in your integration test compared to a pre-existing one.

I addressed the problem by simply not writing integration tests at all. Of course, I still had to debug them when I was supporting the pipeline, which was quite difficult. Tests were documented by name only. The failure log typically contained only “Test y failed, expected 4 entities, but received only 2.” Which entities are missing? We don’t know. We also didn’t know which entities were passed into the test. Figuring that out required reading through several thousand lines across multiple Java classes that set up partial mocking. Another tip: never run an integration test system that mocks only some of its dependencies.

Five years after the original architectural solution was designed, we still didn’t have a good way to write integration tests against our APIs. The simplest way to put it is that the API schema was underspecified, which hopefully makes sense by now.

It is hard to write a test against an API that has no real schema and does who knows how many things, many of which are legacy and conform to architectural ideas no one remembers.

The development environment is extremely important for developer happiness and retention

How much of your life do you want to spend on development toil? Most of us prefer writing useful code over doing busy work that provides no long-term value. Unfortunately, it is easy to fall into traps where the development environment is bad but fixing it is hard, so people just put up with it until they can get another job.

Part of the problem is that Amazon has its own build system. It’s named after a South American country whose name starts with a B. You might be thinking of Bazel, but that is a different build system, open-sourced by Google. B is something else entirely.

B has many issues, but the one that killed it for me was the integration between B and IntelliJ. Somehow we reached a point where the non-senior engineers on the team couldn’t run unit tests in IntelliJ anymore. I complained about this several times, and we never managed to fix it. Eventually the senior engineers who set everything up all quit and joined other companies. I had given up on fixing it myself after a few attempts to figure out what the issue was. So there I was, making $200k a year, unable to get unit tests to run in a debugger.

The moral of the story is: use Maven, Gradle, or Bazel. I’m not interested in spending significant amounts of my life investigating why a particular Java project will not build in IntelliJ. If you want to maintain your own build system, be my guest. I’m just not going to work on it.

The people who built that system were very good, highly compensated engineers. They just didn’t prioritize the developer environment, since there was a lot to do and the senior members of the team had already figured out the warts. The management team needs to focus on the development environment, because problems there will eventually manifest as turnover among new hires. If you have a lot of turnover at the one- and two-year mark, take a serious look at your development environment.

Blockchains are the first mega scale software daemons

A software daemon is a bit of software that runs independently of human control. We have been creating them for most of computing history, but the majority are not that important. Nobody knows their names. Nobody really cares that much if they have bugs. If one stops working, you restart it and move on with your life.

Cryptocurrency blockchains are also daemons. Each blockchain is just software that performs whatever tasks it is programmed to do. The interesting differences are the scale and the distributed nature of blockchains. Millions of people contribute hardware to run blockchain software, as opposed to a cron job running only on my laptop.

The Ethereum blockchain is a billion-dollar distributed computing platform powering things like DNS-style names (ENS) and art NFTs. Anyone can submit a ‘contract’ to Ethereum, as it is essentially a daemon that runs other daemons. Compared to the log daemon that collects logs on your computer, Ethereum is immensely larger. Restarting your log daemon is essentially a free operation; restarting Ethereum from zero would cause billions of dollars of losses.

Due to their decentralized nature, it is also impossible to ‘stop’ a blockchain software daemon. Ethereum exists on millions of computers, and as long as even one of those copies continues to operate, Ethereum will continue to exist.

People also care enough about these blockchains that they have names. You have the ‘Ethereum Classic’ and ‘Eth2’ daemons, which run very similar software, yet their distinct identities actually matter to humans. Millions of copies of Linux and Windows are running right now, but none of those individual instances is important enough to have a name.

Why don’t people use functional languages more?

“There are only two kinds of languages: the ones people complain about and the ones nobody uses” — Bjarne Stroustrup

This is a fun statement if you think about it, because it implies that the languages everyone complains about are better than the ones nobody uses. That seems controversial, but a simple thought experiment shows why it has to be this way.

Imagine a world where the mainstream languages were worse than niche languages like Haskell or Crystal. In this world, Haskell has a clear 2x productivity advantage over Java. Since Haskell is clearly better than Java, most projects that use Haskell will ship roughly twice as fast. Double productivity is enough that companies using Haskell will achieve the same results with less money spent on developers, and thus higher profits. We should see companies that don’t use Haskell going out of business. Eventually we would expect all greenfield development to happen in Haskell.

You might say, “Haskell isn’t really twice as good as Java, it is only 20% more productive overall.” Well, 20% averaged over hundreds of thousands of software engineers is a huge amount of money: at, say, 100,000 engineers costing $200k each per year, a 20% productivity gain is worth roughly $4 billion annually. My employer would very happily take a 20% increase in productivity if it were on the table.

However, in the real world we don’t see all greenfield development happening in Haskell. Instead, lots of greenfield development happens in Java, JavaScript, or the .NET ecosystem.

There has never been a consensus moment in the industry around niche functional languages. Instead, people say they wish they could use them while making actual money writing Java code.

Personally, I would really like to use a better language than Java. Unfortunately, we don’t seem to live in that world. 

Wide vs Deep software

Voice assistants are wide software. In the industry we call it the ‘long tail’ of functionality. There are hundreds of ‘tasks’ that your Alexa or Google Assistant can perform for you. You probably don’t know that most of them even exist, but not knowing costs you nothing. The fact that you can buy pizzas via Alexa has no impact on your ability to get news briefs. You can do either without ever engaging with the other.

This ‘long tail’ attribute makes voice assistants extremely wide software. Alexa can do hundreds of things, each of which could be a stand-alone application. The trade-off is that voice assistants are extremely vast and don’t do anything particularly well. Over time the ‘main’ functionalities will be refined and optimized, but voice assistants will always suffer from the ‘long tail’ problem of extremely wide feature sets.

Wide software spreads across multiple domains. More domains mean more leaky abstractions and more glue code mapping between them.

Deep software focuses on a single domain. Perhaps it is an order book or a workflow execution engine. Deep software has a clear purpose and domain. Wide software does everything. 

Over time, deep software converges on clean abstractions and easy-to-understand code. Wide software, on the other hand, is never finished. Wide software is naturally expansive: there is always a reason to add new functionality to a voice assistant, and in fact there is no real barrier to entry. Adding a new functionality to a voice assistant is a net positive for the system as a whole; the negatives of adding a new domain are already baked in, and many customers will enjoy the new functionality.

Deep software can be finished. It can solve a problem in one domain and be done. Hadoop is an example: no one has heard about miraculous developments in HDFS this decade. Hadoop is essentially feature complete and in maintenance mode. In reality development continues, but is it really new capability or just refinement?

Wide software cannot be ‘finished’. Wide software is an infinite sinkhole: adding more code makes the sinkhole more valuable, so more code keeps getting added. There is no real way to ‘solve’ the problems of wide software. You can partition wide software so that each domain exists separately, but as soon as you allow one domain to reference another, you are back in the pit.

Software leviathans, which I’ve discussed in another post (https://www.sledgeworx.io/software-leviathans/), are wide software. Supporting more domains typically increases the value of the leviathan as a whole. A voice assistant that can order dry cleaning is better than one that can’t. Overall there isn’t a trade-off between the two; one simply has an additional ‘ability’ with no downside to adding it. You would have to make a voice assistant that only handled one domain to escape this dynamic.

Wide software isn’t magical. It does too many things to be incredible at any of them. Since there are countless features, the team has to spend a lot of time making sure they don’t break anything. In software leviathans, not breaking things is particularly difficult because nobody actually knows what all the features are.

Since wide software is always being pushed to add something new, energy and design focus are constantly shifted toward new features and problem domains. New domains expect old domains to support new features.

Wide software suffers from another problem: even if some domains in the project are continually invested in (say, music) and iterated on again and again, that one domain being awesome doesn’t change the flavor of the beast. It is still a ball of mud, dirt, rocks, etc.