Software Engineering – Page 12 – Sledgeworx Software

Don’t move to the Cloud to increase CPU utilization

I have worked with a bunch of companies that launched major initiatives to move their hardware onto the public cloud. None of those companies managed to get their CPU utilization over 10%. At my current job we run Java with 10-20% CPU utilization and 90% of memory reserved for the JVM. The standard software development approach does not result in amazing hardware utilization rates.

The consulting clients we worked with at my previous job expressed a lot of interest in increasing efficiency. We theorized about complex tagging schemes and did proofs of concept with Cloudability, but I do not recall actual savings coming out of it, even though there was a lot of complaining about the AWS bill being high.

One Fortune 500 company I worked with had a dev environment with a huge number of hosts (1000+) that were basically never utilized at all. That company only used continuous delivery in development, not for production deployments. The other obvious issue at that company was their reluctance to rely on AWS Auto Scaling groups to handle load spikes. They allocated for peak load despite running in the cloud.

One concern that came up a couple of times was that Auto Scaling groups have a scaling response time on the order of minutes. In the event of a traffic spike, it might be 5-10 minutes before extra hosts come online.

If you are worried about unexpected instantaneous peaks, write a fallback. Serve a landing page out of cache and sit tight. There is no magic solution for instantly absorbing a 100x increase in traffic without scaling preemptively.
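A minimal sketch of such a fallback, in plain Python rather than any particular web framework. The names `handle_request`, `render_page`, `max_in_flight` and the cached page itself are hypothetical, and the capacity threshold is something you would have to measure for your own service.

```python
from typing import Callable

# Hypothetical pre-rendered landing page held in memory (the "cache").
CACHED_LANDING_PAGE = "<html><body>We are busy, please check back soon.</body></html>"


def handle_request(
    in_flight_requests: int,
    max_in_flight: int,
    render_page: Callable[[], str],
) -> str:
    """Serve the real page while under capacity, otherwise the cached fallback."""
    if in_flight_requests >= max_in_flight:
        # Over capacity: skip all backend work and return the static page.
        return CACHED_LANDING_PAGE
    return render_page()
```

The point of the design is that the fallback path touches no database and no downstream service, so it stays cheap while extra hosts come online.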

The largest websites in the world scale ahead of time. We know from history when we will get lots of traffic. You know when your Super Bowl ad is going live. Scale up a week beforehand. Run load tests to make sure you can handle the traffic.

Run your servers at 30-60% utilization. Build a fallback page for big instantaneous peaks. Most importantly, know ahead of time what your traffic is going to look like so you can prepare.
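The arithmetic behind scaling ahead of time can be sketched as follows. This is an illustration only: `hosts_needed` is a hypothetical helper, and the traffic figures are made up. The per-host throughput would come from your own load tests.

```python
import math


def hosts_needed(peak_rps: float, rps_per_host: float, target_utilization: float) -> int:
    """Hosts required so that peak traffic lands at the target utilization.

    rps_per_host is what one host can serve at 100% utilization,
    as measured in a load test.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_rps / (rps_per_host * target_utilization))


# Made-up numbers: 12,000 requests/s expected at peak, 200 requests/s per host.
print(hosts_needed(12_000, 200, 0.5))  # prints 120
```

Running at a 50% target instead of 100% doubles the host count, which is the price of having headroom when the forecast is wrong.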

The top cause of outages is changing code.

The last week of my on-call shift has been pretty quiet. The holiday season has elevated traffic around 50% higher than normal, but I haven’t really noticed. There haven’t been any service outages and it almost feels like I’m not really on call. Why has it been so quiet? We haven’t deployed any code for two weeks. Not deploying code means we aren’t deploying any bugs to production.

After a deployment, you will probably notice any defects over the next couple of days. Once you have fixed those, it is smooth sailing for that version.

Continuous Deployment makes it easy to deploy bugs 10x a day. Agile gives you a justification to deploy 10x a day. Ask yourself, what are you deploying each day? A CSS change to a button? A new option in a drop-down? A rewrite of the graphing functionality because no one can understand the current implementation? A new feature like Google Docs integration?

If you could only deploy 1 feature each week what would make the cut?

PlantUML, a text-based diagramming language

One of the senior engineers at my job is a big fan of PlantUML, so I recommended it to one of the junior guys who needed a diagramming tool. I’ve been taking a look myself since I have never had a go-to diagramming tool.

PlantUML is a text-based language. You define elements and their relationships with other elements. There are a lot of keywords, which can be a bit confusing, but it generates pretty good diagrams.

Here is the text for a system diagram.

@startuml
actor actor [
  a user
]
database postgres
queue celery 
stack redis

node django [
  Django webservice
]

node worker [
  Turtle Detector
]

boundary boundary [
nginx
]

cloud cloud [
cloud
]

actor --> cloud

cloud --> boundary

boundary --> django

django --> postgres
django --> celery

celery -> worker
worker -> redis
redis --> django
@enduml

Here is the code for a smaller class relationship diagram.

@startuml
class User {
  +customerId : String
  ~submissions : Submission[]
  
}

class Submission {
   -size : int[][] 
   #image : int[][]

}

' A User holds many Submissions, so this is aggregation rather than inheritance.
User "1" o-- "*" Submission


class TurtleModel {
   ~model : Pytorch.GAN
}
@enduml

Turtle Generator Project Idea

New programmers sometimes ask me “what project should I work on next?”. This project is one I drafted up for myself because I wanted to build a more complex application with Django and PyTorch.

The Turtle Generator Project is an attempt to create a website on which people can submit pictures of turtles and vote on whether machine-generated turtles are “turtle” or “not turtle”. The user submissions and votes feed a GAN (generative adversarial network), which both classifies and produces pictures of turtles. We may also use user submissions as part of our dataset of turtles.

Pages / Components

Draw or Submit turtle component

Drawing component where a visitor can use their mouse or touchpad to draw a turtle and submit it to the turtles dataset.

Vote component

A component consisting of a picture of a turtle generated by our backend algorithms and two buttons labeled “turtle” and “not turtle”.

Architecture

Django frontend + PostgresDB
serves web pages and handles user interaction
votes and turtle drawing submissions are submitted to the Machine Learner system via Celery
submits a “turtle image” request to the Celery queue and gets a turtle back

Possibly React, or just Django templates

Machine Learner
online machine learning system based on PyTorch
takes Celery tasks and does one of the following:
generates a picture of a turtle
adds a vote submission to the training data (classification)
adds a turtle picture submission to the training data

Celery + Redis
Message queue used to handle queuing training tasks
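
The task flow above can be sketched with the standard library alone. This is not Celery or PyTorch code: `queue.Queue` stands in for Celery + Redis, and the hypothetical `MachineLearner` class only records what a real learner would train on. The task shapes (`kind`, `image_id`, `is_turtle`, `image`) are assumptions made for illustration.

```python
import queue
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MachineLearner:
    """Stand-in for the PyTorch-backed learner; it only records training data."""
    votes: list = field(default_factory=list)
    submissions: list = field(default_factory=list)

    def handle(self, task: dict) -> Optional[str]:
        kind = task["kind"]
        if kind == "generate":
            # A real worker would sample the generator network here.
            return "placeholder-turtle-image"
        if kind == "vote":
            self.votes.append((task["image_id"], task["is_turtle"]))
            return None
        if kind == "submission":
            self.submissions.append(task["image"])
            return None
        raise ValueError(f"unknown task kind: {kind}")


def drain(tasks: queue.Queue, learner: MachineLearner) -> list:
    """Process every queued task in order and collect any generated images."""
    results = []
    while not tasks.empty():
        result = learner.handle(tasks.get())
        if result is not None:
            results.append(result)
    return results


# The web frontend would enqueue tasks like these.
tasks = queue.Queue()
tasks.put({"kind": "submission", "image": "user-drawing-1"})
tasks.put({"kind": "vote", "image_id": 7, "is_turtle": True})
tasks.put({"kind": "generate"})

learner = MachineLearner()
generated = drain(tasks, learner)
```

Keeping all three task kinds behind one `handle` method mirrors the idea that the web frontend never talks to the model directly, only to the queue.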

Creating a good Project README

Project README files are typically an afterthought in the software development process. If a question comes up repeatedly, the answer gets added in an unstructured fashion. This is unfortunate, because the people who need READMEs the most are new engineers who are joining the team. They don’t know any of the team’s jargon. They probably don’t have a good understanding of what the project does. And they probably don’t understand the internal architecture of the project.

You want the first part of the README to be an introduction to your project. Answer the question “Why do we have this service?”.

To help new engineers, use as little jargon as possible and define terms in the README.

Include a summary of the architecture of the project in the README. It should cover what abstractions you are using and why you picked the ones that you did. If you use any patterns that are not included in every project at your company, make sure to mention them in the README. The last thing you want is for people to take over the project from you, fail to figure out why you chose these abstractions, and then remove them from the codebase.

Your README should also include the steps to get the project running. What permissions and credentials do new engineers need to run builds and integration tests? Who should they contact to get those permissions? Make sure to include the common failure cases that new engineers ask questions about.

Include a summary of the typical build process for the project. If you use make, write explanations for every make command you support and when each should be used. If you use a standard build tool like Maven, mention the extensions and plugins you use. For example: “We use the JaCoCo plugin to ensure 80% code coverage. If you add a Spring configuration class, you can add it to the ignored list for JaCoCo.”

If you have integration or end-to-end tests in a different package, reference them in your README. Include an example of typical usage of the external package and expect people to read the README for that package if they run into trouble. Make sure to include common failure cases in the test suite. If external dependencies commonly cause your integration tests to fail, call out how a new engineer can determine that is the case and what they should do in response.
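
One way to make "is it us or the dependency?" easy to answer is a preflight check that a README can point to. The sketch below is only an illustration: `unavailable_dependencies`, the dependency names and the addresses are all hypothetical, and the `probe` callable is left to the caller (in practice it might try a TCP connection with `socket.create_connection`).

```python
from typing import Callable, Dict, List


def unavailable_dependencies(
    dependencies: Dict[str, str],
    probe: Callable[[str], bool],
) -> List[str]:
    """Return the names of dependencies whose probe fails."""
    return [name for name, address in dependencies.items() if not probe(address)]


# Hypothetical dependencies and a fake probe that pretends search is down.
deps = {"payments": "payments.test:443", "search": "search.test:9200"}
down = unavailable_dependencies(deps, lambda address: address.startswith("payments"))
print(down)  # prints ['search']
```

If the list is non-empty, a new engineer knows to stop debugging their own change and check on the dependency first.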

Example Table of Contents for a README

Introduction
Why does this project exist?
Where can I find additional documentation?
Where can I find our CI/CD infrastructure?

Architecture
What is the basic architecture of the system?
MVC, SPA, messaging, RPC
Do we have any managed thread pools?
What are our asynchronous tasks?

What patterns do we use in our codebase?
Explain any unusual patterns you use and why you need them.

How to get builds running
What tools are needed to run builds?
What build commands and flags should a new engineer be using?

How to get Tests running
What tools to use to run unit/functional/integration/end-to-end tests
Are any external packages needed?
How to retrieve the external packages
Basic commands for any external packages

How to know if the tests passed or failed.
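
If you want to hold READMEs to an outline like this, a small check can flag missing sections. This is a sketch under assumptions: `missing_sections` is a hypothetical helper, it only looks at four of the top-level titles from the outline above, and it assumes each title sits on a line of its own, optionally prefixed with `#` characters.

```python
REQUIRED_SECTIONS = [
    "Introduction",
    "Architecture",
    "How to get builds running",
    "How to get Tests running",
]


def missing_sections(readme_text: str, required=REQUIRED_SECTIONS) -> list:
    """Return required section titles that do not appear as a line of their own."""
    # Normalize each line: drop surrounding whitespace and any leading '#' marks.
    lines = {
        line.strip().lstrip("#").strip().lower()
        for line in readme_text.splitlines()
    }
    return [title for title in required if title.lower() not in lines]


readme = "# Introduction\nWhy we exist.\n\n## Architecture\nMVC with a message queue.\n"
print(missing_sections(readme))
```

For the sample text, the two "How to get ... running" sections are reported as missing.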