AWS Athena is a managed big data query service built on Presto that reads data directly from S3. It supports many big data formats such as JSON, CSV, Parquet, and ION. Schemas are applied at query time via AWS Glue.
To get started with Athena, you define your Glue table in the Athena UI and start writing SQL queries. For my project I’ve been working with heavily nested JSON. The schema for this data is very involved, so I opted to just use JSON path expressions and skip defining my schema in AWS Glue. Of course, that was only after I wasted two days trying to figure out how to define one.
Setting up AWS Athena JSON tables
AWS Athena requires that each of your JSON objects be on a single line. That means you can’t have pretty-printed JSON in your S3 bucket.
To use Athena and JSON path expressions on a nested JSON object like this, you simply define a single column called “data” with a string type, and you are done.
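As a sketch, the table definition can look something like this. The table name matches the queries later in the post, but the S3 location is a placeholder and the exact row format may need adjusting for your data:

```sql
-- Hypothetical DDL sketch: a single string column holding each JSON object.
-- Each line in the S3 files becomes one row in the "data" column.
CREATE EXTERNAL TABLE athena_blog_post (
  data string
)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n'
LOCATION 's3://your-bucket/your-prefix/';
```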
Setting up an Athena table for complex JSON object queries.
Then you query using JSON Path like this.
Example of a simple JSON PATH Athena query
SELECT
  json_extract_scalar(data, '$.component1.date') AS service_date,
  json_extract_scalar(data, '$.component2.objects[0].type') AS component2_json,
  json_extract_scalar(data, '$["component3"]["objects"][0]["xy"]["xy-01"]') AS xy_value
FROM "athena_blog_post"
Note that the json_extract and json_extract_scalar functions take a string column as their first argument and a JSON path as their second.
One thing to watch out for is that the dash character “-” isn’t allowed in dot-notation JSON paths, so for keys containing dashes you have to use bracket notation instead.
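For instance, with the “xy-01” key from the example data, the dot form is rejected while the bracket form works. This is just an illustrative sketch against the same table:

```sql
-- Rejected: '$.component3.objects[0].xy.xy-01' (dash in dot notation)
-- Accepted: bracket notation handles dashed keys
SELECT json_extract_scalar(data, '$["component3"]["objects"][0]["xy"]["xy-01"]') AS xy_value
FROM "athena_blog_post"
```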
Using the UNNEST operator to pivot and flatten JSON arrays of objects.
Often when working with JSON data you will have arrays of objects. You might want to split each entry in a nested array into its own row. This can be accomplished with the UNNEST operator.
Using the UNNEST operator to pull data from nested arrays into rows.
Here we have used json_extract to retrieve a nested JSON array of objects. Then we cast the result to array(json), which the UNNEST operator accepts.
SELECT
  json_extract_scalar(data, '$.component1.date') AS service_date,
  json_extract_scalar(components, '$.date') AS component_date,
  json_extract_scalar(components, '$.type') AS component_type,
  components,
  "$path"
FROM "athena_blog_post"
CROSS JOIN UNNEST(CAST(json_extract(data, '$.component1.objects') AS array(json))) AS t (components)
People have extended the original goal of building microservice systems, not being locked into a single language, into a new goal: every repository should be able to use any language.
It’s phrased as ‘shouldn’t we use the best language for the job?’ But the implication is that we should be able to pick a language for each new project independent of the rest of our existing software stack.
The problem is that it’s not a free choice. Even if you’re in an ideal microservices environment where everything is API based, logging and metrics are handled by the platform, and shared libraries are banned, there are costs to having more languages in play.
The most basic cost is expertise splintering. If you only use one language and everyone knows it, then everyone can read and extend all the code. But as you throw more languages and paradigms into the mix, that slowly becomes impossible.
You quickly reach a point where transferring people to a new team is expensive. Transferring from one Java project to another in the same stack is easy. Switching from Java to async JavaScript is trickier. Switching from Java to Haskell takes a while.
Polyglot makes away-team work more difficult. And it does so slowly, so you won’t really notice until it’s too late.
Shared libraries are another area where polyglot hurts. You can’t always turn a library into an API.
Whether it’s because you can’t externalize metrics and logging, or because you have high-performance business logic that needs to run locally, you will often need to duplicate shared libraries across languages.
Polyglot is nice in a lot of ways. I’d rather get to use Pandas for data science if I had the opportunity. We don’t really need to use only Java for everything. But when it comes to application code, I’d rather everyone used the same language, framework and RPC schemas. It doesn’t make any sense to write certain microservices in Java, others in Haskell, some in Python and a few in Typescript. If they are RPC or event based services you are just using five languages to do the same job instead of one language to do one job.
Then go ahead and write sidecars in Go or Rust, frontends in Typescript and data science stuff in Python. Those are all different use cases with different trade offs.
Hey everyone, I’m working on an NFT project. Originally, I wanted to get programmer NFTs for the team at Guava Labs. I wanted to do something like Coders waiting for Compilers, but that didn’t have enough pizazz. Then I was talking to some NFT artists to see if we could sell them a website with web3 components to tie into their NFT projects, but none of them were ready to go. So I ended up getting an idea and making up a mythos.
Compiler Fighters Club
Synopsis
The compiler fighters are a group of people dedicated to fighting compilers. This band of software developers, engineers, testers and hackers is taking the fight to all enemies of good code. Against them are arrayed the fearsome forces of disorder. The Infinite Bug Army, the Zero Day League, and even the compilers themselves.
The Compiler Fighter Club is an NFT project exploring the meaning behind ‘fighting the compiler’. We hope to create a fun set of collectible NFTs and identities for all the hackers out there.
Portraits
I am trying to figure out a way to let people have NFT characters created based on them. The really big NFT projects so far have mostly relied on computer-generating thousands of NFTs. Generated art is neat, but at this point do we need more computer-generated projects? Maybe if someone did an ML GAN-style NFT run it would be unique.
Characters
Compiler Fighters
The Compiler Fighters are a group of misfit web developers, software engineers, software testers, sysadmins, hackers, and anyone with the will to fight the compiler. Most Compiler Fighters align with the software engineer or IT stereotype. But as Compiler Fighters are a misfit group in the real world as well as in the NFTs, we expect to see bodybuilders, metrosexuals, and all kinds of crazy characters.
Compiler Fighters should be dressed in attire appropriate for professional programmers at work. Since dress codes are rare in the industry, that means anything from t-shirts and flip flops to suits. Although, few hackers would be caught dead in a suit.
Compiler Fighters are typically armed with whatever weapons they can salvage from their offices when the forces of disorder attack. This includes nerf guns, pool cues, nerf swords, and anything else hackers are known to carry. Artistic license is encouraged in the armament of Compiler Fighters.
Infinite Bug Army
The Infinite Bug Army is a force of bugs which seek to sow disorder wherever they can. The Infinite Bug Army does not include any humans, and some people even question whether computer bugs are alive at all. Nonetheless, they are fearsome foes.
I’ll make sure to include an NFT of the original bug, the moth which got trapped in a relay of an old-school computer.
Bug Army NFTs can be anything from images of bugs to images of code containing common or meme worthy software bugs.
Zero Day League
The Zero Day League is a collection of software vulnerabilities, bugs, and hacks. Examples include Heartbleed (https://heartbleed.com), ShellShock, StageFright, etc. Thematically, they should all probably be actual zero days in the initial launch, meaning serious bugs that were exploited in the wild before being announced publicly, such that there were zero days of warning before people were hacked.
Compilers of Doom
The Compilers of Doom are a force of software compilers which have risen up against the forces of order. They seek to overturn the software universe and end the era of human-machine cooperation.
Compilation errors are one area to explore for Compilers of Doom cards. Anyone with coding experience will recognize compiler errors from different programming languages.
What’s next?
I’m having a great time with the concept. I’ll keep building out the mythos and some character ideas. Then we will work with some artists to do an initial release. I’m hoping to launch with some thematic NFTs and, of course, the Guava Labs team NFTs that were the original inspiration.
I set up OAuth2 authentication with Zapier for Seviipay.com recently and figured it was worth a short post on the gotchas I encountered.
The biggest problem I had was with the refresh token configuration. I would do a test connection, validate that I wanted to give Zapier access, send a test notification, and it would work. Then an hour later I would get an email from Zapier saying they got a 401 error and were turning off my integration. The issues were that Doorkeeper does not enable refresh tokens by default and that Zapier does not pass the client ID or secret to the refresh API by default.
Below is my doorkeeper.rb with all comments removed. I had to uncomment the line for use_refresh_token.
Doorkeeper.configure do
  orm :active_record

  resource_owner_authenticator do
    current_user || warden.authenticate!(scope: :user)
  end

  admin_authenticator do |_routes|
    current_user || warden.authenticate!(scope: :user)
  end

  access_token_expires_in 2.hours

  use_refresh_token
end
Then in Zapier I just had to update my request options to include the client ID and client secret.
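Concretely, that means editing the refresh token request in Zapier’s visual builder so the POST body carries the client credentials. A rough sketch of the resulting request, assuming Doorkeeper’s default /oauth/token endpoint, a placeholder hostname, and Zapier’s standard environment variables for the credentials:

```
POST https://yourapp.example.com/oauth/token
Content-Type: application/x-www-form-urlencoded

grant_type=refresh_token
&refresh_token={{bundle.authData.refresh_token}}
&client_id={{process.env.CLIENT_ID}}
&client_secret={{process.env.CLIENT_SECRET}}
```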
One other issue I ran into is that Doorkeeper requires you to set a SCOPE. I just put the string ‘READ’ in for my scope and it seems to be working.
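Relatedly, Doorkeeper’s configuration block lets you declare which scopes the server accepts. A minimal sketch, assuming a single read scope; the scope string here must match whatever you enter on the Zapier side:

```ruby
Doorkeeper.configure do
  # ... existing settings ...

  # Scopes the authorization server will accept.
  default_scopes :read
  optional_scopes :write
end
```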