Wednesday, January 8, 2014

Acme Air and Cassandra Part Two

As I mentioned in my last blog post, I had a multi-faceted project over the holidays to try out continuous delivery, Cassandra, and Docker in the context of Acme Air.  In this post, I wanted to focus on the Cassandra aspect.

First, why is this called part two if there is no part one?  There is a part one of this story, but I never blogged about it.  When Jonathan Bond and I were working on the Netflix Cloud Prize sample work, Jonathan implemented a Cassandra set of services for Acme Air using the NetflixOSS Astyanax Cassandra client and Astyanax's JPA-like entity persister support.  By reading through GitHub, you can see his cassandra-cli DDL script for the column families, some samples of how he did queries, and what the entities looked like.

I started by looking at what Jonathan did and decided to recode for the following reasons.  Jonathan was working under a time crunch for the cloud prize work and we had decided to keep the application as portable as possible across the tested data services.  In the past we have implemented back end implementations for WebSphere eXtreme Scale and MongoDB.  The reality is each different back end required specific additions to the entity model of the data returned to the web applications.  WebSphere eXtreme Scale requires the use of PartitionableKey interfaces on primary key classes to co-locate related data in the grid (Users and User Bookings for example).  Mongo required, when using the Spring Data MongoDB support, specific annotations to help the mapping and, when using Morphia, specific deserializers for things like BigDecimal.  Jonathan's implementation tried to come up with a common interface for entities and then implementations specific to each back end, but I think the entity code became the least common denominator vs. the best demonstration of how to use Astyanax/Cassandra.  Next Jonathan worked the problem top down starting with an entity model allowing the Astyanax persister support to create the columns dynamically.  It wasn't clear to us, especially since we never performance or scale tested the final code, if this top down approach functioned well under load.  Finally, I'm the kind of guy who wants to start with the most code possible and then move to higher level abstractions proving that they add value without sacrificing things like performance or maintainability of the tables in production.  In the relational world, I have seen problems come from pure top down mapping approaches of relational to Java objects that were solved with more JDBC like approaches or more sophisticated meet in the middle mappings.

This all said, I started to back up and look for a way to start with a more (in the relational world) JDBC/SQL like approach.  Given I was new to Cassandra, I got myself bootstrapped two ways.  First, I grabbed Nicolas Favre-Felix's Docker Cassandra automation (more on that in my last blog post).  Second, I went through the DataStax Java Developer online course in order to learn Cassandra and how to code to Cassandra in Java.  I highly recommend the course, but be aware that it will lead you to quickly embrace Cassandra 2.0.x and CQL3 (vs. Cassandra 1.2.x and Thrift), which eventually caused me problems.  I didn't actually work through the sample application that accompanies the course, but instead planned to apply my learning to Acme Air.  The course took three days off and on and I ended up passing the final exam to get my certification (yeah!).

My education complete, I took the current Acme Air NetflixOSS enabled codebase and branched it to an "Astyanax" branch.  I coded up a static CQL3 DDL script and a bit of the data loader program and ran it.  I quickly ran into the dreaded "InvalidRequestException(why:Not enough bytes to read value of component 0)" error.  I found others confused by this on the Netflix Astyanax Google Group and found that I could add "COMPACT STORAGE" to my table definition and get limping along again.  Later Michael Oczkowski responded to the forum post with great links I should have found that explained why COMPACT STORAGE was needed and what issues I was likely up against using Astyanax (which is based on the Thrift client to Cassandra) against a data model created with the assumption of using the CQL3 protocol.

As I wanted to use Astyanax, I decided to move back to Thrift and cassandra-cli created tables.  You can see the "new" Thrift based DDL in GitHub that I created at that point.  You can see the Astyanax code I used to persist to these column families in older code on GitHub.  Note that neither of these older versions of code is complete or up to date with the final version of the DataStax Java Driver code, as I gave up.  I was able to get some of the less interesting entities and queries working, but started to run into issues when I needed the equivalent of "composite keys" in CQL3.  An example was Flight, which in CQL3 I define with a primary key of (text flight_segment_id, Date scheduled_departure_time) with a secondary index of text flight_id.  I believe either CQL3 is easier for SQL-historical guys like me to understand or there is less documentation for Thrift for these composite types of keys (or both).  I played around with various Thrift definitions looking at the resulting CQL3 view but could never replicate such a composite key.  I tried to simplify to a case where I was trying to do a primary key that was a partition key of text and then two clustering columns.  I could force it to be something like PRIMARY KEY((String, String), String) or PRIMARY KEY (String, (String, String)), but never PRIMARY KEY(String, String, String).  For what it's worth, I gave up vs. pushing my way through total understanding here due to lack of clear documentation on these scenarios for Thrift.  I also later realized I was somewhat confused about how composite-keyed rows were stored.  I assumed that the partition key and the clustering columns together defined the hash of the node for storage.  After reading this article, I realized that my key/value + SQL model was wrong: only the partition key is hashed to pick the node, and the clustering columns order the rows within that partition.  Also, I tried to define annotated composite keys in Astyanax as you can see commented in the latest code.  Finally, I really wanted to use the CQL3 syntactic sugar for collections.  I understand such collections are possible with coding approaches with Thrift, but the sugar seemed more palatable with CQL3.
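To make the storage model concrete, here is a minimal sketch in plain Java. It is a toy model, not Cassandra or driver code: the class and method names are hypothetical, String.hashCode() merely stands in for the real partitioner, and the sample ids and timestamps are made up. It only illustrates that the partition key alone selects the node while the clustering column orders rows inside the partition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: shows which part of a CQL3 primary key picks the node.
final class PartitionSketch {
    // Assumed ring size, matching the three node cluster used for testing.
    static final int NODES = 3;

    // partition key -> (clustering value -> row payload), sorted by clustering value
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    // Stand-in for the partitioner: only the partition key is hashed.
    static int nodeFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), NODES);
    }

    // flightSegmentId plays the partition key, scheduledDeparture the clustering column.
    void insert(String flightSegmentId, String scheduledDeparture, String flightId) {
        partitions.computeIfAbsent(flightSegmentId, k -> new TreeMap<>())
                  .put(scheduledDeparture, flightId);
    }

    // All rows of one partition, ordered by the clustering column.
    TreeMap<String, String> partition(String flightSegmentId) {
        return partitions.getOrDefault(flightSegmentId, new TreeMap<>());
    }

    public static void main(String[] args) {
        PartitionSketch table = new PartitionSketch();
        table.insert("SEG-1", "2014-01-09T08:00", "FL-200");
        table.insert("SEG-1", "2014-01-08T08:00", "FL-100");
        // Both rows share a partition key, so they land on the same node.
        System.out.println("node = " + nodeFor("SEG-1"));
        System.out.println(table.partition("SEG-1"));
    }
}
```

In this model, adding a second departure time for the same flight segment never changes the node; it only adds another sorted entry to the existing partition.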

All of the above was really more of a commentary on Thrift vs. CQL3.  However, I did have a few issues unique to Astyanax.

First, there is very little end to end documentation on Astyanax.  The wiki quotes both the Netflix RSS Recipe and Flux Capacitor.  Unfortunately both are very simple single column family examples of Cassandra with little complexity to the primary keys and data modeling.  If you look at the source code both are really simple keyed rows with at most 2 columns with add, query by primary key and remove.  Additionally for composite keys, there is a wiki example of how to query, but not how to store or update.  There is a slightly better example in the Astyanax serializer tests, but the table definition isn't very easy to understand as it was written as a test case vs. a sample application.  I'm guessing (but haven't verified) that the example code in the sample that goes along with the DataStax Java education course would be far more end to end as it would include more complex data modeling with composite keys and secondary indexes in already working code.

Next, the Astyanax API aims more at abstracting the query language into something familiar to Java programmers.  I found, with my previous JDBC/SQL knowledge, the DataStax Java Driver to be less to learn as I was already pretty familiar with the concepts of getSession, prepareStatement (against a known QL string), bindStatement, execute, and walk rows.  Being able to work more directly with the queries as Strings vs. build them through a Java API fit better with my experience.  I need to get back to Astyanax to see if this sort of pattern is supported (the entity persister has a query language that can be executed with find).

Finally, two parts of the Astyanax API left me with unanswered questions.  I couldn't see how to pass BigDecimal to ColumnListMutations, which was easy to do with the DataStax Java Driver via bind(BigDecimal) and Row.getDecimal().  Also, it seemed like I needed to use MutationBatch to update (or initially create) multiple columns in the same row.  It wasn't clear to me if this would perform poorly (the DataStax online education mentioned batch and how it was slower) and if the simpler "INSERT INTO table (col1, col2, col3) VALUES (?, ?, ?)" approach with the DataStax Java Driver was more efficient for this scenario.

The above is more of a state in time level of understanding of Thrift and Astyanax vs. CQL3 and the DataStax driver and Cassandra in general.  As mentioned in Netflix's blog, they are working on Astyanax over the DataStax Java Driver and CQL3.  I wonder how much of what I've been focusing on will be handled by this Astyanax update.  Also, as I start to performance test, scale, and operationalize the tables I created, it is very likely that I'll learn there are still some concepts in Cassandra I don't fully understand.  I certainly know I'm not yet using Cassandra enough in the dynamic column and sparse row aspects so my understanding of de-normalization and optimal Cassandra data modeling might not yet be complete.

Based on skills I gathered during the DataStax Java online education and due to some of the aforementioned issues, I decided to switch horses to use CQL3 fully and the DataStax Java Driver.  You can see in the latest GitHub code the CQL3 DDL and the service implementation that makes calls through the DataStax driver.  I have the application fully functional and tested against a three node Cassandra cluster.  I have not yet scaled the data up nor performance or scale tested.  With my experience with Java/JDBC plus the DataStax Java education, I was able to code this in about two days of focused effort.

I think it would be good to have a part three of this adventure into Cassandra with Acme Air.  It would be good to take the new (or even old) Astyanax driver and complete a Thrift as well as CQL3 version based on this completed CQL3/DataStax Java Driver implementation.  I would additionally like to see usability and performance/scale comparisons.  I think the Netflix tech blog does a good job of showing primitive performance comparisons, but it would be interesting to see how the more complex data and query model of Acme Air compares.

As is likely obvious, I was new to Cassandra in this work.  Therefore comments, as always, are welcome.  I'm not sure when I'll get back to this work, but if I do, pointers on Thrift/Astyanax documentation would be welcomed.  Finally, if you have the time and knowledge of Astyanax, I would welcome an OSS port of the working code.

Tuesday, January 7, 2014

Experiments With Docker For Acme Air Dev

Over the holidays, I decided to do a side project to learn a few things:

1) Strategies and technology for continuous delivery
2) Cassandra and Astyanax
3) Docker for local laptop development

I did all of these projects together with the goal of having a locally developed and tested "cloud" version of Acme Air (NetflixOSS enabled) that upon code commit produced wars that could be put into an Aminator-like build or Chef scripts and immediately deployed into "production". I had wanted to learn Cassandra for some time, and it was good to do in tandem with continuous delivery when using CloudBees for continuous build, as the branch for Cassandra could use all openly available dependencies (OSS on Maven Central). For this blog post, I'll be focusing on Docker local laptop cloud development, but I thought it would be good for you to understand that I did these all together.  If you're interested in picking up from this work, the CloudBees CI builds are here.

I decided to make the simplest configuration of Acme Air which would be a single front end web app tier (serving Web 2.0 requests to browsers/mobile) connecting to a single back end micro service for user session creation/validation/management (the "auth" service). Then I connected it to a pretty simple Cassandra ring of three nodes.

Here you can see the final overall configuration as run on my Macbook Pro:

This is all running on my Macbook Pro.  The laptop runs a virtual machine managed by Vagrant.  Inside that virtual machine, Docker containers run the actual cluster nodes.  In a testing setup, I usually have six Docker containers (three Cassandra nodes, one node to run the data loader part of Acme Air and ad hoc Cassandra cqlsh queries, one node to run the auth service micro service web app, and one node to run the front end web app).  Starting up this configuration takes about three minutes and about ten commands.  I could likely automate this all down to one command and would suggest anyone following this for their own development shops perform such automation.  If automated, the startup time would be less than 30 seconds.

As a base Docker image, I went with Nicolas Favre-Felix's Docker Cassandra automation.  He put together a pretty complete system to allow experimentation with Cassandra 1.2.x and 2.0.x.  In making this work, I think he created a pretty general purpose networking configuration for Docker that I'll explain.  Nicolas used pipework and dnsmasq to provide a Docker cluster with well known static hostnames and IP addresses.  When any Docker container is started, he forced it to have a hostname of "cassX" with X being between 1 and 254 (using the docker run -h option).  He did that so he could have the Cassandra ring always start at "cass1" and have all other nodes (and clients) know the "cass1" hostname is the first (seed) Cassandra node.  Then he used pipework to add an interface to each node with the IP address of 192.168.100.X.  In order to make these hostnames resolve to these IP addresses across all nodes, he used dnsmasq with every 192.168.100.X address mapped to the matching cassX hostname.  Further, to make non-Docker hostnames resolvable, dnsmasq was configured to resolve other hostnames from the well known Google nameservers and the container itself was configured to use the dnsmasq nameserver locally (using the docker -d option).  With all of this working it is easy to start any container on any IP address/hostname with all other containers being able to address the hostname statically.
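The naming convention above is simple enough to sketch in a few lines of Java. This is only an illustration of the cassX to 192.168.100.X mapping; I have not checked how Nicolas's scripts actually generate their dnsmasq configuration, so the class name, method names, and the hosts-file style output ("IP hostname" per line) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the static naming scheme: cassX <-> 192.168.100.X for X in 1..254.
final class HostEntriesSketch {
    static final String SUBNET_PREFIX = "192.168.100."; // subnet described in the post
    static final String HOST_PREFIX = "cass";           // hostname prefix described in the post

    static void check(int x) {
        if (x < 1 || x > 254) {
            throw new IllegalArgumentException("X must be between 1 and 254: " + x);
        }
    }

    static String hostname(int x) {
        check(x);
        return HOST_PREFIX + x;
    }

    static String ipAddress(int x) {
        check(x);
        return SUBNET_PREFIX + x;
    }

    // One hosts-file style line per possible node (assumed output format).
    static List<String> hostEntries() {
        List<String> lines = new ArrayList<>();
        for (int x = 1; x <= 254; x++) {
            lines.add(ipAddress(x) + " " + hostname(x));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print the first few entries, starting with the seed node.
        for (String line : hostEntries().subList(0, 3)) {
            System.out.println(line);
        }
    }
}
```

Because the mapping is a pure function of X, every container can be handed the same full list regardless of which nodes are actually running.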

I'd like to eventually generalize this setup with hostnames of "host1", "host2", etc. vs. "cass1", "cass2".  In fact, I already extended Nicolas' images for my application server instances knowing that I'd always start the auth service on "cass252" and the web app front end on "cass251".  That meant when the front end web app connected to the auth service, I hardcoded Ribbon REST calls to http://cass252/rest/api/... and I knew it would resolve to the auth service container's address on the 192.168.100.X network.  Eventually I'd like to start up a Eureka Docker container on a well known hostname, which would allow me to remove the REST call hardcoding (I'd just hardcode the Eureka configuration for this environment).  Further, I can imagine a pretty simple configuration driven wrapper to such a setup that said start up n nodes of one type on hosts one through ten, m nodes of another type on hosts eleven through twenty, etc.  This would allow me to have full scale out testing in my local development/testing.
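As a rough idea of what such a configuration driven wrapper might compute, here is a small Java sketch. Nothing like this exists in the current setup; the class name, the block size of ten hosts per node type, and the "hostN" naming are all assumptions taken from the wish list above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planner: each node type gets its own block of ten host numbers.
final class ClusterPlanSketch {
    static final int BLOCK_SIZE = 10; // assumed: hosts 1-10, 11-20, ...

    // countsByType must preserve insertion order (e.g. LinkedHashMap) so that
    // the first type gets hosts 1-10, the second 11-20, and so on.
    static Map<String, List<String>> plan(Map<String, Integer> countsByType) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        int blockStart = 1;
        for (Map.Entry<String, Integer> entry : countsByType.entrySet()) {
            int count = entry.getValue();
            if (count < 0 || count > BLOCK_SIZE) {
                throw new IllegalArgumentException(
                    entry.getKey() + " needs between 0 and " + BLOCK_SIZE + " nodes");
            }
            List<String> hosts = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                hosts.add("host" + (blockStart + i));
            }
            result.put(entry.getKey(), hosts);
            blockStart += BLOCK_SIZE;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("cassandra", 3);
        counts.put("auth", 1);
        counts.put("webapp", 1);
        System.out.println(plan(counts));
    }
}
```

A real wrapper would then turn each planned hostname into a container start, but that part depends on the image and networking scripts in use.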

This networking setup gave me a "VLAN" accessible between all nodes in the cluster of 192.168.100.X that was only accessible inside of the Vagrant virtual machine.  To complete the networking and allow testing, I needed to expose the front end web app to my laptop browser.  I did this by using port mapping in Docker and host only networking in Vagrant.  To get the port exposed from the Docker container to Vagrant I used the docker run -p option:
docker run -p 8080:8080
At this point I could curl http://localhost:8080 on the Vagrant virtual machine.  To get a browser to work from my laptop's desktop (OS X), I needed to add a "host only" network to the Vagrantfile configuration:
Vagrant.configure("2") do |config|
  config.vm.network "private_network", ip: ""
end
At that point, I could load the full web application by browsing from my laptop to the host only IP address on port 8080.  I could also map other ports if I wanted (the auth service micro service, a JDWP debug port, etc.).  Being able to map the Java remote debug port to a "local cloud" with local latency is a game changer.  Even with really good connectivity, "remote cloud" debugging is still a challenge due to latency.

So, returning to the diagram above, the summary of networking is (see blue and grey lines in diagram starting at bottom right):

1) Browser to the Vagrant host only IP address, which forwards to
2) Vagrant port 8080 which forwards to
3) VLAN address of the front end container, which forwards to
4) Docker image for the front end web app listening on port 8080 which connects to
5) VLAN address of the auth service container, which is
6) Docker image for the auth service listening on port 8080

Both #4 and #6 connect over the VLAN to

7) Cassandra Docker nodes listening on port 9042.

I also wanted to have easier access to my Macbook Pro filesystem from all Docker containers.  Specifically I wanted to be able to work with the code on my desktop with Eclipse and compile within an OS X console, but then I wanted to be able to easily access the wars, Cassandra loader Java code and Cassandra DDL scripts.  I was able to "forward" the filesystem through each of the layers as follows (see black lines in diagram starting at middle bottom):

When starting Vagrant, I used:
Vagrant.configure("2") do |config|
  config.vm.synced_folder "/Users/aspyker/work/", "/vagrant/work"
end
When starting each Docker container, I used:
docker run ... -v /vagrant/work:/root/work ...
Once configured, any Docker instance has a shared view of my laptop filesystem at /root/work.  Being able to "copy" war files to Docker containers was instantaneous (30 meg file copies were done immediately).  Also, any change on my local system is immediately reflected in every container.  Again, this is game changing as compared to working with a remote cloud.  Remote cloud file copies are limited by bandwidth and most virtual instances do not allow shared filesystems so copies need to be done many times for the same files (or rsync'ed).
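Pulling together the docker run options mentioned in this post (-h for the hostname, -p for the port mapping, -v for the shared folder), a start command for one container could be assembled as in the Java sketch below. The class name and the image name are hypothetical, and the pipework and dnsmasq steps are deliberately left out, so treat this as a starting point for automation rather than a working launcher.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build the "docker run" argument list for one cassX container.
final class DockerRunSketch {
    static List<String> command(int x, String image, boolean exposeHttp) {
        if (x < 1 || x > 254) {
            throw new IllegalArgumentException("X must be between 1 and 254: " + x);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        cmd.add("run");
        cmd.add("-h");                        // static hostname, e.g. cass251
        cmd.add("cass" + x);
        if (exposeHttp) {
            cmd.add("-p");                    // expose the web port to the Vagrant VM
            cmd.add("8080:8080");
        }
        cmd.add("-v");                        // shared view of the laptop filesystem
        cmd.add("/vagrant/work:/root/work");
        cmd.add(image);                       // hypothetical image name supplied by caller
        return cmd;
    }

    public static void main(String[] args) {
        // "acmeair/webapp" is a made-up image name for illustration.
        System.out.println(String.join(" ", command(251, "acmeair/webapp", true)));
    }
}
```

Returning a list of arguments rather than one string makes it straightforward to hand the result to java.lang.ProcessBuilder later without worrying about shell quoting.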

In the end, this left me with a very close to "production" cloud environment locally on my laptop with local latency for debugging and file manipulation regardless of my networking speed (think coffee shop and/or airplane hacking).  With a bit more automation, I could extend this environment to every team member around me, ensuring a common development environment that was source controlled under git.

I have seen a few blogs that mention setting up "local cloud" development environments like this for their public cloud services.  I hope this blog post showed some tricks to make this possible.  I am now considering how to take this Acme Air sample and extend it to IBM public cloud services we are working on at IBM.  I expect that, by doing this, development cloud spend will decrease and development will speed up, as it will allow developers to work locally with faster turnaround on code/compile/test cycles.  I do wonder how easy the transition will be to integration tests in the public cloud given the Docker environment won't be exactly the same as the public cloud, but I think the environments are likely close enough that the trade-off and risk are justified.

I need to find a way to share these Docker images and Dockerfile scripts.  Until then, if you have questions feel free to ask.