Stein Eldar Johnsen

At my new job, they have been struggling with a problem since long before I joined: migrating from a JSON-based HTTP API to a Thrift API. After more than 2 years, the migration has still not reached the core products.

Last year I got a new job in a company that made a mix of “small” apps on different platforms that basically did the same thing: make various platform-dependent and platform-independent content available for device personalization. Zedge has made a pretty good business out of providing a platform where a community of users uploads mostly wallpapers and ringtones to a central website, which makes them available to users all over the world through web and mobile apps. It has since expanded on that theme.

In the beginning the app used simple JSON structures when communicating with the API backend over HTTP. On the backend side, PHP offers pretty robust JSON serialization and deserialization natively, and on Android, Jackson 1.6 was used.

But the Jackson model objects have been loaded with a lot of logic, and the complexity of the model has increased drastically since the beginning, so a solution was proposed:

  • Move from pure JSON, which has no defined model structure and only a very loose binding to the API, over to thrift, which has both a well-structured, typed model definition and an explicit type-safe API.

But the migration has simply not happened. And the main reason:

  • The difference between the code using undefined JSON over HTTP and defined thrift over a thrift transport is pretty large, and the amount of code that needs to be converted (mostly in one fell swoop) is too much. And the conversion has thus been postponed time and time again.

This is when I got the job of driving the refactoring, code cleanup, and expansion of testing and testability of this Android app. In the beginning I was pretty much for “just doing the conversion”, but after a couple of attempts it became clear: the job really was too large. The main problems were basically these:

  • When the thrift model was designed, it was designed to be more “well behaved” than the current JSON models. This changes some core aspects and assumptions that the code that uses the current JSON model has, e.g. int ID in json, vs string ID in thrift.
  • The way the old API was designed was very tightly linked to the fact that it uses HTTP POST requests. But the thrift transport API behaves in a very different way.
  • Some parts of the JSON even broke the JSON standard (e.g. assuming map key ordering is preserved), so various hacks had been put in place to enforce these assumptions. And thrift does not even allow for such hacks to be made.
  • The JSON model objects were full of parse-time and run-time logic that had to be separated out in such a way that it could be used with pure data objects.
  • Moving to thrift objects while keeping the majority of the existing JSON API was blocked by the fact that thrift has no way of parsing “standard” JSON (it can only generate such JSON, through what it calls the “simple” JSON protocol). And its default JSON protocol looks horrible.
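To make the map key ordering point a bit more concrete, here is a minimal Java sketch. It is not code from the app; the class name and the section names are made up for illustration. A JSON object is defined as an unordered set of name/value pairs, so a parser is free to hand it back in something like a plain HashMap. Code that needs the keys in document order only works when an insertion-ordered map (such as LinkedHashMap) is forced in, which is the kind of hack I mean above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeyOrderSketch {
    // Hypothetical section names, in the order the backend would send them.
    static final String[] SECTIONS = {"wallpapers", "ringtones", "notifications"};

    // Fills the given map in "document order" and returns its keys
    // in the order the map iterates over them.
    static List<String> keysOf(Map<String, Integer> map) {
        for (int i = 0; i < SECTIONS.length; i++) {
            map.put(SECTIONS[i], i);
        }
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap makes no promise about iteration order, so code that
        // relies on "first key = first section" may or may not work.
        System.out.println("HashMap:       " + keysOf(new HashMap<String, Integer>()));

        // LinkedHashMap iterates in insertion order, which is what the
        // order-dependent code silently assumes.
        System.out.println("LinkedHashMap: " + keysOf(new LinkedHashMap<String, Integer>()));
    }
}
```

A thrift map gives no such guarantee on the wire or in the generated code, so an assumption like this has to be redesigned (for instance as an explicit list) rather than patched.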

At this point I decided to take a look at what changes I could make to the thrift java classes and protocols to fix these problems. I ran into a very big problem (and an annoyance):

  • The thrift classes are mutable, which is pretty annoying to design around if the model classes are not trivially small. There are also problems related to thread optimization, but the big issue is making an architecture that enforces good data model management. And changing the classes to be immutable would break the entire protocol and transport stack provided in the thrift java library.
  • Updating the library with a new protocol that can both write and read standard JSON (called “simple” JSON in thrift) is next to impossible, as the default protocol handling assumes that the wire protocol itself carries perfect knowledge, per field, of the data type it contains (i.e. that it’s a 16-bit int, and not a 32-bit). And now I understand why the thrift JSON protocol looks that horrible. To solve that I would have to modify the generated code specifically for this scenario (e.g. generating a protocol “scheme” for each generated class for the simple JSON protocol). But:
  • The thrift compiler is a huge monster of parser code and code generators written in C++. The naming of fields and methods sometimes didn’t make sense to me until I realized the code seems to have been forced to follow a structure that probably wasn’t meant to be written in modern C++… And it is also not that well documented.
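As an illustration of what I mean by immutable model classes, here is a minimal sketch of an immutable value object with a builder. This is neither what the thrift compiler generates nor necessarily what a replacement generator should emit; the Item class and its fields are hypothetical and only show the shape of the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ImmutableItemSketch {

    /** Hypothetical model class: every field is final and set exactly once. */
    static final class Item {
        private final String id;
        private final String title;
        private final List<String> tags;

        private Item(Builder builder) {
            this.id = builder.id;
            this.title = builder.title;
            // Defensive copy, so later changes to the builder cannot leak in.
            this.tags = Collections.unmodifiableList(new ArrayList<String>(builder.tags));
        }

        String getId() { return id; }
        String getTitle() { return title; }
        List<String> getTags() { return tags; }

        static Builder builder() { return new Builder(); }

        /** Returns a builder pre-filled with this item's values. */
        Builder mutate() {
            Builder b = new Builder().setId(id).setTitle(title);
            for (String tag : tags) {
                b.addTag(tag);
            }
            return b;
        }
    }

    /** All mutation happens here, before the value is built. */
    static final class Builder {
        private String id;
        private String title;
        private final List<String> tags = new ArrayList<String>();

        Builder setId(String id) { this.id = id; return this; }
        Builder setTitle(String title) { this.title = title; return this; }
        Builder addTag(String tag) { this.tags.add(tag); return this; }
        Item build() { return new Item(this); }
    }

    public static void main(String[] args) {
        Item original = Item.builder().setId("42").setTitle("Sunset").addTag("nature").build();
        // "Changing" a field produces a new object; the original is untouched.
        Item renamed = original.mutate().setTitle("Sunrise").build();
        System.out.println(original.getTitle() + " / " + renamed.getTitle());
    }
}
```

With this shape, an object that has been handed to another part of the app (or another thread) cannot change underneath it. The catch is that a deserializer has to fill in a builder instead of calling setters on the object itself, which is exactly where the existing thrift protocol and transport stack stops fitting.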

So updating the parser ended up in the “no F**king way” category of preferred tasks. At this point I was cleaning up my gittool project to publish it on github, was nearing completion there, and got the brilliant idea:

  • Let’s make a new thrift-java library, IDL parser and code generator from scratch…

Enter the providence project, initially called “thrift-j2”, but renamed when I realized I couldn’t be bothered to convince the thrift maintainers why my approach should be better. I’m still not sure it even is better from a technical point of view, but I do like it much better.

PS: I released v0.0.1 just today. Though no big fanfare yet: It works, but is probably still riddled with bugs.

-----

This blog post was first published on morimekta.net in March 2016.