Java, JSON and/or XML (aka getting my feet wet with JSON)

I did some micro-benchmarking comparing Java and JavaScript (using the Java 6 Rhino implementation) parsing of JSON messages. The sample data is several hundred results from the Google Geocoding service, which I thought made for good sampling data. First of all, JSON reduces bandwidth usage by more than 40% over XML.

Size in characters   XML    JSON
Average              1477   909
Max                  8452   5226

In terms of performance, I ran this multiple times, trying to remember my statistics class. On my laptop (a Windows/Solaris x86 @ 2.13GHz running Java 6 b90 with no tuning):

Implementation                         # loc   Time to execute
Java hack                              4       15 ms
JSON Java API                          14      62 ms
Interpreted JSON (Java 6 Rhino)        10      422 ms
Compiled JSON scripts (Java 6 Rhino)   24      781 ms
STaX hack                              18      265 ms
XPath                                  16      2422 ms
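The post doesn’t include the timing harness itself; a minimal sketch of how such numbers are typically collected (the run count and the parse() stub are placeholders, not the actual benchmark) could look like this:

    public class Harness {
        private static final int RUNS = 100;   // arbitrary; the post doesn't say

        public static void main(String[] args) {
            String message = "...";            // one of the sampled responses
            long start = System.nanoTime();
            for (int i = 0; i < RUNS; i++) {
                parse(message);                // swap in each implementation here
            }
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println(ms + " ms for " + RUNS + " runs");
        }

        private static void parse(String message) {
            // placeholder for the implementation under test
        }
    }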

My Java hack is the fastest, but it’s a hack because it’s based on String manipulation (no parsing) and it stops at the first occurrence of a given sub-string (I don’t even check the result status in the message). While this is not elegant code, all four JSON implementations provide the same results… The JSON Java API is pretty fast, and the difference with the Java hack is probably due to the fact that it does full parsing of the message.
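The hack’s source isn’t included in the post, but a minimal sketch of this kind of indexOf-based extraction might look like this (the field name and message shape are assumptions):

    // Hedged sketch of a String-manipulation "hack": find the first
    // "coordinates" key and slice out its value. No real parsing, no
    // status check -- field names are assumptions, not the author's code.
    public class JsonHack {
        static String extractCoordinates(String json) {
            int key = json.indexOf("\"coordinates\"");   // first occurrence only
            int start = json.indexOf('[', key) + 1;      // value is an array
            int end = json.indexOf(']', start);
            return json.substring(start, end);           // e.g. "2.35,48.85,0"
        }

        public static void main(String[] args) {
            String json = "{\"Status\":{\"code\":200},\"Placemark\":"
                    + "[{\"Point\":{\"coordinates\":[2.35,48.85,0]}}]}";
            System.out.println(extractCoordinates(json));
        }
    }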

The STaX hack simply stops after it finds the data it was looking for (result code and two coordinates). The XPath implementation uses two simple XPath expressions (/kml/Response/Status/code and /kml/Response/Placemark[1]/Point/coordinates) and a simple StringTokenizer to get the same data. Be careful in both cases to use the UTF-8 encoding as set in the Google response file: .getBytes("UTF-8").
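The benchmark code itself isn’t shown; here is a minimal sketch of how those two expressions could be evaluated with the JDK’s XPath API, assuming a KML response shaped like Google’s (the sample string is a stand-in):

    import java.io.ByteArrayInputStream;
    import java.util.StringTokenizer;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.xml.sax.InputSource;

    public class XPathSketch {
        public static void main(String[] args) throws Exception {
            // Minimal stand-in for a Google geocoder KML response.
            String kml = "<kml><Response><Status><code>200</code></Status>"
                    + "<Placemark><Point><coordinates>2.35,48.85,0</coordinates>"
                    + "</Point></Placemark></Response></kml>";
            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate() consumes the InputSource, so build a fresh one per
            // expression; note the explicit UTF-8 decoding of the response bytes.
            String code = xpath.evaluate("/kml/Response/Status/code",
                    new InputSource(new ByteArrayInputStream(kml.getBytes("UTF-8"))));
            String coords = xpath.evaluate("/kml/Response/Placemark[1]/Point/coordinates",
                    new InputSource(new ByteArrayInputStream(kml.getBytes("UTF-8"))));
            // "longitude,latitude,altitude" -> individual values
            StringTokenizer st = new StringTokenizer(coords, ",");
            System.out.println(code + " -> " + st.nextToken() + ", " + st.nextToken());
        }
    }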

Java 6 Rhino-based implementations are an order of magnitude slower. This is mainly due to everything remaining interpreted (no compilation to bytecode). Also, this use-case evaluates each message three times. My implementation with Compilable, CompiledScript, and Bindings is slower than using subsequent plain ScriptEngine.eval() calls. In both cases, the NetBeans profiler shows com.sun.script.javascript.RhinoScriptEngine.eval(String, javax.script.ScriptContext) calls taking up all the CPU time (I ran all my tests in the same run, so 8.2% is really 100% for this specific test).
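For reference, a minimal sketch of both variants with the javax.script API; the extraction script and sample message are assumptions, not the benchmark’s actual code:

    import javax.script.Bindings;
    import javax.script.Compilable;
    import javax.script.CompiledScript;
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;

    public class RhinoSketch {
        public static void main(String[] args) throws Exception {
            ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
            String json = "{\"Status\":{\"code\":200},\"Placemark\":"
                    + "[{\"Point\":{\"coordinates\":[2.35,48.85,0]}}]}";

            // Interpreted variant: hand the whole script to eval() every time.
            Object code = engine.eval("var r = eval('(' + json + ')'); r.Status.code;",
                    bind(engine, json));
            System.out.println(code);

            // Compiled variant: compile once, eval() with fresh Bindings per message.
            CompiledScript script = ((Compilable) engine).compile(
                    "var r = eval('(' + json + ')'); r.Status.code;");
            System.out.println(script.eval(bind(engine, json)));
        }

        private static Bindings bind(ScriptEngine engine, String json) {
            Bindings b = engine.createBindings();
            b.put("json", json);   // expose the message to the script
            return b;
        }
    }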

Searching for HotSpots shows the overhead of setting up compiled scripts:

- Setting up the ScriptContext is costly, and recycling it could be worthwhile (a 15% gain on the total time spent talking to Rhino); see the sketch after this list.
- Backtraces show the getRuntimeScope method as being much more costly when invoked from the compiled version of eval().
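A hedged sketch of what that recycling might look like (not measured here): create one ScriptContext up front and only swap the per-message attribute between eval() calls.

    import javax.script.ScriptContext;
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.SimpleScriptContext;

    public class RecycledContext {
        public static void main(String[] args) throws Exception {
            ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
            ScriptContext ctx = new SimpleScriptContext();   // created once, reused below

            String[] messages = {
                    "{\"Status\":{\"code\":200}}",
                    "{\"Status\":{\"code\":602}}"
            };
            for (String json : messages) {
                // only the per-message attribute changes between calls
                ctx.setAttribute("json", json, ScriptContext.ENGINE_SCOPE);
                System.out.println(engine.eval("eval('(' + json + ')').Status.code;", ctx));
            }
        }
    }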

The only thing that can be done here is to reduce the number of calls to eval(). In the case of XPath, which I thought was an elegant way to retrieve data, the profiler shows 80% spent in JAXP’s implementation of XPath (and 16% in getting the data back from the database). Nothing I can do to improve things here…

Of course, in the context of Google’s GeoCoding service, these performance numbers need to be put into perspective: I have 1.75 seconds to process each result, and even the worst result will have plenty of time to complete before my timer triggers the next call.

When dealing with AJAX web applications, having JavaScript around can justify the use of JSON over XML. JSON also doesn’t need something like XPath, since it’s a JavaScript object that is easy to manipulate. XML can be more verbose, but using XML with proper parsing (SAX, DOM, and now XPath or STaX) is easy and part of every JDK. JSON parsing tools (about an 80 KB JAR file for the JSON Java classes) are not. Maybe (yet another) good RFE for Dolphin.
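For comparison, extracting the same two values with the JSON Java API (org.json) could look like the following sketch; the field names assume the JSON response mirrors the KML structure:

    import org.json.JSONObject;

    public class JsonApiSketch {
        public static void main(String[] args) throws Exception {
            String json = "{\"Status\":{\"code\":200},\"Placemark\":"
                    + "[{\"Point\":{\"coordinates\":[2.35,48.85,0]}}]}";
            JSONObject response = new JSONObject(json);   // full parse of the message
            int code = response.getJSONObject("Status").getInt("code");
            String coords = response.getJSONArray("Placemark").getJSONObject(0)
                    .getJSONObject("Point").getJSONArray("coordinates").join(",");
            System.out.println(code + " -> " + coords);
        }
    }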

But then again, with E4X (not part of the Rhino Mustang implementation), worlds are converging, maybe colliding.

Update 1: If E4X is what you want, Phobos has full support for it.

Update 2: Google now has a short and simple CSV response format with only the return code, coordinates, and a new, useful “precision” value.

Author: alexismp

Google Developer Relations in Paris.