Jackson and XmlMapper – reading arbitrary data into a java.util.Map

I like the Jackson library from FasterXML. Really handy for reading JSON, writing JSON. Or I should say “serialization” and “deserialization”, ’cause that’s what the cool kids say. And the license is right. (If you need a basic overview of Jackson, I suggest this one from Eugen at Stackify.)

But not everything is JSON. Sometimes ya just wanna read some XML, amiright?

I work on projects where Jackson is included as a dependency. And I am aware that there is a jackson-dataformat-xml module that teaches Jackson how to read and write XML, using the same simple model that it uses for JSON.

Most of the examples I’ve seen show how to read XML into a POJO – in other words “databinding”. If my XML doc has an element named “Fidget” then upon de-serialization, the value there is used to populate the field or property on the Java object called “Fidget” (subject to name remapping of course).

That’s nice and handy, but like I said, sometimes ya just wanna read some XML. And it’s not known what the schema is. And you don’t have a pre-compiled Java class to hold the data. What I really want is to read XML into a java.util.Map<String,Object> . Very similar to what I would do in JavaScript with JSON.parse(). How can I do that?

It’s pretty easy, actually.

This works but there are some problems.

  1. The root element is lost. This is an inadvertent side-effect of using a JSON-oriented library to read XML.
  2. For any element that appears multiple times, only the last value is retained.

What I mean is this:
Suppose the source XML is:

<Root>
  <Parameters>
    <Parameter name='A'>valueA</Parameter>
    <Parameter name='B'>valueB</Parameter>
  </Parameters>
</Root>

Suppose you deserialize that into a map, and then re-serialize it as JSON. The output will be:

{
  "Parameters" : {
    "Parameter" : {
      "name" : "B",
      "" : "valueB"
    }
  }
}

What we really want is to retain the root element and also infer an array when there are repeated child elements in the source XML.

I wrote a custom deserializer, and a decorator for XmlStreamReader to solve these problems. Using them looks like this:

String xmlInput = "<Root><Messages><Message>Hello</Message><Message>World</Message></Messages></Root>";
InputStream is = new ByteArrayInputStream(xmlInput.getBytes(StandardCharsets.UTF_8));
RootSniffingXMLStreamReader sr = new RootSniffingXMLStreamReader(XMLInputFactory.newFactory().createXMLStreamReader(is));
XmlMapper xmlMapper = new XmlMapper();
xmlMapper.registerModule(new SimpleModule().addDeserializer(Object.class, new ArrayInferringUntypedObjectDeserializer()));
Map map = (Map) xmlMapper.readValue(sr, Object.class);
Assert.assertEquals( sr.getLocalNameForRootElement(), "Root");
Object messages = map.get("Messages");
Assert.assertTrue( messages instanceof Map, "map");
Object list = ((Map)messages).get("Message");
Assert.assertTrue( list instanceof List, "list");
Assert.assertEquals( ((List)list).get(0), "Hello");
Assert.assertEquals( ((List)list).get(1), "World");

And the output looks like this:

{
  "Parameters" : {
    "Parameter" : [
      {
        "name" : "A",
        "" : "valueA"
      },{
        "name" : "B",
        "" : "valueB"
      }
    ]
  }
}

…which is what we wanted.

Find the source code here: https://github.com/DinoChiesa/deserialize-xml-arrays-jackson

Hat tip to Jegan for the custom deserializer.