ZpqrtBnk

What becomes of XSLT?

Posted on December 19, 2014 in umbraco

A few days ago, a Pull Request was submitted to Umbraco Core that would have exposed the values of JSON content properties as XML fragments in the XML cache, making them easier to process with XSLT. That PR was rejected because, says Niels,

"we don't have plans for further XSLT support moving forward, so we'd rather not add features that we'd need to maintain in the future".

It should not come as a complete surprise. In fact, it was already announced and discussed in a blog post that is... three years old ;-) MS has virtually left the System.Xml.Xsl namespace untouched for years, and at the moment the entire namespace is missing in the .NET Core repository and support for XSLT is still an "if".

I have no idea why MS does not give XSLT more love. All I can do is wildly speculate that in order to achieve good performance, their implementation generates dynamic code and compiles stuff to MSIL, something that has been reported to leak memory in some cases... and maybe they do not want to maintain it any further or have it in .NET Core.

Outside of MS, there is no open-source XSLT processor for .NET that I am aware of.

So, RIP XSLT?

Announcing the death of XSLT would be premature, though. XSLT works in Umbraco today and probably will for some time. There is no conspiracy to deliberately terminate XSLT. As long as it can be supported, it will be. It is just not a first class citizen.

What does that mean?

The short-term issue with the rejected PR has to do with the fact that it introduces XML-cache specific code in places where we would rather not have it. "Not being a first class citizen" means that we would rather not invest time into refactoring it and introducing features that we are not sure we want to support.

But it is not about XSLT, really

What started it all is the addition of new property editors that store their value as JSON, not XML, meaning that XSLT (or XPath queries, for that matter) has to switch between XML and JSON contexts, and it is a pain.
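To make that pain concrete, here is a minimal sketch in Python (standing in for the .NET stack). The document shape, the element names and the `links` property are all made up for illustration and are not Umbraco's actual cache format. It shows a path query reaching the property, then having to stop and hand the value over to a JSON parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical cached document: "title" holds plain text, while "links"
# holds a JSON string, as a JSON-based property editor might store it.
DOC = """
<root>
  <page id="1">
    <title>Home</title>
    <links>[{"caption": "About", "url": "/about"}, {"caption": "Blog", "url": "/blog"}]</links>
  </page>
</root>
"""

root = ET.fromstring(DOC)

# XML context: a path expression reaches the property element...
links_element = root.find("./page[@id='1']/links")

# ...but to the query engine its content is just an opaque string,
# so we have to leave the XML world and parse it as JSON.
links = json.loads(links_element.text)
captions = [link["caption"] for link in links]
```

A pure XSLT stylesheet has no equivalent of that second step, which is why JSON-backed properties are awkward to transform.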

So, it is not about XSLT, really. There is nothing wrong with XSLT: it is the content that is becoming an issue. Of course we could decide that all property editors are required to store their value as XML, but it is so much easier to use JSON at the JavaScript level. So, JSON it is.

And it goes beyond property editors:

When prototyping the "content variations" for CG14, we struggled with the fact that a node would have "different versions of itself" depending on its variations. We could find ways to fit that structure within the database, but... what about the XML tree?
When thinking about "virtual nodes" or "sub nodes" that would represent "content fragments", and support what complex property editors (Archetype, Vorto, the Grid...) need, again we hit the XML tree limit.

Content today is becoming more and more tabular and relational, on top of the old (and still widely used) hierarchical organization. The way Umbraco works at the moment, the XML cache has to match the content tree structure one-to-one, and that is a constraint and a limiting factor.

And then there are the technical issues with having one big XML document, such as concurrency management, generating clones of that document for preview, etc.

Beyond the XML cache

The big idea is to stop using one big XML document as the in-memory cache for Umbraco, and move to a cache offering finer granularity and working at content object level. To reach that goal we need to:

Turn the existing cache into a "service" that could have different implementations. That is a big task because the XML management code is everywhere.
Implement new cache solution(s), e.g. store IPublishedContent objects in memory, use Examine...
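The two steps above could be sketched roughly as follows. This is an illustration in Python, not the Umbraco design: `ContentCache`, `XmlContentCache`, `ObjectContentCache` and `page_title` are hypothetical names, and the real contract would obviously expose far more than a name lookup.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional
import xml.etree.ElementTree as ET


class ContentCache(ABC):
    """Hypothetical cache contract: callers only ever see this."""

    @abstractmethod
    def get_name(self, content_id: int) -> Optional[str]:
        """Return the name of the content item, or None if it is not cached."""


class XmlContentCache(ContentCache):
    """Backed by one XML document, in the spirit of the existing cache."""

    def __init__(self, xml: str) -> None:
        self._root = ET.fromstring(xml)

    def get_name(self, content_id: int) -> Optional[str]:
        node = self._root.find(f".//node[@id='{content_id}']")
        return None if node is None else node.get("name")


class ObjectContentCache(ContentCache):
    """Backed by individual entries keyed by id, with no XML involved."""

    def __init__(self, names: Dict[int, str]) -> None:
        self._names = dict(names)

    def get_name(self, content_id: int) -> Optional[str]:
        return self._names.get(content_id)


def page_title(cache: ContentCache, content_id: int) -> str:
    """Calling code depends on the contract only, not on the storage."""
    name = cache.get_name(content_id)
    return name if name is not None else "(not found)"


xml_cache = XmlContentCache(
    '<root><node id="1" name="Home"><node id="2" name="Blog"/></node></root>'
)
object_cache = ObjectContentCache({1: "Home", 2: "Blog"})
```

Both caches can be handed to `page_title` interchangeably. The hard part in practice is the first step: finding every place that reaches into the XML directly and routing it through the contract instead.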

However, most of the content will still follow a tree structure, and could very well be exposed as XML. Not as one big XmlDocument object anymore, but as one XPathNavigator that can be used to navigate the content tree, run XPath queries on top of it... and transform it with XSLT.

Today, property value converters already expose a ConvertToXPath method. The purpose of that method is to expose the property value to an XPathNavigator. So, a property editor could store its value as JSON in the database, and still provide an XML representation of the value. All it needs is to write the proper converter.
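As a rough sketch of what such a converter does, here is the idea in Python. This is not the ConvertToXPath signature nor any Umbraco API: `convert_to_xml` is a hypothetical helper, and it assumes the stored value is a JSON array of flat objects whose keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def convert_to_xml(alias: str, stored_json: str) -> ET.Element:
    """Hypothetical converter: build an XML fragment from a JSON-stored value.

    Assumes a JSON array of flat objects whose keys are valid element names.
    """
    prop = ET.Element(alias)
    for item in json.loads(stored_json):
        child = ET.SubElement(prop, "item")
        for key, value in item.items():
            ET.SubElement(child, key).text = str(value)
    return prop


# The value as the property editor would store it in the database.
stored = '[{"caption": "About", "url": "/about"}, {"caption": "Blog", "url": "/blog"}]'

# The XML representation of that same value.
fragment = convert_to_xml("links", stored)

# Once converted, the value is reachable with path expressions alone.
urls = [element.text for element in fragment.findall("./item/url")]
```

The storage format stays JSON, which keeps the JavaScript side simple, while anything that navigates XML gets a fragment it can query or transform.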

What we should end up with is a content model (content cache) that is free from the "one big XML document" constraint, yet can be exposed as an XML tree where it makes sense.

Getting there

This is all, obviously, work-in-progress. In the meantime, we would rather not touch the current XSLT implementation. This should help explain why the PR was rejected.

We are currently cleaning up the Core code base so that it stops expecting an XmlDocument everywhere, and learns how to run with XPathNavigator instead. This is a crazy task because XML is everywhere, and it means refactoring some ten-year-old bits of code. Yet, the development version of this blog site does run on an XML-free version of Umbraco, and that includes running XSLT macros. The plan is to run this blog site on that new cache ASAP.

Stay tuned.
