Querying Generic Data with XQuery

Written by: David Boddie, Nokia, Qt Development Frameworks

In the Checking the Weather with XQuery article about using Qt to query and process XML documents, we looked at some of the core classes of the QtXmlPatterns module. We found that, by performing queries on a document, we could get useful information out of an XML document in a form that a simple, specialized XML reader could handle.

This use of XQuery to validate and simplify information for XML readers can make it easier to deal with complex documents, but it seems a little inefficient to read a document, generate XML and read it in again, even if the reading process is much simpler than it would be without XQuery.

Fortunately, we found that the QAbstractXmlReceiver class cuts out this unnecessary extra step. An instance of this class acts like an XML stream reader or classic event-based XML reader, interpreting the output directly from the query engine as it is generated.

Now that we have the ability to process XML and interpret it in a general way, we turn our attention to the other side of the QtXmlPatterns module. This article looks at the ability of the query engine to be used with non-XML sources of data and the possibility of applying this feature to models of XML documents in order to visualize the output of queries.

Adapting to New Data

Arbitrary data can be exposed to the query engine via the API defined by the QAbstractXmlNodeModel class. The API is used by the query engine to access data in forms that are familiar to it (as elements, attributes and text nodes) and to navigate within the data. Instances of this class do their best to adapt the data to these constraints.

Most implementations of the API use the QSimpleXmlNodeModel class as a starting point, subclassing it to take advantage of some of its default behavior. The complexity of the functions that need to be reimplemented varies depending on the function and the underlying data structure that is being adapted. An implementation of the API will typically involve the functions listed below.

The first two functions deal with high-level information about the data exposed by the node model:

QUrl documentUri(const QXmlNodeModelIndex &nodeIndex) const
QXmlNodeModelIndex root(const QXmlNodeModelIndex &nodeIndex) const

These three functions deal with the identities and types of items, determining how they are treated by the query engine:

QXmlName name(const QXmlNodeModelIndex &nodeIndex) const
QXmlNodeModelIndex::NodeKind kind(const QXmlNodeModelIndex &nodeIndex) const
QVector<QXmlNodeModelIndex> attributes(const QXmlNodeModelIndex &element) const

The next two functions are responsible for passing representations of the values of items (as opposed to their names) to the query engine:

QString stringValue(const QXmlNodeModelIndex &nodeIndex) const
QVariant typedValue(const QXmlNodeModelIndex &nodeIndex) const

The final two functions are responsible for supplying information about the order of nodes and their relative positions in the document, ensuring that relative path operations work correctly on the data:

QXmlNodeModelIndex::DocumentOrder compareOrder(const QXmlNodeModelIndex &nodeIndex1, const QXmlNodeModelIndex &nodeIndex2) const
QXmlNodeModelIndex nextFromSimpleAxis(SimpleAxis axis, const QXmlNodeModelIndex &origin) const

We'll examine each of these in more detail as we implement an example. However, one common theme we can see in the API is the use of QXmlNodeModelIndex objects. Just as Qt's model/view framework uses QModelIndex to refer to items in a model-independent way, this API uses QXmlNodeModelIndex to refer to items in the data structure.

A Simple Example

To show how to reimplement the QAbstractXmlNodeModel and use it, we present a simple example that lets the user search within a custom data structure using queries entered into a text editor. The data structure describes a collection of books, represented by the Books class, each of which is represented by the Book class and holds fields containing title and author information.

Classes                     XML representation
Books                       <books>
    Book                        <book>
        Field (title)               <title>
            Text                        text
        Field (authors)             <authors>
            Text                        text
    Book                        <book>
        Field (title)               <title>
            Text                        text
        Field (authors)             <authors>
            Text                        text
    ...                         ...

Each field is represented by the Field class. Rather than storing textual data directly in each field, we wrap it in an instance of the Text class. We add this extra level of indirection so that we can create a parent-child relationship between the text and the field containing it and, indeed, between all the instances of these classes. As a result, each of these classes contains a parent member that refers to the object that contains it.

It is useful to examine the data structure and consider how each item of data will be represented using XML concepts, and whether all pieces of data even need to be exposed to the query engine. In this example, we choose to represent each Book object using a book element. We also represent instances of the other classes using suitably-named elements. However, the textual data held by each Text instance will be represented by a text node. The parent of the collection is represented by the document object.
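As a rough illustration of this mapping, the sketch below serializes a small collection into the XML form that the query engine would conceptually see. The to_xml() helper and the tuple-based layout are assumptions made purely for this illustration; they are not part of the example code, which never generates XML text for the input data.

```python
# Hypothetical illustration: books are plain (title, authors) tuples here,
# not the Book/Field/Text classes used by the article's example.
from xml.sax.saxutils import escape


def to_xml(books):
    """Return XML text mirroring the element layout described above."""
    lines = ["<books>"]
    for title, authors in books:
        lines.append("    <book>")
        # Text nodes become the character content of each field element.
        lines.append("        <title>%s</title>" % escape(title))
        lines.append("        <authors>%s</authors>" % escape(authors))
        lines.append("    </book>")
    lines.append("</books>")
    return "\n".join(lines)


print(to_xml([("A Title", "An Author")]))
```

Nothing in the data is represented as an attribute, which matches the choice made in the text: every class becomes an element and every piece of text becomes a text node.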

To illustrate the parent-child relationship, we show a Python implementation of the Book class:

class Book:

    def __init__(self, title, authors):

        self.title = Field(u"title", title, self)
        self.authors = Field(u"authors", authors, self)
        self.parent = None

    def index(self, value):

        if self.title == value:
            return 0
        elif self.authors == value:
            return 1
        else:
            raise ValueError(repr(value) + " not a field in this book")

The first point to note is that the parent is initially undefined. However, when a Book object is added to a Books container object, the parent is changed to refer to the container. The parent of the Books object is, in turn, set to the node model; this is something we will deal with later. We ensure that each Field object has the Book object as its parent by passing self to it when it is created.

The second point concerns the definition of the index() method. This is used to obtain the order of the child items of the book. Here, we define the title field to be the first child and the authors field to be second.
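The Text, Field and Books classes are not listed in this article. The sketch below is one guess at minimal versions that are consistent with how they are used elsewhere (the name, text, data, books and parent attributes, and the index() methods). The add() method and the compact Book stand-in are assumptions made for illustration; the classes in the example code may differ.

```python
class Text:
    """Wraps a string so that it can have a parent (assumed layout)."""

    def __init__(self, data, parent):
        self.data = data
        self.parent = parent


class Field:
    """A named field holding a single Text child (assumed layout)."""

    def __init__(self, name, text, parent):
        self.name = name
        self.text = Text(text, self)
        self.parent = parent

    def index(self, value):
        if value is self.text:
            return 0
        raise ValueError(repr(value) + " not the text of this field")


class Book:
    """Compact stand-in for the Book class shown above."""

    def __init__(self, title, authors):
        self.title = Field(u"title", title, self)
        self.authors = Field(u"authors", authors, self)
        self.parent = None


class Books:
    """Container that adopts each book it is given (assumed layout)."""

    def __init__(self):
        self.books = []
        self.parent = None  # later set to the node model

    def add(self, book):
        # Hypothetical helper: adding a book makes the container its parent.
        book.parent = self
        self.books.append(book)

    def index(self, book):
        return self.books.index(book)


collection = Books()
book = Book(u"A Title", u"An Author")
collection.add(book)
```

With these definitions, every object can be reached from any other by following parent links upwards and the books, title, authors and text attributes downwards.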

We now examine the XML node model, BookModel, that exposes instances of these classes to the query engine.

class BookModel(QSimpleXmlNodeModel):

    def __init__(self, books, namePool):

        QSimpleXmlNodeModel.__init__(self, namePool)

        self.books = books
        self.books.parent = self
        self.cache = {}

The constructor of the class accepts two arguments: a Books object, which we relate to the model by setting its parent attribute, and an instance of QXmlNamePool. This name pool manages the names for a query which we will create later, and all objects related to it need to use the same pool.

    def documentUri(self, node_index):

        item = self._getItem(node_index)
        if item == self:
            return QUrl("books://")
        else:
            return QUrl()

The documentUri() method should return a unique identifier for the document. Here, we use the convention that self (the model) refers to the document, so we return a non-empty URL only when the index refers to the model itself, and an empty URL for every other item. We use the _getItem() method to convert the node model index, node_index, into an object we can recognize as an item of data.

We define two methods to convert objects to node model indexes and back again. These use a dictionary as a cache, but some Python bindings let you use the model's createIndex() method and the node model index's corresponding internalPointer() method to perform the conversion in each direction.

    def _createIndex(self, item):

        self.cache[id(item)] = item
        return self.createIndex(id(item))

    def _getItem(self, node_index):
        pointer = node_index.data()
        if pointer:
            return self.cache.get(pointer)
        else:
            return None
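The reason for the cache can be seen without Qt. The built-in id() function yields an integer that can travel inside an index, but an integer alone cannot be turned back into an object, so a dictionary records the mapping and keeps the object alive. The HandleCache class below is a hypothetical stand-in for the two methods above, with plain integers taking the place of node model indexes.

```python
class HandleCache:
    """Maps objects to integer handles and back (illustrative only)."""

    def __init__(self):
        self._items = {}

    def handle_for(self, item):
        # id() is unique among live objects; storing the item keeps it alive
        # so that the handle remains valid.
        handle = id(item)
        self._items[handle] = item
        return handle

    def item_for(self, handle):
        # Unknown handles yield None, like a null index.
        return self._items.get(handle)


class Item:
    pass


cache = HandleCache()
item = Item()
handle = cache.handle_for(item)
print(cache.item_for(handle) is item)
```

One consequence of this design is that the cache only ever grows; for a small, fixed collection this is unlikely to matter.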

With a way to convert between items and indexes, we can see that the root() method will return a node model index corresponding to the model itself, the shorthand we use for the document. The root is used as an entry point for queries.

    def root(self, node_index = None):
        return self._createIndex(self)
    
    def name(self, node_index):

        item = self._getItem(node_index)
        if not item:
            return QXmlName()

        if isinstance(item, Books):
            return QXmlName(self.namePool(), "books")
        elif isinstance(item, Book):
            return QXmlName(self.namePool(), "book")
        elif isinstance(item, Field):
            return QXmlName(self.namePool(), item.name)
        else:
            return QXmlName()

The name() method does what you might expect, checking the type of each item and returning an appropriate QXmlName that can be used to identify the item in queries. Anything we receive that does not correspond to an element is given a null name.

    def kind(self, node_index):

        item = self._getItem(node_index)

        if isinstance(item, Text):
            return QXmlNodeModelIndex.Text
        elif item == self:
            return QXmlNodeModelIndex.Document
        else:
            return QXmlNodeModelIndex.Element

The kind() method is also fairly simple, returning XML types for the items it is given; this is used to present items of data as elements, attributes, text nodes and the document root. As in many other parts of the API, the caller guarantees certain things about the index it supplies. In this case, it guarantees that it is non-null and that it corresponds to an item in this model, making it simpler for us to implement this method.

    def attributes(self, element):

        return []

For each node model index corresponding to an item represented as an element, attributes() needs to return a collection of indexes that correspond to the attributes of that element. We made a decision not to represent any part of the data structure as attributes in XML. As a result, we can return an empty list, which corresponds to an empty QVector.

    def stringValue(self, node_index):

        item = self._getItem(node_index)
        if isinstance(item, Text):
            return item.data
        else:
            return QSimpleXmlNodeModel.stringValue(self, node_index)

    def typedValue(self, node_index):

        item = self._getItem(node_index)
        if isinstance(item, Text):
            return QVariant(item.data)
        else:
            return QVariant()

Although stringValue() is not strictly required in subclasses of QSimpleXmlNodeModel, it is called in preference to typedValue() in some situations. We implement both methods to return values for text nodes.

The compareOrder() method is conceptually simple to write, but can take a bit of thought. We need to return a value indicating whether an item precedes, follows, or is the same as another item in the data structure as presented to the query engine.

    def compareOrder(self, ni1, ni2):

        if ni1 == ni2:
            return QXmlNodeModelIndex.Is

        item1 = self._getItem(ni1)
        item2 = self._getItem(ni2)

        # Check for one of the nodes being the document itself.
        if item1 == self:
            return QXmlNodeModelIndex.Precedes

        if item2 == self:
            return QXmlNodeModelIndex.Follows

        # Check the location of each node within the structure.
        location1 = []
        while item1.parent is not self:
            location1.append(item1.parent.index(item1))
            item1 = item1.parent

        location2 = []
        while item2.parent is not self:
            location2.append(item2.parent.index(item2))
            item2 = item2.parent

        while location1 and location2:

            i1 = location1.pop()
            i2 = location2.pop()

            if i1 < i2:
                return QXmlNodeModelIndex.Precedes
            elif i1 > i2:
                return QXmlNodeModelIndex.Follows

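        # One location is a prefix of the other: an ancestor precedes
        # its descendants in document order.
        if location2:
            return QXmlNodeModelIndex.Precedes
        elif location1:
            return QXmlNodeModelIndex.Follows

        return QXmlNodeModelIndex.Is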

We go to quite a bit of trouble to determine the relative locations of items, compiling a list of their positions within their parents, their parents' positions within their parents, and so on. In some examples, a much simpler implementation of this method works sufficiently well.
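The idea of comparing locations can be tried out independently of Qt. The following sketch uses a hypothetical Node class and returns the strings "Precedes", "Follows" and "Is" in place of the QXmlNodeModelIndex values. It assumes that both nodes belong to the same tree.

```python
class Node:
    """Minimal tree node used only for this illustration."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def path_from_root(node):
    """Return the child positions leading from the root down to node."""
    path = []
    while node.parent is not None:
        path.append(node.parent.children.index(node))
        node = node.parent
    path.reverse()
    return path


def compare_order(a, b):
    if a is b:
        return "Is"
    # Lists compare lexicographically, and a list that is a prefix of
    # another compares as smaller, so an ancestor precedes its descendants.
    if path_from_root(a) < path_from_root(b):
        return "Precedes"
    return "Follows"


document = Node("document")
books = Node("books", document)
book1 = Node("book", books)
title1 = Node("title", book1)
book2 = Node("book", books)
print(compare_order(book1, title1), compare_order(book2, title1))
```

Reversing each path so that it starts at the root lets a single list comparison do the work of the explicit loop.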

The nextFromSimpleAxis() method typically reflects the relationship between items in the data structure as translated into the form of a document containing elements. The axis effectively describes the direction of navigation from the item corresponding to the origin index.

    def nextFromSimpleAxis(self, axis, origin):

        item = self._getItem(origin)
        if not item:
            return QXmlNodeModelIndex()

        return_item = None

        if isinstance(item, Text):

            if axis == self.Parent:
                return_item = item.parent

        else:

            if axis == self.Parent:
                return_item = item.parent

            elif axis == self.FirstChild:

                if item == self:
                    return_item = self.books
                elif isinstance(item, Books):
                    return_item = item.books[0]
                elif isinstance(item, Book):
                    return_item = item.title
                elif isinstance(item, Field):
                    return_item = item.text

            elif axis == self.PreviousSibling:

                if isinstance(item, Book):
                    index = item.parent.books.index(item)
                    if index > 0:
                        return_item = item.parent.books[index - 1]

                elif isinstance(item, Field):
                    if item.name == u"authors":
                        return_item = item.parent.title

            elif axis == self.NextSibling:

                if isinstance(item, Book):
                    index = item.parent.books.index(item)
                    if index < len(item.parent.books) - 1:
                        return_item = item.parent.books[index + 1]

                elif isinstance(item, Field):
                    if item.name == u"title":
                        return_item = item.parent.authors

        if return_item:
            return self._createIndex(return_item)
        else:
            return QXmlNodeModelIndex()

One way to manage all the possible ways to navigate around the document is to check the type of item supplied as the origin, handling text nodes and elements separately, then apply the axis to each of those.

This method is also guaranteed to only be called with certain combinations of argument values, ruling out nonsensical attempts to navigate to the parent of a document or to the child of a text node, for example. These are helpfully documented, making our implementation easier to write.
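To see why these four axes are enough to reach every item, consider the sketch below. It implements the same navigation over a hypothetical Node class and then visits a whole tree in document order using nothing else. The axis constants and function names are invented for this illustration, and we make no claim that the query engine traverses data in exactly this way.

```python
PARENT, FIRST_CHILD, PREVIOUS_SIBLING, NEXT_SIBLING = range(4)


class Node:
    """Minimal tree node used only for this illustration."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def next_from_axis(axis, origin):
    """Return the neighbouring node along axis, or None if there is none."""
    if axis == PARENT:
        return origin.parent
    if axis == FIRST_CHILD:
        return origin.children[0] if origin.children else None
    if origin.parent is None:
        return None  # the root has no siblings
    siblings = origin.parent.children
    position = siblings.index(origin)
    if axis == PREVIOUS_SIBLING:
        return siblings[position - 1] if position > 0 else None
    if axis == NEXT_SIBLING:
        return siblings[position + 1] if position + 1 < len(siblings) else None
    return None


def walk(root):
    """Visit the tree below root in document order using only the axes."""
    names = []
    node = root
    while node is not None:
        names.append(node.name)
        nxt = next_from_axis(FIRST_CHILD, node)
        # No children: try the next sibling, climbing until one is found
        # or we arrive back at the root.
        while nxt is None and node is not root:
            nxt = next_from_axis(NEXT_SIBLING, node)
            if nxt is None:
                node = next_from_axis(PARENT, node)
        node = nxt
    return names


document = Node("document")
books = Node("books", document)
book1 = Node("book1", books)
Node("title1", book1)
Node("authors1", book1)
Node("book2", books)
print(walk(document))
```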

With the BookModel written, we can now give it some data to use and execute queries on it. The code to set up the model is included within a QMainWindow subclass, and can be seen in the example code available with this article. The interesting part for us is the executeQuery() method, which we invoke when an action is triggered in the user interface.

Apart from the initialization of a message handler to deal with error reporting, we can see some familiar classes and objects in use. A QXmlQuery object is created along with an instance of our model. Note that, as mentioned earlier, we pass the query's name pool to the model when it is created.

    def executeQuery(self):

        query = QXmlQuery()
        query.setMessageHandler(self.messageHandler)
        self.bookModel = BookModel(self.books, query.namePool())
        query.bindVariable(u"root", QXmlItem(self.bookModel.root()))
        query_string = u"declare variable $root external;\n" + self.queryEdit.toPlainText()
        query.setQuery(query_string)

Before a query can be run, we need to make sure that it is operating on our model. We do this by binding a query variable, "root", to the index corresponding to the document and prepending a declaration to the query string obtained from a text edit. The user is expected to use $root whenever they want to refer to the document in our model.
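The string handling involved can be shown in isolation. In the sketch below, build_query() is a hypothetical helper that prepends the same declaration, and the path expression passed to it is only an illustration of what a user might type; it is not taken from the sample queries shipped with the example.

```python
# The declaration that makes the externally bound variable visible to the query.
PROLOG = u"declare variable $root external;\n"


def build_query(user_text):
    """Return the full query text that would be passed to setQuery()."""
    return PROLOG + user_text


# A path expression a user might enter to select every title element.
print(build_query(u"$root/books/book/title"))
```

Because the declaration is added for them, users only ever write the body of the query.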

        if query.isValid():

            array = QByteArray()
            buf = QBuffer(array)
            buf.open(QIODevice.WriteOnly)

            formatter = QXmlFormatter(query, buf)

            if query.evaluateTo(formatter):
                self.resultBrowser.setPlainText(QString.fromUtf8(array))

            buf.close()

The query is evaluated with the results being passed through a formatter object and written into a byte array. This is displayed in another editor in the user interface.


Looking for titles using the Books example.

The Books example accompanying this article shows a small, fixed collection of books, an output browser and an editor window that you can use to enter queries. Press Ctrl+Return or use the Query menu to execute them.

The example includes a few sample queries that can be selected from the Insert menu. Feel free to experiment with these!

Visualizing Queries

Now that we have an understanding of the techniques used to expose data to the query engine, we can do some interesting things. For example, we can create a data structure that holds the contents of an XML document and apply queries to that instead of applying them to the XML itself. This might seem like an exercise in redundancy, but we can also wrap the data structure using Qt's model/view API, and this lets us visualize both the input document and the results of queries.

The image below shows the user interface for the Visualize XQuery example. This example lets you examine which parts of a document are visited by queries and which parts end up in the final output. Like the Books example, this also expects users to use $root to refer to the root of the input document. Because this example is quite complex, we only give a brief overview of the tricks and techniques used to create it and invite the reader to examine the source code.


The Visualize XQuery example showing a query on a weather forecast.

The upper part of the window contains a tree representation of the input document. This uses a custom model, XmlItemModel, to expose the contents of an XML document to Qt's model/view infrastructure, enabling us to display its contents in a QTreeView widget. The lower part of the window contains a QPlainTextEdit widget where the user composes queries to apply to the input document.

The XmlItemModel representing the input document contains a tree of XmlItem objects that is built using an instance of the XmlItemModelBuilder class, a subclass of QAbstractXmlReceiver, a class we described in the previous article. The builder operates on the output of a query that lists the entire input document to produce an XmlItemModel object that can be displayed in a QTreeView widget.

Just as the BookModel in the Books example operates on the Books, Book, Field and Text classes, the XmlNodeModel in this example operates on the tree of XmlItem objects that represent the input document, allowing it to be queried. In fact, the XmlNodeModel and XmlItemModel models operate on the same tree of items. As a result, we can keep track of the items visited by the XmlNodeModel when a query is performed and adjust their properties to make visible changes to the view showing the contents of the XmlItemModel. We darken the background color of each item each time it is visited to distinguish between those items that are visited often and those which are visited rarely, or not at all.
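The bookkeeping behind this effect can be sketched without any Qt classes. The VisitCounter class and shade_for_visits() function below are hypothetical, as are the step and floor values; the example's source code may count visits and choose colors quite differently.

```python
def shade_for_visits(visits, step=16, floor=64):
    """Return a gray level from 0 to 255: white when unvisited, darker per visit."""
    return max(floor, 255 - step * visits)


class VisitCounter:
    """Counts how often each item is visited while a query runs."""

    def __init__(self):
        self.counts = {}

    def record(self, item_id):
        # Called whenever the node model is asked about an item; returns the
        # gray level that the view could use for the item's background.
        self.counts[item_id] = self.counts.get(item_id, 0) + 1
        return shade_for_visits(self.counts[item_id])


counter = VisitCounter()
first = counter.record("title")
second = counter.record("title")
print(first, second)
```

Clamping the shade to a floor keeps heavily visited items legible instead of letting them fade to black.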

The view in the Structure tab shows the contents of a second XmlItemModel which is created from the results of each query executed on the input document. Again, an instance of the XmlItemModelBuilder class is responsible for building a model for display purposes. The items created by the builder are different to those in the model representing the original document, so we cannot directly map the query results to the parts of the document that were selected. However, we can execute the query again and record the results in a QXmlResultItems container. Each QXmlItem can be traced back to a QXmlNodeModelIndex that may correspond to an item in the input document model. If so, we can highlight the item in the view by changing its properties and updating the view.

Since the query may create completely new elements, resulting in new items and new node model indexes, it is sometimes impossible to say which parts of the input document were transformed into the query results. Although we can visually inspect the input document and say which parts were selected, using the coloring to help us, it is often the case that new elements are created using attributes and values from elements with identical names in the original document. This is the case for some of the sample queries included in the example.

Summary and Further Reading

The two examples given in this article are just two contrived ways of using the features of the QtXmlPatterns module to access non-XML data, and only lightly touch on the concepts surrounding XML. We consider them to be a starting point for further exploration and hope that they are at least helpful when you need to visualize and debug your queries.

The Qt documentation contains some overviews and examples that you may find helpful. General use of XQuery and the QXmlQuery class are described in some detail in the following documents:

These examples show how to expose other kinds of data structures to XQuery:

The XQuery language specification is also a useful document to use as a reference when constructing complex queries.

Source Code

The source code for the examples mentioned in this article is available from the Qt Quarterly Web site: qq34-xquery.zip
