Faster Hierarchies with Nested Sets and the Entity Reference Hierarchy module.
In Drupal 7 we used the Node Hierarchy module to keep track of a hierarchy of pages. Node Hierarchy ties directly into the menu system. Getting a list of all ancestors or descendants takes a number of queries that grows with the size of the tree, an O(n) operation, and at least one site we use it on has a lot of nodes in the tree. Performance was terrible. On top of that, it has no notion of revisions or forward revisions, so changing the parent and saving a draft can cause all sorts of issues with your menu. When the time came to update the site to Drupal 8, we took a different approach.
Nested Sets
The performance issues with Drupal 7 Node Hierarchy are due to the data structure used to store the tree. We decided to dust off the old computer science textbooks, look up the chapters on tree storage, and see what options we had. Currently the tree is stored as a linked list, where the table stores only three values:
- Node ID
- Parent ID
- Weight (optional)
This means that to find all descendants of a node, we query for entries that have the node ID stored as their parent node ID. For each result, we run another query using that ID as the parent node ID. Wash, rinse, repeat. You get the idea. For very large tables, querying for descendants or ancestors this way is very inefficient: the number of queries grows with the size of the tree, O(n) in Big O notation.
Nested Sets represent the data in a different way. They use a table which includes:
- Node ID
- Left Position
- Right Position
- Depth (optional)
The left and right positions of a node represent the set of all children contained within it. The following diagram shows how this works.
Diagram: nested set model. By Nestedsetmodel.jpg: Sherahm, derivative work: 0x24a537r9 (talk) - Nestedsetmodel.jpg, Public Domain, https://commons.wikimedia.org/w/index.php?curid=10979293
In the example above, Suits has a left position of 3 and a right position of 8. Any child elements must have a left position greater than 3 and a right position less than 8, as Slacks and Jackets do.
The benefit of storing information in this way becomes obvious when we need to find all descendants. We just query for rows where left is greater than the node's left and right is less than the node's right. That is a single query, even for thousands of nodes: O(1) queries in Big O notation, and a massive improvement over O(n).
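The single-query lookup described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the module's actual schema or code: the table name `tree`, its column names, and the helper functions are all hypothetical, and the left/right values for nodes other than Suits are assumed from the well-known clothing example that the diagram is based on.

```python
import sqlite3

# (name, left_pos, right_pos, depth). Hypothetical schema; values other than
# Suits (3, 8) are assumed from the commonly cited clothing example.
ROWS = [
    ("Clothing", 1, 22, 0),
    ("Men's", 2, 9, 1),
    ("Suits", 3, 8, 2),
    ("Slacks", 4, 5, 3),
    ("Jackets", 6, 7, 3),
    ("Women's", 10, 21, 1),
    ("Dresses", 11, 16, 2),
    ("Evening Gowns", 12, 13, 3),
    ("Sun Dresses", 14, 15, 3),
    ("Skirts", 17, 18, 2),
    ("Blouses", 19, 20, 2),
]


def build_tree():
    """Create an in-memory table holding the example tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tree (name TEXT PRIMARY KEY, left_pos INTEGER,"
        " right_pos INTEGER, depth INTEGER)"
    )
    conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def descendants(conn, name):
    """One query: every row whose interval sits strictly inside the node's."""
    rows = conn.execute(
        """
        SELECT child.name
        FROM tree AS child
        JOIN tree AS parent ON parent.name = ?
        WHERE child.left_pos > parent.left_pos
          AND child.right_pos < parent.right_pos
        ORDER BY child.left_pos
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


def ancestors(conn, name):
    """One query: every row whose interval strictly contains the node's."""
    rows = conn.execute(
        """
        SELECT anc.name
        FROM tree AS anc
        JOIN tree AS node ON node.name = ?
        WHERE anc.left_pos < node.left_pos
          AND anc.right_pos > node.right_pos
        ORDER BY anc.left_pos
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_tree()
    print(descendants(conn, "Suits"))
    print(ancestors(conn, "Slacks"))
```

Ordering by the left position returns descendants in tree order and ancestors from the root downwards, so the same pattern covers both directions of traversal.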
Updates, on the other hand, are slow. When we insert, delete or move a node, we potentially have to update every node in the tree, which is an expensive database operation. In our case, however, this only happens when content editors change the hierarchy, which is far less frequent than the read queries triggered by end users, so the tradeoff is well worth it.
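To make the cost of a write concrete, here is a rough sketch of inserting a new last child into a nested set, again using Python's sqlite3 module. The table layout, the function names, and the new "Ties" node are hypothetical, and this is one common way to perform the insert, not necessarily how the nested-set library does it.

```python
import sqlite3

# (name, left_pos, right_pos). Hypothetical schema; values other than
# Suits (3, 8) are assumed from the commonly cited clothing example.
ROWS = [
    ("Clothing", 1, 22),
    ("Men's", 2, 9),
    ("Suits", 3, 8),
    ("Slacks", 4, 5),
    ("Jackets", 6, 7),
    ("Women's", 10, 21),
    ("Dresses", 11, 16),
    ("Evening Gowns", 12, 13),
    ("Sun Dresses", 14, 15),
    ("Skirts", 17, 18),
    ("Blouses", 19, 20),
]


def build_tree():
    """Create an in-memory table holding the example tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tree (name TEXT PRIMARY KEY, left_pos INTEGER,"
        " right_pos INTEGER)"
    )
    conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", ROWS)
    conn.commit()
    return conn


def position(conn, name):
    """Return (left_pos, right_pos) for a node."""
    return conn.execute(
        "SELECT left_pos, right_pos FROM tree WHERE name = ?", (name,)
    ).fetchone()


def insert_last_child(conn, parent_name, child_name):
    """Insert child_name as the last child of parent_name.

    Returns how many existing rows had to be rewritten to make room.
    """
    with conn:  # commit on success, roll back on error
        (parent_right,) = conn.execute(
            "SELECT right_pos FROM tree WHERE name = ?", (parent_name,)
        ).fetchone()
        # Every node that closes at or after the insertion point moves by 2.
        touched = conn.execute(
            "UPDATE tree SET right_pos = right_pos + 2 WHERE right_pos >= ?",
            (parent_right,),
        ).rowcount
        # Nodes that open after the insertion point shift too. They are a
        # subset of the rows above, so `touched` already counts them.
        conn.execute(
            "UPDATE tree SET left_pos = left_pos + 2 WHERE left_pos > ?",
            (parent_right,),
        )
        # The new leaf takes the two positions that were just freed.
        conn.execute(
            "INSERT INTO tree VALUES (?, ?, ?)",
            (child_name, parent_right, parent_right + 1),
        )
    return touched


if __name__ == "__main__":
    conn = build_tree()
    print(insert_last_child(conn, "Suits", "Ties"))
    print(position(conn, "Ties"), position(conn, "Suits"))
```

In this example, adding one leaf under Suits rewrites 9 of the 11 existing rows. Only Slacks and Jackets, which sit entirely to the left of the insertion point, are left alone.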
Decoupling the Model from the Framework
Within PreviousNext, our preferred approach to Drupal development is to start by modeling the domain logic in plain old PHP classes, then add Drupal wrappers and integration around it. Commonly known as hexagonal architecture, or ports and adapters, this approach keeps our code focussed on business rules and makes it easier to test and maintain. We will get into the details of this in a future post!
While thinking about how we could improve on Node Hierarchy for Entity Hierarchy in Drupal 8, we wanted to take the 'separate the model from the framework' approach, so we built a library that deals only with Nested Sets: https://github.com/previousnext/nested-set
This library is completely decoupled from Drupal. There is no reference to any Drupal code in the code base. Instead of trying to work with Drupal's database abstraction layer (and making all of Drupal a dependency), we chose Doctrine DBAL as the database abstraction layer because of its simple API, the maturity of the code and the community around it. We focussed on using PHP interfaces to decouple the implementation, and on a high level of test coverage so we can be confident that data integrity is preserved.
We then went on to develop the Drupal 8 module for Entity Hierarchy, which requires the nested-set library. To provide the DBAL database connection the library expects, we wrote a simple factory, called DBAL Connection, which takes the Drupal database connection and returns a DBAL one.
The Entity Hierarchy module provides a new field type that extends Entity Reference. To use it, you set up a new Entity Reference Hierarchy field on the child bundles and configure it to reference valid parent bundles. For example, you may have a section content type, under which articles and events live. To configure this sort of hierarchy, you create a new Entity Reference Hierarchy field called Parents on the article and event content types, and configure it to allow a single reference to a section.
The field offers the standard entity-reference autocomplete and select widgets, and also comes with a weight field, which editors can use in a similar fashion to the menu weight. This lets you set the order of children in the tree. When you update a child entity by changing its parent or weight, the Entity Reference Hierarchy field type takes care of updating the nested set.
Once you have entered your data and have your entities in a tree structure, you can use the Views integration to filter and order the tree. For example, you could create a view with a contextual filter for 'Is a child of' and limit it to children and grandchildren. You could then embed this view on the parent, taking the entity ID from the URL as the contextual filter value. This would allow you to display children, grandchildren and so on, on the parent page.
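A filter limited to children and grandchildren comes down to the descendant query with an extra bound on depth. The sketch below shows that idea with Python's sqlite3 module. It is not the module's Views integration: the schema, the function name `descendants_within`, and the example values other than Suits (3, 8) are assumptions made for illustration.

```python
import sqlite3

# (name, left_pos, right_pos, depth). Hypothetical schema; values other than
# Suits (3, 8) are assumed from the commonly cited clothing example.
ROWS = [
    ("Clothing", 1, 22, 0),
    ("Men's", 2, 9, 1),
    ("Suits", 3, 8, 2),
    ("Slacks", 4, 5, 3),
    ("Jackets", 6, 7, 3),
    ("Women's", 10, 21, 1),
    ("Dresses", 11, 16, 2),
    ("Evening Gowns", 12, 13, 3),
    ("Sun Dresses", 14, 15, 3),
    ("Skirts", 17, 18, 2),
    ("Blouses", 19, 20, 2),
]


def build_tree():
    """Create an in-memory table holding the example tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tree (name TEXT PRIMARY KEY, left_pos INTEGER,"
        " right_pos INTEGER, depth INTEGER)"
    )
    conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def descendants_within(conn, name, max_levels):
    """Descendants of a node no more than max_levels below it, in tree order."""
    rows = conn.execute(
        """
        SELECT child.name
        FROM tree AS child
        JOIN tree AS parent ON parent.name = ?
        WHERE child.left_pos > parent.left_pos
          AND child.right_pos < parent.right_pos
          AND child.depth <= parent.depth + ?
        ORDER BY child.left_pos
        """,
        (name, max_levels),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_tree()
    # Children and grandchildren of the root, skipping great-grandchildren.
    print(descendants_within(conn, "Clothing", 2))
```

Storing the optional depth column is what makes this limit cheap. Without it, the level of each descendant would have to be derived by counting its ancestors.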
The Future
Now that we have a proof of concept, our goal is to get Entity Hierarchy to a stable release, and have the rest of the Drupal community start using it and providing feedback (and fixes!). To that end, we've released 8.x-2.0-alpha1. Please take it for a spin and use the issue queue to report any issues you encounter. Looking further ahead, there is no reason this approach could not be used to replace Drupal's existing Menu and Taxonomy hierarchies too.
At present, the only formatters in the module simply extend the standard Entity Reference ones in core. Our plan is to add a formatter that lets you configure how high in the hierarchy to traverse. This would allow you to have a formatter that shows fields from the root entity in the tree (multiple roots are possible). Returning to the section example, you could add a 'section image' field to the section and have it display on any child articles or events by way of the parent formatter. Follow along with development of that feature in the issue queue.
Let us know what you think in the comments!
Thanks
Thanks to Dan North aka webdrops for kicking off the original work with the field-type and The University of Technology, Sydney for sponsoring the module's development.
Co-authored by Lee Rowlands.