Migrating content from Solr to Drupal

Taking data from Drupal to Solr is commonplace these days with the well-established Apache Solr and Search API Solr modules. But what happens when you need to take data the other way? This article explores migrating legacy data from a Solr index to Drupal nodes.

by lee.rowlands

One of our clients uses a proprietary content repository system internally to manage a significant curated collection spanning both physical and digital assets. This proprietary system includes support for Solr and our client's existing web-presence utilises this, connecting via a custom Zend MVC app.

One of the limitations of the proprietary system is that it is intended for a very specific use case. While it meets most of their requirements, there are additional elements and functionality that they wish to add to their web-presence to enrich and enhance this dataset, and the system's vendor is not willing to consider them.

As most of their web-presence, except for the shop and search functionality, is Drupal-powered, it made sense to migrate this dataset to Drupal nodes to allow the collection curators to add this web-required enhanced data. In addition, the client's longer-term goal is to replace the Zend MVC apps powering the search and shop functionality with Search API and Commerce respectively. Considering both deal with site users finding and purchasing reproductions of the collection items, creating nodes for each item makes good long-term sense.

So in terms of importing this content from their proprietary system, we had a choice between using the source RDBMS (Oracle in this case) or the Solr index as the authoritative source of the content. Given that connecting to the Solr index was straightforward and there was a limit to the number of connections allowed to the RDBMS, we chose the latter.

Enter Migrate Lists

The Migrate module includes the MigrateList and MigrateItem abstract classes, which are intended to be used in conjunction with MigrateSourceList to perform list-based migrations. A MigrateList object handles fetching and storing the list of items to be migrated, whilst the MigrateItem class handles fetching each individual item.

As these are abstract classes, you need to extend them with specific classes. Migrate ships with MigrateListJSON and MigrateItemJSON, which do just that and provide a great example of the kind of functionality your classes should implement.
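To make the division of labour concrete, here is a minimal sketch of a list/item pair backed by an in-memory array rather than Solr. The class names ExampleArrayList and ExampleArrayItem are hypothetical, and the guarded stand-in base classes are only there so the sketch runs outside Drupal. They mirror the abstract methods as we understand them in Migrate 7.x-2.x, including __toString(), so check them against the version you have installed.

```php
<?php
// Stand-ins so this sketch runs outside Drupal (an assumption about the
// abstract methods in Migrate 7.x-2.x). Inside a module with Migrate enabled
// these are skipped and the real base classes are used.
if (!class_exists('MigrateList')) {
  abstract class MigrateList {
    abstract public function __toString();
    abstract public function getIdList();
    abstract public function computeCount();
  }
}
if (!class_exists('MigrateItem')) {
  abstract class MigrateItem {
    abstract public function getItem($id);
  }
}

/**
 * Hypothetical list class: knows which IDs exist, nothing more.
 */
class ExampleArrayList extends MigrateList {
  protected $records;

  public function __construct(array $records) {
    $this->records = $records;
  }

  // Describes where the listing is obtained from.
  public function __toString() {
    return 'in-memory example records';
  }

  // Returns the unique IDs that will later be passed to getItem().
  public function getIdList() {
    return array_keys($this->records);
  }

  // Returns how many IDs are available to migrate.
  public function computeCount() {
    return count($this->records);
  }
}

/**
 * Hypothetical item class: turns one ID into one source row.
 */
class ExampleArrayItem extends MigrateItem {
  protected $records;

  public function __construct(array $records) {
    $this->records = $records;
  }

  // Returns the row as an object, or NULL when the ID is unknown.
  public function getItem($id) {
    return isset($this->records[$id]) ? (object) $this->records[$id] : NULL;
  }
}
```

Swapping the array for calls to a remote index is then a matter of changing what happens inside those four methods.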

Usage is relatively straightforward:

class SomeSolrMigration extends Migration {

  public function __construct() {
    parent::__construct(MigrateGroup::getInstance('some_solr_migration'));

    // Setup description.
    $this->description = 'Migrates and synchronizes Solr documents from some collection';

    // Register dependencies.
    // Taxonomy terms.
    $this->dependencies = array(
      'SomeTermMigration',
    );

    // There isn't a consistent way to automatically identify appropriate "fields"
    // from a solr import, so we pass an explicit list of source fields
    $fields = array(
      'url' => t('Document ID'),
      'title' => t('Title'),
      'call_number' => t('Call Number'),
      'item_description' => t('Item description'),
      // Other fields here.
      // ..
      'some_term' => t('Some Term')
    );

    // The id here is the one retrieved from the solr list class, and
    // used to identify a specific document.
    $this->map = new MigrateSQLMap($this->machineName,
      // Substitute your field here for the unique key.
      array(
        'url' => array(
          'type' => 'varchar',
          'length' => 255,
          'not null' => TRUE,
        )
      ),
      MigrateDestinationNode::getKeySchema()
    );

    $this->highWaterField = array(
      'name' => 'timestamp'
    );

    $item_list_class = new MigrateSolrList('url');
    $item_class = new MigrateSolrItem('url');
    $this->source = new MigrateSourceList($item_list_class, $item_class, $fields);

    $this->destination = new MigrateDestinationNode('destination_node_type');

    // Basics.
    $this->addFieldMapping('title', 'call_number');
    $this->addFieldMapping('body', 'item_description')
      ->arguments(array('format' => 'full_html'));

    // Fields.
    $this->addFieldMapping('field_item_id', 'url');
    // Add other fields from your node type/source.
    // ..

    // Defaults.
    $this->addFieldMapping('uid')->defaultValue(1);
    // Other defaults as required.
    // ..
  }
}

You'll note this uses the MigrateSolrList and MigrateSolrItem classes, which are two classes we wrote for our client that extend the abstract MigrateList and MigrateItem classes and handle the Solr-specific logic. The key methods your classes need to implement are as follows:

MigrateList::getIdList()

Your class should implement this method to provide the logic that fetches the list of IDs.
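As an illustration only (this is not the code from our MigrateSolrList), the Solr side of a getIdList() implementation could be sketched as two small helpers: one that builds a select URL returning just the unique key field, and one that pulls the IDs out of the JSON response. The function names, the base URL and the paging values are all hypothetical; the q, fl, start, rows and wt parameters and the response.docs structure are standard Solr.

```php
<?php
/**
 * Builds a Solr select URL that returns only the unique key of each document.
 */
function example_solr_id_list_url($base_url, $id_field, $start, $rows) {
  $params = array(
    'q' => '*:*',       // Match every document in the index.
    'fl' => $id_field,  // Only return the unique key field.
    'start' => $start,  // Offset of the first document in this page.
    'rows' => $rows,    // Page size.
    'wt' => 'json',     // Ask Solr for a JSON response.
  );
  return rtrim($base_url, '/') . '/select?' . http_build_query($params, '', '&');
}

/**
 * Extracts the ID values from the body of a Solr JSON response.
 */
function example_solr_extract_ids($json, $id_field) {
  $data = json_decode($json, TRUE);
  $ids = array();
  if (isset($data['response']['docs'])) {
    foreach ($data['response']['docs'] as $doc) {
      if (isset($doc[$id_field])) {
        $ids[] = $doc[$id_field];
      }
    }
  }
  return $ids;
}
```

Inside getIdList(), the response body could be fetched with drupal_http_request() and the request repeated with an increasing start value until a page comes back with fewer than $rows documents.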

MigrateList::computeCount()

This method provides the migration source (an instance of MigrateSourceList) with the count of records to be migrated.
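One inexpensive way to get that count from Solr, sketched here under the assumption that every document in the index is to be migrated, is to request zero rows and read numFound from the response. The helper names and base URL are hypothetical.

```php
<?php
/**
 * Builds a Solr select URL that asks for the match count only.
 */
function example_solr_count_url($base_url) {
  // rows=0 means no documents are returned, only the response header
  // and the total number of matches.
  $params = array('q' => '*:*', 'rows' => 0, 'wt' => 'json');
  return rtrim($base_url, '/') . '/select?' . http_build_query($params, '', '&');
}

/**
 * Reads the total number of matches from the body of a Solr JSON response.
 */
function example_solr_num_found($json) {
  $data = json_decode($json, TRUE);
  return isset($data['response']['numFound']) ? (int) $data['response']['numFound'] : 0;
}
```

A computeCount() implementation would fetch the URL and return the result of the second helper.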

MigrateItem::getItem()

This handles fetching the actual item and takes the item ID (from the item list) as an argument. The method returns the actual item. In this case we returned the Solr document as fetched from the index.
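A getItem() implementation for Solr might look roughly like the following pair of helpers: one builds a query that matches a single value of the unique key field, the other returns the first document of the response as an object. Again this is a sketch with hypothetical function names, not our client code. The ID is wrapped in a quoted phrase so that characters such as the colons and slashes in a URL are not interpreted as query syntax.

```php
<?php
/**
 * Builds a Solr query matching exactly one value of the unique key field.
 */
function example_solr_item_query($id_field, $id) {
  // Inside a quoted phrase, backslashes and double quotes must be escaped.
  // Backslashes go first so the ones added for quotes are not doubled.
  $escaped = str_replace(array('\\', '"'), array('\\\\', '\\"'), $id);
  return $id_field . ':"' . $escaped . '"';
}

/**
 * Returns the first document of a Solr JSON response as an object, or NULL.
 */
function example_solr_first_doc($json) {
  // Decode to objects so the document's fields are available as properties.
  $data = json_decode($json);
  if (isset($data->response->docs[0])) {
    return $data->response->docs[0];
  }
  return NULL;
}
```

The query string would be sent as the q parameter with rows=1, and the decoded document handed back to the migration as the source row.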

Summary

Once the migration functionality was complete, the client was able to add new fields to the destination node type and has already begun replacing the search forms with views built using the Search API module.

An added bonus is that because the Migrate module tracks the source IDs and we implemented high-water functionality in our MigrateSolrList::getIdList() method, updates to content on the source system can be synced into Drupal using Migrate's update functionality.
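The idea behind the high-water handling can be sketched as a Solr range filter that restricts the ID list to documents changed since the last import. This is an illustration under assumptions rather than our implementation: the function name is hypothetical, the field is assumed to hold ISO 8601 dates as Solr date fields do, and how the list class obtains the stored high-water value is left open.

```php
<?php
/**
 * Builds a Solr filter selecting documents changed since a high-water mark.
 */
function example_solr_highwater_filter($field, $highwater) {
  if ($highwater === NULL || $highwater === '') {
    // No previous import recorded: do not restrict the list.
    return '';
  }
  // Inclusive lower bound, so a document stamped exactly at the mark is
  // listed again rather than missed.
  return $field . ':[' . $highwater . ' TO *]';
}
```

A non-empty result would be added to the ID list request as an fq parameter, so that only recently changed documents are listed and re-fetched.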

The level of extensibility provided by the Migrate module is quite remarkable. If you have a large amount of legacy data to bring into Drupal, Migrate will surely be able to provide you with a solid foundation to perform the migration.

Posted by lee.rowlands
Senior Drupal Developer
