Upgrade array syntax the automatic way

by cameron.zemek / 9 May 2014

This post covers upgrading from the old array syntax to the PHP 5.4 short array syntax using an automatic tool built with Pharborist. Pharborist is a PHP library for querying and transforming PHP source code via abstract syntax tree operations. The result is more robust and shorter than the alternative of working with the tokens returned by token_get_all().

In my previous blog post I introduced Pharborist for working with PHP source code. In this post Pharborist is used to change the old style array syntax array(...) to the new short array syntax [...]. If we were using token_get_all() we would have to deal with the array keyword (T_ARRAY) appearing both as a type hint for parameters (e.g. function test(array $a)) and as the keyword that creates an array. Pharborist already takes care of this distinction, so instead we use tree traversal and manipulation functions to change the source code. Now let's look at the code to do this:
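
To make the token_get_all() problem concrete, here is a minimal sketch that uses only PHP's tokenizer extension (no Pharborist). Both uses of the keyword come back as T_ARRAY tokens, so the sketch guesses which is which by looking at the next token that is not whitespace or a comment: array( starts an array literal, anything else is treated as a type hint. classifyArrayKeywords() is a hypothetical helper, and the heuristic is an illustration rather than a full PHP grammar.

```php
<?php
// Minimal sketch using only the tokenizer extension (no Pharborist).
// Both the type hint and the array literal produce T_ARRAY tokens.
function classifyArrayKeywords(string $code): array
{
    $tokens = token_get_all($code);
    $count = count($tokens);
    $hidden = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $kinds = [];
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_ARRAY) {
            continue;
        }
        // Skip whitespace and comments that may sit after the keyword.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && in_array($tokens[$j][0], $hidden, true)) {
            $j++;
        }
        // "array (" starts an array literal; anything else is treated as a type hint.
        $kinds[] = ($j < $count && $tokens[$j] === '(') ? 'constructor' : 'type hint';
    }
    return $kinds;
}

echo implode(', ', classifyArrayKeywords('<?php function test(array $a) { return array(4, 2); }')), PHP_EOL;
// prints: type hint, constructor
```

Short [...] arrays produce no T_ARRAY token at all, which is why they never show up in this kind of scan.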

<?php
require_once 'vendor/autoload.php';

// Import the Pharborist classes
use Pharborist\Filter;
use Pharborist\Node;
use Pharborist\Parser;
use Pharborist\TokenNode;
use Pharborist\TopNode;

function processTree(TopNode $tree) {
  /**
   * Tracks if we made a change to the tree. 
   * @var bool $modified
   */
  $modified = FALSE;
  /**
   * Loop over array nodes in the tree.
   * @var \Pharborist\ArrayNode $array
   */
  foreach ($tree->find(Filter::isInstanceOf('\Pharborist\ArrayNode')) as $array) {

Filter::isInstanceOf() takes a class name and generates a callback that matches nodes that are instances of that class. $tree->find() finds all nodes in the syntax tree that are matched by the callback and returns a collection of the matches. In this case all array nodes are returned, both old-style array(...) arrays and new short [...] arrays.
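
The callback-factory idea behind Filter::isInstanceOf() can be sketched in plain PHP. isInstanceOf() and findMatching() below are hypothetical stand-ins, not Pharborist's API, and findMatching() only filters a flat list, whereas $tree->find() searches the whole tree.

```php
<?php
// Hypothetical stand-ins for Filter::isInstanceOf() and $tree->find();
// this is not Pharborist's implementation.

// Returns a predicate that accepts values that are instances of $class.
function isInstanceOf(string $class): callable
{
    return function ($node) use ($class): bool {
        return $node instanceof $class;
    };
}

// Keeps the items the predicate accepts (a flat list, not a tree walk).
function findMatching(array $nodes, callable $predicate): array
{
    return array_values(array_filter($nodes, $predicate));
}

class DemoNode {}
class DemoArrayNode extends DemoNode {}

$nodes = [new DemoNode(), new DemoArrayNode(), new DemoArrayNode()];
echo count(findMatching($nodes, isInstanceOf('DemoArrayNode'))), PHP_EOL; // prints 2
```

Because the predicate uses instanceof, subclasses of the requested class match as well.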

    // Test if using old syntax. PHP keywords are case-insensitive, so ARRAY(...) counts too.
    if (strtolower($array->firstChild()->getText()) === 'array') {

An array node is a ParentNode whose first child is either a TokenNode of type T_ARRAY or a [ token, and whose last child is a TokenNode of type ) or ].
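
The tree hands us the closing token directly through lastChild(). For comparison, on a raw token_get_all() stream the ) that closes an array( ... ) has to be found by counting parenthesis depth, because function calls and nested arrays add their own parentheses. A minimal sketch with a hypothetical findClosingParen() helper:

```php
<?php
// Hypothetical helper: given the index of an opening "(" in a token_get_all()
// array, return the index of the matching ")" by tracking nesting depth.
function findClosingParen(array $tokens, int $open): int
{
    $depth = 0;
    for ($i = $open, $n = count($tokens); $i < $n; $i++) {
        // Single-character tokens come back as plain strings.
        if ($tokens[$i] === '(') {
            $depth++;
        } elseif ($tokens[$i] === ')') {
            $depth--;
            if ($depth === 0) {
                return $i;
            }
        }
    }
    return -1; // no matching ")" (unbalanced input)
}

$tokens = token_get_all('<?php $a = array(1, f(2), array(3));');
$open = array_search('(', $tokens, true); // the first "(" is the one after array
$close = findClosingParen($tokens, $open);
echo $tokens[$close + 1], PHP_EOL; // prints ;
```

Parentheses inside string literals belong to the string token, and casts such as (array) are single T_ARRAY_CAST tokens, so neither disturbs the count.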

      // Remove any hidden tokens between T_ARRAY and ( .
      $array->firstChild()->nextUntil(function (Node $node) {
        return $node instanceof TokenNode && $node->getType() === '(';
      })->remove();

In PHP, comments and whitespace are insignificant to the grammar, so it's possible to write array /* comment */ (4, 2). Let's remove anything between the array keyword and the ( token. nextUntil() matches the following siblings until the callback returns true and returns a collection of the matches. remove() is then called on this collection to remove all of these nodes from the syntax tree.
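
The same "collect the following siblings until a match, then remove them" step can be sketched on a flat token_get_all() array. collectUntil() and tokenText() are hypothetical helpers that only mimic the idea of nextUntil() and remove(); the collected tokens are keyed by their original index so they can be dropped with array_diff_key().

```php
<?php
// Hypothetical helper mimicking the nextUntil() idea on a token_get_all() array:
// collect the tokens after $start until $stop returns true.
function collectUntil(array $tokens, int $start, callable $stop): array
{
    $matched = [];
    for ($i = $start + 1, $n = count($tokens); $i < $n && !$stop($tokens[$i]); $i++) {
        $matched[$i] = $tokens[$i]; // keep the original index as the key
    }
    return $matched;
}

// Text of a token; single-character tokens are plain strings.
function tokenText($token): string
{
    return is_array($token) ? $token[1] : $token;
}

$tokens = token_get_all('<?php $x = array /* comment */ (4, 2);');
$arrayIndex = null;
foreach ($tokens as $i => $token) {
    if (is_array($token) && $token[0] === T_ARRAY) {
        $arrayIndex = $i;
        break;
    }
}

$hidden = collectUntil($tokens, $arrayIndex, function ($token) {
    return $token === '(';
});
// Dropping the collected keys is the token-level counterpart of ->remove().
$tokens = array_diff_key($tokens, $hidden);
echo implode('', array_map('tokenText', $tokens)), PHP_EOL; // source without the comment
```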

      $array->firstChild()->remove(); // remove T_ARRAY token.
      $array->firstChild()->replaceWith(new TokenNode('[', '[')); // replace ( with [ .
      $array->lastChild()->replaceWith(new TokenNode(']', ']')); // replace ) with ] .

Now remove the array keyword and convert ( to [ and ) to ]. The TokenNode constructor takes two required parameters: the token type and the token text. It's worth noting that all leaf nodes of the syntax tree are TokenNodes that correspond to the tokens returned by token_get_all().
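
One consequence of leaves mapping to tokens can be checked with the tokenizer alone: token_get_all() is lossless, so joining the text of every token, whitespace and comments included, rebuilds the original source byte for byte. This is presumably what lets a modified tree be written back with getText() without disturbing formatting elsewhere in the file. tokensToText() is a hypothetical helper:

```php
<?php
// Sketch: joining the text of every token_get_all() token reproduces the source
// exactly, comments and whitespace included. tokensToText() is hypothetical.
function tokensToText(array $tokens): string
{
    $text = '';
    foreach ($tokens as $token) {
        // Multi-character tokens are [id, text, line]; single characters are strings.
        $text .= is_array($token) ? $token[1] : $token;
    }
    return $text;
}

$source = "<?php\n// comment\n\$a = array /* c */ (4, 2);\nfunction f(array \$b) {}\n";
var_dump(tokensToText(token_get_all($source)) === $source); // bool(true)
```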

      $modified = TRUE;
    }
  }
  return $modified;
}

/**
 * Process a Drupal PHP file.
 */
function processFile($filename) {
  if (substr($filename, 0, strlen('./core/vendor/')) === './core/vendor/') {
    // Ignore vendor files
    return;
  }
  try {
    $tree = Parser::parseFile($filename);
    $modified = processTree($tree);
    if ($modified) {
      file_put_contents($filename, $tree->getText());
    }
  } catch (\Pharborist\ParserException $e) {
    die($filename . ': ' . $e->getMessage() . PHP_EOL);
  }
}

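processFile() takes the filename of a PHP source file, skips anything under ./core/vendor/, parses the file into a syntax tree with Pharborist and passes the tree to processTree() to convert the old array syntax to the new short array syntax. The file is only written back if processTree() reports a change, and a parse error stops the script with the offending filename.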

// Find Drupal PHP files.
$extensions = array('php', 'inc', 'module', 'install', 'theme');
$directory = new \RecursiveDirectoryIterator('.');
$iterator = new \RecursiveIteratorIterator($directory);
$pattern = '/^.+\.(' . implode('|', $extensions) . ')$/i';
$regex = new \RegexIterator($iterator, $pattern, \RecursiveRegexIterator::GET_MATCH);
foreach ($regex as $name => $object) {
  processFile($name);
}

This block of code recursively searches the current directory for Drupal PHP files (matched by extension) and passes each one to processFile().
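
The file selection rules are split between processFile() (the vendor check) and the iterator block (the extension pattern). Gathering them into one predicate makes it easier to see which paths get touched. shouldProcess() is a hypothetical helper that reuses the pattern and the ./core/vendor/ prefix from the script:

```php
<?php
// Hypothetical predicate combining the script's two selection rules: the
// extension pattern from the iterator block and the vendor check in processFile().
function shouldProcess(string $path): bool
{
    $extensions = ['php', 'inc', 'module', 'install', 'theme'];
    $pattern = '/^.+\.(' . implode('|', $extensions) . ')$/i';
    $vendor = './core/vendor/';
    return preg_match($pattern, $path) === 1
        && substr($path, 0, strlen($vendor)) !== $vendor;
}

foreach (['./core/modules/node/node.module', './core/vendor/autoload.php', './README.txt'] as $path) {
    echo $path, ': ', shouldProcess($path) ? 'process' : 'skip', PHP_EOL;
}
```

The /i flag makes the extension match case-insensitive, and the vendor prefix works because RecursiveDirectoryIterator started at '.' yields paths beginning with ./.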

As you can hopefully see, Pharborist does all the heavy lifting of parsing the source code, and processTree() then uses the jQuery-inspired API to traverse and manipulate the syntax tree. Go here for the full listing of the source code.
