Programmatically copy field data in Drupal 8

The code to programmatically copy field data in Drupal 8 is pretty simple, but I wasn’t able to find any great examples for performing the operation at scale when I ran into the need myself. Hopefully the code sample below helps somebody save a few minutes of digging for solutions.

My use case was straightforward: After more than a year of continual development on a client site, new patterns emerged in how content editors utilized certain entity reference fields. It became apparent that similar relationships across different content types used separate fields with different machine names. This made it tricky to aggregate and filter content. It also led to overly complex implementations of any custom functionality based on these relationships. One thing was clear: we needed the fields to be consistent across content types.

Basically, I needed entity references stored in various older fields, let’s say field_old_one and field_old_two, to be copied into a single destination field we’ll call field_new.

Once I’d added our field_new field to all the appropriate content types via the field UI, I needed to migrate the references out of the old fields, and into the new fields. What’s the best way to do this when manual GUI changes would take too long? A hook_post_update_NAME implementation was my answer.

The code to programmatically copy field data in Drupal 8 isn’t the hard part. Assuming your source and destination field types are compatible, it’s just two quick lines:

/** @var \Drupal\node\NodeInterface $node */
$node->$dest_field = $node->$source_field;
$node->save();

It gets complicated when you need to do this on hundreds or thousands of entities. Loading, processing, and saving all of them in a single request would likely exhaust PHP's memory or hit its execution time limit, so we need to batch the operation. An implementation of hook_post_update_NAME receives a $sandbox array by reference and behaves like a callback_batch_operation: Drupal calls it repeatedly, across multiple HTTP requests, until it reports that it is finished. That lets us run a custom callback, _my_module_copy_field_values, in small chunks and avoid PHP timeouts. First, we define that callback.

/**
 * Copies field values from a source field into an empty destination field.
 *
 * @param array $sandbox
 *   The batch operation sandbox.
 * @param string $bundle
 *   The node bundle.
 * @param string $source_field
 *   The source field name.
 * @param string $dest_field
 *   The destination field name.
 * @param int $nodes_per_batch
 *   The number of nodes to update per pass.
 */
function _my_module_copy_field_values(array &$sandbox, $bundle, $source_field, $dest_field, $nodes_per_batch = 20) {
  $storage = \Drupal::entityTypeManager()->getStorage('node');
  // Initialize some variables during the first pass through.
  if (!isset($sandbox['total'])) {
    // Select only nodes of this bundle that have a source value and no
    // destination value yet, so already copied data is never overwritten.
    $query = $storage->getQuery()
      ->condition('type', $bundle)
      ->notExists($dest_field)
      ->exists($source_field)
      ->accessCheck(FALSE);

    $nids = $query->execute();

    $sandbox['total'] = count($nids);
    $sandbox['ids'] = array_chunk($nids, $nodes_per_batch);
    $sandbox['current'] = 0;
  }

  // Nothing to migrate, or no chunks left: mark the update as finished.
  // The empty() check also prevents loadMultiple(NULL), which would load
  // every node on the site.
  if ($sandbox['total'] == 0 || empty($sandbox['ids'])) {
    $sandbox['#finished'] = 1;
    return;
  }

  // Process the next chunk of node IDs.
  $nids = array_shift($sandbox['ids']);
  $nodes = $storage->loadMultiple($nids);

  /** @var \Drupal\node\NodeInterface $node */
  foreach ($nodes as $node) {
    // Programmatically copy field data.
    $node->$dest_field = $node->$source_field;
    $node->save();
    $sandbox['current']++;
  }

  // Clear these nodes from the static entity cache to keep memory use flat.
  $storage->resetCache($nids);

  $sandbox['#finished'] = min(($sandbox['current'] / $sandbox['total']), 1);
}

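Here, the same $sandbox array is passed by reference to each call of the post-update function, so the values stored on the first pass ('total', 'ids', and 'current') persist between calls. Drupal keeps calling the function until $sandbox['#finished'] reaches 1. In this callback, that happens once the number of processed nodes equals the total number of nodes returned by the initial query.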
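The following minimal plain-PHP sketch shows that control flow outside Drupal. The run_until_finished() runner and fake_copy_field_values() callback are hypothetical stand-ins. The runner only imitates how the update system re-invokes a post-update function with the same $sandbox. The callback reuses the chunking and progress logic from above on a fixed list of fake IDs instead of querying and saving real nodes.

```php
<?php
// Hypothetical stand-in for Drupal's update runner: it keeps calling the
// callback with the same $sandbox (by reference) until the callback reports
// $sandbox['#finished'] >= 1. Each call stands in for one request or pass.
function run_until_finished(callable $callback): array {
  $sandbox = [];
  $progress = [];
  do {
    $callback($sandbox);
    // Assumption: a missing '#finished' is treated as "done in one pass".
    $finished = $sandbox['#finished'] ?? 1;
    $progress[] = $finished;
  } while ($finished < 1);
  return $progress;
}

// Hypothetical callback mirroring the chunking in _my_module_copy_field_values(),
// using a fixed list of fake node IDs and doing no loading or saving.
function fake_copy_field_values(array &$sandbox, array $all_nids, int $per_batch = 20): void {
  // First pass: remember the total and split the IDs into chunks.
  if (!isset($sandbox['total'])) {
    $sandbox['total'] = count($all_nids);
    $sandbox['ids'] = array_chunk($all_nids, $per_batch);
    $sandbox['current'] = 0;
  }
  if ($sandbox['total'] == 0) {
    $sandbox['#finished'] = 1;
    return;
  }
  // Later passes: take one chunk and "process" it.
  $nids = array_shift($sandbox['ids']);
  foreach ($nids as $nid) {
    // A real implementation would load, copy, and save node $nid here.
    $sandbox['current']++;
  }
  $sandbox['#finished'] = min($sandbox['current'] / $sandbox['total'], 1);
}

$nids = range(1, 45);
$progress = run_until_finished(function (array &$sandbox) use ($nids) {
  fake_copy_field_values($sandbox, $nids, 20);
});
print_r($progress);
```

With 45 IDs and 20 per batch, the runner calls the callback three times, and '#finished' goes from about 0.44 to about 0.89 to 1.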

In my case, I needed to run this operation on a few different fields across half a dozen content types, each with its own field mappings. I'll just use article and post in the example below. To make this easier, I wrote the custom callback _my_module_copy_field_values to apply updates on a per-content-type, per-field basis. Then I called it from several implementations of hook_post_update_NAME, one for each content type.

With your custom callback _my_module_copy_field_values defined, you simply call it from your hook_post_update_NAME implementations in your module's my_module.post_update.php file. Then, when you run database updates (through update.php or drush updatedb), your work is done.

/**
 * Migrate Article field_old_one > field_new.
 */
function my_module_post_update_8001_migrate_article_field(&$sandbox) {
  _my_module_copy_field_values($sandbox, 'article', 'field_old_one', 'field_new');
}

/**
 * Migrate Post field_old_two > field_new.
 */
function my_module_post_update_8002_migrate_post_field(&$sandbox) {
  _my_module_copy_field_values($sandbox, 'post', 'field_old_two', 'field_new');
}
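The names of these functions matter. As far as I know, Drupal's hook_post_update_NAME() documentation states that post-update functions run in the alphanumeric order of their names, rather than by a numeric version the way hook_update_N() does. This plain-PHP sketch does not call Drupal. It only sorts the two function names from the example, plus two hypothetical later ones, to show what that ordering looks like.

```php
<?php
// Assumption: Drupal orders post-update functions by sorting their names.
// sort() compares these non-numeric strings character by character.
$names = [
  'my_module_post_update_8002_migrate_post_field',
  'my_module_post_update_8001_migrate_article_field',
  // Hypothetical later additions, to show an unpadded-number pitfall.
  'my_module_post_update_9_cleanup',
  'my_module_post_update_10_reindex',
];
sort($names);
print_r($names);
```

With plain string ordering, "10" sorts before "9". If you number your post-update functions, keep the numbers the same width (like 8001 and 8002) so they run in the order you intended.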

Running your field data changes in hook_post_update_NAME rather than hook_update_N ensures that any schema-level changes (which belong in hook_update_N) are completed before your content-level changes. That matters if you have other updates pending or are working with a handful of other developers in the same codebase.

Do you have a simpler way to do this? Feel free to share in the comments.
