The migrate API works with plugins and stores the configuration for those plugins in a configuration entity. Several plugin types are offered; source, process, and destination are the most important. A source plugin merely provides an iterator and identifiers, and most of the time the destination plugins provided by core are adequate, so this article will focus on process plugins.
Process plugins
Nothing gets into the destination unless it is specified under the top-level process key in the configuration entity. Each key under process is a destination property, and its value is a process pipeline. Each “stage” of this pipeline is a plugin that receives the output of the previous stage as input, transforms it, and produces the new value of the pipeline.
A few plugins do use only the pipeline value as input: for example, the machine name plugin transliterates the input (presumably a human-readable name) and replaces non-alphanumeric characters with underscores. However, if that were all plugins could do, they wouldn’t be very useful. Instead, every plugin receives the whole row and the name of the destination property currently being created.
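To make the idea of a pipeline concrete, here is a minimal, framework-free PHP sketch. It is not Drupal's implementation: the runPipeline() function and the closure-based stages are hypothetical, and exist only to show that each stage receives the previous value, the whole row, and the name of the destination property.

```php
<?php
// Hypothetical, framework-free model of a process pipeline (not Drupal code).
// Each stage is a callable: function ($value, array $row, string $destinationProperty).

function runPipeline(array $stages, array $row, string $destinationProperty)
{
    // The pipeline starts with NULL; each stage's return value replaces it.
    $value = null;
    foreach ($stages as $stage) {
        $value = $stage($value, $row, $destinationProperty);
    }
    return $value;
}

$row = ['name' => 'Tags', 'prefix' => 'site'];

$stages = [
    // Reads a value from the row, ignoring the incoming NULL.
    function ($value, array $row, string $destinationProperty) {
        return $row['name'];
    },
    // Uses only the pipeline value.
    function ($value, array $row, string $destinationProperty) {
        return strtolower($value);
    },
    // Combines the pipeline value with another row value and the property name.
    function ($value, array $row, string $destinationProperty) {
        return $row['prefix'] . '_' . $value . '_' . $destinationProperty;
    },
];

echo runPipeline($stages, $row, 'vid'), "\n";
```

Only the middle stage behaves like the "value-only" plugins; the other two show why access to the row and the destination property name matters.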
Each stage in the process pipeline is described by an array in which the plugin key is mandatory and the rest is the plugin configuration. For example:
```yaml
process:
  vid:
    -
      plugin: machine_name
      source: name
    -
      plugin: dedupe_entity
      entity_type: taxonomy_vocabulary
      field: vid
```
The machine name transformation mentioned above is run on name, and then the dedupe_entity plugin adds a numeric suffix where needed, ensuring that the vid field of the taxonomy_vocabulary entity is unique. That is the canonical format of the process pipeline.
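Once the YAML is parsed, the canonical format is just nested PHP arrays: each dash introduces one element of the list of stages. The sketch below writes that structure out by hand (an assumption about the parsed shape, not output captured from Drupal) and uses a hypothetical describeStages() helper to split each stage into its mandatory plugin key and the remaining plugin configuration.

```php
<?php
// The canonical vid pipeline written as the PHP array the YAML parses to.
$process = [
    'vid' => [
        ['plugin' => 'machine_name', 'source' => 'name'],
        ['plugin' => 'dedupe_entity', 'entity_type' => 'taxonomy_vocabulary', 'field' => 'vid'],
    ],
];

// Hypothetical helper: split every stage into [plugin ID, plugin configuration].
function describeStages(array $stages): array
{
    $described = [];
    foreach ($stages as $stage) {
        if (!isset($stage['plugin'])) {
            throw new InvalidArgumentException('Every stage needs a plugin key.');
        }
        $configuration = $stage;
        unset($configuration['plugin']);
        $described[] = [$stage['plugin'], $configuration];
    }
    return $described;
}

foreach (describeStages($process['vid']) as [$pluginId, $configuration]) {
    echo $pluginId, ': ', json_encode($configuration), "\n";
}
```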
However, often only a single plugin is enough:
```yaml
process:
  body.format:
    plugin: migration
    migration: d6_filter_format
    source: format
```
In this case, instead of a list of plugins, just a single plugin is used. Note the dot in body.format. The system supports dot notation for nested properties; this one is the equivalent of $destination['body']['format'] = ...
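A minimal sketch of what dot notation means for the destination. setDestinationProperty() here is a hypothetical stand-alone function (Drupal's own handling lives in its Row class and may differ in details), and it assumes '.' is the only separator.

```php
<?php
// Hypothetical helper: 'body.format' behaves like
// $destination['body']['format'] = $value.
function setDestinationProperty(array &$destination, string $property, $value): void
{
    $target = &$destination;
    foreach (explode('.', $property) as $key) {
        // Create intermediate arrays as needed, keeping existing ones.
        if (!isset($target[$key]) || !is_array($target[$key])) {
            $target[$key] = [];
        }
        $target = &$target[$key];
    }
    $target = $value;
}

$destination = [];
setDestinationProperty($destination, 'body.format', 'filtered_html');
setDestinationProperty($destination, 'body.value', 'Hello');
setDestinationProperty($destination, 'nid', 5);
var_export($destination);
```

The second call shows that sibling keys under body are preserved rather than overwritten.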
Finally, very often a value just needs to be copied. This can be written as:
```yaml
process:
  nid: nid
  langcode: language
```
That is the rough equivalent of $destination['nid'] = $source['nid']; the second line is $destination['langcode'] = $source['language']. Internally, both shortcuts demonstrated here are translated into the first, canonical format. Here is the langcode: language notation in the canonical format:
```yaml
process:
  langcode:
    -
      plugin: get
      source: language
```
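The translation of both shortcuts into the canonical format can be sketched as a small normalization step. normalizeProcess() is a hypothetical function written for illustration under the rules described above: a plain string becomes a single get stage, and a single stage is wrapped in a one-element list.

```php
<?php
// Hypothetical sketch: expand the shortcuts so that every destination
// property maps to a list of stages.
function normalizeProcess(array $process): array
{
    $normalized = [];
    foreach ($process as $destination => $configuration) {
        if (is_string($configuration)) {
            // "langcode: language" is shorthand for a single get stage.
            $configuration = ['plugin' => 'get', 'source' => $configuration];
        }
        if (isset($configuration['plugin'])) {
            // A single stage (not a list) is wrapped in a one-element list.
            $configuration = [$configuration];
        }
        $normalized[$destination] = $configuration;
    }
    return $normalized;
}

$process = [
    'nid' => 'nid',
    'langcode' => 'language',
    'body.format' => ['plugin' => 'migration', 'migration' => 'd6_filter_format', 'source' => 'format'],
];
print_r(normalizeProcess($process));
```

Pipelines already in canonical form are lists with integer keys, so neither branch touches them.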
Any time the source key is used, the system inserts an additional pipeline stage running the get plugin in front of that stage. In case you’re wondering what the starting value of the pipeline is: it always starts as NULL. Most often the first stage contains a source key, so get provides the first real value, read from the source.
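The get-insertion rule can be sketched as follows. expandStages() is hypothetical and only returns plugin IDs rather than plugin instances; it assumes the stages are already normalized and that every stage has a plugin key.

```php
<?php
// Hypothetical sketch: list the plugin IDs that actually run for a
// normalized pipeline. A stage with a source key gets a get stage in
// front of it; a stage that already is get is not duplicated.
function expandStages(array $stages): array
{
    $plugins = [];
    foreach ($stages as $stage) {
        if (isset($stage['source'])) {
            $plugins[] = 'get';
        }
        if ($stage['plugin'] !== 'get') {
            $plugins[] = $stage['plugin'];
        }
    }
    return $plugins;
}

print_r(expandStages([
    ['plugin' => 'machine_name', 'source' => 'name'],
    ['plugin' => 'dedupe_entity', 'entity_type' => 'taxonomy_vocabulary', 'field' => 'vid'],
]));
```

For the vid example this yields get, machine_name, dedupe_entity: the inserted get stage turns the initial NULL into the name source value.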
Now let’s see the relevant parts of a process plugin:
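```php
<?php

namespace Drupal\migrate\Plugin\migrate\process;

use Drupal\Core\Language\Language;
use Drupal\migrate\MigrateExecutable;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * @MigrateProcessPlugin(
 *   id = "machine_name"
 * )
 */
class MachineName extends ProcessPluginBase {

  protected $transliteration;

  public function transform($value, MigrateExecutable $migrate_executable, Row $row, $destination_property) {
    // Transliterate to ASCII, replacing unknown characters with underscores.
    $new_value = $this->getTransliteration()->transliterate($value, Language::LANGCODE_DEFAULT, '_');
    $new_value = strtolower($new_value);
    // Replace runs of non-alphanumeric characters with an underscore.
    $new_value = preg_replace('/[^a-z0-9_]+/', '_', $new_value);
    // Collapse repeated underscores.
    return preg_replace('/_+/', '_', $new_value);
  }

  protected function getTransliteration() {
    // Lazily fetch the transliteration service.
    if (!isset($this->transliteration)) {
      $this->transliteration = \Drupal::transliteration();
    }
    return $this->transliteration;
  }

}
```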
As this is a plugin, the Drupal\modulename\Plugin\migrate\process namespace is mandatory, and so is the @MigrateProcessPlugin annotation. These plugins typically extend ProcessPluginBase. The interface specifies only one method: transform. The $value argument is the current value of the pipeline, and it will be replaced by the return value. While this particular plugin depends only on $value, it would seriously limit the usefulness of the system if the other parameters were not available. Thankfully, it is possible to peek at other values in the row: $row->getSourceProperty($property) gets other source values, and already calculated destination values are available too via $row->getDestinationProperty($property). The plugin configuration is available as the array $this->configuration. For example, in the body.format example above, $this->configuration['migration'] would be d6_filter_format.
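The following self-contained sketch mimics how a plugin combines its configuration with other row values. SketchRow and PrefixSketch are hypothetical stand-ins, not Drupal classes; the real transform() also receives the MigrateExecutable, which is omitted here.

```php
<?php
// Hypothetical stand-in for a row: source values plus already
// calculated destination values.
class SketchRow
{
    private $source;
    private $destination;

    public function __construct(array $source, array $destination = [])
    {
        $this->source = $source;
        $this->destination = $destination;
    }

    public function getSourceProperty(string $property)
    {
        return $this->source[$property] ?? null;
    }

    public function getDestinationProperty(string $property)
    {
        return $this->destination[$property] ?? null;
    }
}

// Hypothetical plugin: prefixes the pipeline value with an already
// calculated destination value and another source value, joined by a
// separator taken from the plugin configuration.
class PrefixSketch
{
    protected $configuration;

    public function __construct(array $configuration)
    {
        $this->configuration = $configuration;
    }

    public function transform($value, SketchRow $row, string $destinationProperty)
    {
        $separator = $this->configuration['separator'];
        $type = $row->getSourceProperty('type');
        $langcode = $row->getDestinationProperty('langcode');
        return $langcode . $separator . $type . $separator . $value;
    }
}

$row = new SketchRow(['type' => 'article'], ['langcode' => 'en']);
$plugin = new PrefixSketch(['separator' => ':']);
echo $plugin->transform('My title', $row, 'title'), "\n";
```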
Handling Lists
All the examples above deal with simple scalars: there’s one body and it has a single format, every vocabulary has a single identifier, and so on. Sometimes, however, the value of a source property is an array. There are two kinds:
- A simple list of scalars. Typically these are strings, such as the permissions in a role or the recipients of a contact category. The system handles these automatically: if the source is a single property and yet its value is an array, the system iterates over the array and runs the pipeline for every single value. So the process plugin doesn’t need to handle this case itself; it can just transform scalars. However, if a plugin actually wants to handle arrays (we will see an example in a minute), it can easily do so by adding multiple = TRUE to the annotation.
- A list of arrays, for example the filters of a text format. Every filter has a module, a delta, settings, etc. For this case, we have an iterator plugin which, as the name suggests, iterates over the value and runs a process pipeline for every property of the current filter.
This essentially makes the process subsystem recursive with the usual advantages and disadvantages of recursion: it’s harder to understand but insanely powerful.
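The automatic handling of scalar lists from the first item above can be sketched like this. applyStage() and its $handlesMultiple flag are hypothetical; the flag stands in for multiple = TRUE, and this sketch simply checks is_array(), whereas the real system's bookkeeping may differ.

```php
<?php
// Hypothetical sketch: a stage that does not declare it handles multiple
// values is run once per element of an array value.
function applyStage(callable $transform, bool $handlesMultiple, $value)
{
    if (is_array($value) && !$handlesMultiple) {
        $result = [];
        foreach ($value as $scalar) {
            $result[] = $transform($scalar);
        }
        return $result;
    }
    return $transform($value);
}

$permissions = ['access content', 'post comments'];

// A scalar-only stage: replaces spaces, one permission at a time.
$underscore = function ($value) {
    return str_replace(' ', '_', $value);
};
print_r(applyStage($underscore, false, $permissions));

// A stage flagged as handling multiple values receives the whole array.
$count = function ($value) {
    return count($value);
};
echo applyStage($count, true, $permissions), "\n";
```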
There are two more features I’d like to draw attention to; both are in the key: @id row. First, the iterator plugin can change the key: the source simply gives us a list of filters, but Drupal 8 expects the filter plugin ID to be the key. Second, the @id notation is usable not just for key but also as the value of any source key. Specifically, @id means: use the already calculated id destination value. This works because key is calculated last.
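Here is a framework-free sketch of the iterator idea under stated assumptions: lookup() and iterate() are hypothetical, the sub-pipelines are plain closures rather than plugin configuration, and the module/delta-to-plugin-ID map is invented for illustration. It shows a sub-pipeline run per filter, '@id' used as a value, and the key computed last from '@id'.

```php
<?php
// Resolve a reference: '@name' reads an already calculated destination
// value; anything else reads the current source item.
function lookup(string $property, array $source, array $destination)
{
    if (substr($property, 0, 1) === '@') {
        return $destination[substr($property, 1)] ?? null;
    }
    return $source[$property] ?? null;
}

// Run the sub-pipelines for every item, then compute the key last.
function iterate(array $items, array $process, string $key): array
{
    $result = [];
    foreach ($items as $item) {
        $destination = [];
        foreach ($process as $property => $callback) {
            $destination[$property] = $callback($item, $destination);
        }
        // The key is calculated after all other properties.
        $result[lookup($key, $item, $destination)] = $destination;
    }
    return $result;
}

$filters = [
    ['module' => 'filter', 'delta' => 0, 'settings' => ['allowed' => '<a>']],
    ['module' => 'filter', 'delta' => 2, 'settings' => []],
];

// Hypothetical mapping from (module, delta) to a filter plugin ID.
$map = ['filter' => [0 => 'filter_html', 2 => 'filter_url']];

$process = [
    'id' => function (array $item, array $destination) use ($map) {
        return $map[$item['module']][$item['delta']];
    },
    'settings' => function (array $item, array $destination) {
        return lookup('settings', $item, $destination);
    },
    // '@id' works as a value too: it reads the id calculated above.
    'label' => function (array $item, array $destination) {
        return 'Filter ' . lookup('@id', $item, $destination);
    },
];

print_r(iterate($filters, $process, '@id'));
```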
Comments
What in the world are those dashes?
YAML needs a way to tell the stages of the pipeline apart. In PHP you are looking at array(array('plugin' => ...), array('plugin' => ...)).
Thanks!