Drupal’s Feeds module is a great tool for importing content into a site. The Feeds UI is relatively simple, and it’s fairly full featured out of the box. One such feature is the ability to assign a unique ID (GUID) to each item. This lets subsequent imports determine whether an item has already been imported and how the site should handle it. The options are:
- skipping the import of that row.
- deleting the old version and creating a new version.
- updating the existing version with the new content.
For this to work, some part of the imported data must be unique. When importing from a CSV, this would be a column whose value never repeats. In some cases, your source data won’t contain such a unique key, which is a problem when using Feeds to update existing content.
For some projects, a unique key can be created by combining parts of each row. Feeds provides hooks that let you alter the data at various stages of the import process. It’s worth mentioning that the Feeds Tamper module also provides functionality like this, but I had a hard time getting it to work and found that writing a small custom module was faster.
Setting Up Your Importer
First, you need to set up your Feeds importer. The Feeds documentation covers this well, so we can follow it to set up an importer as normal. The only addition we’ll make is a GUID mapping: a row in the importer’s mappings whose TARGET is GUID.
Typically, anything in the “SOURCE” column should map to a column in the spreadsheet we upload. Since our spreadsheet doesn’t have a unique column, we can name our SOURCE anything. We will use that name later in the code.
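For reference, here is a rough sketch of how that mapping might look if you export the importer (Feeds 7.x-2.x importers are CTools exportables). This is an assumption about the export format, not the author’s actual importer: the SOURCE name GUID is made up, other processor settings are omitted, and the update_existing value 2 is assumed to correspond to the FEEDS_UPDATE_EXISTING constant (update existing items).

```php
<?php
// Sketch of an exported importer's processor settings. Other keys such as
// 'bundle' are omitted; only the GUID mapping row matters here.
$processor = array(
  'plugin_key' => 'FeedsNodeProcessor',
  'config' => array(
    // Assumed: 2 corresponds to FEEDS_UPDATE_EXISTING (update existing items).
    'update_existing' => 2,
    'mappings' => array(
      0 => array(
        // Made-up SOURCE name: no CSV column has it. Our hook adds it later
        // under the lowercase key 'guid'.
        'source' => 'GUID',
        // The GUID target offered by Feeds processors.
        'target' => 'guid',
        // Use this value as the unique key when matching existing items.
        'unique' => TRUE,
      ),
    ),
  ),
);
```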
Creating Our New Column
Since Feeds will attempt to use our GUID source when importing our content, we need to alter the data from our CSV after it is parsed but before anything is imported. Luckily, Feeds has a hook for exactly this scenario: hook_feeds_after_parse() (http://www.drupalcontrib.org/api/drupal/contributions!feeds!feeds.api.php/function/hook_feeds_after_parse/7). This hook accepts two arguments: the first is an object containing data about our import source, and the second is an object containing the parsed data.
In our implementation of this hook, we first make sure we only affect the correct importer. Then we create an array of keys that match the column names from our source CSV. After that, we loop over the parsed results and add our new GUID column. Each result ($row in this example) is an array keyed by the names of our source CSV columns, so adding a new element to this array is the equivalent of adding a new column to our spreadsheet. Note that the new element’s key must be the same name we used for the GUID SOURCE in the importer mappings, except that it is lowercase.
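The original code for this hook isn’t reproduced here, so the following is a minimal sketch of the steps just described, assuming Drupal 7 and Feeds 7.x-2.x. The module name mymodule, the importer machine name my_importer, and the column names are hypothetical. It relies on the _mymodule_create_guid() helper described next.

```php
<?php

/**
 * Implements hook_feeds_after_parse().
 *
 * Sketch only: 'my_importer' and the column names are hypothetical.
 */
function mymodule_feeds_after_parse(FeedsSource $source, FeedsParserResult $result) {
  // Only alter items produced by our importer.
  if ($source->id != 'my_importer') {
    return;
  }

  // Lowercase CSV column names to combine into the GUID.
  $keys = array('last_name', 'first_name', 'birth_date');

  foreach ($result->items as $i => $row) {
    // 'guid' is the SOURCE name from the GUID mapping, lowercased.
    $result->items[$i]['guid'] = _mymodule_create_guid($row, $keys);
  }
}
```

Writing back through $result->items[$i] (rather than looping by reference) keeps the change on the result object without leaving a dangling reference after the loop.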
Our new GUID column is set by a helper function, _mymodule_create_guid(). This function takes three arguments, two of which are required. The first is the current row’s data ($row), the second is the array of keys we want to combine into our GUID ($key), and the third is the delimiter placed between the values. If no delimiter is passed in, it defaults to a hyphen.
This function will return a string that will be used as this piece of content’s unique ID. That’s all there is to it.
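Here is a minimal sketch of that helper, following the signature described above: two required arguments and an optional delimiter that defaults to a hyphen. The body (trimming values and treating a missing column as an empty string) is an assumption rather than the original implementation, and the parameter names are illustrative.

```php
<?php

/**
 * Builds a GUID by joining selected column values from a parsed row.
 *
 * @param array $row
 *   One parsed CSV row, keyed by lowercase column name.
 * @param array $keys
 *   Lowercase column names to combine, in order.
 * @param string $delimiter
 *   Separator placed between the values. Defaults to a hyphen.
 *
 * @return string
 *   The combined value used as the item's GUID.
 */
function _mymodule_create_guid(array $row, array $keys, $delimiter = '-') {
  $parts = array();
  foreach ($keys as $key) {
    // A missing column becomes an empty string; trimming means "Smith " and
    // "Smith" produce the same GUID.
    $parts[] = isset($row[$key]) ? trim((string) $row[$key]) : '';
  }
  return implode($delimiter, $parts);
}
```

For example, _mymodule_create_guid(array('last_name' => 'Smith', 'first_name' => 'Ann'), array('last_name', 'first_name')) returns Smith-Ann. Pick a delimiter that can’t appear in the data itself: with a hyphen, a row with the values "a-b" and "c" and another with "a" and "b-c" both produce a-b-c.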
Fin
The most obvious issue here is ensuring that the combination of columns will always be unique. If you can’t guarantee that, the combined value won’t make a very good GUID. However, this hook has more uses than just creating GUIDs: you can use it to manipulate your data however you want before it is imported into your project.
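One way to act on the uniqueness concern is to check the parsed items for repeated GUIDs before they are imported. The function below is a hypothetical helper, not part of Feeds; it assumes the GUID was stored under the lowercase key guid, as described earlier.

```php
<?php

/**
 * Returns GUID values that occur more than once in a set of parsed items.
 *
 * Hypothetical helper: pass it $result->items after the GUID column exists.
 */
function _mymodule_find_duplicate_guids(array $items, $guid_key = 'guid') {
  $counts = array();
  foreach ($items as $item) {
    // Skip items that never received a GUID.
    if (!isset($item[$guid_key])) {
      continue;
    }
    $guid = (string) $item[$guid_key];
    $counts[$guid] = isset($counts[$guid]) ? $counts[$guid] + 1 : 1;
  }
  // Keep GUIDs seen more than once. PHP turns numeric-string array keys into
  // integers, so cast the keys back to strings.
  $duplicates = array_filter($counts, function ($count) {
    return $count > 1;
  });
  return array_map('strval', array_keys($duplicates));
}
```

You could call it at the end of your hook_feeds_after_parse() implementation and report any collisions, for example with Drupal 7’s watchdog().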