Since I started my career as a back-end developer, I’ve been tormented by different types of migrations. The first one was not Drupal at all: it was eZ Publish CMF, and we migrated XML feeds into it. It was based on a custom extension, and it gave me a rudimentary knowledge of data import. Then came Drupal, with my first Drupal-to-Drupal migration, then a migration from a custom CMS built on .NET with an MSSQL database to Drupal, then migrations from WordPress, and so on. This article is not about how to get started with your first migration; it is more about the kinds of edge cases you can run into.
In May 2016, Lemberg and GoalGorilla started working on the new version of WIM, a platform for municipalities that was originally created and supported by Dimpact. In September 2016, WIM 2.0 was released. Since the creation of the platform, we’ve also been migrating sites from the old platform to the new one, and this article outlines some specific cases.
As we have a platform, it is a good idea to have a core migration module containing the main classes and the defaults for our migrations. So we created the wim_migration module, which does not actually contain any migrations, only classes that we can use or extend. Most of these classes extend classes from the migrate_d2d module. This gave us a flexible core: if we run into an issue during one site’s migration, we can fix it for all the other sites as well.
Usually, migrating content in Drupal is pretty simple when we have a field mapping and a minimal set of contrib modules. Things become more interesting when we deal with certain modules.
Typically, the Media module is used for inserting images into the body field or other text fields. If you check the code, you will find the remapMediaJson method in the DrupalNode7Migration class, which takes care of this by default and makes it all look nice, until you face view modes that do not exist on the destination site. In our new platform we removed a lot of image styles and replaced some others, so we added our own mapping and fixed the other attributes.
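To make the idea concrete, here is a minimal sketch of such a view mode mapping. It is written in Python only to keep it self-contained; the real fix lives in PHP inside the migration class. It assumes the embedded media tags are stored as `[[{...}]]` JSON tokens in the text, and the mapping, the list of known view modes and the fallback are hypothetical values, not the ones used on WIM.

```python
import json
import re

# Hypothetical mapping from removed source view modes to destination ones.
VIEW_MODE_MAP = {"teaser_large": "teaser", "full_width": "default"}
# Hypothetical set of view modes that exist on the destination site.
KNOWN_VIEW_MODES = {"default", "teaser"}
FALLBACK_VIEW_MODE = "default"

# Assumed token format: [[{"fid":"12","view_mode":"...", ...}]]
MEDIA_TOKEN = re.compile(r"\[\[(\{.*?\})\]\]", re.DOTALL)


def remap_view_modes(text):
    """Rewrite the view_mode of every embedded media token in a text field."""

    def replace(match):
        try:
            tag = json.loads(match.group(1))
        except ValueError:
            # Not valid JSON: leave the token untouched.
            return match.group(0)
        mode = tag.get("view_mode", FALLBACK_VIEW_MODE)
        mode = VIEW_MODE_MAP.get(mode, mode)
        if mode not in KNOWN_VIEW_MODES:
            # Unknown view mode on the destination: use the fallback.
            mode = FALLBACK_VIEW_MODE
        tag["view_mode"] = mode
        return "[[" + json.dumps(tag) + "]]"

    return MEDIA_TOKEN.sub(replace, text)
```

Text without media tokens passes through unchanged, so the function can be applied to every text field of a row without checking first.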
Hmm, actually webforms are well documented and plenty of examples can be found for them, but this remark is a bit different. We all know that modules get updates from time to time, which is nice because we like new features. But what if the updates contain table alterations? This can be a nightmare: everything seems to work well, but after the migration every required component has lost its required flag. On the source side we have version 7.x-3.24, while the latest version at the time of writing is 7.x-4.14. Searching the module’s code for `db_change_field` points to the schema change responsible.
The fix is pretty easy, but this case is not so common. The best solution would be to update the modules on the source site, but that is too idealistic; in the real world nobody will do that.
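As an illustration of how small such a fix can be, here is a sketch in Python (the real one would be a few lines of PHP in the migration class). It assumes the schema change in question is the rename of the `mandatory` column to `required` in the `webform_component` table between the 3.x and 4.x branches; the function name and the dictionary-shaped row are hypothetical.

```python
def prepare_component_row(row):
    """Return a copy of a source component row using the 4.x column name.

    Assumption: the 3.x source stores the flag in `mandatory`, while the
    4.x destination expects it in `required`.
    """
    row = dict(row)  # do not modify the caller's row
    if "mandatory" in row and "required" not in row:
        row["required"] = row.pop("mandatory")
    return row
```

Rows that already use the new column name are returned unchanged, so the same code works for sources on either branch.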
As mentioned before, updates can contain table alterations, and so can dev versions. The current version of nodequeue is 7.x-2.1, but the source sites run 7.x-3.x-dev, which is no longer developed or supported. The difference is that some tables have lost their keys or never had them. For example, the nodequeue_nodes table doesn’t have a primary key, and nodequeue_queue has one, but it is a string while the destination table uses an integer, which creates a really big mess with the keys. After some research and analysis, we found a way to alter the legacy database and add the missing keys to it.
This function runs during installation, so that later we have the correct keys and can proceed with the migration. There is a migration module for nodequeue, migrate_nodequeue, but it is built as a wizard and does not fit our case, as we run migrations via drush.
In one case, all sites contain subdomains that belong to the intranet, and these won’t be migrated. Some sites also contain subsites, so we decided to host each subsite in its own environment and build a separate migration for each domain. We set an array of allowed domains in the common arguments, then pick them up and extend the query in the constructor.
At first the query had no DISTINCT, and we faced a strange bug: after running the migration, it was marked as incomplete. The reason was that the query returned duplicate rows. This happens especially when you have multiple joins, so be aware of it and use DISTINCT.
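The duplicate rows are easy to reproduce outside Drupal. The sketch below uses Python with an in-memory SQLite database purely as a demonstration; the table layout is a simplified assumption loosely modelled on a node table joined to a domain assignment table, and the domain IDs are made up. A node assigned to two allowed domains comes back twice unless the query uses DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE domain_access (nid INTEGER, gid INTEGER);
    INSERT INTO node VALUES (1, 'News'), (2, 'Contact'), (3, 'Intranet only');
    -- Node 1 is assigned to two allowed domains, node 3 only to the intranet.
    INSERT INTO domain_access VALUES (1, 10), (1, 11), (2, 10), (3, 99);
""")

# Hypothetical IDs of the domains we want to migrate.
ALLOWED_DOMAINS = (10, 11)


def source_nids(distinct):
    """Return the node IDs selected for migration, with or without DISTINCT."""
    placeholders = ", ".join("?" for _ in ALLOWED_DOMAINS)
    select = "SELECT DISTINCT" if distinct else "SELECT"
    sql = (
        f"{select} n.nid FROM node n "
        "INNER JOIN domain_access da ON da.nid = n.nid "
        f"WHERE da.gid IN ({placeholders}) ORDER BY n.nid"
    )
    return [row[0] for row in conn.execute(sql, ALLOWED_DOMAINS)]


without_distinct = source_nids(distinct=False)  # node 1 appears twice
with_distinct = source_nids(distinct=True)      # one row per node
```

Without DISTINCT the source reports three rows for two nodes, so the number of imported items can never match the number of source rows, which would fit the incomplete status we saw.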
And at Last
As we can see, there are always some ambiguous cases when we deal with migration, but it is a powerful process: along the way we can not only migrate content but also improve or even fix things. After wrapping up the essential part of the core migration, all the effort goes into fixing site-specific cases. The most common issue, for example, is that content editors insert URLs that point to an internal path or that contain the site’s own domain. Our best idea for fixing this is to write a function that replaces all internal links with their aliased representation and removes the domain from absolute links that point to the current site. We should also keep in mind that sometimes it is easier to write an exception for one item than to try to keep everything in one place. So far, our migration process is going quite well; there are always some challenges, and we are ready to tackle them.
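For the link clean-up mentioned above, a first version could look roughly like the sketch below. It is in Python to stay self-contained; in the migration it would be a PHP helper. The host names and the alias table are hypothetical, and the sketch assumes the site is served from the domain root and that links appear in double-quoted `href` attributes.

```python
import re
from urllib.parse import urlsplit

# Hypothetical host names of the site being migrated.
SITE_HOSTS = {"www.example.nl", "example.nl"}
# Hypothetical lookup of internal paths to their aliases.
ALIASES = {"node/12": "contact"}

HREF = re.compile(r'href="([^"]*)"')


def normalize_link(url):
    """Strip our own domain and replace internal paths with their alias."""
    parts = urlsplit(url)
    if parts.netloc and parts.hostname not in SITE_HOSTS:
        return url  # a genuinely external link
    if parts.scheme and parts.scheme not in ("http", "https"):
        return url  # mailto:, tel: and similar
    if not parts.netloc and not parts.path:
        return url  # fragment-only or empty link
    path = parts.path.lstrip("/")
    path = ALIASES.get(path, path)
    result = "/" + path
    if parts.query:
        result += "?" + parts.query
    if parts.fragment:
        result += "#" + parts.fragment
    return result


def rewrite_links(html):
    """Apply normalize_link to every double-quoted href in a text field."""
    return HREF.sub(lambda m: 'href="%s"' % normalize_link(m.group(1)), html)
```

For example, `normalize_link("http://www.example.nl/node/12")` gives `/contact`, while a link to another site is returned as is.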