eBook on WordPress development and deployment


Advanced WordPress development

The last few weeks I’ve been busy finishing up an eBook with the kind of straightforward title: WordPress DevOps – Strategies for developing and deploying with WordPress. It’s a 100-page guide covering how you can get WordPress to work a little bit better in a proper development process, covering automation, testing and deployment.

If you’re interested, click here or on the book cover above to head on over to Leanpub and sign up for it. You’ll get an email as soon as it’s available for download. And yes, it will be available in PDF, EPUB and MOBI formats. When you do sign up, please share your email address with me, I’d love to be able to stay in touch with you.

MySQL Workbench on Ubuntu 14.10


As a friendly tip for anyone else who:

  1. Just upgraded to Ubuntu 14.10
  2. Uses MySQL Workbench a lot
  3. Uses the built-in TCP/IP over SSH option to connect to remote servers.

There’s a big chance that you saw this:

[Screenshot: MySQL Workbench failing to open the SSH connection]

Problem

In Ubuntu 14.10, the package python-paramiko was upgraded to 1.15. The package is used by Python apps to create SSH connections. Version 1.15 of paramiko seems to have problems with some servers due to a mismatch in which protocols to support. The end result is that MySQL Workbench can’t connect using an SSH connection.
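
To verify that this is what you’re hitting, check which version of the package you have installed (a quick dpkg query):

$ dpkg -s python-paramiko | grep Version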

Solution

Downgrade to python-paramiko 1.10, the version that shipped with Ubuntu 14.04.

Howto

Run these two commands in the terminal.

$ wget http://launchpadlibrarian.net/167351511/python-paramiko_1.10.1-1git1build1_all.deb
$ sudo dpkg -i python-paramiko_1.10.1-1git1build1_all.deb
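
Since a later apt upgrade will happily bump the package right back to 1.15, you may also want to hold it at the downgraded version until the issue is fixed upstream (assuming that trade-off is acceptable to you):

$ sudo apt-mark hold python-paramiko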

WordPress management – looking for beta testers



One of the projects I’m working on is called Remote Control Panel for WordPress and is getting ready for beta testing. Or to be honest, we’re still in the middle of functional testing, so we could still be a couple of weeks away. But we still think it’s time to start recruiting beta testers for this service. Our goal is to launch the beta test in the first half of February.

So what is Remote Control Panel?

Remote Control Panel is a freemium service that will help users manage multiple WordPress installations. In plain English, that means that our service will:

  • Do continuous backups of your WordPress site, all files and the complete database.
  • Keep backups for a year, allowing you to restore from any given point in time.
  • Keep track of all your plugins and themes and alert you when there’s an upgrade available.
  • Provide a one-click interface to upgrade plugins and themes on all your WordPress sites at the same time.

What will I get for testing it?

Well, we hope that the most important thing we deliver is your next favorite tool for backup and management, but apart from that, all beta testers that provide feedback will get the first 12 months free after the official launch. Even though we’re planning to offer a free entry-level version of the product with limited storage capacity, the premium version will offer more storage and more frequent backups. At the very least, you’ll get $179 ($14.95/month) worth of premium backup service for helping us out with the beta testing.

What do I need?

At this point, we’re not picky at all. If you’re the admin of one or more WordPress installations, you’re welcome to apply for the beta testing program.

Where do I sign up?

Head on over to https://remotecontrolpanel.net to sign up for the beta test. Many thanks in advance.

Creating a persistent ssh tunnel in Ubuntu


In situations where a VPN is either not available or just feels like overkill to configure, an SSH tunnel can be a pretty good alternative. An SSH tunnel works by setting up an SSH connection between two hosts and using that connection to transport normal network traffic. On one side of the tunnel, the OpenSSH software takes anything that is sent to a specific port and sends it over to the other side of the connection. When the packets arrive on the target side, the OpenSSH software forwards them to the correct local port. Naturally, the traffic is encrypted on its way, thereby creating a miniature VPN.

Setting up an SSH tunnel is quite straightforward, for example at the terminal:

ssh -NL 8080:127.0.0.1:80 root@192.168.10.10

Let’s break that down:

  • -N means that SSH should just create the connection and then do nothing.
  • -L means that the local side is the listening side. If you want it the other way around, use -R instead.
  • 8080:127.0.0.1:80 tells ssh that the listening side should listen on port 8080 of its localhost interface and the other side should forward it to port 80. So, on the machine that runs this command, anything sent to 127.0.0.1:8080 is forwarded to port 80 on the other side.
  • root@192.168.10.10 tells ssh to connect to 192.168.10.10 with username root.

There’s really only one problem with this. Whenever the SSH connection breaks, the tunnel breaks with it and you have to run that command again. Depending on your situation this might be a problem, and that’s why autossh exists.

Autossh

Autossh is a tool that sets up a tunnel and then checks on it every 10 seconds. If the tunnel has stopped working, autossh will simply restart it. So instead of running the command above you could run:

autossh -NL 8080:127.0.0.1:80 root@192.168.10.10

Note: starting an SSH tunnel with autossh assumes that you have set up public key authentication between the client and server, since there’s no way for autossh to ask you for the password. This is especially important if you want to follow the rest of this article and have the tunnels start automatically. Please read more here for details on how to set up public key authentication.
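
If you haven’t set that up yet, the usual recipe is just two commands (shown here as a sketch; leave the passphrase empty if the tunnel must come up unattended):

$ ssh-keygen -t rsa -b 4096
$ ssh-copy-id root@192.168.10.10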

Start automatically

Having autossh monitor your tunnels is a great advantage, but it still requires a command to be run whenever autossh itself dies, such as after a reboot. If you want your SSH tunnels to persist across reboots you need to create a startup script. Since Ubuntu 9.10 (Karmic), Ubuntu has used Upstart as its main mechanism for handling startup scripts, so the rest of this article will discuss how such a script can be created.

I wanted to achieve three things with my startup script. First, I’d like my tunnel to come up when the computer boots. Second, I’d like to be able to keep configuration for multiple tunnels in one file. And lastly, I’d like to be able to control (start/stop) my tunnels without having to kill individual ssh processes.
Three files are created:

First, /etc/init/autossh.conf:

description "autossh tunnel"
author "Erik Torsner "

start on (local-filesystems and net-device-up IFACE=eth1 and net-device-up IFACE=wlan1)
stop on runlevel [016]

pre-start script
    # count the number of non-empty lines in the config file,
    # then start one autossh_host instance per line
    NUMHOSTS=$(egrep -v '^[[:space:]]*$' /etc/autossh.hosts | wc -l)
    for i in `seq 1 $NUMHOSTS`
    do
        start autossh_host N=$i
    done
end script

Second, /etc/init/autossh_host.conf:

stop on stopping autossh

respawn

instance $N
export HOST=$N

script
    # pick line $N from the config file and pass it verbatim to autossh
    ARGS=$(head -$N /etc/autossh.hosts | tail -1)
    exec autossh $ARGS
end script

And lastly, the config file /etc/autossh.hosts:

-NL 8080:127.0.0.1:80 root@host1.example.com
-NL 8080:127.0.0.1:80 root@host2.example.com

Rationale for each file

/etc/init/autossh.conf
This is the main startup script. The first part of the file sets the conditions that trigger this script to start or stop. I’ve told my script to start as soon as the local filesystem is ready and my network interfaces are up. Chances are that you want to modify this to suit your computer, at least the names of the network interfaces.
The actual script part is a simple bash script that checks the contents of a config file to determine how many tunnels need to be started. It then starts a number of autossh_host jobs, each with an individual id that happens to correspond to a line number in the config file. It would have been nice to just start the autossh processes directly from this script, but autossh is a blocking command and only the first line would be executed on startup. Dropping the process to the background using an & character at the end would work, but the resulting autossh processes would be out of our control.

/etc/init/autossh_host.conf
This is the job where the interesting part actually happens. Note that this script doesn’t have any “start on” stanza at all; it’s only ever started when the main job tells it to start. The interesting part is its “stop on” stanza, which simply says that it should stop whenever the parent job is stopped. Using this mechanism, all child jobs can be controlled via the parent job.

The script part uses a very naive method of picking out the correct line from /etc/autossh.hosts and uses that line as the argument for autossh. Note that with this approach, the config file /etc/autossh.hosts can’t contain any empty lines or comments at all. Room for improvement; see the sketch below.
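
A slightly more forgiving variant would filter out blank lines and #-comments in both jobs before counting and picking lines. Untested, and the comment convention is my own assumption:

# in /etc/init/autossh.conf
NUMHOSTS=$(grep -Ev '^[[:space:]]*(#|$)' /etc/autossh.hosts | wc -l)

# in /etc/init/autossh_host.conf
ARGS=$(grep -Ev '^[[:space:]]*(#|$)' /etc/autossh.hosts | head -$N | tail -1)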

/etc/autossh.hosts
This is just a config file with one row per ssh tunnel to create. Each line consists of the entire parameter line sent to autossh. Again, please note that the parsing of this file is very naive and won’t allow for any blank lines or comments at all.

Starting the tunnels

If everything is correctly set up, you should now be able to start your tunnels using:

sudo start autossh

and stop them with:

sudo stop autossh

If you have problems, check out the logfiles under /var/log/upstart. There should be one logfile for each script/instance, so expect to find autossh.log as well as one autossh_host-N.log per tunnel.

Questions? Did I miss something? Is my solution entirely wrong? Let me know in the comments.

A slow host is a bad host


Just a quick post about a discovery I just made. When using the Load Time Profiler plugin to measure load time, it’s possible to compare two hosting environments by looking at the first part of the load process, before any plugins or theme files are loaded. The last such point measured by the plugin is post_exec_register_theme_directory, which is added to the profiling data just before plugins are about to get loaded.

On a really fast host running Apache2 without any PHP enhancements, but with plenty of RAM and fast SSD disks, this event is done 18.4 ms into the process. This host happens to be my laptop. But please do note, the load time is measured entirely on the server side, so there’s no advantage to being on the same physical computer.

On a really (really) slow host, a business account on Swedish hosting company Binero, the same exact point is reached after 220 ms(!). I’ve reported this to their support, but they seem to think that it’s completely in order so it’s not a matter of a single misconfigured server. They really are that slow. As a reference, a similar hosting provider, Loopia, is about 25% quicker at 165 ms.

220 versus 18.4: that’s more than ten times the load time. Actually, when looking at the total load time, including plugins and the theme, the Binero hosted account will sometimes consume more than 10 seconds before the page is sent back to the browser. Yes, we’re looking at moving that site to another home in the near future, and being able to profile the load time behaviour was exactly the kind of data point needed to make that decision. With that kind of performance, there’s simply no other quick fix available.

Now, go test your hosting provider and please let me know what you find in the comments below. Download the plugin here.


WordPress profiler


[Screenshot: profiler output with the new memory consumption column]

This post is part two of a mini series where I (1) explain why keeping track of WordPress load timing is important and (2) discuss how WordPress load time profiling can be done using a newly created plugin. If you want to skip the chatter and go straight to the plugin, you can download it here.

I’ve updated my load time profiling plugin a little bit over the weekend. There are now two new features:

  1. The output will show you memory consumption at each log point
  2. The plugin will trace time spent in each function called via an ‘add_action’ callback.

Let’s talk about both of these a bit and why they make sense.

Memory consumption

There are a few reasons why you want your WordPress installation to consume as little memory as possible.

First of all, there is a limit. Sometimes, on some hosts, you will get an error telling you that your PHP script tried to allocate more memory than is available, and execution halts. This is not a WordPress-specific error, but since WordPress is built on PHP, you’ll see it from time to time. When troubleshooting this type of error, it’s not always obvious what code is causing the problem. Even if plugin A uses a lot of memory and pushes memory consumption right to the limit, it will sometimes load just fine anyway. But a little later, when plugin B is initializing, total memory consumption may hit the limit and processing will halt. When that happens, finding the problematic piece of code can be very time consuming, not to mention frustrating.
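
Tracking memory at well-chosen checkpoints makes this kind of bug much easier to pin down, and the mechanics boil down to a couple of PHP built-ins. A minimal sketch (profile_checkpoint is a hypothetical name, not the plugin’s actual API):

<?php
// Record elapsed time and current memory usage under a named checkpoint.
function profile_checkpoint( $label ) {
    global $profiledata;
    $profiledata[ $label ] = array(
        'time'   => microtime( true ),        // wall clock, in seconds
        'memory' => memory_get_usage( true ), // allocated bytes
    );
}

// Sprinkle calls like this around the suspects:
profile_checkpoint( 'before_plugin_b_init' );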

The other reason is that allocating memory pretty much always takes time; memory allocation is simply an expensive operation. Not that each call to emalloc in itself is all that expensive, but when a script gradually builds up memory consumption, there have usually been quite a few emalloc calls under the hood. So the mere act of consuming memory also costs a lot of CPU cycles.

The last reason is a matter of scalability. A script that uses 256 MB of memory will occupy the server RAM with… well, 256 MB. If your web server only has 1024 MB available, it can only make room for four such scripts running at the exact same time. If the script on average takes 2 seconds to finish, the maximum requests/sec your server can handle is heavily reduced. So you want your WordPress requests to consume as little RAM as possible, simply because it means that the server can fit more requests into memory at any given point in time.

All three reasons above make it important to be able to track how your WordPress installation uses memory. I hope you find the feature useful.

Hooked functions

One of the features that makes WordPress such a wonderful product is the ability for a theme or plugin to hook into it. In practical terms it means that a plugin or theme can request to be notified via a callback whenever WordPress hits a certain point in its execution; in WordPress lingo, such a callback is referred to as an action. For instance, most plugins use a hook for the Init event that fires early in the startup process and allows a plugin to perform certain initialization before the page starts to render. Plugins and themes can register that they want a callback for the Init event by simply calling the add_action function. More than one plugin can hook up to the same event, and when this happens WordPress will just execute them one by one.
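
For reference, registering such a callback is a one-liner. A typical plugin bootstrap looks something like this (myplugin_init and the post type are made-up examples):

<?php
// Ask WordPress to call us back when the Init event fires.
add_action( 'init', 'myplugin_init' );

function myplugin_init() {
    // Early per-request setup goes here, e.g. registering a post type.
    register_post_type( 'book', array( 'public' => true ) );
}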

A potential problem with callbacks is that not all plugins are written with performance and memory efficiency in mind. On the contrary, some plugins will spend ages doing expensive initialization tasks and consume a lot of memory for no particular reason at all. For the end user this means that the requested page is delayed until some plugin is done doing unneeded work. One example is the very popular NextGEN Gallery plugin, which on initialization will load a lot of PHP files into memory and perform some quite expensive initialization work on every single request, regardless of whether the resulting web page will actually display a gallery.

In the first version of my Load time profiler plugin you could track how long WordPress took to complete a few of the actions, but it was not possible to see how long each individual callback would take. In this version, I’ve added code to track this, so that when for instance the Init action is fired, you can see the time consumed by each individual callback (click image to enlarge).

[Screenshot: breakdown of time spent in each Init callback]
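
The basic trick is to wrap each registered callback in a timing closure before the action fires. Stripped of all plugin plumbing, the idea looks roughly like this (a simplified sketch, assuming the classic layout of the global $wp_filter array that WordPress used at the time):

<?php
global $wp_filter, $profiledata;
// Walk every priority level registered for 'init' and replace each
// callback with a wrapper that measures how long the original takes.
foreach ( $wp_filter['init'] as $priority => $callbacks ) {
    foreach ( $callbacks as $idx => $callback ) {
        $original = $callback['function'];
        $wp_filter['init'][ $priority ][ $idx ]['function'] =
            function () use ( $original, $idx ) {
                global $profiledata;
                $start  = microtime( true );
                $result = call_user_func_array( $original, func_get_args() );
                $profiledata[ 'init:' . $idx ] = microtime( true ) - $start;
                return $result;
            };
    }
}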

One tiny drawback

The traditional way to do profiling would be to run the code in a specific profiling tool that needs to be installed on the same server your code is running on. In the case of WordPress, the challenge is that many production installations run in hosted environments where the web developers can’t just go and install debugging and profiling extensions at will. This plugin is really an attempt to overcome this limitation by modifying how WordPress is executed in a controlled way. Three core WordPress files are copied to the plugin directory and modified to (1) collect data on timing and memory consumption and (2) alter the include paths a little to avoid the standard versions of those files. Feel free to check the modified files; they are stored under /path/to/plugins/loadtimeprofiler/resources and have the suffix _custom.php

The drawback of this should be fairly obvious: since the code is modified, it’s not the exact same as in production and therefore the results will differ ever so slightly between what’s reported and reality. As long as you keep this in mind and use the tool sensibly, I think you can still draw some useful conclusions from the data.

Download

I swear I will start the process of getting the plugin into the official repository at wordpress.org ASAP; in the meantime, click here to download the current version of the plugin.

I’m eager to hear back from you. I’ll answer technical questions or listen to your critique, whatever feedback you think is appropriate. Use the comments below to let me know what you think.


WordPress load time analysis



UPDATE Dec 8th 2013: I’ve updated the plugin even more. Read about it in part 2.

UPDATE Dec 3rd 2013: While testing the plugin on a few installations, I discovered a couple of bugs. If you downloaded the plugin on Dec 2nd 2013, you may want to try the improved version available below.

I’ve been working a lot with WordPress lately, and doing so I’ve sometimes needed to troubleshoot slow WordPress installations. Naturally, I’ve googled for things like “wordpress performance”, “wordpress is too slow” and similar phrases. Almost always, those searches will take me to a page listing the top 10 things you should do to speed up your WordPress website. Those lists of tips are usually quite good; you’ll typically learn that you should install a caching plugin, clean up your database, get rid of unneeded plugins and so on. Mostly quite useful information, but with some important exceptions.

Installing a caching plugin such as W3 Total Cache will often solve most of your performance problems as long as you have a mostly “read only” website. However, if your website is inherently slow when used without any caching, your users will notice it and suffer from it whenever they do anything that can’t be cached as easily, like posting comments or doing custom searches. The scariest example I’ve experienced was an e-commerce website that was screaming fast as long as you were just browsing products (good) but was terribly slow when adding things to the cart (bad), entering a coupon code (worse) or checking out (flat out terribly bad). In that case, it didn’t matter much that the front page got a 90/100 score on Google PageSpeed, since the users had a terrible experience just as they were about to place their orders. So sometimes, a WordPress site needs performance beyond what you can get from a good caching plugin.

Maybe it’s time for a little clarification. When people talk about performance, a few important concepts often get mixed up. For the remainder of this post, when I say performance I mean the time needed to create the main HTML document that WordPress generates, the document that IS the page. Specifically, I’m not talking about the size or number of other resources (images, css files, js files etc.) that your page needs. Those are almost always static, the web server will seldom require the PHP scripting engine to serve them to your users, and if your web server actually is too slow, static files can easily be moved to a CDN anyway.

When a user requests a page from (or makes an Ajax call to) your WordPress site and that request can’t be cached, the entire WordPress system needs to initialize in order to generate the response. For the sake of argument, let’s divide the response generation process into two phases:

  1. Internal WordPress init. In this phase, WordPress will load itself into the PHP process memory, set up the database connection and read various options from the database. It will initialize your theme, meaning it loads the theme’s functions.php into memory and allows it to register the hooks it wants. It will also load the main php file from each of your active plugins, and they will typically register some hooks as well.
  2. Theme processing. When phase 1 is done, your theme is called into action. If it’s a standard post or page, WordPress will typically load your single.php or page.php and let it execute its code. That code will call internal WP functions such as wp_head() or get_bloginfo() to do its job, and sometimes calling an internal function will call back into your theme’s functions.php; it all depends on the theme.

During these two phases, you really don’t have a lot of information about what WordPress spends the most time on, and that makes it terribly difficult to know what you can do to optimize it. I found this very frustrating, so at one point I started hacking a few of the files in WordPress core to insert debug information about what it’s doing. Specifically, I was inserting profiling code into wp-settings.php like this:

require( ABSPATH . WPINC . '/version.php' ); // <--- existing line
$profiledata['post_require_version.php'] = microtime(TRUE); // <--- my line
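
Then, in my footer.php, I’d output the content of the $profiledata array into HTML to review it, along these lines (a reconstruction from memory, not the exact code):

<?php // at the bottom of footer.php
echo '<table>';
foreach ( $profiledata as $point => $timestamp ) {
    // each entry holds the microtime(TRUE) value captured at that point
    printf( '<tr><td>%s</td><td>%.4f</td></tr>', esc_html( $point ), $timestamp );
}
echo '</table>';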

Naturally, modifying core files like that gets messy quite fast. You can easily break the entire site by accident, and any changes will be overwritten on the next WordPress upgrade. So even if I believe that I was gathering useful information this way, I needed to do it cleaner. So (since this is WordPress) I wrote a plugin. I now have a working plugin that will do the following using a bunch of Ajax calls from an admin page:

  1. Copy wp-settings and wp-config into a separate plugin folder
  2. Rewrite wp-settings and wp-config so that they contain profiling code as shown above and some modified path information.
  3. Via a special bootstrap php file, load WordPress using the modified versions of wp-settings and wp-config and, at the end, output the profile data as JSON.
  4. Submit the data back to the server for storage, and render an HTML table with all the profiling data.

This is what it looks like:

[Screenshot: the Load time profiler results table]

And the complete content of the analysis can be downloaded here (tab-separated text).

One interesting thing is that this is not limited to displaying internal WordPress load time analysis; if you want to, you can add additional profiling statements to your own theme code and have them displayed in the same table. I’d even say it’s the preferred thing to do. One thing that I’ve seen so far is that in a lot of cases, the WordPress internal stuff takes roughly 40-55% of the total load time, so to get the full story of what happens during load time you’ll want to profile your theme code as well.

I’ve just finished writing this plugin, so it’s really quite rough at the moment. It’s not been tested with a lot of WordPress versions, and the UI is not at all what I’d want it to be. I’m in the process of adding it to the WordPress repository; should you want to look at it before that, feel free to download it from here:

Download Loadtimeprofiler

Feedback, comments, questions? Use the comments below.


Rackspace and load test automation


Well, last workday of this week turned out nice.

I’ve been working with LoadImpact.com for a few months, providing text material for their blog. Mostly hands-on technical stuff about load testing, how their tool can be used, and fun things you can find out with a good load testing tool at hand.

But this week, one of my posts actually got published on the Rackspace DevOps blog. I don’t have the numbers, but I suspect that Rackspace has quite a decent amount of readers. So hooray! Not that Rackspace called me personally and begged for it; rather, Rackspace and LoadImpact are working together in other areas. But still, more readers, eh? Anyway, the space available for this first post on their blog was limited, so I almost immediately followed up with some nitty-gritty details and a working demo. And yes, there’s code in there.

In other news: Ferrari finished 9th and 10th in Belgian Grand Prix qualifying (that’s F1), and I’ve just decided to port an ongoing project from CodeIgniter to Laravel 4. The only thing that bugs me about that is that Laravel 4 seems to have more runtime overhead than CI. Expect more on the conversion process in the next few weeks.

Now, time to prepare Saturday dinner and get a glass of the red stuff.


WordPress file permissions


In order for WordPress to be able to install or upgrade plugins and themes automatically, there are a number of conditions that have to be met. If those conditions aren’t met, one-click installations or upgrades won’t happen; instead, whenever you try to upgrade, WordPress will show you the FTP credentials input form. If you’re anything like me, you hate it.

I sometimes run into this problem. My first instinct is to check the obvious: file permissions. Does the web server have write access to all the important places? As long as we’re only talking about plugins and themes, the important place is the wp-content folder. When I’m certain that the web server has write access, I typically try again and successfully upgrade my plugin.

Every once in a while, installing or upgrading still won’t work even when I’m 100% certain that WordPress should be able to write everywhere it needs to. I end up searching for a solution for about 10 minutes, give up and resort to manually uploading plugins via ssh to get on with my life. Today I decided to find out the root cause of this problem and solve it. Writing this blog post about it serves as much as a ‘note to self’ as assistance to anyone else troubleshooting this without finding a solution.

The rules

So, the rules for WordPress to be able to install and upgrade plugins and themes:

  1. The web server needs to have write access to the wp-content folder. For example, on a Debian based system (i.e. Ubuntu) this will be user ‘www-data’; on RedHat/Fedora it’s typically user ‘apache’ (please correct me here if I’m wrong). WordPress will test this by writing a temporary file to wp-content and then removing it. There are plenty of blog posts, howtos and forum posts about this. They usually point back to this article: http://codex.wordpress.org/Changing_File_Permissions
  2. The files in wp-admin need to be owned by the web server user. WordPress will test this by using the PHP function getmyuid() to check if the owner of the currently running PHP script is the same as the owner of the newly created temporary file. If it’s not the same, WordPress will select another method of installation or upgrade. See the sketch after this list.
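
For the curious, the ownership test boils down to something like this. A simplified sketch of what get_filesystem_method() in wp-admin/includes/file.php does, not a verbatim copy:

<?php
// Write a temporary file to the target directory...
$temp_file = WP_CONTENT_DIR . '/temp-write-test-' . time();
$handle = @fopen( $temp_file, 'w' );
if ( $handle ) {
    fclose( $handle );
    // ...and allow the 'direct' install method only if the file we just
    // created is owned by the same user that runs this PHP script.
    if ( getmyuid() === @fileowner( $temp_file ) ) {
        $method = 'direct';
    }
    @unlink( $temp_file );
}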

Rule #2 is what typically gets me. Whenever I move an existing WordPress installation to a new home, I’m sometimes (obviously) not careful enough with setting file permissions and file ownership and end up in this situation. Rule #1 is extremely intuitive; checking for write permission is close to second nature. But rule #2, checking file ownership in wp-admin… well, I’d even say it’s slightly unintuitive. If anything is worth protecting it should be the admin area, and being more restrictive with file ownership and permissions under wp-admin would even kind of make sense.

Anyway. Comments, questions or other feedback? Please post a comment below.


Gmail and Google Apps mail migration


I’ve been a long-time Google Apps user; I think it’s a perfect solution for a smaller company like mine. In fact, it’s a perfect solution for bigger companies as well. Right now, I’m working with a client that wants to consolidate 10 individual Google Apps domains into one single account that handles all the domains. About half of the 15 users have accounts in all the other domains; the other half have accounts in 2-3 of them. Even if this is not the most typical kind of work I do, this assignment brings some welcome new challenges into my current work.

Migration tool requirements

Anyway. One of the client requirements is naturally that all existing email is migrated into the new Google Apps account. Easy peasy, right? Well, it turns out that migrating email to and from Gmail (and therefore Google Apps) is quite a challenge for a number of reasons. Of course, there is plenty of advice on the Internet, but none of the proposed solutions would cover all my needs, which are:

  1. The tool must be able to handle XOAUTH on the source.
  2. The tool must be able to handle XOAUTH on the target.
  3. The tool must be able to handle Gmail’s rather special model of treating folders like labels.
  4. The tool must be scriptable.
  5. The tool must be able to handle delta changes; running the script a second time should not create duplicates on the target email account.

I’ve looked at plenty of alternatives, but the only tools I found that could potentially do the work for me came at too high a cost. The cloud based tools that exist typically charge per user account, which would have been fine if there was a 1-1 mapping between users and accounts. But in my scenario, each user has on average 7 accounts, and I had given my client a fixed price for the entire job. So even if I would have loved to try them, I can’t afford to lose money on a job.

The two most problematic requirements were handling XOAUTH on both ends and handling the Gmail folder/label magic. The two official tools from Google, the migration API and the Google migration tool for Exchange, both failed. The API only gives you write access, so it’s not possible to get email OUT of a Gmail account using it. The Google Exchange migration tool assumes that the source server is something other than Gmail and requires you to know the username/password for all source accounts.

A solution… almost

Enter imapsync. Imapsync used to be a free open source tool but is now under a commercial license. For a mere EUR 50, I bought access to the source code (in Perl). Imapsync is able to handle XOAUTH on both source and destination, it’s scriptable, and it’s able to use the Message-Id header to keep a kind of state: running imapsync twice with the same parameters will not duplicate the emails on the target server. More on that later.
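
To give a flavor of it, a transfer between two Google Apps accounts ends up looking something like this (reconstructed from memory of the 1.542-era options; double-check against imapsync --help for your version, especially the XOAUTH bits):

imapsync --host1 imap.gmail.com --ssl1 --authmech1 XOAUTH --user1 alice@old-domain.example \
         --host2 imap.gmail.com --ssl2 --authmech2 XOAUTH --user2 alice@new-domain.example \
         --useheader 'Message-Id' --dry

The --useheader 'Message-Id' part is what gives the delta behavior, and --dry lets you rehearse the whole run without transferring a single message.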

The one problem I had with imapsync was the folder vs. label management. The problem most people know of is that Gmail doesn’t really use folders, it uses labels. Even if they are similar in a lot of cases, there are differences. What I learned is that there’s another issue regarding the concept of subfolders, or nested labels. An example:

  • Via IMAP, create a folder named foo => Gmail creates a root level label “foo”.
  • Via IMAP, create a folder named foo/bar => Gmail creates the label “bar” nested under the label “foo”.
  • Via IMAP, create a folder named “fuu/bar” => Gmail creates the root level label “fuu/bar”.

See the difference? In the last example, you’d perhaps have thought that Gmail would create a root level label “fuu” and then a nested label “bar” under it. But nope, Gmail will happily create a label containing the actual IMAP label separator character. Bummer. The end result is that if you transfer email with imapsync out of the box, you get a flat structure of really long label names. And that flat list can grow to be quite long if you’re actively using nested labels. You don’t want that.

I pondered a whole lot of different solutions to this problem. I actually got to the point where I tried to migrate the source account to a local IMAP account on my own machine, manipulate the Maildir directly on disk to insert dummy emails in strategic places, and then migrate to the target account. It worked, but it also introduced a whole new set of moving parts.

The final solution (thanks Dennis)


It took a long Sunday walk with the dog before I realized that the proper solution would be to work with the imapsync source to fix folder creation. As I described above, the cause of the folder/label problem is that Gmail treats things differently depending on the order of folder creation. So, after the initial shock of seeing 5000 lines of Perl code (I don’t consider Perl to be part of my standard toolbox), I got to work and built myself a patch. With the patch in place, folder creation now works as I’d expected it to in the first place. The one downside to this solution is that it won’t be able to tell the difference between nesting and a label on the source Gmail account that actually contains a / (forward slash).

The other thing is that the patch doesn’t have a switch to tell imapsync whether you want the different folder creation behavior or not. I guess that’s needed before I submit it back to the maintainer.

Anyway, this patch assumes that you have imapsync 1.542, even if it’s likely to work well with other versions too. If you have another version of imapsync and want to work with Gmail migrations, consider upgrading anyway, since only 1.542 supports XOAUTH. On line 2312, replace the existing create_folder function with this modified version:

sub create_folder {
	my( $imap2, $h2_fold, $h1_fold ) = @_ ;
        my(@parts, $parent);

	print "Creating folder [$h2_fold] on host2\n";
        if ( ( 'INBOX' eq uc( $h2_fold) )
         and ( $imap2->exists( $h2_fold ) ) ) {
                print "Folder [$h2_fold] already exists\n" ;
                return( 1 ) ;
        }

        # Split the target folder on the host2 separator and derive the
        # parent folder name (everything but the last component).
        @parts = split($h2_sep, $h2_fold );
        pop( @parts );
        $parent = join($h2_sep, @parts );
        $parent =~ s/^\s+|\s+$//g ;
        # Recursively create the parent first, so that Gmail nests the
        # label instead of creating one flat "foo/bar" label.
        if(($parent ne "") and !$imap2->exists( $parent )) {
        	create_folder( $imap2 , $parent , $h1_fold);
        }

	if ( ! $dry ){
		if ( ! $imap2->create( $h2_fold ) ) {
			print( "Couldn't create folder [$h2_fold] from [$h1_fold]: ",
			$imap2->LastError(  ), "\n" );
			$nb_errors++;
                        # success if folder exists ("already exists" error)
                        return( 1 ) if $imap2->exists( $h2_fold ) ;
                        # failure since create failed
			return( 0 );
		}else{
			#create succeeded
			return( 1 );
		}
	}else{
		# dry mode, no folder so many imap will fail, assuming failure
		return( 0 );
	}
}