
Saturday 1 August 2009

Q&A - Load Balancing Humans


One of our customers, USA Transcription Services, uses WatchDirectory to automatically send work from clients to transcriptionists. Clients upload work to a directory monitored by WatchDirectory and WatchDirectory sends it to the transcriptionist assigned to the client.

Normally each client has a dedicated transcriptionist, but some clients send a lot of work to transcribe, more than one employee can handle in a reasonable time. Such a client needs two transcriptionists assigned, while making sure no work is duplicated (done by both).

Lori, my contact at USA Transcription Services, asked if I knew a way to solve this.

Solution Outline

Instead of sending work directly from the directory where clients upload it, we first need to distribute the files into separate directories, each assigned to a different transcriptionist. These directories can then be monitored by another WatchDirectory task that sends the work to the individual employees.

Solution 1 - Sort Files

Use the Sort Files plugin to distribute the detected files to employee folders. Create multiple "sort rules", making sure they are all "final" (so a file is only copied to one employee folder).

This solution depends on the names of the files uploaded by the customers. If you can be sure these names are quite random, you can base the sort rules on, for example, the first letter of the filename. The first sort rule would use masks such as *\a*, *\b* and so on up to *\h*, so it copies all files with names starting with a, b, c, d, e, f, g and h (ignoring case). The second rule would just use * as the mask, matching all files not handled by the first rule.

One problem: the masks as entered for sort rule one will also match the file C:\Directory\ZZZZZZZZ.TXT, because the mask "*\d*" matches the "\D" at the start of "\Directory".
So, in this case, it is better to enter the file masks as *directory\d* (or just *ory\d*).
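
The over-matching is easy to reproduce. The sketch below uses Python's fnmatch module as a stand-in for WatchDirectory's mask matching. It assumes masks are compared case-insensitively against the full path, with * matching any run of characters, which may not be exactly how the Sort Files plugin behaves. The helper name mask_matches and the file draft.txt are made up for this illustration.

```python
import fnmatch


def mask_matches(mask: str, path: str) -> bool:
    """Case-insensitive wildcard match of a mask against a full path.

    Assumption: this only approximates WatchDirectory's own matching.
    """
    return fnmatch.fnmatchcase(path.lower(), mask.lower())


stray = r"C:\Directory\ZZZZZZZZ.TXT"
wanted = r"C:\Directory\draft.txt"

# "\d" also occurs in "\Directory", so the short mask catches the stray file.
print(mask_matches(r"*\d*", stray))
# Anchoring the mask on the end of the directory name avoids that.
print(mask_matches(r"*directory\d*", stray))
print(mask_matches(r"*directory\d*", wanted))
print(mask_matches(r"*ory\d*", wanted))
```

Only the first and the last two checks match: the anchored masks still pick up files starting with "d" but no longer react to the directory name itself.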

Another problem: it can be quite hard to guarantee that all files uploaded by clients have random names. It may be better to base the masks on the second character of those names instead.

This post is not intended to go into the linguistic analysis of filenames, but I think the letter 'E' is quite common as the second letter in English filenames, so this split may not be perfectly even either.
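
Before settling on a letter-based rule, it may help to check how a sample of real upload names would be split. The following is a minimal sketch, assuming a two-rule setup where rule one takes the letters a to h at a given position and rule two takes everything else. The function name rule_counts and the sample file names are invented for illustration.

```python
from collections import Counter


def rule_counts(names, position, first_rule_letters="abcdefgh"):
    """Count how many names each of the two sort rules would receive.

    position 0 tests the first character, position 1 the second.
    """
    counts = Counter(rule1=0, rule2=0)
    for name in names:
        letter = name[position:position + 1].lower()
        # An empty string is "in" every string, so check for it explicitly.
        if letter and letter in first_rule_letters:
            counts["rule1"] += 1
        else:
            counts["rule2"] += 1
    return counts


# Hypothetical upload names, for illustration only.
sample = ["Meeting_01.wav", "Dictation_02.wav", "Hearing_03.wav", "Deposition_04.wav"]
print(rule_counts(sample, 0))  # split on the first character
print(rule_counts(sample, 1))  # split on the second character
```

Running this over a directory listing of past uploads would show whether either position gives a reasonably balanced split for a particular client.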

Solution 2 - A Batch File that distributes randomly

This is the solution Lori is using now. If you don't need rocket science precision (exactly half of the files go to directory-1, the other half to directory-2), this will work fine, especially if you are dealing with a large number of files to distribute.
This solution uses the Run a Batch File task to start a script that uses the environment variable %RANDOM% to determine the target directory (and thus the employee that transcribes the file).
Here is the script:

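SET TARGET1=C:\Uploads\Employee1
SET TARGET2=C:\Uploads\Employee2
rem get a random number (0 - 32767)
SET RND=%RANDOM%
rem and get the remainder of divide by 2, so we have a number 0 or 1 as the result
SET /A RND=%RND% %% 2
rem move the detected file to TARGET (assumes WatchDirectory supplies its full path in the WD_FILE variable)
IF %RND%==0 MOVE "%WD_FILE%" "%TARGET1%"
IF %RND%==1 MOVE "%WD_FILE%" "%TARGET2%"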

If a client needs 3 employees, the script can be changed as follows (a third target directory, and the remainder of a division by 3 instead of 2):

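SET TARGET1=C:\Uploads\Employee1
SET TARGET2=C:\Uploads\Employee2
SET TARGET3=C:\Uploads\Employee3
rem get a random number (0 - 32767)
SET RND=%RANDOM%
rem and get the remainder of divide by 3, so we have a number 0, 1 or 2 as the result
SET /A RND=%RND% %% 3
rem move the detected file to TARGET (assumes WatchDirectory supplies its full path in the WD_FILE variable)
IF %RND%==0 MOVE "%WD_FILE%" "%TARGET1%"
IF %RND%==1 MOVE "%WD_FILE%" "%TARGET2%"
IF %RND%==2 MOVE "%WD_FILE%" "%TARGET3%"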
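
To get a feel for how even the random split is, the same "random number, then remainder" idea can be simulated. This is only a sketch in Python, not part of the WatchDirectory setup: pick_target and simulate are hypothetical names, and Python's random module stands in for %RANDOM%.

```python
import random
from collections import Counter


def pick_target(targets, rng=random):
    """Pick one target directory using a random number and a remainder."""
    number = rng.randrange(32768)  # same range as %RANDOM%: 0 to 32767
    return targets[number % len(targets)]


def simulate(targets, file_count, seed=1):
    """Count how many of file_count files each target would receive."""
    rng = random.Random(seed)
    return Counter(pick_target(targets, rng) for _ in range(file_count))


targets = [r"C:\Uploads\Employee1", r"C:\Uploads\Employee2", r"C:\Uploads\Employee3"]
for n in (10, 30000):
    print(n, dict(simulate(targets, n)))
```

With only a handful of files the counts can be quite lopsided, while with tens of thousands each employee ends up close to a third of the work, which is why this approach suits clients that upload a lot.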

Solution 3 - No Randomness

Random can be a tricky concept... I will let Dilbert explain. To guarantee an even distribution of files, you need to count the files as they arrive; please see this forum post.
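
The counting idea can be sketched as a strict rotation: keep a counter, and let its remainder decide the target. The code below is an illustration in Python under my own assumptions, not the script from the forum post. The function name next_target and the counter file are hypothetical, and the sketch does not guard against two files being processed at exactly the same moment.

```python
import os
import tempfile


def next_target(targets, counter_path):
    """Return the next target in strict rotation, persisting a counter on disk."""
    try:
        with open(counter_path, "r", encoding="ascii") as handle:
            count = int(handle.read().strip() or 0)
    except FileNotFoundError:
        count = 0  # first file ever seen
    with open(counter_path, "w", encoding="ascii") as handle:
        handle.write(str(count + 1))
    return targets[count % len(targets)]


targets = [r"C:\Uploads\Employee1", r"C:\Uploads\Employee2"]
counter_file = os.path.join(tempfile.mkdtemp(), "counter.txt")
print([next_target(targets, counter_file) for _ in range(4)])
```

Because the counter lives in a file, the rotation continues where it left off between runs, so the employees' file counts never differ by more than one.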

PS: I could only find the Dilbert comic on a third-party site. It must be somewhere on the official site as well, but I could not find it. If you have a link to the original, please let me know.
