Last week I spoke at SydPHP, which was hilariously horrible due to my lack of public speaking skills.
During my Introduction to Laravel & Composer talk, one attendee asked a very interesting question about an issue he came across while developing a cron job that was running on his server.
The process was set to start every minute to go and process some data. The problem was he didn’t know how long it would take for the data to be processed. Sometimes it could take five seconds, other times it could take five days.
To try and solve the problem, the script would attempt to block other copies of itself from running by using the sem_acquire function. This worked. That is, until the same process was launched multiple times and the request to acquire a semaphore would ultimately fail.
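The attendee's actual code wasn't shown, so the following is only a minimal sketch of what a semaphore-guarded cron script might look like. The function name runExclusively and the key value are made up for illustration, the sysvsem extension is assumed to be loaded, and the non-blocking second argument to sem_acquire assumes PHP 5.6.1 or later.

```php
<?php
// Hypothetical helper: run $job only if no other copy holds the semaphore.
function runExclusively(int $key, callable $job): bool
{
    // sem_get() creates the semaphore for $key, or reuses it if it already exists.
    $sem = sem_get($key, 1);
    if ($sem === false) {
        return false; // could not get a semaphore at all
    }

    // Passing true makes the call return immediately instead of waiting.
    if (!sem_acquire($sem, true)) {
        return false; // another copy of the job is still running
    }

    try {
        $job();
    } finally {
        sem_release($sem);
    }

    return true;
}
```

A cron script would wrap its work in a closure and pass it in, exiting quietly when the function returns false.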
So, the first part of the problem is that semaphores, like everything else to do with a computer, have a limit to them. Semaphores are different to other methods of locking because their main purpose is to control access to data in software that needs to access that data from multiple threads (i.e., parallel programming).
Semaphore limits can be viewed with the ipcs command, which is provided by the util-linux-ng package in RHEL and Arch Linux.
root@staging [~]# ipcs -ls
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
A call to sem_get will add one extra array into the kernel if the key has not already been used, which is fine, but the important part is that, at least on RHEL, there is a limit of 250 semaphores per array. This means that on the 251st attempt, sem_acquire will fail to get a semaphore.
There is no simple way to fix this. There are essentially two options: either the maximum number of semaphores is increased, or you create fewer semaphores. You don’t really want to add more semaphores though, and the number of arrays set by default is actually a very forgiving number.
If you wanted to see what the kernel’s current settings for semaphores are without using ipcs, you could find the information in /proc/sys/kernel/sem.
root@timgws ~ # cat /proc/sys/kernel/sem
250 32000 32 128
The numbers are separated by tabs, and are represented in this order:
250: The number of semaphores per array.
32000: The limit of semaphores system wide (should be roughly 250*128, i.e., [semaphores per array] x [max number of arrays]).
32: Ops per semop call.
128: Number of arrays.
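To make the field order concrete, here is a small sketch that turns that line into named values. The function name parseSemLimits and the array keys are my own invention for illustration; only the order of the four fields comes from the list above.

```php
<?php
// Hypothetical helper: turn the contents of /proc/sys/kernel/sem into named limits.
function parseSemLimits(string $raw): array
{
    // The four fields are separated by whitespace (tabs on a real system).
    $fields = preg_split('/\s+/', trim($raw));
    if (count($fields) !== 4) {
        throw new InvalidArgumentException('Expected four fields, got ' . count($fields));
    }

    list($perArray, $systemWide, $opsPerCall, $arrays) = array_map('intval', $fields);

    return [
        'semaphores_per_array'   => $perArray,
        'semaphores_system_wide' => $systemWide,
        'ops_per_semop_call'     => $opsPerCall,
        'max_arrays'             => $arrays,
    ];
}

$limits = parseSemLimits("250\t32000\t32\t128");

// 250 semaphores per array x 128 arrays = 32000, the system-wide limit.
$consistent = $limits['semaphores_per_array'] * $limits['max_arrays']
    === $limits['semaphores_system_wide'];
```

On a real machine you would pass in file_get_contents('/proc/sys/kernel/sem') instead of the hard-coded string.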
The configuration can be written out by using printf:
printf '250\t32000\t32\t200' >/proc/sys/kernel/sem
… but you said semaphores are bad…
Well, not exactly. Semaphores are amazing for the purpose they were built for, that is, preventing processes from accessing a resource while another process is performing operations on it. However, if you are not going to need access to the resource straight away, then you don’t want to use a semaphore, and the reasons are plentiful.
Files are awesome when it comes to preventing two processes from running at the same time. It’s not just me who thinks that, either!
RPM uses files to lock the application. When you attempt to install two packages from two processes at one time, the process that is launched second will fail, thanks to the first process creating a file saying that RPM is locked.
flock is more portable than sem_get. (Semaphores don’t work on Windows; files, however, do work on Windows, with caveats.)
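For comparison, here is a rough sketch of what locking with flock could look like. The helper name tryFlock and the lock file name are hypothetical; fopen, flock, LOCK_EX and LOCK_NB are the standard PHP functions and constants.

```php
<?php
// Hypothetical helper: take an exclusive, non-blocking lock on $path.
// Returns the open handle on success (keep it open while working), or false.
function tryFlock(string $path)
{
    // 'c' creates the file if it is missing, without truncating an existing one.
    $handle = fopen($path, 'c');
    if ($handle === false) {
        return false;
    }

    // LOCK_NB makes flock() return straight away instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // somebody else holds the lock
    }

    return $handle;
}

$lock = tryFlock(sys_get_temp_dir() . '/example-cron.lock');
if ($lock !== false) {
    // ... code that only one process should run at a time ...
    flock($lock, LOCK_UN);
    fclose($lock);
}
```

One thing in its favour: the operating system drops the lock when the process exits, even after a crash, so there is no stale lock file to clean up afterwards.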
Here is a simple lock class that I wrote. It checks whether a lock file exists and, if it doesn’t, creates it.
<?php
class ProcessLock {
    private $lockFilePath = '/tmp/tim.lock';

    // Returns true if we got the lock, false if somebody else already holds it.
    function acquire() {
        if (file_exists($this->lockFilePath))
            return false;

        // Get the PID and place into the file...
        $pid = getmypid();

        // Hopefully our write is atomic.
        @file_put_contents($this->lockFilePath, $pid);

        if (!file_exists($this->lockFilePath))
            return false; // our file magically disappeared?

        // Read the PID back to make sure nobody else wrote theirs after us.
        $contents = file_get_contents($this->lockFilePath);
        $lockedPID = (int)$contents;

        if ($pid !== $lockedPID)
            throw new Exception("Another process locked before us somehow");

        return true;
    }

    function release() {
        @unlink($this->lockFilePath);
        return true;
    }
}
To use this class, we simply create a new instance, and attempt to acquire the lock. If successful, then we run our code.
$myLock = new ProcessLock();
$gotLock = $myLock->acquire();
if ($gotLock) {
    // ... this is where I put all of my code that only
    // one process should run at a time

    // Then we release
    $myLock->release();
} else {
    echo "Can't run, sorry";
    exit(2);
}
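The "hopefully our write is atomic" comment in the class is worth dwelling on: there is a small window between the file_exists check and the write in which a second process could sneak in. One possible way to close it, sketched below, is to open the file with fopen mode 'x', which fails if the file already exists. The function names acquireLockFile and releaseLockFile are made up for this example and are not part of the class.

```php
<?php
// Hypothetical variant of the acquire step: create the lock file atomically.
function acquireLockFile(string $path): bool
{
    // Mode 'x' refuses to open a file that already exists, so the existence
    // check and the creation happen in a single step.
    $handle = @fopen($path, 'x');
    if ($handle === false) {
        return false; // the lock file is already there
    }

    // Record who holds the lock, as the class does.
    fwrite($handle, (string) getmypid());
    fclose($handle);

    return true;
}

// Hypothetical counterpart: remove the lock file when the work is done.
function releaseLockFile(string $path): bool
{
    return @unlink($path);
}
```

The rest of the script stays the same: run the job if acquireLockFile returns true, and call releaseLockFile when it finishes.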
When the lock has already been acquired by another process, you might get bonus points if you check whether that process is still running or whether the lock is actually stale. This can be done by checking if /proc/$lockedPID exists and, if it does, whether /proc/$lockedPID/exe is still symlinked (using readlink) to PHP_BINARY (though this will only work on Linux).
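A minimal sketch of that check might look like the following. The function name lockHolderIsRunning is hypothetical, and treating an unreadable /proc/$lockedPID/exe as "still running" is my own assumption about the safest default, not something the check itself dictates.

```php
<?php
// Hypothetical helper: is the process that wrote the lock file still a running
// PHP process? Linux only, because it relies on /proc.
function lockHolderIsRunning(int $lockedPID): bool
{
    $procDir = "/proc/$lockedPID";
    if (!is_dir($procDir)) {
        return false; // no such process, so the lock is stale
    }

    // readlink() returns false if we are not allowed to inspect the process,
    // for example when it belongs to another user.
    $exe = @readlink("$procDir/exe");
    if ($exe === false) {
        return true; // assumption: if we cannot tell, treat the lock as held
    }

    // The PID could have been reused by some unrelated program.
    return $exe === PHP_BINARY;
}
```

If this returns false, the lock file can be removed and the lock attempted again.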
Tim Groeneveld says:
Hey Victor,
The process should not require any special handling if it’s running as a separate user. The worst that might happen is that the current user will be unable to see that the other process is locked, or that the process will otherwise be unreadable by the user that is attempting to find which process has acquired the lock.
Most times this will be acceptable behaviour, because you would be assuming that the process will only ever be run by one user, for example in the case of a cron daemon, where the process will be running under the same user each time.
root of course will never have this problem, so will be able to see that another user has acquired the lock.
Victor S says:
Hi Tim,
Thank you for the cool article!
On the code example, shouldn’t if ($pid === $lockedPID) actually be if ($pid !== $lockedPID)?
Shouldn’t the method acquire() return true; at the end?
I’m not familiar on how /proc/$lockedPID works, but would it need special handling if the concurrent php scripts run under different users? For example, I know ps might return different results depending on what user is running it, like not showing all processes unless you root in.
Thanks!
Victor