Sam Trenholme's webpage

Looking at Celery

 

December 20 2011

In today's blog entry, I describe how to do distributed ("cloud") computing with Python.

Celery is the technology this blog entry will look at. Celery uses the following components:

  • A daemon, called celeryd, that runs on each node (computer, either physical or virtual) and queries a "broker" to see if any tasks are to be performed.
  • A broker, which can be one of several databases or other data store systems, and which stores the tasks that are to be performed. The client sends a task request to this broker; the Celery servers (celeryd instances) get task requests from the broker and perform them. Once a task is finished, its return value is stored on the broker to give back to the client.
  • The clients, which request celery tasks to be performed.
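
The division of labor in the list above can be modeled in a few lines of plain Python. This is only a sketch of the pattern, not Celery itself: a queue stands in for the broker's task store, a dictionary for its result store, and threads for the celeryd instances. The names square, worker, and run_demo are made up for this illustration, and the code assumes Python 3.

```python
import queue
import threading

def square(n):
    # Stand-in for a slow task
    return n * n

def worker(task_queue, results, lock):
    # Plays the role of one celeryd instance: fetch a task, run it,
    # and store the return value where the client can find it
    while True:
        item = task_queue.get()
        if item is None:  # Sentinel value: no more work
            break
        task_id, func, arg = item
        value = func(arg)
        with lock:
            results[task_id] = value

def run_demo(numbers, worker_count=2):
    task_queue = queue.Queue()  # Plays the role of the broker's task store
    results = {}                # Plays the role of the broker's result store
    lock = threading.Lock()
    workers = [threading.Thread(target=worker,
                                args=(task_queue, results, lock))
               for _ in range(worker_count)]
    for w in workers:
        w.start()
    # The client side: send one task request per number
    for task_id, n in enumerate(numbers):
        task_queue.put((task_id, square, n))
    # One sentinel per worker so that every worker stops
    for _ in workers:
        task_queue.put(None)
    for w in workers:
        w.join()
    return [results[i] for i in range(len(numbers))]

print(run_demo([1, 2, 3, 4]))
```

With Celery, the queue and the result store live in a separate broker process, so the workers can run on other machines instead of in other threads.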

Installing Celery

Celery is not the only thing that needs to be installed (after installing Python, naturally): the broker server also has to be installed, an issue Celery's documentation is not clear about. Celery's preferred broker is RabbitMQ, which is written in Erlang.

Once Python, Erlang, and RabbitMQ are installed, Celery can be installed.

Using Celery

Once Celery is installed, set up a directory for running celeryd, which will process tasks. celeryd reads its configuration from a file called celeryconfig.py. Here is a simple configuration file which uses RabbitMQ on localhost (the same machine/node) as a broker:

BROKER_URL = "amqp://guest:guest@127.0.0.1:5672//"
CELERY_RESULT_BACKEND = "amqp"
CELERY_TASK_RESULT_EXPIRES = 300
CELERY_IMPORTS = ("tasks", )

The "CELERY_IMPORTS" line lists the modules that contain tasks that this Celery daemon can process. "tasks" above points to a file called "tasks.py" which contains the code for the actual task to perform. In this example, the task is a simple brute-force search for a prime number:

from celery.task import task

# Determine if a number is prime. This is slow -- the point of this
# is a computation that is slow enough it should be handed off to another
# computer or thread
@task
def is_prime(x):
    if x < 2:
        return 0 # Not prime; nothing below 2 is prime
    if x == 2:
        return x # Prime; the only even prime
    s = x ** .5 # Square root of x
    if (x % 2) == 0:
        return 0 # Not prime; even number
    q = 3
    while q < s + 1:
        if (x % q) == 0:
            return 0 # Not prime
        q += 2

    return x # Prime

The above code runs on a Celery server (a celeryd instance). It returns the number itself if the number is prime, and 0 if it is not.
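
Before handing the task to Celery, it can help to check the logic on its own. Here is a plain-Python variant of the same trial-division check, without the @task decorator, so it runs without a broker. The name is_prime_local is made up for this sketch. Spelling out the small cases (numbers below 2, and 2 itself) and comparing q * q against x instead of computing a floating-point square root are my choices here, not something Celery requires.

```python
def is_prime_local(x):
    # Same convention as the task: return x if prime, 0 if not
    if x < 2:
        return 0  # 0, 1, and negative numbers are not prime
    if x == 2:
        return 2  # The only even prime
    if x % 2 == 0:
        return 0  # Not prime; even number
    q = 3
    while q * q <= x:  # Integer-only test for "q is at most sqrt(x)"
        if x % q == 0:
            return 0  # Not prime; q divides x
        q += 2
    return x  # No odd divisor found, so x is prime

for n in (1, 2, 9, 1000, 1009):
    print(n, is_prime_local(n))
```

Returning the number itself rather than True lets the caller learn which candidate was prime from the result alone.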

Now that the above code is in place, we can have a client that finds the first prime number at or after a given number:

#!/usr/bin/python

from tasks import is_prime

def find_prime(x):
    p = 3 # We can run up to three tasks at once
    results = []

Below, we fill up the task queue by starting tests for the first p numbers, where p is the maximum number of tasks that run at the same time.

    a = 0
    while a < p:
        results.append(is_prime.delay(x))
        x += 1
        a += 1

Now that the task queue is full, we wait for tasks to finish. If a task is finished and the candidate number is, in fact, prime, we return the prime number and stop spawning new tasks. If a task is finished and the candidate number is not prime, we use the now-empty task slot to spawn a new task testing the next-higher number to see if it is prime.

    a = 0
    while a < 2000: # Infinite loop protection
        b = 0
        while b < p:
            if results[b].ready():
                if results[b].result != 0:
                    return results[b].result
                results[b] = is_prime.delay(x)
                x += 1
            b += 1
        a += 1
    return 0 # No prime found :(
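
The same "keep p tasks in flight" pattern can be tried without a broker. What follows is a sketch under stated assumptions, not the Celery client: it uses Python 3's concurrent.futures thread pool as a stand-in for the Celery workers, and the names is_prime_local and find_prime_local are made up for this example. It also differs from the client in one way: instead of polling every slot with ready(), it blocks on the oldest outstanding task, so it always returns the smallest prime at or after x rather than whichever prime happens to finish first.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def is_prime_local(x):
    # Trial division; returns x if prime, 0 if not
    if x < 2:
        return 0
    if x == 2:
        return 2
    if x % 2 == 0:
        return 0
    q = 3
    while q * q <= x:
        if x % q == 0:
            return 0
        q += 2
    return x

def find_prime_local(x, p=3, limit=2000):
    # p is the number of tasks kept in flight; limit caps how many
    # finished candidates we look at (infinite loop protection)
    with ThreadPoolExecutor(max_workers=p) as pool:
        pending = deque()
        # Fill the window with the first p candidates
        for _ in range(p):
            pending.append(pool.submit(is_prime_local, x))
            x += 1
        for _ in range(limit):
            # Block until the oldest outstanding task is done
            result = pending.popleft().result()
            if result != 0:
                return result
            # Reuse the freed slot for the next candidate
            pending.append(pool.submit(is_prime_local, x))
            x += 1
    return 0  # No prime found

print(find_prime_local(1000))
print(find_prime_local(1000000))
```

In CPython, threads do not speed up CPU-bound code like this, so the sketch only shows the scheduling pattern; the actual speedup comes from Celery running the tasks in separate worker processes or on separate machines.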

That finishes the code looking for a prime number. We now run this code to find the first prime number at or after each of a series of powers of 10.

print(find_prime(1000))
print(find_prime(1000000))
print(find_prime(1000000000))
print(find_prime(1000000000000))
print(find_prime(1000000000000000))

This simple example shows how Celery can be used to spread the work of a Python program across multiple computers, which can improve the scalability and speed of Python applications.

While this blog post is copyright 2011 Sam Trenholme, all code contained here is public domain and can be used for any purpose whatsoever. To post a comment about an entry, send me an email and I may or may not post your comment (with or without editing).
