Wednesday, October 24, 2012

Django + Google App Engine + MapReduce

If you're using Django-nonrel on Google App Engine, mapreduce will not work out of the box. I put a bit of work into getting it running. Fortunately, I was not the first. This blog post suggests some code to get you started and lets you run a mapper over all of your entities. Unfortunately, it only allows you to map App Engine entities, not Django entities. The code below fixes that issue. It works in a similar way, but performs a Django "get" before running the mapper to convert each key into a Django entity. This adds a bit of overhead: one extra get per mapped entity.

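from google.appengine.datastore import datastore_query
from mapreduce import util
from mapreduce.input_readers import AbstractDatastoreInputReader

class DjangoEntityInputReader(AbstractDatastoreInputReader):
  """An input reader that takes a Django model ('app.models.Model')
  and yields entities for that model."""

  def _iter_key_range(self, k_range):
    # Resolve the dotted path ('app.models.Model') to the model class.
    model = util.for_name(self._entity_kind)
    # The datastore kind is the Django table name of the model.
    raw_entity_kind = model._meta.db_table
    query = k_range.make_ascending_datastore_query(
        raw_entity_kind, keys_only=True)
    # Assumption: batch size comes from the reader, and the Django
    # lookup is by primary key using the datastore key's id or name.
    for key in query.Run(config=datastore_query.QueryOptions(
        batch_size=self._batch_size)):
      # One extra Django get per key turns it into a model instance.
      yield key, model.objects.get(pk=key.id_or_name())

  @classmethod
  def _get_raw_entity_kind(cls, entity_kind):
    """A bit of a hack, returns a table name based on entity kind."""
    return entity_kind.replace(".models.", "_").lower()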
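To make the table-name hack in _get_raw_entity_kind concrete, here is the same string transform as a standalone function. It is only a sketch: the name raw_entity_kind is made up for illustration, and it assumes the model keeps Django's default table name (app label plus lowercased class name). A model that sets a custom db_table in its Meta would not match.

```python
def raw_entity_kind(entity_kind):
    """Same transform as _get_raw_entity_kind, as a plain function."""
    # 'myapp.models.MyModel' -> 'myapp_MyModel' -> 'myapp_mymodel'
    return entity_kind.replace(".models.", "_").lower()


print(raw_entity_kind("myapp.models.MyModel"))  # myapp_mymodel
```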

To use the code above, place the class in your views.py (so that it matches the myapp.views path below) and add the following to your mapreduce.yaml:

mapreduce:
- name: My mapper
  mapper:
    input_reader: myapp.views.DjangoEntityInputReader
    handler: myapp.my_mapper
    params:
    - name: entity_kind
      default: myapp.models.MyModel
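For completeness, here is a rough sketch of what the handler named in the yaml might look like. It assumes the handler is called once per entity with the Django model instance that the reader produced. The title field and the whitespace clean-up are made up for illustration; swap in whatever work your mapper actually needs.

```python
def my_mapper(entity):
    """Hypothetical handler: tidy up one Django entity per call."""
    # 'title' is an assumed field name, not part of the reader above.
    cleaned = entity.title.strip()
    if cleaned != entity.title:
        entity.title = cleaned
        # A regular Django save; no datastore-level API is needed here.
        entity.save()
```

Because the handler only touches the Django model API, you can try it out in a normal Django shell before wiring it into a mapreduce job.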

That's all you need to get mapreduce up and running, but there is an additional problem. Mapreduce uses a property called "__scatter__" to scramble up the entities and divide them among its shards. However, entities saved through Django do not have the __scatter__ property, so all of them get assigned to a single shard, and you do not get to enjoy the massive parallelism of mapreduce. To fix that, you'll need some code of mine, which I posted here. Feel free to contact me if you have any questions.
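As a footnote on the sharding problem, here is a toy version of the splitting step. It is not the mapreduce library's code, only a simplified sketch under the assumption that shard boundaries are picked from a random sample of keys, and that the sample is found through __scatter__. The function name split_points is made up for illustration.

```python
def split_points(sample_keys, shard_count):
    """Pick shard boundaries from a sample of keys (toy illustration)."""
    keys = sorted(sample_keys)
    if not keys or shard_count < 2:
        # No sample means no boundaries, so one shard gets everything.
        return []
    # Take evenly spaced keys from the sorted sample as boundaries.
    step = len(keys) / float(shard_count)
    points = []
    for i in range(1, shard_count):
        candidate = keys[int(i * step)]
        if candidate not in points:
            points.append(candidate)
    return points


# With a sample, four shards get three boundaries.
print(split_points([5, 1, 9, 3, 7, 2, 8, 4], 4))  # [3, 5, 8]
# Without __scatter__ the sample is empty: a single shard does all the work.
print(split_points([], 4))  # []
```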