compXnonsense: Django + Google App Engine + MapReduce

If you're using Django-nonrel on Google App Engine, mapreduce will not work out of the box. I put a bit of work into getting it running. Fortunately, I was not the first. This blog post suggests some code to get you started and lets you run a mapper over all of your entities. Unfortunately, it only lets you map App Engine entities, not Django entities. The code below fixes that. It works in a similar way, but performs a Django "get" before running the mapper to convert each key into a Django entity. This adds some overhead: one extra get per mapped entity.

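from django.db.models.sql.query import Query
from google.appengine.datastore import datastore_query
from mapreduce import util
from mapreduce.input_readers import AbstractDatastoreInputReader


class DjangoEntityInputReader(AbstractDatastoreInputReader):
    '''
    An input reader that takes a Django model ('app.models.Model')
    and yields entities for that model
    '''

    def _iter_key_range(self, k_range):
        # Resolve the dotted path to the model class, then ask the
        # Django-nonrel compiler for the underlying datastore table name.
        model = util.for_name(self._entity_kind)
        query = Query(model).get_compiler(using="default").build_query()
        raw_entity_kind = query.db_table
        # Keys-only datastore query restricted to this shard's key range.
        query = k_range.make_ascending_datastore_query(
            raw_entity_kind, keys_only=True)
        for key in query.Run(config=datastore_query.QueryOptions(
                batch_size=self._batch_size)):
            # One extra Django get per key; uses the resolved class
            # instead of eval(), which needs the module in scope.
            yield key, model.objects.get(pk=key.id())

    @classmethod
    def _get_raw_entity_kind(cls, entity_kind):
        '''
        A bit of a hack, returns a table name based on entity kind.
        '''
        return entity_kind.replace(".models.", "_").lower()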
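The table-name hack in _get_raw_entity_kind is plain string manipulation, so it is easy to check on its own. The sketch below copies that one line into a standalone function (the name raw_entity_kind is mine, not part of the mapreduce library) to show what it produces. It only matches Django's default "applabel_modelname" table naming; a model with a custom db_table, or an app nested inside a package, would presumably need different handling.

```python
def raw_entity_kind(entity_kind):
    """Standalone copy of the hack: 'app.models.Model' -> 'app_model'."""
    return entity_kind.replace(".models.", "_").lower()


# The entity_kind used in this post maps to Django's default table name.
print(raw_entity_kind("myapp.models.MyModel"))  # myapp_mymodel
```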

To use the code above, place the class in your views.py and add the following to your mapreduce.yaml:

- name: My mapper
  mapper:
    input_reader: myapp.views.DjangoEntityInputReader
    handler: myapp.my_mapper
    params:
    - name: entity_kind
      default: myapp.models.MyModel
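The yaml above names a handler, myapp.my_mapper, that this post does not show. Here is a minimal sketch of what one might look like, assuming the framework calls the handler with the Django instance that the reader produces for each key. The name field and the FakeEntity stand-in are hypothetical and only there so the sketch runs without App Engine or Django.

```python
def my_mapper(entity):
    """Hypothetical handler: tidy up a `name` field and save the entity."""
    # Assumption: `entity` is the Django model instance produced by
    # DjangoEntityInputReader, so the usual model API is available.
    cleaned = entity.name.strip()
    if cleaned != entity.name:
        entity.name = cleaned
        entity.save()


class FakeEntity(object):
    """Stand-in for a Django model so the sketch runs on its own."""

    def __init__(self, name):
        self.name = name
        self.saved = False

    def save(self):
        self.saved = True


entity = FakeEntity("  alice ")
my_mapper(entity)
print(entity.name, entity.saved)
```

Because the handler gets a real Django object, it can call save() and friends directly rather than going through the raw datastore API.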

That's all you need to get mapreduce up and running, but there is an additional problem. Mapreduce uses a property called "__scatter__" to sample the entities and split them across map reduce shards. However, Django entities do not have the __scatter__ property, so all of the entities get assigned to a single shard and you do not get to enjoy the massive parallelism of mapreduce. To fix that, you'll need some code of mine, which I posted here. Feel free to contact me if you have any questions.
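To see why missing __scatter__ values collapse everything into one shard, here is a rough sketch of sample-based splitting. It is not the mapreduce library's actual code, and both function names are made up for illustration. The idea is that the sampled keys are sorted, evenly spaced ones become shard boundaries, and with no samples there are no boundaries to cut on.

```python
def choose_split_points(sampled_keys, shard_count):
    """Pick up to shard_count - 1 evenly spaced boundaries from samples."""
    keys = sorted(sampled_keys)
    if not keys:
        return []
    # Cannot have more boundaries than samples.
    count = min(shard_count - 1, len(keys))
    return [keys[(len(keys) * i) // (count + 1)] for i in range(1, count + 1)]


def key_ranges(split_points):
    """Turn boundaries into (start, end) ranges; None means open-ended."""
    bounds = [None] + list(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


# Sampled ids stand in for entities that carry a scatter value.
samples = [5, 40, 12, 33, 27, 8, 19, 46]
print(key_ranges(choose_split_points(samples, 4)))

# No samples at all: a single open-ended range, hence a single shard.
print(key_ranges(choose_split_points([], 4)))
```

With eight samples and four shards the boundaries land on 12, 27 and 40, giving four ranges. With an empty sample the only range is (None, None), which is the single-shard behaviour described above.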

Wednesday, October 24, 2012
