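```python
# Imports assumed: App Engine mapreduce library plus Django-nonrel.
from django.db.models.sql.query import Query
from google.appengine.datastore import datastore_query
from mapreduce import util
from mapreduce.input_readers import AbstractDatastoreInputReader


class DjangoEntityInputReader(AbstractDatastoreInputReader):
    '''
    An input reader that takes a Django model ('app.models.Model')
    and yields entities for that model
    '''
    def _iter_key_range(self, k_range):
        # Resolve the model class from its dotted path, then ask the
        # Django-nonrel compiler for the model's datastore table (kind) name.
        query = Query(util.for_name(self._entity_kind)
                      ).get_compiler(using="default").build_query()
        raw_entity_kind = query.db_table
        # Keys-only datastore query restricted to this shard's key range.
        query = k_range.make_ascending_datastore_query(
            raw_entity_kind, keys_only=True)
        # The rest of this loop was cut off in the original; the body below is
        # an assumption modeled on the library's DatastoreKeyInputReader, so
        # the mapper handler receives datastore keys.
        for key in query.Run(config=datastore_query.QueryOptions(
                batch_size=self._batch_size)):
            yield key, key

    @classmethod
    def _get_raw_entity_kind(cls, entity_kind):
        '''
        A bit of a hack, returns a table name based on entity kind.
        '''
        return entity_kind.replace(".models.", "_").lower()
```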
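The `_get_raw_entity_kind` override relies on Django's default table naming, where a model's table is `<app_label>_<modelname>` in lowercase. Here is the same string transformation pulled out into a standalone function (the name `raw_entity_kind` is just for illustration) so you can see what it produces and where the hack breaks down:

```python
def raw_entity_kind(entity_kind: str) -> str:
    """Same transformation as DjangoEntityInputReader._get_raw_entity_kind."""
    # 'myapp.models.MyModel' -> 'myapp_MyModel' -> 'myapp_mymodel'
    return entity_kind.replace(".models.", "_").lower()


# Matches Django's default db_table for model MyModel in app "myapp".
print(raw_entity_kind("myapp.models.MyModel"))  # myapp_mymodel

# A dotted package path keeps its leading dots, so this does NOT match the
# real default table name (Django uses only the app label: "blog_post").
print(raw_entity_kind("project.blog.models.Post"))  # project.blog_post
```

A custom `Meta.db_table` is ignored too. `_iter_key_range` avoids the trick by reading `db_table` from the compiled query instead.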
To use the code above, place the class in your views.py and add the following to your mapreduce.yaml:
```yaml
mapreduce:
- name: My mapper
  mapper:
    input_reader: myapp.views.DjangoEntityInputReader
    handler: myapp.my_mapper
    params:
    - name: entity_kind
      default: myapp.models.MyModel
```
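The `entity_kind` parameter in that config is just a dotted path string; the reader turns it into a model class with `util.for_name` from the mapreduce library. As a rough illustration of that kind of lookup (a hypothetical stand-in built on `importlib`, not the library's own code), resolving a dotted name amounts to importing the module part and fetching the final attribute:

```python
import importlib


def resolve_dotted_name(dotted: str):
    """Hypothetical stand-in for mapreduce.util.for_name.

    Splits 'package.module.Attr' into a module path and an attribute name,
    imports the module, and returns the attribute.
    """
    module_path, _, attr = dotted.rpartition(".")
    if not module_path:
        raise ValueError("expected a dotted path like 'myapp.models.MyModel'")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# In a Django project this would be resolve_dotted_name("myapp.models.MyModel");
# here we resolve something from the standard library instead.
print(resolve_dotted_name("importlib.import_module") is importlib.import_module)  # True
```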
That's all you need to get mapreduce up and running, but there is an additional problem. Mapreduce uses a property called "__scatter__" to sample the entities and split them into key ranges, each assigned to its own mapreduce shard. However, Django entities do not have the __scatter__ property, so all of them end up in a single mapreduce shard and you lose the massive parallelism that mapreduce is meant to provide. To fix this, you'll need some code of mine, which I posted here. Feel free to contact me if you have any questions.
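To see why missing __scatter__ values collapse everything into one shard, here is a simplified, self-contained simulation of sample-based splitting. It is not the mapreduce library's code; the function names and the even-spacing rule are assumptions for illustration. The idea: sort a sample of keys (in the real library, keys of entities that carry __scatter__), pick evenly spaced split points, and turn them into contiguous key ranges, one per shard. With an empty sample there are no split points, so one open-ended range covers every entity.

```python
from typing import List, Optional, Tuple

# A key range is (start, end); None means "unbounded" on that side.
KeyRange = Tuple[Optional[str], Optional[str]]


def choose_split_points(sample_keys: List[str], shard_count: int) -> List[str]:
    """Pick up to shard_count - 1 evenly spaced keys from the sample."""
    keys = sorted(sample_keys)
    if not keys or shard_count < 2:
        return []  # nothing to split on
    # i * len(keys) // shard_count < len(keys) for every i < shard_count.
    points = {keys[i * len(keys) // shard_count] for i in range(1, shard_count)}
    return sorted(points)  # de-duplicated, in key order


def split_into_ranges(sample_keys: List[str], shard_count: int) -> List[KeyRange]:
    """Turn split points into contiguous key ranges, one per shard."""
    bounds: List[Optional[str]] = [None]
    bounds.extend(choose_split_points(sample_keys, shard_count))
    bounds.append(None)
    return list(zip(bounds[:-1], bounds[1:]))


# With a sample (entities that have __scatter__): work spreads over 4 shards.
print(split_into_ranges(["a", "b", "c", "d", "e", "f", "g", "h"], 4))
# [(None, 'c'), ('c', 'e'), ('e', 'g'), ('g', None)]

# With no sample (no __scatter__ values): one range, so one shard does it all.
print(split_into_ranges([], 4))
# [(None, None)]
```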