class DjangoEntityInputReader(
  '''
  An input reader that takes a Django model ('app.models.Model') 
  and yields entities for that model
  '''   
  def _iter_key_range(self, k_range):
      query = Query(util.for_name(self._
                  ).get_compiler(using="default"
      raw_entity_kind = query.db_table
      query = k_range.make_ascending_
                  raw_entity_kind, keys_only=True)
      for key in query.Run(config = datastore_query.QueryOptions(
  @classmethod
  def _get_raw_entity_kind(cls, entity_kind):
      '''
      A bit of a hack, returns a table name based on entity kind.
      '''
      return entity_kind.replace(".models.", "_").lower()
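As a quick check of what that table-name hack produces, here is a minimal standalone sketch. The free function `get_raw_entity_kind` is only an illustrative copy of the classmethod, not part of the reader itself.

```python
def get_raw_entity_kind(entity_kind):
    """Illustrative standalone copy of the classmethod's one-line body."""
    # "myapp.models.MyModel" -> "myapp_MyModel" -> "myapp_mymodel"
    return entity_kind.replace(".models.", "_").lower()


if __name__ == "__main__":
    print(get_raw_entity_kind("myapp.models.MyModel"))
```

The result matches Django's default table name (app label, underscore, lowercased model name). A model that sets a custom `db_table` in its `Meta` would not match, which is why this is a hack.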
To use the DjangoEntityInputReader class, place it in your views.py and add the following to your mapreduce.yaml:
- name: My mapper
  mapper:
    input_reader: myapp.views.DjangoEntityInputReader
    handler: myapp.my_mapper
    params:
    - name: entity_kind
      default: myapp.models.MyModel
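The `handler` entry in that configuration points at a plain Python function. A minimal sketch of what `myapp.my_mapper` might look like is below. It assumes the reader hands the handler a Django model instance, as its docstring suggests, and the `name` field is purely hypothetical.

```python
def my_mapper(entity):
    """Hypothetical handler: trim whitespace from a `name` field.

    `entity` is assumed to be a Django model instance with a `name`
    attribute and a `save()` method; neither comes from the original post.
    """
    cleaned = entity.name.strip()
    # Only write back when something actually changed, to avoid
    # needless datastore writes.
    if cleaned != entity.name:
        entity.name = cleaned
        entity.save()
```

Mapreduce calls the handler once per entity, so anything you do here should be safe to run in parallel and to retry.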
That's all you need to get mapreduce up and running, but there is one more problem. Mapreduce uses a property called "__scatter__" to spread the entities out and assign each one to a mapreduce shard. Entities saved through Django do not have the __scatter__ property, so all of them get assigned to a single shard and you lose the massive parallelism that mapreduce is supposed to give you. To fix this, you'll need some code of mine, which I posted here. Feel free to contact me if you have any questions.
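To make the single-shard problem concrete, here is a simplified illustration of splitting a key space from a sample of keys. It is not the mapreduce library's actual code. The function names `choose_split_points` and `shard_for` are made up for this sketch, and the integer keys stand in for whatever the __scatter__ sample would return.

```python
import bisect


def choose_split_points(sampled_keys, shard_count):
    """Pick up to shard_count - 1 evenly spaced split keys from a sample."""
    keys = sorted(sampled_keys)
    if not keys or shard_count < 2:
        # No sample (for example, no scatter property): nothing to split on.
        return []
    step = len(keys) / float(shard_count)
    points = []
    for i in range(1, shard_count):
        candidate = keys[min(int(i * step), len(keys) - 1)]
        # Skip duplicates so the split points stay strictly increasing.
        if not points or candidate > points[-1]:
            points.append(candidate)
    return points


def shard_for(key, split_points):
    """Return the index of the shard whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)
```

With an empty sample, `choose_split_points` returns no split points and `shard_for` sends every key to shard 0, which is the behaviour described above.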
 