Wednesday, October 24, 2012

Django + Google App Engine + MapReduce

If you're using Django-nonrel on Google App Engine, mapreduce will not work out of the box. I put a bit of work into getting it running. Fortunately, I was not the first. This blog post suggests some code to get you started and allows you to run a mapper on all of your entities. Unfortunately, it only lets you map App Engine entities, not Django entities. The code below fixes that issue. It works in a similar way, but performs a Django "get" before running the mapper to convert a key into a Django entity. This adds a bit more overhead: one more get per map call.

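from django.db.models.sql.query import Query
from google.appengine.datastore import datastore_query
from mapreduce import util
from mapreduce.input_readers import AbstractDatastoreInputReader


class DjangoEntityInputReader(AbstractDatastoreInputReader):
    '''
    An input reader that takes a Django model ('app.models.Model')
    and yields entities for that model.
    '''
    def _iter_key_range(self, k_range):
        # Resolve the dotted path (e.g. 'myapp.models.MyModel') to the model
        # class once, instead of calling eval() on it for every key.
        model = util.for_name(self._entity_kind)
        # Build a Django query only to read the datastore table name from it.
        query = Query(model).get_compiler(using="default").build_query()
        raw_entity_kind = query.db_table
        query = k_range.make_ascending_datastore_query(
            raw_entity_kind, keys_only=True)
        for key in query.Run(config=datastore_query.QueryOptions(
                batch_size=self._batch_size)):
            # One extra Django get per key: this is the added overhead.
            yield key, model.objects.get(pk=key.id())

    @classmethod
    def _get_raw_entity_kind(cls, entity_kind):
        '''
        A bit of a hack, returns a table name based on entity kind.
        '''
        return entity_kind.replace(".models.", "_").lower()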


To use the code above, place the class in your views.py and use the following in your mapreduce.yaml:

- name: My mapper
  mapper:
    input_reader: myapp.views.DjangoEntityInputReader
    handler: myapp.my_mapper
    params:
    - name: entity_kind
      default: myapp.models.MyModel
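
For reference, here is a rough sketch of what a handler for this reader might look like. Because the reader yields (key, entity) pairs, the handler should expect a tuple rather than a bare entity. The body of my_mapper, the "title" field, and the FakeEntity stand-in are all hypothetical; they are only there so the snippet can be run outside App Engine.

```python
def my_mapper(key_and_entity):
    """Hypothetical handler for DjangoEntityInputReader."""
    # The reader yields (datastore key, Django model instance) pairs.
    key, entity = key_and_entity
    # Example work: tidy up a (hypothetical) title field and save it
    # through the Django ORM rather than the raw datastore API.
    entity.title = entity.title.strip()
    entity.save()


class FakeEntity(object):
    """Stand-in for a Django model instance, for local experiments only."""

    def __init__(self, title):
        self.title = title
        self.saved = False

    def save(self):
        self.saved = True


# Simulate one map call the way the reader would feed it.
entity = FakeEntity("  Hello  ")
my_mapper(("fake-key", entity))
```

In a real project the handler would live at the path named in mapreduce.yaml (myapp.my_mapper above) and would receive real Django model instances.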

That's all you need to get mapreduce up and running, but there is an additional problem. Mapreduce uses a property called "__scatter__" to scramble the entities and assign them to map reduce shards. However, Django entities do not have the __scatter__ property, so all of the entities get assigned to a single shard and you do not get to enjoy the massive parallelism of mapreduce. To fix this, you'll need some code of mine, which I posted here. Feel free to contact me if you have any questions.
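
To see why a missing __scatter__ property collapses everything into one shard, here is a toy model of the splitting step. This is not the mapreduce library's code: split_key_space and its inputs are made up for illustration, and it assumes shard boundaries are picked from a sample of keys found through __scatter__.

```python
def split_key_space(sample_keys, shard_count):
    """Toy model: pick shard boundaries from a sample of keys.

    Returns a list of (start, end) ranges, where None means "unbounded".
    """
    keys = sorted(sample_keys)
    # With no sampled keys there is nothing to split on: one big range.
    if not keys or shard_count < 2:
        return [(None, None)]
    # Pick up to shard_count - 1 roughly evenly spaced boundaries.
    n = min(shard_count - 1, len(keys))
    boundaries = sorted(set(
        keys[(i + 1) * len(keys) // (n + 1)] for i in range(n)))
    edges = [None] + boundaries + [None]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]


# Entities with __scatter__: the sample is non-empty, so we get 4 ranges.
with_scatter = split_key_space([10, 20, 30, 40, 50, 60, 70, 80], 4)
# Entities without __scatter__: the sample is empty, so we get 1 range.
without_scatter = split_key_space([], 4)
```

With an empty sample there are no boundaries to choose from, so the whole key space ends up in a single range, which matches the one-shard behavior described above.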

Sunday, September 30, 2012

PACER API with REST Interface Released

I had previously written a short blog entry on my open source PACER API. The open source project is ongoing, but I have recently devoted my efforts to Docket Alarm and its online PACER REST API, which is now substantially complete.

Docket Alarm's API allows users to search for docket information from Federal courts and pull the information using a simple REST interface. The API has a wide variety of potential applications, especially for due diligence. For example, an application that assists in originating loans could use the API to automatically look up a potential borrower's bankruptcy or litigation history.

The API can search by name, geographic location, date range and a number of other fields.  Additional fields can be added by request.  Once a search is complete, the API can access the case's docket text and associated meta-data. The meta-data contains fields like the judge's name, all of the party names, and the lawyers associated with each party. Finally, the API allows you to pull individual documents as PDFs.  Put together, it is a relatively complete set of features for a variety of applications.

The API only exposes a small subset of the features available on the main Docket Alarm website. Additional features can be added on request.

The API specification is currently live and fully documented. Documentation is located here. If you are interested in using this feature, please let me know.

Tuesday, April 10, 2012

9th Circuit rules that violating a website's terms of service is not criminal.
http://ping.fm/8Ei07

Monday, January 23, 2012

U.S. Courts PACER: An Accessible, Open-Source API

Get Access to All Information on the U.S. Courts Docketing System

Anyone who has tried to look up a court case on a government website has run into the Public Access to Court Electronic Records system, or as everyone calls it, PACER. I have developed and just released a new API that gives programmers access to all public information on the U.S. Federal Courts docketing system.

Features include:
1. Search for cases by party name, docket number, and filing date.
2. Retrieve the names of parties to a case, their attorneys, and law firms.
3. Download the entire docket of a particular case.
4. Download PDFs of individual filings and their attachments.
5. Keep track of costs of each PACER transaction.
Right now, there are hooks into all Federal District Courts, most Appeals Courts, most Bankruptcy Courts and also the I.T.C. I am not aware of any other service or API which offers something similar for the I.T.C.

This project does not make PACER free. It still costs $0.08 per page (which can add up quickly). Although the API works perfectly as stand-alone Python, it can plug into Django (or any other Python framework) very easily. There are also hooks (and some meager documentation) to make it work on Google App Engine.

Also note that this project is released under the AGPL, a free and open-source license, but one which requires you to open-source your code if you use it in a program or a web-app.

The project can be found:

I am building a web-service which exposes a REST API to PACER and it will use this open-source API. If you are interested in learning more, let me know.

Tuesday, September 27, 2011

Studying hard? Check out my friend's flashcard app: www.kleio.info

Wednesday, September 14, 2011

Found a mirror for android source despite kernel.org (and android.git.kernel.org) being down: http://ping.fm/MXbba

Friday, September 9, 2011

First time a hacking incident directly affected my productivity... kernel.org is down, no more android source.