Scientific computing with GAE and PiCloud

Google App Engine (GAE) is a great platform for learning web programming and testing out new ideas. It is free and offers useful features such as the Channel API (basically WebSockets). Deployment is as easy as clicking a button (on a Mac) or running a Python script (on Linux). Best of all, you can program in Python and offer an easy end-user web interface without time-consuming installation, dependency management, or frayed nerves.

The largest drawback of GAE for me is that it offers no straightforward way to run long-running processes or to spawn new ones. This is a no-go when you want to provide users with data from complex online analyses or simulations.

PiCloud to the rescue

This is where PiCloud enters the stage. PiCloud is nothing more than a user-friendly interface to the Amazon cloud. Starting your Python program is as easy as: import cloud; cloud.call(yourfunction). Of course, because it runs in the cloud, you can launch hundreds or thousands of copies of your function to achieve a performance boost (if you have heard of distributed or parallel computing, you will know what I have in mind).
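To make the idea of running many copies of one function concrete, here is a purely local analogy that needs no PiCloud account. It uses the standard concurrent.futures module as a stand-in for the cloud. The names simulate and run_many are hypothetical, and a thread pool only illustrates the fan-out pattern; it does not give the speed-up that separate cloud machines would.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Toy stand-in for an expensive computation that depends only on its input."""
    total = 0
    for k in range(1, 1001):
        total = (total + seed * k) % 9973
    return total


def run_many(seeds, max_workers=4):
    """Run one independent copy of simulate per seed and collect the results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    print(run_many([1, 2, 3, 4]))
```

The important property is that each call is independent of the others, which is what makes it possible to spread the calls over many machines.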

However, the most important fact for us now is that PiCloud lets you spawn long-running, CPU-intensive processes. This is what PiCloud is really for!

From theory to practice

Let us say that we want to implement an artificial intelligence game in which you control agents by means of Python programs (something like the Google AI Contest). The details are not important. In the scientific world this would probably be an application that searches for new drugs based on known chemical compounds (such as DrugDiscovery@Home) or a simulator of particle collisions (LHC@Home). The crucial aspect is that there is a long-running, CPU-intensive process that provides the raw data (I will call it the ‘engine’) and a thin server responsible for presentation and user interaction (the frontend).

Let’s start

The first thing to do is to implement the frontend. After you have downloaded and installed the GAE SDK (see the detailed instructions online), you can implement your first application. This will be something very simple – a single “New game” button and an area showing the game progress. When a user clicks the button, a new game is started (in the cloud, of course) and the game area is updated live. The skeleton of the web app for the GAE platform might look like this (frontend.py):


from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

html_page = """
<html>
    <head>
    <meta content="text/html;charset=utf-8" http-equiv="content-type">
    <script type="text/javascript" src="static/jquery.js"></script>
    </head>
  <body>
        Hello Game!
        <form action="start" method="POST">
        <input type="button" id="start" value="New game"
                    onClick="post()"/>
        </form>

        <div id="chart"></div>
        <script type="text/javascript">
           function post() {
                    // start the game by StartGame.post method
                    // here we use jQuery to make an asynchronous request
                    // without redirection
                    $.post('/start')
                   }
        </script>
        <script type="text/javascript" src="static/pacman.js"></script>
  </body>
</html>"""

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.out.write(html_page)

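class ReceiveData(webapp.RequestHandler):
    def post(self):
        # raw JSON string POSTed by the cloud process
        data = self.request.body
        self.send_data_to_clients(data)

    def send_data_to_clients(self, data):
        # forward the data to the browsers (for example via the Channel API)
        ...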

class StartGame(webapp.RequestHandler):
    def post(self):
        ...

application = webapp.WSGIApplication(
                              [('/', MainPage),
                               ('/data', ReceiveData),
                               ('/start', StartGame)]
                                     ,debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()

The application consists of three handlers:

  • MainPage – renders the content of the web page
  • StartGame – starts a new process in the cloud
  • ReceiveData – receives data from the cloud process and passes it on to the clients, i.e. the browsers (for example via the Channel API)
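The fan-out step performed by the ReceiveData handler can be sketched without any GAE dependency. In this minimal sketch the ClientRegistry class and its methods are hypothetical names, and each client is represented by a plain callable; on GAE that callable would presumably wrap whatever push mechanism you choose, such as the Channel API.

```python
import json


class ClientRegistry(object):
    """Keeps track of connected clients and forwards engine data to all of them."""

    def __init__(self):
        # maps a client id to a callable that delivers one message to that client
        self._clients = {}

    def register(self, client_id, send):
        self._clients[client_id] = send

    def broadcast(self, raw_body):
        """Validate the JSON body posted by the engine and pass it to every client."""
        payload = json.loads(raw_body)   # raises ValueError on malformed input
        message = json.dumps(payload)    # re-serialise so clients get clean JSON
        for send in self._clients.values():
            send(message)
        return len(self._clients)
```

Parsing the body before forwarding it means that a malformed request from the engine fails on the server instead of breaking the JavaScript in every browser.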

In the cloud

The engine is quite straightforward. Let’s say that you implemented your game/simulation/analysis and its main entry point is the run_process function.

In order to start it on the cloud you will need to obtain a free API key from PiCloud and run this script (backend.py):

import cloud
jid = cloud.call(run_process)
result = cloud.result(jid)

The problem is that the result will be returned only when run_process finishes. What if you want to obtain the results live (after each turn of the game/each iteration of the analysis)? Well, nothing stops you from sending data from the process to the frontend server. To this end, modify the run_process function like this (in backend.py):

import json
import urllib2
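def run_process():
    # frontend_url must point at the ReceiveData handler ('/data') of the GAE app
    for i in iterations:
        # ... compute result
        data = {'result': result}
        data_json = json.dumps(data)
        # passing a request body makes urllib2 send a POST request
        req = urllib2.Request(frontend_url, data_json, {'Content-Type': 'application/json'})
        response_stream = urllib2.urlopen(req)
        response_stream.close()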

In each iteration of the loop a POST request is sent to the frontend server (via the urllib2 library), which in turn passes the data on to the client’s browser. The data is sent as a JSON (JavaScript Object Notation) string, a format that is common in the web world.
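If you want to try this push pattern before touching GAE or PiCloud, the following sketch runs both ends on your own machine. It is written for Python 3, where urllib.request replaces urllib2, and it uses a tiny standard-library HTTP server as a stand-in for the ReceiveData handler. The names DataHandler and demo, and the toy computation i * i, are assumptions made for the illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # payloads collected by the stand-in frontend


class DataHandler(BaseHTTPRequestHandler):
    """Plays the role of the ReceiveData handler."""

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length).decode('utf-8')
        received.append(json.loads(body))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_process(frontend_url, n_iterations):
    """Engine loop: POST one JSON message per iteration."""
    # ignore any proxy settings so that the request really goes to localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for i in range(n_iterations):
        result = i * i  # toy stand-in for one turn of the game
        body = json.dumps({'result': result}).encode('utf-8')
        req = urllib.request.Request(
            frontend_url, body, {'Content-Type': 'application/json'})
        opener.open(req).close()


def demo(n_iterations=3):
    del received[:]
    server = HTTPServer(('127.0.0.1', 0), DataHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        url = 'http://127.0.0.1:%d/data' % server.server_address[1]
        run_process(url, n_iterations)
    finally:
        server.shutdown()
        server.server_close()
    return list(received)


if __name__ == "__main__":
    print(demo())
```

Each message arrives at the server as soon as its iteration finishes, which is the behaviour you want for a live view of the game.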

Note: the json module has been part of the standard library since Python 2.6, so make sure that you use this or a later version of Python. It is preinstalled on PiCloud, but on GAE you will have to use the experimental Python 2.7 runtime (not required for this application).

Here comes the problem

So far so good, but in order to start the process in the cloud, we need the picloud library installed on the frontend server, that is, on GAE. However, GAE does not provide the picloud libraries. Fortunately, PiCloud lets you run processes via a REST interface that requires no external libraries other than the (pre-installed) urllib2.

First, you need to upload your function to the cloud. You have to do this only once, from your own local machine on which you installed the picloud libraries (you did, right?). To do so, replace the cloud.call in backend.py with:

import cloud
cloud.setkey(your_user_token, your_long_api_key)
cloud.rest.publish(run_process, 'run_process')

your_user_token and your_long_api_key are available from your PiCloud account (go to the Control panel). When you run the updated backend.py on your local machine, cloud.rest.publish will return a URL that can be used to call your function.

Now, you may start your process just by making a simple HTTP request from the frontend server. To this end, modify your StartGame handler in frontend.py:

import base64
import urllib2
class StartGame(webapp.RequestHandler):
    def post(self):
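        # b64encode, unlike encodestring, never inserts line breaks,
        # so long keys stay on a single header line
        base64string = base64.b64encode('%s:%s' % (your_user_token, your_long_api_key))
        http_headers = {'Authorization' : 'Basic %s' % base64string}
        # a non-None data argument (even an empty string) makes this a POST request
        request = urllib2.Request(url_to_cloud_function, data='', headers=http_headers)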
        response = urllib2.urlopen(request)

Replace url_to_cloud_function with the URL returned by cloud.rest.publish.
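The Authorization header used above is plain HTTP Basic authentication, so you can build and check it in isolation. The helper below is a small sketch with a hypothetical name, basic_auth_header, written so that it also works on Python 3, where the credentials have to be encoded to bytes before they are passed to base64.

```python
import base64


def basic_auth_header(user_token, api_key):
    """Return the HTTP Basic authentication header for the given credentials."""
    credentials = ('%s:%s' % (user_token, api_key)).encode('utf-8')
    # b64encode produces a single line, even for very long keys
    token = base64.b64encode(credentials).decode('ascii')
    return {'Authorization': 'Basic %s' % token}


if __name__ == "__main__":
    # dummy credentials, for illustration only
    print(basic_auth_header('1234', 'not-a-real-key'))
```

Remember that Basic authentication only encodes the credentials and does not encrypt them, so the key should travel over HTTPS only.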

Conclusions

Integrating GAE and PiCloud is quite easy and takes only a few steps. Both platforms are very powerful and will allow you to open your work to a wider public and to create social, interactive, or distributed applications in a couple of hours.

The game at work can be viewed at http://pelitaapp.appspot.com and its full code is available from github.
