Thursday, 10 July 2008

Appengine, Python and doctests

Up until now I have been working mostly with the UI on Google App Engine. Now I am starting on the real meat of Golf Adept - the first more-than-trivial model, the primary stroke record. A model by itself does not necessarily require unit tests, but the loading of this model is special as the data comes from the client. A user name is converted to a user object, latitude and longitude to a GeoPt and date/time to internal format. As it happens the last was the most important.

I love the concept of doctests. A developer learns a new interface best by example. A doctest is a running example where it will do the most good - with the source.

I had already used doctests on business logic files. It is just a matter of executing the doctest library if the module is run as a program. Now I want something entirely different - to run a doctest in the appengine context.

I have chosen to integrate the doctest and web frameworks so a test can be run when needed. My framework is Django because that is what I am using. I use a common library so that the doctest will work in any project I create. If you don't use Django, change the example to webapp and point to it with app.yaml.

To make some sense I need to describe my application layout. My application urls.py is relatively empty, but references one in a common library:


from lib.view import urls

from django.conf.urls.defaults import patterns

urlpatterns = patterns(
    '',
)
urlpatterns += urls.urlpatterns


The library one does the work:


from django.conf.urls.defaults import patterns

# from lib.view import urls
# urlpatterns += urls.urlpatterns
urlpatterns = patterns(
    'lib.view.page',
    (r'^$', 'main'),
    (r'^html/(.+)$', 'static'),
    (r'^active/(.+)$', 'content'),
    (r'^cms/(.*)$', 'cms'),
    (r'^admin/(.*)$', 'admin'),
    (r'^doctest/(.*)$', 'doctest'),
    (r'^(.+)$', 'content'),
)


In short a module called page.py has a method called doctest that is called and passed the rest of the url. So, http://localhost:8080/doctest/model.record will run the sample doctest.
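
As a rough illustration of how the rest of the URL reaches the view, the doctest pattern can be tried with plain re. This is only a sketch - Django does the matching itself, and module_path_from_url is a made-up helper name, not part of the project.

```python
import re

# Same regular expression as the doctest entry in the library urls.py.
DOCTEST_PATTERN = re.compile(r'^doctest/(.*)$')


def module_path_from_url(path):
    """Return the captured module path, or None if the URL does not match."""
    # Django matches against the path without its leading slash.
    match = DOCTEST_PATTERN.match(path.lstrip('/'))
    return match.group(1) if match else None


print(module_path_from_url('/doctest/model.record'))
```

The captured group is what arrives in the view as its second argument.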

Here is the python to run the specified doctest:


from StringIO import StringIO
from django.http import HttpResponse

def doctest(request, modulePath):
    """ given a module as part of the URL, run a doctest on it.
    eg: http://localhost:8080/doctest/model.record
    """
    import imp, doctest
    # doctest uses imp.get_suffixes - but appengine doesn't allow the use of imp.
    # It is only to check module for binary so we can bypass it.
    def get_suffixes(): return None
    imp.get_suffixes = get_suffixes
    # doctest writes to stdout. We need to save that to a string to drop into
    # the response.
    import sys
    stdout = sys.stdout
    try:
        sys.stdout = StringIO()
        module = __import__(modulePath, globals(), locals(), [''])
        doctest.testmod(m=module, verbose=False)
        content = sys.stdout.getvalue()
        # ... the rest of the try block, starting "if len(content) <", is
        # missing from this copy of the post. Only the stdout restore and the
        # response below survive; placing the restore in finally is assumed.
    finally:
        sys.stdout = stdout
    return HttpResponse(content=content, mimetype='text/plain')

All the tricks that I sweated to discover are documented above.


  1. Google overrides imp because it provides a level of access that is risky in a shared environment. Unfortunately doctest uses imp to check that it has not been given a binary file. Since we are providing controlled data we can bypass the test by returning no suffixes.
  2. doctest throws everything to the console. A CGI program sends console output back to the browser. The problem is that Django expects the contents to be created on demand. So, redirect stdout, grab the output and toss it to the browser.
  3. Lastly, doctest loads and runs. If you run again without changing code the module is already loaded. It must keep static data as it tries to combine the results from the current and last run. The solution is to remove the reference in loaded modules so it will reload every time.
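
Tricks 2 and 3 can be sketched outside App Engine with nothing but the standard library. This is an illustration in current Python rather than the code from page.py: run_doctest is a hypothetical name, and the imp workaround is left out because it only matters inside the App Engine sandbox.

```python
import doctest
import importlib
import io
import sys
from contextlib import redirect_stdout


def run_doctest(module_path):
    """Run the doctests in module_path and return whatever they printed."""
    # Trick 3: forget any cached copy so the module is imported afresh and
    # nothing from the previous run is carried over.
    sys.modules.pop(module_path, None)
    module = importlib.import_module(module_path)
    buffer = io.StringIO()
    # Trick 2: doctest reports to stdout, so capture it for the response body.
    with redirect_stdout(buffer):
        doctest.testmod(m=module, verbose=False)
    return buffer.getvalue()
```

The returned string is what a view would hand to the browser as text/plain.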



Testing a Google AppEngine Model
My example is a real file.

  1. It does not validate data as it is getting said data from a trusted source.
  2. It uses a static load method to massage the input data, create a record and save it.
  3. I use a generated key name so that if the same data is loaded more than once it will not be duplicated in the database.
  4. The tests are in the docstring at the head of any method or class.
  5. They call a _test() method that loads a record and checks the database for the result.
  6. The test also deletes the record. Being a good little test it cleans up after itself.
  7. This is a first release with basic tests. When integrating it with other parts of the system it may break. Rather than just fixing the error it makes a lot of sense to replicate the problem in a new doctest line so that any fix can be proved to stay fixed. Besides, it is a lot faster to rerun a doctest than to follow a certain manual path through the UI.


# Copyright 2008 Askowl Pty Limited
from google.appengine.ext import db
from google.appengine.api import users
from google.appengine.api.datastore_types import GeoPt
import datetime

class Record(db.Model):
    """ Model object encompassing a record taken on the golf course and downloaded
    from a mobile phone.

    >>> _test("fred@bloggs.com,newHole,Ashgrove,08-07-12 21:15:12,12.34,56.78,90.12,note one")
    "{u'lie': None, u'direction': None, u'distance': None, u'club': None, u'type': u'newHole', u'altitude': 90.120000000000005, u'course': u'Ashgrove', u'stroke': None, u'location': datastore_types.GeoPt(12.34, 56.780000000000001), u'time': datetime.datetime(2008, 7, 12, 21, 15, 12), u'quality': None, u'notes': u'note one', u'user': users.User(email='fred@bloggs.com')}"
    >>> _test("john@brown.com,stroke,Indooroopilly,08-09-23 09:01:22,43.21,87.65,21.09,note two"
    ... ",5-iron,full,fairway,clean,straight,136")
    "{u'lie': u'fairway', u'direction': u'straight', u'distance': 136.0, u'club': u'5-iron', u'type': u'stroke', u'altitude': 21.09, u'course': u'Indooroopilly', u'stroke': u'full', u'location': datastore_types.GeoPt(43.210000000000001, 87.650000000000006), u'time': datetime.datetime(2008, 9, 23, 9, 1, 22), u'quality': u'clean', u'notes': u'note two', u'user': users.User(email='john@brown.com')}"
    """
    user = db.UserProperty()
    type = db.CategoryProperty()
    course = db.StringProperty()
    time = db.DateTimeProperty()
    location = db.GeoPtProperty()
    altitude = db.FloatProperty()
    notes = db.StringProperty()
    club = db.StringProperty()
    stroke = db.StringProperty()
    lie = db.StringProperty()
    quality = db.StringProperty()
    direction = db.StringProperty()
    distance = db.FloatProperty()

    @staticmethod
    def load(values):
        count = len(values)
        if count == 0:
            return None
        # The lines from here to "if count > 8" were run together in this copy
        # of the post. The short-record guard, the "record." prefixes and the
        # strptime format are reconstructed from the doctests above.
        if count < 8:
            return None
        key = 'k;'+values[0]+';'+values[2]
        record = Record(key_name=key)
        record.user = users.User(values[0])
        record.type = values[1]
        record.course = values[2]
        record.time = datetime.datetime.strptime(values[3], '%y-%m-%d %H:%M:%S')
        record.location = GeoPt(float(values[4]),float(values[5]))
        record.altitude = float(values[6])
        record.notes = values[7]
        if count > 8:
            record.club,record.stroke,record.lie,\
                record.quality,record.direction = values[8:13]
            record.distance = float(values[13])
        record.put()
        return key

def _test(line):
    values = line.split(',')
    key = Record.load(values)
    record = Record.get_by_key_name(key)
    repr = record._entity.__repr__()
    record.delete()
    return repr


Immediate Benefit


My first run found a non-trivial problem with writing date/time objects to the database. The first time the test was run the date recorded was adjusted by local time. Subsequent runs within a few seconds would record the date in UTC. Waiting for 30 seconds or so or changing the source would cause the fault again on the first run only. Some research found a Google issue (131) that is marked as fixed in 1.02. I am running 1.1. Fortunately a fix to the datastore file listed here still worked.

Future Improvements


The doctest method could set an HTTP return code if a test fails. This way we can execute the HTTP request from curl and use the return result to control other actions (such as checking in the code).
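
One way the return code could be derived is from the failure count that doctest.testmod hands back. The sketch below is framework-neutral and written for current Python. doctest_status is a hypothetical helper, and the choice of 500 for a failing run is an assumption, not something the project does today.

```python
import doctest
import io
from contextlib import redirect_stdout


def doctest_status(module):
    """Return (http_status, report_text) for the doctests in module."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        results = doctest.testmod(m=module, verbose=False)
    # 200 when every example passed, 500 when at least one failed.
    status = 200 if results.failed == 0 else 500
    # A quiet run prints nothing, so supply a short summary instead.
    text = buffer.getvalue() or "%d examples passed\n" % results.attempted
    return status, text
```

With a failing status, curl --fail exits non-zero, which is enough for a shell script to stop before checking in.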


This package allows a single doctest to be run from the browser. It would not be difficult to integrate this with other examples where you would pass a package name and the code would walk the tree looking for and running doctests in all the modules.
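
A minimal sketch of the tree walk, using pkgutil from the standard library in current Python. run_package_doctests is a hypothetical name, and whether pkgutil behaves the same inside the App Engine sandbox is untested here.

```python
import doctest
import importlib
import pkgutil


def run_package_doctests(package_name):
    """Run doctests in a package and in every module beneath it.

    Returns a dict mapping module name to (failed, attempted).
    """
    package = importlib.import_module(package_name)
    names = [package_name]
    # A plain module has no __path__, so there is nothing further to walk.
    search_path = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(search_path, prefix=package_name + "."):
        names.append(info.name)
    results = {}
    for name in names:
        module = importlib.import_module(name)
        outcome = doctest.testmod(m=module, verbose=False)
        results[name] = (outcome.failed, outcome.attempted)
    return results
```

Any module with a non-zero failed count can then be reported or turned into a failing return code.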


For a larger team continuous integration is valuable. If you have made the changes above it would be simple to hit the URL from the continuous integration server, saving the result and pass/fail from the return code.

Thursday, 26 June 2008

Time Out - Buying a New Car

It pays to analyse in areas other than computers. Two and a half years ago I purchased a 2 year old car on the basis that it was better value. Now it is up for $7,000 in costs over and above normal services - and I found that I can't trade it in for nearly the figure I still owe. So, this time I attempted a total cost of ownership analysis.

First I looked at the same or a similar model in the same marque that has been around for 4 years. I compared its new price against the worst case private sale price for a car in good condition with low mileage. From this I get a percentage drop in price. For example I used carsales.com.au to compare a BMW 118i hatch:

New price: $50,000
High 4 year: $40,000
Low 4 year: $20,000
% drop: 44%


I ran a few other cars through for comparison. Drop in value appears to be fairly consistent for a marque. In Australia BMW is around 45%, Peugeot 55% and Citroen 65%.

My second table attempts to calculate the total cost of ownership over 4 years - for both the older cars and the new replacements. One column for each car and rows for:

  1. Purchase Price - being $0 for an owned car or one on lease.
  2. On-Road - again a new car cost.
  3. Repayment - Monthly repayments for lease, hire purchase or whatever
  4. Term - time lease is for.
  5. Residual - How much you would need to pay to own the vehicle after the term
  6. Resale @ 4 years - The cost of the vehicle adjusted for the % drop in value from above.
  7. Trade In - The difference between items 5 and 6, being positive if the car is worth more than you owe.
  8. Service per year - Calculate average service over 4 years. A car that needs servicing yearly will have 3 services while one that requires 2 yearly will only need 1 in the life of the lease. If you do big mileage this will be a different calculation.
  9. Paid Out - is the total cost of repayments over 4 years.
  10. Sub-total/year - being yearly cost of repayments and servicing.
  11. Economy l/100k - used to calculate fuel costs
  12. Cost per litre - a guess on average cost of fuel over the next 4 years
  13. Fuel @ 10k/year - Cost of fuel given an estimated distance driven per year
  14. Rego - Registration costs from your local registry office.
  15. Insurance - Insurance calculation. Many insurance companies have calculators for this.
  16. TCO / year - Adding all these up gives a rough total cost of ownership
  17. Per week - By reducing that to a weekly cost we see how it affects our budget.
The results are interesting. I always knew that the cost of maintenance on an older vehicle offsets the higher repayments on a new vehicle, but this calculation put numbers to it. In short a 12 year old Citroen Xantia that I own outright still costs me over $100 a week to run. Replacing it with a brand new Peugeot of similar size costs $80 a week more. My 5 year old C5 came out worse - being almost the same price as a replacement new vehicle.
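
For anyone who would rather script the table than build a spreadsheet, the rows above can be combined roughly as follows. This is a sketch under stated assumptions: the term is taken to be the full 4 years, any up-front cost and the trade-in are spread evenly over that period, and every figure in the example is made up.

```python
from dataclasses import dataclass

YEARS = 4  # the comparison period used in the table


@dataclass
class Car:
    purchase_price: float      # row 1, 0 if already owned or on lease
    on_road: float             # row 2
    monthly_repayment: float   # row 3
    residual: float            # row 5
    resale_at_4_years: float   # row 6
    service_per_year: float    # row 8
    litres_per_100km: float    # row 11
    rego: float                # row 14
    insurance: float           # row 15


def fuel_per_year(car, cost_per_litre, km_per_year=10000):
    """Row 13: yearly fuel bill from economy, distance and fuel price."""
    return car.litres_per_100km * km_per_year / 100 * cost_per_litre


def tco_per_year(car, cost_per_litre, km_per_year=10000):
    """Row 16: rough yearly total cost of ownership."""
    trade_in = car.resale_at_4_years - car.residual   # row 7
    paid_out = car.monthly_repayment * 12 * YEARS     # row 9
    # Assumption: up-front costs and the trade-in are averaged over the period.
    capital = (car.purchase_price + car.on_road + paid_out - trade_in) / YEARS
    running = car.service_per_year + car.rego + car.insurance
    return capital + running + fuel_per_year(car, cost_per_litre, km_per_year)


def per_week(yearly_cost):
    """Row 17: the yearly figure reduced to a weekly one."""
    return yearly_cost / 52


# Made-up figures, purely to exercise the functions.
example = Car(purchase_price=0, on_road=0, monthly_repayment=500,
              residual=20000, resale_at_4_years=22000, service_per_year=400,
              litres_per_100km=8, rego=600, insurance=1000)
yearly = tco_per_year(example, cost_per_litre=1.5)
print(round(yearly), round(per_week(yearly)))
```

Running one Car per column of the table gives the same side-by-side comparison.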

Don't use my figures. Make a similar calculation based on the cars you look at. It is worth the time and effort. There are also non-financial considerations - or at least ones that cannot be measured. We chose BMW because of (1) safety, (2) 2 years between services and (3) the guaranteed buy-back at end of lease.


Wednesday, 4 June 2008

A Google CMS - Part 1

Synopsis
I have written extensions using the Google App Engine to use the Google Apps framework as an active CMS (Content Management System). This series of articles documents the what, why and how - including code.

What is a Content Management System (CMS)
A content management system is a way to publish a consistent web site without resorting to basic HTML. It has a number of functions:

  1. Separate content from structure. This allows consistent templates to be created, so content providers need only deal with context-specific information (words and images).
  2. Workflow - allowing content to be created by the experts but vetted by other interested parties before being published.
  3. A mechanism to keep page changes and revert to an earlier version on command.
  4. Consistent look and feel.
  5. Easy content entry from subject experts without a lot of technical knowledge.
An Active CMS
To this group I would add a new line item:
  1. The ability to combine application functionality with managed content.
Traditionally, hard-coded applications (stand-alone or web) had limited hard-coded content - think help files and page footers. This is not very satisfactory. Most developers dislike UI work - and the results show.

A Content Management System allows a subject expert to publish static data on a web site with consistent look and feel without having technical knowledge of the web site. It separates content from structure. An Active Content Management System adds the components to include forms, reports and results into these pages. It further separates business logic from the display.

In a perfect virtual world subject experts will create all the results they need inside an interactive Active Content Management System. From this a developer can develop the services (both input and output) needed to complete the pages created by the subject experts.

Why Google
When Google released the Google App Engine they added the final component needed to create an Active Content Management System using Google tools on Google infrastructure. In this first pass, which I am going to document here, I use Google Sites to create content and Google App Engine to serve it. I use Google Groups for both community and private communications and Google Blogger to generate news articles to display on the site.

Once this is complete and stabilised I will be starting on the active components. I have plans to use Google Charts for display, Google Docs for reports and other Google APIs and technologies as I can find a use.

While workflow can be implemented with Google App Engine, I have no need of it at this time.

Who Needs a CMS
Because I am comfortable with HTML and am developing Golf Adept alone I have been creating web pages as part of the application and loading it directly to Google App Engine.

My daughter runs a massage clinic and needed a new web site. My knowledge of Plone and other content management systems led me to believe that this was the correct approach for her site - even though there would only be one or two content providers.

On consideration I recognised the need to change Golf Adept to the new standard. Eventually I am going to need content that is to be created and maintained by others. Better to start on the correct path.

What Will be in Part 2
I know that this part has only been words, but every concept requires words to start on the path to understanding. The next part will be technical, showing how easy it is to create a content management system with a small amount of Google App Engine code.

Monday, 2 June 2008

Sessions in Google App Engine

Google App Engine was a bit of a shock to many application server developers - no sessions.

What is a Session? Every browser request stands alone. The only connection between them is the cookies passed back and forth between server and browser. Early CGI used these cookies to hold all important data. This is limited in size and does not allow for any security. Application servers usually only set one cookie - a reference to a session object. Every request from the browser can then be associated with a server-side session, including being logged in.

For efficiency on a single server these sessions are kept in memory. For small clusters the network makes sure that a session consistently accesses the same computer.

With the Google App Engine you do not know which server runs the application and which database store holds the data. This gives us massive extensibility, the vaunted Google speed advantage, redundancy, reliability and much more. The cost? Well, we can't hold a session in memory because different machines could very well serve different requests.

Why We Must Have a Session. Surfing the web is like reading a book. We hold the context of the thread in our heads as we read. Using an application server is more interactive - like a conversation. A conversation requires that both participants hold the context so that it can exceed a single exchange.

The Browser Session. All modern browsers hold a session using cookies. Cookies are associated with a particular web site or path on a web site. They are held by the browser and passed back with each request. Both browser and server can set new cookies. In-memory cookies only last until the browser is closed. Cookies can be persistent, but since they are part of the browser they are specific to a single computer. For safety cookies can have a time-out after which they are removed. Because cookies are sent back and forth with every exchange in the conversation they are limited in size. Cookies are great when used within their limitations.

The Google Session. Yes, I know that I said Google App Engine did not provide session management. This is not entirely true. It does provide a Users API. And guess what - it is referenced by a cookie. Any Google App Engine code can pick up a small amount of user specific data - name, email address and a nickname to display. It is almost certainly kept in the same data store as our own data, but it is likely to be optimised.

A Session we can use. Because I needed connectivity early on I used some of the earliest examples. There has been a lot of session discussion on the forums since, but as I have an acceptable solution I have stopped following them in detail. Because data retrieval is expensive I wanted lazy loading.

My solution was a class (session.py) that I add a reference to in the parameters passed to any Django template:


params["session"] = Session(request)


Session is not persistent - it does not inherit from db.Model. The idea is to keep it light-weight until something is needed.


class Session:
    def __init__(self, request):
        # Write through __dict__ so the __setattr__ override is not triggered.
        self.__dict__['request'] = request


The request object is of type HttpRequest with all the relevant information available. It can also be used to hold non-persistent data for use in a single request.

Because Django will access information from session as a dictionary or a function call, session becomes all-encompassing.


    def user(self):
        # Fetch the Google user once and cache it for the rest of the request.
        if not self.__user:
            self.__user = users.get_current_user()
        return self.__user


So, a template can access the Google user object - as in {{session.user.nickname}}. I also use the session object for other system information:


    def loginURL(self):
        return users.create_login_url('/')
    def isAdmin(self):
        return users.is_current_user_admin()


If we were to inherit session from db.Expando we would have to save the whole session any time a piece of data changed. I prefer to only update the data that needs changing by overriding __getattr__ and __setattr__:


    def __getattr__(self,name):
        if name.startswith('_'):
            return None
        self.__dict__[name] = value = UserData.Load(name).value
        return value

    def __setattr__(self,name,value):
        if name.startswith('_'):
            item = value
        else:
            try:
                def modify(data): data.value = value
                item = UserData.Modify(modify,name)
            except:
                logging.error('Setting session data for '+name)
                item = value
        self.__dict__[name] = item


UserData saves data to BigTable keyed to a specific user - posted at App Engine Fan: Saving user-specific data

So, instance data starting with underscore is not saved to persistent storage. Nor is data in session.reference. Anything else is persisted as separate data objects. Because UserData is a functional db.Expando, items can be any of these properties. For larger data groups, a reference to another database object or object tree would be suitable.
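
The lazy-loading idea can be tried without App Engine by swapping the datastore for an in-memory stand-in. In this sketch FakeStore and LazySession are hypothetical names, not the UserData class referred to above. Unlike the original, underscore names that are missing raise AttributeError rather than returning None.

```python
class FakeStore:
    """Stand-in for the datastore; counts reads so the laziness is visible."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def load(self, name):
        self.reads += 1
        return self.data.get(name)

    def save(self, name, value):
        self.data[name] = value


class LazySession:
    def __init__(self, store):
        # Write through __dict__ so __setattr__ below is not triggered.
        self.__dict__['_store'] = store

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. on first access.
        if name.startswith('_'):
            raise AttributeError(name)
        value = self._store.load(name)
        self.__dict__[name] = value  # cache so later reads skip the store
        return value

    def __setattr__(self, name, value):
        if not name.startswith('_'):
            self._store.save(name, value)  # persist only this one item
        self.__dict__[name] = value
```

Reading the same attribute twice touches the store once, and underscore attributes never reach it at all.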

Wednesday, 14 May 2008

Web 2.0 and a Searchable Website

Golf Adept is as much an application as a web site. To provide a consistent look and feel I have included Web 2.0 features from the front page on down - Panels, tabs, zoom, Ajax, JSON, etc. This raised two immediate problems to solve:

  1. Web 2.0 libraries are large to download. ExtJS, for example, is over 900kb uncompressed.
  2. A lot of the pages are invisible to the search engines. They are loaded by JSON and Ajax, then driven by clicks rather than links.
The first problem has traditional solutions. Extraneous code can be excluded. I chose not to do that. Standard JavaScript compression (using jsmin) brings the size down to 500kb. Enabling gzip compression on the server reduced the transfer to 155kb - a size I consider acceptable. Broadband is not always that broad - and even 155kb can take some time. I like to give my customers something to read while waiting for the bells and whistles. Fortunately the solution for this is the same as for problem two...

Golf Adept is a Web 2.0 application - basically an empty page that is filled by JavaScript. There is nothing for the search engines to review and nothing for the visitor to read while the JavaScript code is being downloaded. Most of the relatively static pages are downloaded by Ajax to fill the tab frames on demand.

Django allows us to include HTML templates - and just like the Ajax payload downloaded to display it is a pure HTML fragment. I refactored my main Web 2.0 page into staticBase.html with main.html and a new staticPage.html extending it. Even the Web 2.0 main page loads header, home and footer as a static html template to display for the user to read while the JavaScript is loading.

The footer includes a site-map link that points to the staticPage template, giving it the name of the HTML fragments that are the contents of the tabs in the Web 2.0 implementation. The static page has the same layout and look as the main Web 2.0 pages. Where the tabs are is blank with a link to the Web 2.0 site.

In the end I have a Web 2.0 site that can still be walked by search engines - and by other systems that cannot use JavaScript. And all this is without duplication of content - thanks to Django template inheritance.

Hmmm, this has been a text-centric blog. I prefer to learn from example, so here goes:


staticBase.html


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>
{% block title %}{% endblock %}
{%if session.user%} for {{session.user.nickname|escape}}{%endif%}
</title>
{% block css %}{% endblock %}
{%if session.isDev%}
<link rel="stylesheet" type="text/css" href="/static/page/main.css" />
{%endif%}
{%if not session.isDev%}
<link rel="stylesheet" type="text/css" href="/static/all.css" />
{%endif%}
<link rel="stylesheet" type="text/css" href="http://www.google.com/uds/solutions/dynamicfeed/gfdynamicfeedcontrol.css" />
<script type="text/javascript" src="/static/lib/dynamicFeeds.js"></script>
</head>
<body>
<div id="static-body">
<table style="width:760px;margin:0 auto;">
<tr><td style="height:380px;"></td></tr>
<tr><td style="height:30px;background-color:#EEEEFF;color:#BBB">
{% block loading-message %}Loading...{% endblock %}
</td></tr>
<tr><td>{% block body %}{% endblock %}</td></tr>
</table>
{%include "page/footer.html"%}

{% block end-load %}{% endblock %}

<!-- Afterwards so image loads after we have something to read -->
{%include "page/header.html"%}
</div>
</body>
</html>



main.html


{% extends "page/staticBase.html" %}

{% block title %}
Golf Adept
{% endblock %}

{% block css %}
<link rel="stylesheet" type="text/css" href="static/lib/ext/resources/css/ext-all.css" />
{#<link rel="stylesheet" type="text/css" href="static/page/main.css" />#}
{% endblock %}

{% block loading-message %}
Loading...
{% endblock %}

{% block body %}
{%include "tab/home.html"%}
{% endblock %}

{% block end-load %}
<!-- Now that we have something to read go ahead with the loading... -->
{%if session.isDev%}
<script type="text/javascript" src="static/lib/ext/adapter/ext/ext-base-debug.js"></script>
<script type="text/javascript" src="static/lib/ext/ext-all-debug.js"></script>
<script type="text/javascript" src="static/lib/Ext.ux.MaximizeTool.js"></script>
<script type="text/javascript" src="static/lib/Ext.ux.Plugin.RemoteComponent.js"></script>
<script type="text/javascript" src="static/lib/Ext.ux.IFrameComponent.js"></script>
<script type="text/javascript" src="static/lib/Ext.ux.layout.CenterLayout.js"></script>
<script type="text/javascript" src="static/lib/Ext.ux.StatefulTabPanel.js"></script>
<script type="text/javascript" src="static/lib/loadJS.js"></script>
<script type="text/javascript" src="static/page/main.js"></script>
{#<script type="text/javascript" src="static/lib/dynamicFeeds.js"></script>#}
{%endif%}
{%if not session.isDev%}
<script type="text/javascript" src="static/all.js.css"></script>
{%endif%}
<script type="text/javascript" src="active/page/main.js"></script>
{% endblock %}



staticPage.html


{% extends "page/staticBase.html" %}

{% block title %}
Golf Adept - {{ title }}
{% endblock %}

{% block loading-message %}
<a href="/">Home</a>
{% endblock %}

{% block body %}
{% include content %}
{% endblock %}



sitemap.html


<h3>Golf Adept Overview</h3>
<ul>
<li><a href="/">Golf Adept Home Page</a></li>
<li><a href="/html/tab/home.html">Know Your Game</a></li>
<li><a href="/html/tab/whatIsGolfAdept.html">What is Golf Adept</a></li>
<li><a href="/html/tab/walkthrough-thumbnails.html">Walkthroughs</a></li>
<li><a href="/html/tab/aboutUs.html">About Us</a></li>
</ul>



main.js


var homeTabs = [
{title: "Home",autoLoad:{url: "tab/home.html", params:"",scripts:true}},
{title: "What is Golf Adept?",autoLoad:{url: "tab/whatIsGolfAdept.html"}},
{title: "Walkthrough",plugins:[Ext.remoteComponent('static/tab/walkthroughs/layout.json')]},
{title: "About Us",autoLoad:{url: "tab/aboutUs.html"}}
];
var homeTabPanel = new Ext.ux.StatefulTabPanel({
id: 'home-tab-panel',
style: "margin: 0px auto 0px auto;",
width:760,
resizeTabs:true, // turn on tab resizing
minTabWidth: 115,
tabWidth:135,
enableTabScroll:true,
defaults: {autoScroll:true,layout:'fit'},
frame:true,
items:homeTabs,
activeTab: 0,
maximizable:true,
// Listener for Google Analytics
listeners: {
activate: function(tab) {trackPageView(tab.title);}
}
});

Monday, 12 May 2008

ExtJS and Saving State

I wrote an Ajax/Web 2.0 framework before those names were bandied about. Unfortunately (for me) others have forged ahead. So, for the new project I decided to consider an existing system. After research and consideration I chose Ext-JS for my Web-2.0 Ajax front-end. Very nice to look at, lots of features, good examples, good API documentation and limited general documentation.

I knew I wanted to save state and restore it on the next visit. The first components I created had a 'stateful' option. As it happens I was lucky as this defaults to 'true'. For the life of me I could not find an example I trusted.

Finally, after prompting, I looked at the source. Once you know how, it is easy. For my example I will save the tab to display in a tab panel. Because this is a common occurrence, I have created a new class based on TabPanel:


Ext.ux.StatefulTabPanel = Ext.extend(Ext.TabPanel, {
stateEvents: ['tabchange'],
getState: function() {return{tab:this.getActiveTab().id}},
applyState: function(state) {this.setActiveTab(state.tab);}
});

To work, the system requires a state manager - a way to save the data. The Ext people have already written one that saves state in cookies. If you want to save the state to the server a new manager is needed. In my case I am quite happy to use cookies so that the state is kept on the browser for that user and machine. I did want it to live longer than the default 1 day:
Ext.state.Manager.setProvider(
    new Ext.state.CookieProvider({
        expires: new Date(new Date().getTime()+(1000*60*60*24*365)) //1 year from now
    }));

Put this code at the start of your main Ext.onReady function.

Monday, 5 May 2008

A Beginning

I have chosen to write Golf Adept using technologies all totally new for me. I am hoping others will gain value if I record as I learn.

I am starting this blog a little late since this journey commenced late last year with some work on J2ME. Just when it was ready to port from simulator to phone, Google announced the Android Challenge. No change in direction, but a modification of priorities. January to March was porting and upgrading the J2ME system to run and look good on the more sophisticated Android platform.

I had originally decided to run the server side on Plone - and the first static site for the Challenge is on that platform. But, Google steps in again and introduces the Google App Engine. As an environment it is as I want it - and the benefits of running on the Google server over one limited server are too attractive to ignore.

So, I have spent April porting the static application to GAE before going back to getting the J2ME running again. My first technical article will be on pre-compressing static content to save on processing time.