
Principles of Data Driven Web Design

Ian Lewis, University of Cambridge, 27th Sept 2010, updated 2011, 2012, 2013


The basic idea

Certain entities within the system should be recognized as important and given dedicated views, and any appearance of the name of that entity within the system should be linked to the corresponding page containing the view. These 'important entities' will be called Primary Nouns.
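As a minimal sketch of this idea (not taken from any of the sites discussed here), every primary noun could be given one predictable URL, and every appearance of the entity's name could be rendered through the same helper so it always links to that page. The function names and the `/type/key` URL layout are assumptions made purely for illustration.

```python
from html import escape
from urllib.parse import quote


def noun_url(noun_type: str, key: str) -> str:
    """Return the single, predictable URL of a primary-noun page (hypothetical layout)."""
    return f"/{quote(noun_type)}/{quote(key)}"


def noun_link(noun_type: str, key: str, name: str) -> str:
    """Render an entity's name as a link to its dedicated page."""
    return f'<a href="{noun_url(noun_type, key)}">{escape(name)}</a>'


# Wherever the author's name appears, it is rendered through the same helper.
print(noun_link("author", "carl-sagan", "Carl Sagan"))
```

If every template in the system calls a helper like this rather than building its own links, the consistency described above comes for free.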

Amazon product page

The screenshot on the left shows the Amazon.com 'item' page for a book (you can click the image to enlarge it). Amazon.com has (unsurprisingly) good treatment of 'item for sale' as a primary noun.

Elsewhere on the Amazon.com site, the behaviour is as you would expect i.e. if you see the name of this item then you can click it and come to this page. All content related to this particular item for sale is available via this page. Linking to this page has actually been made very difficult because of the very unstructured URL, a mistake on Amazon's part, but Amazon have managed to work around that.

Amazon product page

Item for sale is not the only primary noun in the Amazon system, as can be seen if you click on the author name 'Carl Sagan' displayed immediately under the book title. This takes you to a page about that author, which (unsurprisingly) includes links to 'items for sale' pages for books by that author.

Of course, where products (i.e. books) are listed on this author page, the user rightly has no confusion about where these links will 'go'... they link to the same product page (for the book "Cosmos") that would have been found on the earlier search page.

Amazon has implemented an 'author' page, complementing the 'book' page, such that author is another example of a primary noun in the Amazon system. It should be noted that for most of the history of the Amazon website the only primary noun was 'product for sale', with the many remaining links in the system being implemented as the particular need arose on a particular page, but Amazon have successfully retrofitted more understandable primary nouns such as author into the system.

For another example, a movie review site will sensibly contain pages dedicated to movies and actors, with the appropriate cross-linking between movie and actor pages. When you see this approach taken on a website, you absolutely take for granted that this is the only way it would work and imagine anyone that did it differently would be an idiot. Most data-driven websites do provide the information well for a single primary noun (e.g. 'item for sale'). Sites such as imdb.com which support more than one primary noun are rare in practice, and in data driven web design terms are called Multi-Dimensional. There should rarely be any need for a data driven website to be one-dimensional, as the second dimension is usually fairly obvious.

However, when you visit a website where this approach has not been taken, you typically find so many more things to complain about (for example ineffective searching) that you miss the fact that the basic principles of data-driven web design have been broken. The usual situation is that the web-site developer will have understood a single entity as being important in the system, but treated everything else as a search. A simple example would be your typical 'help desk' system - 'trouble tickets' are managed well, so you can view a given trouble ticket, or view the 'queue' the trouble ticket is considered to be on, but the treatment of 'person' is very much an afterthought. I.e. you can search for trouble tickets from a given individual, but access to this page is via a fairly unfriendly 'advanced search' function and there is absolutely no concept of clicking on a person's name and then seeing the list of that user's trouble tickets.

Data driven web sites have two types of links

On a data driven website not all links are the same. A page about a primary noun can contain links to other pages about the same actual entity, and in data driven web design terms these types of links are called Tabs, not least because that visual metaphor could sensibly be used. The page may also contain links to pages about some other entity; these are called Deep Links.

imdb.com movie page

For example, the imdb.com page for the movie called "The Matrix" has links on it for "trailers about this movie" and "plot synopsis of this movie" (the actual text used is not so explicit) - these links are tabs, even if they don't use the usual visual metaphor. The same page also has links to actors (e.g. Keanu Reeves) and if you follow this link you are no longer viewing information about the movie called "The Matrix"... the context has changed, and the link to each actor is a deep link. There are of course many other links on the "The Matrix" page to pages that are neither about the movie "The Matrix" nor about anything the site has been designed to support fully as a 'primary noun'. These types of links we'll just call 'links'.
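The three kinds of link can be told apart mechanically if each link records which primary noun (if any) its target page is about. The following is only a sketch under that assumption; the `Link` class, the `(type, key)` tuples and the example keys are hypothetical and do not describe how imdb.com is actually built.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Noun = Tuple[str, str]  # (noun type, key), e.g. ("movie", "the-matrix")


@dataclass(frozen=True)
class Link:
    label: str
    target: Optional[Noun]  # the primary noun the target page is about, or None


def classify(link: Link, current: Noun) -> str:
    """Classify a link relative to the primary noun of the page it appears on."""
    if link.target is None:
        return "link"       # not about any primary noun
    if link.target == current:
        return "tab"        # another page about the same entity
    return "deep link"      # a page about a different primary noun


matrix = ("movie", "the-matrix")
for link in (
    Link("Plot synopsis", matrix),
    Link("Keanu Reeves", ("actor", "keanu-reeves")),
    Link("Advertising", None),
):
    print(link.label, "->", classify(link, matrix))
```

Once links carry this information, a page template can group the tabs together and render them the same way on every page about that entity.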

imdb.com actor page
Dpreview camera page

Tabs and deep links are typically (wrongly) scattered at random across the web pages of a data driven website, but these two types of links exist nevertheless. For example, in the screenshot on the right (click image to enlarge), from the excellent (but flawed) dpreview.com, 'camera' is treated partially as a primary noun and each camera has its own page constructed from data drawn from a database as you would expect.

In this screenshot there are links to:

Dpreview camera page with tabs

The screenshot on the left (click image to enlarge) explains the choice of the name 'tabs' for the links relating to the same entity, as the tabs visual metaphor will make some sense if the links are of that type. This does not mean this particular layout is the only sensible one - the key point is that links on a data driven web site commonly have these different behaviours and that should be understood, not ignored. For example, these 'tab' links can be drawn down the left side of the page (this is the approach taken by imdb.com).

The use of the tab visual metaphor does, however, make it obvious that the 'tab' links should be repeated on each page you visit. In my experience the user will think of this page as the 'Canon Powershot S5 IS' page regardless of which tab they click on. This is fundamentally different from the screenshot shown above, where the user is encouraged to think they are on the 'specifications' page or the 'user opinions' page etc. and only secondly should they consider which camera the page is about. If you actually program the scripts behind these pages you will understand that this follows the developer's thinking as each page was created, but there is little understanding in the web design community that this is not the ideal view for usability.

Dpreview.com is a truly outstanding website, with 50 million visitors a month, so the comments above should be taken in context. Great content is more important than getting a few structural design points right...

The 'search box' on a data driven web site should give 'primary nouns' as the results

This principle is simply illustrated by retail websites such as Amazon. Searches result in a list of 'items for sale', with the name of each item being linked to a data-driven webpage about that item.

The example earlier of the camera review website also illustrates the requirement, with a less clear implementation on the (current) website. After you have taken the trouble to strategically present your information around a small number of primary nouns, the 'search' functionality should be crafted to maintain the illusion that (for example) a 'camera' page actually exists. I.e. each search result should have the name of the primary noun, and the link should be a deep link to the corresponding entity. To make a change from imdb.com, try searching rottentomatoes.com for 'matrix' - the deep links to the primary noun results are clearly presented. This issue is more subtle than it looks, particularly if the data driven website has more than one type of primary noun.

RT query builder page

A counter-example with the RT Helpdesk system illustrates the most common issue. RT (Request Tracker) is an open-source web-based trouble-ticket tracking system that revolves around the concept of a single primary noun, i.e. trouble ticket. In effect, all actions lead you to a trouble ticket.

The 'advanced search' illustrated here (actually the 'Query Builder' tool that allows you to save customised queries) assumes the only thing you could possibly want to find is a trouble ticket, with a comprehensive but complex search tool.

The capability that is not understood by the implementors of the RT system is that, for example, user would be an equally valid primary noun for a trouble ticket system. If you search on some element of user information (e.g. name or email address), the results should include a list of users, not just trouble tickets. Click on a user in the search results, and you go to a 'user page' which lists their trouble tickets in the same way that the 'Carl Sagan' page in Amazon lists books by Carl Sagan.
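To make the point concrete, here is a small sketch of a helpdesk search that returns primary nouns of more than one type, each with a deep link, plus the 'user page' listing that user's tickets. This is not how RT works; the in-memory data, the URL layout and all names are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    noun_type: str  # which primary noun this result is
    name: str       # the text shown to the user
    url: str        # deep link to the primary noun's page


# Hypothetical in-memory data standing in for the helpdesk database.
USERS = {"asmith": "Alice Smith", "bjones": "Bob Jones"}
TICKETS = {
    101: ("Printer jammed", "asmith"),
    102: ("VPN access", "asmith"),
    103: ("New laptop", "bjones"),
}


def search(query: str) -> List[Result]:
    """Return matching users and tickets, not just tickets."""
    q = query.lower()
    results = []
    for uid, name in USERS.items():
        if q in name.lower() or q in uid:
            results.append(Result("user", name, f"/user/{uid}"))
    for tid, (subject, _owner) in TICKETS.items():
        if q in subject.lower():
            results.append(Result("ticket", subject, f"/ticket/{tid}"))
    return results


def tickets_for_user(uid: str) -> List[str]:
    """Content of the 'user page': the subjects of that user's tickets."""
    return [subject for subject, owner in TICKETS.values() if owner == uid]


print(search("smith"))
print(tickets_for_user("asmith"))
```

The subtlety mentioned earlier shows up in the `noun_type` field: with more than one primary noun, each result needs to make clear what kind of thing it is.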

Having 'person' as a primary noun is more important than most developers think

In most systems the developer can quickly think of one or two important 'things' contained in the system, e.g. in a Help Desk System the developer will usually come up with 'trouble ticket' and 'trouble ticket queue'. But if person is a possible view that makes sense, the developer will usually underestimate the value of that view to the users of the system. Each system may have 'person' as the second, third or fourth most important 'primary noun' such that the developer never gets around to implementing that view, but in aggregate across multiple systems, these 'people views' add up to the most important usability dimension.

Some sites exist entirely by having 'person' as the only primary noun, e.g. facebook.

Multiple systems can be inter-linked via their primary nouns

In large-scale system design terms, this is probably the most powerful aspect of data-driven web design using the principles described in this paper.

Web-sites are designed to link to each other. This fact is lost on the majority of web application designers. Nearly all of the web-delivered applications in use assume all links should be internal to a single system, with at most some token customized links, perhaps to the homepages of some other systems.

Given a common primary noun, two independent web systems can interlink their pages based on that particular noun. E.g. given a Customer System with contact information regarding a person, and a Finance System listing perhaps payment history for a given customer, it is relatively simple to produce a web framework that allows an employee to toggle between the contact information and the payment history for a given customer.
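One way such a framework might be sketched is a small registry that maps each participating system to a URL template keyed by the shared primary noun, from which the same row of inter-system tabs can be generated in every system. The system names, the example.com URLs and the function name below are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical registry: each system exposes one page per customer.
SYSTEMS = {
    "Contacts": "https://crm.example.com/customer/{key}",
    "Payments": "https://finance.example.com/customer/{key}/payments",
}


def cross_system_tabs(customer_key: str, current_system: str) -> List[Tuple[str, str, bool]]:
    """Return (label, url, is_current) for every system's page about this customer."""
    return [
        (label, template.format(key=customer_key), label == current_system)
        for label, template in SYSTEMS.items()
    ]


for label, url, is_current in cross_system_tabs("C1042", "Contacts"):
    marker = "*" if is_current else " "
    print(marker, label, url)
```

The only thing the systems have to agree on is the key for the common primary noun; neither system needs to know anything else about the other.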

This technique is surprisingly powerful, and at a large corporation allows the stitching together of information from hundreds of systems, especially where a primary noun is particularly pervasive, for example 'client' or 'customer'. Merrill Lynch Bank Of America has a framework of this type allowing easy access to key information such as contact lists, detailed financials, contracts, research documents, sales events and coverage information for their clients. The key point is that the data and systems containing this information already existed before the framework was designed and implemented, but the framework allowed access to that information by employees to increase more than tenfold. This is not via a new uber-system containing all of the data, but a simple web-based design that allows easy navigation of the existing systems.

The alternative, and typical, usage model involves visiting the home page of each system containing the information needed by an employee and entering a unique query that retrieves the information needed from that particular system. So increasing access tenfold by providing simple web navigation between systems is not difficult to do.

The use of the actual tab visual metaphor for inter-system links makes navigation particularly simple when implemented symmetrically in the included systems. But even the mere existence of a page regarding a common primary noun in multiple systems makes inter-system linking much easier than the 'return to homepage' method. This is the opportunity provided by a developer providing, for example, a 'person page' in a helpdesk system instead of only 'trouble ticket'.

Input where you output

Bill Scott and Theresa Neil coined the phrase 'input where you output' in a talk relating to their book "Designing Web Interfaces". Essentially, basic input and edit of data can most conveniently be provided by an 'edit' button on the web page you usually use to view the data, rather than in a special admin menu hidden elsewhere on the website. Users that do not have administrative rights in the system do not get shown the 'edit' buttons...

Caveat: Note that the Scott/Neil comment referenced above is actually intended (like much web design) for completely general purpose web pages, i.e. they are predominantly referring to how to edit a typical web page 'in-situ'. The principle is still sound for data-driven pages, although the examples given by Scott/Neil rely heavily on in-page programmatic scripting techniques. I should make it clear that using javascript, flash, ajax and whatever else to sex up the interactivity of web-delivered applications is not something I have any enthusiasm for, mainly because the developer's excitement over the new tech usually leads them to make the site less usable than it might have been. Plus websites that keep changing little pieces of the UI each time you click on something but seem reluctant to ever take you to a new page really detract from the usability of the web application.

But, if a site has a well-structured primary-noun-centric navigation, then the primary noun pages are great places to provide an input capability to those that should have those 'rights'.

For example, on the dpreview.com screenshot above, that page could provide an 'edit' button seen by the content editors which allows the links and other attributes for that particular camera to be input or edited. So developers have two main choices of where to put the 'update the data in the database' functionality:

  1. on a protected admin menu from which the editor navigates to the data in question
  2. on a protected button on the camera page which provides input boxes laid out similarly to the camera page text areas

Approach #2 is the 'input where you output' method appropriate to these principles of data driven web design.

Of course you do need an admin menu for all the awkward stuff, but most edits should be done via the primary noun pages, i.e. input where you output.

In case you haven't got the idea, the edit button is hidden/secured in exactly the same way as the 'admin menu' would be, i.e. there is no change in security, just usability.

When one of your primary nouns is person, 'input where you output' takes on significant additional importance. Input/update permissions can be extended to include some input by the person the page is about, as well as the system super-users. More generally, permissions can be conveniently fine-grained at the level of the primary noun, or some attribute of the primary noun. E.g. on the camera website one editor could administer 'Canon' cameras, while another could similarly administer 'Nikon' cameras - this is relatively easy to implement and, more importantly, to use if the 'edit' button appears on the 'camera' page (the 'Nikon' editor does not get the button on 'Canon' camera pages). If the administrative function is purely provided by a separate 'super-user' menu, it is more difficult to implement this fine grained permissioning, or at the very least it is more difficult to do it in a way that editors can relate to.
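A rough sketch of permissions keyed on an attribute of the primary noun (here the camera brand) might look like the following. The user names, the brand table and the text rendering are invented for illustration, and the sketch assumes an editor sees the button only on pages for the brands they administer. The same `can_edit` check would also have to guard the code that saves the edit, since hiding the button is a usability feature and not the security itself.

```python
# Hypothetical permission data: which brands each editor administers.
EDITOR_BRANDS = {"nina": {"Nikon"}, "carl": {"Canon"}}
SUPER_USERS = {"root"}


def can_edit(user: str, camera_brand: str) -> bool:
    """Super-users may edit anything; editors only their own brands."""
    return user in SUPER_USERS or camera_brand in EDITOR_BRANDS.get(user, set())


def render_camera_page(user: str, name: str, brand: str) -> str:
    """Render the normal view, adding an edit button only for permitted users."""
    lines = [f"{name} ({brand})"]
    if can_edit(user, brand):
        lines.append("[Edit]")
    return "\n".join(lines)


print(render_camera_page("carl", "Canon Powershot S5 IS", "Canon"))
print(render_camera_page("nina", "Canon Powershot S5 IS", "Canon"))
```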

Web developments are influencing client-server development

Usability features of client-server and web-based applications are converging. Users expect to need no training to use a web application, so a complex application like ebay had better be intuitive to use. Developers of client-server applications still often assume any user of that application should have some training, but those days are numbered. Of course, to do more complex functions you will at least need to refer to the documentation of the system, but looking up the expenditure-to-date of your cost center in the enterprise 'financials' system should not need a PhD in Computer Science.

Some of the principles of data-driven web design are equally applicable to traditional client-server applications, in particular the structuring of the user interaction around primary nouns instead of functions. As an example, this paper shows how an email client could usefully have 'person' as a primary noun.

Summary and Scoring Methodology

Any web application (i.e. data driven website) can be assessed and scored against the following principles of data driven web design:

  1. Primary Nouns. Does the website have a clear idea of the entities the user will consider important?
  2. Multi-dimensional. Is there more than one primary noun?
  3. Tabs. Are the links to pages about the current primary noun clear, or scattered around the page and mixed with ordinary links?
  4. Deep links. Where a primary noun exists, is the linking to the corresponding page totally consistent?
  5. Search. Is it clear that the search function of the site is finding primary nouns?
  6. People. Is 'person' a possible primary noun? If so, is it implemented as such?
  7. Inter-linking. Can you navigate between multiple systems based on a common primary noun?
  8. Input where you output. For those with appropriate administrative rights, is 'in-page editing' provided for the primary noun pages?
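The list above does not prescribe any weighting, so the following is only one possible way to turn it into a score: one point per principle that is met, with principles that do not apply (for example 'People' on a site with no notion of person) left out of the total. The function name and the example assessment are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

PRINCIPLES = [
    "Primary Nouns",
    "Multi-dimensional",
    "Tabs",
    "Deep links",
    "Search",
    "People",
    "Inter-linking",
    "Input where you output",
]


def score(assessment: Dict[str, Optional[bool]]) -> Tuple[int, int]:
    """Return (principles met, principles applicable); None means not applicable."""
    missing = [p for p in PRINCIPLES if p not in assessment]
    if missing:
        raise ValueError(f"principles not assessed: {missing}")
    applicable = [p for p in PRINCIPLES if assessment[p] is not None]
    met = sum(1 for p in applicable if assessment[p])
    return met, len(applicable)


# A made-up assessment of a typical helpdesk system.
helpdesk = {
    "Primary Nouns": True,
    "Multi-dimensional": False,
    "Tabs": False,
    "Deep links": True,
    "Search": True,
    "People": False,
    "Inter-linking": False,
    "Input where you output": True,
}
print("%d out of %d" % score(helpdesk))
```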