On a recent project a client asked about overriding Page#find_by_url and when that actually occurs. I think the answer should be explained for everyone working with it.

20 second summary

This is an in-depth look at the method that gathers pages within Radiant. In short, if you want to do special page finding, create a subclass of Page and write your own find_by_url method to adjust the way Radiant behaves. Every page will respond to this method and return appropriate pages according to the requested url. In the admin interface, you can select your special page type to make that page behave as you have specified.

Simple finding

find_by_url is defined both as a class method and an instance method. Let's look at the class method Page.find_by_url from the Page model:

class << self
      def find_by_url(url, live = true)
        root = find_by_parent_id(nil)
        raise MissingRootPageError unless root
        root.find_by_url(url, live)
      end
      # ...
    end

First, it looks for the root page, which is considered the page with no parent_id. If no root page is found it raises a MissingRootPageError exception; otherwise, it calls the instance method find_by_url on the root page.

This class method takes 2 arguments: the url (really the path matched in the routes from request) to be found, and a live flag which defaults to true (more about that later).

Finding the first page

The find_by_url instance method is a bit more complex. Let's take a look:

def find_by_url(url, live = true, clean = true)
      return nil if virtual?
      url = clean_url(url) if clean
      my_url = self.url
      if (my_url == url) && (not live or published?)
        self
      elsif (url =~ /^\#{Regexp.quote(my_url)}([^\\/]*)/)
        slug_child = children.find_by_slug($1)
        if slug_child
          found = slug_child.find_by_url(url, live, clean)
          return found if found
        end
        children.each do |child|
          found = child.find_by_url(url, live, clean)
          return found if found
        end
        file_not_found_types = ([FileNotFoundPage] + FileNotFoundPage.descendants)
        file_not_found_names = file_not_found_types.collect { |x| x.name }
        condition = (['class_name = ?'] * file_not_found_names.length).join(' or ')
        condition = \"status_id = \#{Status[:published].id} and (\#{condition})\" if live
        children.find(:first, :conditions => [condition] + file_not_found_names)
      end
    end

Wow. There's a lot going on there and there's room for some refactoring, but for now let's just walk through it.

First, nil will be returned if the page is virtual?. A page, by default, is not virtual. This is stored in the database in a boolean field, but you may override this in any subclass of Page that you create. For now, let's assume that your page isn't and won't be virtual and we'll get back to what it means.

Next, we clean the url if the clean flag is set to true (which it is by default). clean_url simply ensures that the url being checked is properly formatted and that any doubling of slashes is fixed. So this right//here//// becomes this /right/here/.

The next step shows us why we clean the url. A local variable is setup to compare against the page's url.

my_url = self.url
    if (my_url == url) #...

What is a page's url? It's calculated by the page's slug and the slugs of it's ancestors. In short, if your current page's slug is 'here' and it's parent page is 'right', and that page's parent is the home page (with a slug of '/') then your current page's url is '/right/here/'.

So we check to see that to see if it is the same as the url in the request. But also, in this comparison, we check to see if the live flag is set and is false or if the page is published?.

This live flag is a bit strange in appearance:

my_url = self.url
    if (my_url == url) && (not live or published?)

By default, this not live returns false (since live is true by default and we reverse it with not) so it moves on to published?. You might set live to false in other situations, but for now we'll just go with this.

A page is published? if it's status (as stored in the database) is the 'Published' Status.

So if the incoming url matches the current page's url (which at the first pass is the root or home page), then we return with the current page:

my_url = self.url
    if (my_url == url) && (not live or published?)
      self

Finding deeper pages

If it isn't true that the incoming url and the current page's url are equal, then we move on to the next step:

my_url = self.url
    if (my_url == url) && (not live or published?)
      self
    elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)

Here it matches the incoming url against a Regexp of the current page's url. When it starts, we're matching the root page which has a url of '/'. If that's the incoming url, it would have been caught in the original if block, but we ended up at the elsif. The Regexp that's used matches the next slug in the incoming url. So if the incoming url is '/right/here/' then it will match the slug 'right'.

From that match, we find the current page's children by their slug (remembering that the current page is the root, with a slug of '/'):

elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)
      slug_child = children.find_by_slug($1)

If it finds that 'slug_child', then we call find_by_url on that page to loop down the tree to find the final page that we want (which would be the page that responds to the url '/right/here' or in this simple case, the page with a slug of 'here'). If it finds the page, then it returns the found page:

slug_child = children.find_by_slug($1)
      if slug_child
        found = slug_child.find_by_url(url, live, clean)
        return found if found
      end

In that if slug_child block, the slug_child.find_by_url acts as a loop. Because every page responds to this method and will do exactly what is happening here for the root page, each page will search it's children for a slug matching the slug from the incoming url and any found page will likewise call find_by_url to search it's children as well.

There is some room here for some optimization in the way we do a lookup for a page, but for now it works and we can get to the refactoring another time.

When no slug is found: customizing the finder

If the slug_child is not found (and no child matches that slug) then this if slug_child block is never hit and we move to the next step. This is where the magic happens for subclasses of Page:

children.each do |child|
        found = child.find_by_url(url, live, clean)
        return found if found
      end

It asks each child of the current page if it responds to find_by_url and returns any found page.

So even if none of the pages are found by the slug, we still ask the children if they respond to find_by_url. Why would we do this?

The answer lies in one of the included extensions: Archive.

The ArchivePage is a subclass of page which provides it's own find_by_url method. The ArchivePage#find_by_url will check the incoming url for it's details and if it meets certain requirements (namely that there is a standard date format in the url such as 'articles/2010/06/22') then it will find the appropriate page type such as ArchiveDayIndexPage, ArchiveMonthIndexPage or ArchiveYearIndexPage and return the proper page. If none of those are found it just calls super and calls the original Page#find_by_url.

This can act as your router for your custom page types. If you want to return a particular page type, such as a ProductsPage and your url is '/products/1234' then you can create a ProductPage which has it's own find_by_url method and would find your ProductDetailsPage to display a standard view of all of your products based upon the slug '1234' which I'd assume would be a product id, but could be anything you want.

Handling 404

Lastly, if none of this finds any pages to return, Radiant has a FileNotFoundPage page which allows you to easily create your own 404 error message for content that isn't found. You can subclass a FileNotFoundPage page to provide your own behavior there too. But when searching for a match to an incoming url, Radiant will find deeply nested 404 pages. So you can create a FileNotFoundPage as a child of your root page, but you can also create a FileNotFoundPage as a child of your ProductsPage to return an appropriate message to someone looking for '/products/not-a-valid-url'.

Here's the code for that last step:

file_not_found_types = ([FileNotFoundPage] + FileNotFoundPage.descendants)
      file_not_found_names = file_not_found_types.collect { |x| x.name }
      condition = (['class_name = ?'] * file_not_found_names.length).join(' or ')
      condition = "status_id = #{Status[:published].id} and (#{condition})" if live
      children.find(:first, :conditions => [condition] + file_not_found_names)

The live flag comes into play here again and optionally allows you to find pages that are not published. By default live is true, so in this instance we only check for a FileNotFoundPage that is published.

Radiant has a 'development' mode which would find unpublished pages, but that's a subject for another discussion.

I hope this gives you a good understanding of how Radiant finds its content, and how you can easily bend it to behave differently by creating a subclass of Page and writing your own find_by_url method. If I've left anything out or if you want me to cover some other aspect, let me know in the comments.