On a recent project, a client asked about overriding Page#find_by_url and when that override actually gets called. The answer is worth explaining for everyone working with Radiant.
20 second summary
This is an in-depth look at the method that gathers pages within Radiant. In short, if you want to do special page finding, create a subclass of Page
and write your own find_by_url
method to adjust the way Radiant behaves. Every page will respond to this method and return appropriate pages according to the requested url. In the admin interface, you can select your special page type to make that page behave as you have specified.
Simple finding
find_by_url
is defined both as a class method and an instance method. Let's look at the class method Page.find_by_url
from the Page model:
class << self
  def find_by_url(url, live = true)
    root = find_by_parent_id(nil)
    raise MissingRootPageError unless root
    root.find_by_url(url, live)
  end
  # ...
end
First, it looks for the root page, which is considered the page with no parent_id
. If no root page is found it raises a MissingRootPageError
exception; otherwise, it calls the instance method find_by_url
on the root page.
This class method takes 2 arguments: the url
(really the path matched in the routes from request) to be found, and a live
flag which defaults to true (more about that later).
Finding the first page
The find_by_url
instance method is a bit more complex. Let's take a look:
def find_by_url(url, live = true, clean = true)
  return nil if virtual?
  url = clean_url(url) if clean
  my_url = self.url
  if (my_url == url) && (not live or published?)
    self
  elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)
    slug_child = children.find_by_slug($1)
    if slug_child
      found = slug_child.find_by_url(url, live, clean)
      return found if found
    end
    children.each do |child|
      found = child.find_by_url(url, live, clean)
      return found if found
    end
    file_not_found_types = ([FileNotFoundPage] + FileNotFoundPage.descendants)
    file_not_found_names = file_not_found_types.collect { |x| x.name }
    condition = (['class_name = ?'] * file_not_found_names.length).join(' or ')
    condition = "status_id = #{Status[:published].id} and (#{condition})" if live
    children.find(:first, :conditions => [condition] + file_not_found_names)
  end
end
Wow. There's a lot going on there and there's room for some refactoring, but for now let's just walk through it.
First, nil
will be returned if the page is virtual?
. A page, by default, is not virtual. This is stored in the database in a boolean field, but you may override this in any subclass of Page that you create. For now, let's assume that your page isn't and won't be virtual and we'll get back to what it means.
Next, we clean the url if the clean flag is set to true (which it is by default). clean_url normalizes the url being checked: it adds a leading and trailing slash where they are missing and collapses any repeated slashes into one. So right//here//// becomes /right/here/.
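To make the cleaning step concrete, here is a minimal sketch of a method that produces the result described above. It is written for illustration and is an assumption about the behavior, not a copy of Radiant's source.

```ruby
# Hypothetical stand-in for the cleaning step; not copied from Radiant.
def clean_url(url)
  # Wrap the url in slashes, then collapse any run of slashes into one.
  "/#{url.strip}/".gsub(%r{/+}, "/")
end

clean_url("right//here////")  # => "/right/here/"
clean_url("/")                # => "/"
```

Normalizing first means the later string comparison against a page's url does not have to care about stray or missing slashes.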
The next step shows us why we clean the url. A local variable is set up to hold the page's url so it can be compared against the incoming url.
my_url = self.url
if (my_url == url) #...
What is a page's url? It's calculated from the page's slug and the slugs of its ancestors. In short, if your current page's slug is 'here', its parent page's slug is 'right', and that page's parent is the home page (with a slug of '/'), then your current page's url is '/right/here/'.
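As a rough illustration of how a url can be built from slugs, here is a small in-memory sketch. SketchPage is a hypothetical stand-in with only a slug and a parent; Radiant's real Page is a database-backed model and computes its url with its own code.

```ruby
# Hypothetical in-memory stand-in for a page; only slug and parent matter here.
SketchPage = Struct.new(:slug, :parent) do
  def url
    # Walk up to the root, collecting slugs from the top down.
    slugs = []
    page = self
    while page
      slugs.unshift(page.slug)
      page = page.parent
    end
    # Join with slashes and collapse the doubles caused by the root slug '/'.
    "/#{slugs.join('/')}/".gsub(%r{/+}, "/")
  end
end

home  = SketchPage.new("/", nil)
right = SketchPage.new("right", home)
here  = SketchPage.new("here", right)

here.url  # => "/right/here/"
home.url  # => "/"
```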
So we check whether that url is the same as the url in the request. In the same comparison, we also check whether the live flag is false or the page is published?.
This live
flag is a bit strange in appearance:
my_url = self.url
if (my_url == url) && (not live or published?)
By default, this not live
returns false (since live
is true by default and we reverse it with not
) so it moves on to published?
. You might set live
to false in other situations, but for now we'll just go with this.
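The second half of that condition is easier to read as a truth table. Note that Ruby's not binds more tightly than or, so the expression groups as (not live) or published?. The visible? helper below is a hypothetical name used only to isolate that logic.

```ruby
# Hypothetical helper that isolates the second half of the condition.
def visible?(live, published)
  !live || published
end

visible?(true,  true)   # => true   live lookup, published page
visible?(true,  false)  # => false  live lookup skips an unpublished page
visible?(false, true)   # => true
visible?(false, false)  # => true   non-live lookup may return an unpublished page
```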
A page is published? if its status (as stored in the database) is the 'Published' Status.
So if the incoming url matches the current page's url (which at the first pass is the root or home page), then we return with the current page:
my_url = self.url
if (my_url == url) && (not live or published?)
self
Finding deeper pages
If it isn't true that the incoming url and the current page's url are equal, then we move on to the next step:
my_url = self.url
if (my_url == url) && (not live or published?)
self
elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)
Here it matches the incoming url against a Regexp of the current page's url. When it starts, we're matching the root page which has a url of '/'. If that's the incoming url, it would have been caught in the original if
block, but we ended up at the elsif
. The Regexp that's used matches the next slug in the incoming url. So if the incoming url is '/right/here/' then it will match the slug 'right'.
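Here is the same pattern run on a few plain strings, so you can see which slug it captures at each level. The next_slug helper is a hypothetical wrapper added for this example.

```ruby
# Hypothetical helper wrapping the same pattern used in the elsif branch.
def next_slug(my_url, url)
  match = url.match(/^#{Regexp.quote(my_url)}([^\/]*)/)
  match && match[1]
end

next_slug("/", "/right/here/")        # => "right"
next_slug("/right/", "/right/here/")  # => "here"
next_slug("/left/", "/right/here/")   # => nil
```

The last call shows why a page that is not on the path to the requested url never searches its own children: the pattern simply fails to match.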
From that match, we look among the current page's children for the one with that slug (remembering that the current page is the root, with a slug of '/'):
elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)
slug_child = children.find_by_slug($1)
If it finds that slug_child, then we call find_by_url on that page to work down the tree to the final page that we want (which would be the page that responds to the url '/right/here/' or, in this simple case, the page with a slug of 'here'). If it finds the page, then it returns the found page:
slug_child = children.find_by_slug($1)
if slug_child
  found = slug_child.find_by_url(url, live, clean)
  return found if found
end
In that if slug_child block, the call to slug_child.find_by_url is recursive, so it acts as a loop down the tree. Because every page responds to this method and will do exactly what is happening here for the root page, each page will search its children for a slug matching the next slug from the incoming url, and any found page will likewise call find_by_url to search its children as well.
There is some room here for some optimization in the way we do a lookup for a page, but for now it works and we can get to the refactoring another time.
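To see the recursion on its own, here is a simplified, self-contained sketch of the walk described so far. TreePage is a hypothetical in-memory class, it has no database and no 404 fallback, and it only mirrors the steps above rather than reproducing Radiant's code.

```ruby
# Hypothetical in-memory page; Radiant's real Page is a database-backed model.
class TreePage
  attr_reader :slug, :children
  attr_accessor :parent

  def initialize(slug, published: true)
    @slug = slug
    @published = published
    @children = []
  end

  def add(child)
    child.parent = self
    @children << child
    child
  end

  def published?
    @published
  end

  def url
    parent ? "#{parent.url}#{slug}/" : "/"
  end

  # Simplified version of the walk: exact match, then slug child, then every child.
  def find_by_url(url, live = true)
    my_url = self.url
    if my_url == url && (!live || published?)
      self
    elsif url =~ /^#{Regexp.quote(my_url)}([^\/]*)/
      wanted = $1
      slug_child = children.find { |child| child.slug == wanted }
      if slug_child
        found = slug_child.find_by_url(url, live)
        return found if found
      end
      children.each do |child|
        found = child.find_by_url(url, live)
        return found if found
      end
      nil
    end
  end
end

home  = TreePage.new("/")
right = home.add(TreePage.new("right"))
here  = right.add(TreePage.new("here"))
draft = home.add(TreePage.new("draft", published: false))

home.find_by_url("/right/here/").slug    # => "here"
home.find_by_url("/nope/")               # => nil
home.find_by_url("/draft/")              # => nil
home.find_by_url("/draft/", false).slug  # => "draft"
```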
When no slug is found: customizing the finder
If the slug_child is not found (no child matches that slug), or it is found but its own find_by_url returns nothing, then we move on to the next step. This is where the magic happens for subclasses of Page:
children.each do |child|
  found = child.find_by_url(url, live, clean)
  return found if found
end
It calls find_by_url on each child of the current page and returns the first page that any of them finds.
So even if none of the pages are found by the slug, we still give every child a chance to answer through its own find_by_url. Why would we do this?
The answer lies in one of the included extensions: Archive.
The ArchivePage is a subclass of Page which provides its own find_by_url method. ArchivePage#find_by_url will check the incoming url for its details and, if it meets certain requirements (namely that there is a standard date format in the url such as 'articles/2010/06/22'), it will find the appropriate page type such as ArchiveDayIndexPage, ArchiveMonthIndexPage or ArchiveYearIndexPage and return the proper page. If none of those are found it just calls super and falls back to the original Page#find_by_url.
This can act as a router for your custom page types. Say you have a ProductsPage and your url is '/products/1234'. You can give ProductsPage its own find_by_url method which finds your ProductDetailsPage, a page that displays a standard view for any of your products based upon the slug '1234'. I'd assume that would be a product id, but it could be anything you want.
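A rough sketch of that idea follows. Everything in it is hypothetical: StubPage stands in for Radiant's Page so the example runs on its own, and the product_id accessor is just one way you might hand the captured slug to the details page. In a real extension you would subclass Page instead and keep the same method signature.

```ruby
# Stand-in for Radiant's Page, with just enough behavior to run the sketch.
class StubPage
  attr_reader :url, :children

  def initialize(url, children = [])
    @url = url
    @children = children
  end

  def find_by_url(url, live = true, clean = true)
    url == self.url ? self : nil
  end
end

# Hypothetical page type that renders a single product.
class ProductDetailsPage < StubPage
  attr_accessor :product_id
end

# Hypothetical page type that routes '/products/<id>/' to its details child.
class ProductsPage < StubPage
  def find_by_url(url, live = true, clean = true)
    if url =~ %r{^#{Regexp.quote(self.url)}(\d+)/?$}
      product_id = $1
      details = children.find { |child| child.is_a?(ProductDetailsPage) }
      if details
        details.product_id = product_id
        return details
      end
    end
    super # anything else falls back to the default behavior
  end
end

details  = ProductDetailsPage.new("/products/details/")
products = ProductsPage.new("/products/", [details])

found = products.find_by_url("/products/1234/")
found.product_id  # => "1234"
```

As with ArchivePage, the override only handles the urls it recognizes and hands everything else to super.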
Handling 404
Lastly, if none of this finds a page to return, Radiant falls back to a FileNotFoundPage, which allows you to easily create your own 404 error message for content that isn't found. You can subclass FileNotFoundPage to provide your own behavior there too. Because this lookup happens among the children of whichever page is currently handling the url, Radiant will find deeply nested 404 pages. So you can create a FileNotFoundPage as a child of your root page, but you can also create a FileNotFoundPage as a child of your ProductsPage to return an appropriate message to someone looking for '/products/not-a-valid-url'.
Here's the code for that last step:
file_not_found_types = ([FileNotFoundPage] + FileNotFoundPage.descendants)
file_not_found_names = file_not_found_types.collect { |x| x.name }
condition = (['class_name = ?'] * file_not_found_names.length).join(' or ')
condition = "status_id = #{Status[:published].id} and (#{condition})" if live
children.find(:first, :conditions => [condition] + file_not_found_names)
The live
flag comes into play here again and optionally allows you to find pages that are not published. By default live
is true, so in this instance we only check for a FileNotFoundPage
that is published.
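If the condition-building lines are hard to picture, this sketch runs the same string manipulation outside of Radiant. The method name, the class names and the status id of 100 are assumptions picked for the example; in Radiant the names come from FileNotFoundPage and its descendants, and the id from Status[:published].

```ruby
# Hypothetical wrapper around the same string-building steps.
def not_found_conditions(names, live, published_id)
  condition = (['class_name = ?'] * names.length).join(' or ')
  condition = "status_id = #{published_id} and (#{condition})" if live
  [condition] + names
end

names = ["FileNotFoundPage", "CustomFileNotFoundPage"]

not_found_conditions(names, true, 100)
# => ["status_id = 100 and (class_name = ? or class_name = ?)",
#     "FileNotFoundPage", "CustomFileNotFoundPage"]

not_found_conditions(names, false, 100)
# => ["class_name = ? or class_name = ?",
#     "FileNotFoundPage", "CustomFileNotFoundPage"]
```

Each ? placeholder is filled in by the finder with the matching class name, so the list of names can grow as you add subclasses.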
Radiant has a 'development' mode which would find unpublished pages, but that's a subject for another discussion.
I hope this gives you a good understanding of how Radiant finds its content, and how you can easily bend it to behave differently by creating a subclass of Page
and writing your own find_by_url
method. If I've left anything out or if you want me to cover some other aspect, let me know in the comments.