Following the rules for East-oriented Code helps me organize behavior in my code, but it can lead to other benefits as well. As a result of following the rules, I find that my code is better prepared for restrictions like those that immutable objects introduce.
I recently went looking for samples of how people are using instance_eval and instance_exec and ended up with a great example from FactoryGirl thanks to Joshua Clayton. As I was searching, I came upon some code which happened to use instance_eval. Although it was a simple use case for that method, it made a much better example of commands, immutability, and East-oriented code.
Here are the details...
If we want to use nested blocks to create a tree structure, we might create some pseudo-code like this to illustrate our desired code:
Node.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end
The representation of this set of objects should look something like this:
"['root', ['branch', ['leaf', 'leaf2', 'leaf3']]]"
This shows that the created tree is a pair of a named node and an array of named children (which can also have children).
Imperative approach
A simple solution is to initialize a Node and, using an imperative approach, change its state; that is to say, we alter its collection of children.
class Node
  def initialize(name, &block)
    @name = name
    @children = []
    instance_eval(&block) if block
  end
  attr_reader :children, :name
  def node(name, &block)
    children << Node.new(name, &block)
  end
end
When each node is created, its collection of children is set to an empty array. With each call to the node method, a new Node object is created and shoveled into the collection of children.
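To try the imperative version on its own, here is a runnable sketch of the class above. It is renamed ImperativeNode (a hypothetical name) so it can stand alone, and it adds a hypothetical to_a helper, not part of the original, that returns the nested-array shape described earlier.

```ruby
# Imperative tree builder: each node owns a mutable array of children.
class ImperativeNode
  attr_reader :children, :name

  def initialize(name, &block)
    @name = name
    @children = []
    # Run the block with self set to this new node, so bare `node(...)`
    # calls inside the block add children to it.
    instance_eval(&block) if block
  end

  # Mutates this node's children in place.
  def node(name, &block)
    children << ImperativeNode.new(name, &block)
  end

  # Hypothetical helper: [name] for a leaf, otherwise [name, [children...]].
  def to_a
    children.empty? ? [name] : [name, children.flat_map(&:to_a)]
  end
end

tree = ImperativeNode.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end

p tree.to_a # => ["root", ["branch", ["leaf", "leaf2", "leaf3"]]]
```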
If we refactor our sample to inline the methods and show us exactly what's going on, it would look something like this:
Node.new('root') do
  self.children << Node.new('branch') do
    self.children << Node.new('leaf')
    self.children << Node.new('leaf2')
    self.children << Node.new('leaf3')
  end
end
We can more clearly see what's happening inside of the node method with this change to our code.
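instance_eval is what makes the bare node calls work: it evaluates the block with the newly created node as self, while local variables from the surrounding scope stay visible because the block is still a closure. This sketch records the name of self inside each block; the ImperativeNode class and the seen array are hypothetical names used only for illustration.

```ruby
class ImperativeNode
  attr_reader :children, :name

  def initialize(name, &block)
    @name = name
    @children = []
    instance_eval(&block) if block # self inside the block is this node
  end

  def node(name, &block)
    children << ImperativeNode.new(name, &block)
  end
end

seen = [] # a local variable, still reachable inside instance_eval'd blocks

ImperativeNode.new('root') do
  seen << name            # `name` resolves to self.name, i.e. 'root'
  node('branch') do
    seen << name          # self is now the 'branch' node
    node('leaf') { seen << name }
  end
end

p seen # => ["root", "branch", "leaf"]
```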
Eastward flow
As I worked with this problem I wondered: what would happen if I started following the 4 rules of East-oriented code?
If our node method returns self, how does that affect our code?
class Node
  # initialize omitted...
  def node(name, &block)
    children << Node.new(name, &block)
    self
  end
end
Fortunately, because our code relies on an imperative approach that changes the state of the children, the code still works.
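A quick way to confirm that node now behaves as a command returning its receiver: the sketch below (class renamed ChainableNode, a hypothetical name) checks that the chained result is the very same object and that both children were added to it.

```ruby
class ChainableNode
  attr_reader :children, :name

  def initialize(name, &block)
    @name = name
    @children = []
    instance_eval(&block) if block
  end

  # Still mutates children, but now hands back the receiver.
  def node(name, &block)
    children << ChainableNode.new(name, &block)
    self
  end
end

root = ChainableNode.new('root')
result = root.node('a').node('b')

p result.equal?(root)        # => true
p root.children.map(&:name)  # => ["a", "b"]
```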
If we want, we can shrink the space we use by chaining commands together:
t = Node.new("root") do
  node("branch") do
    node("subbranch") do
      node("leaf").node("leaf2").node("leaf3")
    end
  end
end
I think that's actually a little more difficult to read, so we can go back to the regular style:
node("leaf")
node("leaf2")
node("leaf3")
When seeing techniques like returning self to encourage an East-oriented approach, it's easy to fixate on the chaining. But it's commands that we want to introduce, not chaining. The chaining is incidental here.
If you do chain your method calls together, it at least becomes clearer that each subsequent method is operating on the return value of the last one.
If we want to be clear that we're operating on the last return value, we can maintain the readability of the multiline option by writing it like this:
node("leaf").
  node("leaf2").
  node("leaf3")
Each line chains the next by adding the dot character. We don't have a specific need to do this, but it's good to know how it works.
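Because node returns self in this version, the three styles shown so far build the same children. This sketch (class renamed ChainableNode, a hypothetical name) checks that separate statements, a one-line chain, and a trailing-dot chain all produce the same result.

```ruby
class ChainableNode
  attr_reader :children, :name

  def initialize(name, &block)
    @name = name
    @children = []
    instance_eval(&block) if block
  end

  def node(name, &block)
    children << ChainableNode.new(name, &block)
    self
  end
end

separate = ChainableNode.new('branch') do
  node('leaf')
  node('leaf2')
  node('leaf3')
end

one_line = ChainableNode.new('branch') do
  node('leaf').node('leaf2').node('leaf3')
end

trailing_dot = ChainableNode.new('branch') do
  node('leaf').
    node('leaf2').
    node('leaf3')
end

results = [separate, one_line, trailing_dot].map { |tree| tree.children.map(&:name) }
p results.uniq # => [["leaf", "leaf2", "leaf3"]]
```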
Not much has changed after introducing our East-oriented approach. We're still updating that collection of children.
Introducing immutability
What will we see if we introduce immutable objects to our solution?
Immutable objects can help make our code more predictable. An object that never changes after it is created can be reasoned about without tracking who else might have altered it. And because no object's state changes, threads can share the same objects without one thread's changes surprising another, which makes a multithreaded approach much less likely to introduce headaches.
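As a quick illustration with plain arrays: freeze makes in-place changes raise an error, while non-destructive operations such as Array#+ still work and return a new, unfrozen array, leaving the original untouched. That second behavior is what an immutable design has to rely on.

```ruby
names = ['leaf'].freeze

# In-place changes raise (FrozenError on Ruby 2.5+, a RuntimeError subclass).
error =
  begin
    names << 'leaf2'
    nil
  rescue RuntimeError => e
    e
  end

# Non-destructive operations build a new object instead.
more = names + ['leaf2']

p names.frozen?  # => true
p names          # => ["leaf"]
p more           # => ["leaf", "leaf2"]
p more.frozen?   # => false
p error.nil?     # => false
```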
The simplest way to add immutability is to freeze objects as they are initialized:
class Node
  def initialize(name, &block)
    @name = name.freeze
    @children = [].freeze
    instance_eval(&block) if block
  end
  attr_reader :children, :name
  def node(name, &block)
    children << Node.new(name, &block).freeze
    self
  end
end
This, of course, breaks everything. Our code relies upon the fact that the children array may be mutated. Instead of performing the mutation, Ruby raises an error:
RuntimeError: can't modify frozen Array
(On Ruby 2.5 and later the error class is FrozenError, a subclass of RuntimeError.)
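The failure can be reproduced with a standalone copy of the frozen version. The class is renamed FrozenChildrenNode, a hypothetical name, and the exception is rescued as RuntimeError so the sketch works both before and after FrozenError was introduced.

```ruby
class FrozenChildrenNode
  attr_reader :children, :name

  def initialize(name, &block)
    @name = name.freeze
    @children = [].freeze           # frozen from the start
    instance_eval(&block) if block
  end

  def node(name, &block)
    # Raises: `children` is frozen, so `<<` is not allowed.
    children << FrozenChildrenNode.new(name, &block).freeze
    self
  end
end

error =
  begin
    FrozenChildrenNode.new('root') { node('branch') }
    nil
  rescue RuntimeError => e
    e
  end

puts error.class    # FrozenError on Ruby 2.5+, RuntimeError on older versions
puts error.message  # begins with "can't modify frozen Array"
```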
Now what?
If we can't alter the collection, we're left with creating an entirely new one.
One thing we could do is change the constructor to accept a collection of children when the Node is initialized. Instead of altering the children, we'd use a constructor like Node.new(name, children). Here's what that looks like:
class Node
  def initialize(name, children=[], &block)
    @name = name.freeze
    @children = children.freeze
    instance_eval(&block) if block
  end
  # ... omitted code
end
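Leaving the block aside, a constructor that accepts children already lets us build a whole tree bottom-up with no mutation at all. This sketch drops the block parameter to keep the focus on the children argument; ImmutableNode is a hypothetical name. Note that freeze is applied to the caller's own array.

```ruby
class ImmutableNode
  attr_reader :children, :name

  # Children are passed in up front and frozen; nothing is added later.
  def initialize(name, children = [])
    @name = name.freeze
    @children = children.freeze
  end
end

# Build the tree bottom-up: leaves first, then their parents.
leaves = %w[leaf leaf2 leaf3].map { |leaf_name| ImmutableNode.new(leaf_name) }
branch = ImmutableNode.new('branch', leaves)
tree   = ImmutableNode.new('root', [branch])

p tree.children.frozen?                     # => true
p tree.children.first.children.map(&:name)  # => ["leaf", "leaf2", "leaf3"]
p leaves.frozen?                            # => true (frozen by initialize)
```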
That still doesn't allow us to change anything until we also change the way our node method works (since it is responsible for handling changes to the children).
If the node method created a new Node instead of altering the children, that would get us what we want. Let's break it down.
First, when the node method is called, it needs to create the node to be added to the collection of children:
def node(name, &block)
  new_child = Node.new(name, &block)
  # ... ?
  self
end
Since we're trying to avoid mutating the state of this object, we don't want to just shove the new node into the collection of children (and we can't, because we used freeze on it).
So let's create an entirely new node with an entirely new collection of children. In order to do that, we need to ensure that for every existing child object, we create a corresponding new node.
Each time node is called, we need a fresh copy of the existing children from which to build the new node. So let's build a method to do that:
def next_children
  children.map { |child| Node.new(child.name, child.next_children) }.freeze
end
When we changed our initializer, we made it possible to set the list of children. Our new next_children method relies on that feature and on a recursive call to itself to build the collection of children for each new node with Node.new(child.name, child.next_children).
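To see what next_children produces, this sketch (using a hypothetical ImmutableNode class without block handling) copies a one-leaf branch: the copy has the same names but consists of new node objects, and the returned array is frozen.

```ruby
class ImmutableNode
  attr_reader :children, :name

  def initialize(name, children = [])
    @name = name.freeze
    @children = children.freeze
  end

  # Rebuild every child (and, recursively, its children) as a new node.
  def next_children
    children.map { |child| ImmutableNode.new(child.name, child.next_children) }.freeze
  end
end

branch = ImmutableNode.new('branch', [ImmutableNode.new('leaf')])
copy = branch.next_children

p copy.map(&:name)                          # => ["leaf"]
p copy.first.equal?(branch.children.first)  # => false (a new object)
p copy.frozen?                              # => true
```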
Looking back at our node method, we'll need to break the rules of East-oriented Code. Since we have immutable objects, we'll return a new node instead of self.
def node(name, &block)
  new_child = Node.new(name, &block)
  Node.new(self.name, next_children + [new_child])
end
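Without any blocks involved, this version of node already behaves as a non-destructive operation: the receiver is left alone and each call returns a larger copy. The sketch below drops block handling to isolate that behavior; ImmutableNode is a hypothetical name.

```ruby
class ImmutableNode
  attr_reader :children, :name

  def initialize(name, children = [])
    @name = name.freeze
    @children = children.freeze
  end

  # Returns a new node with the extra child; the receiver is untouched.
  def node(name)
    new_child = ImmutableNode.new(name)
    ImmutableNode.new(self.name, next_children + [new_child])
  end

  def next_children
    children.map { |child| ImmutableNode.new(child.name, child.next_children) }.freeze
  end
end

root = ImmutableNode.new('root')
bigger = root.node('leaf').node('leaf2')

p root.children.size           # => 0 (original untouched)
p bigger.children.map(&:name)  # => ["leaf", "leaf2"]
```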
But there's still a problem left. Our initialized object needs to execute a block, which means the constructor new might actually need to return a different object than the one it originally created. The call to node inside the block changes the value we care about from the instance that new creates to the instance that node creates, yet new, as it stands, always returns its original instance.
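The problem is easy to reproduce. In this sketch (class renamed BlockInInitializeNode, a hypothetical name) the block still runs inside initialize, so the node returned by the block is discarded and new hands back the original, childless instance.

```ruby
class BlockInInitializeNode
  attr_reader :children, :name

  def initialize(name, children = [], &block)
    @name = name.freeze
    @children = children.freeze
    # The block's return value (a new node built by `node`) is thrown away.
    instance_eval(&block) if block
  end

  def node(name, &block)
    new_child = BlockInInitializeNode.new(name, &block)
    BlockInInitializeNode.new(self.name, next_children + [new_child])
  end

  def next_children
    children.map { |c| BlockInInitializeNode.new(c.name, c.next_children) }.freeze
  end
end

tree = BlockInInitializeNode.new('root') { node('branch') }
p tree.children.size # => 0, the 'branch' child was lost
```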
Controlling the constructor
To better handle our immutable objects and the return values from the methods we created, we can alter the way the new method works on our Node class.
Instead of handling a block in the initialize method, we can move it to new.
Here's the new new method:
def self.new(*args, &block)
  instance = super.freeze
  if block
    instance.instance_eval(&block)
  else
    instance
  end
end
The first step is to call super to get an instance the way Ruby normally creates them (through the standard Class#new). Then we freeze it.
If we haven't provided a block to the new method, we'll want to return the instance we just created. If we have provided a block, we'll need to evaluate that block in the context of the instance we just created and return its result.
This means that the block can use the node method, and whatever the block returns (typically the result of its last node call) becomes the return value of new.
We need to alter the new method this way because we're not always just returning the instance it creates. Since our objects are frozen, we can't allow the block to alter their states.
The way new usually works is like this:
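def self.new(*args, &block)
  instance = allocate                        # create an uninitialized object
  instance.send(:initialize, *args, &block)  # initialize is private, hence send
  return instance                            # initialize's return value is ignored
end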
This structure is the reason Ruby has you call new on a class while in practice you write an initialize method: no matter the result of your initialize method, new will always return an instance of the class you've used.
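A small demonstration of that guarantee: even when initialize returns something unrelated, new still returns the instance. The Widget class and its build method are hypothetical; build is a hand-written equivalent of the default behavior described above.

```ruby
class Widget
  def initialize
    :not_a_widget # the return value of initialize is ignored by new
  end

  # Hypothetical hand-written equivalent of the default Class#new.
  def self.build(*args, &block)
    instance = allocate
    instance.send(:initialize, *args, &block)
    instance
  end
end

p Widget.new.class    # => Widget
p Widget.build.class  # => Widget
```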
We're bending the rules to allow us to evaluate the given block and return its result, instead of the instance typically created by new.
After that, we can remove the block evaluation from initialize:
def initialize(name, children=[])
  @name = name.freeze
  @children = children.freeze
end
While the method signature (the list of accepted arguments) has changed for initialize, it's still the same for new: a list of arguments and a block.
Believe it or not, there's still one more problem to solve.
Operating on values
We looked at how returning self allows you to chain your method calls. Although we've broken that rule and are instead returning a new Node object, it's important to consider that chaining.
Our initial code still doesn't work quite right, and it's all because we need to think about operating on the return values of our commands rather than relying on an imperative approach to building and changing objects.
First, here's what our Node class looks like:
class Node
  def self.new(*args, &block)
    instance = super.freeze
    if block
      instance.instance_eval(&block)
    else
      instance
    end
  end
  def initialize(name, children=[])
    @name = name.freeze
    @children = children.freeze
  end
  attr_reader :children, :name
  def node(name, &block)
    new_child = self.class.new(name, &block)
    self.class.new(self.name, next_children + [new_child])
  end
  def next_children
    children.map { |child| self.class.new(child.name, child.next_children) }.freeze
  end
  def inspect
    return %{"#{name}"} if children.empty?
    %{"#{name}", #{children}}
  end
end
We didn't discuss it, but there's an inspect method to return either the name of the node if it has no children, or the name and a list of children if it has some.
Here's what the code to create the tree looks like:
Node.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end
If we assign the result of that to a variable and inspect it, we'll get a surprising result.
t = Node.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end
puts [t].inspect
The output will only be
["root", ["branch", ["leaf3"]]]
So what happened to the other leaf and leaf2 objects? Why aren't they there?
Remember that each node call returns a new node. The node('leaf') call returns an object, but node('leaf2') is not a message sent to that object. Like node('leaf'), it is sent to self, the 'branch' node against which the block is being evaluated.
Each of those calls is returned and forgotten. Here it is annotated:
t = Node.new('root') do
  node('branch') do
    node('leaf')  # returned and forgotten
    node('leaf2') # returned and forgotten
    node('leaf3') # returned and used as the final result
  end
end
puts [t].inspect
#=> ["root", ["branch", ["leaf3"]]]
The answer to this problem is to command each object to do the next thing. We can achieve this by chaining the methods. The result of one method is the object which will receive the next command.
t = Node.new('root') do
  node('branch') do
    node('leaf').  # dot (.) character added to chain
      node('leaf2'). # executed on the result of the last node
      node('leaf3')  # executed on the result of the last node
  end
end
puts [t].inspect
#=> ["root", ["branch", ["leaf", "leaf2", "leaf3"]]]
An alternative way to look at this is to store the result of each command:
t = Node.new('root') do
  node('branch') do
    branch = node('leaf')
    next_branch = branch.node('leaf2')
    final_branch = next_branch.node('leaf3')
  end
end
puts [t].inspect
#=> ["root", ["branch", ["leaf", "leaf2", "leaf3"]]]
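The same idea can also be written as a fold: Enumerable#reduce passes the result of each step to the next, which is exactly the "operate on the last return value" pattern. This sketch reuses the final Node class from above unchanged; the list of leaf names and the use of reduce are an illustrative alternative, not part of the original approach.

```ruby
class Node
  def self.new(*args, &block)
    instance = super.freeze
    if block
      instance.instance_eval(&block)
    else
      instance
    end
  end

  def initialize(name, children = [])
    @name = name.freeze
    @children = children.freeze
  end

  attr_reader :children, :name

  def node(name, &block)
    new_child = self.class.new(name, &block)
    self.class.new(self.name, next_children + [new_child])
  end

  def next_children
    children.map { |child| self.class.new(child.name, child.next_children) }.freeze
  end

  def inspect
    return %{"#{name}"} if children.empty?
    %{"#{name}", #{children}}
  end
end

t = Node.new('root') do
  node('branch') do
    # Start from self (the 'branch' node); each step receives the node
    # returned by the previous step.
    %w[leaf leaf2 leaf3].reduce(self) { |current, leaf| current.node(leaf) }
  end
end

puts [t].inspect # => ["root", ["branch", ["leaf", "leaf2", "leaf3"]]]
```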
Following the rules so you know when to break them
What interested me about this was that my code was already prepared for immutable objects because I had prepared it to operate on the same object. By structuring my code to return self and send each message to the result of the last, I was able to change the implementation from an imperative style to a functional style.