高見龍

iOS app/Ruby/Rails Developer & Instructor, fond of non-mainstream new toys :)

String and Symbol in Ruby

As a Ruby or Rails developer, you have probably seen code snippets like this:

class User < ActiveRecord::Base
  has_many :products
  validates :name, presence: true
end

class Product < ActiveRecord::Base
  belongs_to :user
end

or

class ProductController < ApplicationController
  before_action :find_product

  # .. skip

  private
  def find_product
    @product = Product.find_by(id: params[:id])
  end
end

These are common in every Rails project, but what do :products, :user, :name and :find_product mean? Are they some kind of variable or string? Can we replace them with variables or strings?

This is one of the most frequently asked questions among people learning Ruby or Rails, especially those coming from other programming languages, and even experienced Rails developers may not be able to explain it well.

This thing is called a “Symbol”, and it looks like a variable name prefixed with a colon.

The rules for naming a symbol are almost the same as for a normal variable: you can use English letters, numbers and underscores, like :my_name or :title32, or non-English characters, such as :姓名 or :おはよう. Even spaces are allowed, but then you have to wrap the name in single or double quotes, like :"hello world".
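Here is a small sketch of those naming rules that you can paste into a Ruby file. The variable names (plain, spaced) are only for illustration, and I stick to ASCII names here so the snippet runs regardless of your terminal encoding.

```ruby
# Plain names need no quotes.
plain = [:my_name, :title32]

# A name containing a space must be quoted.
spaced = :"hello world"

# inspect shows how Ruby itself writes each symbol back.
plain.each { |sym| puts sym.inspect }
puts spaced.inspect                      # prints :"hello world"

# Quoting a plain name is allowed too, and gives the very same symbol.
puts :"my_name".equal?(:my_name)         # prints true

# Every one of them is an instance of Symbol.
puts (plain + [spaced]).all? { |sym| sym.is_a?(Symbol) }
```

The quotes are part of the literal syntax only; the symbol's name is just hello world.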

What’s a Symbol

Symbols are a little hard to understand for Ruby/Rails beginners, even those with a background in other programming languages. Some people think a symbol is a variable, or just a name, but it’s not that simple. You can think of a symbol as “an object with a name”.

The :name symbol is an instance of the Symbol class:

>> :name.class
=> Symbol

A symbol can be used to represent something. For example, when I write an iOS app in Objective-C, I might define some constants:

#define OrderStatusPending    0
#define OrderStatusProcessing 1
#define OrderStatusComplete   2

or use an enum:

enum OrderStatus {
    OrderStatusPending    = 0,
    OrderStatusProcessing = 1,
    OrderStatusComplete   = 2
};

then I can use them just like this:

if (order.status == OrderStatusPending) {
    NSLog(@"order is pending");
}

I can do the same thing in Ruby, but I don’t have to define or declare anything in advance. Because a symbol is “an object with a name”, I can use symbols to represent something directly:

class Order
  attr_reader :status

  def initialize(items, status = :pending)
    @items = items
    @status = status
  end

  def complete
    @status = :complete
  end
end

order = Order.new(["item A", "item B", "item C"])

if order.status == :pending
  puts "order is pending"
end

You might be curious what the heck :pending and :complete are. They are just objects that represent the pending and complete states of an Order. Because a symbol is “an object with a name”, the name itself is enough to represent something.

Hmm… in this case, could I replace those symbols with strings? Sure you can.
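To make that concrete, here is a rough sketch of the same Order class with strings in place of symbols. Nothing else changes, and the last line is just a reminder that a string and a symbol with the same letters are not equal to each other.

```ruby
class Order
  attr_reader :status

  # Same class as before, but the status is a plain string.
  def initialize(items, status = "pending")
    @items = items
    @status = status
  end

  def complete
    @status = "complete"
  end
end

order = Order.new(["item A", "item B", "item C"])
puts order.status == "pending"    # prints true

order.complete
puts order.status == "complete"   # prints true

# A string never equals a symbol, so pick one form and stay consistent.
puts order.status == :complete    # prints false
```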

What’s the difference between a Symbol and a Variable?

A variable is a name that points to an object, like:

greeting = "Hello Ruby"

Here the name greeting points to the string object "Hello Ruby". The name greeting cannot exist on its own without an object to point at.

But a symbol can. A symbol is “an object with a name”: it exists on its own and can be used even though it doesn’t point to anything, just like the :pending and :complete example above.
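A quick way to feel the difference is the sketch below. The names not_assigned_yet, caught and label are made up for this example.

```ruby
# A bare name with nothing assigned to it is an error...
caught = nil
begin
  not_assigned_yet
rescue NameError => e
  caught = e
  puts "NameError: #{e.message}"
end

# ...but a symbol with the same name is already a complete object.
label = :not_assigned_yet
puts label.class                  # prints Symbol
puts label.inspect                # prints :not_assigned_yet

# A variable only exists once it points at an object.
puts defined?(greeting).inspect   # prints nil
greeting = "Hello Ruby"
puts defined?(greeting)           # prints local-variable
```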

In fact, you can NOT use a symbol as a variable; assigning to one causes a syntax error.

>> :name = "eddie"
SyntaxError: (irb):27: syntax error, unexpected '=', expecting end-of-input

In fact, when you declare a new variable in Ruby, for example:

my_name = "eddie"

Ruby also creates a symbol named :my_name in the background. Let’s open an irb console and experiment:

# all_symbols method can list all symbols in Ruby.
# the result will vary with different Ruby version and loaded modules.
>> Symbol.all_symbols.count
=> 3489

# OK, I got 3489 symbols...
# and now I define a local variable named my_name
>> my_name = "eddie"
=> "eddie"

# count again, and you will find the number of symbols has increased.
>> Symbol.all_symbols.count
=> 3490

# to be sure, check that it really exists in the symbol table.
>> Symbol.all_symbols.map(&:to_s).include?("my_name")
=> true

Defining a new method or class creates new symbols too, not only defining a variable.
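Counting the whole symbol table is a bit noisy, so here is another way to see that method and class names are symbols. This is only a sketch: Greeter and hello are hypothetical names, and it relies on def returning the method name, which is the case since Ruby 2.1.

```ruby
class Greeter
  # def evaluates to the method's name as a symbol.
  method_name = def hello
    "hello"
  end
  puts method_name.inspect                     # prints :hello
end

# Method names are reported as symbols...
puts Greeter.instance_methods(false).inspect   # prints [:hello]

# ...and so are constant names such as the class itself.
puts Object.constants.include?(:Greeter)       # prints true
puts Object.const_get(:Greeter) == Greeter     # prints true
```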

What’s the difference between a Symbol and a String?

One of the most frequently asked questions in my training courses is “What’s the difference between a Symbol and a String?”.

String is mutable, but Symbol isn’t.

A symbol is a little like a string, and it shares some methods with String, such as length, upcase, downcase, etc. A string is mutable: you can change its content if you like. A symbol is not. Let’s try it in irb:

# just like string, you can use length method to count the letters.
>> :hello.length
=> 5

# or upcase method to return a symbol with all capital letters.
>> :hello.upcase
=> :HELLO

# let's say we have a "hello" string,
# we can use brackets with index number to get the letter.
>> "hello"[0]
=> "h"
>> "hello"[3]
=> "l"

# and so can symbol
>> :hello[0]
=> "h"
>> :hello[3]
=> "l"

# we can also use brackets with index number to change the letter
>> "hello"[0] = "k"
=> "k"

# but it doesn't work on a symbol, because Symbol doesn't have the []= method
>> :hello[0] = "k"
NoMethodError: undefined method `[]=' for :hello:Symbol

So you can think of a symbol as a kind of immutable string.
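The same idea as a script rather than an irb session. I call dup on the string literal so the example behaves the same even if frozen string literals are enabled in your setup.

```ruby
text = "hello".dup      # dup guarantees a mutable copy
sym  = :hello

# A string can be changed in place.
text[0] = "k"
text << "!"
puts text                        # prints kello!

# A symbol is always frozen and has no mutating methods.
puts sym.frozen?                 # prints true
puts sym.respond_to?(:[]=)       # prints false
puts sym.respond_to?(:<<)        # prints false

# Methods like upcase return another symbol; the original is untouched.
loud = sym.upcase
puts loud.inspect                # prints :HELLO
puts sym.inspect                 # prints :hello
```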

Symbol has better performance

In Ruby, every time you create a new string, Ruby allocates new memory for it. Check this out:

5.times do
  puts "hello".object_id
end

# => 70199659402580
# => 70199659366640
# => 70199659366560
# => 70199659366500
# => 70199659366420

The object_id method returns a number that uniquely identifies an object within the running Ruby process; the actual values vary across machines and Ruby versions.

In the Ruby world, the same object always has the same object id, and two objects with the same object id are the same object.

As the example above shows, each "hello" string, even with identical content, gets a different object id and occupies its own memory, which means there are 5 different objects in Ruby.

But let’s check out symbols:

5.times do
  puts :hello.object_id
end

# => 899228
# => 899228
# => 899228
# => 899228
# => 899228

The result shows they all have the same object id, which means they’re the same object. The first time you use the :hello symbol, Ruby allocates memory and creates the symbol for you; when you access that symbol again, Ruby retrieves it from memory instead of creating a new one, so symbols use less memory.
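If you don't want to eyeball object ids, equal? asks the identity question directly. A minimal sketch (the variables a and b are just for the example):

```ruby
# Two separately built strings with the same content are different objects.
a = "hello".dup
b = "hello".dup
puts a == b          # prints true  (same content)
puts a.equal?(b)     # prints false (different objects)

# The same symbol is one object no matter how you reach it.
puts :hello.equal?(:hello)                                   # prints true
puts "hello".to_sym.equal?(:hello)                           # prints true
puts :hello.object_id == ("hel" + "lo").to_sym.object_id     # prints true
```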

Symbols save memory, but before Ruby 2.2 that memory was never reclaimed automatically. You had to restart the application to release it, so creating lots of symbols could behave like a memory leak. Ruby 2.2 introduced Symbol GC (garbage collection), so symbols generated dynamically by the to_sym or intern methods can now be collected just like other objects.

reference: Symbol GC

By the way, if you “freeze” the string literal in the example above to make it immutable, the object id will be the same on every iteration.

5.times do
  puts "hello".freeze.object_id
end

# => 70314415546380
# => 70314415546380
# => 70314415546380
# => 70314415546380
# => 70314415546380
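Here is a slightly different take on the freeze experiment, written as a sketch. It assumes Ruby 2.1 or later, where calling freeze directly on a string literal reuses one shared frozen string. The exception class in the last part depends on your Ruby version, so I print it instead of promising one.

```ruby
# Collect the object id of the frozen literal three times.
ids = Array.new(3) { "hello".freeze.object_id }
puts ids.uniq.size                  # prints 1
puts "hello".freeze.frozen?         # prints true

# A frozen string rejects in-place changes.
begin
  "hello".freeze << " world"
rescue => e
  puts e.class                      # FrozenError on Ruby 2.5+, RuntimeError before that
end
```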

Comparison of symbols is faster than strings

Let’s do a little benchmark and check the results:

require 'benchmark'
loop_times = 100000000

str = Benchmark.measure do
  loop_times.times do
    "hello" == "hello"
  end
end.total

sym = Benchmark.measure do
  loop_times.times do
    :hello == :hello
  end
end.total

puts "Benchmark"
puts "String: #{str}"
puts "Symbol: #{sym}"

# => Benchmark
# => String: 12.299999999999999
# => Symbol: 5.750000000000002

As you can see, comparing symbols is much faster than comparing strings. That’s because a symbol comparison only checks whether both sides are the same object (that is, have the same object id). Let’s dig into the Ruby source: symbols use the rb_obj_equal function for comparison:

// file: object.c

VALUE
rb_obj_equal(VALUE obj1, VALUE obj2)
{
  if (obj1 == obj2) return Qtrue;
  return Qfalse;
}

And let’s check what happens when comparing strings:

// file: string.c

static VALUE
str_eql(const VALUE str1, const VALUE str2)
{
  const long len = RSTRING_LEN(str1);
  const char *ptr1, *ptr2;

  if (len != RSTRING_LEN(str2)) return Qfalse;
  if (!rb_str_comparable(str1, str2)) return Qfalse;
  if ((ptr1 = RSTRING_PTR(str1)) == (ptr2 = RSTRING_PTR(str2)))
    return Qtrue;
  if (memcmp(ptr1, ptr2, len) == 0)
    return Qtrue;
  return Qfalse;
}

For string comparison, Ruby calls the str_eql function. As you can see, after checking the lengths it compares the contents byte by byte with memcmp. So the time complexity of string comparison is O(N): it grows with the string length N. Symbol comparison is always O(1), because it only checks whether both sides are the same object.
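The two code paths can also be observed from Ruby itself, without timing anything. In this sketch, long_a and long_b are hypothetical 1,000-character strings built separately.

```ruby
long_a = "x" * 1_000
long_b = "x" * 1_000

# These are two objects, so String#== has to look at the contents.
puts long_a.equal?(long_b)    # prints false
puts long_a == long_b         # prints true

# Converted to symbols, both names resolve to one shared object,
# so == only needs the identity check.
sym_a = long_a.to_sym
sym_b = long_b.to_sym
puts sym_a.equal?(sym_b)      # prints true
puts sym_a == sym_b           # prints true
```

Keep in mind that to_sym itself has to read the whole string to find the matching symbol, so the O(1) comparison only pays off when you already hold symbols.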

String and Symbol are convertible

The String and Symbol classes both provide methods to convert to each other:

# to_sym method can convert string to symbol
>> "name".to_sym
=> :name

# or the intern method, which is identical to to_sym
>> "name".intern
=> :name

# you can also use literal notation %s
>> %s(name)
=> :name

# to_s can convert symbol to string
>> :name.to_s
=> "name"

# the id2name method does the same thing as to_s
>> :name.id2name
=> "name"
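Putting the conversions together, a round trip looks roughly like this. The variables sym and copy are only for illustration.

```ruby
# Round trip: string -> symbol -> string.
sym = "name".to_sym
puts sym.inspect                     # prints :name
puts sym.to_s                        # prints name
puts sym.to_s.to_sym.equal?(sym)     # prints true

# Strings that are not plain identifiers still convert; inspect adds quotes.
puts "hello world".to_sym.inspect    # prints :"hello world"

# The converted string can be modified without touching the symbol.
copy = sym.to_s.dup
copy << "!"
puts copy                            # prints name!
puts sym.inspect                     # prints :name
```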

When should I use a symbol?

So when should I use strings, and when should I use symbols?

Symbol as the key of Hash

>> profile = { name: "eddie", age: 18 }
=> {:name=>"eddie", :age=>18}

:name and :age are symbols. For more information you can check this post.

Because symbols are immutable and their lookup and comparison are faster than strings, they are very suitable as Hash keys.
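One thing that trips people up with symbol keys, shown as a short sketch: a symbol key and a string key with the same letters are two different keys.

```ruby
profile = { name: "eddie", age: 18 }

# The name: shorthand is the same as writing :name => explicitly.
puts profile == { :name => "eddie", :age => 18 }   # prints true

# Symbol keys and string keys are different keys.
puts profile[:name]            # prints eddie
puts profile["name"].inspect   # prints nil

# fetch raises instead of returning nil when the key is missing.
begin
  profile.fetch("name")
rescue KeyError => e
  puts "KeyError: #{e.message}"
end
```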

String has more powerful and useful methods than Symbol

Although you can convert a symbol to a string whenever you like, a symbol is not a string after all, and it doesn’t have as many methods as String does. So if you want to use those handy methods of the String class, choose String.

And if you want to print something on screen, choose String, because symbols are implicitly converted to strings when you call a printing method such as puts or use string interpolation, which costs some extra method calls and a little performance.
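A small sketch of both points, assuming plain Ruby with no extra libraries loaded (a library could add methods to Symbol). The variable names are made up.

```ruby
sym = :hello

# Interpolation calls to_s on the symbol for you.
puts "#{sym} world"                  # prints hello world

# String-only methods are missing on Symbol.
puts sym.respond_to?(:reverse)       # prints false
puts sym.respond_to?(:gsub)          # prints false

# Convert first, work on the string, convert back if needed.
puts sym.to_s.reverse                            # prints olleh
puts sym.to_s.gsub("l", "L").to_sym.inspect      # prints :heLLo

# Public methods that String has and Symbol lacks.
string_only = String.instance_methods - Symbol.instance_methods
puts string_only.include?(:gsub)     # prints true
```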

Use String or Symbol as parameters?

Check this example:

class Cat
  attr_accessor :name
end

kitty = Cat.new
kitty.name = "Nancy"
puts kitty.name       # => Nancy

It still works if I replace attr_accessor :name with attr_accessor "name".

Some methods take strings as parameters, some take symbols, and some accept both, so how do I know which one to use? The answer is pretty simple: just READ THE FANTASTIC MANUAL! Not sure how to use a method? Look it up in the API manual.
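For what it's worth, here are a few core methods that I know accept either form. The age attribute is something I added for this sketch; it is not part of the example above.

```ruby
class Cat
  attr_accessor "name"        # a string works here
  attr_accessor :age          # and so does a symbol
end

kitty = Cat.new
kitty.name = "Nancy"
kitty.age = 3

# send and respond_to? also take either form.
puts kitty.send(:name)             # prints Nancy
puts kitty.send("age")             # prints 3
puts kitty.respond_to?("name")     # prints true

# Whichever form you passed in, Ruby reports the method names as symbols.
puts Cat.instance_methods(false).sort.inspect   # prints [:age, :age=, :name, :name=]
```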

Conclusion

Symbol is a simple concept, but not so easy to understand. I hope this post helps you know more about symbols in Ruby. Feel free to leave a comment below if you have any questions, or if there’s something wrong in this post.
