* Encountering Ruby

Introduction

In my 30 years in this business I've seen many programming languages come and go, but I've done some recent coding in Ruby and I think it may have (what term shall I use?) staying power. Ruby was first publicly released in 1995, so it is not exactly new, but because it was first released in Japan without any English documentation, it is only now being explored by English speakers.

Ruby was written by Yukihiro Matsumoto, nicknamed "Matz" online, for the purpose of avoiding some of the more glaring defects in other languages. Unlike Perl, Python and C++, in Ruby everything is an object. And unlike Perl, Ruby programs can still be understood by their authors weeks or months later.

Playing in Interactive Mode

On hearing me say "everything is an object" in Ruby, readers may think I'm exaggerating. But:


13.class
=> Fixnum
13.0.class
=> Float
"my text".class
=> String
[ 1,2,3 ].class
=> Array
String.class
=> Class
String.ancestors
=> [String, Enumerable, Comparable, Object, Kernel]
Class.ancestors
=> [Class, Module, Object, Kernel]
1.class.ancestors
=> [Fixnum, Integer, Precision, Numeric, Comparable, Object, Kernel]

(The above output was created using the "irb" Interactive Ruby utility; more on this below.)
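
The same point can be checked without depending on exact class names, which have changed over time: since Ruby 2.4, Fixnum and Bignum are unified into Integer, so 13.class reports Integer on current interpreters. What follows is a minimal sketch, assuming Ruby 2.4 or later; the method name object_checks is made up for illustration.

```ruby
# Hypothetical helper: each entry calls a method directly on a literal,
# a class, or nil, which only works because all of them are objects.
def object_checks
  {
    "13 is an Integer"      => 13.is_a?(Integer),
    "13.0 is a Float"       => 13.0.is_a?(Float),
    "a string is an Object" => "my text".is_a?(Object),
    "String is a Class"     => String.is_a?(Class),
    "nil is an Object"      => nil.is_a?(Object)
  }
end

object_checks.each { |label, result| puts "#{label}: #{result}" }
```

Every line printed should end in "true".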

Matz (or, if you prefer, Matsumoto-san) believes in the Principle of Least Surprise, which can be taken to mean languages should try to act as one would expect, without necessarily consulting a manual, as well as being as consistent as possible. Ruby certainly meets this criterion — to a greater degree than most languages I've learned over three decades, I find Ruby requires minimum time spent poring over instructions, and maximum time spent producing useful, readable code. I think history will judge Ruby kindly, and possibly even identify it as the first computer language that is kind to its users.

Before going on, I ask that my readers check for the presence of Ruby on their systems, and if necessary acquire Ruby from the first of the reference links provided at the bottom of this page. If you are running Linux, you may find that Ruby is already installed, or you can get it from your installation CDs or online using your distribution's package utility.

NOTE: This article assumes you are running Linux. Those with other operating systems will probably be able to follow along in a general way, with some minor adjustments to code and procedures.

To see if you have Ruby available, open a command shell and type:


$ ruby -v
ruby 1.8.2 (2004-12-25) [i386-linux]

NOTE: Don't type the '$' symbol above; that is just my way of identifying a shell interpreter context.

Now type:


$ irb
irb(main):001:0>

Now you are in the Interactive Ruby utility, so you can experiment to your heart's content. Try these experiments:


111111111**2
=> 12345678987654321

That's a string of nine '1's, and '**' means "raised to the power" in Ruby.

Whoa. Isn't "12345678987654321" more than what one should expect from a 32-bit integer? Well, yes, but this is because, when needed, Ruby automatically segues between the Fixnum class and something called "Bignum":


(111111111**2).class
=> Bignum
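
Because the switch to a larger integer representation is automatic, ordinary arithmetic does not silently wrap around. Here is a small sketch, assuming a current Ruby where the class is reported as Integer instead of Bignum; wide_arithmetic is an illustrative name.

```ruby
# Hypothetical helper returning values that would overflow a fixed-width integer.
def wide_arithmetic
  square = 111111111**2          # exceeds 32 bits
  huge   = 2**64                 # exceeds 64 bits
  [square, (huge + 1) - huge]    # the subtraction is still exact
end

square, difference = wide_arithmetic
puts square       # 12345678987654321
puts difference   # 1
```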

Now try this:


111**111
=>
10736201288847422580121456504669550195985072399422480
48047759111756250761957833470224912261700936346214661
03743092986967777786330067310159463303558666910091026
01778558729553962214205731543706973022937535754649410
3400699864397711

Wow. Looking at that example, you may be able to guess why cryptologists love Ruby. Now try this:


0.upto(32) { |n| print "#{2**n}, " }
=>
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
268435456, 536870912, 1073741824, 2147483648, 4294967296,

I will leave additional interactive experiments up to my readers' fertile imaginations. :)

Creating Ruby Executable Files

Now let's create a plain-text file, name it "experiment.rb," make it executable, and put it in any convenient directory. Give it this default content:

#!/usr/bin/ruby -w 
puts "Hello world."

Run it like this (remember you don't type the '$', it simply reminds you to get into a command shell):


$ ./experiment.rb
Hello world.

Now let's create something more useful, a simple factoring program (Program 1):

#!/usr/bin/ruby -w 
if ARGV.length == 0
	puts "Usage: integer to factor"
	exit 0
end
q = ARGV[0].to_i
qd2 = q/2
n=2
prime = true
while n <= qd2
	if q % n == 0
		puts n
		prime = false
	end
	n += 1
end
puts "#{q} is prime." if prime

Just copy this program from your browser's display and paste it into a suitable text file. Now let's test it:


$ ./experiment.rb 1234567
127
9721
$ ./experiment.rb 104729
104729 is prime.
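
Program 1 tests every candidate up to q/2, which is simple but slow for large inputs. One common refinement, shown here as a sketch and not as part of the original program, is to stop at the square root of q and record each divisor together with its partner q/n. The method name proper_divisors is illustrative.

```ruby
# Hypothetical helper: return the divisors of q other than 1 and q itself,
# in ascending order. An empty result for q > 1 means q is prime.
def proper_divisors(q)
  small = []   # divisors up to the square root
  large = []   # their partners above the square root
  n = 2
  while n * n <= q
    if q % n == 0
      small << n
      large.unshift(q / n) unless n * n == q
    end
    n += 1
  end
  small + large
end

puts proper_divisors(1234567).inspect   # [127, 9721]
puts proper_divisors(104729).inspect    # []
```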

Look carefully at the listing for Program 1 above. I am sure you can figure out that ARGV represents an array of user entries passed to the program. But let's test this assumption in the "irb" Interactive Ruby utility:


ARGV.class
=> Array

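I cannot leave this subtopic without mentioning that there is a similar object named "ENV," whose purpose I think you can guess. Strictly speaking it is not an array: it behaves like a hash that maps environment variable names to their values. Try these experiments: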


ENV.length
=> 41 (on my system)
ENV.each { |key,value| puts "#{key} = #{value}" }
(a list of environment variables and values)
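
ENV also responds to hash-style lookups such as fetch, so a single variable can be read with a fallback value. A minimal sketch; env_or_default and the second variable name are illustrative.

```ruby
# Hypothetical helper: look up one environment variable, with a fallback.
def env_or_default(name, default = "(not set)")
  ENV.fetch(name, default)
end

puts env_or_default("HOME")
puts env_or_default("NO_SUCH_VARIABLE_FOR_THIS_DEMO")   # (not set)
```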

I hope you are getting the impression that I intend with these examples — that Ruby does what you would expect in most cases, thus meeting its creator's goal of minimizing surprise.

Here is another program (Program 2), one with a twist. It finds palindromes in a list of words:

#!/usr/bin/ruby -w
# add a new method to the String class
class String
	def isPalindrome(min_length = 5)
		# Match words that:
		# are not words that repeat one character
		self !~ /^(\w)\1*$/i &&
		# are long enough to be interesting
		length >= min_length &&
		# have at least one vowel
		self =~ /[aeiouyw]/i &&
		# and ... ahem ... are a palindrome
		self.downcase == reverse.downcase
	end
end
# find palindromes in a file
def findPalindromes(source,min_len)
	words = 0
	list = Array.new
	File.foreach(source) { |word|
		word.chomp!
		if word.isPalindrome(min_len)
			list << word
		end
		words += 1
	}
	return words, list
end
min_length = 5
words, list = findPalindromes("/usr/share/dict/words",min_length);
puts list.sort.join(", ")
puts "words: #{words}, palindromes: #{list.length}"

Again, copy this program into a suitable text file and run it. If you are not running Linux, you will have to establish a directory path to a dictionary file, but for most Linux users the program will work without any changes.

Now satisfy yourself that it finds palindromes (words that read the same forward and backward, words like "kayak"), then carefully examine the listing. Notice that I seem to be creating a class named "String" near the top of the listing. But in fact, I am adding a custom method named "isPalindrome" to the existing String class. Ruby allows you to add methods to existing classes, including classes that are part of the language itself.
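
Reopening a class works the same way for any method. As a stripped-down sketch of the technique (vowel_count is a made-up method, not something Ruby provides):

```ruby
# Reopen the built-in String class and add one method to it.
class String
  # Hypothetical method: count the vowels a, e, i, o, u in either case.
  def vowel_count
    count("aeiouAEIOU")
  end
end

puts "kayak".vowel_count   # 2
puts "kayak".length        # existing String methods are untouched
```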

Notice that you can tell what is going on in this listing without much effort. Unlike Perl, famous for being a write-only language because of its peculiar syntax, and unlike many other languages that would require many more lines of code to accomplish the same result, Ruby programs show a concise efficiency.

I have a personal test for languages, based on long experience reading, processing and writing files. My test is: How short is the code to get a named file's entire contents into a string? Here's the Ruby code:


str = File.read("/path/file")

That's pretty short. Placing an entire file into a string isn't always the best thing to do, in particular when dealing with huge files, but for lexical analysis (as one example) it is really the best approach. I find I must regularly slurp up (to borrow Larry Wall's favorite expression for this activity) entire Web pages and then process their contents without regard to line breaks.
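
To illustrate processing a slurped file without regard to line breaks, here is a sketch that counts the words in a file. The helper name word_count is illustrative, and the temporary file exists only to make the example self-contained (Tempfile.create with a block assumes Ruby 2.1 or later).

```ruby
require "tempfile"

# Hypothetical helper: read a whole file into one string and count its
# words, ignoring where the line breaks fall.
def word_count(path)
  File.read(path).split.length
end

Tempfile.create("slurp") do |file|
  file.write("one two\nthree\n\nfour five\n")
  file.flush
  puts word_count(file.path)   # 5
end
```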

Data Filter

Now for a program (Program 3) that processes a classic data source — records delimited by linefeeds and fields delimited by tabs — and creates a Web page out of the table. Here is the code:

#!/usr/bin/ruby -w
# filter: TSV data to HTML table
class TsvToHTML
	@@styleBlock = <<-ENDMARK
	<style type='text/css'>
	td {
		border-left:1px solid #000000;
		padding-right:4px;
		padding-left:4px;
		white-space: nowrap;
	}
	.cellTitle {
		border-bottom:1px solid #000000;
		background:#ffffe0;
		font-weight: bold;
		text-align: center;
	}
	.cell0 { background:#eff1f1; }
	.cell1 { background:#f8f8f8; }
	</style>
	ENDMARK
	def TsvToHTML::wrapTag(data,tag,modifier = "")
		return "<#{tag} #{modifier}>" + data + "</#{tag}>\n"
	end # wrapTag
	def TsvToHTML::makePage(source)	
		page = ""
		rowNum = 0
		source.readlines.each { |record|
			row = ""
			record.chomp.split("\t").each { |field|
				# replace blank fields with &nbsp;
				field.sub!(/^$/,"&nbsp;")
				# wrap in TD tag, specify style
				row += wrapTag(field,"td","class=\"" +
				((rowNum == 0)?"cellTitle":"cell#{rowNum % 2}") +
				"\"")
			}
			rowNum += 1
			# wrap in TR tag, add row to page
			page += wrapTag(row,"tr") + "\n"
		}
		# finish page formatting
		[ [ "table","cellpadding=0 cellspacing=0 border=0" ], "body","html" ].each { |tag|
			page = wrapTag(@@styleBlock,"head") + page if tag == "html"
			page = wrapTag(page,*tag)
		}
		return page
	end # makePage
end # class
# stdin -> convert -> stdout
print TsvToHTML.makePage(STDIN)

This listing would be a lot shorter were it not for the style block at the top, necessary to make the HTML page look the way I want. But even so, for what it accomplishes, this is a very short listing.
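
One detail worth knowing when adapting Program 3: String#split drops trailing empty fields by default, so a record that ends in blank columns produces fewer cells than its neighbors. Passing a negative limit keeps them. The following is a sketch of that variation, with tsv_fields as an illustrative name.

```ruby
# Hypothetical helper: split one TSV record into fields, keeping trailing
# blanks and substituting &nbsp; for empty fields as Program 3 does.
def tsv_fields(record)
  record.chomp.split("\t", -1).map { |field| field.empty? ? "&nbsp;" : field }
end

record = "Title\t\tPage\t\t\n"
puts record.chomp.split("\t").length   # 3: trailing blanks dropped
puts tsv_fields(record).length         # 5: all fields kept
```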

This program uses some very handy Ruby-specific features. For example, in this code:


		# finish page formatting
		[ [ "table","cellpadding=0 cellspacing=0 border=0" ], "body","html" ].each { |tag|
			page = wrapTag(@@styleBlock,"head") + page if tag == "html"
			page = wrapTag(page,*tag)
		}

Notice that one of the array's elements is itself an array. And if you examine the listing carefully, you will see that each element is passed to "wrapTag" as "*tag". In Ruby, an asterisk placed in front of an argument in a method call (the "splat" operator) means that, if the argument is an array, it is expanded into individual arguments. You will also notice that the method "wrapTag" accepts an optional third argument named "modifier."

In essence this means my having provided an array instead of a scalar causes a third argument to be supplied to "wrapTag," which is an easy way to specify optional HTML tag arguments.
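
The effect of the asterisk can be seen in isolation with a simplified stand-in for "wrapTag". This is a sketch for illustration only; wrap_tag below is not the method from Program 3, and it assumes a Ruby version in which splatting a plain string passes it through as a single argument.

```ruby
# Simplified, hypothetical stand-in for wrapTag in Program 3.
def wrap_tag(data, tag, modifier = nil)
  opening = modifier ? "<#{tag} #{modifier}>" : "<#{tag}>"
  opening + data + "</#{tag}>"
end

# An array is expanded: the same as wrap_tag("x", "table", "border=0").
puts wrap_tag("x", *["table", "border=0"])   # <table border=0>x</table>
# A plain string is passed as one argument, so modifier keeps its default.
puts wrap_tag("x", *"body")                  # <body>x</body>
```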

The last line in the program makes this a "filter," a program that receives its data from standard input and that pipes its result to standard output. To use it as written, one would do this:


$ ./experiment.rb < data.tsv > output.html

If the file "data.tsv" is a database with tabs separating fields and linefeeds separating records, the resulting file "output.html" will be a rather attractive listing of the table's contents. Here's an example of the output:

Title                    Comment                                             Page  Pocket  pk
Alien Versus Predator 1  Sanaa Lathan, Raoul Bova, Lance Henriksen           1     1       1
Alien Versus Predator 2  Sanaa Lathan, Raoul Bova, Lance Henriksen           1     2       2
Aviator 1                Leonardo DiCaprio, Cate Blanchett, Kate Beckinsale  1     3       3
Aviator 2                Leonardo DiCaprio, Cate Blanchett, Kate Beckinsale  1     4       4
Batman Begins            Christian Bale, Michael Caine, Liam Neeson          2     1       5
Beyond The Sea           Kevin Spacey, Kate Bosworth                         2     2       6

I created this filter because, notwithstanding my having written one of the better-known free HTML editors (Arachnophilia), I am rather tired of writing HTML pages, and one of my recent goals in life is to avoid writing HTML by hand, automating page creation whenever possible. I was able to create and test this Ruby page generator in a matter of hours.

Tree Search

Under Linux, there are utilities to search for files by name (find), or by content (grep). As it turns out, if you want to locate files that have both particular names and particular contents, either find or grep can be used, but in neither case is the program suited to the task, and they are difficult to manage when used this way.

Because of languages like Ruby, these older utilities seem dated. Indeed, their presence and behavior remind programmers of the older, hierarchical structure of computer programming, where an end-user's requirements were met by someone wearing a white lab coat and living behind glass.

When I want to search for files based on both name and content, I would prefer not to use a program that has a staggering number of command-line options, seemingly written by someone intent on creating a miraculous glove meant to fit any conceivable hand:

Program    Command-line options
grep       47
find       89

Having tried to use both find and grep for something that neither is particularly good at, daunted by trying to remember the meaning of all those command-line options, and recently inspired by learning Ruby, I decided to write something a little easier to use. Here it is:


#!/usr/bin/ruby -w 
require "find"
if(ARGV.length == 0)
	puts "usage: path -f (fn filter) -s (search str) -h (strip HTML) -v (rev. match)"
	exit 0
end
# default values
path = "." # search path
pathSet = false # true once a path argument is seen
fnFilter = ".*" # filename filter
search = ".*" # search contents for this
reverse = false # list only no-match files
striptags = false # strip HTML tags before match
oldarg = ""
# process command-line args
ARGV.each { |arg|
	if oldarg == "-f" # filename filter
		fnFilter = arg
	elsif oldarg == "-s" # search string
		search = arg
	elsif arg == "-v" # reverse search
		reverse = true
	elsif arg == "-h" # strip HTML tags
		striptags = true
	elsif !pathSet && FileTest.exist?(arg) # path
		path = arg
		pathSet = true
	end
	oldarg = arg
}
files = 0
matches = 0
begin
	Find.find(path) { |fn|
		# if fn is a file and if it matches filter
		if FileTest.file?(fn) && fn =~ /#{fnFilter}/
			files += 1
			# read the file
			data = File.read(fn)
			# strip HTML tags
			if striptags
				data.gsub!(/<.*?>/,"")
			end
			# look for search string
			r = (data =~ /#{search}/i)?true:false
			# test for reverse logic
			if r != reverse
				puts fn
				matches += 1
			end
		end
	}
	puts "Files #{files}, Matches #{matches}"
rescue StandardError
	puts "Error: #{$!}"
end

Granted this program's limitations, it has the advantage that, if I forget how it works, I can look at its plain-text source, and I can change how it behaves in a flash. It also has the advantage that, if I should lose the source, it might take 20 minutes to recreate it.
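
The heart of the program, matching on both name and content, fits in a few lines. The sketch below isolates that step in a method so it can be tried on a throwaway directory; find_matches and the sample file names are illustrative, and File.write assumes Ruby 1.9.3 or later.

```ruby
require "find"
require "tmpdir"

# Hypothetical helper: return the files under path whose names match
# name_re and whose contents match content_re.
def find_matches(path, name_re, content_re)
  hits = []
  Find.find(path) do |fn|
    next unless File.file?(fn) && fn =~ name_re
    hits << fn if File.read(fn) =~ content_re
  end
  hits.sort
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.html"), "<p>Ruby</p>")
  File.write(File.join(dir, "b.html"), "<p>Perl</p>")
  File.write(File.join(dir, "c.txt"), "Ruby")
  puts find_matches(dir, /\.html$/, /ruby/i).map { |fn| File.basename(fn) }   # a.html
end
```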

I can't leave this example without explaining that it touches on one of my favorite issues: the gradual breakdown of the distinction between writing programs and using programs. Ruby makes it so easy to create task-specific code that I think much of the archive of pre-written utilities may in time wither away from lack of use.

Read, Edit, Write

This represents another task I find myself doing again and again. I sometimes must search a tree of files and edit each of them in the same way — many files, one change. I've used Perl for this in the past, but the Ruby code is much cleaner and easier to understand and modify:


#!/usr/bin/ruby -w
require "find"
path="/path/to/files"
Find.find(path) { |f|
	if f =~ /\.html?$/ && FileTest.file?(f) # if it's an HTML page
		data = File.read(f) # read the file
		# put the desired regexp here
		changed = data.gsub(/find/,"replace")
		# write only changed files
		if(changed != data)
			# the block form closes the file after writing
			File.open(f,"w") { |file| file.write(changed) }
		end
	end
}

People who think about using programs like this one should be aware that it can wipe out an entire directory tree of files if given an ill-constructed regular expression, so always make a backup of the files being processed.
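
One way to build that safety net into the program itself is to copy each file before rewriting it. This is a sketch under assumptions of my choosing: the helper name edit_with_backup and the ".bak" suffix are illustrative, and an existing backup of the same name would be overwritten.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: apply one substitution to a file, saving the original
# as "<name>.bak" first. Returns true if the file was changed.
def edit_with_backup(path, pattern, replacement)
  data = File.read(path)
  changed = data.gsub(pattern, replacement)
  return false if changed == data          # nothing to do, no backup made
  FileUtils.cp(path, path + ".bak")        # keep the original
  File.open(path, "w") { |file| file.write(changed) }
  true
end

Dir.mktmpdir do |dir|
  page = File.join(dir, "index.html")
  File.write(page, "<p>find me</p>")
  puts edit_with_backup(page, /find/, "replace")   # true
  puts File.read(page)                             # <p>replace me</p>
  puts File.read(page + ".bak")                    # <p>find me</p>
end
```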

Network Examples

Now for a program (Program 4) that can read Web pages and other content from the Internet. Here is the listing:


#!/usr/bin/ruby -w 
require 'net/http'
resp = Net::HTTP.new('www.google.com', 80).get('/',nil )
puts "Code = #{resp.code}, Message = #{resp.message}"
resp.each { |key, val| printf "%-18s = %-40.40s\n", key, val }
puts "Data length: #{resp.body.length}"
print "Show data (y/n):"
reply = STDIN.readline.chomp
if reply =~ /[Yy]/
	puts resp.body
end

If you run this program, you will see something like this:


Code = 200, Message = OK
cache-control      = private
date               = Thu, 29 Dec 2005 03:17:30 GMT
content-type       = text/html
server             = GWS/2.1
set-cookie         = PREF=ID=05b7af4fc753f4ec:TM=1135826250:L
transfer-encoding  = chunked
Data length: 2518
Show data (y/n):

Now that's a small program with a big result! In fact, if you look at the listing carefully, you'll realize that all the lines past this line —


resp = Net::HTTP.new('www.google.com', 80).get('/',nil )

— are merely ways to process the data acquired in that one line.

I have used this program to quickly scan through long lists of off-site URLs to verify that they are valid, or, if not, why not. This is a big improvement over the old way of verifying URLs, which I will leave up to the reader's imagination.
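
A link checker along those lines might look like the following sketch. It is not the program I described above, only an illustration: the helper names url_parts and check_url are made up, a HEAD request is used so that no page body is downloaded, and the keyword-style :use_ssl option assumes Ruby 1.9.3 or later.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: split a URL into the pieces Net::HTTP needs.
def url_parts(url)
  uri = URI.parse(url)
  path = uri.path.to_s.empty? ? "/" : uri.path
  [uri.host, uri.port, path, uri.scheme == "https"]
end

# Hypothetical helper: return "code message" for a URL, or the error text
# if the request could not be completed.
def check_url(url)
  host, port, path, secure = url_parts(url)
  Net::HTTP.start(host, port, :use_ssl => secure) do |http|
    resp = http.head(path)
    "#{resp.code} #{resp.message}"
  end
rescue StandardError => e
  "failed: #{e.message}"
end

# Usage (needs network access):
#   puts check_url("http://www.example.com/")
```

Calling check_url on each entry of a list gives either the server's status line or the reason the request failed.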

Moving right along, here is a listing (Program 5) that represents a complete HTTP server:


#!/usr/bin/env ruby
# this HTTP server logs its actions to stdout
# on Ruby 3.0 and later WEBrick is no longer bundled: gem install webrick
require 'webrick'
include WEBrick
# Create server
server = HTTPServer.new(
  :Port            => 8000,
  :DocumentRoot    => "/path/DocumentRootDirectory"
)
# When the server gets a control-C, kill it
trap("INT"){ server.shutdown }
# Start the server
server.start

You may not believe this, but if you run the program in listing 5 (editing the listing to point to a directory with one or more Web pages), then run your favorite browser like this —


firefox localhost:8000

— this mini-server will supply the browser with your Web pages, and it will also print a running account of what it is doing. This is a terrific way to discover what HTTP servers do behind the scenes.

Conclusion

This is meant to be a quick introduction to Ruby, and I hope it encourages my readers to familiarize themselves with this terrific language. Please click some of the links in the resources section below to get a better grasp of Ruby.

I must say, having started out writing assembly-language programs for the Apple II in the mid-1970s and having sampled most of the newer languages since then, I am amazed at the speed with which I can create working code using Ruby. I am also amazed at the number of programs I've written over the years that I cannot even run any more, including an embarrassing number of comparatively recent Java programs that are no longer compatible with current Java runtime engines, something that should not have happened, at least not so quickly.

My point? I think it's unavoidable that programs will go out of date, either because they can no longer be compiled or run, or because the problem they were meant to address has disappeared, or both. Because of this reality, I think it's a good idea to minimize the effort required to write computer programs, since history tells us they will have to be written over again. In my view, Ruby represents a big step toward that goal.

I am not recommending Ruby as an across-the-board replacement for such core languages as C++ (at the time of writing Ruby is interpreted, not compiled). I do think it is a much better choice than either Java or Perl, for the kinds of projects these languages are applied to.

I also think Ruby speaks for itself, eloquently. It creates a nicer impression when used than any article can produce, including this one. :)


References
