Who's the idiot disabling SSL on posterous.com?

Has he been fired yet? If not, why not?

30 Years

Things that happened in the 30 years I've been around on this ball o' mud, in roughly chronological order:

  • The fall of the Berlin Wall, and the end of the Cold War
  • The end of border checks within Europe
  • Two expansions of the European Union
  • The Internet moving from curiosity to being ubiquitous, and grander than anything cyber-prefixed could ever have been
  • 9/11
  • The Euro
  • Germany’s first involvement in military action as a combatant since WW2 in the Bosnia War
  • Mobile phones becoming as ubiquitous as the Internet, if not more so
  • The rise and fall of EPOC/Palm
  • 9 versions of the Windows OS (From 3.11 to NT to Windows 7)
  • Linux becoming less picky about who its friends are (thanks, Canonical!)
  • The rise of the Chinese Dragon
  • Making friends all over the world (Hi, folks!)
  • Live shows on the internet that I watch every day (Hi, TWiT.TV!)

A happy birthday indeed.

Lazy Ruby Cryptography

What is Cryptography?

Cryptography is the art of making clear text illegible to anyone but the intended recipients. It is an art that is at least as old as the Roman Republic, when Julius Caesar used the Caesar cipher to encrypt orders to his centurions in the Gallic Wars.

Since then, cryptography has evolved into a science unto itself, which helped kickstart computer science (and thus programming) during World War 2.

I encourage you to discover the history of cryptography. It makes for a good techno-thriller with lots of cloak-and-dagger.

The Two Areas of Cryptography

Cryptographic algorithms are used in two main areas: validation and encryption. Validation algorithms, or message/hash digests, provide a means to verify that data has not been modified. A rather mundane use of this verification is checking the hash fingerprint of a file after a download, to make sure that the file wasn't corrupted during the transfer.

Encryption and decryption are used to prevent a third party from reading the message. Online banking or e-commerce websites use this to secure the communication between your browser and their servers when you send payment details.

Get It

You don’t have to install anything, since Ruby includes OpenSSL bindings.

Using Hashes

Important Upfront Note

Unless you have to deal with already existing data, do not use MD5. It is considered broken.

Hashing Data

Consider the following snippet:

require 'digest/sha2'

sha256 = Digest::SHA2.new(256)
sha256.digest("data to be hashed")

After requiring the SHA2 hashing algorithm, we first have to initialize an SHA2 object with the desired digest length (Ruby's Digest::SHA2 supports 256, 384, and 512 bits) before we can digest data.

The digest length is important:

sha2_256 = Digest::SHA2.new(256)
sha2_512 = Digest::SHA2.new(512)

sha512_hashed = sha2_512.digest("super secret password")
sha256_hashed = sha2_256.digest("super secret password")

puts sha256_hashed == sha512_hashed

If you run this, you will see "false". This is obvious enough, since the digest lengths differ, but it is a gotcha. Make sure you know which algorithm was used to hash the data, otherwise you'll get false negatives.
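To see why the two results can never match, compare their sizes. This sketch also uses hexdigest, which returns the same value as printable hex, and the Digest::SHA256 class that digest/sha2 defines as a shortcut for an SHA2 object with a 256-bit digest:

```ruby
require 'digest/sha2'

password = "super secret password"

sha256_hashed = Digest::SHA2.new(256).digest(password)
sha512_hashed = Digest::SHA2.new(512).digest(password)

# digest returns raw bytes: 32 for SHA-256, 64 for SHA-512.
puts sha256_hashed.bytesize  # => 32
puts sha512_hashed.bytesize  # => 64

# hexdigest returns the value as hex text, two characters per byte.
hex = Digest::SHA256.hexdigest(password)
puts hex.length  # => 64

# Digest::SHA256 gives the same result as Digest::SHA2.new(256).
puts Digest::SHA2.new(256).hexdigest(password) == hex  # => true
```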

The MD5 and SHA1 digests work similarly, but you don’t have to instantiate a new object:

require 'digest/md5'
require 'digest/sha1'

Digest::MD5.digest("super secret password")
Digest::SHA1.digest("super secret password")
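Back to the download example from the start: a minimal sketch, assuming the site publishes a hex checksum next to the file. The download_intact? helper is made up for this example; Digest::SHA256.file and Digest::MD5.file hash a file's contents without reading it into a string yourself. Note how picking the wrong algorithm produces exactly the false negative described above:

```ruby
require 'digest/md5'
require 'digest/sha2'
require 'tempfile'

# Compare a file's fingerprint with a published hex checksum.
def download_intact?(path, published, algorithm: Digest::SHA256)
  algorithm.file(path).hexdigest == published.strip.downcase
end

Tempfile.create("download") do |file|
  file.write("abc")  # stand-in for the downloaded file's contents
  file.flush

  sha256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
  md5    = "900150983cd24fb0d6963f7d28e17f72"  # legacy downloads only

  puts download_intact?(file.path, sha256)                       # => true
  puts download_intact?(file.path, md5, algorithm: Digest::MD5)  # => true
  puts download_intact?(file.path, md5)                          # => false: wrong algorithm
end
```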

Symmetric Cryptography with AES

Symmetric cryptography, or symmetric-key cryptography, means that data is encrypted and decrypted with the same key. That makes it much easier to work with, but it also means that not only the encrypted data but also the key used to encrypt it has to be exchanged somehow, and securely. Public/private key (or asymmetric) cryptography avoids this problem: one key is used to encrypt data, another is used to decrypt it. (In principle, one key can be derived from the other, but for properly sized keys there are no known, working attacks of this sort.)

Public/Private key encryption deserves its own tutorial, so we will only deal with symmetric cryptography this time.

Let us look at some source code, where we encrypt, and then immediately decrypt clear text:

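require 'openssl'
require 'digest/sha2'

sha2_256 = Digest::SHA2.new(256)

# ECB mode takes no IV; see Enhanced Security below.
aes = OpenSSL::Cipher.new("AES-256-ECB")
key = sha2_256.digest("super secret password")  # 32 bytes = 256 bits

aes.encrypt
aes.key = key
payload = aes.update("A very secret message.") + aes.final
puts payload  # raw ciphertext bytes, so expect gibberish

aes.reset
aes.decrypt
aes.key = key
puts aes.update(payload) + aes.final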

This looks more complex than it really is.

First, we load the OpenSSL bindings, which Ruby uses to do encryption.

Then, we instantiate a new AES cipher object. If you want to know which algorithms Ruby supports, OpenSSL::Cipher.ciphers will tell you.

Now that we have an AES object, we need a key to encrypt data. AES-256 expects an encryption key that is exactly 256 bits (32 bytes) long (the key size is determined by which AES variant you use). To get a key of exactly that length from a passphrase of any length, we hash the passphrase with SHA2-256. Be aware that hashing does not make a weak passphrase any harder to guess.
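Plain SHA-256 gets the key to the right length, but an attacker can test guesses against it very quickly. A key-derivation function such as PBKDF2 mixes in a random salt and makes every guess deliberately slow. Ruby's openssl library exposes it as OpenSSL::PKCS5.pbkdf2_hmac. A minimal sketch; the 16-byte salt and 200,000 iterations are illustrative assumptions, not tuned recommendations:

```ruby
require 'openssl'

passphrase = "super secret password"

# The salt is random but not secret: store it next to the ciphertext so the
# same key can be derived again when decrypting.
salt = OpenSSL::Random.random_bytes(16)

# Illustrative iteration count; more iterations make every guess slower.
iterations = 200_000

key = OpenSSL::PKCS5.pbkdf2_hmac(passphrase, salt, iterations, 32,
                                 OpenSSL::Digest.new("SHA256"))

puts key.bytesize  # => 32, exactly the key size AES-256 expects
```

The same passphrase, salt, and iteration count always yield the same key, which is what makes decryption possible later.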

The next step is to call the encrypt method, which puts the cipher into encryption mode (decrypt, unsurprisingly, puts it into decryption mode). One of the two must be called before any other method, such as setting the key!

Finally, we encrypt our data by calling

payload = aes.update("A very secret message.") + aes.final

The important part of this is the aes.final call. It returns the last block of encrypted data, plus any padding AES needs, and we append it to our payload. How much padding is needed depends on the algorithm's block size: AES encrypts data in chunks of 128 bits (16 bytes), more or less one block at a time, so the last block is padded up to a full 16 bytes.
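To see the padding at work, compare the sizes of the clear text and the ciphertext. This sketch uses a random key from aes.random_key instead of the hashed passphrase, purely to keep it short:

```ruby
require 'openssl'

aes = OpenSSL::Cipher.new("AES-256-ECB")
aes.encrypt
aes.random_key  # generates a random 32-byte key and sets it on the cipher

message = "A very secret message."
payload = aes.update(message) + aes.final

puts aes.block_size    # => 16 (bytes, i.e. 128 bits)
puts message.bytesize  # => 22
puts payload.bytesize  # => 32: padded up to two full blocks
```

A message that is already a multiple of 16 bytes long still gets a full extra block of padding, so the ciphertext is always somewhat longer than the clear text.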

To decrypt the data, we reset our AES object, and call decrypt on it. The decryption process works analogously to the encryption. It is symmetric, if you allow me such a lazy pun.

Enhanced Security

If you use the same key to encrypt data several times, you are subverting the security that encryption provides. Sooner or later, parts of the message will repeat themselves and look the same even when encrypted (this is one of the vectors that Alan Turing et al. used at Bletchley Park to break the Kriegsmarine Enigma system).

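Thus, almost all modern cipher modes use an Initialization Vector, or IV. (ECB, the mode used above, is the exception: it takes no IV, which is exactly why identical blocks encrypt identically. A mode such as AES-256-CBC does take one.) Ruby cipher objects have the iv= method, which allows you to feed an IV into your cryptography. Make sure that the IV comes from a cryptographically secure random source, such as the cipher's random_iv method or Ruby's SecureRandom, not from rand, which is predictable. And make sure it does not repeat itself.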
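Here is a minimal sketch of the earlier example switched to AES-256-CBC with a fresh IV per message. The IV does not need to be secret, so this sketch simply prepends it to the ciphertext; that layout, and the method names encrypt_message and decrypt_message, are my own assumptions rather than anything Ruby prescribes:

```ruby
require 'openssl'

# Encrypt with a fresh random IV and return IV + ciphertext.
def encrypt_message(key, plaintext)
  cipher = OpenSSL::Cipher.new("AES-256-CBC")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv  # secure random IV, 16 bytes for AES
  iv + cipher.update(plaintext) + cipher.final
end

# Split the IV off the front again and decrypt the rest.
def decrypt_message(key, data)
  cipher = OpenSSL::Cipher.new("AES-256-CBC")
  cipher.decrypt
  cipher.key = key
  cipher.iv = data[0, cipher.iv_len]
  cipher.update(data[cipher.iv_len..-1]) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)
first  = encrypt_message(key, "A very secret message.")
second = encrypt_message(key, "A very secret message.")

puts first == second               # => false: same text, different IVs
puts decrypt_message(key, first)   # => A very secret message.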

Do Not Use DES

The DES algorithm has been broken: its 56-bit key is short enough to be found by brute force.

Lazy Ruby Exceptions

What Are Exceptions?

Simply put, an exception is an error.

Less simply put: an exception signals that the program has reached a state that is, for lack of a better word, undefined, or that cannot safely be handled at the point where it occurred.

Get It

Not needed: Ruby has them out of the box.

How do You Use Exceptions?

First things first: Exceptions are not control statements. If you rely on Exceptions to determine what your software is supposed to do, something’s awry.

Use exceptions for:

  • Program state (awfully close to program flow, but I’ll explain later)
  • Resource access
  • Unrecoverable errors

Exception Statements in Ruby

Ruby’s exception syntax is simple, yet powerful:

begin
  # Awesomeness
rescue
  # Recover from an Exception
ensure
  # Clean up after ourselves
end

Two statements in this example are of particular importance:

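rescue can take one or more exception classes (for example: SyntaxError, LoadError, IOError, etc.) as arguments. The hash rocket => is used to get a reference to the exception object. Without any class, rescue only catches StandardError and its subclasses, so SyntaxError and LoadError, for example, have to be named explicitly.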

ensure is an optional clause, and gets executed no matter what. The code following ensure is, quite literally, ensured to run.

Here’s a toy example:

begin
  raise "I'm an exception"
rescue
  puts "In the rescue clause..."
  puts "I give up!"
  error = "Cannot recover from this!"
ensure
  puts "Ensured to run."
  puts error
end

raise triggers Ruby's exception handling whenever you need it. Given just a string, as above, it raises a RuntimeError with that message.
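The toy example uses a bare rescue. Here is a sketch that names the exception classes it expects and uses => to get at the exception object. parse_port is a made-up helper; Integer() raises ArgumentError for text that isn't a number and TypeError for nil:

```ruby
# Made-up helper: turn user input into a port number, falling back to a
# default when the input is unusable.
def parse_port(value, default = 8080)
  begin
    port = Integer(value)
  rescue ArgumentError, TypeError => error
    # error is the exception object: class, message, and backtrace.
    warn "Invalid port #{value.inspect} (#{error.class}: #{error.message})"
    port = default
  ensure
    puts "Finished parsing #{value.inspect}"
  end
  port
end

puts parse_port("3000")  # prints 3000
puts parse_port("abc")   # warns, then prints 8080
puts parse_port(nil)     # warns, then prints 8080
```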

Ensuring Sanity

ensure is useful when you gain access to a resource that should not be left open or lying around. A good example would be a database connection: instead of leaving an unused database connection around until the database server decides that nobody's going to use this connection, it gets closed. A good rule of thumb for using ensure is: Am I doing input or output? If so, ensure that my IO access is properly closed.

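Eventually the resource will be cleaned up elsewhere (by the garbage collector, the operating system, or a database server timing out the connection), but you cannot rely on that, nor is it very neighborly if we don't clean up after ourselves.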
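A sketch of the database example. A def body can carry its own ensure clause without an explicit begin. FakeConnection is a hypothetical stand-in, not a real database driver; only its open/closed bookkeeping matters:

```ruby
# Hypothetical stand-in for a database connection.
class FakeConnection
  def initialize
    @open = true
  end

  def open?
    @open
  end

  def query(sql)
    raise IOError, "connection lost" if sql.include?("FAIL")
    "rows for #{sql}"
  end

  def close
    @open = false
  end
end

# Run one query and always close the connection afterwards.
def run_query(connection, sql)
  connection.query(sql)
ensure
  connection.close  # runs after a normal return and after an exception
end

conn = FakeConnection.new
begin
  run_query(conn, "SELECT FAIL")
rescue IOError => error
  puts "Query failed: #{error.message}"  # Query failed: connection lost
end
puts conn.open?  # => false
```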

A Note on Blocks

Ruby provides block forms for IO methods, like File.open. These blocks have exception handling built in: if you open a file or network socket in a block, Ruby will clean up behind you when the block ends, even if an exception is raised. So, whenever you can, use the block form to access IO (it is a much more natural and readable style, too, in my experience).
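The block form in action: even when the block raises, the file is closed by the time the exception reaches us. The file name is arbitrary:

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "lazy_ruby_block_form.txt")

# File.open with a block closes the file when the block ends.
File.open(path, "w") { |file| file.puts "Hello from the block form" }

handle = nil
begin
  File.open(path) do |file|
    handle = file
    raise "something went wrong while reading"
  end
rescue RuntimeError => error
  puts "Rescued: #{error.message}"
end

puts handle.closed?   # => true, even though the block raised
puts File.read(path)  # File.read opens, reads, and closes in one call
```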

Raising and Creating Exceptions

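Ruby provides an Exception class from which all other exceptions are derived. For your own exceptions, I suggest deriving from its subclass StandardError: a bare rescue only catches StandardError and its subclasses, so an exception derived directly from Exception slips past it.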

The base class provides you with backtrace information (where and how was the exception raised?) at no extra work, which is very useful:

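class CustomException < StandardError
end

begin
  raise CustomException.new("Error message")
rescue CustomException => error
  puts error.message          # Error message
  puts error.backtrace.first  # file and line where raise was called
end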

And as you can see, you can treat an exception like any other object in Ruby with its own constructor.
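Why the base class matters: a bare rescue only catches StandardError and its subclasses. This sketch, with two made-up exception classes, shows the difference:

```ruby
# Two made-up exception classes for an import step.
class ImportFailed < StandardError; end
class RawImportFailed < Exception; end

# Returns true if a bare rescue catches the given exception class.
# The outer rescue Exception exists only for this demonstration; in real
# code it would also swallow things like Interrupt (Ctrl-C).
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, "import failed"
  rescue
    true
  end
rescue Exception
  false
end

puts caught_by_bare_rescue?(ImportFailed)     # => true
puts caught_by_bare_rescue?(RawImportFailed)  # => false
```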

Exceptions For Program State

Let's say you have a nifty Ruby program which interfaces with databases. You have wrappers for all the different database engines out there, from SQLite to Oracle, but with little differences here and there (like DB connections, file access, &c.).

If you don’t yet know which databases are available, you can try to probe for their availability, and if you encounter an exception, you can cross this engine off your list of database wrappers to load.

Another example is checking for the availability of gems that are useful, but not necessary for your own program. For example, if you rescue the LoadError raised when require cannot find term-ansicolor, you can still proceed with your program, just without pretty colours.
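A sketch of that optional-gem pattern. It assumes term-ansicolor's Term::ANSIColor module, whose color methods (like green) wrap a string in ANSI escape codes; if the gem is not installed, the text is printed plain:

```ruby
# Load the optional term-ansicolor gem if it is installed.
begin
  require 'term/ansicolor'
  PAINTER = Object.new.extend(Term::ANSIColor)
rescue LoadError
  PAINTER = nil  # gem missing: carry on without pretty colours
end

def highlight(text)
  PAINTER ? PAINTER.green(text) : text
end

puts highlight("Import finished")
```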

And, of course, a more mundane use of exceptions and program state is saving a local copy of data when a remote resource becomes unavailable.
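A sketch of that last case. The upload callable stands in for whatever talks to the remote resource (an assumption of this example), and the outbox file location is made up. Which exception classes to rescue depends on your network library; SystemCallError covers errors like Errno::ECONNREFUSED:

```ruby
require 'json'
require 'tmpdir'

# Made-up location for records we could not deliver.
OUTBOX_PATH = File.join(Dir.tmpdir, "lazy_ruby_outbox.jsonl")

# upload is any callable that sends a record to the remote service and
# raises a system-level error when the service is down.
def deliver(record, upload)
  upload.call(record)
  :remote
rescue SystemCallError, IOError => error
  # The remote side is unavailable: keep a local copy to send later.
  File.open(OUTBOX_PATH, "a") { |file| file.puts(JSON.generate(record)) }
  warn "Remote unavailable (#{error.class}), saved locally"
  :local
end

online  = ->(record) { true }
offline = ->(record) { raise Errno::ECONNREFUSED }

puts deliver({ "id" => 1 }, online)   # => remote
puts deliver({ "id" => 2 }, offline)  # => local
```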

Do Not Swallow Exceptions

Exceptions are something the user has to be informed of when they happen (unless you can silently recover, but even then a message is appropriate). Even if you can clean up after yourself, make sure the underlying exception is propagated and kept intact! That you can recover from a specific exception doesn't mean that everything else can. It's all about being a good neighbor.
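A sketch of cleaning up and still passing the problem on: inside a rescue clause, a bare raise re-raises the exception currently being handled, with its class, message, and backtrace intact. import_document is a hypothetical example that records its steps in a log array:

```ruby
# Hypothetical import step: undo our own work, then re-raise.
def import_document(name, log)
  log << "importing #{name}"
  raise ArgumentError, "#{name} has no metadata" if name.end_with?(".tmp")
  log << "imported #{name}"
rescue ArgumentError
  log << "rolled back #{name}"
  raise  # re-raises the same exception object
end

log = []
caught = nil
begin
  import_document("scan.tmp", log)
rescue ArgumentError => error
  caught = error
end

puts caught.message  # => scan.tmp has no metadata
p log                # => ["importing scan.tmp", "rolled back scan.tmp"]
```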

Further Reading

Avdi Grimm has written a whole book on Ruby exceptions: Exceptional Ruby: Master the art of handling failure in Ruby. I’ve seen rave reviews of the book, so I recommend checking it out if you have the cash to spare.

Lazy PDF creation with Prawn: A Tutorial In One Part

What is Prawn?

Prawn is a pure Ruby library to generate PDFs. It takes much of the pain away. (A publishing tool like Scribus or Adobe InDesign CS5.5 makes a more visual approach much easier, though.) However, to create PDFs on the fly, Prawn is the most convenient tool you can find.

Get It

gem install prawn

Use It

require 'prawn'

Creating PDFs

No matter what the PDF shall contain, you will usually start with the Prawn::Document.generate method. I prefer the block invocation:

Prawn::Document.generate "example.pdf" do
  # content goes here
end

Sample Content

I want to show off as many of Prawn's features as possible, so I've prepared some sample data:

# Our headings
heading = "The Traditional Filler Content"
sub_heading = "It Has No Inherent Meaning"

# Our body
body = <<-EOS
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
EOS

formatted = "<b>Inline</b> <i>formatting</i> is a <strikethrough>useless</strikethrough> <u>useful</u> feature!"

Simple Text And Inline Formatting

PDF is, first and foremost, a text format. Everything else is just sugar (or a liability, as the case may be). Since an introduction to everything Prawn (and PDFs) have to offer would be far from a Lazy Tutorial, we will deal with outputting text as PDF.

Simple Text…

Let's take our sample text, and create a PDF with a large heading, a smaller sub-heading, and a creation date as a cover page, and let's fill the PDF with a bit of dummy text:

Prawn::Document.generate "example.pdf" do
  text heading, :align => :center, :size => 48
  text sub_heading, :align => :center, :size => 32
  text "\nCreated on #{Time.now}\n", :align => :center

  start_new_page

  10.times do
    group do
      text body
    end
  end
end

As you have already guessed, the text method takes text and outputs it to the PDF, and it takes the :align and :size options. If you look at the freshly created PDF, the headings are larger than the creation date, and all of them are centered on the page.

start_new_page does exactly what it says on the tin, and starts a new page. text body outputs the text we've assigned to the variable. And you will have noticed the group do…end block. This tells Prawn to keep everything enclosed in group do…end together and, if possible, on the same page. Otherwise, a paragraph could be split across a page break.

…And Inline Formatting

I have hinted at it quite a bit, so it won't surprise you when I say that Prawn allows for inline formatting, with the help of the tiniest HTML syntax subset: <b>, <i>, <u>, and <strikethrough> are supported. However, Prawn will not parse these tags automatically. Instead, you have to switch the feature on explicitly with :inline_format => true:

Prawn::Document.generate "example.pdf" do
  # ...
  font_size 12 do
    text formatted, :inline_format => true
  end
end

It's as simple as that. Just so that it looks like I did some work, I've also used font_size(size) do…end, which does what it says on the tin for everything within the block.

Custom Fonts

One of the strengths of the PDF format is typography: preserving the typeface of the text you added. Since not every document is (or can be) limited to the built-in fonts (the PDF Standard Type 1 fonts, such as Helvetica, Times, and Courier), Prawn lets you define custom font_families:

font_families["Nobile"] = {
  :normal      => "./nobile.ttf",
  :bold        => "./nobile_bold.ttf",
  :italic      => "./nobile_italic.ttf",
  :bold_italic => "./nobile_bold_italic.ttf"
}

font "Nobile" do
  text formatted, :inline_format => true
end

For typographical / dead tree print reasons, it is expected that you define font faces for bold, italic (or cursive), and italicized bold weights. With font "Nobile" you can assign this font for everything that follows. If you want to limit the font directive to a few lines, use the block form, as above.

Pagination and Page Numbering

Few documents are exactly one page long, so it’s no surprise that Prawn offers page numbering:

number_pages "Page <page> of <total> pages", :at => [bounds.right - 100, 0], :page_filter => :all

Simple enough. :at determines where the numbering is printed, and :page_filter determines on which pages it appears (:all, :odd, or :even).

We moved

Yes, we are again open for business, under old management. However, I've moved to my own domain: blog.thimian.com. Please adjust your bookmarks and feedreaders.

See you on the other side. ;)

Story Telling

To provide an application that does what the user wants, we need, obviously enough, requirements and maybe even a specification (depending on where and how the application gets deployed, whether legal requirements have to be figured in, or whether high availability has to be guaranteed, for example).

For the scope of this project (quick recap: Document management), requirements are enough.

Now, we have to capture these requirements in a way that our client (who isn't necessarily tech-savvy) understands, and that allows us, the developers, to know exactly what has to be done to fulfill the user's needs.

That means we have to use a language that both sides understand and that is free from ambiguities.

Neither plain English (or your native language) nor a programming language is suited to this task: spoken languages are filled with ambiguities, and programming languages are not necessarily understood by the client.

What we need is something that is non-ambiguous, but easily understood.

The solution is rather obvious: English (or your native language) with a limited vocabulary and only a few rules, so that it can be understood intuitively and internalized quickly.

For that, we should look at what Extreme Programming has to offer: User stories.

And we can take this to the extreme (so to speak, and pardon the pun) in the Ruby world with RSpec, a BDD testing framework. In the next few posts I'll take a look at RSpec and BDD, and share my thoughts and examples (bear with me: I'm taking a serious look at BDD/RSpec for the first time, myself).

Collecting requirements: Tedious, but fun

My client finally decided on a back end for a central storage solution for the users: LDAP.

With that said, I have decided to use RubyCAS server and client for my user authentication scheme. It supports SSL, which is a big win for security.

With that knowledge, I can now work on how to handle users. Fortunately, the role system isn't all too complex: there are users, and there are admins. That's it.

Users shall be able to manage documents in the app, and admins can, additionally, configure the application.

However, I wonder if I really need an admin panel for this application. After all, I could use YAML for application configuration, where settings are necessary. There won't be a whole lot of options, either, as the application is simple.
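A sketch of what such a YAML configuration could look like, loaded with Ruby's built-in yaml library. The keys are made up for illustration; in practice you'd load a file with YAML.safe_load(File.read("config.yml")) instead of an inline string:

```ruby
require 'yaml'

# Hypothetical settings; the key names are illustrative only.
CONFIG_TEXT = <<~YAML
  import:
    directory: /srv/documents/incoming
    recursive: true
  alerts:
    email: office@example.com
YAML

# safe_load only builds plain data (hashes, arrays, strings, numbers, booleans).
config = YAML.safe_load(CONFIG_TEXT)

puts config["import"]["directory"]  # => /srv/documents/incoming
puts config["import"]["recursive"]  # => true
```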

Let's recap for a bit:

My client wants a document management system that stores electronic representations of documents, and their physical location.

To achieve that, users need to be able to add documents (obvious), and add metadata.

Of course, we are using computers here, so we should automate as much as possible. A prime candidate is importing the documents.

How can we do that?

Basically, three options come to mind:

  1. Using an application running on the client computers that scans the client machine, gathers new documents, and pushes that information into the web application.
  2. Mounting a network share (via SMB or NFS, for example), putting the electronic documents on that, and having the web application scan this network share to import the documents.
  3. Using a script that scans an upload directory for new documents, and adds them to the web application's dataset.

Let's look at these approaches in detail:

Number 1 has several downsides: it requires an install on the client; it requires synchronization of data between the server and the client to figure out what is new (or we need an 'imported' flag); and it adds to the overhead that is transmitted over the network (not by much, but every little bit matters).
On the upside, we could import only what the user wants to import, and he can add all the metadata before the data is imported.

With approach 2, we have to break the web application out of the webserver's environment, and grant it access to an external (to the web-root) directory. This is a really bad choice from a security standpoint.

Approach number 3 sidesteps several issues: it allows us to import data from an arbitrarily chosen directory, we can hook up virus scanners if needed, we don't have to expose the webserver more than necessary, we can use the SMB/NFS/whatever security for file transfers, and we don't have to worry about synchronization issues.

Additionally, we can use the server's file system to fill in a bit of metadata (date created, user who created it). We don't have to worry about uploads, either, and we can secure the network share via, for example, TLS.

We also don't need to do fancy tricks. The application does one thing and does it well; the script does one thing and does it well.

So, where do administrative tasks figure into this? Apparently, they don't. The web application doesn't need to be configured, or administrate anything as it stands.

So, all we need is to tell the script which directory and sub-directories to scan and how to get the metadata for the imported files, and have it import the data into the database.

The script can also send out alerts if documents need additional information, and provide one or multiple links to the documents needing additional treatment from within the web application itself.
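The scanning half of such a script could be sketched like this. It assumes the set of already imported paths comes from the application's database (here it is just a Set), and it uses the modification time because file creation times aren't available on every platform. scan_for_new_documents and owner_name are made-up names:

```ruby
require 'etc'
require 'set'
require 'tmpdir'

# Look up the user name owning a file; nil where the platform can't tell us.
def owner_name(uid)
  entry = Etc.getpwuid(uid)
  entry && entry.name
rescue ArgumentError, NotImplementedError
  nil
end

# Find files below upload_dir that are not in already_imported (a Set of
# absolute paths) and collect a bit of file-system metadata for each.
def scan_for_new_documents(upload_dir, already_imported)
  files = Dir.glob(File.join(upload_dir, "**", "*")).select { |path| File.file?(path) }

  files.map { |path| File.expand_path(path) }
       .reject { |path| already_imported.include?(path) }
       .sort
       .map { |path|
         stat = File.stat(path)
         { path: path, modified_at: stat.mtime, owner: owner_name(stat.uid) }
       }
end

# Usage: two documents in a throwaway upload directory, one already imported.
Dir.mktmpdir do |upload_dir|
  Dir.mkdir(File.join(upload_dir, "2011"))
  File.write(File.join(upload_dir, "invoice.pdf"), "%PDF-1.4")
  File.write(File.join(upload_dir, "2011", "letter.pdf"), "%PDF-1.4")

  imported = Set[File.expand_path(File.join(upload_dir, "invoice.pdf"))]

  scan_for_new_documents(upload_dir, imported).each do |doc|
    puts "#{File.basename(doc[:path])}: owner #{doc[:owner].inspect}, modified #{doc[:modified_at]}"
  end
end
```

The returned hashes are what the script would then write into the application's database, flagging documents whose metadata still needs a human.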

That sounds like a good approach, doesn't it?

Choices, choices, choices: CAS or OpenID?

With RubyCAS and Ruby-OpenID you have two choices to enable authentication for your application.

But which choice is the best one? Or rather the correct one? That depends on your usage scenario.

RubyCAS and OpenID solve, roughly, two different problems:

  • Single Sign On
  • User account management

Solving the Single Sign On problem

This is RubyCAS' strength. If you want to offer multiple applications to your users (be it on the internet, or in an intranet), RubyCAS is the better choice. Since it allows proxy authentication, users only have to sign into their account once, and all applications available to them can be used without retyping their credentials when switching applications.

This is the classic environment prompting the need for SSO solutions in general, and RubyCAS fits the bill (especially since it provides Authenticators for common enterprisey storage solutions, like LDAP).

Simplifying sign up

This is where OpenID shines. Users only have to maintain one set of credentials, and can use it wherever they can log in with OpenID. This is a big bonus for you. No need to store passwords, you can automate account creation at the first sign-in of your users (you can request account data like email addresses, nicknames, first and last names, etc.), and you don't have to worry (a lot) about validating this data. The user's OpenID provider took care of that for them.

You can, of course, offer them an OpenID service with your application, allowing them to use the credentials for your application to log in everywhere else.

However, it seems that OpenID doesn't allow proxy authentication out of the box (you could add it, or maybe the next version will provide support for that, but that is difficult to do in an essentially untrusted network, which leads to things like Kerberos).

So, what should you use?

If you are user-centric, use RubyCAS. An example of a user-centric scenario is Google Apps for Domains: one account for all these services.

If you are application-centric, use OpenID. Users will only use one or a few applications you offer, and you can simplify the process for them by drastically cutting the number of username/password credentials they have to maintain.

Remember, though, that OpenID is not an ID verification service! If you plan to use OpenID in an intranet, you should have users use an OpenID server you provide on the intranet, and not have them authenticate via, say, myopenid.com. This also allows you to fine-tune the data stored with OpenID accounts, for example organizational units, supervisors, etc.

As you can see, there is no single correct answer. Neither RubyCAS nor Ruby-OpenID is a silver bullet that solves all your account problems. It is a question of what fits your usage scenario best.