Collecting requirements: Tedious, but fun

My client finally decided on a back end for a central storage solution for the users: LDAP.

With that said, I have decided to use RubyCAS server and client for my user authentication scheme. It supports SSL, which is a big win for security.

With that knowledge, I can now work on how to handle users. Fortunately, the role system isn't all too complex: there are users, and there are admins. That's it.

Users shall be able to manage documents in the app, and admins can, additionally, configure the application.

However, I wonder if I really need an admin panel for this application. After all, I could use YAML for application configuration wherever settings are necessary. There won't be a whole lot of options, either, as the application is simple.

Let's recap for a bit:

My client wants a document management system that stores electronic representations of documents, along with their physical locations.

To achieve that, users need to be able to add documents (obvious), and add metadata.

Of course, we are using computers here, so we should automate as much as possible. A prime candidate is importing the documents.

How can we do that?
Basically, three options come to mind:

  1. Using an application running on the client computers that scans the client, gathers new documents, and pushes that information into the web application.
  2. Mounting a network share (via SMB or NFS, for example), putting the electronic documents on it, and having the web application scan this network share to import the documents.
  3. Using a script that scans an upload directory for new documents and adds them to the web application's dataset.

Let's look at these approaches in detail:

Number 1 has several downsides: it requires an install on the client; it requires synchronization of data between the server and the client to figure out what is new (or we need an 'imported' flag); and it adds to the overhead that is transmitted over the network (not by much, but every little bit matters).
On the upside, we could import only what the user wants to import, and they can add all the metadata before the data is imported.

With approach 2, we have to break the web application out of the webserver's environment, and grant it access to an external (to the web-root) directory. This is a really bad choice from a security standpoint.

Approach number 3 sidesteps several issues: it allows importing data from an arbitrarily chosen directory, we can hook up virus scanners if we need to, we don't have to expose the webserver more than necessary, we can use the SMB/NFS/whatever security for file transfers, and we don't have to worry about synchronization issues.

Additionally, we can use the server's file system to fill in a bit of metadata (date created, user who created it). We also don't have to worry about uploads, and we can secure the network share via, for example, TLS.
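To make the metadata idea a bit more concrete, here is a minimal sketch of what the scanning part could look like. It is written in Python purely for illustration; the function name `scan_upload_dir` and the field names are my own assumptions, not a decided design. It also records the modification time rather than a creation date, since a true creation date is not available on every file system.

```python
from datetime import datetime, timezone
from pathlib import Path


def scan_upload_dir(root):
    """Walk root recursively and return one metadata dict per regular file.

    Hypothetical helper: the name and the dict keys are illustrative only.
    """
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-file entries
        info = path.stat()
        records.append({
            "path": str(path),
            "size_bytes": info.st_size,
            # Modification time stands in for "date created", which not
            # every file system exposes.
            "modified_at": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            # Numeric owner id of the file; on Windows this is always 0.
            "owner_uid": info.st_uid,
        })
    return records
```

Mapping the numeric owner id to an actual user (an LDAP entry, for example) would be a separate step that this sketch leaves out.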

We also don't need to do fancy tricks. The application can do one thing and do it well; the script does one thing and does it well.

So, where do administrative tasks figure into this? Apparently, they don't. The web application doesn't need to be configured, or to administer anything, as it stands.

So, all we need is to tell the script which directory and sub-directories to scan and how to get the metadata for the imported files, and then have it import the data into the database.
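As a rough sketch of such an import job, the following Python snippet reads a small settings dictionary, scans the configured directories, and inserts only files it has not seen before. Everything here is an assumption made for illustration: the `CONFIG` keys, the example path, the `documents` table layout, and the use of SQLite as the database. The primary key on `path` plays the role of the 'imported' flag.

```python
import sqlite3
from pathlib import Path

# Hypothetical settings; in practice these could be loaded from a YAML file.
CONFIG = {
    "scan_dirs": ["/srv/documents/incoming"],  # assumed example path
    "recursive": True,
}


def open_store(db_path=":memory:"):
    """Open the database and make sure the (assumed) documents table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " path TEXT PRIMARY KEY,"
        " size_bytes INTEGER NOT NULL,"
        " modified_at REAL NOT NULL)"
    )
    return conn


def import_new_documents(conn, config):
    """Insert files that are not in the table yet; return how many were added."""
    before = conn.total_changes
    for directory in config["scan_dirs"]:
        root = Path(directory)
        if not root.is_dir():
            continue  # a missing directory is skipped, not treated as an error
        pattern = root.rglob("*") if config.get("recursive", True) else root.glob("*")
        for path in pattern:
            if not path.is_file():
                continue
            info = path.stat()
            # OR IGNORE leaves rows for already imported paths untouched.
            conn.execute(
                "INSERT OR IGNORE INTO documents (path, size_bytes, modified_at)"
                " VALUES (?, ?, ?)",
                (str(path), info.st_size, info.st_mtime),
            )
    conn.commit()
    return conn.total_changes - before
```

Run from cron (or any scheduler), calling `import_new_documents(open_store("documents.db"), CONFIG)` repeatedly only picks up files that were added since the last run.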

The script can also send out alerts if documents need additional information, and provide one or more links to the documents that need additional treatment within the web application itself.
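The alert part could be as simple as the following sketch, again in Python and again under assumptions of my own: the required fields (`title`, `physical_location`), the `/documents/<id>` URL layout, and the function name `build_alert` are all hypothetical. It only builds the message text; how the alert is delivered (e-mail, for instance) is left open.

```python
def build_alert(documents, base_url, required=("title", "physical_location")):
    """Return an alert text listing documents with missing metadata, or None.

    documents is a list of dicts with at least an "id" key; the required
    field names and the URL layout are assumptions for this sketch.
    """
    lines = []
    for doc in documents:
        # A field counts as missing if it is absent or empty.
        missing = [field for field in required if not doc.get(field)]
        if missing:
            link = f"{base_url.rstrip('/')}/documents/{doc['id']}"
            lines.append(f"{link} (missing: {', '.join(missing)})")
    if not lines:
        return None  # nothing to report, so no alert is needed
    header = f"{len(lines)} document(s) need additional information:"
    return "\n".join([header, *lines])
```

For example, `build_alert([{"id": 7, "title": "Invoice"}], "https://dms.example.org")` would produce a two-line message whose second line links to document 7 and names `physical_location` as missing.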

That sounds like a good approach, doesn't it?

Comments:

Anonymous said...

For directory monitoring, perhaps using something like FAM would be a good idea, writing a separate server to set the database information on file creation/modification.

Unknown said...

It's an idea. But it might be overkill.

I don't expect constant polling of directories, rather a regular import job, for batch processing.

I'll take a look, though, so thanks. :)
