A lot of the sites that we visit on the web today are generated by a script of some description and a great deal of them will use a database in one form or another. Like it or loathe it, building pages dynamically from databases is a technique that is here to stay. There are two main types of database; flat-file and relational.
Types of Database
A lot of the sites that we visit on the web
today are generated by a script of some description and a great deal of them
will use a database in one form or another. Like it or loathe it, building
pages dynamically from databases is a technique that is here to stay. There are
two main types of database; flat-file and relational.
Flat-File
The flat-file style of database are ideal for
small amounts of data that needs to be human readable or edited by hand.
Essentially all they are made up of is a set of strings in one or more files
that can be parsed to get the information they store; great for storing simple
lists and data values, but can get complicated when you try to replicate more
complex data structures.
That’s not to say that it is impossible to
store complex data in a flat-file database; just that doing so can be more
costly in time and processing power compared to a relational database. The
methods used for storing the more complex data types, are also likely to render
the file unreadable and un-editable to anyone looking after the database.
The typical flat-file database is split up
using a common delimiter. If the data is simple enough, this could be a comma,
but more complex strings are usually split up using tabs, new lines or a
combination of characters not likely to be found in the record itself.
One of the main problems with using flat files
for even a semi-active database is the fact that it is very prone to
corruption. There is no inherent locking mechanism that detects when a file is
being used or modified, and so this has to be done on the script level.
Even if care is taken to lock and unlock the
file on each access, a busy script can cause a “race condition” and it is
possible for a file to be wiped clean by two or more processes that are fighting
for the lock; the timing of your file locks will become more and more important
as a site gets busy.
Database Management (DBM)
The Database Management Layer allows script
programmers to store information as a pair of strings; a key, which is used to
find the associated value.
Essentially, a DBM adds more functionality and
better sorted during storage to the binary flat-files that it uses. There are
several versions of DBMS available, but the most popular is the Berkley
Database Manager; also known as the Berkley DB.
The Berkley DB is an improvement over normal
flat-files, as it provides a way for programmers to use the database without
having to worry about how the data is stored or how to retrieve the values.
Retrieval of data using the Berkley DB is often
much faster than from a flat-file, with the time savings being made by storing
data in a way that speeds up the locating of a specific key-value pair.
Creating, editing and deleting data when using
the Berkley DB is actually quite simple; once the database has been tied to the
script you just use and manipulate the variables as normal.
The problem of file locking that plagues
flat-file databases is still apparent when using DBM, so you should still take
care when planning scripts that utilize it.
Relational
The relational databases such as MySQL,
Microsoft SQL Server and Oracle, have a much more logical structure in the way
that it stores data. Tables can be used to represent real world objects, with
each field acting like an attribute. For example, a table called books could
have the columns title, author and ISBN, which describe the details of each
book where each row in the table is a new book.
The “relation” comes from the fact that the
tables can be linked to each other, for example the author of a book could be
cross-referenced with the authors table (assuming there was one) to provide
more information about the author. These kind of relations can be quite complex
in nature, and would be hard to replicate in the standard flat-file format.
One major advantage of the relational model is
that, if a database is designed efficiently, there should be no duplication of
any data; helping to maintain database integrity. This can also represent a
huge saving in file size, which is important when dealing with large volumes of
data.
Having said that, joining large tables to each
other to get the data required for a query can be quite heavy on the processor;
so in some cases, particularly when data is read only, it can be beneficial to
have some duplicate data in a relational database.
Relational databases also have functions “built
in” that help them to retrieve, sort and edit the data in many different ways.
These functions save script designers from having to worry about filtering out
the results that they get, and so can go quite some way to speeding up the
development and production of web applications.
Database Comparisons
In most cases, you would want your database to
support various types of relations; such databases, particularly if designed
correctly, can dramatically improve the speed of data retrieval as well as
being easier to maintain. Ideally, you will want to avoid the replication of
data within a database to keep a high level of integrity, otherwise changes to
one field will have to be made manually to those that are related.
While several flat-files can be combined in
such a way as to be able to emulate some of the behaviors’ of a relational
database, it can prove to be slower in practice. A single
connection to a relational database can access
all the tables within that database; whereas a flat file implementation of the
same data would result in a new file open operation for each table.
All the sorting for flat-file databases need to
be done at the script level. Relational databases have functions that can sort
and filter the data so the results that are sent to the script are pretty much
what you need to work with.
It is often quicker to sort the results before
they are returned to the script than to have them sorted via a script, few
scripting languages are designed to filter data effectively and so the more
functions a database supports, the less work a script has to do.
If you are only working with a small amount of
data that is rarely updated then a full blown relational database solution can
be considered overkill. Flat-file databases are not as scalable as the
relational model, so if you are looking for a suitable database for more
frequent and heavy use then a relational database is probably more suitable.
Limitations of a File System
Separated and Isolated Data - Makes
coordinating, assimilating and representing data difficult
Data Duplication - Wastes space and can lead to
data integrity (inconsistency) problems
Application Program Dependencies - Changes to a
single file can require changes to numerous application programs
Incompatible Files
Lack of Data Sharing - Difficult to control
access to files, especially to individual portions of files