[olug] [Omaha.pm] Mapping Perl structures to a SQL table...

Todd Christopher Hamilton netarttodd at gmail.com
Fri Oct 30 22:24:04 UTC 2009


Here are some thoughts:

I have found that when designing a database one needs to be at peace 
with the flexibility / performance balance.  (Come to think of it, isn't 
that why we use Perl?  You can write faster code in C, but you can 
write code faster in Perl.)

If you want high flexibility you might use an Entity-Attribute-Value 
model.  But eventually you reach the point where performance 
requirements force you to think in more rigid terms.  You will then 
apply performance-based design principles within the context of the 
information domain.
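
A minimal sketch of what an EAV layout looks like in practice, assuming DBD::SQLite is available (the table and column names here are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

# One generic table holds every attribute of every entity.
$dbh->do(q{
    CREATE TABLE entity_attr (
        entity_id INTEGER NOT NULL,
        attr_name TEXT    NOT NULL,
        attr_val  TEXT,
        PRIMARY KEY (entity_id, attr_name)
    )
});

# Any attribute can be stored without ever changing the schema...
my $ins = $dbh->prepare(
    "INSERT INTO entity_attr (entity_id, attr_name, attr_val) VALUES (?, ?, ?)");
$ins->execute(1, 'Name',   'report.txt');
$ins->execute(1, 'Author', 'dan');

# ...but every lookup is another probe into the same table (or a
# self-join to combine attributes), which is where the performance goes.
my ($author) = $dbh->selectrow_array(
    "SELECT attr_val FROM entity_attr WHERE entity_id = ? AND attr_name = ?",
    undef, 1, 'Author');
print "$author\n";
```

The flexibility is total; the cost is that a query like "all draft files by dan" needs a self-join per attribute instead of one indexed column each.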

Since you are converting a file-system database to a DBMS structure, you 
already have some structure you can exploit: files, directories (which 
are files), and file attributes (a fixed number of attributes and values). 
The real flexibility comes when you need to store the contents of the 
file.  But then again, how much performance do you need?  The cost 
comes in when you are trying to find and map the structure.  Once you find 
the file based on its attributes, you then display it.

I would start with three tables:
1. Files
2. FileAttributes
3. FileContent

Files would have the following columns:
   1. Id
   2. ParentId (the id of the directory this file is in)
   3. Name
   4... All your permission attributes, like user read permission and 
group permission
   5... All your file attributes, like creation date/time, modification 
date/time, and size

Then your second table would contain non-indexed attributes of the file 
or directory: Category, Subject, Author, Status.

Then your third table would be the actual file contents.

You would then use the DBMS to provide the performance structure (indexes, 
constraints, aggregation, transactions, journaling).
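
As a sketch, the three tables above might look like this through Perl DBI, assuming DBD::SQLite (the column names are illustrative, not prescriptive):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

# Table 1: one row per file or directory, indexed fixed attributes.
$dbh->do(q{
    CREATE TABLE files (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        parent_id INTEGER REFERENCES files(id),  -- directory this file is in
        name      TEXT NOT NULL,
        mode      INTEGER,   -- permission bits
        created   INTEGER,   -- epoch seconds
        modified  INTEGER,
        size      INTEGER
    )
});

# Table 2: the non-indexed, free-form attributes.
$dbh->do(q{
    CREATE TABLE file_attributes (
        file_id   INTEGER NOT NULL REFERENCES files(id),
        attr_name TEXT    NOT NULL,   -- Category, Subject, Author, Status...
        attr_val  TEXT
    )
});

# Table 3: the actual contents.
$dbh->do(q{
    CREATE TABLE file_content (
        file_id INTEGER PRIMARY KEY REFERENCES files(id),
        content BLOB
    )
});

# A directory, then a file inside it.
$dbh->do("INSERT INTO files (parent_id, name) VALUES (NULL, 'docs')");
my $dir_id = $dbh->last_insert_id(undef, undef, 'files', 'id');
$dbh->do("INSERT INTO files (parent_id, name, size) VALUES (?, ?, ?)",
         undef, $dir_id, 'readme.txt', 42);

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM files");
print "$count\n";
```

ParentId pointing back into the same table is what lets the directory tree live in one table; walking down a path is one indexed lookup per component instead of an eval() per file.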








Mario Steele wrote:
> Heya Dan,
> 
> It would help to have a bit of information about the general structure 
> you are currently using before giving any ideas about storing the data 
> in a SQL database.  Obviously you're doing directories and files, but the 
> structure helps in determining the way to make things work in the SQL 
> database.  Poor design of the SQL database will lead to poor execution 
> of SQL statements, as much more work is required to get the data you want.
> 
> To give you some idea of how I would convert a file-system database into 
> a SQL database, I'll give you an example of a file-structure database 
> and a description of what each entry means, then show the resulting 
> database structure for SQL.
> An example of a simple design using folders:
> 
> db/
>   cust_records/
>     cust1_info.dat
>     cust1_purchase.dat
>     cust2_info.dat
>     cust2_purchase.dat
>   inventory/
>     item1_info.dat
>     item2_info.dat
> 
> cust#_info.dat contains:  Customer Name, Address, Phone Number, Shipping 
> Info, and such.
> cust#_purchase.dat contains: a record of all the purchases that the 
> customer has made.
> 
> item#_info.dat contains: Name of Item, Description, Price, Quantity, 
> Shipping price, Shipping Weight.
> 
> Now, to convert this into a SQL Database, I would formulate it as such:
> 
> db
>   customers
>     id                   - INTEGER, PRIMARY KEY, AUTOINCREMENT
>     name                 - STRING
>     address              - STRING
>     phone                - STRING
>     ship_to              - STRING
>   transactions
>     id                   - INTEGER, PRIMARY KEY, AUTOINCREMENT
>     cust_id              - INTEGER (references customers.id)
>     item_id              - INTEGER
>     quantity             - INTEGER
>     purchased_date       - INTEGER
>   inventory
>     id                   - INTEGER, PRIMARY KEY, AUTOINCREMENT
>     item_name            - STRING
>     description          - STRING
>     price                - FLOAT
>     total_per_quantity   - INTEGER
>     ship_price           - INTEGER
>     ship_weight          - INTEGER
> 
> Now some explaining about what the right-hand side is all about in the 
> above layout.  The first field in all tables is 'id', which is marked as 
> INTEGER, PRIMARY KEY, AUTOINCREMENT.  Integer denotes a number, of 
> course.  Primary Key tells the SQL engine that this field uniquely 
> identifies each row, and to index it for quick lookups, since it is one 
> of the fields most often used to find records.  Finally, Autoincrement 
> (which is one word in SQL) tells the engine to assign each new record 
> an id one greater than the largest id already handed out, so you never 
> generate identifiers yourself.  And lastly, a String is a variable 
> length of text data to be stored.  Most SQL engines will allow several 
> paragraphs worth of text, but this can be expensive in storage and 
> retrieval.  If you know that a field is only going to be so many 
> characters, such as Phone, maximum being 13 characters, then you can 
> use VARCHAR(13) as the maximum length of the data that is going to be 
> stored in that field.
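
One detail worth seeing in action: the engine hands the new id back to you, and ids are never recycled, so you never count rows yourself.  A sketch, assuming DBD::SQLite:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do(q{
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(64)
    )
});

# Three inserts get ids 1, 2, 3.
$dbh->do("INSERT INTO customers (name) VALUES (?)", undef, $_)
    for ('John Doe', 'Mary Johnson', 'Temp Entry');

# Deleting a row does not free its id for reuse...
$dbh->do("DELETE FROM customers WHERE name = 'Temp Entry'");

# ...the next insert still gets a fresh, never-used id.
$dbh->do("INSERT INTO customers (name) VALUES ('New Customer')");
my $new_id = $dbh->last_insert_id(undef, undef, 'customers', 'id');
print "$new_id\n";   # 4, not 3: the id is not the row count
```

This matters once rows get deleted: any code that assumed "id = number of rows" would silently hand out a duplicate identifier.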
> 
> There's also FLOAT, which allows for decimal points; an INTEGER column 
> will not keep the decimal part, so values like prices generally want 
> FLOAT (or a DECIMAL/NUMERIC type, where the engine supports one).  It's 
> always best to see what data types a SQL engine supports before making 
> a final decision.  DBI will generally pass common Perl scalars through 
> to the engine's native types for you, so look at Perl's DBI 
> documentation on constructing tables to see what assistance it will 
> bring you.  Lastly, one other data type I didn't cover in the above 
> database is the BLOB data type.  BLOB data types are for storing binary 
> data, should you find the need to store some in the database.
> 
> With blobs, no conversions are done to store the data in the 
> database; it's stored as-is (as you provide it to the SQL 
> database), and can contain any valid byte sequence, meaning any byte 
> value from 0 to 255 can be stored here.  Most SQL engines will 
> store UTF-8/16 characters in strings without stripping them, but when in 
> doubt, you can use the BLOB data type.
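
Storing raw bytes through DBI looks like this (a sketch assuming DBD::SQLite); bind_param with SQL_BLOB tells the driver not to treat the value as text:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI qw(:sql_types);   # exports SQL_BLOB among other type constants

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE file_content (file_id INTEGER PRIMARY KEY, content BLOB)");

# Any byte sequence at all, including NUL bytes.
my $raw = join '', map { chr($_) } 0 .. 255;

my $sth = $dbh->prepare(
    "INSERT INTO file_content (file_id, content) VALUES (?, ?)");
$sth->bind_param(1, 1);
$sth->bind_param(2, $raw, SQL_BLOB);   # store as binary, no text conversion
$sth->execute;

my ($back) = $dbh->selectrow_array(
    "SELECT content FROM file_content WHERE file_id = 1");
print length($back), "\n";   # 256: every byte came back untouched
```
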
> 
> Now, with the explanations of the data types out of the way, the 
> structure is efficiently designed for the simple reason that if you have 
> the id of a customer, you can get all the items they purchased, 
> and get each item's information from the ids you get from the 
> transactions table.  You can even use a SQL JOIN to get all 
> the data you need in a single query.  For example, if you wanted 
> the name of the person, the name of the item, and the total cost, you 
> could simply do:
> 
> SELECT name, item_name, price
> FROM transactions
> JOIN customers
>   ON customers.id = transactions.cust_id
> JOIN inventory
>   ON inventory.id = transactions.item_id;
> 
> This will return a list of all transactions in the format of:
> name | item_name | price
> 
> Examples being:
> 
> "John Doe","ASUS PC",299.99
> "Mary Johnson", "Microsoft Mouse", 19.99
> etc, etc.
> 
> It makes cross-table lookups a lot easier, getting the relevant data 
> for what you need, and only what you need.  And all of it is handled by 
> the SQL engine, not Perl or whatever high-level language you use, so 
> the execution speed is greatly improved.
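
Run from Perl, that JOIN is a single call; a sketch using DBD::SQLite with a couple of rows of made-up data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(64))");
$dbh->do("CREATE TABLE inventory (id INTEGER PRIMARY KEY, item_name VARCHAR(64), price FLOAT)");
$dbh->do("CREATE TABLE transactions (id INTEGER PRIMARY KEY, cust_id INTEGER, item_id INTEGER)");

$dbh->do("INSERT INTO customers VALUES (1, 'John Doe')");
$dbh->do("INSERT INTO inventory VALUES (1, 'ASUS PC', 299.99)");
$dbh->do("INSERT INTO transactions VALUES (1, 1, 1)");

# One round trip; the engine does all the cross-table work.
my $rows = $dbh->selectall_arrayref(q{
    SELECT name, item_name, price
    FROM transactions
    JOIN customers ON customers.id = transactions.cust_id
    JOIN inventory ON inventory.id = transactions.item_id
});
printf "%s | %s | %s\n", @{ $rows->[0] };
```
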
> 
> HTH,
> 
> Mario
> 
> 
> On Fri, Oct 30, 2009 at 1:31 PM, Dan Linder <dan at linder.org> wrote:
> 
>     I'm taking on the task of converting our in-house tool to use the Perl
>     DBI module to replace the Data::Dumper/eval() it currently uses to
>     store and retrieve data.  Not pretty, but it has worked pretty well
>     for the small data sets we've been using.
> 
>     We now have some people commenting on the speed - some pages take
>     7+ minutes to come up while the back-end Perl code ripples through
>     the directory structure and eval()s the necessary files to build
>     the page.  The eval() calls seem to be the bulk of the time, as I
>     expected...
> 
>     What I'm looking for is some general comments and discussion about the
>     mental task of mapping these hash tables into a SQL table.  I'm not
>     really looking for a tool, more a high level discussion about ways to
>     store the data and still remain flexible.
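
One way that mapping can go, sketched with DBD::SQLite and made-up example data: the keys every record shares become fixed columns, and the rest go into a key/value table, so the structure stays flexible without eval():

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE records (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)");
$dbh->do("CREATE TABLE record_attrs (record_id INTEGER, k TEXT, v TEXT)");

# A hash of the kind Data::Dumper used to serialize (illustrative data).
my %rec = ( name => 'report.txt', Author => 'dan', Status => 'draft' );

# Store: the fixed key goes to a column, the rest to key/value rows.
$dbh->do("INSERT INTO records (name) VALUES (?)", undef, delete $rec{name});
my $id = $dbh->last_insert_id(undef, undef, 'records', 'id');
my $ins = $dbh->prepare(
    "INSERT INTO record_attrs (record_id, k, v) VALUES (?, ?, ?)");
$ins->execute($id, $_, $rec{$_}) for keys %rec;

# Load: rebuild the hash with two selects instead of an eval().
my %out;
($out{name}) = $dbh->selectrow_array(
    "SELECT name FROM records WHERE id = ?", undef, $id);
my $attrs = $dbh->selectall_arrayref(
    "SELECT k, v FROM record_attrs WHERE record_id = ?", undef, $id);
$out{ $_->[0] } = $_->[1] for @$attrs;
print "$out{name} by $out{Author}\n";
```

The judgment call is which keys earn a real column: anything you search or sort on wants to be a column with an index, while keys that are only ever displayed can stay in the key/value table.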
> 
>     Dan
> 
>     --
>     ******************* ***************** ************* ***********
>     ******* ***** *** **
>     "Quis custodiet ipsos custodes?" (Who can watch the watchmen?) -- from
>     the Satires of Juvenal
>     "I do not fear computers, I fear the lack of them." -- Isaac Asimov
>     (Author)
>     ** *** ***** ******* *********** ************* *****************
>     *******************
>     _______________________________________________
>     Omaha-pm mailing list
>     Omaha-pm at pm.org
>     http://mail.pm.org/mailman/listinfo/omaha-pm
> 
> 
> 
> 
> -- 
> Mario Steele
> http://www.trilake.net
> http://www.ruby-im.net
> http://rubyforge.org/projects/wxruby/
> http://rubyforge.org/projects/wxride/
> 
> 


