Quantcast

Maximum PC

It is currently Thu Dec 18, 2014 9:32 am

All times are UTC - 8 hours




Post new topic Reply to topic  [ 33 posts ]  Go to page Previous  1, 2
Author Message
 Post subject: Re: database design question for the MySQL users...
PostPosted: Thu Nov 04, 2010 8:09 am 
Bitchin' Fast 3D Z8000
Bitchin' Fast 3D Z8000
User avatar

Joined: Mon Jun 14, 2004 4:04 pm
Posts: 987
Location: Earth
Gadget wrote:
In my experience (where there were over 100K named records in some tables), writing the additional query code was a PITA. Plus, if you do the name->id lookup as part of the query, what performance benefit is there? You're actually creating more overhead via the subquery and additional burden on the query processor.

Plus, you make the database ugly and harder to understand. If you need blinding performance, then sure, you have to start making some compromises with the database; Of course, if you really need blinding fast performance, you might want to skip a DBMS altogether.



If I understand what you're saying, there's an additional overhead if you're searching for name -> id on the tag fields, yes? So if a user queries for pictures using tags, then the additional overhead comes when you have to query the tags table first and then query the photos table based on the ID's being returned from the tags table. Did I understand it correctly?

The way I think about it is: you can put the tags in a column in the photos table then do a search for tag words in that column. However, my instinct goes against it because the size of each row begins to grow and your table will start to hit row limit sizes (as in the case of SQL Server). So, as Crash said, put the tags in another table that gets joined by an ID. I'd have to agree with Crash in that if there's an index present in the column, searches do speed up.


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Thu Nov 04, 2010 8:23 am 
SON OF A GUN
SON OF A GUN
User avatar

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Right, especially if you are considering the tags example. How else would you store them? In a comma separated list on the file record? Unless I am missing the boat here, I don't see any other way that it could be faster, especially with this example.

Int look ups are stupid fast, must faster than char data (guess what words are? lots of numbers). An INT ID as the PK is going to make where clause usage and join usage of the INT ID faster still. If you index the tag name on top of that you are in good shape.

Index stuff you are going to use heavily in where clauses or join statements.

You are making me want to performance test this... /sigh


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Thu Nov 04, 2010 11:02 am 
Bitchin' Fast 3D Z8000
Bitchin' Fast 3D Z8000
User avatar

Joined: Mon Jun 14, 2004 4:04 pm
Posts: 987
Location: Earth
CrashTECH wrote:
Right, especially if you are considering the tags example. How else would you store them? In a comma separated list on the file record? Unless I am missing the boat here, I don't see any other way that it could be faster, especially with this example.

Int look ups are stupid fast, must faster than char data (guess what words are? lots of numbers). An INT ID as the PK is going to make where clause usage and join usage of the INT ID faster still. If you index the tag name on top of that you are in good shape.

Index stuff you are going to use heavily in where clauses or join statements.

You are making me want to performance test this... /sigh


Well, my only concern is row sizes. SQL Server's performance tends to diminish once you start hitting those row limits. Other than that, unless I see numbers from performance test, it's mostly conjecture to me. If you do, post the results. :)


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Fri Nov 05, 2010 7:08 pm 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
CrashTECH wrote:
Can you show an example where it is a problem? Why would you have to do any manual look ups? Why do you have to do name->ID? That is silly.

In the spacecraft databases, every table had an artificial key (ie a "sid") as a primary key that duplicated one or more of the natural keys. The column for the "natural key" usually had a unique constraint placed on it. However, this wasn't always the case, and occasionally, you'd find a "natural key" mapped to two or more primary keys in the database. Figuring out that mess may or may have not been the reason for some coffee stains on the wall and ceiling in my office... I don't recall.

One of the many things stored in the database were the commands that would be sent to the satellite. From a CS perspective, this was basically the ISA for the satellite hardware. In many cases, a command was independent, but many of the commands had subcommands stored in the subcommands table. Instead of listing the real commands and risking finding myself on the wrong side of federal office furniture, I'll make up some commands and sub-commands...

Code:
cmd#  name   sub-cmds (# and name)
----  ----   ------------------------
100   foo    {1000 = on, 1001 = off}
200   bar    {2000 = bus0, 2001 = bus1, 2002 = bus2, 2003 = bus3}
300   baz    null


Aside from the larger and more cluttered database, working at the database isn't such a big problem. For example, I can write a query that lists all the commands with subcommands quite easily...

Code:
Select c.*, s.*
From commands c, sub-commands s
Where c.name = s.cmd_name


You have to perform lookups when you're dealing with a subset of the database with the sids (ie w/ your app code). For example, you might get a message like "Error with command 1001", which isn't exactly useful. You have to go back and find out what it is in the database, reconstruct the parent-child relationship, etc. The same thing is likely to happen with this picture application because you're now manipulating "sids" instead of the natural keys. It may not be as onerous as the spacecraft databases that I had to fight with for a couple of years, but I don't see any real benefits accruing by creating artificial keys and dealing with funny error messages like "Error in picture 102983, invalid tag 28362". Of course, you could write additional sql to do the lookup, but why bother?


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Sun Nov 07, 2010 2:58 pm 
SON OF A GUN
SON OF A GUN
User avatar

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Right... but I don't think as general case your situation makes the best sense for desiging a database, especially if you are trying to be more relational. Joining tables on char data, it WILL be slower.

I'd be willing to go with having an int id and a name be part of the PK, as long as the int was an identity column.

I am not sure I follow your example fully... you have 1 table, but you are joining two in the code example?

I don't see how that is any more difficult than (assuming each has an id field and the sub_commands tables has a fk int field point to the command table id field, fkCommand:

Code:
Select * FROM commands C
   JOIN sub_commands S ON S.fkCommand = c.id


The only difference I see is that your query would run slower as it has to compare char data to do the joining. I can still index the name(s) and get all of the sub commands for specific command and do so by name:

Code:
Select * FROM commands C
   JOIN sub_commands S ON S.fkCommand = c.id
WHERE c.name = 'blag'


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Sun Nov 07, 2010 6:38 pm 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
DJSPIN80 wrote:
If I understand what you're saying, there's an additional overhead if you're searching for name -> id on the tag fields, yes?

Well, my memory of working with the tables and what I wrote were off just a bit. Most of the problems occur when writing/debugging apps that access the database (with artifical/int pkeys), not when interfacing with the database through some type of interactive SQL program. Given the speed of computers (and general lack of concern with speed displayed by most companies these days), the issue at hand is primarily emotional (ie I haven't seen much in the way of intellectual rigor supporting either side), but I still feel that it does become a fairly significant practical issue when you're writing/debugging an app though (web, rich-client, or whatever). Having too debug a bunch of meaningless integers can be painful in a large database, especially when dealing with systems that don't have an interactive sql environment (ie a client machine that contains the app, but not the standard suite of database programs).

DJSPIN80 wrote:
The way I think about it is: you can put the tags in a column in the photos table then do a search for tag words in that column. However, my instinct goes against it because the size of each row begins to grow and your table will start to hit row limit sizes (as in the case of SQL Server). So, as Crash said, put the tags in another table that gets joined by an ID. I'd have to agree with Crash in that if there's an index present in the column, searches do speed up.

No, I would never advocate such a schema design; It violates 1NF. In the case of a M:N entity-relationship, I would use the standard three table physical schema (table1=photos, table2=tags, table3 maps the two tables). What Crashtech was suggesting is the use of artificial keys: photo "foo" is mapped to 100 in the photos table; tag "beach" is mapped to 1000 in the tags table; and the relationship 100,1000 is stored in the 3rd table. Instead of creating the artificial keys, I would use the natural keys, "foo" and "beach", in the 3rd table (and not use them as primary keys in table1 and table2 either).


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Sun Nov 07, 2010 7:09 pm 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
CrashTECH wrote:
Right, especially if you are considering the tags example. How else would you store them? In a comma separated list on the file record? Unless I am missing the boat here, I don't see any other way that it could be faster, especially with this example.

Where is the idea that I would store tags in a single, comma separated column, coming from? I would never suggest doing anything like it. The original author said that he had done this type of thing, not me.

CrashTECH wrote:
Int look ups are stupid fast, must faster than char data (guess what words are? lots of numbers). An INT ID as the PK is going to make where clause usage and join usage of the INT ID faster still. If you index the tag name on top of that you are in good shape.

You're talking about saving maybe 50 NANO-SECONDS on a modern computer, which is completely trumped by a single page fault, a single bad packet, having to pull a BLOB image from the disk, etc. If the reason you're using artificial keys is to try and increase performance, then yes, I think you're completely missing the boat here.

CrashTECH wrote:
You are making me want to performance test this... /sigh

Just think about this outside of the database abstraction for a moment. Imagine that we've been told to develop a query/response type of application. Given a query string from the user, we need to provide the string response. Responses are not unique (ie not a 1:1 mapping). How would you make this app fast? Let's assume that we actually agreed on a solution: We're going to store the characters that make up a string in a binary search tree (ie performance is O(lg query-length) for this system). The leaf nodes will contain a pointer to the response.

Compared to a query/response app that is given an integer query and does a simple table lookup to provide the response, the binary-tree based response system is going to be slow, right? After all, we're comparing O(1) to O(lg n). We know the outcome. If we wanted the fastest solution, we should choose the O(1) table lookup solution, right?

Would you change your mind if I told you there were 1 trillion possible queries and only 10K responses? Why might the tree-based solution be a hell of a lot faster now?


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Sun Nov 07, 2010 7:29 pm 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
CrashTECH wrote:
Right... but I don't think as general case your situation makes the best sense for desiging a database, especially if you are trying to be more relational.

I'm suggesting using the same number of entities and relations. Creating artificial keys does not create additional relationships (ie masturbation is not having sex). You never hear the phrase artificial key during ERD discussions, right?

CrashTECH wrote:
Joining tables on char data, it WILL be slower.

Perhaps, either you're not giving database programmers enough credit or I'm giving them to much credit. See my previous post concerning why I don't think the question is relevant though.

CrashTECH wrote:
I am not sure I follow your example fully... you have 1 table, but you are joining two in the code example?

This is why we should have been dealing with ERDs and not SQL/physical schemas. I consider photos and tags to be two distinct entities with an M:N relationship between them. This logical design would result in three tables. I would use the natural keys (eg the name of a tag) instead of creating an artificial key.

CrashTECH wrote:
I don't see how that is any more difficult than (assuming each has an id field and the sub_commands tables has a fk int field point to the command table id field, fkCommand:

Code:
Select * FROM commands C
   JOIN sub_commands S ON S.fkCommand = c.id

That example isn't any more difficult; You're joining the entire tables. Selecting a single record is only slightly more of a pain, but it does become a pita when someone calls you on the phone saying that command #123232, sub-cmd#383729, crashed because of arguement #28372. Of course, they have no idea what any of this means, and you figure out much later that someone commanded a certain gimble to move 10 degs at 1deg/minute which conflicted with another command (probably 18373, 29387392, 19383). It's a PITA, trust me!


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Mon Nov 08, 2010 10:19 am 
SON OF A GUN
SON OF A GUN
User avatar

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Gadget wrote:
CrashTECH wrote:
I am not sure I follow your example fully... you have 1 table, but you are joining two in the code example?

This is why we should have been dealing with ERDs and not SQL/physical schemas. I consider photos and tags to be two distinct entities with an M:N relationship between them. This logical design would result in three tables. I would use the natural keys (eg the name of a tag) instead of creating an artificial key.

Hrm... I think having a concrete example works better. I mean, we are talking about how to layout a DB Schema anyway, right? So given that lets get a few things layed out so that we are all looking at the exact same thing for the same example. I really want to see what SQL does with the different methods. How effectively the relationships are search should be up to the DB engine itself I would think that it was tuned properly.

I think we agree that the three tables make sense for the proper relationships for photos and tags. My proposed solution to that schema is here (I added another table for the files, but I haven't updated the view/queries below):

Code:
files_table
ID INT Identity(1,1) PK
file_name varchar(50)
... (rest of the table doesn't matter, unless you are storing binary data in the DB, and you shouldn't be...)

tags_table
ID INT Identity(1,1) PK
tag_name varchar(50)  <-- I would put an index on this, since it will be used in where clauses

file_tags_table
ID INT Identity(1,1) PK
tag_id INT FK
file_id INT FK

file_tags_view
SELECT t1.*, t2.tag_name FROM file_tags_table t1
   JOIN tags_table t2 ON t1.tag_id = t2.ID

SELECT * FROM file_tags_view WHERE tag_name =/IN/LIKE ['tag'/('tag1', 'tag2', 'tag3')/'%partialtagtext%']


I am going to create the docs to go along with this and some SQL scripts etc to create the tables. I want to see how you would create this relationship of 1:* files to tags. Then we put both in a DB populate with similar data, and performance test it (realizing that flickr has way more data, and that we might have trouble mimicking that much data).


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Tue Nov 09, 2010 10:21 am 
SON OF A GUN
SON OF A GUN
User avatar

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Decent read... need to dig into it a little deeper... http://radar.oreilly.com/archives/2006/ ... lickr.html


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Wed Nov 10, 2010 7:23 pm 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
CrashTECH wrote:
Gadget wrote:
CrashTECH wrote:
I am not sure I follow your example fully... you have 1 table, but you are joining two in the code example?

This is why we should have been dealing with ERDs and not SQL/physical schemas. I consider photos and tags to be two distinct entities with an M:N relationship between them. This logical design would result in three tables. I would use the natural keys (eg the name of a tag) instead of creating an artificial key.

Hrm... I think having a concrete example works better.

Yeah, works better... after repeating myself five times now, we're still not on the same page, but this is working better?

When we discuss algorithms, do we discuss a particular implementation or use abstractions to help our limited brains cope with the complexity? How about operating systems... do we discuss memory management in the abstract or tear open an OS kernel and try to deduce what the hell is going on in there? Anyways... this thread is starting to remind me of a line from Friday the 13th:

Freddy to Jason: Why the ____ won't you die!

CrashTECH wrote:
I really want to see what SQL does with the different methods.

I assume that "methods" means different schema implementations, right? Correct me if I'm wrong. Please use a context free grammar next time... =)

CrashTECH wrote:
How effectively the relationships are search should be up to the DB engine itself I would think that it was tuned properly.

Huh? English... please use English in the forums!

<snip SQL database stuff>
I'll download SQL Express and setup the schema that I've been talking about then post it.

CrashTECH wrote:
Then we put both in a DB populate with similar data, and performance test it (realizing that flickr has way more data, and that we might have trouble mimicking that much data).

I'll let you populate/run the tests from one machine. It'll be a better comparison that way. Does SQL Server have a built-in benchmark tool or something? How do you plan on testing them? To be fair, you need to post the test procedure/criteria.

Who said anything about Flickr data? Do you work there or something?


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Thu Nov 11, 2010 6:05 am 
SON OF A GUN
SON OF A GUN
User avatar

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Gadget wrote:
Yeah, works better... after repeating myself five times now, we're still not on the same page, but this is working better?
No, I understand what you are saying. In practical use cases I don't believe it. I don't believe that my method is wrong and that yours is better. This was a SQL schema question, so why does it not make sense to address the schema issue? Does it not make sense to put forth the two examples and see what actually happens to them before trying to determine the why? Take the specific case and then drill down into what is going on. Theory is all well and good but what happens in practice is far more important. The world doesn't run on theory.

I am going to ignore the snipes at my typing. I'll try to use a NFA or something for you next time. :)

Gadget wrote:
I'll let you populate/run the tests from one machine. It'll be a better comparison that way. Does SQL Server have a built-in benchmark tool or something? How do you plan on testing them? To be fair, you need to post the test procedure/criteria.
Execution plans, time to run queries, etc. MS SQL Server has a profiler. Not sure if it comes with the express version or not.

Gadget wrote:
Who said anything about Flickr data? Do you work there or something?
No I don't... but the first line of the first post in this thread:
cbassett01 wrote:
I am in the process of creating a custom photo gallery system for a website, and need a little advice on how to set up the tables in mysql.
Trying to keep it relevant to the topic in question... the VOLUME of data has a large impact on how the queries perform. Flickr is basically the same thing the OP wants(ed) to create.


Top
  Profile  
 
 Post subject: Re: database design question for the MySQL users...
PostPosted: Sun Nov 14, 2010 2:11 am 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
CrashTECH wrote:
Gadget wrote:
Yeah, works better... after repeating myself five times now, we're still not on the same page, but this is working better?
No, I understand what you are saying. In practical use cases I don't believe it. I don't believe that my method is wrong and that yours is better. This was a SQL schema question, so why does it not make sense to address the schema issue?

Entity-relationship modeling does address the schema issue; The whole point of doing it is to create a good schema! The ERD would have shown that logically we were on the same page, but in terms of the physical implementation, you want to create artificial keys. I had to explain THAT several times. Moving on...

CrashTECH wrote:
Take the specific case and then drill down into what is going on. Theory is all well and good but what happens in practice is far more important. The world doesn't run on theory.

Yeah, just earlier today I was thinking that...

Google... probably just guessed a good search algorithm
IBM... O(n^3) sorting would have been good enough to get them off the ground (ie they didn't really need a fancy sorting algorithm, and besides, they wouldn't have even KNOWN that it was a FANCY algo either! Boy ignorance really is bliss isn't it!)
Akamai... The whole thing with distributed hash tables was probably just a lucky bug in their code
iRobot... Nope, no theory stuff there; That shit is as "random" as it gets, right?! (and elephants not playing chess... pfft!)
Intel... They were playing with transistors and just happened to make an ALU; The whole boolean logic thing came decades later to punish the rest of us for being late to the party.

Isn't the term APPLIED SCIENCE confusing? To what are these scientists applying? Did you know that over 33% of US GDP is attributable to quantum mechanics? Seems strange that SciAm was able to relate a theory to the real-world in such a manner when the world obviously doesn't "run on theory". Seriously, you need to sit down and think about what you just said, and for your sake, I hope you're more considerate to the theories of say physics than you are towards math and computation. You may just find out the hard-way that Newton et al "theories" about gravity, electro-magnetic force, and energy actually have some relevance to the "real world".

CrashTECH wrote:
I am going to ignore the snipes at my typing. I'll try to use a NFA or something for you next time. :)

Honestly, I couldn't figure out what you were trying to say about search...

CrashTECH wrote:
Gadget wrote:
Who said anything about Flickr data? Do you work there or something?
No I don't... but the first line of the first post in this thread:
cbassett01 wrote:
I am in the process of creating a custom photo gallery system for a website, and need a little advice on how to set up the tables in mysql.
Trying to keep it relevant to the topic in question... the VOLUME of data has a large impact on how the queries perform. Flickr is basically the same thing the OP wants(ed) to create.

I find it a bit ironic that someone who was just arguing "real world" vs theory would suggest (1) testing with a volume of data that has to be at least 100K times the volume the OP would ever be able to generate or (2) conclude that Flickr is what the OP actually wanted to create. Being incredibly prone to realism at the moment, I would even suggest that a site like Flickr probably use something a bit less naive than just a simple relational database and sql query... just a theory.

As all of the data is being generated, we can test with different volumes of data. We can pick numbers of records that we find to be meaningful, perhaps, 1K 100K 10M and 1B?


Top
  Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 33 posts ]  Go to page Previous  1, 2

All times are UTC - 8 hours


Who is online

Users browsing this forum: No registered users and 4 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group

© 2014 Future US, Inc. All rights reserved.