Author Topic: SQL should I denormalize or join this?  (Read 1403 times)

Offline Chris

  • Game Owner
  • Level 35
  • *
  • Posts: 2,217
  • Reputation: +28/-1
    • View Profile
SQL should I denormalize or join this?
« on: December 02, 2010, 04:23:25 AM »
There are 2 tables. First is huge and has x50 times more entries than the second. The second is to be very fast and used more than first. Now, there are several fields I would need from 1st table when I query the 2nd. So I have 2 choices:
1) JOIN first table each time I query the 2nd
2) Denormalize and duplicate data from 1st to 2nd (there are 2 fields from the 1st table that are updated each time user login, so I would have to add additional query to update 2nd query accordingly, this won't affect code maintainability at all only slightly performance)

Both tables are variabled width, althrough I could make 2nd fixed width if needed since it is small enough to waste space.

Offline JGadrow

  • Level 35
  • **
  • Posts: 1,133
  • Reputation: +23/-2
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #1 on: December 02, 2010, 07:42:43 AM »
Do both! Start with the normalized approach first as I have a feeling it'll actually be optimal in this case but try both and test the performance time on each.
Idiocy - Never underestimate the power of stupid people in large groups.


Offline Chris

  • Game Owner
  • Level 35
  • *
  • Posts: 2,217
  • Reputation: +28/-1
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #2 on: December 03, 2010, 03:29:47 AM »
I can' t test the performance, even if I had time for this. It's about putting more stress on log user in function in exchange for making easier and faster "everything else", where "everything else" is not fully defined and impossible to predict with full accuracy what will it be in future. This is a decision to be made without sufficient information provided, that's the trick :D I need "wild guess" or best theoretical solution.

Offline JGadrow

  • Level 35
  • **
  • Posts: 1,133
  • Reputation: +23/-2
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #3 on: December 03, 2010, 05:52:29 AM »
lol Well then, as noted in my previous post, my "wild guess" is that the normalized solution will actually perform better as long as you join it properly, the result set would be limited to the size of the smaller table.

The "best" solution would probably be to create a view of the join that you would need to perform every time you access the second table. In that way, it's performance would be the same as for a standard query. But... for some reason I seem to recall you not using views.
Idiocy - Never underestimate the power of stupid people in large groups.


Offline Chris

  • Game Owner
  • Level 35
  • *
  • Posts: 2,217
  • Reputation: +28/-1
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #4 on: December 03, 2010, 06:43:07 AM »
the result set would be limited to the size of the smaller table.
Well, that's not the problem. Result will always be small, the problem is with search (exclusion of 99.5% of rows that are not needed).

Quote
The "best" solution would probably be to create a view of the join that you would need to perform every time you access the second table. In that way, it's performance would be the same as for a standard query.
You mean the performance of standard query of the big table? The whole point is the big table query is too slow, I want it faster. That's why I though of making a new small table and use only it (at the moment there is only 1 big table used for everything, but in many tasks I would need only a tiny portion of it)...

The size difference between big and small table is 50-100 times. It is a really huge difference.

Offline JGadrow

  • Level 35
  • **
  • Posts: 1,133
  • Reputation: +23/-2
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #5 on: December 03, 2010, 07:36:54 AM »
You mean the performance of standard query of the big table? The whole point is the big table query is too slow, I want it faster. That's why I though of making a new small table and use only it (at the moment there is only 1 big table used for everything, but in many tasks I would need only a tiny portion of it)...

The size difference between big and small table is 50-100 times. It is a really huge difference.
No, a view is, essentially, a stored query that you can query to obtain sub-results. In essence, it allows you to store the join operation so you don't have to make it over and over again. Thus, it would be a small result set containing exactly the fields that you need. It's a virtual table which means that your underlying data is still semantically correct.

This issue, now that I remember it, was that views weren't introduced until MySQL 5.0 and, if I recall, you're running on MySQL 4, correct? So... I'd just do the join each time it's necessary. If you ever upgrade to 5, then you're at least already in a position to take advantage of the view's performance optimization with minimal changes to your underlying codebase.
Idiocy - Never underestimate the power of stupid people in large groups.


Offline Chris

  • Game Owner
  • Level 35
  • *
  • Posts: 2,217
  • Reputation: +28/-1
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #6 on: December 04, 2010, 05:59:44 AM »
Well, I can't use views right now, but I'm not in a hurry for this feature, I would like to know the best solution even if it's not available to me...

I don't get the benefit of views. On the list of advantages they mention simplified queries, not performance boost... Actually, from what I read, it might be even worse performance wise. Views just create a temporary table without indexes from the given query, right? So, each time the main table is updated the same update has to be done on the temporary table, so the performance would be equal as if we duplicate data manually, correct? Not to mention that the view had to be recreated each time a new row is added, correct (or maybe it would do it automaticly without recreate)?

Offline JGadrow

  • Level 35
  • **
  • Posts: 1,133
  • Reputation: +23/-2
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #7 on: December 04, 2010, 07:55:53 AM »
Just like any other tool... there's a time and place to use views. You can use indexes on views depending upon the algorithm you use to create the view. The merge algorithm actually exists as a way to simply re-write queries to simplify them from the program's perspective.

As far as duplicating the data manually, yes you could re-create this functionality yourself. However, you can make MySQL do all of this work for you implicitly (and you can be sure they'll peak and tune it as much as possible) or you can do it yourself.

Now, one day, you decide to change methods. If you went with the view, you drop the view (or edit it to refer to the new storage structure is even better depending upon exactly what you're doing) and that's it! You are now ready to implement your new logic. With a manual attempt at this, you drop your temporary table (or alter it) and manually re-write any portion of your codebase where you insert, update or remove from that temporary table.

Notice, you will always have to change where you read from that table / view, regardless of the scenario, so I left that out.
Idiocy - Never underestimate the power of stupid people in large groups.


Offline Chris

  • Game Owner
  • Level 35
  • *
  • Posts: 2,217
  • Reputation: +28/-1
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #8 on: December 05, 2010, 02:41:22 PM »
Hmmm, no, I wouldn't miss the opportunity to put additional data on the small table (which I always ommited on the big one due to saving size). So, in this particular case symetrical temporary View table won't do. I suspect there is a way to add custom fields to the view, but well, that would be really like a regular table then, so not much of a benefit... No, I can't get away without making one big and one small table. So the initial assumption was optimal, there are two tables. Now, what to do with fields that are shared? Actually, almost all big table fields would be needed when querying the small one...

- big table is 50-100 times bigger than small one
- every single time I query small table I would also need many fields from big one
- assuring integrity of tables (in case of denormalisation) would be extremely simple (write to the big table is done in one place, it's always a single query, the fields written on small table are never those shared with big table), it's basicly adding one line of code...
- even if integrity between tables is lost (in case of future wild code changes), it's perfectly fine, the small table is always used as a source of aproximate data. It would be even fine to make it a bit obsolete/delayed if performance requires so.
- when querrying the small table (or doing any other task on small table) there will never, ever be the big table change needed or even allowed (small table operations are prohibited of ever changing big table)
- each time user logs in there is a write to big table require (and in case of denormalization identical write to small table would be needed)
- operations on small table would require a lot of grouping, order by and search on data (from the big table in case of JOIN scenario)

What's the best solution for the assumptions above?

Offline JGadrow

  • Level 35
  • **
  • Posts: 1,133
  • Reputation: +23/-2
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #9 on: December 05, 2010, 07:32:39 PM »
- big table is 50-100 times bigger than small one
- every single time I query small table I would also need many fields from big one
Honestly, this sounds like something is terribly wrong with your table structures. If you need, pretty much, every field everytime you would pull data from the smaller table... then the smaller table should be part of the larger table. The purpose of having separate tables is for storing data that is separate and stands on its own.
Idiocy - Never underestimate the power of stupid people in large groups.


Offline Chris

  • Game Owner
  • Level 35
  • *
  • Posts: 2,217
  • Reputation: +28/-1
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #10 on: December 06, 2010, 04:43:53 AM »
The whole thing is that I rarely need 99% of rows from big one. For huge number of heavy queries I need just 1% and the one 1% is easily defined, perfect condition to make a small table from these 1%. So the big table would be for storage and for rare queries while the real stress would be put on the small table which is basicly a copy (or some join with big) of the big one + some bonus fields.
Right now I have to run all these searches on the big one while knowing that I could easily exclude 99% of rows as not meeting the criteria. And the same one criteria apply to basicly all heavy queries (so it's not like I would later discover that I would need another criteria. No, this one criteria will be good forever and ever).

The similarity of this situation would to be to some web analyser where you store data from 10 years but almost always you need just last month (1%). So the optimization would be to create a brand new table for the last month (this is just an example, the whole situation is more complex than traffic analyser).

- "by the book" the solution would be to use partitioning, but I can't do this since the data "usability", while being easly determined is frequently changing (so like days from 2 years ago could suddenly be transferred to the stats of last month).

Offline JGadrow

  • Level 35
  • **
  • Posts: 1,133
  • Reputation: +23/-2
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #11 on: December 06, 2010, 06:35:41 AM »
lol Again, that problem could've been solved by views if you had them, but we've already had that discussion ;) I had your reasoning on creating a 2nd table all wrong from the get-go. This post has really clarified things for me.

What sort of indexes do you have setup and what's the query you're performing?
Idiocy - Never underestimate the power of stupid people in large groups.


Offline Chris

  • Game Owner
  • Level 35
  • *
  • Posts: 2,217
  • Reputation: +28/-1
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #12 on: December 06, 2010, 07:02:37 AM »
Well, I don't have many indexes right now... Haven't though about it much yet, I just thought that first I should exclude these 99% of unneeded rows which alone would be probably a bigger performance gain than any amount of indexes.
As for queries these are mostly selects returning multiple rows on various criterias, possibly group by.

The whole question could be simplified to: What happens when I do LEFT JOIN small table and big table (connected via identical row ID (indexed))? If I want to querry fields from big table but only if there is a corresponding row on small one, would it be much faster than querrying from the whole big one without JOIN?

Offline JGadrow

  • Level 35
  • **
  • Posts: 1,133
  • Reputation: +23/-2
    • View Profile
Re: SQL should I denormalize or join this?
« Reply #13 on: December 06, 2010, 07:47:36 AM »
Indexing the table properly will be the best way to make this table perform well. By indexing it, the query can exclude these rows for you.

Left join is an extremely intense operation, especially with large tables. Also, left (or right) join is only for when you want to include all entries of the left (or right) table regardless of whether a matching record is found in the right (or left) table. Just use a standard (inner) join to restrict to those only with matching records.

With anything optimization-wise though... if the solution isn't 100% proven, just make it work and worry about optimization when you have the ability to test which is actually optimal for your setup.
Idiocy - Never underestimate the power of stupid people in large groups.


 


SimplePortal 2.3.3 © 2008-2010, SimplePortal