Zarzyk
Joined: 30 Apr 2009 Posts: 2
Posted: Thu Apr 30, 2009 6:49 am Post subject: Association Rules Mining in Relational Databases
Hi,
I'm writing a master's thesis on implementing association rule mining (ARM) on a massively distributed relational database appliance, Netezza's NPS, and I am looking for scientific papers on this topic.
Is anybody aware of a good, recent article or paper with practical results on writing ARM algorithms in relational databases? Especially on distributed databases (Netezza and the like)?
I'm looking for the following algorithms:
* Apriori variants (there is a lot of work on these)
* FP-growth
* algorithms based on finding closed frequent itemsets
* algorithms based on finding maximal frequent itemsets
* any other appropriate or better approaches
They should be implemented in SQL and/or use any of the advanced RDBMS functionality, such as:
* Stored Procedures
* User-Defined Functions/Aggregates
* User-Defined Table Functions
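To make the "Apriori in SQL" idea concrete, here is a minimal sketch of the first two Apriori passes written as plain SQL and run through Python's built-in sqlite3 module. The `baskets` table, its toy rows, and the function names are assumptions made up for illustration; this is not taken from any of the papers mentioned, and a real Netezza implementation would differ in syntax and tuning.

```python
import sqlite3

# Hypothetical toy data: one row per (transaction id, item).
ROWS = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "butter"), (3, "milk"),
    (4, "bread"), (4, "milk"),
]


def build_demo_db():
    """Create an in-memory database holding the toy transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO baskets VALUES (?, ?)", ROWS)
    return conn


def frequent_items(conn, min_support):
    """Pass 1: items that occur in at least min_support transactions."""
    sql = """
        SELECT item, COUNT(DISTINCT tid) AS support
        FROM baskets
        GROUP BY item
        HAVING COUNT(DISTINCT tid) >= :minsup
        ORDER BY item
    """
    return conn.execute(sql, {"minsup": min_support}).fetchall()


def frequent_pairs(conn, min_support):
    """Pass 2: a self-join builds candidate pairs from frequent items only
    (the Apriori pruning step), then GROUP BY counts their support."""
    sql = """
        WITH f1 AS (
            SELECT item
            FROM baskets
            GROUP BY item
            HAVING COUNT(DISTINCT tid) >= :minsup
        )
        SELECT a.item AS item1, b.item AS item2,
               COUNT(DISTINCT a.tid) AS support
        FROM baskets AS a
        JOIN baskets AS b
          ON a.tid = b.tid AND a.item < b.item
        WHERE a.item IN (SELECT item FROM f1)
          AND b.item IN (SELECT item FROM f1)
        GROUP BY a.item, b.item
        HAVING COUNT(DISTINCT a.tid) >= :minsup
        ORDER BY item1, item2
    """
    return conn.execute(sql, {"minsup": min_support}).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(frequent_items(db, 2))
    print(frequent_pairs(db, 2))
```

Longer itemsets follow the same pattern with one more self-join per pass, which is one reason the join cost grows quickly and pure-SQL variants struggle against main-memory implementations.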
I found some articles on this, but most of them are from before 2000, and the most recent are from 2005 (e.g. "SQL Based Frequent Pattern Mining", Xuequn Shang).
Unfortunately, the results they report also do not come close to those of algorithms working in main memory, e.g. those published on the FIMI 2004 webpage:
http://fimi.cs.helsinki.fi/experiments/
Is this a barrier we cannot overcome? I hope not, and that is what I'm trying to show.
I would be grateful for any hints or links!
Thanks!
Krzysztof
TimManns Data Mining Guru
Joined: 25 Sep 2006 Posts: 37 Location: Sydney
Posted: Mon May 11, 2009 4:18 am Post subject: association rulesets running as SQL is already out there
- fyi -
Commercial applications such as SPSS Clementine already have a few association models (Apriori, GRI, CARMA). These are built within the tool, but then *SCORED* on the database. The association model is converted into rulesets, which are then automatically converted into huge, verbose SQL CASE statements. The SQL looks hideous if you try to read it, but it's auto-generated by Clementine, sent to the data warehouse as an SQL query, and processed by an MPP system such as Teradata or Netezza like any other SQL query. Most of my work runs as SQL on Teradata, auto-generated by Clementine. This includes all data preparation and transformation, and scoring back to a db table (in one query and insert statement). The scoring model is usually a backpropagation neural net or a decision tree (such as CART) converted into SQL.
One problem may be that your SQL exceeds the supported size (maybe 1 MB in file size) if you have very large or complex models or association rulesets, but this is unlikely and can be avoided if you break your data into segments (customer or product groups).
Scoring tool-generated models as SQL is readily available, although I believe building models in-database is still uncommon.
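As a rough illustration of the ruleset-to-SQL idea described above, the sketch below turns a small hand-written ruleset into one SQL CASE expression and scores a table with a single INSERT ... SELECT. It does not reproduce what Clementine actually generates; the rule format, the `customer_flags` and `scores` tables, and all function names are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical ruleset: (antecedent flag columns, consequent, confidence).
RULES = [
    (("bread",), "milk", 0.75),
    (("bread", "butter"), "jam", 0.90),
]


def rules_to_case(rules):
    """Render the rules as one CASE expression. Rules are ordered by
    confidence, so the most confident rule that fires wins.
    Names are interpolated directly, so only use trusted rule input."""
    ordered = sorted(rules, key=lambda rule: rule[2], reverse=True)
    branches = []
    for antecedent, consequent, _confidence in ordered:
        condition = " AND ".join(f"{col} = 1" for col in antecedent)
        branches.append(f"WHEN {condition} THEN '{consequent}'")
    return "CASE " + " ".join(branches) + " ELSE NULL END"


def build_demo_db():
    """In-memory stand-in for the warehouse: one 0/1 flag column per item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_flags "
        "(customer_id INTEGER, bread INTEGER, butter INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customer_flags VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 1, 0), (3, 0, 1)],
    )
    conn.execute("CREATE TABLE scores (customer_id INTEGER, recommended TEXT)")
    return conn


def score_into_table(conn, rules):
    """Score every customer and write the result back in one statement."""
    sql = (
        "INSERT INTO scores (customer_id, recommended) "
        f"SELECT customer_id, {rules_to_case(rules)} FROM customer_flags"
    )
    conn.execute(sql)


if __name__ == "__main__":
    db = build_demo_db()
    print(rules_to_case(RULES))
    score_into_table(db, RULES)
    print(db.execute("SELECT * FROM scores ORDER BY customer_id").fetchall())
```

With thousands of rules the generated CASE expression grows accordingly, which is where the statement-size limit mentioned above could start to matter.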
Cheers
Tim
http://timmanns.blogspot.com/
Zarzyk
Joined: 30 Apr 2009 Posts: 2
Posted: Mon May 11, 2009 4:31 am Post subject:
So I'm doing a very uncommon thing. And that is exciting! :)
Thanks for the comment, Tim. I will definitely follow your blog.
Cheers,
Krzysztof
FrellNancy
Joined: 11 May 2013 Posts: 1 Location: Kaneohe
Posted: Sat May 11, 2013 5:34 am Post subject:
Scoring tool-generated models are also used effectively for data storage in grids. They are favorable in every respect for managing data and its conventional security too.
http://www.bigdatacompanies.com