KDnuggets » Forums
Association Rules Mining in Relational Databases

 
www.kdnuggets.com Forum Index -> Data Mining Open Forum
Zarzyk



Joined: 30 Apr 2009
Posts: 2

Posted: Thu Apr 30, 2009 6:49 am    Post subject: Association Rules Mining in Relational Databases

Hi,
I'm writing a master's thesis about implementing an Association Rules Mining appliance on a massively distributed relational database, Netezza's NPS, and I am looking for scientific papers on this topic.

Is anybody aware of a good, recent article or paper with practical results on writing ARM algorithms in relational databases, especially distributed ones (like Netezza's)?
I'm looking for the following algorithms:
* Apriori variants (there is a lot of work on these)
* FP-growth
* algorithms based on finding closed frequent itemsets
* algorithms based on finding maximal frequent itemsets
* any other appropriate or best-performing approaches

They should be implemented in SQL and/or use any of the advanced RDBMS functionalities, like:
* Stored Procedures
* User Defined Functions/Aggregates
* User Defined Table Functions
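For readers new to the topic, here is a minimal sketch of what "Apriori in SQL" can look like: frequent items are found with GROUP BY/HAVING, and frequent pairs with a self-join restricted to those items. It runs on SQLite through Python's sqlite3 module only so that it is self-contained; it is not Netezza-specific. The `sales` table, its columns, the toy rows and the threshold are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy data: one row per (transaction id, item).
ROWS = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "butter"), (3, "milk"),
    (4, "bread"), (4, "milk"),
]

PAIRS_SQL = """
WITH f1 AS (                      -- pass 1: frequent single items
    SELECT item
    FROM sales
    GROUP BY item
    HAVING COUNT(DISTINCT tid) >= :minsup
)
SELECT a.item AS item1, b.item AS item2, COUNT(*) AS support
FROM sales a
JOIN sales b
  ON a.tid = b.tid AND a.item < b.item   -- each unordered pair once
WHERE a.item IN (SELECT item FROM f1)    -- Apriori pruning: both items
  AND b.item IN (SELECT item FROM f1)    -- must be frequent themselves
GROUP BY a.item, b.item
HAVING COUNT(*) >= :minsup               -- pass 2: frequent pairs
ORDER BY support DESC, item1, item2
"""


def frequent_pairs(rows, min_support):
    """Return (item1, item2, support) for pairs meeting an absolute support.

    Assumes each (tid, item) combination appears at most once in rows.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(PAIRS_SQL, {"minsup": min_support}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(frequent_pairs(ROWS, 2))
```

Longer itemsets follow the same pattern with one more self-join per level, which is where the cost of pure-SQL approaches tends to grow.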

I found some articles about it, but most of them are from before 2000, and the most recent are from 2005 (e.g. "SQL Based Frequent Pattern Mining", Xuequn Shang).
Unfortunately, the results achieved there also do not compare with the results of algorithms working in main memory, e.g. those published on the FIMI 2004 webpage:
http://fimi.cs.helsinki.fi/experiments/

Is this a barrier we cannot overcome? I hope not, and that is what I'm trying to prove.

I would be grateful for any hints or links!
Thanks!
Krzysztof
TimManns
Data Mining Guru


Joined: 25 Sep 2006
Posts: 37
Location: Sydney

Posted: Mon May 11, 2009 4:18 am    Post subject: association rulesets running as SQL is already out there

- fyi -

Commercial applications such as SPSS Clementine already have a few association models (Apriori, GRI, CARMA). These are built within the tool, but then *SCORED* on the database. The association model is converted into rulesets, which are then automatically converted into huge, verbose SQL CASE statements. The SQL looks hideous if you try to read it, but it's auto-generated by Clementine, sent to the data warehouse as an SQL query, and processed by an MPP system such as Teradata or Netezza like any other SQL query. Most of my work runs as SQL on Teradata, auto-generated by Clementine. This includes all data preparation and transformation, and scoring back to a db table (in one query and insert statement). The scoring model is usually a backpropagation neural net or a decision tree (such as CART) converted into SQL.
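To illustrate the idea of scoring a ruleset as SQL, here is a rough sketch that renders a few rules as one CASE expression and runs it on SQLite. This is not what Clementine actually generates; the `baskets` table, its 0/1 flag columns, the rules and their confidences are all hypothetical, and column names are assumed to be trusted identifiers.

```python
import sqlite3

# Hypothetical ruleset: (antecedent flag columns, predicted item, confidence).
RULES = [
    (("bread",), "butter", 0.60),
    (("bread", "butter"), "milk", 0.90),
]


def rules_to_case_sql(rules, table="baskets", key="customer_id"):
    """Render rules as one CASE expression; highest confidence is tested first."""
    branches = []
    for antecedent, consequent, _conf in sorted(rules, key=lambda r: -r[2]):
        condition = " AND ".join(f"{col} = 1" for col in antecedent)
        literal = consequent.replace("'", "''")  # escape quotes in the literal
        branches.append(f"    WHEN {condition} THEN '{literal}'")
    body = "\n".join(branches)
    return (
        f"SELECT {key},\n  CASE\n{body}\n    ELSE NULL\n"
        f"  END AS predicted_item\nFROM {table}\nORDER BY {key}"
    )


def score(rows, rules):
    """Load toy baskets into an in-memory table and score them in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE baskets "
        "(customer_id INTEGER, bread INTEGER, butter INTEGER, milk INTEGER)"
    )
    conn.executemany("INSERT INTO baskets VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(rules_to_case_sql(rules)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(rules_to_case_sql(RULES))
    print(score([(1, 1, 1, 0), (2, 1, 0, 0), (3, 0, 0, 1)], RULES))
```

Because CASE returns the first matching branch, sorting by confidence means each customer gets the prediction of the strongest rule that fires; a real ruleset with thousands of rules produces the same shape of query, just much longer.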

One problem may be that your SQL exceeds the supported statement size (maybe 1 MB) if you have very large or complex models or association rulesets, but this is unlikely, and it can be avoided if you break your data into segments (customer or product groups).
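One way to picture the segmenting workaround is to emit one scoring statement per segment and check each against the size limit before sending it. The sketch below assumes rules are already tagged with a product group; the rule format, the `baskets` table and the 1,000,000-byte limit (taken from the "maybe 1 MB" guess above) are assumptions, not documented limits of any particular database.

```python
from collections import defaultdict

MAX_SQL_BYTES = 1_000_000  # assumed limit; check your database's real one

# Hypothetical rules tagged with a product group:
# (group, SQL condition, predicted item).
RULES = [
    ("dairy", "bread = 1", "milk"),
    ("dairy", "cereal = 1", "milk"),
    ("bakery", "butter = 1", "bread"),
]


def statements_by_segment(rules, max_bytes=MAX_SQL_BYTES):
    """Build one CASE-based scoring statement per segment, size-checked."""
    grouped = defaultdict(list)
    for group, condition, item in rules:
        grouped[group].append(f"WHEN {condition} THEN '{item}'")

    statements = {}
    for group, branches in grouped.items():
        sql = (
            "SELECT customer_id, CASE " + " ".join(branches)
            + f" END AS predicted_{group} FROM baskets"
        )
        size = len(sql.encode("utf-8"))  # size in bytes, not characters
        if size > max_bytes:
            raise ValueError(f"segment {group!r} is {size} bytes; split it further")
        statements[group] = sql
    return statements


if __name__ == "__main__":
    for group, sql in statements_by_segment(RULES).items():
        print(group, len(sql.encode("utf-8")), "bytes")
```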

Scoring tool-generated models as SQL is readily available, although building models in-database is, I believe, still uncommon.

Cheers

Tim
http://timmanns.blogspot.com/
Zarzyk



Joined: 30 Apr 2009
Posts: 2

Posted: Mon May 11, 2009 4:31 am

So, I'm doing a very uncommon thing. And that is exciting! :)

Thanks for the comment, Tim. I will definitely follow your blog.

Cheers,
Krzysztof
FrellNancy



Joined: 11 May 2013
Posts: 1
Location: Kaneohe

Posted: Sat May 11, 2013 5:34 am

Scoring tool-generated models are also used effectively for data storage in grids. They are favorable in every respect for managing data and its conventional security too.

http://www.bigdatacompanies.com
All times are GMT - 5 Hours