Abstract
Discovery of association rules is an important data mining task. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the set of frequent itemsets (a subset of database items), thus incurring high I/O overhead. In the parallel case, most algorithms perform a sum-reduction at the end of each pass to construct the global counts, also incurring high synchronization cost.
In this paper we describe new parallel association mining algorithms. The algorithms use novel itemset clustering techniques to approximate the set of potentially maximal frequent itemsets. Once this set has been identified, the algorithms make use of efficient traversal techniques to generate the frequent itemsets contained in each cluster. We propose two clustering schemes based on equivalence classes and maximal hypergraph cliques, and study two lattice traversal techniques based on bottom-up and hybrid search. We use a vertical database layout to cluster related transactions together. The database is also selectively replicated so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithms do not need any further communication or synchronization. The algorithms minimize I/O overheads by scanning the local database portion only twice. Once in the set-up phase, and once when processing the itemset clusters. Unlike previous parallel approaches, the algorithms use simple intersection operations to compute frequent itemsets and do not have to maintain or search complex hash structures.
Our experimental testbed is a 32-processor DEC Alpha cluster inter-connected by the Memory Channel network. We present results on the performance of our algorithms on various databases, and compare it against a well known parallel algorithm. The best new algorithm outperforms it by an order of magnitude.
Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.References
Agrawal, R., and Shafer, J. 1996. Parallel mining of association rules. In IEEE Trans. on Knowledge and Data Engg., 8(6):962–969.
Agrawal, R., and Srikant, R. 1994. Fast algorithms for mining association rules. In 20th VLDB Conf.
Agrawal, R.; Mannila, H.; Srikant, R.; Toivonen, H.; and Verkamo, A. I. 1996. Fast discovery of association rules. In Fayyad, U., and et al., eds., Advances in Knowledge Discovery and Data Mining. MIT Press.
Agrawal, R.; Imielinski, T.; and Swami, A. 1993. Mining association rules between sets of items in large databases. In ACM SIGMOD Intl. Conf. Management of Data.
Berge, C. 1989. Hypergraphs: Combinatorics of Finite Sets. North-Holland.
Cheung, D.; Han, J.; Ng, V.; Fu, A.; and Fu, Y. 1996a. A fast distributed algorithm for mining association rules. 4th Intl. Conf. Parallel and Distributed Info. Systems.
Cheung, D.; Ng, V.; Fu, A.; and Fu, Y. 1996b. Efficient mining of association rules in distributed databases. In IEEE Trans. on Knowledge and Data Engg., 8(6):911–922.
Fayyad, U.; Piatetsky-Shapiro, G.; and Smyth, P. 1996. The KDD process for extracting useful knowledge from volumes of data. In Communications of the ACM-Data Mining and Knowledge Discovery in Databases.
Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co.
Gillett, R. 1996. Memory channel: An optimized cluster interconnect. In IEEE Micro, 16 (2).
Han, E.-H.; Karypis, G.; and Kumar, V. 1997. Scalable parallel data mining for association rules. In ACM SIGMOD Conf. Management of Data.
Holsheimer, M.; Kersten, M.; Mannila, H.; and Toivonen, H. 1995. A perspective on databases and data mining. In 1st Intl. Conf. Knowledge Discovery and Data Mining.
Houtsma, M., and Swami, A. 1995. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering.
Mannila, H.; Toivonen, H.; and Verkamo, I. 1994. Efficient algorithms for discovering association rules. In AAAI Wkshp. Knowledge Discovery in Databases.
Park, J. S.; Chen, M.; and Yu, P. S. 1995a. An effective hash based algorithm for mining association rules. In ACM SIGMOD Intl. Conf. Management of Data.
Park, J. S.; Chen, M.; and Yu, P. S. 1995b. Efficient parallel data mining for association rules. In ACM Intl. Conf. Information and Knowledge Management.
Parthasarathy, S.; Zaki, M. J.; and Li, W. 1997. Application driven memory placement for dynamic data structures. Technical Report URCS TR 653, University of Rochester.
Savasere, A.; Omiecinski, E.; and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. In 21st VLDB Conf.
Toivonen, H. 1996. Sampling large databases for association rules. In 22nd VLDB Conf.
Zaki, M. J.; Ogihara, M.; Parthasarathy, S.; and Li, W. 1996. Parallel data mining for association rules on shared-memory multi-processors. In Supercomputing'96.
Zaki, M. J.; Parthasarathy, S.; Li, W.; and Ogihara, M. 1997a. Evaluation of sampling for data mining of association rules. In 7th Intl. Wkshp. Research Issues in Data Engg.
Zaki, M. J.; Parthasarathy, S.; Ogihara, M.; and Li, W. 1997b. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining.
Zaki, M. J.; Parthasarathy, S.; Ogihara, M.; and Li, W. 1997c. New algorithms for fast discovery of association rules. Technical Report URCS TR 651, University of Rochester.
Zaki, M. J.; Parthasarathy, S.; and Li, W. 1997. A localized algorithm for parallel association mining. In 9th ACM Symp. Parallel Algorithms and Architectures.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Zaki, M.J., Parthasarathy, S., Ogihara, M. et al. Parallel Algorithms for Discovery of Association Rules. Data Mining and Knowledge Discovery 1, 343–373 (1997). https://doi.org/10.1023/A:1009773317876
Issue Date:
DOI: https://doi.org/10.1023/A:1009773317876
Profiles
- Srinivasan Parthasarathy View author profile