This page provides supplementary material for UPM, an unsupervised clustering algorithm that matches products using only their titles. The algorithm and its cluster refinement stage are described in:
[1] L. Akritidis, A. Fevgas, P. Bozanis, C. Makris, "A Self-Verifying Clustering Approach to Unsupervised Matching of Product Titles", Artificial Intelligence Review (Springer), pp. 1-44, 2020,
whereas a preliminary version of the algorithm (under the name UMaP) was published here:
[2] L. Akritidis, P. Bozanis, "Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations", In Proceedings of the 14th IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-10, 2018.
The following resources are available:
- Both the C++ code and the datasets can be downloaded from the supporting GitHub project.
- The datasets and some additional descriptions can also be found on the corresponding Kaggle page.
- Draft preprints of the articles [1] and [2].
- A related research seminar entitled "A combinatorial approach to entity matching for products", which I presented to the students of the School of Science and Technology, International Hellenic University, on 09/05/2019.
Note: Researchers who use this code and/or these datasets are kindly asked to cite articles [1] and [2] in their published work.