Similarity search algorithms for top-k most similar consumption patterns

Nadeem Iftikhar, Xiufeng Liu, Akos Madarasz, Finn Ebertsen Nordbjerg

Publication: Contribution to journal › Journal article › Research › peer-review


Similarity search can be expensive on large data sets: given a query, a naive algorithm has to traverse the entire data set, compute a similarity score between every instance and the query, and keep the top k instances. This paper presents two algorithms for efficient top-k querying. The first targets sparse data sets and answers queries quickly by rebuilding the data set as a hash map that keeps only the non-zero values. The second targets non-sparse data sets: it builds a kd-tree from the instances and prunes the search at query time.
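
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name a similarity measure or a hash map layout, so the sketch assumes a dot-product score over a dimension-keyed hash map for the sparse case and squared Euclidean distance for the kd-tree case. All function names (`build_sparse_index`, `top_k_sparse`, `build_kdtree`, `top_k_kdtree`) are hypothetical.

```python
import heapq
from collections import defaultdict


def build_sparse_index(rows):
    """Hash map: dimension -> [(row_id, value)], keeping non-zero values only."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        for dim, value in enumerate(row):
            if value != 0:
                index[dim].append((row_id, value))
    return index


def top_k_sparse(index, query, k):
    """Top-k (score, row_id) by dot product (assumed similarity measure).

    Only rows sharing a non-zero dimension with the query are scored.
    """
    scores = defaultdict(float)
    for dim, q in enumerate(query):
        if q != 0:
            for row_id, value in index.get(dim, ()):
                scores[row_id] += q * value
    return heapq.nlargest(k, ((s, i) for i, s in scores.items()))


def build_kdtree(points, depth=0):
    """points: list of (coords, row_id). Returns nested tuples, or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def top_k_kdtree(tree, query, k):
    """k nearest (squared_distance, row_id) pairs, closest first."""
    heap = []  # max-heap via negated squared distance; holds at most k entries

    def visit(node):
        if node is None:
            return
        (coords, row_id), axis, left, right = node
        d2 = sum((a - b) ** 2 for a, b in zip(coords, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, row_id))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, row_id))
        diff = query[axis] - coords[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Prune: skip the far side unless the splitting plane is closer
        # than the current k-th best candidate.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-neg, row_id) for neg, row_id in heap)


# Sparse example: rows 2 and 0 overlap with the query's non-zero dimensions.
rows = [[0, 0, 3, 0], [1, 0, 0, 0], [0, 2, 3, 0], [0, 0, 0, 5]]
print(top_k_sparse(build_sparse_index(rows), [0, 1, 1, 0], k=2))
# prints [(5.0, 2), (3.0, 0)]

# Dense example: the two points nearest to (2, 3).
pts = [((1, 1), 0), ((2, 2), 1), ((5, 5), 2), ((9, 9), 3)]
print(top_k_kdtree(build_kdtree(pts), (2, 3), k=2))
# prints [(1, 1), (5, 0)]
```

In the sparse sketch the work per query depends on how many stored non-zero values share a dimension with the query, not on the full size of the data set. In the kd-tree sketch a whole subtree is skipped when its splitting plane lies farther away than the current k-th best candidate.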
Status: In preparation - 2021