ACM-sigmod-2017-contest

Implementation done within the course of the subject "Software Development for Information Systems" during the winter of 2017-2018 by our team:

Panagiotis Kokkinakos
Georgios Rouvalis
Theodoros Stefou

Summary

An N-gram of words is a contiguous sequence of N words (https://en.wikipedia.org/wiki/N-gram). For this year's contest(2017), the task is to filter a stream of documents using a set of N-grams of interest and, for each document, return where in it one of the N-grams is found. The input to the task will have two parts: first, an initial set of N-grams, which may be processed and indexed; second, a series of queries (documents) and N-gram updates (insertions or deletions), arbitrarily interleaved. For each N-gram insertion or deletion, the set of N-grams of interest is updated accordingly. For each new query (document) arriving, the task is to return as fast as possible the N-grams of the currently up-to-date set that are found in the document. These should be presented in order of their first appearance in the document. If one N-gram is a prefix of another and the larger one is in the document, then the shorter one is presented first. Note that in answering each query, all insertions/deletions preceding the query must be taken into account.

Part 1

Insert and save elements to trie.
Implementation of algorithm to solve the problem
Implementation of a test to evaluate data structures and search(Unit Testing)

Part 2

Ignore double-found n-grams during Q command. (Implementation of BloomFilter)
Implementation of Hash Table in every trie level.
Find Top-K n-grams in every burst.
Implementation of static trie.

Part 3

Both for Static and Dynamic Tries implementation of paralleling using threads ( Thread Pool & Job Scheduler)

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Project(Part1)		Project(Part1)
Project(Part2)		Project(Part2)
Project(Part3)		Project(Part3)
.gitattributes		.gitattributes
Final Report and Measurements.pdf		Final Report and Measurements.pdf
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ACM-sigmod-2017-contest

Summary

Part 1

Part 2

Part 3

About

Releases

Packages

Languages

License

GeoRouv/ACM-sigmod-2017-contest

Folders and files

Latest commit

History

Repository files navigation

ACM-sigmod-2017-contest

Summary

Part 1

Part 2

Part 3

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages