Data mining libraries

A general-purpose C++ data mining library

Features

Keyphrase Extraction. We have implemented two keyphrase extraction approaches. One is based on the translation model from Zhiyuan Liu's thesis work; the other is our own innovation, which uses Wiki data as the semantic knowledge base.
Taxonomy Generation.
Duplicate Detection. The algorithm is described in the paper Detecting Near-Duplicates for Web Crawling; read it first to understand the approach. We use Charikar's simhash fingerprint generation and set the fingerprint dimension (f) to 64.
CTR Prediction. We have implemented both AdPredictor and FTRL.
Chinese Query Correction.
Collaborative Filtering. An item-based, incremental collaborative filtering implementation.
Others.
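
To make the Duplicate Detection entry above more concrete, here is a minimal sketch of Charikar-style simhash with f = 64. It is not taken from the idmlib sources: the names `fnv1a64`, `WeightedToken`, `simhash64`, `hammingDistance` and `isNearDuplicate` are hypothetical, FNV-1a is only a stand-in for whatever token hash the library actually uses, and the default threshold k = 3 is the value discussed in the near-duplicates paper rather than a documented idmlib setting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a; an assumed stand-in for the library's real token hash.
inline std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical input type: a token (feature) and its positive weight.
struct WeightedToken {
    std::string text;
    int weight;
};

// Simhash with f = 64: every token votes +weight or -weight on each bit
// position according to its hash; the sign of each sum gives one fingerprint bit.
inline std::uint64_t simhash64(const std::vector<WeightedToken>& tokens) {
    long long v[64] = {0};
    for (const auto& t : tokens) {
        assert(t.weight > 0);  // this sketch assumes positive weights
        const std::uint64_t h = fnv1a64(t.text);
        for (int i = 0; i < 64; ++i) {
            if ((h >> i) & 1ULL) v[i] += t.weight;
            else                 v[i] -= t.weight;
        }
    }
    std::uint64_t fingerprint = 0;
    for (int i = 0; i < 64; ++i) {
        if (v[i] > 0) fingerprint |= (1ULL << i);
    }
    return fingerprint;
}

// Number of bit positions in which two fingerprints differ.
inline int hammingDistance(std::uint64_t a, std::uint64_t b) {
    std::uint64_t x = a ^ b;
    int n = 0;
    while (x) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Two documents count as near-duplicates when their fingerprints differ
// in at most k bits (k = 3 is an assumption, see the note above).
inline bool isNearDuplicate(std::uint64_t a, std::uint64_t b, int k = 3) {
    return hammingDistance(a, b) <= k;
}
```

A caller would tokenize each document, compute `simhash64` once per document, and compare fingerprints with `isNearDuplicate`. Comparing every pair is only practical for small collections; the paper describes table-based lookups for large ones.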

Dependencies

We recently switched SF1R to C++11, so GCC 4.8 is required to build it. We do not recommend building the project on Ubuntu because of the nested references among the many libraries involved; CentOS, Red Hat, Gentoo, and CoreOS are the preferred platforms. You also need CMake and Boost 1.56 to build the repository. The dependent repositories are:

cmake: The cmake modules required to build all iZENECloud C++ projects.
izenelib: The general purpose C++ libraries.
icma: The Chinese morphological analyzer library.
ijma: The Japanese morphological analyzer library.
ilplib: The language processing libraries.

License

The project is published under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
