forked from czcorpus/kontext

An alternative web front-end for the Manatee corpus search engine

KonText

Introduction

KonText is a fully featured corpus query interface for the Manatee open corpus search engine. It started as a fork of the Bonito 2.68 web interface and, while it still shares a lot of code with the original Bonito (now bonito-open), it is gradually becoming more independent.

It is maintained by the Institute of the Czech National Corpus. The current version contains all the key features of Bonito 2.98.3 (primarily support for parallel corpora).

Features

internal changes

  • rewritten as a WSGI application (bonito-open is CGI-based)
  • modular code design with dynamically loadable plug-ins providing custom functionality implementation
  • fully decoupled background concordance calculation based on the Celery task queue (alternatively, the multiprocessing package can be used)
  • completely rewritten client-side code (AMD modules, code separated from templates)
  • improved logging, error processing and debugging support
  • improved code documentation
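
To illustrate what "dynamically loadable plug-ins" can mean in practice, here is a minimal sketch of loading an implementation by module name at runtime. It is not KonText's actual loader: the function `load_plugin` and the factory name `create_instance` are placeholders chosen for this example, and the standard-library module `collections` merely stands in for a real plug-in package.

```python
import importlib


def load_plugin(module_name, factory_name="create_instance", *args, **kwargs):
    """Import a module by name and build a plug-in object from it.

    Both the function name and the default factory name are
    hypothetical; they only illustrate the general technique.
    """
    module = importlib.import_module(module_name)
    factory = getattr(module, factory_name, None)
    if not callable(factory):
        raise ImportError(
            "module %r does not provide a callable %r" % (module_name, factory_name)
        )
    return factory(*args, **kwargs)


# Stand-in usage: a standard-library class plays the role of a plug-in factory.
storage = load_plugin("collections", "OrderedDict")
storage["corpus"] = "example"
```

Because the module name is just a string, it could come from a configuration file, which is what lets an installation swap one implementation for another without touching the core code.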

new features

  • support for spoken corpora - defined segments can be played back as audio
  • support for user-defined line groups
  • persistent URLs for large queries - you can send a link to someone even if the query itself is megabytes long
  • access to previous queries
  • easy access to favorite corpora (subcorpora, aligned corpora)
  • interactive subcorpus selection - you can select text types and see how the available values of the other attributes change
  • interactive PoS tag tool - for positional PoS tag formats, an interactive tool can be used to write tag queries
  • a concordance/frequency/collocation listing can be saved in Excel format (xlsx)
  • a correct i.p.m. (i.e. one calculated only from the selected text types) can be computed on demand for ad-hoc subcorpora
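
The i.p.m. item above comes down to which size the hit count is divided by. The following sketch uses the usual definition of instances per million (hits divided by the size in tokens, times one million). The function name and all figures are made up for illustration and are not taken from KonText.

```python
def ipm(hits, size_in_tokens):
    """Instances per million tokens: hits / size * 1,000,000."""
    if size_in_tokens <= 0:
        raise ValueError("size_in_tokens must be positive")
    return hits * 1e6 / size_in_tokens


# Illustrative numbers only: 150 hits found within selected text types
# that cover 3,000,000 tokens of a 120,000,000-token corpus.
hits = 150
whole_corpus_size = 120000000
selected_text_types_size = 3000000

ipm_against_whole_corpus = ipm(hits, whole_corpus_size)
ipm_against_selection = ipm(hits, selected_text_types_size)
```

Dividing by the size of the whole corpus understates the relative frequency when only part of the corpus was searched, so a correct value for an ad-hoc subcorpus needs the size of the selected text types.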

enhanced user interface

  • improved user interface and design
  • extended corpora information (size, structures, attributes, citation information)
  • concordance results also include the Average Reduced Frequency
  • a subcorpus can be created from a custom CQL expression
  • on the multilevel frequency distribution page, the starting word can be specified for multi-word KWICs
  • result shuffling can be pre-set
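
For readers unfamiliar with the Average Reduced Frequency (ARF) mentioned above, here is a small sketch based on its commonly cited definition: with f occurrences in a corpus of N tokens and v = N / f, ARF is the sum of min(d_i, v) over the cyclic distances d_i between consecutive occurrences, divided by v. This is an illustration of the measure, not the code KonText or Manatee uses. The function name and the sample positions are assumptions.

```python
def average_reduced_frequency(positions, corpus_size):
    """Compute ARF from token positions of the occurrences.

    positions   -- iterable of distinct 0-based token positions
    corpus_size -- total number of tokens in the corpus (N)
    """
    positions = sorted(positions)
    freq = len(positions)
    if freq == 0:
        return 0.0
    v = corpus_size / float(freq)
    # The first distance wraps around from the last occurrence to the first.
    distances = [positions[0] + corpus_size - positions[-1]]
    distances += [b - a for a, b in zip(positions, positions[1:])]
    return sum(min(d, v) for d in distances) / v


# Four evenly spread occurrences keep their full frequency ...
even = average_reduced_frequency([0, 25, 50, 75], 100)
# ... while four occurrences clustered together are reduced to close to one.
clustered = average_reduced_frequency([0, 1, 2, 3], 100)
```

An evenly distributed word has an ARF equal to its plain frequency, whereas a word concentrated in one place approaches 1, which is why the measure is useful alongside the raw hit count.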

Requirements

  • a WSGI-compatible server
    • recommended setup: Gunicorn + a reverse proxy (e.g. Nginx or Apache2)
    • supported setup: Apache2 with mod_wsgi
  • Python 2.7 and:
    • Cheetah Template Engine
    • lxml library
    • werkzeug library (provides WSGI middleware)
    • PyICU library (optional but preferred)
    • markdown library (optional, for formatted corpora references)
    • openpyxl library (optional, for XLSX export)
  • corpus search engine Manatee
    • versions from 2.83.3 to 2.137.2 are supported (the latest one is highly recommended); unless there is an incompatible change in Manatee, newer versions should work too
  • a key-value storage
    • any custom implementation (Redis and SQLite backends are available by default)
  • (optional) Celery task queue for background concordance calculation and maintenance tasks
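
As a rough idea of what a custom key-value storage backend involves, the sketch below wraps SQLite in a tiny get/set/remove interface using only the Python standard library. The class name, method names and table layout are assumptions made for this example. They do not reproduce the interface KonText expects, which is described in the project's own documentation.

```python
import json
import sqlite3


class SqliteKeyValueStore(object):
    """A hypothetical minimal key-value store backed by SQLite."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS data (key TEXT PRIMARY KEY, value TEXT)"
        )

    def set(self, key, value):
        # Values are serialized to JSON so that lists and dicts can be stored.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO data (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key, default=None):
        row = self._conn.execute(
            "SELECT value FROM data WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else default

    def remove(self, key):
        with self._conn:
            self._conn.execute("DELETE FROM data WHERE key = ?", (key,))


store = SqliteKeyValueStore()
store.set("favorites:user1", ["corpus_a", "corpus_b"])
```

Keeping the interface this small is what makes it practical to offer several interchangeable backends, such as the Redis and SQLite ones mentioned above.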

Build and installation

Please refer to the doc/INSTALL.md file for details.

Customization and contribution

Please refer to our Wiki.
