Developing on the four main daemons (Searching, Crawling, Documents and AD)
This article is a work in progress. You can help Searchdaimon by expanding it with information you know.
The ES has four main daemons that handle different tasks. They are all located in the bin folder and are commonly run from the command line during development.
- searchdbb - Does the actual searching. Gets the queries to search for from cgi-bin/dispatcher_allbb.
- crawlManager2 - Is told what to crawl and adds the data to the index. Also handles lookups of security information in third-party systems.
- boitho-bbdn - Gets crawled documents from crawlManager2, extracts the text, makes thumbnails and adds them to the repository. Another program will be told to index the repository later.
- boithoad - Dedicated daemon for talking to Active Directory. The code will become a plugin for crawlManager2 some day.
When developing on the ES core you may want to run one or more of the main daemons from the command line to see what debug information they produce. Running from the command line is also necessary when using debugging tools like GDB and Valgrind.
Most of the debug information is output through a logging library. It can either be discarded (the default), redirected to a file or printed on the console. To get the debug info printed on the console, set the environment variable BBLOGGER_APPENDERS to 1 and set the severity level with BBLOGGER_SEVERITY.
Before you can run a daemon from the command line, the already running instance must be stopped. Access the web admin panel and, under "Manage services", stop the ones you are planning to run manually.
You must log in to the ES with SSH, su to the boitho user and enter the /home/boitho/boithoTools folder before you can run any of the daemons.
Typically you log in as root and run:
su - boitho
cd boithoTools
Then start the daemon you want with logging enabled, for example:
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/searchdbb
searchdbb
Does the actual searching. Gets the queries to search for from cgi-bin/dispatcher_allbb.
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/searchdbb
The searchdbb daemon also has some command line options you should enable when running it from the command line:
- -f Fast startup. Skips thesaurus rebuild and index pre-caching
- -s Single process. Will not create a new process for each query nor use multiple threads
Example:
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/searchdbb -f -s
Now you can go to the search interface and search for something. Observe how the searchdbb daemon outputs debug info to the ssh console.
Sometimes you may want to run Valgrind to look for memory leaks and errors.
Example:
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 valgrind --leak-check=full --max-stackframe=5247212 bin/searchdbb -f -s
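GDB works the same way. A minimal sketch, where the -f and -s flags keep startup fast and everything in a single process so breakpoints behave predictably:
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 gdb --args bin/searchdbb -f -s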
The search kernel supports some command line arguments that may be useful when running it manually.
Argument | Description
--- | ---
-m number | Max. The search kernel will exit after the given number of queries. Normally -m is used with -s or -t to run a certain number of queries and then exit so Valgrind can do a full memory leak check
-l | Log. Log messages to the logs/searchd.log file
-o | Preopen. Open the index files before starting
-b file | Brank file. Load static rank information from the given brank file
-s | Single. Do not fork for new connections nor use multiple threads
-t | Single with threads. Do not fork for new connections, but use multiple threads
-f | Fast startup. Skip time-consuming tasks at startup so queries can be answered sooner. Often used when running searchdbb from the command line and you don't want to wait for spelling etc.
-c | No cache. Do not cache indexes
-A number | Set appenders
-L number | Set log severity
-S number | Set spelling min freq
-a seconds | Alarm. Set how long a query can run. The default is 60 seconds. When the time is up, searchdbb receives an alarm and exits the process that was running the query
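For example, to answer a single query and then exit so Valgrind can do its full leak check, a sketch combining the flags above:
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 valgrind --leak-check=full bin/searchdbb -f -s -m 1
Search once from the web interface; the daemon exits after that query and Valgrind prints its report.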
crawlManager2
Is told what to crawl and adds the data to the index. Also handles lookups of security information in third-party systems.
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/crawlManager2
The crawler manager supports some command line arguments that may be useful when running it manually.
Argument | Description
--- | ---
-m number | Max. Exit after the given number of queries. Normally -m is used with -s to run a certain number of crawls and then exit so Valgrind can do a full memory leak check
-s | Single. Do not fork for new connections nor use multiple threads
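As with searchdbb, a sketch of a leak-check run that handles one request and then exits:
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 valgrind --leak-check=full bin/crawlManager2 -s -m 1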
boitho-bbdn
Gets crawled documents from crawlManager2, extracts the text, makes thumbnails and adds them to the repository. Another program will be told to index the repository later.
The document manager uses plugins to extract the text from the files it gets. Please see the main article for more information about them: Plugin: File filter
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/boitho-bbdn
The document manager supports some command line arguments that may be useful when running it manually.
Argument | Description
--- | ---
-m number | Max. Exit after the given number of documents. Normally -m is used with -s to process a certain number of documents and then exit so Valgrind can do a full memory leak check
-s | Single. Do not fork for new connections nor use multiple threads
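A sketch of running the document manager single-process under GDB, which can be useful when stepping through a file filter plugin:
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 gdb --args bin/boitho-bbdn -s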
boithoad
Dedicated daemon for talking to Active Directory. The code will become a plugin for crawlManager2 some day.
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/boithoad