More nifty dicts #2

Closed
vvasuki opened this issue Jun 20, 2022 · 8 comments

@vvasuki
Contributor

vvasuki commented Jun 20, 2022

Besides Apte, I like shabdasAgara - consider adding it.

Also shabdArthakaustubha (available on indic-dict/stardict-sanskrit), example below:

From शब्दार्थकौस्तुभः sa-kn
शिल्प

शिल्प

पदविभागः - नपुंसकलिङ्गः
कन्नडार्थः - ಕಲೆ /ಕುಶಲವಿದ್ಯೆ /ಚಿತ್ರ ನೃತ್ಯ ಗೀತ ಮೊದಲಾದುವು
निष्पत्तिः - शील (समाधौ) - "पः" धातोः ह्रस्वश्च निपा० (उ० ३-२८)
प्रयोगाः - "पात्रविशेषे न्यस्तं गुणान्तरं व्रजति शिल्पमाधातुः"
उल्लेखाः - माल० १-६

शिल्प

पदविभागः - पुल्लिङ्गः
कन्नडार्थः - ಸ್ರುವ /ಯಜ್ಞದ ಪಾತ್ರೆ

शिल्प

पदविभागः - पुल्लिङ्गः
कन्नडार्थः - ಕೈಗಾರಿಕೆ

@akprasad
Contributor

Added the Shabdasagara in 84d36a0. Changes are live, e.g.:

https://ambuda.org/dictionaries/shabdasagara/nara

Adding the Shabdasagara is easy since it follows the format of the other Cologne dictionaries. For the शब्दार्थकौस्तुभ, I'd rather not add stardict support if it's just to support one additional dictionary.

Options are:

  • If you can find 3 high-quality dictionaries in this format for 3 different Indian languages, I'll happily add the support.
  • Otherwise, please submit a PR. See the scripts in ambuda/seed/dictionaries for reference.

@vvasuki
Contributor Author

vvasuki commented Jun 21, 2022

> Adding the Shabdasagara is easy since it follows the format of the other Cologne dictionaries. For the शब्दार्थकौस्तुभ, I'd rather not add stardict support if it's just to support one additional dictionary.

You don't need to add stardict support for that. The format is pretty simple; just open https://github.com/indic-dict/stardict-sanskrit/blob/master/sa-head/other-indic-entries/shabdArtha_kaustubha/shabdArtha_kaustubha.babylon and see. You can import it in no time. Basically, each entry is 3 lines like:

अऋणिन्
अऋणिन्<br><br><b>पदविभागः - </b>विशेष्यनिघ्नम्<br><b>कन्नडार्थः - </b>ಋಣವಿಲ್ಲದ / ಸಾಲವಿಲ್ಲದ<br><b>व्युत्पत्तिः - </b>न ऋणी |<br><b>प्रयोगाः - </b>“दिवसस्याष्टमे भागे शाकं पचति यो नरः । अऋणी चाप्रवासी च स रात्रिञ्चर मोदते” ||<br><b>उल्लेखाः - </b>भा० ।<br><b>विस्तारः - </b>[ऋ ಕಾರಕ್ಕೆ ವ್ಯಂಜನತ್ವವನ್ನು ಒಪ್ಪುವುದರಿಂದ ಇಲ್ಲಿ “नृट्” ಆಗಮವು ಬಂದಿರುವುದಿಲ್ಲ.]<br><br>

or

HEADWORD
DEFINITION
EMPTY_LINE
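
The three-line layout described above can be read with a few lines of Python. This is only a minimal sketch, not the code used by ambuda: the function name `iter_babylon_entries` is hypothetical, and skipping lines that start with `#` assumes that such lines are file metadata rather than headwords.

```python
from typing import Iterator, Tuple


def iter_babylon_entries(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (headword, definition) pairs from HEADWORD / DEFINITION / EMPTY_LINE text."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        headword = lines[i].strip()
        # Skip the blank separator lines between entries.
        # Assumption: lines starting with "#" are metadata, not headwords.
        if not headword or headword.startswith("#"):
            i += 1
            continue
        # The line right after the headword holds the HTML-formatted definition.
        definition = lines[i + 1].strip() if i + 1 < len(lines) else ""
        yield headword, definition
        i += 2


if __name__ == "__main__":
    sample = "nara\nnara<br>first definition\n\nzilpa\nzilpa<br>second definition\n\n"
    for headword, definition in iter_babylon_entries(sample):
        print(headword, "->", definition)
```

The definition line is returned as-is, so the `<br>` and `<b>` markup would still need to be parsed or rendered separately.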

@vvasuki
Contributor Author

vvasuki commented Jun 21, 2022

(Reopening since this is a third option beyond what was considered earlier.)

@vvasuki vvasuki reopened this Jun 21, 2022
@akprasad
Contributor

akprasad commented Jun 23, 2022

There's a lot of noise in the dictionary data: keys with slashes, parentheses, spaces, ... Some examples:

prati+hana
svedavipruw(z)
haratejas/bIjam/vIryam

I have the code ready, but I don't feel good about deploying this kind of data. Let me know when the file is clean and I'll land & deploy.

@vvasuki
Contributor Author

vvasuki commented Jun 23, 2022

Thanks for notifying me; I hadn't noticed these. Even with all those headwords missing, I've found this dict very useful. Why don't you omit all entries with pluses, spaces, and parentheses and import the rest? I have manually fixed many of the problems myself (you should now treat | as a headword separator, so that उष|दाहे|ओषति lists 3 headwords for the same definition), and have scheduled a task for a proofreader I employ: https://trello.com/c/vhbbJ1Qf . I'll let you know when the data is further cleaned.
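
The filtering suggested here could look roughly like the sketch below. It is an illustration under stated assumptions, not the importer's actual code: the helper name `clean_headwords` is hypothetical, and treating slashes as noise (in addition to pluses, spaces, and parentheses) is an assumption based on the noisy keys listed earlier in the thread.

```python
import re
from typing import List

# Characters treated as noise in a key: "+", "/", parentheses, and whitespace.
# Assumption: "/" is included because of keys like haratejas/bIjam/vIryam.
NOISY = re.compile(r"[+/()\s]")


def clean_headwords(raw_key: str) -> List[str]:
    """Split a raw key on "|" and keep only the parts without noisy characters."""
    parts = [part.strip() for part in raw_key.split("|")]
    return [part for part in parts if part and not NOISY.search(part)]


if __name__ == "__main__":
    for key in ["उष|दाहे|ओषति", "prati+hana", "svedavipruw(z)", "nara"]:
        print(key, "->", clean_headwords(key))
```

Each surviving headword would then point at the same definition, and a key that yields an empty list is simply skipped during import.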

@drdhaval2785

https://github.com/sanskrit-lexicon/cologne-stardict/tree/master/production

If you are OK with the babylon format that Vishvas mentioned above, all the CDSL dictionaries are available in that format from the repository above.

@akprasad
Contributor

akprasad commented Jun 23, 2022

Added in 5ac5504.

https://ambuda.org/dictionaries/shabdartha-kaustubha/nara

It took more time than I expected to parse out the structure of each dict entry, so in the short term this is all the work I plan to do for this dictionary. If there are future updates to this dict in terms of presentation or data, please send a PR and I'll happily merge it.

@akprasad
Contributor

> https://github.com/sanskrit-lexicon/cologne-stardict/tree/master/production
>
> If you are OK with the babylon format that Vishvas mentioned above, all the CDSL dictionaries are available in that format from the repository above.

To summarize our conversation on Discord: I prefer XML for its richer structure, and the CDSL XML files also update more frequently.
