Support a `classify_all` method #96

akprasad · 2024-01-01T01:53:08Z

Problem

Chandas contains a classify method that returns a single result. But sometimes it is interesting to look at all possible results and let the user decide how to use them.

Proposed solution

Create a classify_all method that returns all padyas (vrttas + jatis) that don't have a None match.

The text was updated successfully, but these errors were encountered:

MSSRPRAD · 2024-01-01T12:39:26Z

How about changing the classify method entirely into a classify_vrtta method that returns a single result (which now also has a quality parameter). We can have an analogous classify_jati method for Jati metres.

pub fn classify_vrtta(&self, text: impl AsRef<str>) -> MatchResult {}
pub fn classify_jati(&self, text: impl AsRef<str>) -> MatchResult {}

classify_all can then just return the best n results of Partial Matches by updating them in some data structure while iterating:

pub fn classify_all(&self, text: impl AsRef<str>, best_n: usize) -> Vec<MatchResult> {}
pub struct MatchResult {
    matched: Option<Matched>,
    match_type: Vec<MatchType>,
    aksharas: Vec<Vec<Akshara>>,
}

pub enum Matched {
    Vrtta(Vrtta),
    Jati(Jati),
}

pub enum MatchType {
    Partial(usize), // Representing Quality of match
    Prefix,
    Pada,
    Full,
}

@akprasad

akprasad · 2024-01-01T19:50:13Z

Thanks for the suggestions!

Having both classify_vrtta and classify_jati feels too complex -- an end user won't necessarily know whether the meter is a vrtta or jAti, and we should make their life easier with a simple classify endpoint so that they don't have to make two calls.

best_n is a nice idea but puts us in the position of having to decide what "best" means, but this seems dependent on the use case. classify_all without best_n is less sophisticated, but the return data is also more obvious and deterministic.

I do like the idea of putting usize and other fields on MatchType, e.g. Prefix(usize) to store the length of the prefix match. Partial is unclear to me. Perhaps MatchType should be split into a Scope (prefix, pada, full) and a Type (exact, EditDistance(usize), ...)

MSSRPRAD · 2024-01-01T20:14:45Z

Having both classify_vrtta and classify_jati feels too complex -- an end user won't necessarily know whether the meter is a vrtta or jAti, and we should make their life easier with a simple classify endpoint so that they don't have to make two calls.

Can't classify_all to be that endpoint? best_n can be defaulted to 1 (and first the Jati match is checked before vrtta match)?

best_n is a nice idea but puts us in the position of having to decide what "best" means, but this seems dependent on the use case. classify_all without best_n is less sophisticated, but the return data is also more obvious and deterministic.

I felt when the database consists of several hundreds of schemes, it would be apt to ask for the best 5 or so than a match against every single scheme in the data. (If we're providing % matches it might match against many meters)

akprasad · 2024-01-01T20:33:34Z

best_n can be defaulted to 1

Sadly Rust doesn't support default arguments, so there's no way to both have a default value and reduce friction for the user. (There are workarounds like having an Option<usize>, but then the caller needs to pass None explicitly.)

when the database consists of several hundreds of schemes

Good point!

So perhaps:

pub fn classify(&self, text: impl AsRef<str>) -> Matches {
  self.classify_all(text.as_ref(), 1)
}

pub fn classify_all(&self, text: impl AsRef<str>, n: usize) -> Matches { ... }

pub struct Match {
  // TODO: vrtta/jati, match type, etc.
}

pub struct Matches {
    matches: Vec<Match>,
    aksharas: Vec<Vec<Akshara>>,
}

MSSRPRAD · 2024-01-02T19:47:41Z

pub fn classify(&self, text: impl AsRef) -> Matches {
self.classify_all(text.as_ref(), 1)
}

What kind of response struct should be returned? If classify is just returning one response, should it be the one giving Pada match / Prefix Match / None(edit_distance or any other metric) in case there is not Full match?

akprasad · 2024-01-02T21:28:12Z

How about something like:

pub enum Padya {
  Vrtta(Vrtta),
  Jati(Jati),
}

// I don't like this name
pub enum MatchStrategy {
  Exact,
  EditDistance(usize)
}

pub enum MatchScope {
  None,
  Prefix,
  Pada,
  Full,
}

pub struct Match {
  pada: Padya,
  strategy: MatchStrategy,
  scope: MatchScope,
}

pub struct Matches {
  matches: Vec<Match>,
  aksharas: Vec<Vec<Akshara>>,
}

Then both methods can return a Matches. classify can return a Matches where the matches field has length at most 1.

akprasad · 2024-11-04T02:01:22Z

Still tweaking the output format, but I have a workable classify_all as a starting point that I'll push out soon.

akprasad added good first issue Good for newcomers chandas labels Jan 1, 2024

akprasad closed this as completed Nov 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support a `classify_all` method #96

Support a `classify_all` method #96

akprasad commented Jan 1, 2024

MSSRPRAD commented Jan 1, 2024 •

edited

Loading

akprasad commented Jan 1, 2024

MSSRPRAD commented Jan 1, 2024 •

edited

Loading

akprasad commented Jan 1, 2024 •

edited

Loading

MSSRPRAD commented Jan 2, 2024

akprasad commented Jan 2, 2024

akprasad commented Nov 4, 2024

Support a classify_all method #96

Support a classify_all method #96

Comments

akprasad commented Jan 1, 2024

Problem

Proposed solution

MSSRPRAD commented Jan 1, 2024 • edited Loading

akprasad commented Jan 1, 2024

MSSRPRAD commented Jan 1, 2024 • edited Loading

akprasad commented Jan 1, 2024 • edited Loading

MSSRPRAD commented Jan 2, 2024

akprasad commented Jan 2, 2024

akprasad commented Nov 4, 2024

Support a `classify_all` method #96

Support a `classify_all` method #96

MSSRPRAD commented Jan 1, 2024 •

edited

Loading

MSSRPRAD commented Jan 1, 2024 •

edited

Loading

akprasad commented Jan 1, 2024 •

edited

Loading