Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typefixes for index genenetwork #177

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jnduli
Copy link
Collaborator

@jnduli jnduli commented Jul 10, 2024

Description

This PR uses pymonad to build a MonadDict from RDF queries. I assume this will mean that LookupErrors won't happen since we'll get Nothing in this case.

How

  1. The RDF query returns a deeply nested dictionary. The MonadDict constructor cannot handle this since it assumes the input dictionary won't be nested. I've added an extra function that will recursively build MonadDicts when any key, value pairs point to a dictionary or a list.
  2. Use of bind and map to ensure that Just(x) and Nothing are handled ok

Testing

The script seems to run ok locally:

╰─$ python3 scripts/index-genenetwork create-xapian-index /tmp/xapian "mysql://gn2:password@localhost/gn2" "http://localhost:8890/sparql"
2024-07-10 19:06:00 EAT INFO: Building wiki cache
2024-07-10 19:06:00 EAT INFO: Building rif cache
2024-07-10 19:07:03 EAT INFO: Indexing genes
2024-07-10 19:07:03 EAT DEBUG: Spawned worker process on chunk 0
2024-07-10 19:07:07 EAT INFO: Indexing phenotypes
2024-07-10 19:07:08 EAT DEBUG: Spawned worker process on chunk 0
2024-07-10 19:07:33 EAT INFO: Combining and compacting indices
2024-07-10 19:07:33 EAT DEBUG: Spawned worker to compact files from 0 to 2
2024-07-10 19:07:34 EAT DEBUG: Removing databases that were compacted into 0_2
2024-07-10 19:07:34 EAT DEBUG: Completed xapian-compact 0_2; handled 2 files in 0.018332980800005318 minutes
2024-07-10 19:07:34 EAT DEBUG: Completed parallel xapian compacts
2024-07-10 19:07:35 EAT DEBUG: Removing databases that were compacted into combinedt8vg3h63
2024-07-10 19:07:35 EAT DEBUG: Completed xapian-compact combinedt8vg3h63; handled 1 files in 0.017718926316653474 minutes
2024-07-10 19:07:35 EAT INFO: Writing table checksums into index
2024-07-10 19:07:36 EAT INFO: Writing generif checksums into index
2024-07-10 19:07:36 EAT INFO: Index built
2024-07-10 19:07:36 EAT INFO: Time to Index: 0:01:35.973015

Caveats That I'll Try to Fix

mypy fails when I run it against this branch. Here's what I see:

scripts/index-genenetwork:262: error: Cannot infer type argument 1 of "bind" of "Maybe"
scripts/index-genenetwork:263: error: Argument 1 to "bind" of "Maybe" has incompatible type "Callable[[Any], str]"; expected "Callable[[Any], Maybe[Any]]"
scripts/index-genenetwork:263: error: Incompatible return value type (got "str", expected "Maybe[Any]")
scripts/index-genenetwork:264: error: "object" has no attribute "lower"
scripts/index-genenetwork:282: error: Cannot infer type argument 1 of "bind" of "Maybe"
scripts/index-genenetwork:282: error: Value of type "object" is not indexable
scripts/index-genenetwork:283: error: Argument 1 to "map" of "Maybe" has incompatible type "Callable[[List[Any]], Tuple[Counter[Any], MonadicDict]]"; expected "Callable[[List[Any]], List[Any]]"

It seems like mypy can't infer the subtypes for the various monaddicts.

jnduli added 3 commits July 10, 2024 18:57
Replaced with `Maybe.apply` where necessary removing the need to
check if a value is Nothing.
cache = MonadicDict()
count: Counter[str] = Counter()
words_regex = re.compile(r"\w+")
for entry in bindings:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnduli how you build the monadic dict IMHO is a tad bit too complicated for me to follow. I would suggest that from the returned bindings, build a dict first, then do the thing with the counter, and finally return the cache as a monadic dict as: MonadicDict(my_foo_dict)

Copy link
Collaborator

@BonfaceKilz BonfaceKilz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No monads. Just sent a long response of why in our chat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants