How to check if a type is/is_not parent of another type ? #160

Open
ttpro1995 opened this issue Jan 21, 2021 · 2 comments
Labels
enhancement New feature or request

Comments


ttpro1995 commented Jan 21, 2021

I am following the "Problem type inference" example.

[Image: graph]

From one dataframe, I have already built a list with the inferred type of each column. Here is the type_list:

[Discrete,
 Nominal,
 Discrete,
 Nominal,
 Nominal,
 Nominal,
 Nominal,
 Nominal,
 Nominal,
 Binary,
 Discrete,
 Discrete,
 Discrete,
 Nominal,
 Binary]

type(type_list[0]) gives visions.types.type.VisionsBaseTypeMeta

Now I want to check, for each type, whether its parent type is Categorical or Numeric.

for column, t in zip(dataframe.columns, type_list):
    if is_type_parent_of_categorical(t):
        category_job(dataframe[column])

# Binary is a child of Categorical
is_type_parent_of_categorical(type_list[14])  # -> True

# Discrete is a child of Numeric
is_type_parent_of_categorical(type_list[0])  # -> False

How should I implement is_type_parent_of_categorical?

My workaround seems to work, but only because it relies on string comparison:

def is_type_parent_of_categorical(visions_type):
    # Workaround: compare the type's name against a hard-coded list of names
    type_str = str(visions_type)
    return type_str in ["Categorical", "Ordinal", "Nominal", "Binary"]
@ttpro1995 ttpro1995 added the bug Something isn't working label Jan 21, 2021

ieaves commented Jan 26, 2021

Hey @ttpro1995 - there's a short and a long answer to your question.

Short Answer: Type relations are not defined on the type's inheritance hierarchy (all types inherit from VisionsBaseType); rather, they can be accessed from the .relations property. You'll notice I'm using the term relations rather than children, which leads to...

Long Answer: Only nodes in a typeset have actual children. The relations attribute on a type returns a list of that type's potential parents. Encoding parent relations on types, rather than child relations, lets us compose types together to form typesets. The easiest way to see this: the root of a typeset graph is Generic. If Generic tracked its children, then creating a new type like PositiveInteger would counterintuitively require source code changes to Generic, which would effectively produce strong coupling between types.
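The decoupling argument can be illustrated with a minimal, dependency-free sketch. ToyType and build_children below are hypothetical stand-ins, not the visions API. They only show how declaring parents on each type lets a typeset derive the children once a set of types has been chosen.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyType:
    """Hypothetical stand-in for a visions type: it only knows its possible parents."""
    name: str
    parents: tuple = ()


def build_children(types):
    """Derive a parent -> children mapping from a chosen set of types (a toy 'typeset')."""
    members = set(types)
    children = defaultdict(set)
    for t in types:
        for parent in t.parents:
            # Only keep relations whose parent is part of this typeset.
            if parent in members:
                children[parent].add(t)
    return dict(children)


# Toy objects for illustration only; they are not the real visions types.
Generic = ToyType("Generic")
Integer = ToyType("Integer", parents=(Generic,))
# Adding a new type never touches the definition of Generic or Integer.
PositiveInteger = ToyType("PositiveInteger", parents=(Integer,))

small = build_children([Generic, Integer])
large = build_children([Generic, Integer, PositiveInteger])
```

In this sketch the same Integer object has no children in the small typeset and one child in the large typeset, which is why children are a property of the typeset rather than of the type.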

So, children only really exist on a TypeSet but it's pretty easy to get these as well. I'm going to use the StandardTypeset as an example but the same will work for any typeset you create / use.

Under the hood visions uses networkx to build typeset graphs. Each typeset has two graph attributes:

  1. A base_graph which includes only non-inferential relations (i.e. it excludes Int -> Float because that would require coercing the test sequence).
  2. A relation_graph which includes all possible types and relations.

So, to get all possible children of a node in a typeset, we just have to use the networkx API and the typeset's relation_graph.

typeset = StandardTypeset()
test_type = Categorical

child_types = typeset.relation_graph[test_type]  

Technically, child_types is going to be a networkx AtlasView object, but it supports the in operator, so it will work just fine for your purposes. Your is_child function would look something like:

def is_child(typeset, A, B):
    """Determines if B is a child of A for a given typeset"""
    return B in typeset.relation_graph[A]

Technically, this only checks a single level deep in the tree (i.e. the direct children). Judging from your example, you're actually interested in evaluating all possible descendants of a node, which can be achieved similarly with:

import networkx as nx

def is_descendant(typeset, A, B):
    """Determines if B is a descendant of A for a given typeset"""
    return B in nx.descendants(typeset.relation_graph, A)
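To see the difference between the two checks without installing anything, here is a dependency-free sketch on a toy parent -> children mapping. The edges are an assumption loosely based on the types named in the question, not something read from visions, and reachable is a hypothetical helper that plays the role nx.descendants plays above.

```python
from collections import deque

# Toy parent -> children edges (assumed for illustration, not read from visions).
CHILDREN = {
    "Generic": {"Categorical", "Numeric"},
    "Categorical": {"Binary", "Nominal", "Ordinal"},
    "Numeric": {"Discrete"},
}


def is_child(graph, a, b):
    """True if b is a direct child of a."""
    return b in graph.get(a, set())


def reachable(graph, start):
    """All nodes reachable from start, excluding start itself (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {start}


def is_descendant(graph, a, b):
    """True if b is reachable from a through one or more edges."""
    return b in reachable(graph, a)


print(is_child(CHILDREN, "Generic", "Binary"))             # False: Binary is two levels down
print(is_descendant(CHILDREN, "Generic", "Binary"))        # True
print(is_descendant(CHILDREN, "Categorical", "Discrete"))  # False
```

With a real typeset, the mapping would come from typeset.relation_graph rather than a hand-written dictionary.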

EDIT:

It occurred to me you may simply be interested in determining whether your data is Numeric or Categorical. There's an even easier way to do this than checking the parent relations, which is just to create a new typeset, i.e.

new_typeset = Generic + Numeric + Categorical

new_typeset.infer_type(df)

@ieaves ieaves added enhancement New feature or request and removed bug Something isn't working labels Jan 26, 2021
ieaves commented Jan 26, 2021

If you're interested in making a PR to include some of this functionality by default we would be more than happy to help you get those through! In the meantime, I've marked this as an enhancement request.
