Write an AI to assess the likelihood that a person will have a particular genetic trait.
$ python heredity.py data/family0.csv
Harry:
  Gene:
    2: 0.0092
    1: 0.4557
    0: 0.5351
  Trait:
    True: 0.2665
    False: 0.7335
James:
  Gene:
    2: 0.1976
    1: 0.5106
    0: 0.2918
  Trait:
    True: 1.0000
    False: 0.0000
Lily:
  Gene:
    2: 0.0036
    1: 0.0136
    0: 0.9827
  Trait:
    True: 0.0000
    False: 1.0000
Complete the implementations of joint_probability, update, and normalize.
The joint_probability function should take as input a dictionary of people, along with data about who has how many copies of each of the genes, and who exhibits the trait. The function should return the joint probability of all of those events taking place.
- The function accepts four values as input: people, one_gene, two_genes, and have_trait.
  - people is a dictionary of people as described in the “Understanding” section. The keys represent names, and the values are dictionaries that contain mother and father keys. You may assume that either mother and father are both blank (no parental information in the data set), or mother and father will both refer to other people in the people dictionary.
  - one_gene is a set of all people for whom we want to compute the probability that they have one copy of the gene.
  - two_genes is a set of all people for whom we want to compute the probability that they have two copies of the gene.
  - have_trait is a set of all people for whom we want to compute the probability that they have the trait.
- For any person not in one_gene or two_genes, we would like to calculate the probability that they have no copies of the gene; and for anyone not in have_trait, we would like to calculate the probability that they do not have the trait.
- For example, if the family consists of Harry, James, and Lily, then calling this function where one_gene = {"Harry"}, two_genes = {"James"}, and have_trait = {"Harry", "James"} should calculate the probability that Lily has zero copies of the gene, Harry has one copy of the gene, James has two copies of the gene, Harry exhibits the trait, James exhibits the trait, and Lily does not exhibit the trait.
- For anyone with no parents listed in the data set, use the probability distribution PROBS["gene"] to determine the probability that they have a particular number of copies of the gene.
- For anyone with parents in the data set, each parent will pass one of their two genes on to their child randomly, and there is a PROBS["mutation"] chance that it mutates (goes from being the gene to not being the gene, or vice versa).
- Use the probability distribution PROBS["trait"] to compute the probability that a person does or does not have a particular trait.
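The inheritance rule for people with listed parents is the least obvious part of the calculation, so here is a minimal sketch of that step alone. The helper names pass_probability and child_gene_probability are hypothetical (they are not part of heredity.py), and the mutation rate of 0.01 is an assumed stand-in for PROBS["mutation"].

```python
# Assumed stand-in for PROBS["mutation"]; the real value is defined in heredity.py.
MUTATION = 0.01


def pass_probability(copies, mutation=MUTATION):
    """Probability that a parent with `copies` copies passes the gene on.

    Hypothetical helper, not part of the distribution code.
    """
    if copies == 2:
        # Always hands over the gene, unless it mutates away.
        return 1 - mutation
    if copies == 1:
        # Half the time the gene is picked and survives, half the time the
        # non-gene is picked and mutates: 0.5 * (1 - m) + 0.5 * m = 0.5.
        return 0.5
    # No copies: the child only gets the gene through mutation.
    return mutation


def child_gene_probability(child_copies, mother_copies, father_copies,
                           mutation=MUTATION):
    """Probability that a child has `child_copies` copies given both parents."""
    from_mother = pass_probability(mother_copies, mutation)
    from_father = pass_probability(father_copies, mutation)
    if child_copies == 2:
        # Both parents pass the gene.
        return from_mother * from_father
    if child_copies == 1:
        # Exactly one parent passes the gene.
        return (from_mother * (1 - from_father)
                + (1 - from_mother) * from_father)
    # Neither parent passes the gene.
    return (1 - from_mother) * (1 - from_father)
```

Under the assumed mutation rate, a child of a mother with zero copies and a father with two copies has one copy with probability 0.01 * 0.01 + 0.99 * 0.99 = 0.9802. In a full joint_probability, a value like this would be multiplied by the matching PROBS["trait"] entry for each person, and the per-person results multiplied together.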
The update function adds a new joint distribution probability to the existing probability distributions in probabilities.
- The function accepts five values as input: probabilities, one_gene, two_genes, have_trait, and p.
  - probabilities is a dictionary of people as described in the “Understanding” section. Each person is mapped to a "gene" distribution and a "trait" distribution.
  - one_gene is a set of people with one copy of the gene in the current joint distribution.
  - two_genes is a set of people with two copies of the gene in the current joint distribution.
  - have_trait is a set of people with the trait in the current joint distribution.
  - p is the probability of the joint distribution.
- For each person person in probabilities, the function should update the probabilities[person]["gene"] distribution and probabilities[person]["trait"] distribution by adding p to the appropriate value in each distribution. All other values should be left unchanged.
- For example, if "Harry" were in both two_genes and in have_trait, then p would be added to probabilities["Harry"]["gene"][2] and to probabilities["Harry"]["trait"][True].
- The function should not return any value: it just needs to update the probabilities dictionary.
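The behaviour described for update can be sketched as follows. This is one possible shape under the stated rules, not necessarily the intended solution; it assumes each person's "gene" distribution is keyed by 0, 1, and 2 and each "trait" distribution by True and False, as in the "Harry" example.

```python
def update(probabilities, one_gene, two_genes, have_trait, p):
    """Add p to the matching gene and trait entries for every person."""
    for person in probabilities:
        # Work out how many copies this person has in the current scenario.
        if person in two_genes:
            copies = 2
        elif person in one_gene:
            copies = 1
        else:
            copies = 0
        probabilities[person]["gene"][copies] += p
        # The membership test yields True or False, matching the trait keys.
        probabilities[person]["trait"][person in have_trait] += p
```

The function mutates probabilities in place and returns nothing, so repeated calls accumulate the probability of every scenario considered.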
The normalize function updates a dictionary of probabilities such that each probability distribution is normalized (i.e., sums to 1, with relative proportions the same).
- The function accepts a single value: probabilities.
  - probabilities is a dictionary of people as described in the “Understanding” section. Each person is mapped to a "gene" distribution and a "trait" distribution.
- For both of the distributions for each person in probabilities, this function should normalize that distribution so that the values in the distribution sum to 1, and the relative values in the distribution are the same.
- For example, if probabilities["Harry"]["trait"][True] were equal to 0.1 and probabilities["Harry"]["trait"][False] were equal to 0.3, then your function should update the former value to be 0.25 and the latter value to be 0.75: the numbers now sum to 1, and the latter value is still three times larger than the former value.
- The function should not return any value: it just needs to update the probabilities dictionary.
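Normalizing amounts to dividing every value in a distribution by that distribution's total. A minimal sketch, assuming the same dictionary layout as above; skipping an all-zero distribution is an added assumption to avoid dividing by zero, not something the specification asks for.

```python
def normalize(probabilities):
    """Rescale each person's gene and trait distributions to sum to 1."""
    for person in probabilities:
        for field in ("gene", "trait"):
            distribution = probabilities[person][field]
            total = sum(distribution.values())
            if total == 0:
                # Assumption: leave an all-zero distribution untouched.
                continue
            for key in distribution:
                distribution[key] /= total
```

Applied to the example above, a "trait" distribution of 0.1 and 0.3 has a total of 0.4, giving 0.1 / 0.4 = 0.25 and 0.3 / 0.4 = 0.75.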
You should not modify anything else in heredity.py other than the three functions the specification calls for you to implement, though you may write additional functions and/or import other Python standard library modules. You may also import numpy or pandas, if you are familiar with them, but you should not use any other third-party Python modules.