- Added `interp_genoprob()` for linear interpolation of genotype probabilities, useful when comparing two sets of genotype probabilities derived by different methods and calculated at slightly different positions.
- `zip_datafiles()` now gives a warning if any of the paths include `".."`.
- `insert_pseudomarkers()` now has a `cores` argument for doing calculations in parallel on multiple CPUs.
- Added `compare_genoprob()` for getting a tabular comparison of two sets of genotype probabilities, for a single individual on a single chromosome.
- Added `deterministic` argument to `guess_phase()`.
- Turned off debug C++ code.
- Added additional tests of `check_cross2()`.
- Cross type `"ail3"` was missing the functions for requiring and checking the founder genotypes; these are now included.
- `check_cross2()` wasn't checking for missing names in `gmap`, `pmap`, and `founder_geno`; it now does.
- Added function `guess_phase()`, for guessing the phase of imputed genotypes (such as from `maxmarg()`), mostly for visualization of genotypes of one individual from a multi-parent population.
- Fix handling of missing values in internal functions `mpp_decode_geno()` and `mpp_encode_alleles()`.
- Implemented two new cross types: `"dh6"` for 6-way doubled haploids (for a set of maize MAGIC populations developed at the Wisconsin Crop Innovation Center) and `"ail3"` for 3-way advanced intercross lines.
- Added internal C++ function `check_founder_geno_size()` for checking the dimensions of the founder genotype data from R, and added this check to the `check_cross2()` function.
- Added function `read_csv_numer()` that is like `read_csv()` but assumes that all columns except the first are strictly numeric. Now used by `read_pheno()` to vastly speed things up (because otherwise I was assuming everything was character and then converting, and that conversion to numeric was deathly slow).
- Fix formatting problem in the output of `summary.cross2()` within RStudio.
- Revised installation instructions.
- Added function `scale_kinship()`, which converts a kinship matrix (or a list of such, in the case of the "leave one chromosome out" method) to be like a correlation matrix.
- Removed the `normalize` argument from `calc_kinship()`, though left the internal function `normalize_kinship()` in place, for now.
- `insert_pseudomarkers` now gives an error if the input `map` is `NULL`.
- `count_xo` now works with the output of `sim_geno`. The result is a 3-d array of counts of crossovers for each individual on each chromosome in each imputation.
- Added `chisq_colpairs`, which performs chi-square tests for independence on all pairs of columns of a matrix. It calculates only the test statistics (no p-values).
- Added an argument `save_rf` to `est_map()`; if `TRUE`, the estimated recombination fractions are saved as an attribute (`"rf"`) of the result. This can be useful for diagnostic purposes, for example when the estimated recombination fraction between markers is > 1/2. (After converting to genetic distance, rf > 1/2 is indistinguishable from rf = 1/2.)
- New function `reduce_map_gaps` that reduces the length of any gaps in a map. (Gaps greater than `min_gap` are reduced to `min_gap`.)
- `maxmarg` now picks at random among genotypes that jointly share the maximum probability. Previously, it picked the first among these. Added an argument `tol`; if two genotypes have probabilities that differ by no more than `tol`, they are treated as having the same probability.
- New function `calc_entropy` takes the results of `calc_genoprob` and calculates, for each individual at each genomic position, the entropy of the genotype probability distribution, as a measure of missing information.
- Fix bug in `find_map_gaps` for the case that the output is empty.
- Fix bug when attempting to subset `calc_genoprob` output by individual using individuals that aren't present in the data.
- Fix bug in `est_map` where it was producing `NaN`s in some cases.
- `read_cross2` now unzips a `.zip` file to a separate directory, to avoid possible clashes between multiple sets of files.
- `read_cross2` will now ignore any JSON or YAML files in the `.zip` file that match the pattern `__MACOSX/._*`.
- `read_cross2` will stop with an error if a `.zip` file contains multiple JSON or multiple YAML files. If there's both a YAML and a JSON file, the YAML file is used and a warning is issued.
- `est_map` now gives a warning if it reaches the maximum number of iterations without converging.
- Implemented new cross types `"risib4"`, `"risib8"`, and `"magic19"`. The `"risib8"` cross type corresponds to the Collaborative Cross. The `"magic19"` cross type corresponds to the 19-way Arabidopsis MAGIC lines of Kover et al. (2009) PLOS Genetics 5:e1000551.
- Added argument `lowmem` to `est_map`; default is `FALSE`, which corresponds to a new implementation that uses more memory but is considerably faster.
- Added function `find_map_gaps` for identifying larger inter-marker gaps in a genetic map.
- Added function `calc_geno_freq` for calculating genotype frequencies, by individual or by marker (from the multipoint genotype probabilities returned by `calc_genoprob`).
- Implemented new cross types `"riself4"`, `"riself8"`, and `"riself16"`, for multi-way MAGIC populations (multi-way RILs by selfing).
- Fixed problem in `read_cross2` in the case that the data has a physical map but not a genetic map.
- Added argument `overwrite` to `write_control_file`; if `TRUE`, overwrite the file, if it's present. (Previously, you were always forced to first remove it.)
- Added function `ind_ids_covar` to grab individual IDs from the covariate data.
- `ind_ids()` now returns individuals that are in any of `geno`, `pheno`, or `covar`.
- `subset_cross2()` now deals properly with the case that chromosome or individual IDs are not found in the cross object, and deals with the case that `geno` and `pheno` (and `covar`) have different individuals.
- Added functions `count_xo` and `locate_xo` for getting estimates of the number of crossovers on each chromosome in each individual, and of their locations.
- Added `compare_geno` for comparing raw genotypes between pairs of individuals (to look for possible sample duplicates).
- Added `calc_errorlod` to help identify potential genotyping errors (and problem markers or individuals).
- Made various small improvements to the handling of problems in the input files.
- Small changes to better handle genotype probabilities that are in the qtl2feather format.
- Added internal functions `dim.calc_genoprob` and `dimnames.calc_genoprob`, from Brian Yandell, for use with qtl2feather, which uses feather to store genotype probabilities in a file (to save memory).
- In the process of revising various functions to use qtl2feather, particularly in grabbing dimnames (with the above functions), but also to avoid `seq(along=genoprobs)` and instead use `seq_len(length(genoprobs))`.
- Removed the distinction between "lines" and "individuals", and the `linemap` component in the input that connected them. (While for RILs like the Collaborative Cross, we may want to work with individual-level phenotypes, it seems best to deal with that outside of the cross object.)
- Removed the functions `n_lines()` and `line_ids()`. Added some functions:
  - `n_ind_geno()` for the number of genotyped individuals, and `ind_ids_geno()` to get their IDs.
  - `n_ind_pheno()` for the number of phenotyped individuals, and `ind_ids_pheno()` to get their IDs.
  - `n_ind_gnp()` for the number of individuals with both genotypes and phenotypes, and `ind_ids_gnp()` to get their IDs.
- Also, `n_ind()` and `ind_ids()` now return, respectively, the total number of individuals and their IDs, across both genotypes and phenotypes.
- Added a function `find_ibd_segments` that takes genotypes for a set of inbred strains and searches for segments where strain pairs look to be IBD.
- Refactored to simplify the main data structures for genetic map and genotype probabilities. `calc_genoprob` now needs you to provide a pseudomarker map (created with `insert_pseudomarkers`).
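
The `interp_genoprob()` entry above describes linear interpolation of genotype probabilities between positions. As an illustration only (a Python sketch, not qtl2's own R/C++ code), the idea for one individual on one chromosome might look like this; the function name `interp_probs`, the requirement that positions be sorted, and copying the nearest endpoint for positions outside the original range are all assumptions of this sketch.

```python
import bisect

def interp_probs(old_pos, old_probs, new_pos):
    """Linearly interpolate genotype probability vectors to new positions.

    old_pos: sorted positions where probabilities are known
    old_probs[k]: genotype probabilities (one per genotype) at old_pos[k]
    new_pos: positions at which to interpolate
    Positions outside [old_pos[0], old_pos[-1]] get the nearest endpoint's
    probabilities (an assumption of this sketch).
    """
    result = []
    for x in new_pos:
        if x <= old_pos[0]:
            result.append(list(old_probs[0]))
        elif x >= old_pos[-1]:
            result.append(list(old_probs[-1]))
        else:
            # k is the left flanking position: old_pos[k] <= x < old_pos[k + 1]
            k = bisect.bisect_right(old_pos, x) - 1
            x0, x1 = old_pos[k], old_pos[k + 1]
            w = (x - x0) / (x1 - x0)
            result.append([(1 - w) * p0 + w * p1
                           for p0, p1 in zip(old_probs[k], old_probs[k + 1])])
    return result
```

Each interpolated vector is a weighted average of two probability vectors with weights summing to 1, so it remains a valid probability distribution.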
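
The `scale_kinship()` entry says a kinship matrix is converted to be like a correlation matrix. One standard way to do that, assumed here rather than taken from qtl2's source, is to divide each entry by the square roots of the matching diagonal entries, K[i][j] / sqrt(K[i][i] * K[j][j]). The Python sketch below uses the hypothetical names `scale_to_correlation` and `scale_loco`, the latter applying the same rescaling to each matrix in a leave-one-chromosome-out list.

```python
import math

def scale_to_correlation(kinship):
    """Rescale a square kinship matrix so its diagonal is 1:
    K[i][j] / sqrt(K[i][i] * K[j][j]). Assumes a positive diagonal."""
    n = len(kinship)
    d = [math.sqrt(kinship[i][i]) for i in range(n)]
    return [[kinship[i][j] / (d[i] * d[j]) for j in range(n)]
            for i in range(n)]

def scale_loco(kinship_list):
    """Apply the same rescaling to each per-chromosome matrix (LOCO case)."""
    return [scale_to_correlation(k) for k in kinship_list]
```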
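
The `chisq_colpairs` entry computes chi-square statistics for independence on all pairs of columns of a matrix. A minimal Python sketch of that computation, using the Pearson statistic on each pair's contingency table, follows; the function names, the dictionary return value keyed by column-index pairs, and the absence of any missing-data handling are assumptions of this sketch, not qtl2's interface.

```python
from collections import Counter
from itertools import combinations

def pearson_chisq(x, y):
    """Pearson chi-square statistic for independence of two categorical vectors."""
    n = len(x)
    row, col, cell = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (cell[(a, b)] - expected) ** 2 / expected
    return stat

def chisq_all_column_pairs(matrix):
    """Statistic for every pair of columns (i < j); rows are observations."""
    columns = list(zip(*matrix))
    return {(i, j): pearson_chisq(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}
```

Like the entry it illustrates, this returns statistics only; turning them into p-values would need the chi-square distribution with (rows - 1) * (columns - 1) degrees of freedom for each table.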
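
The `reduce_map_gaps` entry says any gap greater than `min_gap` is reduced to `min_gap`. For one chromosome's sorted marker positions, that rule can be sketched in Python as below; the function name is hypothetical, and keeping the first position fixed while shifting all downstream markers is an assumption about how the reduced map is re-anchored.

```python
def reduce_gaps(positions, min_gap):
    """Cap every gap between consecutive sorted positions at min_gap,
    keeping the first position fixed and shifting later ones accordingly."""
    reduced = [positions[0]]
    for prev, cur in zip(positions, positions[1:]):
        reduced.append(reduced[-1] + min(cur - prev, min_gap))
    return reduced
```

For example, positions `[0, 1, 51, 52]` with `min_gap = 10` become `[0, 1, 11, 12]`: only the 50-unit gap is shortened.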
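
The `maxmarg` entry describes picking the genotype with maximum marginal probability, choosing at random among genotypes whose probabilities are within `tol` of each other. A Python sketch of that rule for a single probability vector follows; treating every genotype within `tol` of the maximum as tied is this sketch's reading of the description, and the function name and default `tol` are assumptions.

```python
import random

def pick_max_genotype(probs, tol=1e-10, rng=random):
    """Index of a genotype with maximal probability. Genotypes whose probability
    is within tol of the maximum are treated as tied; one is chosen at random."""
    best = max(probs)
    tied = [g for g, p in enumerate(probs) if best - p <= tol]
    return rng.choice(tied)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible.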
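
The `calc_entropy` entry computes, for each individual at each position, the entropy of the genotype probability distribution. The Shannon entropy, the sum over genotypes of -p * log(p), can be sketched in Python as below; base-2 logarithms (bits) and the nested-list layout `probs[individual][position][genotype]` are assumptions of this sketch, not necessarily what qtl2 uses.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability genotypes contribute nothing."""
    return max(0.0, -sum(p * math.log2(p) for p in dist if p > 0))

def entropy_by_ind_and_pos(probs):
    """probs[i][j] is the genotype distribution for individual i at position j."""
    return [[entropy(dist) for dist in ind] for ind in probs]
```

A fully informative position (one genotype with probability 1) has entropy 0, while a uniform distribution over k genotypes reaches the maximum, log2(k); larger values therefore indicate more missing information.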