This repository has been archived by the owner on May 4, 2019. It is now read-only.
Merged
Port to Nulls.jl #288
Changes from 2 commits
19 commits
a1c2c1a Port to Nulls.jl (nalimilan)
9169e9f Revert behavior of == in presence of null to match that of NA (nalimilan)
be02c30 Require Nulls 0.1.0 (nalimilan)
f5534d5 Add back promotion tests (nalimilan)
19c79a8 Fix rep() test on 0.7 (nalimilan)
4ccfaf3 Remove more lifted operations on Null (nalimilan)
c3f7a50 Require Nulls 0.1.1 (nalimilan)
67f9aca Remove objects defined in Nulls from docs (nalimilan)
3bff91c Rename dropna() to dropnull() but keep it (nalimilan)
122fa56 Remove dropnull() thanks to efficient specialization of collect(::Eac… (nalimilan)
e3776ba Remove remaining occurrences of NA (nalimilan)
b9999ee Deprecate skipna argument in favor of skipnull (nalimilan)
8859847 Remove even more uses of na (nalimilan)
920f92c Fix use NA with @data and @pdata (nalimilan)
22ebdb0 Add deprecation for NAException (nalimilan)
02bb120 Fix conversion between Array{Union{T, Null}} and DataArray (nalimilan)
f3a7af6 Stop exporting nonexistent head() and tail() (nalimilan)
96f7b05 Override Nulls.levels() instead of defining custom function (nalimilan)
622613a Remove method redundant with ==(::AbstractArray{>:Null, ::AbstractArr… (nalimilan)
@@ -4,6 +4,7 @@ module DataArrays
using Base: promote_op
using Base.Cartesian, Reexport
@reexport using StatsBase
@reexport using Nulls
using SpecialFunctions

const DEFAULT_POOLED_REF_TYPE = UInt32

@@ -25,23 +26,15 @@ module DataArrays
DataArray,
DataMatrix,
DataVector,
dropna,
each_failna,
each_dropna,
each_replacena,
EachFailNA,
EachDropNA,
EachReplaceNA,
EachFailNull,
EachDropNull,
EachReplaceNull,
FastPerm,
getpoolidx,
gl,
head,
isna,
levels,
NA,
NAException,
NAtype,
padna,
padnull,
Review comment: I've left this method here, but we could define it in Nulls if that's really useful. No hurry, though.

pdata,
PooledDataArray,
PooledDataMatrix,

@@ -55,7 +48,6 @@ module DataArrays
tail

include("utils.jl")
include("natype.jl")
include("abstractdataarray.jl")
include("dataarray.jl")
include("pooleddataarray.jl")

@@ -71,7 +63,6 @@ module DataArrays
include("extras.jl")
include("grouping.jl")
include("statistics.jl")
include("predicates.jl")
include("literals.jl")
include("deprecated.jl")
end

@@ -2,9 +2,9 @@
AbstractDataArray{T, N}

An `N`-dimensional `AbstractArray` whose entries can take on values of type
`T` or the value `NA`.
`T` or the value `null`.
"""
abstract type AbstractDataArray{T, N} <: AbstractArray{Data{T}, N} end
abstract type AbstractDataArray{T, N} <: AbstractArray{Union{T,Null}, N} end

"""
AbstractDataVector{T}

@@ -20,45 +20,43 @@ A 2-dimensional [`AbstractDataArray`](@ref) with element type `T`.
"""
const AbstractDataMatrix{T} = AbstractDataArray{T, 2}

Base.eltype(d::AbstractDataArray{T, N}) where {T, N} = Union{T,NAtype}
Base.eltype(d::AbstractDataArray{T, N}) where {T, N} = Union{T,Null}

# Generic iteration over AbstractDataArray's

Base.start(x::AbstractDataArray) = 1
Base.next(x::AbstractDataArray, state::Integer) = (x[state], state + 1)
Base.done(x::AbstractDataArray, state::Integer) = state > length(x)

Base.broadcast{T}(::typeof(isna), a::AbstractArray{T}) =
NAtype <: T ? BitArray(map(x->isa(x, NAtype), a)) : falses(size(a)) # -> BitArray

# FIXME: type piracy
Review comment: I couldn't find a way to remove this without hurting performance, and since the point of porting DataArrays is to preserve performance... Maybe it's OK to keep this for now. Another solution would be to move this definition to Nulls.jl, and only implement here the …

"""
isna(a::AbstractArray, i) -> Bool
isnull(a::AbstractArray, i) -> Bool

Determine whether the element of `a` at index `i` is missing, i.e. `NA`.
Determine whether the element of `a` at index `i` is missing, i.e. `null`.

# Examples

```jldoctest
julia> X = @data [1, 2, NA];
julia> X = @data [1, 2, null];

julia> isna(X, 2)
julia> isnull(X, 2)
false

julia> isna(X, 3)
julia> isnull(X, 3)
true
```
"""
isna(a::AbstractArray{T}, i::Real) where {T} = NAtype <: T ? isa(a[i], NAtype) : false # -> Bool
Base.isnull(a::AbstractArray{T}, i::Real) where {T} = Null <: T ? isa(a[i], Null) : false # -> Bool

"""
dropna(v::AbstractVector) -> AbstractVector

Return a copy of `v` with all `NA` elements removed.
Return a copy of `v` with all `null` elements removed.

# Examples

```jldoctest
julia> dropna(@data [NA, 1, NA, 2])
julia> dropna(@data [null, 1, null, 2])
2-element Array{Int64,1}:
1
2

@@ -76,53 +74,50 @@ dropna(v::AbstractVector) = copy(v) # -> AbstractVector
# TODO: Use values()
# Use DataValueIterator type?

struct EachFailNA{T}
struct EachFailNull{T}
da::AbstractDataArray{T}
end
each_failna(da::AbstractDataArray{T}) where {T} = EachFailNA(da)
Base.length(itr::EachFailNA) = length(itr.da)
Base.start(itr::EachFailNA) = 1
Base.done(itr::EachFailNA, ind::Integer) = ind > length(itr)
function Base.next(itr::EachFailNA, ind::Integer)
if isna(itr.da[ind])
throw(NAException())
Nulls.fail(da::AbstractDataArray{T}) where {T} = EachFailNull(da)
Base.length(itr::EachFailNull) = length(itr.da)
Base.start(itr::EachFailNull) = 1
Base.done(itr::EachFailNull, ind::Integer) = ind > length(itr)
function Base.next(itr::EachFailNull, ind::Integer)
if isnull(itr.da[ind])
throw(NullException())
else
(itr.da[ind], ind + 1)
end
end

struct EachDropNA{T}
struct EachDropNull{T}
da::AbstractDataArray{T}
end
each_dropna(da::AbstractDataArray{T}) where {T} = EachDropNA(da)
Nulls.skip(da::AbstractDataArray{T}) where {T} = EachDropNull(da)
function _next_nonna_ind(da::AbstractDataArray{T}, ind::Int) where T
ind += 1
while ind <= length(da) && isna(da, ind)
while ind <= length(da) && isnull(da, ind)
ind += 1
end
ind
end
Base.length(itr::EachDropNA) = length(itr.da) - sum(itr.da.na)
Base.start(itr::EachDropNA) = _next_nonna_ind(itr.da, 0)
Base.done(itr::EachDropNA, ind::Int) = ind > length(itr.da)
function Base.next(itr::EachDropNA, ind::Int)
Base.length(itr::EachDropNull) = length(itr.da) - sum(itr.da.na)
Base.start(itr::EachDropNull) = _next_nonna_ind(itr.da, 0)
Base.done(itr::EachDropNull, ind::Int) = ind > length(itr.da)
function Base.next(itr::EachDropNull, ind::Int)
(itr.da[ind], _next_nonna_ind(itr.da, ind))
end

struct EachReplaceNA{S, T}
struct EachReplaceNull{S, T}
da::AbstractDataArray{S}
replacement::T
end
function each_replacena(da::AbstractDataArray, replacement::Any)
EachReplaceNA(da, convert(eltype(da), replacement))
end
function each_replacena(replacement::Any)
x -> each_replacena(x, replacement)
function Nulls.replace(da::AbstractDataArray, replacement::Any)
EachReplaceNull(da, convert(eltype(da), replacement))
end
Base.length(itr::EachReplaceNA) = length(itr.da)
Base.start(itr::EachReplaceNA) = 1
Base.done(itr::EachReplaceNA, ind::Integer) = ind > length(itr)
function Base.next(itr::EachReplaceNA, ind::Integer)
item = isna(itr.da, ind) ? itr.replacement : itr.da[ind]
Base.length(itr::EachReplaceNull) = length(itr.da)
Base.start(itr::EachReplaceNull) = 1
Base.done(itr::EachReplaceNull, ind::Integer) = ind > length(itr)
function Base.next(itr::EachReplaceNull, ind::Integer)
item = isnull(itr.da, ind) ? itr.replacement : itr.da[ind]
(item, ind + 1)
end
Shouldn't we leave this for the Nulls.jl docs?
Yeah, that's always hard to decide. I've added a commit removing these objects.
We could link to the Nulls docs saying something like "DataArrays uses Nulls to represent missing data. For more information about Nulls, see [the Nulls docs](link)."
There's already something on the homepage of the manual.
Oh sorry, missed that