ReadWriteDlm2
functions readdlm2()
, writedlm2()
, readcsv2()
and writecsv2()
are similar to those of stdlib.DelimitedFiles, but with additional support for Dates
formats, Complex
, Rational
, Missing
types and special decimal marks. ReadWriteDlm2
supports the Tables.jl
interface.
-
For "decimal dot" users the functions
readcsv2()
andwritecsv2()
have the respective defaults: Delimiter is','
(fixed) anddecimal='.'
. -
The basic idea of
readdlm2()
andwritedlm2()
is to support the decimal comma countries. These functions use';'
as default delimiter and','
as default decimal mark. "Decimal dot" users of these functions need to definedecimal='.'
. -
Alternative package:
CSV
(supports also special decimal marks)
This package is registered and can be installed within the Pkg
REPL-mode: Type ]
in the REPL and then:
pkg> add ReadWriteDlm2
Basic Example(-> more): How To Use ReadWriteDlm2
julia> using ReadWriteDlm2, Dates # activate modules ReadWriteDlm2, Dates
julia> a = ["text" 1.2; Date(2017,1,1) 1]; # create array with: String, Date, Float64 and Int eltype
julia> writedlm2("test.csv", a) # test.csv(decimal comma): "text;1,2\n2017-01-01;1\n"
julia> readdlm2("test.csv") # read `CSV` data: All four eltypes are parsed correctly!
2×2 Array{Any,2}:
"text" 1.2
2017-01-01 1
julia> using DataFrames # Tables interface: auto Types for DataFrame columns
julia> DataFrame(readdlm2("test.csv", tables=true))
2×2 DataFrame
│ Row │ Column1 │ Column2 │
│ │ Any │ Real │
├─────┼────────────┼─────────┤
│ 1 │ text │ 1.2 │
│ 2 │ 2017-01-01 │ 1 │
Read a matrix from source
. The source
can be a text file, stream or byte array.
Each line, separated by eol
(default is '\n'
), gives one row.
The columns are separated by ';'
, another delim
can be defined.
readdlm2(source; options...)
readdlm2(source, T::Type; options...)
readdlm2(source, delim::Char; options...)
readdlm2(source, delim::Char, T::Type; options...)
readdlm2(source, delim::Char, eol::Char; options...)
readdlm2(source, delim::Char, T::Type, eol::Char; options...)
Pre-processing of source
with regex substitution changes the decimal marks
from d,d
to d.d
. For default rs
the keyword argument decimal=','
sets
the decimal Char in the r
-string of rs
. When a special regex substitution
tuple rs=(r.., s..)
is defined, the argument decimal
is not used (
-> Example). Pre-processing
can be switched off with: rs=()
.
In addition to stdlib readdlm()
, data is also parsed for Dates
formats (ISO),
theTime
format HH:MM[:SS[.s{1,9}]]
and for complex and rational numbers.
To deactivate parsing dates/time set: dfs="", dtfs=""
.
locale
defines the language of day (E
, e
) and month (U
, u
) names.
The result will be a (heterogeneous) array of default element type Any
. If
header=true
it will be a tuple containing the data array and a vector for
the columnnames. Other (abstract) types for the data array elements could be
defined. If data is empty, a 0×0 Array{T,2}
is returned.
With tables=true
[, header=true
] option[s] a Tables
interface compatible
MatrixTable
with individual column types is returned, which for example can
be used as argument for DataFrame()
.
decimal=','
: Decimal mark Char used by defaultrs
, irrelevant ifrs
-tuple is not the default oners=(r"(\d),(\d)", s"\1.\2")
: Regex (r,s)-tuple, the default change d,d to d.d ifdecimal=','
dtfs="yyyy-mm-ddTHH:MM:SS.s"
: Format string for DateTime parsingdfs="yyyy-mm-dd"
: Format string for Date parsinglocale="english"
: Language for parsing dates names, default is englishtables=false
: ReturnTables
interface compatible MatrixTable iftrue
dfheader=false
:dfheader=true
is shortform fortables=true, header=true
missingstring="na"
: How missing values are represented, default is"na"
readcsv2(source, T::Type=Any; opts...)
Equivalent to readdlm2()
with delimiter ','
and decimal='.'
.
More information about Base functionality and (keyword) arguments - which are also
supported by readdlm2()
and readcsv2()
- is available in the documentation for readdlm().
Module | Function | Delimiter | Dec. Mark | Element Type | Ext. Parsing |
---|---|---|---|---|---|
DelimitedFiles | readdlm() |
' ' |
'.' |
Float64/Any | No (String) |
ReadWriteDlm2 | readdlm2() |
';' |
',' |
Any | Yes |
ReadWriteDlm2 | readcsv2() |
',' |
'.' |
Any | Yes |
ReadWriteDlm2 | readdlm2(opt:tables=true) |
';' |
',' |
Column spec. | Yes, + col T |
ReadWriteDlm2 | readcsv2(opt:tables=true) |
',' |
'.' |
Column spec. | Yes, + col T |
Write A
(a vector, matrix, or an iterable collection of iterable rows, a
Tables
source) as text to f
(either a filename or an IO stream). The columns
are separated by ';'
, another delim
(Char or String) can be defined.
writedlm2(f, A; options...)
writedlm2(f, A, delim; options...)
By default, a pre-processing of values takes place. Before writing as strings,
decimal marks are changed from '.'
to ','
.
With a keyword argument another decimal mark can be defined.
To switch off this pre-processing set: decimal='.'
.
In writedlm2()
the output format for Date
and DateTime
data can be
defined with format strings. Defaults are the ISO formats. Day (E
, e
) and
month (U
, u
) names are written in the locale
language. For writing
Complex
numbers the imaginary component suffix can be selected with the
imsuffix=
keyword argument.
decimal=','
: Character for writing decimal marks, default is a commadtfs="yyyy-mm-ddTHH:MM:SS.s"
: Format string, DateTime write formatdfs="yyyy-mm-dd"
: Format string, Date write formatlocale="english"
: Language for writing date names, default is englishimsuffix="im"
: Complex - imaginary component suffix"im"
(=default),"i"
or"j"
missingstring="na"
: How missing values are written, default is"na"
writecsv2(f, A; opts...)
Equivalent to writedlm2()
with fixed delimiter ','
and decimal='.'
.
Module | Function | Delimiter | Decimal Mark |
---|---|---|---|
DelimitedFiles | writedlm() |
'\t' |
'.' |
ReadWriteDlm2 | writedlm2() |
';' |
',' |
ReadWriteDlm2 | writecsv2() |
',' |
'.' |
julia> using ReadWriteDlm2
julia> a = Any[1 complex(1.5,2.7);1.0 1//3]; # create array with: Int, Complex, Float64 and Rational type
julia> writecsv2("test.csv", a) # test.csv(decimal dot): "1,1.5+2.7im\n1.0,1//3\n"
julia> readcsv2("test.csv") # read CSV data: All four types are parsed correctly!
2×2 Array{Any,2}:
1 1.5+2.7im
1.0 1//3
julia> using ReadWriteDlm2
julia> a = Float64[1.1 1.2;2.1 2.2]
2×2 Array{Float64,2}:
1.1 1.2
2.1 2.2
julia> writedlm2("test.csv", a; decimal='€') # '€' is decimal Char in 'test.csv'
julia> readdlm2("test.csv", Float64; decimal='€') # a) standard: use keyword argument
2×2 Array{Float64,2}:
1.1 1.2
2.1 2.2
julia> readdlm2("test.csv", Float64; rs=(r"(\d)€(\d)", s"\1.\2")) # b) more flexible: rs-Regex-Tupel
2×2 Array{Float64,2}:
1.1 1.2
2.1 2.2
julia> using ReadWriteDlm2
julia> a = Union{Missing, Float64}[1.1 0/0;missing 2.2;1/0 -1/0]
3×2 Array{Union{Missing, Float64},2}:
1.1 NaN
missing 2.2
Inf -Inf
julia> writedlm2("test.csv", a; missingstring="???") # use "???" for missing data
julia> read("test.csv", String)
"1,1;NaN\n???;2,2\nInf;-Inf\n"
julia> readdlm2("test.csv", Union{Missing, Float64}; missingstring="???")
3×2 Array{Union{Missing, Float64},2}:
1.1 NaN
missing 2.2
Inf -Inf
julia> using ReadWriteDlm2, Dates
julia> Dates.LOCALES["french"] = Dates.DateLocale(
["janvier", "février", "mars", "avril", "mai", "juin",
"juillet", "août", "septembre", "octobre", "novembre", "décembre"],
["janv", "févr", "mars", "avril", "mai", "juin",
"juil", "août", "sept", "oct", "nov", "déc"],
["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"],
["lu", "ma", "me", "je", "ve", "sa", "di"],
);
julia> a = hcat([Date(2017,1,1), DateTime(2017,1,1,5,59,1,898), 1, 1.0, "text"])
5x1 Array{Any,2}:
2017-01-01
2017-01-01T05:59:01.898
1
1.0
"text"
julia> writedlm2("test.csv", a; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")
julia> read("test.csv", String) # to see what have been written in "test.csv" file
"dimanche, 1.janvier 2017\ndi, 1.janv 2017 5:59:1,898\n1\n1,0\ntext\n"
julia> readdlm2("test.csv"; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")
5×1 Array{Any,2}:
2017-01-01
2017-01-01T05:59:01.898
1
1.0
"text"
See -> DataFrames
for installation and more information.
julia> using ReadWriteDlm2, Dates, DataFrames, Statistics
julia> df = DataFrame( # Create DataFrame `df`
date = [Date(2017,1,1), Date(2017,1,2), nothing],
value_1 = [1.4, 1.8, missing],
value_2 = [2, 3, 4]
)
3×3 DataFrame
│ Row │ date │ value_1 │ value_2 │
│ │ Union… │ Float64⍰ │ Int64 │
├─────┼────────────┼──────────┼─────────┤
│ 1 │ 2017-01-01 │ 1.4 │ 2 │
│ 2 │ 2017-01-02 │ 1.8 │ 3 │
│ 3 │ │ missing │ 4 │
julia> writedlm2("testdf_com.csv", df) # decimal comma: write DataFrame df
julia> read("testdf_com.csv", String) # check csv data
"date;value_1;value_2\n2017-01-01;1,4;2\n2017-01-02;1,8;3\nnothing;na;4\n"
julia> df2 = DataFrame(readdlm2("testdf_com.csv", header=true, tables=true))
3×3 DataFrame
│ Row │ date │ value_1 │ value_2 │
│ │ Union… │ Float64⍰ │ Int64 │
├─────┼────────────┼──────────┼─────────┤
│ 1 │ 2017-01-01 │ 1.4 │ 2 │
│ 2 │ 2017-01-02 │ 1.8 │ 3 │
│ 3 │ │ missing │ 4 │
julia> mean(skipmissing(df2[!, :value_1]))
1.6
julia> mean(df2[!, :value_2])
3.0