Introduction
The DLMReader package provides the filereader and filewriter functions for reading and writing delimited files, respectively. This section explains each of their keyword arguments.
filereader
The user must pass the file name as the first argument of filereader to read a delimited file into Julia, i.e. filereader(path; ...). By default, filereader assumes that values are separated by commas, that the first line of the input file contains the column names, and that strings are not quoted. It scans the first 20 lines of the input file to detect Int64 and Float64 columns, and falls back to String when detection fails. Thus, for a well-formatted csv file, the user does not need any keyword argument. However, filereader provides keyword arguments that give the user extra flexibility for reading complex delimited files.
filereader treats empty strings and "." as missing.
Keyword arguments
- types
- delimiter
- dlmstr
- ignorerepeated
- header
- linebreak
- guessingrows
- fixed
- quotechar
- escapechar
- dtformat
- int_base
- informat
- skipto
- limit
- multiple_obs
- line_informat
- buffsize
- lsize
- string_trim
- makeunique
- emptycolname
- warn
- eolwarn
- threads
- threshold
 
types
The user can specify column types with the types keyword argument, either as a vector with one type per column or as a dictionary of types for a few selected columns.
Default: auto detection
julia> ds = filereader(IOBuffer("""x1,x2
       12,13
       1,2
       """), types = [Int, Float64])
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Float64? 
─────┼────────────────────
   1 │       12      13.0
   2 │        1       2.0
delimiter
To change the default delimiter, the user must pass the delimiter keyword argument, which only accepts a Char. The user can also pass a vector of Char, in which case filereader treats each of them as an alternative delimiter.
Default: comma
julia> ds = filereader(IOBuffer("""x1;x2
       12;13
       1;2
       """), delimiter = ';')
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │       12        13
   2 │        1         2
dlmstr
This keyword argument is used to pass a string as the delimiter for values.
Default: nothing
julia> ds = filereader(IOBuffer("""x1|:|x2
       12|:|13
       1|:|2
       """), dlmstr = "|:|" )
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │       12        13
   2 │        1         2
ignorerepeated
If it is set to true, consecutive repeated delimiters are treated as a single delimiter.
Default: false
julia> ds = filereader(IOBuffer("""x1,,x2
       12,13
       1,,,,2
       """), ignorerepeated = true)
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │       12        13
   2 │        1         2
header
The user must set this to false if the first line of the input file does not contain the column names. Alternatively, the user can pass a vector of column names, which will be used as the column header.
Default: true
julia> ds = filereader(IOBuffer("""
       12,13
       1,2
       """), header = [:Col1, :Col2])
2×2 Dataset
 Row │ Col1      Col2     
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │       12        13
   2 │        1         2
linebreak
The filereader function uses the value of this option as the line separator. It accepts a Char or a vector of Char of length at most two. In rare cases the user may need to pass this option to help filereader read the input file.
Default: auto detection
julia> ds = filereader(IOBuffer("""
       x1,x2;12,13;1,2;"""), linebreak = ';')
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │       12        13
   2 │        1         2
guessingrows
This sets the number of lines used for type detection. Increasing this value lets filereader detect column types more accurately, at the cost of more computation time.
Default: 20
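As a rough sketch of when a larger value can help, the example below builds a made-up input whose second column looks like integers for the first 30 rows and only then contains a decimal value. The data and the value 40 are assumptions chosen for illustration; the only DLMReader feature used is the guessingrows keyword argument described above.

```julia
using DLMReader

# Hypothetical data: 30 rows of integer-looking values, then one decimal value.
lines = ["$(i),$(i)" for i in 1:30]
text = "id,value\n" * join(lines, "\n") * "\n31,31.5\n"

# Scan 40 lines instead of the default 20, so that the decimal value on the
# last row is within the lines used for type detection.
ds = filereader(IOBuffer(text), guessingrows = 40)
```

With the default of 20, the last row would fall outside the scanned lines.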
fixed
This option is used for reading fixed-width files. The user must pass a dictionary that gives the location of each column in the line as a range.
Default: nothing
julia> ds = filereader(IOBuffer("""
       12
       34
       """), fixed = Dict(1=>1:1, 2=>2:2), header = false)
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        1         2
   2 │        3         4
quotechar
If the text values are quoted in the input file, the user must pass the quote character via this keyword argument.
Default: nothing (filereader assumes that text is not quoted)
julia> ds = filereader(IOBuffer("""x1,x2
       "12",13
       "1",2
       """), quotechar = '"')
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │       12        13
   2 │        1         2
escapechar
Declares the escape character for quoted text.
Default: nothing (filereader assumes that text is not quoted)
dtformat
The user must pass the date format of DateTime columns if it differs from the standard format. The dtformat keyword argument accepts a dictionary that maps columns to their date formats.
Default: nothing
julia> ds = filereader(IOBuffer("""date1,date2
       2020-1-1,2020/1/1
       2020-2-2,2020/2/2
       """), dtformat = Dict(1 => dateformat"y-m-d", 2 => dateformat"y/m/d"))
2×2 Dataset
 Row │ date1       date2      
     │ identity    identity   
     │ Date?       Date?      
─────┼────────────────────────
   1 │ 2020-01-01  2020-01-01
   2 │ 2020-02-02  2020-02-02
int_base
The filereader function can read integers written in a given base. The user can pass this information for specific columns as a dictionary.
Default: nothing
julia> ds = filereader(IOBuffer("""x1,x2
       100,100
       101,101
       """), int_base = Dict(1 => 2))
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        4       100
   2 │        5       101
informat
The user can pass a dictionary that assigns an informat to selected columns. In the example below, the NA! informat causes NA to be read as missing.
Default: nothing
julia> ds = filereader(IOBuffer("""x1,x2
       NA,12
       1,NA
       """), informat = Dict(1:2 .=> NA!))
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │  missing        12
   2 │        1   missing
skipto
It can be used to start reading the file from a specific line.
Default: 1
julia> ds = filereader(IOBuffer("""COL1, COL2
       1,2
       2,3
       3,4
       """), skipto = 3, header = false)
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        2         3
   2 │        3         4
limit
It can be used to limit the number of observations read from the input file.
Default: Inf
julia> ds = filereader(IOBuffer("""COL1, COL2
       1,2
       2,3
       3,4
       """), limit = 1)
1×2 Dataset
 Row │ COL1      COL2     
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        1         2
multiple_obs
If it is set as true, the filereader function assumes there may be more than one observation in each line of the input file.
Default: false
julia> ds = filereader(IOBuffer("""1,2,3,4,5
       6,7
       """), multiple_obs = true, header = [:x1, :x2], types = [Int, Int])
4×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        1         2
   2 │        3         4
   3 │        5         6
   4 │        7   missing
line_informat
User can provide line informat via this keyword argument.
Default: nothing
buffsize
The user can provide any positive number for the buffer size. Each thread allocates a buffer of size buffsize and reads values from the input file into it.
Default: 2^16
lsize
It sets the line buffer size for reading the input file. For very wide tables the user may need to adjust this option manually. Its value must be less than buffsize.
Default: 2^15
string_trim
Setting this to true trims the trailing blanks of strings before storing them in the output data set.
DLMReader is shipped with the STRIP! informat, which can be used to strip (remove leading and trailing blanks from) any raw text before parsing.
Default: false
julia> ds = filereader(IOBuffer("""x1,x2
       "    fdh  ",df
       "dkhfd    ",dfadf
       """), quotechar = '"', string_trim = true)
2×2 Dataset
 Row │ x1        x2       
     │ identity  identity 
     │ String?   String?  
─────┼────────────────────
   1 │     fdh   df
   2 │ dkhfd     dfadf
julia> ds[:, :x1]
2-element Vector{Union{Missing, String}}:
 "    fdh"
 "dkhfd"
julia> ds = filereader(IOBuffer("""x1,x2,x3
       1,   2020-2-2   , " ff  "
       2,2020-1-1,"343"
       """), types = Dict(2 => Date), quotechar = '"', informat = Dict(2:3 .=> STRIP!))
2×3 Dataset
 Row │ x1        x2          x3       
     │ identity  identity    identity 
     │ Int64?    Date?       String?  
─────┼────────────────────────────────
   1 │        1  2020-02-02  ff
   2 │        2  2020-01-01  343
julia> ds[:, :x3]
2-element Vector{Union{Missing, String}}:
 "ff"
 "343"
makeunique
If there are duplicate column names, setting this to true resolves them by adding a suffix to the names.
Default: false
julia> ds = filereader(IOBuffer("""x,x
       1,2
       """), makeunique = true)
1×2 Dataset
 Row │ x         x_1      
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        1         2
emptycolname
If it is set to true, a name is generated for each column whose name is empty.
Default: false
julia> ds = filereader(IOBuffer("""x,
       1,2
       """), emptycolname = true)
1×2 Dataset
 Row │ x         NONAME1  
     │ identity  identity 
     │ Int64?    Int64?   
─────┼────────────────────
   1 │        1         2
warn
Controls the maximum number of warnings and information messages. Setting it to 0 suppresses warnings and information messages while reading the input file.
Default: 20
eolwarn
Controls whether the end-of-line character warning is shown.
Default: true
threads
For large files, the filereader function exploits all threads. However, this can be switched off by setting this argument to false.
Default: true
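A minimal sketch of switching off multithreading is shown below. The input is a small made-up buffer used only for illustration; on a real file, a path would be passed in place of the IOBuffer.

```julia
using DLMReader

# Small illustrative input (hypothetical data).
io = IOBuffer("""x1,x2
12,13
1,2
""")

# threads = false asks filereader not to exploit multiple threads.
ds = filereader(io, threads = false)
```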
threshold
The file size threshold (in bytes) which specifies the minimum file size for switching to the high performance algorithm.
Default: 2^26
filewriter
The filewriter function writes a data set to disk. Behind the scenes, it uses the byrow function from InMemoryDatasets.jl to efficiently convert each row of the input data set into UInt8. The first argument of filewriter must be a file name and the second argument must be the data set to write.
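The call described above can be sketched as a simple round trip: read a small data set, write it with filewriter, and read the file back. The data and the temporary file path are placeholders chosen for illustration, and default settings are assumed throughout.

```julia
using DLMReader

# Build a small data set from an in-memory buffer (hypothetical data).
ds = filereader(IOBuffer("""x1,x2
12,13
1,2
"""))

# Write it to a temporary file; the file name is a placeholder.
path = joinpath(mktempdir(), "out.csv")
filewriter(path, ds)

# Read the file back with the default settings.
ds2 = filereader(path)
```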
Keyword arguments
- delimiter
- quotechar
- mapformats
- append
- header
- buffsize
- lsize
- threads
 
delimiter
By default, filewriter uses a comma as the delimiter; however, the user can pass any other Char (or a vector of Char) via the delimiter keyword argument.
Default: comma
quotechar
By default, the filewriter function does not quote values. If quoting is desired, the quote Char must be passed via the quotechar keyword argument.
Default: nothing
mapformats
Setting this to true causes filewriter to write the formatted values.
Default: false
append
Setting this to true causes filewriter to append values to the end of the file.
Default: false
header
The filewriter function writes column names to the output file; this can be prevented by setting header = false.
Default: true
buffsize
This option controls the buffer size.
Default: 2^24
lsize
This option controls the line size for writing values.
Default: auto detection
threads
If set to true, filewriter exploits all threads.
Default: true
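As a hedged sketch of how several filewriter keyword arguments might be combined, the example below writes a data set with a semicolon delimiter and then appends the same rows without repeating the header. The data and the temporary file path are made up for illustration, and the append call assumes that header = false is what prevents a second header line from being written.

```julia
using DLMReader

# Small illustrative data set (hypothetical data).
ds = filereader(IOBuffer("x1,x2\n12,13\n1,2\n"))

# Temporary output file; the file name is a placeholder.
path = joinpath(mktempdir(), "out.txt")

# First call: write the header and the rows, separated by semicolons.
filewriter(path, ds, delimiter = ';')

# Second call: append the same rows to the file, without a header line.
filewriter(path, ds, delimiter = ';', append = true, header = false)

# Read the combined file back using the same delimiter.
ds2 = filereader(path, delimiter = ';')
```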