xml2csv.pl has multiple options, as can be seen below. Option names can be shortened as long as they remain unique, so --c, --ca, --cas and --case are all identical. The program accepts -- (minus, minus), - (minus) or + (plus) as the option prefix. Boolean options are negated by prefixing the option name with 'no' so, for example, '--noremove_crlf' will not remove 'cr/lf' from data whilst '--remove_crlf' will remove 'cr/lf' from data.
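The abbreviation and negation rules described here match the default behaviour of the core Perl module Getopt::Long. The following is a minimal sketch of that behaviour, assuming a Getopt::Long style option table; the two-option table is a hypothetical subset and is not taken from the xml2csv.pl source.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical subset of the option table; the real script defines many more.
my %opt = (case => 'lower', remove_crlf => 0);

# '--ca' is an unambiguous abbreviation of '--case' in this table, and the
# 'no' prefix negates the boolean option declared with '!'.
my @args = ('--ca=upper', '--noremove_crlf');
GetOptionsFromArray(\@args, \%opt, 'case=s', 'remove_crlf!')
    or die "Error in command line arguments\n";

print "case=$opt{case} remove_crlf=$opt{remove_crlf}\n";
```

An abbreviation only works while it is unique, so a prefix shared by two option names (for example --save) would be rejected as ambiguous.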
These parameters (all with the same names) can be set in the configuration file (default name config.yaml). Boolean items are set to either 1 (TRUE) or 0 (FALSE) rather than being prefixed with 'no'.
Each of the options is described below.
--help
Displays a list of all the available options:
Usage: xml2csv.pl [options]
Options:
--(no)allow_file_overwrite default=allow_file_overwrite
--(no)allow_surrogate_ids default=allow_surrogate_ids
--case=string default='lower'
--diagram_template=string default='monochrome'
--diagram_type=string default='all'
--graphviz_command=string default='dot -Tpdf -O '
--help
--indent=integer default=3
--interval=integer default heuristically defined
--(no)list_templates default=nolist_templates
--load_config_file=string default='config/config.yaml'
--load_diagrams_file default='config/diagrams.yaml'
--load_structure_file=string default=''
--load_template_file=string default='config/database.yaml'
--load_xml_file=string default='data.xml'
--quiet
--(no)remove_crlf=boolean default=noremove_crlf
--sample_size=integer default=10
--save_config_file=string default='saved_config.yaml'
--save_counts_file=string default='saved_counts.txt'
--save_diagram_file=string default='diagram.dot'
--save_path=string default='output'
--save_structure_file=string default='saved_structure.yaml'
--substitution_string=string default='_x2c'
--template=string default='postgres'
--usage
--verbose
--(no)CSV_always_quote default=noCSV_always_quote
--CSV_escape_char=char default='"'
--CSV_quote_char=char default='"'
--CSV_sep_char=char default=','
--XML_ContentKey=string default='-content'
--(no)XML_ForceArray default=XML_ForceArray
--(no)XML_ForceContent default=XML_ForceContent
--(no)XML_KeepRoot default=XML_KeepRoot
--(no)XML_NoAttr default=noXML_NoAttr
--(no)XML_NormaliseSpace default=noXML_NormaliseSpace
--(no)XML_SuppressEmpty default=1
For further information see: https://github.com/datamgmt/xml2csv
For further information on CSV_ options see: https://metacpan.org/pod/Text::CSV
For further information on XML_ options see: https://metacpan.org/pod/XML::Simple
--(no)allow_file_overwrite default=allow_file_overwrite
By default xml2csv.pl overwrites existing files. When this is set to noallow_file_overwrite, the output directory must either not exist (in which case it is created at run time) or be empty for the program to run.
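The rule for noallow_file_overwrite can be sketched as a small check. This is an illustration only: output_dir_is_safe is a hypothetical helper, not a function from xml2csv.pl, and it simply encodes "absent or empty" as described above.

```perl
use strict;
use warnings;

# Returns true when $dir may be used as the output directory under
# noallow_file_overwrite: it is either absent or contains no entries.
sub output_dir_is_safe {
    my ($dir) = @_;
    return 1 unless -e $dir;    # absent: it would be created at run time
    return 0 unless -d $dir;    # exists but is not a directory
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @entries ? 0 : 1;
}

print output_dir_is_safe('output') ? "ok to write\n" : "refusing to overwrite\n";
```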
--(no)allow_surrogate_ids default=allow_surrogate_ids
By default the programme will create a number of surrogate keys in the database in order to keep track of relationships between elements. If you do not require the surrogate keys then noallow_surrogate_ids turns them off. By default a surrogate key will have a suffix of _x2c_id to make it easily recognisable and prevent name clashes with existing data fields. This can be changed with --substitution_string (see below).
See examples: noallow_surrogate_ids, allow_surrogate_ids
--case=string default='lower'
Defines whether column names are converted to one case or another. 'lower', the default, converts them all to lowercase; 'upper' converts them all to uppercase; and 'mixed' leaves them as found in the XML file used as a source.
See examples: case_lower, case_upper, case_mixed
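The effect of the three values can be sketched as follows. apply_case is a hypothetical helper written for illustration, and 'OrderId' is an invented column name.

```perl
use strict;
use warnings;

# Apply the --case setting to a column name taken from the XML source.
sub apply_case {
    my ($name, $case) = @_;
    return lc $name if $case eq 'lower';
    return uc $name if $case eq 'upper';
    return $name;    # 'mixed': leave the name as found
}

print join(', ', map { apply_case('OrderId', $_) } qw(lower upper mixed)), "\n";
```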
--diagram_template=string default='monochrome'
The layout for diagrams is determined by a template found in config/diagrams.yaml. By default the programme comes with two templates, 'monochrome' and 'colorful'. You can either modify these templates or create new ones to suit your needs. See Diagrams for information on creating/modifying templates, and use --list_templates to get a list of all currently available templates.
See examples: diagram_template_colorful, diagram_template_monochrome
--diagram_type=string default='all'
When a diagram is produced it always contains the table name; it may also contain the columns of the table or a sample data set. The valid values are: 'none', which switches off diagram creation; 'basic', which returns just the table name; 'model', which returns the table name and column details; 'sample', which returns the table name and a sample of the data; and 'all', the default, which returns the table name, column details and sample data.
See examples: diagram_type_none, diagram_type_basic, diagram_type_model, diagram_type_sample, diagram_type_all
--graphviz_command=string default='dot -Tpdf -O '
This is the command required to create a diagram using Graphviz. By default it assumes that the 'dot' command is available in your path and that you want a PDF output. Further information can be found at Graphviz Command Line Invocation.
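As an illustration of how the default value is likely to be used, the sketch below builds a full command line. It assumes that the path of the saved dot file is appended to the command string (the trailing space in the default suggests this, but the source does not confirm it). The path 'output/diagram.dot' combines the --save_path and --save_diagram_file defaults.

```perl
use strict;
use warnings;

my $graphviz_command = 'dot -Tpdf -O ';        # default --graphviz_command
my $dot_file         = 'output/diagram.dot';   # assumed: save_path/save_diagram_file

# Assumption: the tool appends the dot file path to the configured command.
my $command = $graphviz_command . $dot_file;
print "$command\n";
```

With Graphviz, the -O flag derives the output name from the input name and the format, so this command would write output/diagram.dot.pdf.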
--indent=integer default=3
This option determines the indent level of the SQL statements that are generated. By default this is set to 3 but it can be set to any integer.
See example: indent
--interval=integer default heuristically defined
This option determines how frequently a progress message is printed at verbose level 4 or greater. It is calculated automatically from the size of the file, with larger files having less frequent updates; however, it can be set from the command line if you have a particular need.
--(no)list_templates default=nolist_templates
Lists the available database and diagram templates. For a default configuration this will return the following:
Available templates:
The load_template_file called config/database.yaml contains:
csv_header
csv_noheader
mysql
oracle
postgres
The load_diagrams_file called config/diagrams.yaml contains:
colorful
monochrome
--load_config_file=string default='config/config.yaml'
This file contains the settings for the program. This option allows the system to use alternative config files specified either by an absolute or relative path. Files are in YAML format.
--load_diagrams_file default='config/diagrams.yaml'
This file contains information on how to lay out the diagrams. This option allows the system to use alternative diagram files specified either by an absolute or relative path. Files are in YAML format. For further information on creating and editing these files see Diagrams
--load_structure_file=string default=''
This file contains information on the data structure. This option allows the system to import an existing data structure specified either by an absolute or relative path. Files are in YAML format. For further information on creating and editing these files see Structure.
--load_template_file=string default='config/database.yaml'
This file contains database specific syntax information. This option allows the system to use alternative database files specified either by an absolute or relative path. Files are in YAML format. For further information on editing these files see Database
--load_xml_file=string default='data.xml'
This is the XML file you want to process. This option allows the user to specify a file to read in either by an absolute or relative path.
--quiet
This option decrements the verbosity of the messages by 1 for each occurrence. The default level is 3, so 'xml2csv.pl -q -q -q' will reduce the verbose level to 0. It is the opposite of --verbose (which increments by 1).
--(no)remove_crlf default=noremove_crlf
By default data is left unmodified in the output CSV; however, it is sometimes desirable to remove carriage returns and line feeds, which the --remove_crlf option does. The programme uses the Perl regular expression escape '\R' to identify generic newlines. For further information on its behaviour see Perldoc.
See example: remove_crlf
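A minimal sketch of what removing generic newlines with '\R' looks like is shown below. The sample value is invented, and the exact substitution used inside xml2csv.pl is an assumption; only the use of '\R' comes from the description above.

```perl
use strict;
use warnings;

my $value = "line one\r\nline two\nline three";

# \R matches a CR/LF pair as a single unit, as well as a lone LF or CR.
(my $clean = $value) =~ s/\R//g;

print "$clean\n";
```

Note that '\R' requires Perl 5.10 or later.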
--sample_size=integer default=10
The diagram types 'sample' and 'all' will include a sample of the data from the XML file. By default it will return the first 10 rows, or all rows if fewer than 10 exist. This option can be used to increase or decrease the sample size; however, very large samples may be undesirable because of the time taken to generate them or the possibility of running out of memory whilst running the programme.
See example: sample_size
--save_config_file=string default='saved_config.yaml'
This option means that the system will create a file with the name passed in the output directory that contains the entire configuration as run. This is useful in circumstances where you want to re-run a load (e.g. a daily load) and always want to use the same configuration. The saved file can be used with the --load_config_file option. An empty string means that no file will be written.
--save_counts_file=string default='saved_counts.txt'
This option writes a file, named by the parameter, that contains the name of each table and the number of records found for it. This is useful for audit purposes. An empty string means that no file will be written.
--save_diagram_file=string default='diagram.dot'
This option means that the system will create a file with the name passed in the output directory that contains the diagram in Graphviz 'dot' format. The type of diagram used is determined by the --diagram_template option. An empty string means that no file will be written, which is equivalent to --diagram_type=none.
--save_path=string default='output'
This is the name of the directory where all the output will be saved. If the directory does not exist it will be created. The directory can be an absolute or relative path.
--save_structure_file=string default='saved_structure.yaml'
This option means that the system will create a file with the name passed in the output directory that contains the structure configuration that was found in the input XML file. This is useful in circumstances where you want to re-run a load (e.g. a daily load) and ensure certain fields are included even though they may not appear in some input files. The saved file can be used with the --load_structure_file option. An empty string means that no file will be written. For further information on using and editing these files see Structure.
--substitution_string=string default='_x2c'
For the generation of surrogate keys and in certain other conditions a unique string is required to identify data generated by xml2csv.pl. The default is _x2c, which is unlikely to clash with existing names but can be changed if required.
See example: substitution_string
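The sketch below illustrates how the substitution string feeds into surrogate key names. Only the '_x2c_id' suffix is documented; the idea that the suffix is appended to a table name, the helper surrogate_key_name, and the table name 'order' are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Build a surrogate key column name: <table><substitution_string>_id.
# The '<table>' prefix is an assumption; the documented part is the suffix.
sub surrogate_key_name {
    my ($table, $substitution_string) = @_;
    return $table . $substitution_string . '_id';
}

print surrogate_key_name('order', '_x2c'), "\n";   # default substitution string
print surrogate_key_name('order', '_gen'), "\n";   # e.g. --substitution_string=_gen
```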
--template=string default='postgres'
This option determines the format of the scripts generated by the utility. This is useful if you want to load the output into a SQL database. By default it generates 'postgres' scripts, but it can also generate mysql and oracle database scripts. In addition there are options for CSV only with a header record (csv_header) and CSV only with no header record (csv_noheader). The template file (see --load_template_file) can be modified to vary the output as required and to add new output formats. The currently available templates can be viewed with --list_templates. For further information on editing these files see Database.
See examples: template_csv_header, template_csv_no_header, template_oracle, template_mysql, template_postgres
--usage
A synonym for --help.
--verbose
This option increments the verbosity of the messages by 1 for each occurrence. The default level is 3, so 'xml2csv.pl -v -v' will increase the verbose level to 5. It is the opposite of --quiet (which decrements by 1).
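The counting behaviour of --verbose and --quiet can be sketched with the core module Getopt::Long. This is an illustration of the arithmetic only, assuming a Getopt::Long style parser; the option specifications and the short aliases 'v' and 'q' are assumptions, not code from xml2csv.pl.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my $verbose = 3;    # default verbose level stated above

my @args = ('-q', '-q', '-v');
GetOptionsFromArray(
    \@args,
    'verbose|v+' => \$verbose,             # each occurrence adds 1
    'quiet|q'    => sub { $verbose-- },    # each occurrence subtracts 1
) or die "Error in command line arguments\n";

print "verbose level: $verbose\n";
```

Starting from 3, two --quiet flags and one --verbose flag leave the level at 2.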
--(no)CSV_always_quote default=noCSV_always_quote
This encapsulates all data with the configured quote character.
The programme uses the Perl module Text::CSV to format the CSV record and file. This option can be passed directly through to the module. For further information see always_quote
See example: CSV_always_quote
--CSV_escape_char=char default='"'
This changes the escaping character from the default of double quotes (").
The programme uses the Perl module Text::CSV to format the CSV record and file. This option can be passed directly through to the module. For further information see escape_char
See example: CSV_escape_char
--CSV_quote_char=char default='"'
This changes the quoting character from the default of double quotes (").
The programme uses the Perl module Text::CSV to format the CSV record and file. This option can be passed directly through to the module. For further information see quote_char
See example: CSV_quote_char
--CSV_sep_char=char default=','
This changes the separating character from the default of comma (,).
The programme uses the Perl module Text::CSV to format the CSV record and file. This option can be passed directly through to the module. For further information see sep_char
See example: CSV_sep_char
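To show how the four CSV_ options interact, here is a minimal sketch that calls Text::CSV directly with the equivalent attributes. It assumes Text::CSV is installed (xml2csv.pl already depends on it); the field values are invented, and the mapping of each CSV_ option to the like-named Text::CSV attribute follows the descriptions above.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({
    binary       => 1,
    always_quote => 1,      # --CSV_always_quote
    sep_char     => ';',    # --CSV_sep_char=';'
    quote_char   => '"',    # --CSV_quote_char (default)
    escape_char  => '"',    # --CSV_escape_char (default)
}) or die "Cannot create Text::CSV object: " . Text::CSV->error_diag;

# Build one record; the embedded double quotes are escaped with escape_char.
$csv->combine('42', 'plain', 'say "hi"') or die "combine failed\n";
my $line = $csv->string;
print "$line\n";
```

With always_quote enabled every field is wrapped in the quote character, including the numeric one, and the embedded quotes are doubled because the escape character is also a double quote.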
--XML_ContentKey=string default='-content'
The default is usually sufficient.
The programme uses the Perl module XML::Simple to read the XML file. This option can be passed directly through to the module. For further information see ContentKey
See example: XML_ContentKey
--(no)XML_ForceArray default=XML_ForceArray
This is an important attribute for influencing how your XML data is ultimately structured in the CSV.
This option should be set to '1' to force nested elements to be represented as arrays even when there is only one. For example, with ForceArray enabled, this XML:
<opt>
  <name>value</name>
</opt>
would parse to this:
{
  'name' => [
    'value'
  ]
}
instead of this (the XML::Simple default):
{
  'name' => 'value'
}
The programme uses the Perl module XML::Simple to read the XML file. This option can be passed directly through to the module. For further information see ForceArray
See example: XML_ForceArray
--(no)XML_ForceContent default=XML_ForceContent
This is an important attribute for influencing how your XML data is ultimately structured in the CSV.
When XMLin() parses elements which have text content as well as attributes, the text content must be represented as a hash value rather than a simple scalar. This option allows you to force text content to always parse to a hash value even when there are no attributes. So for example:
XMLin('<opt><x>text1</x><y a="2">text2</y></opt>', ForceContent => 1)
will parse to:
{
  'x' => { 'content' => 'text1' },
  'y' => { 'a' => 2, 'content' => 'text2' }
}
instead of:
{
  'x' => 'text1',
  'y' => { 'a' => 2, 'content' => 'text2' }
}
The programme uses the Perl module XML::Simple to read the XML file. This option can be passed directly through to the module. For further information see ForceContent
See example: XML_ForceContent
--(no)XML_KeepRoot default=XML_KeepRoot
This is an important attribute for influencing how your XML data is ultimately structured in the CSV.
In its attempt to return a data structure free of superfluous detail and unnecessary levels of indirection, XMLin() normally discards the root element name. Setting the 'KeepRoot' option to '1' will cause the root element name to be retained.
The programme uses the Perl module XML::Simple to read the XML file. This option can be passed directly through to the module. For further information see KeepRoot
See example: XML_KeepRoot
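The effect of KeepRoot can be seen by calling XML::Simple directly. This is a minimal sketch, assuming XML::Simple is installed (xml2csv.pl already depends on it); ForceArray and KeyAttr are switched off here only to keep the resulting structure small, which is not how xml2csv.pl is configured by default.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $xml    = '<opt><name>value</name></opt>';
my %common = (ForceArray => 0, KeyAttr => []);   # simplified for illustration

my $without = XMLin($xml, %common, KeepRoot => 0);   # root element discarded
my $with    = XMLin($xml, %common, KeepRoot => 1);   # root element retained

print "without KeepRoot, top-level keys: ", join(', ', sort keys %$without), "\n";
print "with KeepRoot, top-level keys:    ", join(', ', sort keys %$with), "\n";
```

With KeepRoot set, the top level of the structure holds the root element name 'opt', and the 'name' element sits one level below it.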
--(no)XML_NoAttr default=noXML_NoAttr
This is an important attribute for influencing how your XML data is ultimately structured in the CSV.
When used with XMLout(), the generated XML will contain no attributes. All hash key/values will be represented as nested elements instead.
When used with XMLin(), any attributes in the XML will be ignored.
The programme uses the Perl module XML::Simple to read the XML file. This option can be passed directly through to the module. For further information see NoAttr
See example: XML_NoAttr
--(no)XML_NormaliseSpace default=noXML_NormaliseSpace
The default is usually sufficient.
This option controls how whitespace in text content is handled. Recognised values for the option are:
- 0 = (default) whitespace is passed through unaltered (except of course for the normalisation of whitespace in attribute values which is mandated by the XML recommendation)
- 1 = whitespace is normalised in any value used as a hash key (normalising means removing leading and trailing whitespace and collapsing sequences of whitespace characters to a single space)
The programme uses the Perl module XML::Simple to read the XML file. This option can be passed directly through to the module. For further information see NormaliseSpace
See example: XML_NormaliseSpace
--(no)XML_SuppressEmpty default=1
This option controls what XMLin() should do with empty elements (no attributes and no content). The default behaviour of XML::Simple itself is to represent them as empty hashes. Setting this option to a true value (e.g. 1, which is the xml2csv.pl default) will cause empty elements to be skipped altogether.
The programme uses the Perl module XML::Simple to read the XML file. This option can be passed directly through to the module. For further information see SuppressEmpty
See example: XML_SuppressEmpty
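The difference SuppressEmpty makes can be checked with a direct call to XML::Simple. This is a minimal sketch, assuming XML::Simple is installed; the XML snippet is invented, and ForceArray and KeyAttr are switched off only to keep the output easy to read.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $xml    = '<opt><a></a><b>1</b></opt>';
my %common = (ForceArray => 0, KeyAttr => []);   # simplified for illustration

my $kept    = XMLin($xml, %common);                       # XML::Simple default
my $dropped = XMLin($xml, %common, SuppressEmpty => 1);   # xml2csv.pl default

print "without SuppressEmpty: ", join(', ', sort keys %$kept), "\n";
print "with SuppressEmpty:    ", join(', ', sort keys %$dropped), "\n";
```

Without the option the empty element 'a' appears as an empty hash; with it, only 'b' remains.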