Split CSV file

Description

Split a CSV file into two parts: one containing all entries that satisfy the user-defined condition, the other containing all remaining entries. Each row is accepted or rejected based on the condition parameters.

It is possible to check against multiple values by specifying an input file containing all possible values in addition to the value specified via the ‘value’ parameter. If the test for a CSV row succeeds for at least one of the specified values, the row is accepted.
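The accept/reject decision with multiple values can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation (which is written in Ruby): the function names `load_values` and `split_rows` are hypothetical, only the 'contains' test is shown, and the values file is assumed to hold one value per line.

```python
import csv
import io


def load_values(value, values_text=""):
    """Combine the single 'value' parameter with the lines of a values file.

    Assumption: the values file holds one value per line.
    """
    extra = [line.strip() for line in values_text.splitlines() if line.strip()]
    return [value] + extra


def split_rows(rows, column, values, case_sensitive=True):
    """Split dict rows into (accepted, rejected) using a 'contains' test.

    A row is accepted if its cell contains at least one of the values.
    """
    accepted, rejected = [], []
    for row in rows:
        # Column names are matched without regard to case.
        cell = next(v for k, v in row.items() if k.lower() == column.lower())
        candidates = values
        if not case_sensitive:
            cell = cell.lower()
            candidates = [v.lower() for v in values]
        if any(v in cell for v in candidates):
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


# Hypothetical input data, made up for illustration.
data = (
    "peptide,defline\n"
    "AAK,__td__target_x\n"
    "CCR,__td__decoy_y\n"
    "DDE,__putative__orf_z\n"
)
rows = list(csv.DictReader(io.StringIO(data)))
values = load_values("__td__decoy_", "__putative__orf_\n")
accepted, rejected = split_rows(rows, "defline", values)
print([r["peptide"] for r in accepted])
print([r["peptide"] for r in rejected])
```

Here the second and third rows are accepted because each matches one of the two values, and the first row is rejected because it matches neither.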

Input files

  • at least 1 input file (.csv)
  • optional: values files (.txt)

Output files

  • Accepted entries (accepted.csv)
  • Rejected entries (rejected.csv)

Example

Using the following table as input:

Item               Cost
Apple              0.25
Banana             0.25
Orange             0.40
Fruit hip holster  10.00

With 'item' as the column (or 'Item', since column names are case-insensitive), 'contains' as the operator, and 'a' as the value, the following rows are accepted:

Item    Cost
Banana  0.25
Orange  0.40

Or, if case sensitivity is disabled:

Item    Cost
Apple   0.25
Banana  0.25
Orange  0.40
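The example can be reproduced with a short Python sketch. The helper `items_containing` is a hypothetical name used only for illustration; it mimics the 'contains' test with and without case sensitivity and is not taken from the tool's source.

```python
import csv
import io

# The example table from above, written as CSV text.
table = (
    "Item,Cost\n"
    "Apple,0.25\n"
    "Banana,0.25\n"
    "Orange,0.40\n"
    "Fruit hip holster,10.00\n"
)
rows = list(csv.DictReader(io.StringIO(table)))


def items_containing(value, case_sensitive):
    """Return the 'Item' cells that contain the given value."""
    result = []
    for row in rows:
        cell = row["Item"]
        if case_sensitive:
            hit = value in cell
        else:
            hit = value.lower() in cell.lower()
        if hit:
            result.append(cell)
    return result


print(items_containing("a", case_sensitive=True))   # ['Banana', 'Orange']
print(items_containing("a", case_sensitive=False))  # ['Apple', 'Banana', 'Orange']
```

'Apple' is accepted only in the second case because it contains an uppercase 'A' but no lowercase 'a'.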

Common CSV file problems

CSV files must be plain text files that use , as the entry separator and " as the optional quote character. A cell must be enclosed in quote characters if the entry separator is part of its content. The first line is expected to be the table header; all following lines are expected to be table rows.
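To check whether a file follows these conventions, it can help to parse a few lines with a standard CSV reader. The sketch below uses Python's `csv` module, whose default dialect uses the same separator and quote character; the sample text is made up for illustration.

```python
import csv
import io

# The first data cell contains a comma, so it is enclosed in quote characters.
text = 'Item,Cost\n"Apple, green",0.25\nBanana,0.25\n'

reader = csv.reader(io.StringIO(text))
header = next(reader)  # first line: table header
rows = list(reader)    # remaining lines: table rows

print(header)  # ['Item', 'Cost']
print(rows)    # [['Apple, green', '0.25'], ['Banana', '0.25']]
```

Without the quotes, the first data line would be read as three cells instead of two.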

Condition

Column

Examples: peptide, protein, defline, scan count, PBC count, Ratio mean, Ratio SD, Ratio RSD, charge, filename

Operand

Choices: contains (default), starts with, is equal to, is not equal to, is less than, is less than or equal to, is greater than, is greater than or equal to
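One way to picture these choices is as a lookup table from choice name to test function. The Python sketch below is an assumption about how such a table might look, not the tool's implementation: in particular, it assumes that the four ordering comparisons treat cell and value as numbers while the other choices compare text, which this page does not state. The names `OPERATORS` and `matches` are hypothetical.

```python
import operator


def _numeric(compare):
    """Wrap a comparison so that both sides are converted to numbers first.

    Assumption: ordering comparisons are numeric rather than alphabetical.
    """
    return lambda cell, value: compare(float(cell), float(value))


OPERATORS = {
    "contains": lambda cell, value: value in cell,
    "starts with": lambda cell, value: cell.startswith(value),
    "is equal to": operator.eq,
    "is not equal to": operator.ne,
    "is less than": _numeric(operator.lt),
    "is less than or equal to": _numeric(operator.le),
    "is greater than": _numeric(operator.gt),
    "is greater than or equal to": _numeric(operator.ge),
}


def matches(cell, operand, value):
    """Apply the chosen test to a single cell."""
    return OPERATORS[operand](cell, value)


print(matches("Banana", "contains", "a"))
print(matches("10.00", "is greater than", "9"))
```

Under the numeric assumption, '10.00' is greater than '9'; a plain text comparison would give the opposite result, so it is worth checking how the tool behaves before relying on these choices for numeric columns.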

Value

Examples: __putative__gpf_, __putative__orf_, __td__target_, __td__decoy_

Be case sensitive

Choices: yes (default), no

Source code

split-csv-file.rb, split-csv-file.yaml (GitHub)