Comma-Separated Values (CSV)

CSV files are commonly used as export and import formats for spreadsheets and other software applications. While they do not contain the formatting used on a spreadsheet, they do contain the two-dimensional data.

CSV Characteristics

Anyone familiar with spreadsheets has probably had to import from or export to a CSV file at one time or another. The CSV file format is very popular for conveyance of two-dimensional data between software applications. Unfortunately, what is considered a valid CSV file varies among applications and systems. This variation can cause incompatibilities between software applications and often considerable frustration.

Aligni Practices and Policies Regarding CSV

CSV is widely considered a simple, common format to exchange tabular data with spreadsheet applications such as Microsoft Excel and OpenOffice. It is strictly a two-dimensional format: rows, columns, and cell contents at each intersection. It is not intended to represent three or more dimensions and makes a poor choice when forced to do so.

Many users often attempt to capture multi-dimensional data by creating additional columns (e.g. Phone1, Phone2, Phone3). It is Aligni’s policy to not encourage or support this functionality. Instead, we direct you to the Aligni API to export multi-dimensional data and manipulate it however your application requires.

The official specification for CSV file formats is RFC 4180. We’ve summarized some of the main points below.

Definitions

  • CRLF – In the notation below, “CRLF” refers to a line break or carriage return / line feed. Note that UNIX (including Mac) and Windows systems have slightly different standards for CRLF. Luckily, this is often taken care of at the browser level when uploading or downloading a CSV file.
  • Record – A record in a CSV file corresponds to a single row in the source data or spreadsheet and contains one or more fields. In many cases, it will also correspond to a single line of text in the file, but it’s important to note that CSV does support embedded CRLF or line breaks within a field.
  • Field – A field in a CSV file corresponds to the contents of a single row/column location in the source data or spreadsheet.

Rules

  1. Each record is located on a separate line, delimited by a CRLF.
  2. Each record must contain the same number of fields throughout the file.
  3. Spaces are considered part of a field and are not ignored.
  4. Each field may or may not be enclosed by double quotes. IMPORTANT NOTE: Microsoft Excel does not use double quotes at all. This may lead to incompatible CSV files that must be edited prior to importing into Aligni.
  5. Fields containing CRLF, double quotes, and commas must be enclosed in double-quotes.
  6. Double quote characters within a field must be “escaped” by including an additional double quote character before them.

Importing Booleans

Aligni considers true (case insensitive) to be in the affirmative for any imported booleans. Any other value will imply false including: false, no, Yes, Tru, true43, Harold, and so on. Aligni will always export true or false (lower case) when exporting booleans.

File Encoding

The term File Encoding or Character Encoding refers to the set of characters that are allowed to be in the file. This is established so that any software reading or writing to that file can operate with the contents. Common examples are: ASCII, ISO-8859 and Unicode. ASCII is a character set common on American computers but is limited to English characters.

Microsoft Excel

Microsoft Excel exports CSV files in ANSI encoding. While ANSI itself is not a character set, it often refers to one of several ISO-8859 encodings which are locale-specific. Excel is supposed to save files in the machine-specific ANSI encoding, but doesn’t always do so due to bugs.

One workaround is to open the resulting CSV file in Notepad, then click “File > Save As” and select UTF-8 as the encoding format.

Importing CSV Files into Aligni

When importing a CSV file, Aligni will attempt to import the file in UTF-8, ASCII, and then ISO8859-1, in that order.

Example

An example CSV file is shown below along with the table form of its parsed contents to illustrate how Aligni will interpret this file. A few points are notable of the contents of this example:

  • The description for part 100123 includes a comma so the description field must be enclosed in double quotes.
  • The description for part 100124 includes a CRLF so the description field must be enclosed in double quotes.
  • The description for part 100125 includes a double quote. Therefore, the quote character is replaced by a two double quotes. This is called “escaping” the double quote.
Part Number,Manufacturer,"Manufacturer P/N" CRLF
100123,Texas Instruments,TPS73218DBV,"1.8 volt regulator, 250 mA" CRLF
100124,Analog Devices,AD860,"Something
Second Line Description" CRLF
100125,Consolidated Diversified,3055BK-3IN,"4"" wire segment" CRLF
PART NUMBERMANUFACTURERMANUFACTURER P/NDESCRIPTION
100123Texas InstrumentsTPS73218DBV1.8 volt regulator, 250 mA
100124Analog DevicesAD860Something
Second Line Description
100125Consolidated Diversified3055BK-3IN3.25″ wire segment

Aligni CSV Usage and Multidimensional Data

CSV has become a very common way to exchange tabular or “tabular-like” data between software applications. It is a familiar and adequate way to represent mostly text in tabular form since it works reasonably well with spreadsheet applications like Excel and, well, who isn’t familiar with Excel? We consider CSV suitable for two-dimensional data sets but draw the line at exporting or importing multi-dimensional data.

For example, consider that a part record has a single manufacturer and that manufacturer has distribution at multiple vendors. If we were to export (or import) data with the vendor information, we would need to represent that in a third dimension (rows are the part records, columns are the associated record attributes). There are several ways to hack this third column into a spreadsheet and we’ve seen them all. It is common, for example, to add “Vendor 1”, “Vendor 2”, “Vendor 3” to the columns. But where does this madness stop? It stops at what most people would consider a reasonable N, until someone says they need N+1. Another solution would be to put all the vendors in a single cell and separate them by a newline. Which method is right? (Hint: they’re both wrong — CSV is not suitable for multidimensional representation)

So our policy is to take a stand against multidimensional data representation in CSV. There are better representations available for that. Our Aligni API is the best way to transfer multi-dimensional data between Aligni and another software application and we encourage that.