CSV / Tab-delimited Text files



A CSV (Comma Separated Values) or Tab-delimited Text (or Tab Separated Values) file is a text file in which one can identify rows and columns. Rows are represented by the lines in the file and the columns are created by separating the values on each line by a specific character, like a comma or a tab. CSV or Tab-delimited Text files can be compared to spreadsheets like the ones in Microsoft Excel in that they also have rows and columns. Note that .csv files can be created by Excel.

Take a look at Figure 70, “Tab-delimited Text”. The first row represents the event of a person saying 'so from here'. The first value (and the first column of the whole file) is the tier name, the second and third are the begin time in two different formats, the fourth and fifth are the end time, the sixth and seventh are the duration, and the last value is the annotation.

Figure 70. Tab-delimited Text
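As a rough illustration of the row layout described above, the following Python sketch writes annotations in that eight-column order. The two time formats used here (hh:mm:ss.mmm and seconds with three decimals), the helper names, and the sample tier name and times are assumptions chosen for illustration; they are not taken from the figure.

```python
import csv
import io


def format_hms(ms):
    """Format a millisecond value as hh:mm:ss.mmm (assumed format)."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_seconds(ms):
    """Format a millisecond value as seconds with three decimals (assumed format)."""
    return f"{ms / 1000:.3f}"


def to_row(tier, begin_ms, end_ms, text):
    """Build one row: tier, begin (x2), end (x2), duration (x2), annotation."""
    duration_ms = end_ms - begin_ms
    return [
        tier,
        format_hms(begin_ms), format_seconds(begin_ms),
        format_hms(end_ms), format_seconds(end_ms),
        format_hms(duration_ms), format_seconds(duration_ms),
        text,
    ]


def write_tab_delimited(annotations, stream):
    """Write (tier, begin_ms, end_ms, text) tuples as tab-delimited lines."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    for tier, begin_ms, end_ms, text in annotations:
        writer.writerow(to_row(tier, begin_ms, end_ms, text))


if __name__ == "__main__":
    out = io.StringIO()
    # Hypothetical tier name and times, for illustration only.
    write_tab_delimited([("K-Spch", 780, 1570, "so from here")], out)
    print(out.getvalue(), end="")
```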

To import a CSV or Tab-delimited Text file in ELAN, choose File > Import > CSV / Tab-delimited Text File.... In the dialog window, browse to and select a file that contains CSV or tab-delimited data and click Open.

The second dialog window contains two sections (see Figure 71, “Import CSV / Tab-delimited Text”). The upper section shows a sample table containing data from the selected file. Both rows and columns are numbered. The lower section enables you to specify which columns to include and what data type they represent. This means that the format of the files is flexible: it is not prescribed what data is expected nor how it is formatted. The numbers of the columns in the Import Options section correspond to the numbers of the columns in the sample table. The data types you can select are:

Annotation
Tier
Begin time
End time
Duration

Select at least one column with data type 'Annotation'. If you select columns for begin time, end time and duration, the duration column is ignored during import.

Figure 71. Import CSV / Tab-delimited Text

The option Specify first row of data lets you skip a header by starting the import at a later row. The option Specify delimiter lets you set the delimiter if ELAN did not guess it correctly. The delimiters supported by ELAN are comma, tab, colon, semicolon and vertical bar.
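To check outside ELAN how a file splits into columns for a given delimiter and first data row, a small Python sketch like the following may help. It only mimics the two options with the standard csv module; the function name, the dictionary of delimiters and the sample text are hypothetical, and it does not reproduce ELAN's own parsing or delimiter guessing.

```python
import csv
import io

# The five delimiters listed above, keyed by a descriptive name.
SUPPORTED_DELIMITERS = {
    "comma": ",",
    "tab": "\t",
    "colon": ":",
    "semicolon": ";",
    "vertical bar": "|",
}


def read_data_rows(text, delimiter, first_data_row=1):
    """Split text into rows of cells, starting at the 1-based first_data_row."""
    if delimiter not in SUPPORTED_DELIMITERS.values():
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    if first_data_row < 1:
        raise ValueError("first_data_row must be 1 or greater")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        row
        for row_number, row in enumerate(reader, start=1)
        if row_number >= first_data_row
    ]


if __name__ == "__main__":
    # Hypothetical sample with one header line, so data starts at row 2.
    sample = "Tier;Begin;End;Text\nK-Spch;780;1570;so from here\n"
    for row in read_data_rows(sample, ";", first_data_row=2):
        print(row)
```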

If you enable the option Default annotation duration, ELAN creates all annotations from the selected file with a duration equal to the specified number of milliseconds. This option only has an effect if the file contains no time data, or only begin times or only end times.
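The timing rules described in this section can be summarized in a short Python sketch. It is not ELAN's implementation: the function name and parameters are hypothetical, all times are assumed to be in milliseconds, and the handling of rows without any time data (placing the annotation at a caller-supplied fallback start) as well as the clamping at zero are assumptions made to keep the example complete.

```python
from typing import Optional, Tuple


def resolve_times(
    begin: Optional[int],
    end: Optional[int],
    duration: Optional[int],
    default_duration: int = 1000,
    fallback_begin: int = 0,
) -> Tuple[int, int]:
    """Return (begin, end) in milliseconds for one imported row."""
    # Begin and end both present: any duration value is ignored.
    if begin is not None and end is not None:
        return begin, end

    # Use the row's own duration if it has one, else the default duration.
    span = duration if duration is not None else default_duration

    if begin is not None:
        return begin, begin + span
    if end is not None:
        # Clamp at zero so the begin time never becomes negative (assumption).
        return max(0, end - span), end

    # No time data at all: start at the fallback position (assumption).
    return fallback_begin, fallback_begin + span


if __name__ == "__main__":
    print(resolve_times(780, 1570, 5000))   # duration is ignored
    print(resolve_times(780, None, None))   # default duration applies
```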

Skip empty cells leaves out the cells in the CSV file that are empty. With this option, different tiers can be imported with different segmentations.

Combine with template (.etf) lets you import annotations into tiers defined in a template. This is described in more detail below (Figure 74, “Import tab-delimited text in combination with a template”).

Finally click OK to import the data. If a transcription document was open when starting the import, the imported tiers and annotations will be added to the already open document, otherwise a new transcription document is created with the imported annotations as its contents.

Another example

To demonstrate that the format of the imported file can be flexible, take a look at the following tab-delimited text:

Figure 72. Tab-delimited text, different orientation

In this example each column represents a tier with the tier names in the first row and the annotation in the other rows. This file can be imported by selecting the following import options:

Figure 73. Import CSV / Tab-delimited Text

Note that the Specify first row of data option is set to 2. As a consequence, ELAN starts importing annotations from row 2 instead of row 1. Furthermore, ELAN tries to extract tier names from the first line of the file for each column that is specified as 'Annotation'. In this example, this results in two tiers: K-Spch and W-Spch.
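The column-per-tier layout can be mimicked in a few lines of Python, which may help when preparing or checking such a file before import. This is a sketch under assumptions: the function name is hypothetical, the sample annotation values are invented, and the skip_empty_cells flag only imitates the idea of the Skip empty cells option rather than ELAN's actual behavior.

```python
import csv
import io
from typing import Dict, List


def read_tier_columns(
    text: str, delimiter: str = "\t", skip_empty_cells: bool = True
) -> Dict[str, List[str]]:
    """Treat row 1 as tier names and every later row as annotation values."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {}
    header, data_rows = rows[0], rows[1:]
    tiers: Dict[str, List[str]] = {name: [] for name in header}
    for row in data_rows:
        for name, cell in zip(header, row):
            if skip_empty_cells and not cell.strip():
                continue  # leave out empty cells for this tier
            tiers[name].append(cell)
    return tiers


if __name__ == "__main__":
    # Tier names as in the example above; annotation values are hypothetical.
    sample = "K-Spch\tW-Spch\nso from here\t\n\tyes\nright\tokay\n"
    for tier, annotations in read_tier_columns(sample).items():
        print(tier, annotations)
```

With empty cells skipped, each tier ends up with its own list of annotations, which is why the tiers can differ in segmentation.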

To merge a CSV file with an existing *.eaf file, open the *.eaf file first and then choose Import CSV/Tab-delimited Text File. For information on merging a CSV file that has been imported into a new document with an existing *.eaf file, please see the section called “Merging transcriptions”.

Import with a template:

Figure 74. Import tab-delimited text in combination with a template

When the Combine with template (.etf) checkbox is selected, the Select... button lets you browse to an ELAN template file (*.etf). The import function then uses the tiers, tier types etc. from the template as the basis for the new transcription and adds the imported annotations to those tiers. Matching requires the tier names in the template to be exactly the same as those in the tier column (as in the first screenshot) or in the column headers of the delimited text file (as in row 1 of the sample tables in the second and third screenshot).
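Because matching depends on exact tier names, it can be useful to compare the names in a delimited text file with those in a template before importing. The Python sketch below assumes that a template is an XML document whose tiers are TIER elements with a TIER_ID attribute; that assumption, the function names and the minimal sample XML are illustrative and should be checked against a real template. What ELAN does with names that do not match is not covered here; the sketch only reports them.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set, Tuple


def template_tier_names(etf_xml: str) -> Set[str]:
    """Collect tier names from template XML (assumes TIER/TIER_ID markup)."""
    root = ET.fromstring(etf_xml)
    return {
        tier.get("TIER_ID")
        for tier in root.iter("TIER")
        if tier.get("TIER_ID")
    }


def split_by_template(
    imported_names: Iterable[str], template_names: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (matched, unmatched) names using exact, case-sensitive comparison."""
    matched, unmatched = [], []
    for name in imported_names:
        (matched if name in template_names else unmatched).append(name)
    return matched, unmatched


if __name__ == "__main__":
    # Minimal hypothetical template content, for illustration only.
    sample_template = (
        "<ANNOTATION_DOCUMENT>"
        '<TIER TIER_ID="K-Spch" LINGUISTIC_TYPE_REF="utterance"/>'
        '<TIER TIER_ID="W-Spch" LINGUISTIC_TYPE_REF="utterance"/>'
        "</ANNOTATION_DOCUMENT>"
    )
    names = template_tier_names(sample_template)
    print(split_by_template(["K-Spch", "w-spch"], names))
```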

The import function will try to apply constraints as defined by the tiers and types in the template, but success is not guaranteed. Especially if the template defines many levels of tier dependencies, proper import might fail. Depending on the structure of the delimited text file, the Skip empty cells option may have to be selected or deselected for a successful import.

	Note
	This option is still experimental.