Data Format & Messaging Standards

Basic information about data format and messaging standards
When we talk about data formats here, we are referring to the lowest level of the file structure. In other words, we are looking at the question of how individual values are represented in a file.

There are a few basic types of data formats here:

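CSV: Comma-separated values, in which the individual values are separated by a delimiter character. Despite the name, this delimiter is often a semicolon rather than a comma.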

Fixed record: Each value has a precise number of characters, allowing it to be located precisely by virtue of its position within the file.

XML: Extensible markup language, in which the values (and even groups of values) are given their own names (tags and attributes).

There are also more complex variations on these – e.g. the values in EDIFACT and X12 messages are separated by several different characters, while the BWA format used in the book trade is based on a mixture of fixed record and CSV. Separating the individual values is just one aspect of the issue, however, since in each more complex format, the values are arranged into logical groups. With CSV and fixed record, these are typically called records, while EDIFACT and X12 refer to segments (and “loops” as the next-level arrangement of segment groups), and XML calls them complex elements.
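To make the idea of locating values "by position" concrete, here is a minimal Python sketch that cuts a fixed record line item into named fields. The field names, the column offsets and the reading of the price as nine digits with three implied decimal places are assumptions chosen to fit the sample line used in this article; they are not taken from any published specification.

```python
from decimal import Decimal

# Assumed layout: (field name, start offset, end offset).
# These widths are an illustrative guess that fits the sample line below.
LINE_ITEM_LAYOUT = [
    ("record_type", 0, 3),
    ("line_no", 3, 7),
    ("item_no", 7, 12),
    ("quantity", 12, 16),
    ("price", 16, 25),  # assumed: 9 digits, 3 implied decimal places
]


def parse_fixed_record(line, layout):
    """Locate each value purely by its character position in the line."""
    return {name: line[start:end] for name, start, end in layout}


def parse_line_item(line):
    """Convert the raw text slices of a line item into typed values."""
    raw = parse_fixed_record(line, LINE_ITEM_LAYOUT)
    return {
        "line_no": int(raw["line_no"]),
        "item_no": raw["item_no"].rstrip(),       # drop the padding blanks
        "quantity": int(raw["quantity"]),
        "price": Decimal(raw["price"]) / 1000,    # shift the implied decimals
    }


item = parse_line_item("LIN0001S123 0005000009990")
print(item)
```

Note that nothing in the line itself says where one value ends and the next begins. Sender and recipient have to agree on the layout beforehand, which is why fixed record formats always come with a separate record description.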

Simple examples

CSV:

OH;4711;K0815;20130530

LIN;1;S123;5;9.99

LIN;2;H456;3;17.95

 

Fixed record:

OH 4711 K0815 20130530

LIN0001S123 0005000009990

LIN0002H456 0003000017950

 

XML:

<Order>

<Header>

<OrderNo>4711</OrderNo>

<CustomerNo>K0815</CustomerNo>

<OrderDate>2013-05-30T00:00:00+02:00</OrderDate>

</Header>

<LineItems>

<LineItem No="1">

<ItemNo>S123</ItemNo>

<Quantity>5</Quantity>

<PricePerUnit>9.99</PricePerUnit>

</LineItem>

<LineItem No="2">

<ItemNo>H456</ItemNo>

<Quantity>3</Quantity>

<PricePerUnit>17.95</PricePerUnit>

</LineItem>

</LineItems>

</Order>

 

Here, we see the same information three times, in three different formats. In the CSV and fixed record examples, the order header carries the identifier “OH”, while the line items carry the identifier “LIN”; without such identifiers, there would be no way of telling which kind of record a line contains. XML is much more verbose and significantly easier for humans to read – but it also takes up a lot more space. With a decent compression algorithm, however, this isn’t a serious problem – at least during transfer.
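That the CSV and XML examples really carry the same information can be checked with a short Python sketch. It reads both samples with the standard library and compares the results. The dictionary keys and the function names `order_from_csv` and `order_from_xml` are made up for this illustration, and the order date is reduced to the calendar day so that both sides are comparable.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = """OH;4711;K0815;20130530
LIN;1;S123;5;9.99
LIN;2;H456;3;17.95
"""

XML_TEXT = """<Order>
  <Header>
    <OrderNo>4711</OrderNo>
    <CustomerNo>K0815</CustomerNo>
    <OrderDate>2013-05-30T00:00:00+02:00</OrderDate>
  </Header>
  <LineItems>
    <LineItem No="1">
      <ItemNo>S123</ItemNo><Quantity>5</Quantity><PricePerUnit>9.99</PricePerUnit>
    </LineItem>
    <LineItem No="2">
      <ItemNo>H456</ItemNo><Quantity>3</Quantity><PricePerUnit>17.95</PricePerUnit>
    </LineItem>
  </LineItems>
</Order>"""


def order_from_csv(text):
    """Build an order dict from the CSV sample; row[0] is the record identifier."""
    order = {"items": []}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        if row[0] == "OH":
            order["order_no"], order["customer_no"] = row[1], row[2]
            d = row[3]
            order["order_date"] = f"{d[:4]}-{d[4:6]}-{d[6:]}"
        elif row[0] == "LIN":
            order["items"].append(
                {"no": row[1], "item_no": row[2],
                 "quantity": int(row[3]), "price": row[4]}
            )
    return order


def order_from_xml(text):
    """Build the same dict from the XML sample, using the tag names as a guide."""
    root = ET.fromstring(text)
    header = root.find("Header")
    return {
        "order_no": header.findtext("OrderNo"),
        "customer_no": header.findtext("CustomerNo"),
        "order_date": header.findtext("OrderDate")[:10],  # keep the day only
        "items": [
            {"no": li.get("No"), "item_no": li.findtext("ItemNo"),
             "quantity": int(li.findtext("Quantity")),
             "price": li.findtext("PricePerUnit")}
            for li in root.iter("LineItem")
        ],
    }


print(order_from_csv(CSV_TEXT) == order_from_xml(XML_TEXT))
```

The CSV reader has to know what each column means, whereas the XML reader can rely on the element names. That is the practical side of XML being easier to read.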

We will look at these formats in more detail over the following pages, but these brief examples will suffice for now.
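The remark about compression can be tried out with a small experiment. The Python sketch below generates the same hypothetical order with 200 line items as CSV and as XML, then compresses both with zlib from the standard library. The generator functions, the item numbers and the choice of zlib are assumptions made for this illustration; real files and other compression algorithms will give different figures.

```python
import zlib


def csv_order(n_items):
    """Generate a CSV order with n_items line items (made-up item numbers)."""
    lines = ["OH;4711;K0815;20130530"]
    lines += [f"LIN;{i};S{i:03d};5;9.99" for i in range(1, n_items + 1)]
    return "\n".join(lines).encode("utf-8")


def xml_order(n_items):
    """Generate the same order as XML, without any indentation."""
    items = "".join(
        f'<LineItem No="{i}"><ItemNo>S{i:03d}</ItemNo>'
        "<Quantity>5</Quantity><PricePerUnit>9.99</PricePerUnit></LineItem>"
        for i in range(1, n_items + 1)
    )
    return (
        "<Order><Header><OrderNo>4711</OrderNo><CustomerNo>K0815</CustomerNo>"
        "<OrderDate>2013-05-30</OrderDate></Header>"
        f"<LineItems>{items}</LineItems></Order>"
    ).encode("utf-8")


csv_raw, xml_raw = csv_order(200), xml_order(200)
csv_zip, xml_zip = zlib.compress(csv_raw), zlib.compress(xml_raw)

print(f"raw:        CSV {len(csv_raw)} bytes, XML {len(xml_raw)} bytes")
print(f"compressed: CSV {len(csv_zip)} bytes, XML {len(xml_zip)} bytes")
```

Because the XML tags repeat for every line item, they compress very well, so the difference in bytes between the two formats shrinks considerably once both are compressed.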

At its most basic, a messaging standard can simply define more complex data formats – but it can also extend as far as laying down procedural rules concerning transfer paths (for example). Examples of standards that concentrate on the data format (or in other words, on defining particular message types and their exact structure) are EDIFACT and BMEcat. The latter of these in turn uses XML as a data format.

By contrast, ENGDAT is a highly complex industrial standard that contains data formats, but also defines the content of the files (e.g. the metadata – cf. the article on ENGDAT) and sets out rules concerning transfers and even file names.

Example of EDIFACT format:

UNA:+.? '

UNB+UNOC:3+Sender ILN+Recipient ILN+130230:1025+1++98765'

UNH+1+ORDERS:D:96A:UN'

BGM+220+9'

DTM+4:20130230:102'

NAD+SU+++Hardware Provider+1 Sample Street+Nowhere+NRW+54321+DE'

NAD+BY+++Lobster:GmbH+Münchnerstr. 15a+Starnberg+BAV+82319+DE'

LIN+1++4711:SA'

IMD+++::USB Stick'

QTY+1:100'

UNS+S'

CNT+2:1'

UNT+11+1'

UNZ+1+98765'

 

In this example, the company Lobster (the identifier BY stands for buyer) ordered 100 USB sticks (item number 4711) from the company Hardware Provider (SU = supplier) on February 30, 2013.
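How a program arrives at this reading can be sketched in a few lines of Python. The sketch takes the separator characters from the UNA segment, splits the message into segments, data elements and components, and honours the release character (“?”) that protects a following separator. It works on a shortened version of the message and is a simplified illustration only: it does not validate anything, and the function names are made up.

```python
def split_escaped(text, sep, release):
    """Split on sep, except where sep is preceded by the release character."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair for later
            i += 2
        elif text[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text, release):
    """Remove release characters, keeping the character each one protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            i += 1
        out.append(text[i])
        i += 1
    return "".join(out)


def parse_edifact(message):
    """Return a list of (segment tag, data elements) tuples.

    Every data element is a list of components."""
    comp, elem, release, term = ":", "+", "?", "'"  # defaults without UNA
    if message.startswith("UNA"):
        comp, elem, _decimal, release, _reserved, term = message[3:9]
        message = message[9:]
    segments = []
    for raw in split_escaped(message, term, release):
        raw = raw.strip("\r\n")
        if not raw:
            continue
        elements = [
            [unescape(c, release) for c in split_escaped(e, comp, release)]
            for e in split_escaped(raw, elem, release)
        ]
        segments.append((elements[0][0], elements[1:]))
    return segments


# Shortened version of the order shown above.
MESSAGE = (
    "UNA:+.? '"
    "UNH+1+ORDERS:D:96A:UN'"
    "NAD+SU+++Hardware Provider+1 Sample Street+Nowhere+NRW+54321+DE'"
    "NAD+BY+++Lobster:GmbH+Münchnerstr. 15a+Starnberg+BAV+82319+DE'"
    "LIN+1++4711:SA'"
    "IMD+++::USB Stick'"
    "QTY+1:100'"
    "UNT+7+1'"
)

segments = parse_edifact(MESSAGE)
parties = {els[0][0]: " ".join(els[3]) for tag, els in segments if tag == "NAD"}
item_no = next(els[2][0] for tag, els in segments if tag == "LIN")
quantity = next(int(els[0][1]) for tag, els in segments if tag == "QTY")
print(parties, item_no, quantity)
```

The parser knows nothing about what “BY” or “SU” mean. That knowledge comes from the message definition (here ORDERS in version D 96 A), which is exactly the part that a messaging standard adds on top of the raw data format.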

As an aside: In 2013 in Fulda, Germany, a certain genius really was arrested for having multiple fake IDs on which he claimed that his birthday was February 30.

The example above shows an order (ORDERS) in EDIFACT format version D 96 A. This is the first of the two versions issued in 1996. As the managing body for EDIFACT, the UNECE typically issues two versions (A and B) each year, and sometimes even a third (which is called – surprise, surprise – C). Generally speaking, not every message changes in each new version, and the changes are often backwards-compatible. The defined message types cover almost everything you will ever need in electronic data interchange – from AUTHOR (authorisation message) to WKGRRE (work grant request message). We will go into more detail in the chapter on EDIFACT.

The ENGDAT procedure

ENGDAT is a defined workflow for exchanging technical documents, used primarily in the automotive sector. Because the files involved are generally CAD files in which it is impossible to integrate information about the people involved and so on (as we can in an EDIFACT message), the meta-information is packaged in separate files that are then sent alongside the CAD file. This even includes address and contact information. These files are called ENGPART messages. EDIFACT and XML can both be used as the format for all this additional information, although XML is the more legible option (just compare the examples above!) and the one used in the latest versions (ENGDAT v3, ENGPART v4). Transfers take place via OFTP. This is not mandatory, but it has become standard in practice.

Other messaging standards

X12 works in a similar way to EDIFACT, in that it provides its own unique format (a more complicated form of CSV with multiple separating characters, each with a different meaning – see above). This standard is a kind of precursor to EDIFACT, but is still in use – though primarily outside Europe. And much like EDIFACT, it also defines large numbers of different message types.

By contrast, Fortras and older VDA messages use a pure fixed record format, though the number of individual messages is restricted. You can find out more about these standards in their respective articles.

Conclusion

As you can see, an EDI program needs to be able to deal with many different messaging standards and data formats. The bigger the selection it can handle, the better. Even if you only have simple CSV files in your queue, this can quickly change when you start dealing with new business partners (for example). As such, an EDI system should offer the widest possible selection of formats.
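As a rough illustration of what dealing with many formats means in practice, the following Python sketch guesses the format of an incoming file from its first characters. It is a deliberately naive heuristic, and the function name `guess_format` is made up for this example. It relies only on the facts that EDIFACT interchanges begin with a UNA or UNB segment, X12 interchanges begin with an ISA segment and XML documents begin with “<”; anything containing a common delimiter is treated as CSV, and the rest is assumed to be fixed record.

```python
def guess_format(text):
    """Guess the data format of a file from its first characters (heuristic)."""
    head = text.lstrip()[:200]
    if head.startswith(("UNA", "UNB")):
        return "EDIFACT"
    if head.startswith("ISA"):
        return "X12"
    if head.startswith("<"):
        return "XML"
    first_line = head.splitlines()[0] if head else ""
    if any(sep in first_line for sep in ";,\t|"):
        return "CSV"
    return "fixed record (or unknown)"


for sample in (
    "UNB+UNOC:3+Sender ILN+Recipient ILN+130230:1025+1++98765'",
    "<Order><Header/></Order>",
    "OH;4711;K0815;20130530",
    "LIN0001S123 0005000009990",
):
    print(guess_format(sample), "<-", sample[:25])
```

A real EDI system cannot rely on guesswork of this kind, since a CSV file may well begin with the letters “ISA”. In practice, the expected format is usually configured for each business partner and message type.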

By way of example, here is the list of options included with the data management software Lobster_data:

As we have already said, behind CSV, XML and fixed record there is a whole host of different formats and messaging standards, all of which rely on one of these basic formats. And yes, Excel is (unfortunately) still also frequently misused in data interchange. Incidentally, SAP IDocs are also fixed record files in their older form, though they have a very special structure, which is why we have treated them separately. More recently, these documents have also become available in XML format. You can find out more in the article on IDoc.

One good solution for integrating all these data formats is Lobster_data.