UN/EDIFACT

What does it stand for?
UN/EDIFACT stands for United Nations Electronic Data Interchange for Administration, Commerce and Transport. EDIFACT describes a data format, or a method of constructing messages. And there are an awful lot of those: Version D12B defines almost 200 different message types. What do we mean by version D12B? D12B stands for the Directory of message types as of the year 2012, edition 2. Generally speaking, two editions are issued each year in which new messages are added and existing ones are modified (which usually means expanded), and these new editions are called A and B. In 2001, however, there was also a third edition (called C). Conveniently, the message specifications in each version have been made generally available for download from the UNECE website. OK, so they come in a dreadful text format, but all the same, they’re free of charge.

Alternatives - aka subsets

The UNECE issues the standard for EDIFACT, in which it defines messages for all kinds of different sectors and fields of application. That’s all well and good, but as things stand, it’s also possible for everybody to just do their own thing. They do so by means of subsets, which draw on the same elements as EDIFACT, but focus exclusively on those that are relevant to particular industries or user groups. The selection of message types used – as well as the structure of those messages – is stripped back to what specific users consider to be sufficient for their needs. Examples of these subsets include EANCOM (used in the consumer goods industry), EDIFOR (haulage), EDIWHEEL (the tyre industry) and Odette (created by the trade association of the same name and used in the automotive sector). At least in theory, the messages in these subsets should be structured in such a way that they can just as easily take their place within the structures of the UN standard. In other words, segments, segment groups and even fields can be omitted – but only if they are not mandatory within the standard. That said, it is possible to define your own qualifiers. But more on that later.

General structure

How are EDIFACT messages structured? Here is an example:

UNA:+.? '

UNB+UNOC:3+Sender ILN+Recipient ILN+130230:1025+98765'

UNH+1+ORDERS:D:96A:UN'

BGM+220+9'

DTM+4:20130230:102'

NAD+SU+++Hardware Provider+1 Sample Street+Nowhere+NRW+54321+DE'

NAD+BY+++Lobster:GmbH+Münchnerstr.15a+Starnberg+BAV+82319+DE'

LIN+1++4711:SA'

IMD+++::USB Stick'

QTY+1:100'

UNS+S'

CNT+2:1'

UNT+11+1'

UNZ+1+98765'

The bottom level

Let’s unpick the whole thing from the bottom up. Here, you can see individual lines, each of which contains a “segment” – though the lines only serve to aid legibility, and are not compulsory. The entire message could just as easily be written in a single line, or in blocks of text 80 characters wide (which is very popular among AS/400 or IBM iSeries users). In that case, the message would appear as follows:

UNA:+.? 'UNB+UNOC:3+Sender ILN+Recipient ILN+130230:1025+98765'UNH+1+ORDERS:D:96

A:UN'BGM+220+9'DTM+4:20130230:102'NAD+SU+++Hardware Provider+1 Sample Street+Now

here+NRW+54321+DE'NAD+BY+++Lobster:GmbH+Münchnerstr.15a+Starnberg+BAV+82319+DE'L

IN+1++4711:SA'IMD+++::USB Stick'QTY+1:100'UNS+S'CNT+2:1'UNT+11+1'UNZ+1+98765'

Looks ugly, doesn’t it? But it’s just as valid. This is because an apostrophe (') appears at the end of each segment, which is an agreed end-marker (and part of the standard). That makes line breaks completely unnecessary. As you can see from the example above, it’s even possible to split values. It makes no difference at all. Any software capable of reading EDIFACT will simply put the values back together again (or at least, it should…).
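The irrelevance of line breaks can be sketched in a few lines of Python. This is only a minimal illustration, not a full parser: the function name split_segments is made up for this example, it assumes the standard apostrophe as end-marker, and it ignores escaped apostrophes inside values.

```python
def split_segments(raw: str) -> list[str]:
    """Split EDIFACT text into segments, ignoring line breaks.

    Simplification: assumes the standard end-marker (') and no escaped
    apostrophes inside values.
    """
    # Line breaks carry no meaning, so remove them before splitting.
    flat = raw.replace("\r", "").replace("\n", "")
    # Every segment ends with an apostrophe; drop the empty tail after the last one.
    return [segment for segment in flat.split("'") if segment]


# The same three segments, once with one segment per line and once with
# arbitrary line breaks that even cut values in two.
one_per_line = "BGM+220+9'\nDTM+4:20130230:102'\nNAD+SU+12345'\n"
wrapped = "BGM+220+9'DTM+4:201\n30230:102'NAD+SU+1\n2345'"

print(split_segments(one_per_line))
print(split_segments(one_per_line) == split_segments(wrapped))
```

Both layouts yield the same list of segments, which is why a reader of EDIFACT data should never rely on line breaks.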

So what is a segment? Each segment consists of a segment identifier (UNA, UNB, UNH, and so on) and values. The identifier is always made up of three letters, and tells you what the value is. For example, QTY stands for QUANTITY, while NAD is short for NAME AND ADDRESS. Segment identifiers beginning with UN have a special meaning, and do not depend on the message type. The most important of these general segments are: UNA, UNB, UNH, UNT and UNZ. We will explain these individually later on.

The values within a segment are separated by two different characters; in the example, these are the plus sign and the colon. Why two different characters? Because segments contain not only fields, but also things called Composite elements, or Composites for short. These are made up of multiple fields that belong together logically. For example, in the UNB segment, the date and the time of the message are combined into a composite: 130230:1025 – or February 30, 2013 (yes, that isn’t a typo) at 10:25. Standalone fields are separated from each other and from the composites using a plus sign, while fields within a composite are separated with a colon. If a field has no value, but is followed by another field or composite, then the necessary separators are included in order to preserve the order. A good example of this can be seen above in the IMD segment: IMD+++::USB Stick. Here, only the third field of the composite contains a value, and the two fields before the composite are also empty. If nothing else appears in the segment or composite after a particular value, then the remaining separators can be omitted. An example of this would be:

NAD+SU+12345'

If one of the separators itself happens to appear within a value, it needs to be escaped in order to pass as a normal character. In the standard, this is done with the help of a question mark, so that a company called “Reibach + Sons” would become Reibach ?+ Sons. And if the question mark itself appears within a value, it is also escaped using itself: “Understood??”
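The separator and escape rules described above can be sketched as follows. This is an illustrative example rather than a complete parser: the function names are invented here, the standard characters (+, : and ?) are assumed, and the segment is expected without its closing apostrophe.

```python
def split_escaped(text: str, sep: str, esc: str = "?") -> list[str]:
    """Split text on sep, skipping separators preceded by the escape character.

    Escape sequences are kept intact so the parts can be split again later.
    """
    parts, current, i = [], "", 0
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            current += ch + text[i + 1]  # keep escape and escaped character together
            i += 2
        elif ch == sep:
            parts.append(current)
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    parts.append(current)
    return parts


def unescape(value: str, esc: str = "?") -> str:
    """Remove escape characters, keeping the characters they protect."""
    out, i = "", 0
    while i < len(value):
        if value[i] == esc and i + 1 < len(value):
            i += 1  # skip the escape character itself
        out += value[i]
        i += 1
    return out


def parse_segment(segment: str) -> tuple[str, list[list[str]]]:
    """Return (identifier, elements); every element is a list of field values."""
    tag, *elements = split_escaped(segment, "+")
    return tag, [[unescape(f) for f in split_escaped(e, ":")] for e in elements]


print(parse_segment("IMD+++::USB Stick"))
# ('IMD', [[''], [''], ['', '', 'USB Stick']])
print(parse_segment("NAD+SU+++Reibach ?+ Sons"))
# ('NAD', [['SU'], [''], [''], ['Reibach + Sons']])
```

Standalone fields come back as one-item lists, so fields and composites can be handled in the same way; empty strings mark positions that were left blank to preserve the order.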

So much for the lowest level of the data format.

The general structure of an EDIFACT file

Let’s now go back to the very top and look at the structure of each EDIFACT file. Certain segments will appear in every file, regardless of what that file relates to. These general segments are also known as service segments, and they form the wrapper for every EDIFACT file or message:

UNA (optional; in case other special characters are used than those specified in the standard – see below)

UNB (mandatory; appears once at the beginning of the file)

UNG (optional; wraps around a group containing multiple messages of the same type)

UNH (mandatory; once per message, n times within a group or file; header information)

(the segments within an individual message)

UNT (mandatory; end segment to UNH, therefore used once per UNH)

UNE (optional; end segment to UNG, therefore used once per UNG and mandatory whenever UNG is used)

UNZ (mandatory; end segment to UNB, appears just once at the end of the file)

Structure of individual messages

And now we come to the structure of an individual message. Unfortunately, this isn’t entirely straightforward.

The message type determines which segments are used and what order they appear in; however, the same kind of segment can appear in multiple places within a single message, and can take on a different meaning depending on the context. As an example of this, we will look at the APERAK message in version D96A. APERAK stands for “Application Error and Acknowledgement”. What can we say? It isn’t always easy when you have to reduce message names to exactly six characters…

We have chosen this message purely because it has a very simple structure that fits into one small diagram. It’s used to tell the sender of the data if there are any problems with the content or technical information in their file, or to confirm that everything is OK.

You can see the service segments UNB, UNH, UNT and UNZ (UNG/UNE aren’t really relevant to the content and have therefore been omitted), as well as the actual message between those segments (from BGM onwards). BGM stands for “beginning of message” and is mandatory in every message. The red star in the box flags it as a mandatory segment.

We just mentioned “segment groups”. The image shows SG1, SG2, SG3, and inside SG3, SG4 (SG standing for segment group). The tree structure lets you see clearly which group contains which segments or sub-groups. Much like an individual segment, a segment group of this kind can, under certain circumstances, appear multiple times within a file. In fact, this is always the case for segment groups – otherwise there wouldn’t be any need for them. The green numbers in the image show which segments appear only once, and which segments or groups can appear multiple times in the file. What makes EDIFACT somewhat complicated is the rule stipulating that the individual segments can only appear in particular positions in each file. Let’s take the RFF segment as an example. This contains references, e.g. to other documents. In principle, an RFF can appear immediately after the BGM; this is the RFF that appears in SG1. If this is followed by an NAD then we know that we have moved on to SG2. However, if another RFF appears after that, it can’t be the one in SG1 – rather, it has to be the one in SG4. And that in turn means we need another ERC between the NAD and the RFF, as otherwise the file will be broken. Complicated, isn’t it? Let’s take another specific example, but strip it back to the segment identifiers and add a few new segments:

UNB

UNH

BGM

DTM Global date and time information (DTM = Date and Time) for this message

RFF This is the RFF in SG1, containing references from this message to other documents (e.g. an order number)

DTM The DTM in SG1, e.g. date and time information from the external document to which the reference is made (an order placed on …)

RFF The RFF in SG1 once again – in other words, SG1 is repeated

DTM The DTM in the second iteration of SG1

NAD With that, we’ve arrived at SG2, containing the address details and contact information

NAD And this is also repeated, albeit only the NAD segment

ERC Now we’re in SG3 (error information)

FTX Free text containing information about the error

RFF And this RFF is now the one in SG4

FTX This free text belongs to the reference in SG4

FTX And so does this one

UNT

UNZ

If we took out the ERC segment, then none of the segments after it would be able to appear either – we would skip straight to the UNT. We can see this on the screenshot too: SG3 is optional, but if it is used, then the ERC segment has to appear too. This is a general rule – the first segment in each segment group is mandatory when the group is being used. There is a very simple reason for this: namely, without an ERC, we wouldn’t know that SG3 was coming next.
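The way a reader works out which segment group it is in can be sketched as a small state machine. The example below is a deliberately simplified model that covers only the segments used in the walk-through above, not the full APERAK definition; the function name assign_groups and the group labels "service" and "message" are assumptions made for this illustration.

```python
SERVICE = {"UNB", "UNH", "UNT", "UNZ"}


def assign_groups(tags: list[str]) -> list[tuple[str, str]]:
    """Label each segment identifier with the group it belongs to.

    Simplified model: only BGM, DTM, RFF, NAD, ERC and FTX are known.
    """
    group = None  # no message content seen yet
    labelled = []
    for tag in tags:
        if tag in SERVICE:
            labelled.append((tag, "service"))
            continue
        if tag == "BGM":
            group = "message"
        elif tag == "RFF":
            if group in ("message", "SG1"):
                group = "SG1"      # RFF opens (or repeats) SG1
            elif group in ("SG3", "SG4"):
                group = "SG4"      # inside SG3, RFF opens SG4
            else:
                raise ValueError("RFF after NAD needs an ERC in between")
        elif tag == "NAD":
            if group not in ("message", "SG1", "SG2"):
                raise ValueError("NAD is not allowed at this position")
            group = "SG2"
        elif tag == "ERC":
            if group is None:
                raise ValueError("ERC before BGM")
            group = "SG3"          # ERC is the mandatory first segment of SG3
        elif tag == "DTM":
            if group not in ("message", "SG1"):
                raise ValueError("DTM is not allowed at this position")
        elif tag == "FTX":
            if group not in ("SG3", "SG4"):
                raise ValueError("FTX without a preceding ERC")
        else:
            raise ValueError(f"segment {tag} is not part of this sketch")
        labelled.append((tag, group))
    return labelled


walk_through = "UNB UNH BGM DTM RFF DTM RFF DTM NAD NAD ERC FTX RFF FTX FTX UNT UNZ"
for tag, group in assign_groups(walk_through.split()):
    print(tag, group)
```

Removing the ERC from the list makes the function fail at the first FTX, which mirrors the rule that the first segment of a group has to be present before anything else from that group can appear.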

Consistency within versions and developments between them

Although EDIFACT files have a very complicated structure, there is some good news: the individual elements (the segments and composites) are always built in the same way. In other words, an RFF segment will always look the same, irrespective of what type of message it appears in. And even a field with a specific number (such as 1153 – fields are numbered in EDIFACT) will always have the same type, the same maximum length and the same meaning. Of course, the structure of segments and composites is expanded from version to version, and a field can become longer in the process (for example) – however, within a single version, an RFF or an NAD will always look the same. Even a composite such as C506, when used in multiple segments, will look the same everywhere it appears.

Here is a direct comparison between the APERAK message in versions D96A and D10B, each showing the structure of the first RFF segment:

APERAK D96A and D10B and the differences between the RFF segments, as displayed in Lobster_data

As you can see, another segment group has now been shifted to before the references, which means that RFF now introduces SG2. In addition, field 4000 (reference version number) has been replaced by fields 1056 (version identifier) and 1060 (revision identifier). Incidentally, these also appear in the BGM segment and in the newly added DOC segment in the later version of SG1, and carry the same meaning, albeit in a different context. This explains the additional #2 numbering – a unique feature of this converter. The “F” and “D” letters that appear before the actual field names are also connected with this special software.

General service segments

UNA: Defines the special characters

Specifies the following, in exactly this order:

1 Separators between values within a composite.

2 Separators between standalone values and whole composites.

3 Decimal separators.

4 Escape characters for special characters that appear within values themselves (example: for a company called “Reibach + Sons”, a ? would need to appear before the +, giving Reibach ?+ Sons. Otherwise, the + would be interpreted as a separator).

5 A space character, which is reserved in EDIFACT syntax V3. In EDIFACT syntax V4, this position can instead hold a “repetition character”, a new feature that fortunately isn’t used.

6 The end-of-segment character.

The characters :+.? ' are standard; if these are used in a file then the UNA segment can be omitted.
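Reading the UNA segment is straightforward because the six characters simply follow the identifier in a fixed order. The following is a minimal sketch; the names ServiceChars and read_service_chars are invented for this example, and the second sample with non-standard characters is made up purely for illustration.

```python
from typing import NamedTuple


class ServiceChars(NamedTuple):
    component_sep: str  # 1: between values within a composite
    element_sep: str    # 2: between standalone values and composites
    decimal_mark: str   # 3: decimal separator
    escape: str         # 4: escape character
    reserved: str       # 5: space in syntax V3
    terminator: str     # 6: end-of-segment character


DEFAULTS = ServiceChars(":", "+", ".", "?", " ", "'")


def read_service_chars(raw: str) -> ServiceChars:
    """Return the special characters from UNA if present, else the standard ones."""
    if raw.startswith("UNA"):
        if len(raw) < 9:
            raise ValueError("truncated UNA segment")
        # UNA uses no separators itself: the six characters follow the identifier.
        return ServiceChars(*raw[3:9])
    return DEFAULTS


print(read_service_chars("UNA:+.? 'UNB+UNOC:3"))
print(read_service_chars("UNB+UNOC:3"))           # no UNA, so the defaults apply
print(read_service_chars("UNA|^,! ~UNB^UNOC|3"))  # hypothetical custom characters
```

A parser would call something like this first and then use the returned characters for every later split, instead of hard-coding + and :.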

UNB: Interchange header (file)
The most important global information about the file as a whole, such as the character set (see below), sender, recipient, date and message reference. In the example, all mandatory fields have been filled:

UNB+UNOC:3+Sender ILN+Recipient ILN+130230:1025+98765'

1 UNOC specifies the character set, or in other words, the characters that can be used in the document.

2 The 3 refers to EDIFACT syntax version 3, though there is already a version 4 by now (which we don’t need to go into here).

3 Then come the identifiers for the sender and the recipient. These could also simply be company names.

4 Next, the date and time of the message.

5 At the end, we have a reference number for the transfer, which is also repeated in the UNZ segment. This uniquely identifies the file to the parties involved, so it should only be used once between those parties.

UNH: Individual message header

Serial number and basic information about the individual message.

UNH+1+ORDERS:D:96A:UN'

1 This is the first message in the file or group (the 1 is repeated in the corresponding UNT).

2 The message is an order (ORDERS).

3 The message is written in the format version D96A and complies with the UNECE standard rather than a subset such as EANCOM.
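The three points above map directly onto the fields of the UNH segment. As a small sketch (the names MessageHeader and parse_unh are assumptions for this example, and the standard separators are taken for granted):

```python
from typing import NamedTuple


class MessageHeader(NamedTuple):
    reference: str     # serial number, repeated in the corresponding UNT
    message_type: str  # e.g. ORDERS
    directory: str     # format version, e.g. D96A
    agency: str        # UN = UNECE standard


def parse_unh(segment: str) -> MessageHeader:
    """Read the header fields from a UNH segment given without its end-marker."""
    tag, reference, identifier = segment.split("+")[:3]
    if tag != "UNH":
        raise ValueError(f"not a UNH segment: {segment!r}")
    # The composite holds message type, version, release and controlling agency.
    message_type, version, release, agency = identifier.split(":")[:4]
    return MessageHeader(reference, message_type, version + release, agency)


print(parse_unh("UNH+1+ORDERS:D:96A:UN"))
```

The version letter and the release are separate fields in the composite (D and 96A); joining them gives the familiar name D96A.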

UNT: End of individual message

The total number of all the segments in this message, including UNH and UNT, and the serial number of the message that was given in the corresponding UNH.

UNZ: End of the entire file

The total number of messages in this file and the transfer reference number, which is also given in the UNB.

The totals and reference numbers in the UNT and UNZ segments are for monitoring purposes. They allow you to quickly spot whether individual segments or entire messages are missing, or have been incorrectly added by mistake.
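These monitoring checks are easy to automate. The sketch below is one possible way of doing it, using an invented function name (check_controls); it assumes the standard separators, no escaped characters in the control fields and no UNG groups.

```python
def check_controls(segments: list[str]) -> list[str]:
    """Compare the UNT/UNZ control values with what is actually in the file."""
    problems = []
    interchange_ref = message_ref = None
    message_count = 0   # number of UNH segments seen
    segment_count = 0   # segments in the current message, UNH and UNT included
    in_message = False

    for segment in segments:
        fields = segment.split("+")
        tag = fields[0]
        if tag == "UNB":
            interchange_ref = fields[5]
        elif tag == "UNH":
            message_ref, segment_count, in_message = fields[1], 0, True
            message_count += 1
        if in_message:
            segment_count += 1
        if tag == "UNT":
            if int(fields[1]) != segment_count:
                problems.append(f"UNT says {fields[1]} segments, counted {segment_count}")
            if fields[2] != message_ref:
                problems.append(f"UNT reference {fields[2]} differs from UNH {message_ref}")
            in_message = False
        elif tag == "UNZ":
            if int(fields[1]) != message_count:
                problems.append(f"UNZ says {fields[1]} messages, counted {message_count}")
            if fields[2] != interchange_ref:
                problems.append(f"UNZ reference {fields[2]} differs from UNB {interchange_ref}")
    return problems


example = (
    "UNB+UNOC:3+Sender ILN+Recipient ILN+130230:1025+98765'"
    "UNH+1+ORDERS:D:96A:UN'BGM+220+9'DTM+4:20130230:102'"
    "NAD+SU+++Hardware Provider+1 Sample Street+Nowhere+NRW+54321+DE'"
    "NAD+BY+++Lobster:GmbH+Münchnerstr.15a+Starnberg+BAV+82319+DE'"
    "LIN+1++4711:SA'IMD+++::USB Stick'QTY+1:100'UNS+S'CNT+2:1'"
    "UNT+11+1'UNZ+1+98765'"
)
segments = [s for s in example.split("'") if s]
print(check_controls(segments))  # the complete example

damaged = [s for s in segments if not s.startswith("QTY")]
print(check_controls(damaged))   # same file with the QTY segment lost
```

For the complete example the list of problems is empty; once a segment goes missing, the count in the UNT no longer matches and the mismatch is reported.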

BGM: Beginning of a message

Here once again, we find details about the message type (220 = order) as well as some additional information (e.g. whether the order should be withdrawn, and so on).

Qualifiers

There is one urgent topic that we still need to discuss: qualifiers, which are essential in EDIFACT. Firstly, they specify the subject of a particular segment (i.e. what the information contained in it relates to); secondly, they determine the format in which particular values are presented; and thirdly, they set out special code lists, without which certain values are meaningless.

For an example of two of these qualifiers, let’s look at the DTM segment:

DTM+4:20130230:102'

The first qualifier is the 4 in front of the date. It means that this segment relates to the date (and perhaps also the time stamp) of an order. The 102 at the end states that the value is given in CCYYMMDD format – i.e. four digits for the year (C stands for century) and two digits each for the month and the day.
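How the format qualifier drives the interpretation of the value can be sketched like this. The example is illustrative only: parse_dtm and DTM_FORMATS are names invented here, the table contains just two of the many format codes (102 as described above, plus 203 for CCYYMMDDHHMM), and the sample dates are made up.

```python
from datetime import datetime

# Format qualifier (last field of the composite) -> strptime pattern.
# Only two codes are listed; the code list in the standard is much longer.
DTM_FORMATS = {
    "102": "%Y%m%d",      # CCYYMMDD
    "203": "%Y%m%d%H%M",  # CCYYMMDDHHMM
}


def parse_dtm(segment: str) -> tuple[str, datetime]:
    """Return (subject qualifier, parsed value) for a DTM segment without end-marker."""
    tag, composite = segment.split("+", 1)
    if tag != "DTM":
        raise ValueError(f"not a DTM segment: {segment!r}")
    subject, value, fmt = composite.split(":")
    if fmt not in DTM_FORMATS:
        raise ValueError(f"format qualifier {fmt} is not covered by this sketch")
    return subject, datetime.strptime(value, DTM_FORMATS[fmt])


print(parse_dtm("DTM+4:20130228:102"))

try:
    parse_dtm("DTM+4:20130230:102")  # the date from the example message
except ValueError as error:
    print("rejected:", error)
```

A strict date parser rejects the 30th of February used in the example message, so real-world converters need a policy for values that are well-formed according to the qualifier but not valid calendar dates.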

The NAD segment also contains two examples of code list qualifiers. In field 3039, we enter an ID for the party involved – e.g. the ILN of the company’s business partner. And in the two subsequent fields 1131 and 3055, we can add code list qualifiers:

Possible codes for field 1131

Conversely, field 3035 once again contains a qualifier specifying the subject of the data.

We can see this in the first example:

NAD+SU+++Hardware Provider+1 Sample Street+Nowhere+NRW+54321+DE'

NAD+BY+++Lobster:GmbH+Münchnerstr.15a+Starnberg+BAV+82319+DE'

SU stands for the Supplier (or to give it its official definition: “Party which provides service(s) and/or manufactures or otherwise has possession of goods, and consigns or makes them available in trade.”),

BY stands for Buyer.

Character set

EDIFACT files can also appear in various character sets. The character set specifies the characters that can be used within the file, and is included in the UNB segment.

The most important character sets are:

  • UNOA: Only capital letters, numbers, spaces and a few special characters such as =
  • UNOB: The same as UNOA, but lower-case letters and a few more special characters are also allowed
  • UNOC: In principle, everything that can appear in the Latin1 character set, aka ISO-8859-1.
  • UNOY: The character set that can be displayed with Unicode – so any character, essentially.

In addition, various other ISO-8859 character sets, covering scripts such as Arabic and Hebrew, are represented by UNOD through to UNOK.

So far, so clear. But there’s a catch: because a system processing an EDIFACT file won’t necessarily know which character set to expect, it needs to read this information from the UNB segment. By the time you are using UNOY and receiving files in UTF-16 – i.e. coded in 2 bytes – things get very tricky. For that reason, it has been specified that the UNA segment (if present) and the UNB segment must always be ASCII-coded, even when the rest of the message after the BGM is in Cyrillic or in UTF-16. That’s how it goes with globalisation: business is booming, but there are plenty of stumbling blocks too …
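The two-step approach (read the ASCII header first, then decode the rest) can be sketched as follows. This is an assumption-laden illustration: decode_interchange and the CODECS table are invented names, only the three character sets with an unambiguous single-byte encoding are mapped, and UNOY is left out on purpose because the byte encoding of Unicode data would have to be agreed between the parties.

```python
# Character set identifier from the UNB segment -> Python codec name.
CODECS = {"UNOA": "ascii", "UNOB": "ascii", "UNOC": "iso-8859-1"}


def decode_interchange(data: bytes) -> str:
    """Decode raw EDIFACT bytes using the character set named in the UNB segment."""
    start = data.find(b"UNB")
    if start == -1:
        raise ValueError("no UNB segment found")
    # The identifier is the four ASCII letters right after "UNB" and one separator.
    charset = data[start + 4 : start + 8].decode("ascii")
    codec = CODECS.get(charset)
    if codec is None:
        raise ValueError(f"character set {charset} is not covered by this sketch")
    return data.decode(codec)


raw = (
    "UNB+UNOC:3+Sender ILN+Recipient ILN+130230:1025+98765'"
    "UNH+1+ORDERS:D:96A:UN'"
    "NAD+BY+++Lobster:GmbH+Münchnerstr.15a+Starnberg+BAV+82319+DE'"
).encode("iso-8859-1")

print(decode_interchange(raw))
```

Because the header consists of plain ASCII characters, it can be inspected at byte level before any decision about the encoding of the remaining data has been made.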

Level of freedom

Last of all, we have one more important point to make: EDIFACT offers an incredible level of freedom, from the qualifiers right up to the question of what information to package in which segment and where to place it in the message structure.

One example of the many available qualifiers can be found in the NAD segment. We can use the qualifier BY for the buyer; however, we could also just as easily use BS (Bill and ship to), or CN (Consignee), or probably even one of a few other options. Things don’t look much better for the supplier either. And as already noted in the article on in-house EDI systems, we can also position the full address information in a number of different places in the same message. The NAD segment can exist in multiple places within a single message type, most of which represent multiple levels (such as document, line item, detail, etc.), and there are an equal number of opinions out there regarding what addresses can or should be placed where. This often depends on the capabilities of the systems from which the data are derived. If a system only recognises address information at the level of the delivery call-off, then the addresses can only meaningfully be specified at the highest level.

On the other hand, if the system offers the option of adding a separate address to each line item, this option will presumably be taken up and represented through the issue of a DELFOR.

As such, it is crucial that both of the parties involved in an EDI process come to an agreement on exactly how the relevant message format should be used. At the very least, there should be exact information on how the data are prepared or expected, as otherwise it will be impossible to cleanly implement the process. In practice, however, parties are often forced to rely on a small amount of example data and a process of trial-and-error – which may be hugely time-consuming (and frustrating), but generally works out fine. In cases like these, it is important to work with a software package that is easy to operate and offers comprehensive testing facilities in order to support the user in setting up connections.