Internal paths/sources/sinks

General information
EDI systems are used to enable internal and external systems to communicate with one another, while EAI systems focus on internal systems, and ETL/ELT almost exclusively takes place between databases. All three concepts need options for providing your in-house systems (such as databases and ERP systems) with data, or receiving data from them.

Without staking any claim to completeness, we will now present to you at least the most important of these system types and communication forms. There are definitely other options out there, but this list should cover well over 90 percent of what is available.

And let’s not forget that we can communicate with our own computers in the same way as with others out there on the web, so we can add most of the communication methods covered under “Public paths” to this list too (although in practice, nobody would use OFTP to connect two computers on their own network…).

Filesystem
The file system is of course the most easily accessible option, especially the local one. We can make data available for processing here and store the results afterwards – provided, of course, that the different systems communicating with each other all have access to the same file system.

However, not all relevant programs will run on the same computer in either EDI or EAI. If they did, they would impede each other’s performance – not to mention the risk that all business-critical processes could be knocked out in one fell swoop with the loss of a single computer.

One solution for this is a SAN, or Storage Area Network, where all your important processes can save their data. This generally also offers features like high availability, etc.

However, there is also a smaller version in the form of NAS – Network Attached Storage. This is none other than the good old file server, accessed by means of suitable protocols such as NFS or SMB – the difference being that the file server and all access to it share the ordinary company network.

The local file system scarcely merits any closer attention here, and building a SAN would be too big a topic for us to address here. What we are primarily interested in are the requirements that an EDI/EAI software package needs to meet in order to access drives on remote computers.

NFS

NFS is the Network File System from the world of Unix. It’s fairly antiquated these days, and for historical reasons it doesn’t offer especially high security standards. However, the current version 4 represents a significant improvement on the widely used version 3, and has gained a lot of ground on this front.

If you are operating in a Unix/Linux environment then NFS can make your life very easy. Simply by mounting the relevant network drives (ideally automatically while booting), you can access them through your software just as easily as a local partition or directory. In fact, Windows servers can also handle NFS – only this option is rarely used. If the remote computer runs on Windows, you can instead use Samba to mount its shares. This differs little from the equivalent process on Unix/Linux (aside from a few minor operational differences – but that takes us on to SMB).

In other words, if your EDI/EAI system runs on Unix/Linux, that’s all it needs to do; strictly speaking, it won’t even be able to tell that one directory is local while another is stored on a server in the basement. From the point of view of the application, access is identical whether the file name is /user/local/data/File.txt or /mount/fileserver/commondata/File.txt.

SMB

SMB is what Windows mainly uses to create its network file system. Here, things are a little more complicated than with NFS mounts. You are probably already familiar with the famous connected network drive. It is typically represented using letters from the end of the alphabet (in particular the letter T), with the T: drive pointing towards a shared directory on a remote computer somewhere. This is a neat solution for an office computer where the user logs in, the network drives are connected (usually automatically), and everything is then available ready for them to work with.

For server processes, however, which generally run as a Windows service with no user login, this is unworkable. Drive letters like T: always relate to the logged in user, and do not exist for processes that launch automatically when you start up your computer. Another technique is required here – and you will already be familiar with it from your file manager (aka Windows Explorer). You specify the server and the share name of the directory you are looking for directly, a little like this:

\\servername\sharename\path\File.txt

This UNC notation allows you to reach every known computer on the network and access all their shared directories. (Incidentally, this also works for Unix, although forward slashes (/) are used there instead of backslashes.)
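As a small illustration of UNC notation, the following sketch uses Python's standard pathlib module to take such a path apart. The server and share names are simply the placeholders from the example above, and PureWindowsPath only parses the string; it never touches the network.

```python
from pathlib import PureWindowsPath

# Placeholder names taken from the text; nothing is accessed over the network.
unc = PureWindowsPath(r"\\servername\sharename\path\File.txt")

share = unc.drive          # server and share together play the role of the drive
file_name = unc.name       # last component of the path
forward = unc.as_posix()   # the same path written with forward slashes

print(share)
print(file_name)
print(forward)
```

A server process would hand the full UNC string to its normal file routines rather than relying on a drive letter.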

However, we still need user authentication for SMB. Whereas NFS (or Samba) mounts are completed during boot and identified accordingly (something hardly worth mentioning in NFS version 3), a Windows computer will expect users to authenticate themselves whenever they want to access the shared drive.

One way of solving this problem is to run the software in question under the user account of a domain administrator. That way, it would present that user’s identity whenever it accessed the UNC path. If that identity is known on the remote computer too, it will grant the software access. However, not all system and network administrators will consent to allowing some random program to run in the context of such a powerful user.

The alternative is for the software to log in as a specified user during each access attempt.

The minor difference

In principle, programs should be able to access shared directories on remote computers as easily as local drives. Yet there are still a few limitations. The simplest of these is that the server on which the required directory is saved might not be running at the moment, or might be inaccessible due to a network fault; however, this can just as easily happen during access via FTP (for example).

Another limitation is that you shouldn’t fall for the illusion that you are interacting with a local directory. Not only is the access time slower, since everything is running over the network, but you also can’t rely on the same mechanisms that make your life easier when working with local drives.

Let’s take file events as a simple example. You may already be familiar with these. Imagine you’ve opened Windows Explorer; some process saves a new file or deletes one, and you see the change immediately without having to refresh. The event (file added/deleted) is sent into the system as a message, and the Explorer reacts to it instantly and updates the display. This process is far less reliable over the network, however – perhaps because we don’t want to overload the network with these kinds of minor events. Yet the downside is that your software is suddenly no longer able to rely on this mechanism whenever it interacts with a network resource.

Utility for EDI/EAI

Local or network file systems are a simple solution for exchanging data from EDI/EAI systems with other processes; however, we shouldn’t forget that although we can provide the relevant system with a file to process, the system still needs to check for itself whether there is any new work (since file events are unreliable, as we have seen). That means the various recipients need to regularly search “their” directories for new files in a similar way to FTP data interchange – though at least in the in-house network we don’t need to worry about intercepted or manipulated transfers.
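The regular search for new files can be sketched in a few lines. This is only a minimal illustration using Python's standard library: the function name poll_once and the file name order_1.txt are made up for the example, and a temporary directory stands in for the shared network directory.

```python
import tempfile
from pathlib import Path


def poll_once(directory, seen):
    """Return the files that have appeared in `directory` since the last call.

    `seen` is a set of file names that were already handed out; it is
    updated in place so that each file is reported only once.
    """
    new_files = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and entry.name not in seen:
            seen.add(entry.name)
            new_files.append(entry)
    return new_files


# A temporary directory stands in for the shared directory.
with tempfile.TemporaryDirectory() as inbox:
    seen = set()
    first = poll_once(inbox, seen)                  # nothing delivered yet
    Path(inbox, "order_1.txt").write_text("demo")   # a sender drops a file
    second = [p.name for p in poll_once(inbox, seen)]
    third = poll_once(inbox, seen)                  # already reported

print(first, second, third)
```

In a real process, poll_once would be called at fixed intervals. It is also common for the sender to write a file under a temporary name and rename it once it is complete, so that the recipient never picks up a half-written file.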

Some simpler software packages offer only one such data interface – or at least allow you to program one. In that case, you have no other choice. However, if you have access to other ways of ensuring that delivered data go into processing straight away, you should take advantage of them.

SAP ALE and RFC

RFC

There are plenty of functions (or function modules, in SAP terminology) that can be called up externally, and if there’s something you need that doesn’t come with SAP, you can program it yourself with the help of SAP Consultants (and there are plenty of those too). Of course, you might already have a specialist employed at your company.

We have already mentioned two of these functions above:

  • IDOC_INBOUND_ASYNCHRONOUS can be used to deliver an external IDoc into SAP (such as a new order).
  • IDOC_OUTBOUND_ASYNCHRONOUS is what SAP uses to issue an IDoc that will then be converted and transmitted by your EDI software (for example).

There are other important RFCs beyond these, such as RFC_READ_TABLE, which you can use to quickly retrieve a few pieces of data from an SAP system. This is capable of a wide range of things, and you can also use it to find out what IDoc types are available in your SAP system, or what extensions to these are known. You can also use it to query descriptions.

Yet RFCs don’t exist solely within the SAP system itself. Connected third-party software (such as an EDI or EAI system) can also provide RFCs, which in turn can be accessed by the SAP system. In other words, this type of communication can run in both directions.
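To give an idea of what a call to RFC_READ_TABLE involves, here is a hedged sketch that only prepares the request and interprets the reply; the call itself would go through an SAP connector library, which is not shown. The parameter names (QUERY_TABLE, DELIMITER, ROWCOUNT, FIELDS, OPTIONS) and the DATA/WA layout of the reply are assumptions based on how this function module is commonly used, and the table name ZDEMO_TABLE and the sample reply are invented.

```python
def build_read_table_request(table, fields, where="", max_rows=0, delimiter="|"):
    """Assemble the parameters for an RFC_READ_TABLE call (assumed names)."""
    return {
        "QUERY_TABLE": table,
        "DELIMITER": delimiter,
        "ROWCOUNT": max_rows,  # 0 is assumed to mean "no limit"
        "FIELDS": [{"FIELDNAME": name} for name in fields],
        "OPTIONS": [{"TEXT": where}] if where else [],
    }


def parse_read_table_result(result, delimiter="|"):
    """Turn the delimited rows of a reply into one dictionary per row."""
    names = [field["FIELDNAME"] for field in result["FIELDS"]]
    rows = []
    for line in result["DATA"]:
        values = [value.strip() for value in line["WA"].split(delimiter)]
        rows.append(dict(zip(names, values)))
    return rows


# Hypothetical table and invented reply, for illustration only.
request = build_read_table_request("ZDEMO_TABLE", ["ID", "TEXT"], max_rows=2)
sample_result = {
    "FIELDS": [{"FIELDNAME": "ID"}, {"FIELDNAME": "TEXT"}],
    "DATA": [{"WA": "001|First entry "}, {"WA": "002|Second entry"}],
}
rows = parse_read_table_result(sample_result)
print(rows)
```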

ALE

ALE is a kind of intermediate layer between RFCs and the outside world. Any third-party software or other SAP installations that exchange IDocs with an SAP system will rely on ALE. ALE masks some of the intricacies of the actual RFC calls that it carries out. And in many cases, the third-party software will be an EDI or an EAI system.

SAP itself refers to “translators” that talk to the SAP system via ALE on one side, and communicate with another piece of software on the other. These translators need to be able to:

  • automatically integrate structure descriptions from IDocs into their own structure definitions;
  • accept IDocs from the SAP system and interpret the information according to their own structure definitions;
  • provide adequately powerful mapping functionality (i.e. they need to be able to handle the conversion between different data formats);
  • hand over the generated IDocs to the SAP system.

In other words, they need to be able to communicate reasonably well with the SAP system and also handle the basic functionality of an EDI/EAI software package. If you are looking for a software package of this kind and you run an SAP system (or may be required to run one in future), you should make sure that the software is capable of handling SAP ALE and RFC and that it meets the above criteria.

SAP already offers whole libraries full of routines that programmers can use in any software that needs to utilise these two interfaces. These SAP Connectors are available for a number of different programming languages, including Java and .NET, and are used in turn by the manufacturers of EDI/EAI software.

You can obtain an extremely detailed (and fairly technical) account of ALE from SAP, including instructions for programming on the SAP side.

Databases
Generally speaking, databases don’t just stand around doing their own thing; rather, they form the foundation for all the different systems that run within a business. From bulky ERP systems to small payroll programs, there is scarcely any program nowadays that still stores its data in a file system. Instead, most software makes use of the capabilities of a database system. Some programs place very particular requirements on the database, or only work in conjunction with a certain product, while others are more flexible.

The state of the art today is still the RDBMS, or Relational Database Management System. These are databases that are interrogated using SQL (Structured Query Language). There are also other approaches available, such as object-oriented databases, but these lead something of a niche existence. Relational databases can be used for just about anything, and they are unbeatable when it comes to storing and managing enormous quantities of data.

Access

In order for a program to manage its data in a database system, there needs to be an interface between the two. On Windows systems, this is usually ODBC (Open Database Connectivity), which is also available on Unix/Linux. A platform-independent alternative is JDBC (Java Database Connectivity), although any software using this needs to be written in Java. Nowadays, any serious database will offer both ODBC and JDBC drivers. Programs can also often access databases via APIs (Application Programming Interfaces), although this requires very close integration between the software and the database, of a kind that really ought to be avoided these days. In theory, if you use the standard ODBC and JDBC interfaces, it should be easy to swap out the database system without disturbing the software connected to it. In theory.

In practice, however, databases don’t only support standard SQL; rather, they always come with specific features of their own, ranging from minor differences in the notation of the SQL statements right up to the call syntax used in stored procedures (these are small programs written directly within the database system in a language specific to the database, and which are designed to handle useful tasks). If your software uses any of the specific features of a particular database system then you can forget about any straightforward swaps. However, none of this has much to do with EDI or EAI (or even ETL).

Databases and EDI/EAI/ETL

EDI/EAI/ETL software is designed to link together many different systems, so it shouldn’t come with any special requirements in terms of the databases it accesses; rather, it should be capable of accessing as many different database systems as possible. Of course, these EDI/EAI/ETL programs can also store certain things in their own database tables – but because their day-to-day work requires them to be highly flexible, it would be strange if they placed any special requirements on their own data.

In other words, the systems we are talking about here need to be able to handle ODBC and/or JDBC. If need be, we can also stick on some additional functionality to deal with particularly idiosyncratic databases that might only be accessible via API, but in any case the standard options need to be covered.

The tool has to adapt to suit the circumstances within your business, and not the other way around.

However, that doesn’t mean you can cheerfully go about connecting your newly purchased EDI or EAI software to databases belonging to all kinds of different programs, and writing whatever data you like into them. Nothing can go badly wrong as long as you restrict your software to merely reading data; yet more complex programs (let alone something as big as an SAP ERP system) will not take kindly to any external software messing around with their tables. There are simply far too many dependencies to take into account, and you can very quickly end up breaking something. You should use alternative interfaces for this instead, if they are available. You should also make sure you know exactly what you are doing when you set up your ETL processes.

Yet despite all that, direct access to databases is still very useful for simple lookups (such as obtaining an address to go with a customer number). And because you might have more than one database system in your company, you shouldn’t make those systems too difficult to access. If you find yourself taking half a day to gain access to an additional database, or if you end up having to reconfigure every last detail whenever you define a new process, then you know something is wrong. It should also be possible to access several different databases with a single process if the data you need is stored across multiple databases.
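A lookup of this kind, spread across two databases, might look roughly like the following sketch. It uses Python's built-in sqlite3 module purely as a stand-in for whatever ODBC or JDBC driver a real product would use; the two in-memory databases, the table layouts and all values are invented for the example.

```python
import sqlite3

# Two in-memory databases stand in for two separate in-house systems.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_no TEXT PRIMARY KEY, address TEXT)")
crm.execute("INSERT INTO customers VALUES ('C-1001', '1 Example Street, Sampletown')")

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (order_no TEXT, customer_no TEXT)")
erp.execute("INSERT INTO orders VALUES ('O-1', 'C-1001')")


def lookup_address(connection, customer_no):
    """Read-only lookup: return the address for a customer number, or None."""
    cursor = connection.cursor()
    cursor.execute(
        "SELECT address FROM customers WHERE customer_no = ?", (customer_no,)
    )
    row = cursor.fetchone()
    return row[0] if row else None


# One process reads the order from one database and the address from the other.
order_no, customer_no = erp.execute(
    "SELECT order_no, customer_no FROM orders"
).fetchone()
address = lookup_address(crm, customer_no)
missing = lookup_address(crm, "C-9999")
print(order_no, address, missing)
```

The lookup only reads, in line with the advice above, and it passes the customer number as a bound parameter rather than pasting it into the SQL text.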

Once you’ve accessed the database system, you then need to deal with the tables. That can be straightforward or complicated, depending on your software. The more support your software offers with this, the more you can concentrate on actually defining your EDI or EAI process.

One last thing: when we talk about databases, we are often talking about mass data. This involves moving around entire lists of product master data, and hopefully large quantities of customer master data too; it involves orders, invoices, delivery notes and much more besides whizzing in and out every second, and being either written directly to the various databases or read from them, or at least requiring huge numbers of lookups to be carried out. And all this needs to run at high performance. Databases are usually streamlined to guarantee performance; however, the program accessing them needs to be able to cope with these torrents of data too. Millions of records need to be read, processed and saved effortlessly.
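One common way of coping with large result sets is to read and write them in batches instead of loading everything into memory at once. The sketch below shows the idea with Python's sqlite3 module as a stand-in for a real database driver; the articles table, the batch size and the function name copy_in_batches are assumptions made for the example.

```python
import sqlite3


def copy_in_batches(source, target, batch_size=1000):
    """Copy all rows of `articles` from source to target, one batch at a time."""
    read = source.cursor()
    write = target.cursor()
    read.execute("SELECT article_no, price FROM articles")
    total = 0
    while True:
        batch = read.fetchmany(batch_size)   # at most batch_size rows in memory
        if not batch:
            break
        write.executemany("INSERT INTO articles VALUES (?, ?)", batch)
        total += len(batch)
    target.commit()
    return total


# Invented demo data: 2,500 article records in the source database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE articles (article_no INTEGER, price REAL)")
source.executemany(
    "INSERT INTO articles VALUES (?, ?)", [(i, i * 1.5) for i in range(2500)]
)
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE articles (article_no INTEGER, price REAL)")

copied = copy_in_batches(source, target, batch_size=1000)
count = target.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
print(copied, count)
```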

Conclusion

Let’s quickly summarise what EAI, EDI or ETL software (or software providing all three) needs to offer when it comes to databases.

  • Access to (almost) any database via ODBC and/or JDBC.
  • Quick and easy connections to new database systems.
  • Straightforward access not only to database systems, but also to the table structures inside them.
  • Simple methods for reading and saving significant quantities of data, as well as for rapid lookups and smaller insert/update statements – not to mention stored procedure calls.
  • High-performance processing of enormous volumes of data.
  • The ability to collect data from different databases and process them together.

Even if you don’t anticipate any initial need for direct access to databases, you should still make sure that any product you choose meets the above requirements. If you don’t, and later find that you need to carry out one of the tasks set out above, then you might come to regret your decision. What’s more, databases can sometimes prove extremely useful in EDI and EAI processes as a place to temporarily store processed data and then retrieve them after running additional sorting (e.g. using an “order by” clause).
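The temporary storage mentioned above can be sketched as follows, again with Python's sqlite3 module standing in for a real database; the table staged_lines and its contents are invented.

```python
import sqlite3

staging = sqlite3.connect(":memory:")
staging.execute(
    "CREATE TABLE staged_lines (order_no TEXT, position INTEGER, article TEXT)"
)

# Processed lines arrive in arbitrary order while the process is running.
processed = [("O-2", 1, "bolt"), ("O-1", 2, "nut"), ("O-1", 1, "washer")]
staging.executemany("INSERT INTO staged_lines VALUES (?, ?, ?)", processed)

# Afterwards the database does the sorting for us.
sorted_lines = staging.execute(
    "SELECT order_no, position, article FROM staged_lines "
    "ORDER BY order_no, position"
).fetchall()
print(sorted_lines)
```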

Message Services
“Message service” is a very general term. A message service’s sole purpose is to facilitate the interchange of messages between various systems which – in principle – don’t need to be aware of each other’s existence. Each connected system simply sends messages to this service, and these are then received by other systems. And a system that wants to receive these messages can sign up to particular channels or topics.

The interesting thing here is that the sender doesn’t need to know exactly who received the message, and often doesn’t even find out whether it was received at all. Nor does it particularly matter how the message is ultimately processed. This uncouples the various systems from each other both logically and temporally. The message is handed over to the message service, and that’s it.
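The logical decoupling can be illustrated with a toy message service written in plain Python. This is only a sketch: the class MiniBroker and the channel names are invented, and everything runs inside a single process, whereas a real message service is a separate server that also stores messages until the recipients fetch them (which is what provides the temporal decoupling).

```python
from collections import defaultdict


class MiniBroker:
    """Toy in-process message service for illustration only."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a recipient for everything published on `channel`."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Hand a message over; the sender learns nothing about recipients."""
        for callback in self._subscribers[channel]:
            callback(message)


broker = MiniBroker()
warehouse_inbox, billing_inbox = [], []
broker.subscribe("orders", warehouse_inbox.append)
broker.subscribe("orders", billing_inbox.append)

broker.publish("orders", "order 4711")
broker.publish("returns", "return 0815")  # nobody listens, the sender never notices

print(warehouse_inbox, billing_inbox)
```

The sender only ever talks to the broker; adding a third recipient would not require any change on the sending side.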

Basic message distribution variants

“Channels or topics” is a fairly vague description. In more accurate terms, message services offer a variety of ways to distribute messages. We can broadly distinguish between the following approaches:

  • Queues: Messages are placed in queues and distributed to interested recipients in order (according to the FIFO principle). There can be multiple possible recipients for each message, but only one of them will actually receive a given message.
  • Publish/subscribe: Messages are fed into a given pool, and all recipients who are interested in that pool will receive the associated messages. This is sometimes also known as broadcasting.
  • Routing: Similar to the publish/subscribe process, except that the messages are sent with an additional feature called a routing key. The possible recipients are only interested in messages with particular keys, and in this way, they can filter out the messages that are relevant to them. These keys are entirely freely selectable, and a recipient can also express an interest in more than one key.
  • Topics: This is broadly similar to routing, except that topics don’t just contain one value; rather, they consist of multiple parts. For example: “Earth.Europe.Germany” (with a full stop as a separator), or “Politics.Economics.Germany”. Recipients can register for the exact topic, or for parts of it – e.g. “Earth.Europe.*” (everything that comes under Earth.Europe, with the * as a placeholder for a word), “Politics.#” (everything political, with the # as a placeholder for any number of words), or even “#.Germany” (regardless of whether this relates to the geographical or political classification).
  • RPC: Remote Procedure Call. Each message requires a certain procedure to be carried out on another server, and simultaneously indicates whom the result should be sent to afterwards. Logically enough, recipients register for those messages for which they also offer the relevant processes.
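The placeholder rules for topics can be made concrete with a short matching function. This is a minimal sketch in plain Python, not the implementation used by any particular broker; it assumes that “any number of words” for # includes zero words, and the function name topic_matches is made up.

```python
def topic_matches(pattern, topic):
    """Check a dot-separated topic against a pattern.

    '*' stands for exactly one word, '#' for any number of words
    (assumed here to include none at all).
    """
    def match(p, t):
        if not p:
            return not t                 # pattern used up: topic must be too
        head, rest = p[0], p[1:]
        if head == "#":
            # Let '#' swallow 0, 1, 2, ... words and match the remainder.
            return any(match(rest, t[i:]) for i in range(len(t) + 1))
        if not t:
            return False
        if head == "*" or head == t[0]:
            return match(rest, t[1:])
        return False

    return match(pattern.split("."), topic.split("."))


print(topic_matches("Earth.Europe.*", "Earth.Europe.Germany"))
print(topic_matches("#.Germany", "Politics.Economics.Germany"))
```

A broker would run a check like this for every registered recipient and deliver the message to each one whose pattern matches.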

That’s about all we intend to say about how message services work. Anyone who wants to find out more can watch the tutorial provided by message service provider RabbitMQ. A more interesting question is: what sorts of message services are out there? The short answer is that there are a lot of them. There’s the Java Message Service JMS, for example. This is an API – i.e. a programming interface – that provides all the key functions you need to build a message service. One widespread implementation of this is Apache ActiveMQ, but there are several dozen more besides.

But Java isn’t the only way to provide and implement message services; you can do so with many other programming languages too, which means that message services are a dime a dozen. And that brings us to a problem: strictly speaking, message services should be as flexible as possible in order to be able to connect any kind of system as a sender and/or recipient of messages. That’s why the aforementioned RabbitMQ interfaces are available in over half a dozen languages, for example. But integrating many different programming languages is not the only way to make message services flexible: we can also give them an interface that is accessible not via programming, but through the network. One example of this kind of protocol is STOMP (Simple Text Oriented Messaging Protocol), while another would be AMQP, the Advanced Message Queuing Protocol. The latter is a binary protocol rather than a text protocol, and it offers a considerably richer set of features than STOMP. Let’s take a closer look at it:

AMQP

The real, server-side implementation of a message service is also known as a broker, and the technology behind the broker should be hidden from the outside world. Instead, it should offer an interface via which messages can be delivered to and from clients. AMQP is an interface of this kind. It was developed by a fairly large consortium of companies, and is now also an OASIS standard. One of the advantages of AMQP is that it doesn’t cost anything. Anyone can implement and use the protocol in their broker or client.

Another benefit is that increasing numbers of brokers (such as RabbitMQ or Apache ActiveMQ) use this interface. If you already run an in-house message service then there is a very good chance that AMQP support will be available for it. With AMQP, clients and servers (aka brokers) communicate via TCP. It makes absolutely no difference who the party at the other end is, or how (e.g. in what language) they are implemented. The principle is exactly the same as for web servers and browsers. This makes it the ideal basis for linking different systems together in a network, which brings us on to the real key point:
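To make the idea of a binary protocol over TCP a little more tangible: the first thing an AMQP client sends after opening the connection is a fixed eight-byte protocol header that announces the protocol version. The sketch below only decodes such a header and does not talk to any broker; the function name parse_protocol_header is made up for the example.

```python
import struct


def parse_protocol_header(data):
    """Split an 8-byte AMQP protocol header into its four version bytes.

    Returns (protocol_id, major, minor, revision).
    """
    if len(data) != 8 or data[:4] != b"AMQP":
        raise ValueError("not an AMQP protocol header")
    return struct.unpack("!4B", data[4:])


header_0_9_1 = b"AMQP\x00\x00\x09\x01"  # header sent by AMQP 0-9-1 clients
header_1_0 = b"AMQP\x00\x01\x00\x00"    # header sent by AMQP 1.0 clients

v091 = parse_protocol_header(header_0_9_1)
v10 = parse_protocol_header(header_1_0)
print(v091, v10)
```

Because both sides agree on this byte layout, it really does not matter what language the client or the broker is written in.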

Message services and EDI/EAI

You may be wondering what the topic of message services is doing in this guide. Well, in principle, a message service offers an ideal way for systems within a company network to communicate with each other. OK, so SAP might also offer ALE as a direct mode of communication – but that is highly specific to SAP. Other products rely on HTTP, or even on simple file system interfaces, but these forms of communication are only stopgap solutions, strictly speaking. Whereas HTTP or even FTP are suitable methods for communication between different companies over the Internet (and protocols like AS2 are designed specifically for this), the file system is primarily intended for data storage and not for communications. When it comes to data interchange over a single local network, however, none of these methods represent suitable solutions – whereas message services are designed for that very purpose. Whether your system is an EAI or EDI one (or can handle both), it always needs to communicate with other in-house products (e.g. over LAN).

Things like ESB can help with this – but as we have already said, this is a rather vague concept.

By contrast, the actual implementation of this communication can very readily take the form of a message service – provided, of course, that the software already available in-house is capable of working with message services. Once again, a good option here is to use AMQP.

If you are currently considering purchasing a piece of EDI or EAI software, you should check whether your existing (or soon-to-be-purchased) systems are capable of handling message services. And if you have any doubts, you should take this into account when choosing your EDI/EAI solution.

AS/400 DataQueue
Data queues are a concept from the IBM AS/400 (now known as System i, iSeries or i5) designed for rapid communication between processes both on the iSeries and outside it. Although the name “queue” suggests FIFO processing, in principle there are three possible working methods:

  • FIFO: First In, First Out – the typical queue concept in which the entries are processed in order of arrival.
  • LIFO: Last In, First Out; also known as a stack. Here, the newest entry is always read first.
  • Keyed: Here, the order of the entries doesn’t count; instead, they are read via a key that is assigned to each entry in the queue.

Whichever mode is used, there will always be one or more processes that add entries to the data queue, and one or more other processes that regularly query the data queue and retrieve the available entries.
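The three working methods can be modelled in a few lines of Python. This is a toy model only, not IBM's API: the class ToyDataQueue is invented, and keyed access is reduced to an exact match on the key.

```python
from collections import deque


class ToyDataQueue:
    """In-memory model of a data queue with FIFO, LIFO or keyed access."""

    def __init__(self, mode="FIFO"):
        if mode not in ("FIFO", "LIFO", "KEYED"):
            raise ValueError(mode)
        self.mode = mode
        self._entries = deque()  # holds (key, value) pairs in order of arrival

    def write(self, value, key=None):
        self._entries.append((key, value))

    def read(self, key=None):
        """Remove and return the next entry, or None if nothing is available."""
        if not self._entries:
            return None
        if self.mode == "FIFO":
            return self._entries.popleft()[1]   # oldest entry first
        if self.mode == "LIFO":
            return self._entries.pop()[1]       # newest entry first
        for entry in self._entries:             # KEYED: first entry with this key
            if entry[0] == key:
                self._entries.remove(entry)
                return entry[1]
        return None


fifo, lifo, keyed = ToyDataQueue("FIFO"), ToyDataQueue("LIFO"), ToyDataQueue("KEYED")
for q in (fifo, lifo):
    q.write("a")
    q.write("b")
keyed.write("order 1", key="A")
keyed.write("order 2", key="B")

fifo_first = fifo.read()
lifo_first = lifo.read()
keyed_b = keyed.read(key="B")
print(fifo_first, lifo_first, keyed_b)
```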

How does that help us?

So what can we do with a data queue in EDI/EAI? Well, the AS/400 has a reputation for being rock solid and virtually indestructible, which is why it’s still very popular when it comes to business-critical processes. There are even major companies who wouldn’t consider using anything other than an AS/400 for critical tasks.

On the other hand, the AS/400 (and in particular its unique OS/400 or i5/OS operating systems) represents a world of its own. It has its own complete file system (though it is also capable of emulating other file systems as they are understood in Unix), and it can’t be accessed externally using any old method. Data are often transferred via FTP – or alternatively you can access AS/400 files directly with SQL, since they take the form of database tables.

However, we’ve already pointed out in the article on FTP that this protocol actually only transports files. Some kind of scheduling is generally needed if you want to get hold of those files, or process the files you receive. Likewise, you can only query database tables actively, and this typically also requires scheduling of some kind. So if you want to respond to new data (e.g. a new order) as quickly as possible, you will need to search for new files or fire off select statements at extremely frequent intervals. This is a rather unsatisfactory solution.

By contrast, data queues are specifically designed to be checked frequently for new entries. Instead of entering an entire new order here that needs to be sent out, we can simply add the order number, which can then be used to read the full order using SQL. This vastly improves performance and also allows us to very frequently check for new orders (since checking for an order number is not very costly) and therefore react very quickly.

As we have already said, these data queues are also accessible to processes that don’t run on the iSeries. This access takes place with the help of IBM drivers, such as the Java Toolbox for Java. Here’s what the configuration for this kind of access might look like:

In this example, the value read from the queue is simultaneously used as a parameter in an SQL statement (placeholder “&1”). The result of this statement can then be subjected to further processing. If you operate an AS/400, this is one way in which you can integrate it into your EAI/EDI concept.
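The pattern of reading a value from the queue and using it as a parameter in an SQL statement can be sketched as follows. This is not IBM's driver and not the configuration of any particular product: Python's queue and sqlite3 modules stand in for the data queue and the AS/400 database, the orders table and its contents are invented, and sqlite3's “?” takes the place of the “&1” placeholder.

```python
import queue
import sqlite3

# Stand-in for the database holding the full orders.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_no TEXT, customer TEXT, total REAL)")
db.execute("INSERT INTO orders VALUES ('4711', 'Example Ltd', 99.5)")

# Stand-in for the data queue: the producer only announces the order number.
data_queue = queue.Queue()
data_queue.put("4711")


def process_next(q, connection):
    """Take one order number from the queue and read the full order via SQL."""
    try:
        order_no = q.get_nowait()
    except queue.Empty:
        return None  # nothing new; checking was cheap
    return connection.execute(
        "SELECT order_no, customer, total FROM orders WHERE order_no = ?",
        (order_no,),
    ).fetchone()


first = process_next(data_queue, db)
second = process_next(data_queue, db)
print(first, second)
```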