Marshalling in Distributed System

Last Updated : 22 Mar, 2022

A distributed system consists of numerous components located on different machines that communicate and coordinate their operations so that they appear to the end user as a single system.

External Data Representation:

Information held in running applications is represented as data structures, whereas information in the messages that move between the components of a distributed system consists of sequences of bytes. A data structure must therefore be converted to a sequence of bytes before transmission, and on arrival the bytes must be converted back into an equivalent data structure.

Computers handle many types of data, and not all computers represent those types in the same way. Primitive values such as integers are not stored in the same byte order everywhere, and different architectures also represent floating-point numbers differently. Integers can be ordered in two ways: big-endian order, in which the most significant byte (MSB) comes first, and little-endian order, in which the least significant byte (LSB) comes first. A further issue is the set of codes used to represent characters. ASCII character coding, long the default for applications on UNIX systems, uses one byte per character, whereas Unicode, which allows for the representation of text in many different languages, needs more: UTF-16 uses two bytes for most characters, and UTF-8 uses between one and four.
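The byte-order and character-coding differences described above can be observed directly. The following is a minimal sketch using only the Java standard library; the class name `ByteOrderDemo` and its helper methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class ByteOrderDemo {
    // Encode a 32-bit integer using the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Render bytes as two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        // Most significant byte first.
        System.out.println(hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 0A 0B 0C 0D
        // Least significant byte first.
        System.out.println(hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 0D 0C 0B 0A
        // The same character occupies a different number of bytes per encoding.
        System.out.println("A".getBytes(StandardCharsets.US_ASCII).length); // 1
        System.out.println("A".getBytes(StandardCharsets.UTF_16BE).length); // 2
    }
}
```

The same four bytes mean different integers depending on which order the reader assumes, which is why the two sides must agree on a representation.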

There must be a way to represent all of this data so that it can be exchanged successfully between computers. One of two methods can be used. In the first, values are converted to an agreed-upon external format before transmission and converted to the local format on receipt; if the two computers are known to be of the same type, the conversion to the external format can be skipped. In the second, values are sent in the sender’s format, along with an indication of the format used, and the recipient converts them if necessary. Note that in both cases the bytes themselves are never altered during transmission. To support Remote Procedure Call (RPC) or Remote Method Invocation (RMI) mechanisms, any data type that can be passed as an argument or returned as a result must be convertible in this way, with its individual primitive data values represented in an agreed format. An agreed standard for representing data structures and primitive values is called an external data representation.
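The option of sending values in the sender's own format, tagged with a description of that format, can be sketched as follows. The one-byte tag and the class name `SenderFormatDemo` are assumptions made for this illustration and do not come from any real protocol.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SenderFormatDemo {
    // Hypothetical one-byte format tags; not taken from any real protocol.
    static final byte BIG = 0;
    static final byte LITTLE = 1;

    // Sender: write the value in the sender's own byte order,
    // prefixed by a tag that names that order.
    static byte[] send(int value, ByteOrder senderOrder) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES);
        buf.put(senderOrder == ByteOrder.LITTLE_ENDIAN ? LITTLE : BIG);
        buf.order(senderOrder).putInt(value);
        return buf.array();
    }

    // Receiver: read the tag, then interpret the remaining bytes accordingly.
    static int receive(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        ByteOrder order = buf.get() == LITTLE ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        return buf.order(order).getInt();
    }

    public static void main(String[] args) {
        System.out.println(receive(send(1876, ByteOrder.LITTLE_ENDIAN))); // 1876
        System.out.println(receive(send(1876, ByteOrder.BIG_ENDIAN)));    // 1876
    }
}
```

The bytes on the wire differ between the two calls, but the receiver recovers the same value because the tag tells it how to read them.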

  • Marshalling: Marshalling is the process of taking a collection of data items and assembling them into an external data representation suitable for transmission in a message.
  • Unmarshalling: Unmarshalling is the converse process, in which the message is disassembled on arrival to recreate equivalent data structures at the destination.

Approaches:

Three approaches to external data representation and marshalling are discussed here.

1. Common Object Request Broker Architecture (CORBA): 

CORBA is a specification defined by the Object Management Group (OMG) for middleware that lets distributed objects interoperate. It is a standard for creating and using distributed objects, and it allows software applications and their objects to communicate with one another across diverse architectures, operating systems, programming languages, and computer hardware. Its five major components and their functions are given below:

  • Object Request Broker (ORB): It provides the communication infrastructure that lets objects communicate across a network.
  • Interface Definition Language (IDL): It is a specification language used to describe the interface of a software component independently of any programming language. For example, it allows software components written in C++ and Java to communicate.
  • Dynamic Invocation Interface (DII): Using DII, client applications can invoke operations on server objects without knowing their types at compile time. The client obtains a reference to a CORBA object and then makes invocation requests on it dynamically.
  • Interface Repository (IR): It stores interface definitions so that they can be looked up at run time. A client can use it to find information about the interface of an object that was not known at compile time and then send a request through the ORB.
  • Object Adapter (OA): It connects object implementations to the ORB and provides access to ORB services such as object reference generation.

Data Representation in CORBA:

Common Data Representation (CDR) is CORBA's external data representation. It can represent all the structured and primitive data types that can be passed as arguments or results in remote invocations on CORBA distributed objects, and it can be used by clients and servers written in different programming languages. CDR supports both big-endian and little-endian ordering: values are transmitted in the sender's byte order, which is indicated in each message, and the recipient converts them if its own ordering differs.

These types comprise 15 primitive types, including short (16-bit), long (32-bit), unsigned short, unsigned long, float (32-bit), double (64-bit), char, boolean (TRUE, FALSE), octet (8-bit), and any (which can represent any basic or constructed type), as well as a variety of composite (constructed) types.

CORBA CDR Constructed Types:

The constructed types and their representations are as follows:

  • sequence: length (unsigned long) followed by the elements in order
  • string: length (unsigned long) followed by the characters in order (wide characters are also supported)
  • array: the elements in order; no length is specified because it is fixed
  • struct: the components in the order of declaration
  • enumerated: unsigned long; the values are assigned in the order declared
  • union: type tag followed by the selected member

Example:

struct Person {
    string name;
    string place;
    long year;
};
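To make the layout rules concrete, here is a simplified sketch of how the Person struct might be marshalled and unmarshalled by hand. It follows the representation listed above (fields in declaration order, each string as a 4-byte length followed by its characters) and pads each string to a 4-byte boundary. It is not a full CORBA CDR implementation: it assumes ASCII text and big-endian order, and it omits details of the real encoding such as the byte-order flag. The class `CdrSketch` and its nested `Person` holder are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class CdrSketch {
    // Plain holder for the three fields of the IDL struct (hypothetical helper type).
    static final class Person {
        final String name;
        final String place;
        final int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Round a byte count up to the next multiple of four.
    static int padded(int n) {
        return (n + 3) & ~3;
    }

    // Write a 4-byte length, the characters, then skip to a 4-byte boundary.
    // ByteBuffer.allocate zero-fills, so the skipped bytes are zero padding.
    static void putString(ByteBuffer buf, byte[] chars) {
        buf.putInt(chars.length);
        buf.put(chars);
        buf.position(buf.position() + padded(chars.length) - chars.length);
    }

    // Read a string written by putString and skip its padding.
    static String getString(ByteBuffer buf) {
        byte[] chars = new byte[buf.getInt()];
        buf.get(chars);
        buf.position(buf.position() + padded(chars.length) - chars.length);
        return new String(chars, StandardCharsets.US_ASCII);
    }

    // Marshal the fields in declaration order: name, place, year.
    static byte[] marshal(Person p) {
        byte[] name = p.name.getBytes(StandardCharsets.US_ASCII);
        byte[] place = p.place.getBytes(StandardCharsets.US_ASCII);
        int size = 4 + padded(name.length) + 4 + padded(place.length) + 4;
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.BIG_ENDIAN);
        putString(buf, name);
        putString(buf, place);
        buf.putInt(p.year);
        return buf.array();
    }

    // Unmarshal by reading the fields back in the same order.
    static Person unmarshal(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message).order(ByteOrder.BIG_ENDIAN);
        String name = getString(buf);
        String place = getString(buf);
        int year = buf.getInt();
        return new Person(name, place, year);
    }

    public static void main(String[] args) {
        byte[] message = marshal(new Person("John", "England", 1876));
        System.out.println(message.length); // 24
        Person copy = unmarshal(message);
        System.out.println(copy.name + " " + copy.place + " " + copy.year); // John England 1876
    }
}
```

Note that the message carries no field names or type tags: the receiver can only decode it because it already knows the struct's declaration.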

Marshalling in CORBA:

Marshalling operations can be generated automatically from the specification of the types of the data items to be transmitted in a message. CORBA IDL provides a notation for describing the types of data structures and primitive data items, and for specifying the types of the arguments and results of RMI methods.

2. Java’s Object Serialization:

Java Remote Method Invocation (RMI) allows both objects and primitive data values to be passed as arguments and results of method invocations. In Java, the term serialization refers to the activity of flattening an object (an instance of a class) or a set of related objects into a serial form suitable for saving to disk or sending in a message.

Java provides a mechanism called object serialization, which represents an object as a sequence of bytes containing the object’s data together with information about the object’s type and the types of the data stored in it. After a serialized object has been written to a file, it can be read back and deserialized: the type information and the bytes that represent the object and its data are used to recreate the object in memory.

Moreover, the serialized form is platform independent: an object can be serialized on one platform and deserialized by a JVM on an entirely different platform, provided the object's class is available there.

For example, the Java class equivalent to the Person struct defined in CORBA IDL might be:

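import java.io.Serializable;

// Java counterpart of the Person struct defined in CORBA IDL.
public class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    public String name;
    public String place;
    public int year; // corresponds to the 32-bit IDL field "long year"

    // Prints a short message built from the object's fields.
    public void letter() {
        System.out.println("Issue a letter to " + name + " " + place);
    }
}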
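A round trip through Java's object serialization might look like the sketch below. It uses the standard `ObjectOutputStream` and `ObjectInputStream` classes with an in-memory byte array in place of a file or network connection. The wrapper class `SerializationDemo`, its helper methods, and the nested stand-in `Person` class (with fields mirroring the IDL struct) are assumptions made to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationDemo {
    // Nested stand-in for a serializable Person class.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        String place;
        int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Marshal: flatten the object (and its type information) into bytes.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Unmarshal: rebuild an equivalent object from the bytes.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize object", e);
        }
    }

    public static void main(String[] args) {
        Person original = new Person("John", "England", 1876);
        Person copy = (Person) deserialize(serialize(original));
        // The copy is a distinct object that holds the same data.
        System.out.println(copy.name + " " + copy.place + " " + copy.year);
        System.out.println(copy != original);
    }
}
```

Because the byte stream records the class name and field types, the receiver needs no separate description of the message layout, only access to the class itself.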


 

3. Extensible Markup Language (XML):

Clients communicate with web services using XML, which is also used to define the interfaces and other properties of web services. XML is used in many other applications as well, including archiving and retrieval systems; although an XML archive may be larger than a binary one, it has the advantage of being readable on any machine. Other uses of XML include the specification of user interfaces and the encoding of operating system configuration files.

In contrast to HTML, which uses a fixed set of tags, XML is extensible in the sense that users can define their own tags. If an XML document is meant to be used by more than one application, the names of the tags must be agreed upon among them.

Clients, for example, typically communicate with web services via SOAP messages. SOAP is an XML format whose tags are published for use by web services and their clients. Some external data representations (such as CORBA CDR) do not need to be self-describing, because it is assumed that the client and server exchanging a message have prior knowledge of the order and types of the information it contains. XML, on the other hand, was designed to be used by a variety of applications for a variety of purposes. This is made possible by the inclusion of tags, together with the use of namespaces to define the meaning of the tags. Furthermore, the use of tags allows applications to select only the portions of a document that they need to process.

Example:

XML definition of the Person struct:
<person id="9865">
  <name>John</name>
  <place>England</place>
  <year>1876</year>
  <!-- comment -->
</person>
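One way such a document might be produced and read back in Java is sketched below, using the DOM parser that ships with the standard library (`javax.xml.parsers`). The class `XmlPersonDemo` and its helpers are hypothetical, and the string-building marshaller assumes the values contain no characters that need XML escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlPersonDemo {
    // Marshal the fields into XML text.
    // Assumption: the values contain no characters that need escaping.
    static String toXml(String id, String name, String place, int year) {
        return "<person id=\"" + id + "\">"
            + "<name>" + name + "</name>"
            + "<place>" + place + "</place>"
            + "<year>" + year + "</year>"
            + "</person>";
    }

    // Unmarshal: parse the text and return the root <person> element.
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Look up the text of the first child element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element person = parse(toXml("9865", "John", "England", 1876));
        System.out.println(person.getAttribute("id"));            // 9865
        System.out.println(text(person, "name"));                 // John
        System.out.println(Integer.parseInt(text(person, "year"))); // 1876
    }
}
```

Because each value is located by its tag name rather than by position, a reader can pick out only the elements it needs and ignore the rest.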

Usage:

Marshalling is used in the implementation of remote procedure call (RPC) mechanisms, where the communicating processes and threads may use different data formats, so data must be marshalled as it passes between them.

The Microsoft Component Object Model (COM) uses marshalling to pass interface pointers and data across COM object boundaries, and the Distributed Component Object Model (DCOM) extends this across machines. Similarly, the .NET Framework uses marshalling when types based on the common language runtime need to interact with unmanaged types.

Scripts and applications that use Cross-Platform Component Object Model (XPCOM) technologies are a further example: the Mozilla Application Framework makes heavy use of XPCOM, which in turn relies extensively on marshalling.

To summarize, XML (Extensible Markup Language) is a text-based format for expressing structured data. It is used to represent the data sent in messages exchanged by clients and servers in web services.

In the first two approaches, CORBA CDR and Java’s object serialization, primitive data types are marshalled into a binary form. In the third approach, XML, they are represented textually. The textual representation of a data value is generally longer than the equivalent binary representation. The HTTP protocol is another example of the textual approach.

The approaches also differ in whether the marshalled data includes type information. CORBA CDR transmits only the values, because the sender and recipient are assumed to know the types in advance. Both Java serialization and XML include type information, but in different ways: Java puts all of the required type information into the serialized form, whereas XML documents can refer to namespaces, which are externally defined sets of names (with types).
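The size difference between binary and textual forms can be checked with a small sketch. The class `SizeComparison` and its methods are illustrative only, and the `<year>` tag is borrowed from the XML example.

```java
import java.nio.charset.StandardCharsets;

class SizeComparison {
    // Size of a 32-bit integer in binary form: always four bytes.
    static int binarySize(int value) {
        return Integer.BYTES;
    }

    // Size of the same integer written as decimal text.
    static int textSize(int value) {
        return String.valueOf(value).getBytes(StandardCharsets.US_ASCII).length;
    }

    // Size of the integer wrapped in a self-describing XML element.
    static int xmlSize(int value) {
        return ("<year>" + value + "</year>").getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int value = Integer.MAX_VALUE; // 2147483647
        System.out.println(binarySize(value)); // 4
        System.out.println(textSize(value));   // 10
        System.out.println(xmlSize(value));    // 23
    }
}
```

Small values can be as short in text as in binary (1876 takes four bytes either way), but once tags are added the textual form is consistently larger.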


