Rajeev Ranjan's /* Life Runs on Code*/: 2011

Monday, August 29, 2011

How Rsync Works

A Practical Overview
Foreword
The original Rsync technical report and Andrew Tridgell's Phd thesis (pdf) Are both excellent documents for understanding the theoretical mathematics and some of the mechanics of the rsync algorithm. Unfortunately they are more about the theory than the implementation of the rsync utility (hereafter referred to as Rsync).

In this document I hope to describe...

A non-mathematical overview of the rsync algorithm.
How that algorithm is implemented in the rsync utility.
The protocol, in general terms, used by the rsync utility.
The identifiable roles the rsync processes play.
This document be able to serve as a guide for programmers needing something of an entré into the source code but the primary purpose is to give the reader a foundation from which he may understand

Why rsync behaves as it does.
The limitations of rsync.
Why a requested feature is unsuited to the code-base.
This document describes in general terms the construction and behaviour of Rsync. In some cases details and exceptions that would contribute to specific accuracy have been sacrificed for the sake meeting the broader goals.

Processes and Roles
When we talk about Rsync we use specific terms to refer to various processes and their roles in the task performed by the utility. For effective communication it is important that we all be speaking the same language; likewise it is important that we mean the same things when we use certain terms in a given context. On the rsync mailing list there is often some confusion with regards to role and processes. For these reasons I will define a few terms used in the role and process contexts that will be used henceforth. client role The client initiates the synchronisation.
server role The remote rsync process or system to which the client connects either within a local transfer, via a remote shell or via a network socket.
This is a general term and should not be confused with the daemon.

Once the connection between the client and server is established the distinction between them is superseded by the sender and receiver roles.
daemon Role and process An Rsync process that awaits connections from clients. On a certain platform this would be called a service.
remote shell role and set of processes One or more processes that provide connectivity between an Rsync client and an Rsync server on a remote system.
sender role and process The Rsync process that has access to the source files being synchronised.
receiver role and process As a role the receiver is the destination system. As a process the receiver is the process that receives update data and writes it to disk.
generator process The generator process identifies changed files and manages the file level logic.

Process Startup
When an Rsync client is started it will first establish a connection with a server process. This connection may be through pipes or over a network socket.

When Rsync communicates with a remote non-daemon server via a remote shell the startup method is to fork the remote shell which will start an Rsync server on the remote system. Both the Rsync client and server are communicating via pipes through the remote shell. As far as the rsync processes are concerned there is no network. In this mode the rsync options for the server process are passed on the command-line that is used to start the remote shell.

When Rsync is communicating with a daemon it is communicating directly with a network socket. This is the only sort of Rsync communication that could be called network aware. In this mode the rsync options must be sent over the socket, as described below.

At the very start of the communication between the client and the server, they each send the maximum protocol version they support to the other side. Each side then uses the minimum value as the the protocol level for the transfer. If this is a daemon-mode connection, rsync options are sent from the client to the server. Then, the exclude list is transmitted. From this point onward the client-server relationship is relevant only with regards to error and log message delivery.

Local Rsync jobs (when the source and destination are both on locally mounted filesystems) are done exactly like a push. The client, which becomes the sender, forks a server process to fulfill the receiver role. The client/sender and server/receiver communicate with each other over pipes.

The File List
The file list includes not only the pathnames but also ownership, mode, permissions, size and modtime. If the --checksum option has been specified it also includes the file checksums.
The first thing that happens once the startup has completed is that the sender will create the file list. While it is being built, each entry is transmitted to the receiving side in a network-optimised way.

When this is done, each side sorts the file list lexicographically by path relative to the base directory of the transfer. (The exact sorting algorithm varies depending on what protocol version is in effect for the transfer.) Once that has happened all references to files will be done by their index in the file list.

If necessary the sender follows the file list with id→name tables for users and groups which the receiver will use to do a id→name→id translation for every file in the file list.

After the file list has been received by the receiver, it will fork to become the generator and receiver pair completing the pipeline.

The Pipeline
Rsync is heavily pipelined. This means that it is a set of processes that communicate in a (largely) unidirectional way. Once the file list has been shared the pipeline behaves like this:
generator → sender → receiver
The output of the generator is input for the sender and the output of the sender is input for the receiver. Each process runs independently and is delayed only when the pipelines stall or when waiting for disk I/O or CPU resources.

The Generator
The generator process compares the file list with its local directory tree. Prior to beginning its primary function, if --delete has been specified, it will first identify local files not on the sender and delete them on the receiver.

The generator will then start walking the file list. Each file will be checked to see if it can be skipped. In the most common mode of operation files are not skipped if the modification time or size differs. If --checksum was specified a file-level checksum will be created and compared. Directories, device nodes and symlinks are not skipped. Missing directories will be created.

If a file is not to be skipped, any existing version on the receiving side becomes the "basis file" for the transfer, and is used as a data source that will help to eliminate matching data from having to be sent by the sender. To effect this remote matching of data, block checksums are created for the basis file and sent to the sender immediately following the file's index number. An empty block checksum set is sent for new files and if --whole-file was specified.

The block size and, in later versions, the size of the block checksum are calculated on a per file basis according to the size of that file.

The Sender
The sender process reads the file index numbers and associated block checksum sets one at a time from the generator.
For each file id the generator sends it will store the block checksums and build a hash index of them for rapid lookup.

Then the local file is read and a checksum is generated for the block beginning with the first byte of the local file. This block checksum is looked for in the set that was sent by the generator, and if no match is found, the non-matching byte will be appended to the non-matching data and the block starting at the next byte will be compared. This is what is referred to as the “rolling checksum”

If a block checksum match is found it is considered a matching block and any accumulated non-matching data will be sent to the receiver followed by the offset and length in the receiver's file of the matching block and the block checksum generator will be advanced to the next byte after the matching block.

Matching blocks can be identified in this way even if the blocks are reordered or at different offsets. This process is the very heart of the rsync algorithm.

In this way, the sender will give the receiver instructions for how to reconstruct the source file into a new destination file. These instructions detail all the matching data that can be copied from the basis file (if one exists for the transfe), and includes any raw data that was not available locally. At the end of each file's processing a whole-file checksum is sent and the sender proceeds with the next file.

Generating the rolling checksums and searching for matches in the checksum set sent by the generator require a good deal of CPU power. Of all the rsync processes it is the sender that is the most CPU intensive.

The Receiver
The receiver will read from the sender data for each file identified by the file index number. It will open the local file (called the basis) and will create a temporary file.

The receiver will expect to read non-matched data and/or to match records all in sequence for the final file contents. When non-matched data is read it will be written to the temp-file. When a block match record is received the receiver will seek to the block offset in the basis file and copy the block to the temp-file. In this way the temp-file is built from beginning to end.

The file's checksum is generated as the temp-file is built. At the end of the file, this checksum is compared with the file checksum from the sender. If the file checksums do not match the temp-file is deleted. If the file fails once it will be reprocessed in a second phase, and if it fails twice an error is reported.

After the temp-file has been completed, its ownership and permissions and modification time are set. It is then renamed to replace the basis file.

Copying data from the basis file to the temp-file make the receiver the most disk intensive of all the rsync processes. Small files may still be in disk cache mitigating this but for large files the cache may thrash as the generator has moved on to other files and there is further latency caused by the sender. As data is read possibly at random from one file and written to another, if the working set is larger than the disk cache, then what is called a seek storm can occur, further hurting performance.

The Daemon
The daemon process, like many daemons, forks for every connection. On startup, it parses the rsyncd.conf file to determine what modules exist and to set the global options.
When a connection is received for a defined module the daemon forks a new child process to handle the connection. That child process then reads the rsyncd.conf file to set the options for the requested module, which may chroot to the module path and may drop setuid and setgid for the process. After that it will behave just like any other rsync server process adopting either a sender or receiver role.

The Rsync Protocol
A well-designed communications protocol has a number of characteristics.

Everything is sent in well defined packets with a header and an optional body or data payload.
In each packet's header a type and or command specified.
Each packet has a definite length.
In addition to these characteristics, protocols have varying degrees of statefulness, inter-packet independence, human readability, and the ability to reestablish a disconnected session.

Rsync's protocol has none of these good characteristics. The data is transferred as an unbroken stream of bytes. With the exception of the unmatched file-data, there are no length specifiers nor counts. Instead the meaning of each byte is dependent on its context as defined by the protocol level.

As an example, when the sender is sending the file list it simply sends each file list entry and terminates the list with a null byte. Within the file list entries, a bitfield indicates which fields of the structure to expect and those that are variable length strings are simply null terminated. The generator sending file numbers and block checksum sets works the same way.

This method of communication works quite well on reliable connections and it certainly has less data overhead than the formal protocols. It unfortunately makes the protocol extremely difficult to document, debug or extend. Each version of the protocol will have subtle differences on the wire that can only be anticipated by knowing the exact protocol version.

notes
This document is a work in progress. The author expects that it has some glaring oversights and some portions that may be more confusing than enlightening for some readers. It is hoped that this could evolve into a useful reference.
Specific suggestions for improvement are welcome, as would be a complete rewrite.

Friday, August 19, 2011

CDE Login Problem, Remote X11, X-Server, XDMCP login problem

Sometimes I got an error after finishing on Solaris 10 box installation. After make some configuration then suddenly I can’t access my Solaris XDMCP remote session on my laptop.. Usually, I use XManager Enterprise to get Solaris GUI remote session XDMCP. here the step-by-step to troubleshoot if you got the same problem:

{Make sure that svc:/application/graphical-login/cde-login is enabled and online.

root@solaris10 # svcs cde-login
STATE STIME FMRI
online Mar_02 svc:/application/graphical-login/cde-login:default

root@solaris10 #netservices limited

restarting syslogd
restarting sendmail
dtlogin needs to be restarted. Restart now? [Y] y
restarting dtlogin
{Check dtlogin process:

root@solaris10 # ps -ef | grep dtlogin

root 29384 1 0 Mar 02 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 0 [should be TCP, not UDP]

{Modify the x11-server service:

—–>Show properties:
#svcprop svc:/application/x11/x11-server

——>Turn on tcp listen:
#svccfg -s svc:/application/x11/x11-server setprop options/tcp_listen=true

{Modify the dtlogin service:

—–>Show properties:
#svcprop svc:/application/graphical-login/cde-login:default
#svccfg -s svc:/application/graphical-login/cde-login setprop dtlogin/args=\”\”

—–>Then restart the X server:
#svcadm refresh svc:/application/graphical-login/cde-login:default;
#svcprop -p dtlogin svc:/application/graphical-login/cde-login:default

root@solaris10 #netservices open

restarting syslogd
restarting sendmail

root@solaris10# svcadm restart cde-login
root@solaris10# ps -ef |grep dtlogin
root 27722 1 0 15:08:37 ? 0:00 /usr/dt/bin/dtlogin -daemon
root 27724 26297 0 15:08:43 pts/3 0:00 grep dtlogin

Wednesday, June 29, 2011

tar file size limit

Generally, tar can't handle files larger than 2GB. I suggest using an alternative to tar, 'star'. A more comprehensive answer is available here:

--->
The quick answer is: 2^63-1 for the archive, 68'719'476'735 (8^12-1)
bytes for each file, if your environment permits that.

as I understand your question, you want to know if you can produce tar
files that are biger than 2 GBytes (and how big you can really make
them). The answer to this question depends on a few simple parameters:

1) Does your operating system support large files?
2) What version of tar are you using?
3) What is the underlying file system?

You can answer question 1) for yourself by verifying that your kernel
supports 64bit file descriptors. For Linux this is the case for
several years now. A quick look in /usr/include/sys/features.h will
tell you, if there is any line containing 'FILE_OFFSET_BITS'. If there
is, your OS very very probably has support for large files.

For Solaris, just check whether 'man largefile' works, or try 'getconf
-a|grep LARGEFILE'. If it works, then you have support for large files
in the operating system. Again, support for large files has been there
for several years.

For other operating systems, try "man -k large file', and see what you
get -- I'll be gladly providing help if you need to ask for
clarification to this answer. Something like "cd /usr/include; grep
'FILE_OFFSET_BITS' * */*" should tell you quickly if there is standard
large file support.

2) What version of tar are you using? This is important. Obviously,
older tar programs won't be able to handle files or archives that are
larger than 2^31-1 bytes (2.1 Gigabytes). Try running 'tar --version'.
If the first line indicates you are using gnu tar, then any version
newer than 1.12.64 will in principle be able to provide you with large
files. Try to run this command: "strings `which tar`|grep 64", and you
should see some lines saying lseek64, creat64, fopen64. If yes, your
tar contains support for large files.

If your tar program does not contain support for large files (most
really do, but maybe you are working on a machine older than 1998?),
you can download the newest gnu tar from ftp://ftp.gnu.org/pub/gnu/tar
and compile it for yourself.

The size of files you put into a tar archive (not the archive itself)
is limited to 11 octal digits, the max. size of a single file is thus
ca. 68 GBytes.

3) Given that both your operating system (and C library), and tar
application support large files, the only really limiting factor is
the file system that you try to create the file in. The theoretical
limit for the tar archive size is 2^63-1 (9'223'372 Terabytes), but
you will reach more practical limits (disk or tape size) much quicker.
Also take into consideration what the file system is. DOS FAT 12
filesystems don't allow files as big as the Linux EXT2, or Sun UFS
file systems.

If you need more precise data (for a specific file system type, or for
the OS, etc.) please do not hesitate to ask for clarification.

I hope my answer is helpful to you,
Rajeev

Tuesday, February 8, 2011

Amazon EC2 Instance Types

Amazon EC2 Instance Types

Amazon Elastic Compute Cloud (Amazon EC2) provides the flexibility to choose from a number of different instance types to meet your computing needs. Each instance provides a predictable amount of dedicated compute capacity and is charged per instance-hour consumed.
Available Instance Types
Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge
Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro
High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge
High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge
Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge
Cluster GPU Instances

Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn more about use of this instance type for HPC applications.

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge
Selecting Instance Types

Amazon EC2 instances are grouped into six families: Standard, Micro, High-Memory, High-CPU, Cluster Compute, and Cluster GPU.

Standard Instances have memory to CPU ratios suitable for most general purpose applications; High-Memory instances offer larger memory sizes for high throughput applications, including database and memory caching applications; and High-CPU instances have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

Micro instances provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically. At steady state, Micro instances receive a fraction of the compute resources that Small instances do. Therefore, if your application has compute-intensive or steady state needs we recommend using a Small instance (or larger, depending on your needs). However, Micro instances can periodically burst up to 2 ECUs (for short periods of time). This is double the number of ECUs available from a Standard Small instance. Therefore, if you have a relatively low throughput application or web site with an occasional need to consume significant compute cycles, we recommend using Micro instances.

Cluster Compute instances provide a very large amount of CPU coupled with increased network performance making them well suited for High Performance Compute (HPC) applications and other demanding network-bound applications.

Cluster GPU instances provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance making them well suited for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications.

When choosing instance types, you should consider the characteristics of your application with regards to resource utilization and select the optimal instance family and size. One of the advantages of EC2 is that you pay by the instance hour, which makes it convenient and inexpensive to test the performance of your application on different instance families and types. One good way to determine the most appropriate instance family and instance type is to launch test instances and benchmark your application.
Measuring Compute Resources

Transitioning to a utility computing model fundamentally changes how developers have been trained to think about CPU resources. Instead of purchasing or leasing a particular processor to use for several months or years, you are renting capacity by the hour. Because Amazon EC2 is built on commodity hardware, over time there may be several different types of physical hardware underlying EC2 instances. Our goal is to provide a consistent amount of CPU capacity no matter what the actual underlying hardware.

Amazon EC2 uses a variety of measures to provide each instance with a consistent and predictable amount of CPU capacity. In order to make it easy for developers to compare CPU capacity between different instance types, we have defined an Amazon EC2 Compute Unit. The amount of CPU that is allocated to a particular instance is expressed in terms of these EC2 Compute Units. We use several benchmarks and tests to manage the consistency and predictability of the performance of an EC2 Compute Unit. One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also the equivalent to an early-2006 1.7 GHz Xeon processor referenced in our original documentation. Over time, we may add or substitute measures that go into the definition of an EC2 Compute Unit, if we find metrics that will give you a clearer picture of compute capacity.

To find out which instance will work best for your application, the best thing to do is to launch an instance and benchmark your own application. One of the advantages of EC2 is that you pay by the hour which makes it convenient and inexpensive to test the performance of your application on different instance types.
I/O Performance

Amazon EC2 provides virtualized server instances. While some resources like CPU, memory and instance storage are dedicated to a particular instance, other resources like the network and the disk subsystem are shared among instances. If each instance on a physical host tries to use as much of one of these shared resources as possible, each will receive an equal share of that resource. However, when a resource is under-utilized you will often be able to consume a higher share of that resource while it is available.

The different instance types will provide higher or lower minimum performance from the shared resources depending on their size. Each of the instance types has an I/O performance indicator (low, moderate, or high). Instance types with high I/O performance have a larger allocation of shared resources. Allocating larger share of shared resources also reduces the variance of I/O performance. For many applications, low or moderate I/O performance is more than enough. However, for those applications requiring greater or more consistent I/O performance, you may want to consider instances with high I/O performance.

Cluster Compute and Cluster GPU instances have very high I/O performance using 10 Gigabit Ethernet for high network throughput and reduced network latencies within clusters.

In addition, you can use Amazon EBS to get improved storage I/O performance for disk bound applications.

*The Small Instance type is equivalent to the original Amazon EC2 instance type that has been available since Amazon EC2 launched. This instance type is currently the default for all customers. To use other instance types, customers must specifically request them via the RunInstances API.

Rajeev Ranjan's /* Life Runs on Code*/