0x352 Application

1. Foundation
2. DNS
3. HTTP
4. Email
- 4.1. SMTP
- 4.2. POP3, IMAP
5. P2P
- 5.1. BitTorrent
- 5.2. Skype
6. Reference

1. Foundation

The application developer will use one of two predominant architectural paradigms:

client-server architecture: there is an always-on host, called the server, which services requests from other hosts, called clients (e.g.: Web, FTP, Telnet, email)
P2P architecture: applications communicate directly between pairs of intermittently connected hosts, called peers (e.g.: BitTorrent, eMule, Skype, PPLive)

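The compelling feature of P2P architectures is their self-scalability (each new peer adds service capacity as well as demand, so they do not require significant server infrastructure or bandwidth), but they face three major challenges:

ISP asymmetric bandwidth: residential access links offer much more downstream than upstream bandwidth, while P2P traffic needs upstream capacity
security: their highly distributed and open nature makes them hard to secure
users' incentives: users must be persuaded to contribute upload bandwidth, storage, and computation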
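Self-scalability can be made concrete with a common back-of-the-envelope model (not taken from this document's references): distributing a file of F bits to N peers, with server upload rate u_s, per-peer upload rate u, and slowest peer download rate d_min. The client-server lower bound is max(N·F/u_s, F/d_min), while the P2P bound is max(F/u_s, F/d_min, N·F/(u_s + N·u)). The sketch below assumes all peers upload at the same rate, and every number in it is hypothetical.

```python
# Lower bounds on the time to distribute a file of F bits to N peers.
# Simplifying assumption: every peer uploads at the same rate u (bits/s).


def client_server_time(n, file_bits, u_s, d_min):
    # The server alone must upload N copies; the slowest peer must download one.
    return max(n * file_bits / u_s, file_bits / d_min)


def p2p_time(n, file_bits, u_s, d_min, u):
    # The server uploads at least one copy; total upload capacity grows with N.
    return max(file_bits / u_s, file_bits / d_min, n * file_bits / (u_s + n * u))


if __name__ == "__main__":
    # Hypothetical values: 1 GB file, 30 Mbit/s server, 3 Mbit/s peers, 20 Mbit/s download.
    F, u_s, u, d_min = 8e9, 30e6, 3e6, 20e6
    for n in (10, 100, 1000):
        print(n, round(client_server_time(n, F, u_s, d_min)), round(p2p_time(n, F, u_s, d_min, u)))
```

With these numbers the client-server time grows linearly with N (about 2667, 26667 and 266667 seconds), while the P2P time stays below F/u, roughly 2667 seconds, no matter how many peers join.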

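The socket is the API between the application layer and the transport layer: the application developer controls everything on the application-layer side of the socket but has little control of the transport-layer side (essentially the choice of transport protocol and a few parameters such as buffer sizes).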
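A minimal sketch of the application side of a socket, using Python's standard socket module over localhost. The uppercasing "echo" service is a hypothetical toy; the point is that the application only picks the transport (SOCK_STREAM, i.e. TCP) and an address, then reads and writes bytes, while the handshake and retransmissions happen in the operating system below the socket.

```python
import socket
import threading


def echo_once(server_sock):
    """Toy service (hypothetical): accept one connection, reply with uppercased bytes."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def demo(message=b"hello"):
    # The application chooses TCP (SOCK_STREAM) and an address; connection setup,
    # retransmission, and congestion control are handled by the OS transport layer.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


if __name__ == "__main__":
    print(demo())   # b'HELLO'
```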

2. DNS

DNS is specified in RFC1034, RFC1035.

Originally, all ARPANET hosts were managed in HOSTS.TXT, a single text file. Its problems included latency in propagating updates, linear search complexity, etc.

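Currently DNS is implemented as a distributed, hierarchical database served by many name servers (common server software: BIND, Windows DNS Server). There are 13 root server identities in the world (a.root-servers.net through m.root-servers.net), each replicated across many sites via anycast. Each name is a node in an inverted tree, and the path to the node is written as labels separated by dots. Non-ASCII characters are encoded with Punycode. The dig command can be used to retrieve DNS records. Google Public DNS: 8.8.8.8, 8.8.4.4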
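The tree path and the Punycode step can be illustrated with Python's standard library. The helper names are hypothetical; the built-in "idna" codec implements IDNA 2003 (RFC 3490), which Punycode-encodes each non-ASCII label and prefixes it with "xn--".

```python
def labels_from_root(name):
    """Path from the root of the inverted tree down to `name` (hypothetical helper)."""
    return list(reversed(name.rstrip(".").split(".")))


def to_ascii(name):
    # Python's "idna" codec (IDNA 2003) converts each non-ASCII label to
    # "xn--" + Punycode and leaves ASCII labels unchanged.
    return name.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(labels_from_root("www.example.com."))  # ['com', 'example', 'www']
    print(to_ascii("bücher.example"))             # xn--bcher-kva.example
```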

2.1. Organizations

ICANN: root domain management
Verisign: operates the com and net TLDs and 2 root servers, including the A root server
IANA: part of ICANN; coordinates the allocation of IP addresses


2.2. Records

A resource record is a four-tuple

\[\text{(name, value, type, ttl)}\]

The TTL (time to live) says how long a name server may cache the record before discarding it; it is often set to 1 or 2 days.
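A minimal sketch of TTL-based caching in a name server, assuming a simple in-memory dictionary keyed by (name, type). The class and method names are hypothetical, and the clock is injectable so expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Toy name-server cache (hypothetical): a record is dropped once its TTL elapses."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, so expiry can be tested
        self._store = {}         # (name, type) -> (value, absolute expiry time)

    def put(self, name, rtype, value, ttl_seconds):
        self._store[(name, rtype)] = (value, self._clock() + ttl_seconds)

    def get(self, name, rtype):
        entry = self._store.get((name, rtype))
        if entry is None:
            return None                      # miss: a real server would query upstream
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[(name, rtype)]   # expired: evict and treat as a miss
            return None
        return value


if __name__ == "__main__":
    now = [0.0]                              # fake clock for the demo
    cache = TTLCache(clock=lambda: now[0])
    cache.put("www.example.com", "A", "192.0.2.10", ttl_seconds=86400)  # 1 day
    print(cache.get("www.example.com", "A"))  # 192.0.2.10
    now[0] = 86400.0
    print(cache.get("www.example.com", "A"))  # None
```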

Types:

A: maps a hostname to its 32-bit IPv4 address
AAAA: maps a hostname to its 128-bit IPv6 address
CNAME: maps an alias hostname to its canonical hostname
NS: gives the hostname of an authoritative name server for the domain, usually used together with an A record for that server (e.g.: foo.com -> dns.foo.com)
PTR: maps an IP address to a hostname (for reverse lookup)
MX: maps a domain to the canonical name of its mail server, so the mail server can be reached through a simple alias (e.g.: foo.com -> mail.bar.foo.com)
TXT: arbitrary text metadata about the domain (e.g.: SPF policies)

The Message format looks like this:

[Figure: DNS message format]
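As a sketch of the wire format, the function below builds a query message following RFC 1035: a 12-byte header (16-bit transaction ID, flags, and four section counts) followed by one question (QNAME as length-prefixed labels, QTYPE, QCLASS). Function names are hypothetical, and parsing responses (including name compression) is not covered.

```python
import struct

# Numeric TYPE codes from RFC 1035 (AAAA = 28 comes from RFC 3596).
QTYPE = {"A": 1, "NS": 2, "CNAME": 5, "PTR": 12, "MX": 15, "TXT": 16, "AAAA": 28}


def encode_name(name):
    # "www.example.com" -> b"\x03www\x07example\x03com\x00" (length-prefixed labels)
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label {label!r}: must be 1-63 bytes")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"                  # zero-length root label ends the name


def build_query(txid, name, rtype="A", recursion_desired=True):
    flags = 0x0100 if recursion_desired else 0x0000   # QR=0 (query); only RD may be set
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0 (big-endian)
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    # Question: QNAME, QTYPE, QCLASS (1 = IN, the Internet class)
    question = encode_name(name) + struct.pack("!HH", QTYPE[rtype], 1)
    return header + question
```

Sending it would then be a single UDP datagram to port 53, e.g. `sock.sendto(build_query(0x1234, "www.example.com"), ("8.8.8.8", 53))` on a SOCK_DGRAM socket.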

2.3. Resolution

[Figure: DNS resolution]

[Figure: DNS hierarchy]
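A local DNS server typically resolves a name iteratively: a root server refers it to a TLD server, which refers it to the domain's authoritative server, which returns the answer. The sketch below simulates that over a hypothetical in-memory three-level hierarchy; the server names and the 192.0.2.10 address (a documentation range) are made up, and no network traffic is involved.

```python
# Hypothetical hierarchy: each server either answers or refers to a more specific server.
SERVERS = {
    "root":           {"referrals": {"com": "tld-com"}, "answers": {}},
    "tld-com":        {"referrals": {"example.com": "ns.example.com"}, "answers": {}},
    "ns.example.com": {"referrals": {}, "answers": {"www.example.com": "192.0.2.10"}},
}


def query(server, name):
    """One non-recursive query: ('answer', address) or ('referral', next server)."""
    zone_data = SERVERS[server]
    if name in zone_data["answers"]:
        return "answer", zone_data["answers"][name]
    # Refer to the most specific zone that contains `name`.
    for zone in sorted(zone_data["referrals"], key=len, reverse=True):
        if name == zone or name.endswith("." + zone):
            return "referral", zone_data["referrals"][zone]
    raise LookupError(f"{server} has no answer or referral for {name}")


def resolve_iteratively(name, start="root", max_hops=10):
    """Follow referrals from the root down, as a local DNS server would."""
    server, path = start, [start]
    for _ in range(max_hops):
        kind, result = query(server, name)
        if kind == "answer":
            return result, path
        server = result
        path.append(server)
    raise LookupError("too many referrals")


if __name__ == "__main__":
    print(resolve_iteratively("www.example.com"))
    # ('192.0.2.10', ['root', 'tld-com', 'ns.example.com'])
```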

2.4. Linux Implementation

A hostname is first looked up in /etc/hosts; if it is not found there, the resolver queries the default name servers configured in /etc/resolv.conf (on glibc-based systems the order of sources is set in /etc/nsswitch.conf)
client: the libresolv library (shipped with libc) provides the standard client implementation
server: the de facto standard server implementation is BIND 9

2.5. Windows Implementation

The hosts file is located at %SystemRoot%\System32\drivers\etc\hosts

2.6. Security

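DNS cache poisoning: an attacker gets a resolver to cache a forged record by answering the resolver's query with a spoofed response before the legitimate server does (e.g.: a birthday attack on the transaction ID)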
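Why the birthday attack helps can be sketched with a simplified probability model for the 16-bit transaction ID (65536 values), assuming spoofed IDs are chosen uniformly at random and ignoring the UDP source-port randomization that modern resolvers add. With one outstanding query, each forged reply matches with probability 1/65536; if the attacker triggers n outstanding queries (assumed to have distinct IDs) and sends n forged replies, each reply matches with probability n/65536. The helper names are hypothetical.

```python
ID_SPACE = 2 ** 16  # the DNS transaction ID is a 16-bit field


def p_single(forged):
    """One outstanding query, `forged` spoofed replies with random IDs."""
    return 1 - (1 - 1 / ID_SPACE) ** forged


def p_birthday(n):
    """n outstanding queries (distinct IDs assumed) and n random spoofed replies."""
    return 1 - (1 - n / ID_SPACE) ** n


def packets_for(p_target, prob):
    """Smallest count for which prob(count) reaches p_target (hypothetical helper)."""
    count = 1
    while prob(count) < p_target:
        count += 1
    return count


if __name__ == "__main__":
    print(packets_for(0.5, p_single))    # tens of thousands of forged replies
    print(packets_for(0.5, p_birthday))  # only a couple of hundred queries + replies
```

Under this model, a 50% success chance drops from about 45,000 forged replies to roughly 200 queries plus 200 replies, which is why randomizing the source port in addition to the ID became standard.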

3. HTTP

HTTP is a stateless protocol: the server does not maintain any information about the client's past requests.

HTTP can use either non-persistent or persistent connections (persistent is the default in HTTP/1.1):

non-persistent connections: each request/response pair is sent over a separate TCP connection (the connection is closed after every response)
persistent connections: multiple requests/responses are sent over the same TCP connection (the server keeps the connection open)

The benefit of persistent connections is that each object after the first needs only 1 RTT (round-trip time), whereas non-persistent connections need 2 RTTs for every object: one to set up the TCP connection (handshake) and one for the request/response.
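The RTT counting can be sketched for a page made of several objects, under simplifying assumptions: transmission time is ignored, objects are fetched one at a time (no parallel connections, no pipelining), and the RTT value is hypothetical.

```python
def non_persistent_rtts(num_objects):
    # Each object: 1 RTT for the TCP handshake + 1 RTT for request/response.
    return 2 * num_objects


def persistent_rtts(num_objects):
    # One handshake, then 1 RTT per object on the reused connection
    # (assumes requests are sent one at a time, i.e. no pipelining).
    return 1 + num_objects


if __name__ == "__main__":
    rtt_ms = 50                               # hypothetical round-trip time
    n = 11                                    # base HTML page + 10 images
    print(non_persistent_rtts(n) * rtt_ms)    # 1100
    print(persistent_rtts(n) * rtt_ms)        # 600
```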

3.1. Versions

The major versions are:

HTTP/1.0 (RFC1945)
HTTP/1.1 (RFC2616)
HTTP/2 (RFC7540)
HTTP/3 (RFC9114)

3.1.1. HTTP/1.0

For every TCP connection, there is only 1 request and 1 response.

3.1.2. HTTP/1.1

It supports connection reuse (i.e., a single TCP connection can carry multiple requests and responses).

3.1.3. HTTP/2

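It uses multiplexing over a single TCP connection: messages are split into frames, and frames belonging to different resources are interleaved, so several resources arrive at the client concurrently rather than one after another.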

3.1.4. HTTP/3

HTTP/3 uses QUIC, a multiplexed transport protocol built on UDP

3.2. Message

3.2.1. Request Message

Note that HTTP/2 uses a different, binary format: messages are carried in frames instead of the text messages shown here.

In HTTP/1.x, a request message looks like this:

GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr

where the first line is the request line, containing

method field
URL field
version

The subsequent lines are header lines; some examples are

Accept-Language: language preference
Connection: close tells the server to close the connection after sending the response (no persistent connection)
User-Agent: identifies the client (e.g.: the browser type and version)
Content-Type: media type of the body (e.g.: application/json)

After the header lines and a blank line comes the entity body (typically empty for GET).
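On the wire, every line of an HTTP/1.x message ends with CRLF, and an empty line separates the headers from the body. A minimal sketch of assembling a request as bytes (the function name is hypothetical; it adds Content-Length only when a body is given):

```python
CRLF = "\r\n"


def build_request(method, path, headers, body=b"", version="HTTP/1.1"):
    """Assemble an HTTP/1.x request message (hypothetical helper)."""
    request_line = f"{method} {path} {version}"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    if body:
        header_lines.append(f"Content-Length: {len(body)}")  # size in bytes
    # Lines are CRLF-terminated; the extra CRLF is the blank line ending the headers.
    head = CRLF.join([request_line, *header_lines]) + CRLF + CRLF
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "/somedir/page.html",
                        {"Host": "www.someschool.edu", "Connection": "close"}))
```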

3.2.2. Response Message

HTTP/1.1 200 OK
Connection: close
Date: Tue, 18 Aug 2015 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data data ...)

It has a status line, header lines, and an entity body.

The status line contains

protocol version
status code (e.g: 200)
status message (e.g.: OK)

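The header lines might include

Connection: close tells the client that the server is going to close the connection after this message
Date: when the response was created at the server
Server: identifies the server software (the counterpart of User-Agent)
Last-Modified: when the object was last modified
Content-Length: the size of the body in bytes
Content-Type: media type of the body (e.g.: text/html)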
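The response structure can be made concrete with a small parser sketch (hypothetical helper): split at the first blank line, read the three status-line fields, then split each header at its first colon only, since values such as Date contain colons. It keeps only the last value of a repeated header and does not enforce Content-Length.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into status fields, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")        # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *rest = status_line.split(" ", 2)  # e.g. HTTP/1.1, 200, OK
    reason = rest[0] if rest else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")           # split at the first ':' only
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body
```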

3.3. Methods

3.3.1. POST

The Content-Type header indicates how the data in the body is encoded. Common choices are:

application/x-www-form-urlencoded

# reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST

POST /test HTTP/1.1
Host: foo.example
Content-Type: application/x-www-form-urlencoded
Content-Length: 27

field1=value1&field2=value2

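urlencoded means both keys and values are encoded as in a URL query string: pairs are separated by '&', each key and value by '=', reserved characters are percent-encoded (e.g.: '&' becomes '%26'), and a space becomes '+'.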
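Python's standard urllib.parse module can reproduce this encoding; the field values in the second example are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

fields = {"field1": "value1", "field2": "value2"}
body = urlencode(fields)
print(body, len(body.encode()))    # field1=value1&field2=value2 27

# Reserved characters are percent-encoded and spaces become '+'.
tricky = urlencode({"note": "a b&c"})
print(tricky)                      # note=a+b%26c
print(parse_qs(tricky))            # {'note': ['a b&c']}
```

The 27 bytes match the Content-Length of the example request above.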

multipart/form-data

# reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST

POST /test HTTP/1.1
Host: foo.example
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Disposition: form-data; name="field1"

value1
--boundary
Content-Disposition: form-data; name="field2"; filename="example.txt"

value2
--boundary--
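A sketch of how such a body is assembled, assuming a fixed boundary string for readability (real clients generate a random boundary that does not occur in the data). Each part starts with "--" plus the boundary, lines end with CRLF, and the closing delimiter has a trailing "--". The helper name and its arguments are hypothetical.

```python
CRLF = b"\r\n"


def build_multipart(fields, files, boundary="boundary"):
    """fields: {name: str}; files: {name: (filename, bytes)} (hypothetical helper)."""
    delimiter = b"--" + boundary.encode()
    parts = []
    for name, value in fields.items():
        disposition = f'Content-Disposition: form-data; name="{name}"'.encode()
        parts.append(delimiter + CRLF + disposition + CRLF + CRLF + value.encode() + CRLF)
    for name, (filename, data) in files.items():
        disposition = f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode()
        parts.append(delimiter + CRLF + disposition + CRLF + CRLF + data + CRLF)
    body = b"".join(parts) + delimiter + b"--" + CRLF   # closing delimiter
    content_type = f'multipart/form-data; boundary="{boundary}"'
    return content_type, body


if __name__ == "__main__":
    ctype, body = build_multipart({"field1": "value1"}, {"field2": ("example.txt", b"value2")})
    print(ctype)
    print(body.decode())
```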

application/json: the body is a JSON document

# reference: https://www.ibm.com/docs/en/cics-ts/5.2?topic=samples-example-http-request-json-body

POST /genapp/customers/ HTTP/1.1
Host: www.example.com
Content-Type: application/json
Content-Length: nn

{
  "customers":
  {
    "firstName": "Joe",
    "lastName": "Bloggs",
    "fullAddress":
    {
      "streetAddress": "21 2nd Street",
      "city": "New York",
      "state": "NY",
      "postalCode": 10021
    }
  }
}
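The nn in Content-Length stands for the body size in bytes. A minimal sketch of producing this request with Python's json module, which also guarantees valid JSON (straight quotes only); it assumes an HTTP/1.1 request line and UTF-8 encoding.

```python
import json

customer = {
    "customers": {
        "firstName": "Joe",
        "lastName": "Bloggs",
        "fullAddress": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": 10021,
        },
    }
}

body = json.dumps(customer).encode("utf-8")
head = (
    "POST /genapp/customers/ HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/json\r\n"
    f"Content-Length: {len(body)}\r\n"   # byte length of the encoded body
    "\r\n"                               # blank line: end of headers
)
request = head.encode("ascii") + body
```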

POST from form

When the POST is triggered from a <form> element, the encoding is configured with <form enctype='value'>, where value can only be application/x-www-form-urlencoded (the default), multipart/form-data, or text/plain.

Other content types, such as application/json, cannot be sent from a plain form; they have to be sent with other mechanisms such as XMLHttpRequest or the Fetch API.

3.3.2. PUT

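The difference between PUT and POST is that PUT is idempotent: calling it once or several times in a row has the same effect, whereas repeating an identical POST may have additional effects (e.g., creating the same order several times). (Recall that this property shows up in lots of places, for example, applying a projection operator twice gives the same result as applying it once.)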
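Idempotence can be illustrated with a hypothetical in-memory server state: PUT stores a representation at a path chosen by the client, so repeating it leaves the state unchanged, while this particular POST handler creates a new resource on every call (one common use of POST, not the only one).

```python
import itertools


class ResourceStore:
    """Hypothetical in-memory server state to contrast PUT and POST."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, path, representation):
        # PUT: the client names the target; repeating it yields the same state.
        self.resources[path] = representation

    def post(self, collection, representation):
        # POST (as used here): the server creates a new resource on every call.
        path = f"{collection}/{next(self._ids)}"
        self.resources[path] = representation
        return path
```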

3.4. Cookies

HTTP itself is a stateless protocol; cookies, defined in RFC6265, let a server identify users across requests.

A cookie consists of a name and a value (a site's cookies are basically a map), typically something like SESS_ID, plus optional attributes:

Domain and Path: restrict the hosts and URL paths the cookie is sent to
Expires / Max-Age: set when the cookie expires (when omitted, the cookie is a session cookie, which gets deleted when the browser is closed)
HttpOnly, Secure, SameSite: hide the cookie from JavaScript, send it only over HTTPS, and control whether it is sent on cross-site requests

The cookie technology has 4 components:

cookie header line in the response message (e.g: Set-Cookie: 1678)
cookie header line in the request message (e.g: Cookie: 1678)
cookie file on the client
backend db managing cookie on the server
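The first two components can be sketched with Python's standard http.cookies module: the server builds a Set-Cookie header, and on a later request parses the Cookie header sent back by the browser. The cookie names and values are hypothetical.

```python
from http.cookies import SimpleCookie

# Response side: Set-Cookie header for a new session.
outgoing = SimpleCookie()
outgoing["SESS_ID"] = "1678"
outgoing["SESS_ID"]["path"] = "/"
outgoing["SESS_ID"]["httponly"] = True   # not readable from JavaScript
outgoing["SESS_ID"]["secure"] = True     # only sent over HTTPS
print(outgoing.output())                 # a single "Set-Cookie: SESS_ID=1678; ..." line

# Request side: parse the Cookie header the browser sent back.
incoming = SimpleCookie()
incoming.load("SESS_ID=1678; theme=dark")
print(incoming["SESS_ID"].value)         # 1678
```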


4. Email

There are three major components in an email system:

user agents (e.g: Outlook)
mail servers
SMTP


The mail message has a format similar to the HTTP message: it consists of header lines and a body, separated by a blank line (CRLF).

An example of the header:

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.
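The header/body layout can be reproduced with Python's standard email package, here assuming a hypothetical one-line body. Serializing with policy.SMTP uses CRLF line endings, as on the wire, so the first CRLF CRLF marks the end of the headers.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "Searching for the meaning of life."
msg.set_content("Hi Bob, what is it?")   # hypothetical body text

wire = msg.as_bytes(policy=policy.SMTP)  # CRLF line endings
header_block, _, body = wire.partition(b"\r\n\r\n")   # blank line separates them
print(header_block.decode())
print(body.decode())
```

set_content also adds MIME headers (Content-Type, Content-Transfer-Encoding, MIME-Version) after the three shown above.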

4.1. SMTP

SMTP (RFC 5321) uses TCP to transfer mail from the sender's mail server to the recipient's mail server. It can also be used to transfer mail from the sender's user agent to the sender's mail server.

It commonly uses port 25. Unlike HTTP (a pull protocol), SMTP is a push protocol: the TCP connection is initiated by the machine that wants to send the mail.

One constraint of SMTP is that it restricts the body of all mail messages to simple 7-bit ASCII (a legacy of its introduction in 1982), so binary data such as attachments must be encoded, e.g., with MIME.

The main SMTP commands are

HELO (abbreviation for HELLO)
MAIL FROM
RCPT TO
DATA
QUIT
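A sketch of the client side of one SMTP transaction, producing the command lines in order. It is a hypothetical helper that talks to no server: in a real session the client waits for a reply code (e.g. 250, 354) after each command, and Python's smtplib (e.g. `smtplib.SMTP(host).send_message(msg)`) handles that exchange. The message text ends with a line containing only ".", so lines that start with "." are "dot-stuffed" with an extra ".".

```python
def smtp_client_lines(helo_host, sender, recipients, message):
    """Client commands for one SMTP transaction (sketch; server replies omitted)."""
    lines = [f"HELO {helo_host}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for line in message.splitlines():
        # Dot-stuffing: prevent a data line from looking like the end marker.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]   # a line with a single "." ends the DATA section
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(smtp_client_lines("crepes.fr", "alice@crepes.fr",
                            ["bob@hamburger.edu"], "Do you like ketchup?"))
```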

4.2. POP3, IMAP

POP3 and IMAP are mail access protocols: the user reads email with a client app that retrieves it from the mail server, instead of logging into the mail server.

POP3 (RFC1939) is a simple, command-based protocol that progresses through three phases:

authorization: the user agent sends the USER and PASS commands; the server replies with +OK or -ERR
transaction: the user agent retrieves messages and can mark messages for deletion (LIST, RETR, DELE commands)
update: after the user agent sends the QUIT command, the server deletes the marked messages

IMAP (RFC3501) is a much more complex mail access protocol. For example, IMAP lets users create folders on the server and move messages between them (again, of course, by using commands).

It also lets users download parts of a message (e.g., only the headers), which helps on low-bandwidth connections.

5. P2P

5.1. BitTorrent

The collection of all peers participating in the distribution of a particular file is called a torrent. A file is divided into equal-size chunks (typically 256 KB). Over time each peer accumulates more and more chunks.

The operating steps are roughly:

When a new peer joins a torrent, it registers itself with the tracker, which returns a random subset of the participating peers; the peers it connects to over TCP become its neighboring peers.

When downloading:

It periodically asks each neighbor for the list of chunks the neighbor has.
It requests the chunks it is missing using the rarest-first technique (chunks with the fewest copies among its neighbors first).

When uploading:

It determines the four neighbors that are uploading to it at the highest rate and gives them priority by sending them chunks (these peers are unchoked).
Every 30 seconds, it picks 1 additional neighbor at random and sends it chunks (optimistic unchoking).
Every 10 seconds, it recalculates the rates and possibly modifies the set of 4 peers.
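The two decisions above can be sketched as plain functions covering a single round; the 10-second and 30-second timers are left out, and the function and peer names are hypothetical. Rarest-first ranks missing chunks by how many neighbors hold them; unchoking keeps the four fastest uploaders plus one random "optimistic" peer.

```python
import random
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Chunks we lack, ordered by how few neighbors have them (ties: chunk id)."""
    counts = Counter(c for chunks in neighbor_chunks.values() for c in chunks)
    missing = [c for c in counts if c not in my_chunks]
    return sorted(missing, key=lambda c: (counts[c], c))


def choose_unchoked(upload_rates, k=4, rng=random):
    """Top-k peers by the rate they upload to us, plus one optimistic random peer."""
    top = sorted(upload_rates, key=upload_rates.get, reverse=True)[:k]
    others = [p for p in upload_rates if p not in top]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


if __name__ == "__main__":
    neighbors = {"A": {0, 1, 2}, "B": {0, 1}, "C": {0, 3}}
    print(rarest_first({0}, neighbors))     # [2, 3, 1]
    rates = {"p1": 5, "p2": 9, "p3": 1, "p4": 7, "p5": 3, "p6": 8}
    print(choose_unchoked(rates, rng=random.Random(0)))
```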

Applications

aria2 can be used as a BitTorrent client on Linux

5.2. Skype

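Each user is assigned to a super peer; the super peers maintain an index that maps Skype usernames to IP addresses.

When Alice wants to call Bob, each of them is already connected to their respective super peer from when they logged in. When Bob picks up the call, both are redirected to a relay node, which forwards the call between them. This architecture is designed to work around the NAT restrictions of home routers, which prevent outside hosts from initiating connections to hosts behind them.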

6. Reference

[1] Fall, Kevin R., and W. Richard Stevens. TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, 2011.

[2] MDN Docs: https://developer.mozilla.org/en-US/