0x352 Application
1. Foundation
The application developer will use one of the two predominant architectural paradigms:
- client-server architecture: there is an always-on host, called the server, which services requests from other hosts, called clients (e.g: Web, FTP, Telnet, email)
- P2P architecture: the application communicates directly between pairs of connected hosts, called peers (e.g: BitTorrent, eMule, Skype, PPLive)
The compelling feature of P2P architectures is their self-scalability (e.g: they do not require significant server infrastructure or bandwidth), but they face three major challenges:
- ISP asymmetrical bandwidth
- security
- users' incentives
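The API between the Application layer and the Transport layer is the socket. The application developer controls everything on the application-layer side of the socket but has little control over the transport-layer side (mainly the choice of transport protocol and a few parameters such as buffer sizes).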
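A minimal sketch of the socket interface in the client-server style, assuming a loopback echo server on a port picked by the OS. The function names `run_echo_server` and `echo_once` are hypothetical and only serve this illustration; the only transport-level choice the code makes is TCP (`SOCK_STREAM`).

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one client and echo back whatever it sends."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_once(message: bytes) -> bytes:
    """Start a loopback server, send one message as a client, return the reply."""
    # Server side: a host that listens and waits for requests.
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=run_echo_server, args=(server_sock,))
    worker.start()

    # Client side: connects, sends a request, reads the response.
    with socket.create_connection(("127.0.0.1", port)) as client_sock:
        client_sock.sendall(message)
        reply = client_sock.recv(1024)

    worker.join()
    server_sock.close()
    return reply
```

Calling `echo_once(b"hello")` should return the same bytes; everything below the socket calls (segmentation, retransmission, congestion control) is handled by the transport layer.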
2. DNS
DNS is specified in RFC 1034 and RFC 1035.
Originally, a single text file, HOSTS.TXT, listed all hosts on the ARPANET. Its problems included update latency and linear search complexity.
Currently DNS is implemented as a distributed database (e.g.: BIND, Windows DNS). There are 13 root server names (a through m), each served by many instances around the world. Each name is a node in an inverted tree, and the path to the node is written as labels separated by dots. Non-ASCII characters are translated into Punycode. The dig command can be used to retrieve DNS records. Google public DNS: 8.8.8.8, 8.8.4.4
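As a small illustration of the Punycode translation, the sketch below uses the `idna` codec from Python's standard library (which implements the older IDNA 2003 rules). The function name `to_ascii_hostname` and the sample hostname are assumptions made for this example.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


# "b\u00fccher" is "bucher" written with a u-umlaut.
print(to_ascii_hostname("b\u00fccher.example"))  # xn--bcher-kva.example
```

Labels that are already ASCII pass through unchanged, so the function can be applied to any hostname before it is sent in a DNS query.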
2.1. Organizations
- ICANN: root domain management
- Verisign: com, net, and 2 root servers, including the A root server
- IANA: part of ICANN, assigns IP addresses
2.2. Records
A resource record is a four-tuple: (Name, Value, Type, TTL).
The TTL controls how long the record may stay in the DNS cache of each name server. It is typically 1 or 2 days.
Types:
- A: 32 bit for IPv4 (domain -> ip)
- AAAA: 128 bit for IPv6 (domain -> ip)
- CNAME: map alias domain name to its canonical domain name
- NS: hostname of the authoritative name server for the target domain; used together with an A record that gives that server's IP (e.g: foo.com -> dns.foo.com)
- PTR: map IP to domain (for reverse lookup)
- MX: canonical name of the mail server for a domain, which lets the mail server hide behind an alias hostname (e.g: foo.com -> mail.bar.foo.com)
- TXT: free-form text metadata about the domain
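For the PTR type, the name that is actually queried is the IP address written in reverse under a special domain. A minimal sketch using the standard `ipaddress` module follows; the function name `ptr_query_name` and the documentation address 192.0.2.1 are assumptions for illustration.

```python
import ipaddress


def ptr_query_name(ip: str) -> str:
    """Return the domain name queried for a reverse (PTR) lookup of ip."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_query_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```

IPv6 addresses work the same way, one hex digit per label, under ip6.arpa.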
The Message format looks like this:
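As a minimal sketch of that layout (per RFC 1035: a 12-byte header of six 16-bit fields, followed by the question section), the following Python builds a query message by hand. The helper names and the decision to set only the recursion-desired flag are assumptions for illustration, not a complete DNS client.

```python
import struct

HEADER_FORMAT = "!HHHHHH"  # six 16-bit big-endian fields, 12 bytes in total
RD_FLAG = 0x0100           # "recursion desired" bit in the flags field
TYPE_A = 1                 # query type for an IPv4 address record
CLASS_IN = 1               # the Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(name: str, query_id: int, qtype: int = TYPE_A) -> bytes:
    """Build a query: header (ID, flags, 1 question, 0 other records) + question."""
    header = struct.pack(HEADER_FORMAT, query_id, RD_FLAG, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    return header + question


def parse_header(message: bytes) -> dict:
    """Split the first 12 bytes of a DNS message into its named fields."""
    fields = struct.unpack(HEADER_FORMAT, message[:12])
    keys = ("id", "flags", "qdcount", "ancount", "nscount", "arcount")
    return dict(zip(keys, fields))
```

The 16-bit `id` field is the transaction id that a response must echo back; answers, authority records, and additional records follow the question section in a response.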
2.3. Resolution
2.4. Linux Implementation
- a hostname is first looked up in /etc/hosts; if it is not found there, the resolver queries the default name servers configured in /etc/resolv.conf
- client: the libresolv library (part of glibc) provides the standard client implementation
- server: the standard server implementation is BIND 9
2.5. Windows Implementation
- the hosts file is stored at %SystemRoot%\System32\drivers\etc\hosts
2.6. Security
- DNS Cache Poisoning: an attacker injects forged responses into a resolver's cache (e.g: birthday attack on the 16-bit transaction id)
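To see why the birthday variant matters, here is a back-of-the-envelope sketch under a deliberately simplified model: transaction ids are uniformly random 16-bit values, the resolver has several queries for the same name in flight with distinct ids, and source-port randomization is ignored. The function names and the model itself are assumptions, not a description of any specific exploit.

```python
ID_SPACE = 2 ** 16  # number of possible 16-bit transaction ids


def single_query_success(spoofed: int) -> float:
    """Chance that one of `spoofed` distinct guesses matches a single query id."""
    return min(spoofed, ID_SPACE) / ID_SPACE


def birthday_success(outstanding: int, spoofed: int) -> float:
    """Chance that at least one of `spoofed` random guesses matches any of
    `outstanding` in-flight query ids (assumed distinct)."""
    miss_one_guess = 1 - min(outstanding, ID_SPACE) / ID_SPACE
    return 1 - miss_one_guess ** spoofed


for n in (100, 300, 700):
    print(n, round(single_query_success(n), 4), round(birthday_success(n, n), 4))
```

Under these assumptions a few hundred spoofed packets against a few hundred outstanding queries already give a high chance of a match, whereas the same number of guesses against a single query rarely succeeds.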
3. HTTP
HTTP is a stateless protocol, which means the server side does not remember any state info of the client.
HTTP can use either non-persistent connections or persistent connections (the default in HTTP/1.1)
- non-persistent connections: each request/response pair is sent over a separate TCP connection (the connection is closed every time)
- persistent connections: multiple requests/responses use the same TCP connection (the server keeps the connection alive)
The benefit of a persistent connection is that each request after the first costs only 1 RTT (round-trip time), while a non-persistent connection costs 2 RTTs per request (1 for the TCP handshake and 1 for the request/response)
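The RTT arithmetic can be made concrete with a short sketch. It assumes requests are issued one after another (no pipelining, no parallel connections) and ignores transmission time; the function names are hypothetical.

```python
def non_persistent_rtts(num_objects: int) -> int:
    """Every object pays 1 RTT for the TCP handshake plus 1 RTT for the request."""
    return 2 * num_objects


def persistent_rtts(num_objects: int) -> int:
    """Only the first object pays for the handshake; the rest cost 1 RTT each."""
    if num_objects == 0:
        return 0
    return 2 + (num_objects - 1)


# A page made of 1 HTML file and 10 images is 11 objects.
print(non_persistent_rtts(11))  # 22
print(persistent_rtts(11))      # 12
```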
3.1. Versions
The major versions are:
- HTTP/1.0 (RFC 1945)
- HTTP/1.1 (RFC 2616, since superseded by RFC 9110 and RFC 9112)
- HTTP/2 (RFC 7540, since superseded by RFC 9113)
- HTTP/3 (RFC 9114)
3.1.1. HTTP/1.0
For every TCP connection, there is only 1 request and 1 response.
3.1.2. HTTP/1.1
It supports connection reuse (i.e: for every TCP connection, there can be multiple requests and responses)
3.1.3. HTTP/2
Uses multiplexing (multiple requests and responses are interleaved as frames over a single TCP connection, so resources can be delivered to the client concurrently)
3.1.4. HTTP/3
HTTP/3 uses QUIC, a multiplexed transport protocol built on UDP
3.2. Message
3.2.1. Request Message
Note that HTTP/2 has a different format: it uses binary frames instead of the text messages shown here.
For HTTP versions before 2, an example request message is as follows:
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr
where the first line is the request line, containing
- method field
- URL field
- version
The subsequent lines are header lines. Some examples are
- Accept-Language: language preference
- Connection: the value close tells the server not to keep the connection persistent
- User-agent: identifies the browser
- Content-type: media type of the body (e.g: application/json)
After the header lines and a blank line, there is the entity body
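The structure described above can be sketched as a small parser. This is a minimal illustration for well-formed HTTP/1.x text requests only; the function name `parse_request` and the returned dictionary layout are assumptions, and real servers handle many more edge cases.

```python
def parse_request(raw: str) -> dict:
    """Split an HTTP/1.x request into request line, headers, and entity body."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: method, URL, version, separated by single spaces.
    method, url, version = lines[0].split(" ")

    # Header lines: "Name: value"; header names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {"method": method, "url": url, "version": version,
            "headers": headers, "body": body}


example = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "Connection: close\r\n"
    "User-agent: Mozilla/5.0\r\n"
    "Accept-language: fr\r\n"
    "\r\n"
)
parsed = parse_request(example)
print(parsed["method"], parsed["url"], parsed["headers"]["host"])
```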
3.2.2. Response Message
HTTP/1.1 200 OK
Connection: close
Date: Tue, 18 Aug 2015 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data ...)
It has a status line, header lines, and an entity body.
The status line contains
- protocol version
- status code (e.g: 200)
- status message (e.g.: OK)
The header lines might be
- Connection: the value close tells the client that the server is going to close the connection
- Date: when the response was created at the server
- Server: server software info (the counterpart of User-Agent)
- Content-Type: media type of the entity body
3.3. Methods
3.3.1. POST
Content-Type is used to distinguish the different ways of sending data.
application/x-www-form-urlencoded
# reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST
POST /test HTTP/1.1
Host: foo.example
Content-Type: application/x-www-form-urlencoded
Content-Length: 27

field1=value1&field2=value2
urlencoded means both keys and values are percent-encoded as in a URL query string (a space is represented as '+'; '%20' is also accepted when decoding)
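A minimal sketch of this encoding with Python's standard `urllib.parse` module. The function name `encode_form` is hypothetical; note that `urlencode` writes a space as '+', while `parse_qs` accepts both '+' and '%20'.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields: dict) -> bytes:
    """Encode form fields as an application/x-www-form-urlencoded body."""
    return urlencode(fields).encode("ascii")


body = encode_form({"field1": "value1", "field2": "value2"})
print(body, len(body))             # b'field1=value1&field2=value2' 27

# Reserved characters are escaped so they cannot be confused with separators.
print(urlencode({"q": "a b&c"}))   # q=a+b%26c
print(parse_qs("q=a%20b"))         # {'q': ['a b']}
```

The length printed for the first body matches the Content-Length header in the example request above.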
multipart/form-data
# reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST
POST /test HTTP/1.1
Host: foo.example
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Disposition: form-data; name="field1"

value1
--boundary
Content-Disposition: form-data; name="field2"; filename="example.txt"

value2
--boundary--
application/json
The body is sent as JSON.
# reference: https://www.ibm.com/docs/en/cics-ts/5.2?topic=samples-example-http-request-json-body
POST /genapp/customers/
Host: www.example.com
Content-Type: application/json
Content-Length: nn

{
  "customers":
  {
    "firstName": "Joe",
    "lastName": "Bloggs",
    "fullAddress":
    {
      "streetAddress": "21 2nd Street",
      "city": "New York",
      "state": "NY",
      "postalCode": 10021
    }
  }
}
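A minimal sketch of how such a request could be assembled, with the Content-Length computed from the encoded body. The function name `build_json_post`, the HTTP/1.1 version token, and the sample payload are assumptions for illustration; in practice an HTTP client library would do this.

```python
import json


def build_json_post(path: str, host: str, payload: dict) -> bytes:
    """Serialize payload as JSON and wrap it in a raw HTTP/1.1 POST request."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"  # length in bytes, not characters
        "\r\n"
    )
    return head.encode("ascii") + body


request = build_json_post(
    "/genapp/customers/",
    "www.example.com",
    {"customers": {"firstName": "Joe", "lastName": "Bloggs"}},
)
print(request.decode("utf-8"))
```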
POST from form
When the POST is triggered from inside a form tag, the encoding is configured with <form enctype='value'>, where value can only be application/x-www-form-urlencoded (default), multipart/form-data, or text/plain.
Other Content-Types such as application/json cannot be sent from a form. Those have to be sent by other means such as XMLHttpRequest.
3.3.2. PUT
The difference between PUT and POST is that PUT is idempotent: calling it once or several times has the same effect. (Recall that this property also shows up in lots of other places, for example, the projection operator.)
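The difference can be illustrated with a toy in-memory store. This is only a sketch under the usual convention that PUT writes to a URL chosen by the client while POST asks the server to create a new resource; the class `ResourceStore` and its methods are hypothetical.

```python
class ResourceStore:
    """Toy server-side store used to contrast PUT and POST semantics."""

    def __init__(self) -> None:
        self.items = {}
        self._next_id = 1

    def put(self, key: str, value: str) -> None:
        """PUT: create or replace the resource at a client-chosen key."""
        self.items[key] = value

    def post(self, value: str) -> str:
        """POST: create a new resource under a server-chosen key."""
        key = str(self._next_id)
        self._next_id += 1
        self.items[key] = value
        return key


put_store = ResourceStore()
put_store.put("profile", "alice")
put_store.put("profile", "alice")   # repeating the call changes nothing
print(len(put_store.items))         # 1

post_store = ResourceStore()
post_store.post("alice")
post_store.post("alice")            # repeating the call creates a duplicate
print(len(post_store.items))        # 2
```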
3.4. Cookie
HTTP itself is a stateless protocol, so cookies are used by the server to identify users. Cookies are defined in RFC 6265.
- A cookie consists of a name and a value (basically a map entry). It typically carries something like SESS_ID, and can be qualified by attributes such as Domain
- Domain and Path attributes can be used to restrict the cookie to a given site and route
- The Expires (or Max-Age) attribute can be used to expire the cookie (when omitted, the cookie becomes a session cookie, which is deleted when the browser is closed)
- It can also have flags such as HttpOnly, Secure, SameSite
Cookie technology has 4 components:
- cookie header line in the response message (e.g: Set-Cookie: 1678)
- cookie header line in the request message (e.g: Cookie: 1678)
- cookie file on the client
- back-end database managing cookies on the server
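The two header lines can be sketched with Python's standard `http.cookies` module. The cookie name SESS_ID, the attribute choices, and the function names are assumptions for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie(session_id: str) -> str:
    """Build the Set-Cookie header line a server would put in its response."""
    cookie = SimpleCookie()
    cookie["SESS_ID"] = session_id
    cookie["SESS_ID"]["path"] = "/"
    cookie["SESS_ID"]["httponly"] = True  # hide the cookie from page scripts
    cookie["SESS_ID"]["secure"] = True    # only send it over HTTPS
    return cookie.output()


def read_cookie(cookie_header_value: str, name: str) -> str:
    """Extract one cookie value from the Cookie header sent by the client."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    return cookie[name].value


print(make_set_cookie("1678"))
print(read_cookie("SESS_ID=1678; theme=dark", "SESS_ID"))
```

On the server, the value read back from the request would then be used as the key into the back-end database.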
4. Email
There are three major components of the email system:
- user agents (e.g: Outlook)
- mail servers
- SMTP
The mail message has a format similar to the HTTP message: headers and a body, separated by a blank line (CRLF).
An example of the header:
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.
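A minimal sketch that builds this message with Python's standard `email.message.EmailMessage`. The body text is made up for the example, and the library adds a few MIME headers of its own.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Build a message with the header lines shown above and a short body."""
    msg = EmailMessage()
    msg["From"] = "alice@crepes.fr"
    msg["To"] = "bob@hamburger.edu"
    msg["Subject"] = "Searching for the meaning of life."
    msg.set_content("Do you like ketchup?\n")  # hypothetical body text
    return msg


# Serialized form: header lines, one blank line, then the body.
print(build_message().as_string())
```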
4.1. SMTP
SMTP (RFC 5321) uses TCP to transfer mail from the sender's mail server to the recipient's mail server. It can also be used to transfer mail from the sender's user agent to the sender's mail server.
It commonly uses port 25. Unlike HTTP (a pull protocol), SMTP is a push protocol, which means the connection is initiated by the machine that wants to send email.
One constraint of SMTP is that it restricts the body of all mail messages to simple 7-bit ASCII (a legacy of its introduction in 1982)
The main SMTP commands are
- HELO (abbreviation for HELLO)
- MAIL FROM
- RCPT TO
- DATA
- QUIT
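The order in which a client issues these commands can be sketched as follows. The function only produces the client side of the dialogue as text lines (server replies are omitted, and nothing is sent over the network); its name and parameters are hypothetical.

```python
def smtp_client_lines(client_host: str, sender: str, recipient: str,
                      message_lines: list) -> list:
    """Return the lines a client sends to deliver one message."""
    lines = [
        f"HELO {client_host}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    for line in message_lines:
        # A message line starting with "." gets an extra "." so it is not
        # mistaken for the end-of-message marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")   # a line with a single period ends the message
    lines.append("QUIT")
    return lines


for line in smtp_client_lines("crepes.fr", "alice@crepes.fr",
                              "bob@hamburger.edu",
                              ["Do you like ketchup?", "How about pickles?"]):
    print(line)
```

For real delivery, the standard `smtplib` module wraps this dialogue, including reading the server's reply after each command.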
4.2. POP3, IMAP
POP3 and IMAP are mail access protocols, with which the user reads email through a client app instead of logging into the mail server.
POP3 (RFC 1939) is a simple, command-based protocol that progresses through three phases:
- authentication: the user agent sends the user and pass commands to the server, which replies with +OK or -ERR
- transaction: the user agent retrieves messages and can mark messages for deletion (list, retr, dele commands)
- update: the user agent sends the quit command and the server deletes the marked messages
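The three phases can be modelled with a toy in-memory session. This is a sketch of the state machine only, not a network client (the standard `poplib` module provides one); the class `Pop3Session` and its attribute names are hypothetical.

```python
class Pop3Session:
    """Toy in-memory model of the three POP3 phases."""

    def __init__(self, mailbox, username, password):
        self.mailbox = list(mailbox)      # messages stored on the server
        self._credentials = (username, password)
        self._marked = set()              # message numbers marked by dele
        self.phase = "authentication"

    def login(self, username, password):
        """user + pass commands: enter the transaction phase on success."""
        if (username, password) == self._credentials:
            self.phase = "transaction"
            return "+OK"
        return "-ERR"

    def list(self):
        """list command: (message number, size in bytes), numbered from 1."""
        self._require_transaction()
        return [(i, len(m.encode("utf-8")))
                for i, m in enumerate(self.mailbox, start=1)]

    def retr(self, number):
        """retr command: return the message with the given number."""
        self._require_transaction()
        return self.mailbox[number - 1]

    def dele(self, number):
        """dele command: only marks the message; nothing is deleted yet."""
        self._require_transaction()
        self._marked.add(number)

    def quit(self):
        """quit command: enter the update phase and delete marked messages."""
        self.phase = "update"
        self.mailbox = [m for i, m in enumerate(self.mailbox, start=1)
                        if i not in self._marked]
        return "+OK"

    def _require_transaction(self):
        if self.phase != "transaction":
            raise RuntimeError("command only valid in the transaction phase")
```

The key point the model shows is that dele has no visible effect until quit moves the session into the update phase.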
IMAP (RFC 3501) is a much more complex mail access protocol. For example, IMAP allows the user to create folders and associate email with folders (again, by using commands)
It also permits users to download only components of an email (useful on low-bandwidth connections)
5. P2P
5.1. BitTorrent
The collection of all peers participating in the distribution of a particular file is called a torrent. A file is divided into equal-size chunks (with a typical size of 256 KB). Over time each peer accumulates more and more chunks.
The operating steps are roughly
- When a new peer joins a torrent, it registers itself with the tracker to obtain a random subset of peers (called neighboring peers)
When downloading
- It asks each neighbor for the list of chunks that neighbor has
- It requests the chunks it does not have, rarest first (the chunks with the fewest copies among its neighbors are requested first)
When uploading
- It determines the four peers that are uploading to it at the highest rate and gives them the highest priority (these peers are unchoked)
- Every 30 seconds, it picks 1 additional neighbor at random and sends it chunks.
- Every 10 seconds, it recalculates the rates and possibly modifies the set of 4 peers.
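The two selection rules above can be sketched as follows. This is a simplified illustration, assuming chunks are identified by integers and that measured upload rates are already available; the function names and data shapes are hypothetical, and timers and random optimistic unchoking are left out.

```python
from collections import Counter


def rarest_first(my_chunks: set, neighbor_chunks: dict) -> list:
    """Order the chunks we lack by how few neighbors hold them."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)  # one more copy of each chunk this neighbor has
    missing = [chunk for chunk in counts if chunk not in my_chunks]
    # Fewest copies first; ties are broken by chunk id for determinism.
    return sorted(missing, key=lambda chunk: (counts[chunk], chunk))


def top_uploaders(upload_rates: dict, slots: int = 4) -> list:
    """Pick the peers currently uploading to us at the highest rates."""
    return sorted(upload_rates, key=upload_rates.get, reverse=True)[:slots]


neighbors = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}}
print(rarest_first({0}, neighbors))  # [3, 1, 2]

rates = {"a": 10, "b": 50, "c": 30, "d": 20, "e": 40}
print(top_uploaders(rates))          # ['b', 'e', 'c', 'd']
```

Requesting rare chunks first tends to equalize the number of copies of each chunk in the torrent, and favoring the fastest uploaders gives peers an incentive to upload.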
Applications
- aria2 can be used as a Linux client
5.2. Skype
Each user is assigned to a super peer, which maps Skype usernames to IP addresses.
When Alice wants to call Bob, both are already connected to their respective super peers from the time they logged in. When Bob picks up the call, they are redirected to a relay node. This architecture is designed to work around the NAT restrictions of home routers.
6. Reference
[1] Fall, Kevin R., and W. Richard Stevens. TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, 2011.
[2] MDN Docs: https://developer.mozilla.org/en-US/