CSAPP_lab ProxyLab

Preparation

All files can be downloaded directly from the official site: Lab Assignments

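Basic concepts

An HTTP request consists of a request line followed by zero or more request headers and a terminating empty line. The request line and a header have the following formats:

method URI version
header-name: header-data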
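  • A URI (uniform resource identifier) identifies and locates a resource on the Internet. It comes in two forms, URL (uniform resource locator) and URN (uniform resource name); in other words, URLs and URNs are both subsets of URIs
    • A URL identifies a resource and says how to access it. It usually consists of a scheme (http, https, ftp), a host name (www.example.com) and a path (/index.html), for example: https://www.example.com/index.html
    • A URN is a character sequence that names a resource. It is independent of where the resource lives and only guarantees that the name is unique
    • In short, a URL identifies where a resource is and a URN identifies what it is called: the former gives a path for accessing the resource, the latter gives it a unique name
  • The Host header is not always required (HTTP/1.0 does not mandate it, HTTP/1.1 does), which raises a question: without a Host header, which server should the message be sent to?
    • In practice, typing a URL into the browser already determines the server's IP address. The client first establishes a connection to that server and only then sends the HTTP request over the connection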

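The request line carries a URI. When a client talks to a server directly, this is usually just the path, for example GET /index.html HTTP/1.1

In this lab the proxy always receives the full-URL form, such as GET http://www.example.com/index.html HTTP/1.1. It forwards the request as GET /index.html HTTP/1.0 and sets the relevant header fields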
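The rewriting of the request line can be sketched in isolation. This is only an illustration, not the parser used later in this post: the helper name to_origin_form and the buffer sizes are assumptions, and the version is always set to HTTP/1.0, as the proxy code further down does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn "GET http://host[:port]/path HTTP/1.1" into
 * "GET /path HTTP/1.0\r\n". Returns 0 on success, -1 on a malformed line
 * or when the result does not fit into out. */
int to_origin_form(const char *request_line, char *out, size_t outlen)
{
    char method[16], uri[1024], version[16];

    assert(request_line != NULL && out != NULL);
    if (sscanf(request_line, "%15s %1023s %15s", method, uri, version) != 3)
        return -1;

    const char *path = "/";                 /* default when the URL has no path */
    const char *scheme_end = strstr(uri, "://");
    if (scheme_end != NULL) {
        /* skip "scheme://host[:port]" and keep everything from the next '/' */
        const char *slash = strchr(scheme_end + 3, '/');
        if (slash != NULL)
            path = slash;
    } else if (uri[0] == '/') {
        path = uri;                         /* already just a path */
    } else {
        return -1;
    }

    int n = snprintf(out, outlen, "%s %s HTTP/1.0\r\n", method, path);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Called with "GET http://www.example.com/index.html HTTP/1.1", the helper writes "GET /index.html HTTP/1.0\r\n" into out. The host and port are deliberately dropped here; a real proxy still needs them to open the connection and to fill in the Host header.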

Example

The unpacked ./tiny directory contains the TINY Web server from the textbook, so we can run a quick experiment.

First we pick a port:

./port-for-user.pl droh
droh: 45806

Open one terminal and start the tiny server, then open another terminal and connect to it with telnet:

./tiny 45806

telnet localhost 45806

This produces the following output (after typing the request line in telnet, press Enter twice):

First run:

# telnet

telnet localhost 45806
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.0 200 OK
Server: Tiny Web Server
Content-length: 120
Content-type: text/html

<html>
<head><title>test</title></head>
<body>
<img align="middle" src="godzilla.gif">
Dave O'Hallaron
</body>
</html>
Connection closed by foreign host.

# tiny

Accepted connection from (localhost, 35116)
GET / HTTP/1.1

Second run:

# telnet

telnet localhost 45806
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /cgi-bin/adder?15000&213 HTTP/1.0

HTTP/1.0 200 OK
Server: Tiny Web Server
Connection: close
Content-length: 115
Content-type: text/html

Welcome to add.com: THE Internet addition portal.
<p>The answer is: 15000 + 213 = 15213
<p>Thanks for visiting!
Connection closed by foreign host.

# tiny

Accepted connection from (localhost, 60314)
GET /cgi-bin/adder?15000&213 HTTP/1.0

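Debugging

We mainly debug with curl and only need two of its options, -v and --proxy. The former prints more detailed information, the latter sends the request through a proxy server

To talk to tiny directly: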

curl -v http://localhost:45807/home.html
curl -v http://localhost:45807/cgi-bin/adder?50\&70

To talk to tiny through the proxy (here the proxy listens on port 45806 and tiny on port 45807):

curl -v --proxy http://localhost:45806 http://localhost:45807/home.html
curl -v --proxy http://localhost:45806 http://localhost:45807/cgi-bin/adder?50\&70

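Walkthrough

Part 1: Basic

In this part the proxy has to forward the request it receives from the browser to the server. Concretely:

Suppose the browser sends the proxy this request line: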

GET http://www.cmu.edu/hub/index.html HTTP/1.1

The proxy needs to turn it into:

GET /hub/index.html HTTP/1.0
Host: www.cmu.edu
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3
Connection: close
Proxy-Connection: close

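In other words, we change the request line from the full URL to just the path part of the URL, and then set the Host, User-Agent, Connection and Proxy-Connection headers ourselves (if the browser's request already contains any of these headers, we ignore the browser's version)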
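Deciding which browser headers to drop can be sketched as a small predicate. The function name is_overridden_header is an assumption made for this example. Unlike a plain prefix comparison, it compares the whole header name up to the colon, so a header such as Hostile: is not mistaken for Host:.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns 1 if the header line names one of the four
 * headers the proxy sets itself, 0 otherwise. The comparison is
 * case-insensitive because HTTP header names are. */
int is_overridden_header(const char *line)
{
    static const char *names[] = {
        "Host", "User-Agent", "Connection", "Proxy-Connection"
    };

    assert(line != NULL);
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;                           /* not a "name: value" line */

    size_t len = (size_t)(colon - line);    /* length of the header name */
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strlen(names[i]) == len && strncasecmp(line, names[i], len) == 0)
            return 1;
    }
    return 0;
}
```

A forwarding loop would copy a client header only when this returns 0.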

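Secondly, we need to pay special attention to the fact that the forwarding code has to work for both text files and binary files

At the lowest level both are just sequences of bits. A text file is made of lines ending in \n (\r\n in HTTP headers) and contains no zero bytes, so it can be read line by line and handled with C string functions. A binary file has no notion of a line and may contain zero bytes anywhere, so we have to keep track of how many bytes were read instead of relying on string functions

Let's look at the forwarding code: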

forwardfd = Open_clientfd(header.hostname, header.port);
Rio_readinitb(&server_rio, forwardfd);
Rio_writen(forwardfd, forwardBuf, strlen(forwardBuf));

size_t n;
while((n = Rio_readlineb(&server_rio, forwardBuf, MAXLINE)) != 0) {
    fprintf(stdout, "proxy received %zu bytes\n", n);
    /* BUG: strlen() stops at the first zero byte, so binary data is cut short */
    Rio_writen(fd, forwardBuf, strlen(forwardBuf));
}

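This code only works for text files. Inside the while loop, Rio_writen is told to write strlen(forwardBuf) bytes, and strlen stops at the first zero byte. Text contains no zero bytes, so every line is written back in full; binary data can contain zero bytes anywhere, so part of what was read is silently dropped

The fix is to change the Rio_writen call inside the while loop to Rio_writen(fd, forwardBuf, n);

Here n is the number of bytes actually read, and we use it as the number of bytes to write back each time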
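The difference between the two length choices is easy to reproduce without any networking. The sketch below is a stand-alone illustration, and forward_chunk is a name made up for it: it copies one received chunk either by strlen or by the byte count reported by the read call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy one received chunk into out and return how many bytes were copied.
 * use_strlen != 0 mimics the buggy version; otherwise the byte count n
 * reported by the read call is used. chunk must be NUL-terminated, which
 * is what Rio_readlineb guarantees for its buffer. */
size_t forward_chunk(char *out, const char *chunk, size_t n, int use_strlen)
{
    assert(out != NULL && chunk != NULL);
    size_t len = use_strlen ? strlen(chunk) : n;
    memcpy(out, chunk, len);
    return len;
}
```

For a 5-byte chunk 0xFF 0xD8 0x00 0x10 '\n' (the start of a made-up binary file), the strlen variant copies only the 2 bytes before the zero byte, while the byte-count variant copies all 5.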

The complete code:

#include "csapp.h"
#include <strings.h>
//#include <stdio.h>

/* Recommended max cache and object sizes */
#define MAX_CACHE_SIZE 1049000
#define MAX_OBJECT_SIZE 102400

/* You won't lose style points for including this long line in your code */
static const char *user_agent_hdr = "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3\r\n";

typedef struct requestHeaders {
    char hostname[MAXLINE];
    char port[MAXLINE];
    char filename[MAXLINE];
} requesthdrs;

void doit(int fd);
void prase_url(char* uri, requesthdrs* header);
void read_requesthdrs(rio_t* rp);
void clienterror(int fd, char *cause, char *errnum, char *shortmsg, char *longmsg);
void forwordMessege(char* buf, requesthdrs* headers, rio_t* rp);

/* empty handler: keeps SIGPIPE from killing the proxy */
void sighandler(int sig)
{
    ;
}

int main(int argc, char* argv [])
{
    //printf("%s", user_agent_hdr);
    if(argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;
    char hostname[MAXLINE], port[MAXLINE];
    Signal(SIGPIPE, sighandler);

    listenfd = Open_listenfd(argv[1]);
    while(1) {
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd, (SA*)&clientaddr, &clientlen);
        Getnameinfo((SA*)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0);
        printf("Accept Connection from (%s, %s)\n", hostname, port);
        doit(connfd);
        Close(connfd);
    }
    return 0;
}

void doit(int fd)
{
    char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];
    char forwardBuf[MAXLINE];
    requesthdrs header;
    int forwardfd;
    rio_t client_rio, server_rio;
    Rio_readinitb(&client_rio, fd);

    /* read the request line; give up if the client sent nothing */
    if(Rio_readlineb(&client_rio, buf, MAXLINE) <= 0)
        return;

    //printf("received header: %s\n", buf);

    if(sscanf(buf, "%s %s %s", method, uri, version) != 3)
        return;
    //ignore the case of characters
    if(strcasecmp(method, "GET")) {
        clienterror(fd, method, "501", "Not implemented", "Proxy does not implement this method\n");
        fprintf(stderr, "%s: Proxy does not implement this method\n", method);
        return;
    }

    prase_url(uri, &header);
    forwordMessege(forwardBuf, &header, &client_rio);

    //printf("-----------------------------------------\n");
    //printf("%s\n", forwardBuf);
    //printf("host: %s, port: %s, file: %s\n", header.hostname, header.port, header.filename);

    forwardfd = Open_clientfd(header.hostname, header.port);
    Rio_readinitb(&server_rio, forwardfd);
    Rio_writen(forwardfd, forwardBuf, strlen(forwardBuf));

    /* relay the response; write exactly the n bytes read so binary data survives */
    size_t n;
    while((n = Rio_readlineb(&server_rio, forwardBuf, MAXLINE)) != 0) {
        fprintf(stdout, "proxy received %zu bytes\n", n);
        Rio_writen(fd, forwardBuf, n);
    }

    Close(forwardfd);
}

void read_requesthdrs(rio_t* rp)
{
    char buf[MAXLINE];
    Rio_readlineb(rp, buf, MAXLINE);
    while(strcmp(buf, "\r\n")) {
        Rio_readlineb(rp, buf, MAXLINE);
    }
    return;
}

void forwordMessege(char* buf, requesthdrs* headers, rio_t* rp)
{
    char tmp[MAXLINE];
    char* ptr = buf;

    /* request line plus the four headers the proxy always sets itself */
    ptr += sprintf(ptr, "GET %s HTTP/1.0\r\n", headers->filename);
    ptr += sprintf(ptr, "Host: %s\r\n", headers->hostname);
    /* user_agent_hdr already contains "User-Agent: " and the trailing \r\n */
    ptr += sprintf(ptr, "%s", user_agent_hdr);
    ptr += sprintf(ptr, "Connection: close\r\n");
    ptr += sprintf(ptr, "Proxy-Connection: close\r\n");

    /* copy the remaining client headers, skipping the ones set above;
       stop at the empty line or when the client closes the connection */
    while(Rio_readlineb(rp, tmp, MAXLINE) > 0 && strcmp(tmp, "\r\n")) {
        if(!strncasecmp(tmp, "Host", strlen("Host")) || !strncasecmp(tmp, "User-Agent", strlen("User-Agent"))
            || !strncasecmp(tmp, "Connection", strlen("Connection"))
            || !strncasecmp(tmp, "Proxy-Connection", strlen("Proxy-Connection"))) {
            continue;
        }
        /* never pass client data as the format string */
        ptr += sprintf(ptr, "%s", tmp);
    }
    sprintf(ptr, "\r\n");
}

void prase_url(char* uri, requesthdrs* header)
{
    /* defaults: port 80, root path, empty host */
    strcpy(header->port, "80");
    strcpy(header->filename, "/");
    header->hostname[0] = '\0';

    //example: GET http://www.cmu.edu/hub/index.html HTTP/1.1
    char* ptr = strstr(uri, "//");
    if(ptr == NULL) {
        //example: GET /index.html HTTP/1.1 or GET / HTTP/1.1
        char* idx = index(uri, '/');
        if(idx)
            strcpy(header->filename, idx);
        return;
    }

    //http://www.cmu.edu/hub/index.html or http://www.cmu.edu:80/hub/index.html
    char* host = ptr + 2;
    char* idx = index(host, '/');
    if(idx) {
        strcpy(header->filename, idx);
        *idx = '\0';    /* cut the path off, leaving host[:port] */
    }
    /* look for the port only inside host[:port], never inside the path */
    char* port = index(host, ':');
    if(port) {
        if(port[1] != '\0')
            strcpy(header->port, port + 1);
        *port = '\0';
    }
    strcpy(header->hostname, host);
}

void clienterror(int fd, char *cause, char *errnum,
                 char *shortmsg, char *longmsg)
{
    char buf[MAXLINE];

    /* Print the HTTP response headers */
    sprintf(buf, "HTTP/1.0 %s %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-type: text/html\r\n\r\n");
    Rio_writen(fd, buf, strlen(buf));

    /* Print the HTTP response body */
    sprintf(buf, "<html><title>Tiny Error</title>");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<body bgcolor=""ffffff"">\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "%s: %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<p>%s: %s\r\n", longmsg, cause);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<hr><em>The Tiny Web server</em>\r\n");
    Rio_writen(fd, buf, strlen(buf));
}

Result:

Note: the test mainly checks whether the result returned by tiny directly is identical to the result obtained by talking to tiny through the proxy

*** Basic ***
Starting tiny on 1508
Starting proxy on 7697
1: home.html
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
2: csapp.c
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
3: tiny.c
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/tiny.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
4: godzilla.jpg
Fetching ./tiny/godzilla.jpg into ./.proxy using the proxy
Fetching ./tiny/godzilla.jpg into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
5: tiny
Fetching ./tiny/tiny into ./.proxy using the proxy
Fetching ./tiny/tiny into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
Killing tiny and proxy
basicScore: 40/40

Part 2: Concurrency

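Simple implementation

The concurrency part is very simple: we create a new thread for every connection and run doit inside it. The textbook has code that can be used as a reference

The code that needs to change: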

//main
pthread_t tid;
int *connfd;
while(1) {
    clientlen = sizeof(clientaddr);
    /* one heap-allocated descriptor per connection, so the next Accept
       cannot overwrite it before the new thread has read it */
    connfd = Malloc(sizeof (int));
    /* only the main thread calls Accept, so no lock is needed here */
    *connfd = Accept(listenfd, (SA*)&clientaddr, &clientlen);
    Getnameinfo((SA*)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0);
    printf("Accept Connection from (%s, %s)\n", hostname, port);
    Pthread_create(&tid, NULL, thread, connfd);
}

//thread
void* thread(void* vargp)
{
    int connfd = *(int*)vargp;
    Pthread_detach(Pthread_self());
    Free(vargp);
    doit(connfd);
    Close(connfd);
    return NULL;
}

Result:

*** Concurrency ***
Starting tiny on port 20531
Starting proxy on port 13639
Starting the blocking NOP server on port 26938
Trying to fetch a file from the blocking nop-server
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Fetching ./tiny/home.html into ./.proxy using the proxy
Checking whether the proxy fetch succeeded
Success: Was able to fetch tiny/home.html from the proxy.
Killing tiny, proxy, and nop-server
concurrencyScore: 15/15

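Prethreaded implementation

If the sbuf functions are implemented in separate files, the Makefile has to be changed accordingly

The basic idea of prethreading is to create a pool of threads up front while the server listens on its port. When a client connects, the server puts the connected descriptor into a global buffer

Each pre-created thread waits on the global buffer. As soon as the buffer is non-empty (a connected descriptor is available), one thread removes the descriptor and serves that client on its own

The implementation is very simple, and the textbook has matching code for reference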

//main
sbuf_init(&sbuf, SBUF_SIZE);
for(int i = 0; i < NTHREAD; i ++) {
    Pthread_create(&tid, NULL, thread, NULL);
}

while(1) {
    clientlen = sizeof(clientaddr);
    connfd = Accept(listenfd, (SA*)&clientaddr, &clientlen);
    Getnameinfo((SA*)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0);
    printf("Accept Connection from (%s, %s)\n", hostname, port);
    sbuf_insert(&sbuf, connfd);
}

//thread
void* thread(void* vargp)
{
    Pthread_detach(Pthread_self());
    /* a worker never returns: it keeps serving one connection after another */
    while(1) {
        int connfd = sbuf_remove(&sbuf);
        doit(connfd);
        Close(connfd);
    }
    return NULL;
}
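The sbuf functions used above are not shown in this post. A minimal sketch of such a bounded buffer, following the textbook's producer/consumer design, could look like the code below. It assumes Linux with POSIX unnamed semaphores and calls sem_wait/sem_post directly instead of the P/V wrappers from csapp.h, so that it compiles on its own; sbuf_deinit and the exact field layout are choices made for this example.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdlib.h>

typedef struct {
    int *buf;       /* circular array of connected descriptors */
    int n;          /* capacity */
    int front;      /* index of the next item to remove */
    int rear;       /* index of the next free slot */
    sem_t mutex;    /* protects buf, front and rear */
    sem_t slots;    /* counts free slots */
    sem_t items;    /* counts stored items */
} sbuf_t;

void sbuf_init(sbuf_t *sp, int n)
{
    assert(sp != NULL && n > 0);
    sp->buf = calloc((size_t)n, sizeof(int));
    assert(sp->buf != NULL);
    sp->n = n;
    sp->front = sp->rear = 0;
    sem_init(&sp->mutex, 0, 1);
    sem_init(&sp->slots, 0, (unsigned)n);
    sem_init(&sp->items, 0, 0);
}

void sbuf_deinit(sbuf_t *sp)
{
    free(sp->buf);
}

/* Producer side: blocks while the buffer is full */
void sbuf_insert(sbuf_t *sp, int item)
{
    sem_wait(&sp->slots);
    sem_wait(&sp->mutex);
    sp->buf[sp->rear] = item;
    sp->rear = (sp->rear + 1) % sp->n;
    sem_post(&sp->mutex);
    sem_post(&sp->items);
}

/* Consumer side: blocks while the buffer is empty */
int sbuf_remove(sbuf_t *sp)
{
    sem_wait(&sp->items);
    sem_wait(&sp->mutex);
    int item = sp->buf[sp->front];
    sp->front = (sp->front + 1) % sp->n;
    sem_post(&sp->mutex);
    sem_post(&sp->slots);
    return item;
}
```

Items come out in the order they went in, so connections are served first come, first served. The main thread is the only producer and the NTHREAD workers are the consumers.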

Result:

*** Concurrency ***
Starting tiny on port 7553
Starting proxy on port 33623
Starting the blocking NOP server on port 30905
Trying to fetch a file from the blocking nop-server
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Fetching ./tiny/home.html into ./.proxy using the proxy
Checking whether the proxy fetch succeeded
Success: Was able to fetch tiny/home.html from the proxy.
Killing tiny, proxy, and nop-server
concurrencyScore: 15/15

Part 3: Cache

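Only shared variables need to be protected by a lock; variables that are not shared need none

We need to design a global cache so that the proxy can use it to speed things up

Each line of the cache is laid out as: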

typedef struct {
    char buf[MAX_OBJECT_SIZE];
    char url[MAXLINE];
    int size;//cache block size
    int valid;//1 or 0
    int timestamp;
} cacheLine;

Within each line, lookup is done by url. Since we need LRU, timestamp holds a time stamp, and valid says whether the line is currently in use

The cache as a whole is laid out as:

typedef struct {
    cacheLine line[CACHELINE];
    int readcnt, currentTime;
    sem_t mutex, writer;
} cache_t;

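We use the readers/writers model, so we need the variables readcnt, mutex and writer: readcnt counts the readers currently inside, mutex protects readcnt, and writer gives either one writer or the readers as a group exclusive access to the cache

For efficiency we give readers priority; the textbook has a complete example of this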

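In addition, each line carries a timestamp recording when its block was added to the cache, so we need a variable for the current time, which is incremented on every write to the cache. Because the timestamp is not refreshed on reads, eviction removes the line that was inserted longest ago, which only approximates LRU

The cache is initialized as follows: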

void cache_init()
{
    cache.readcnt = 0;
    cache.currentTime = 0;
    Sem_init(&cache.mutex, 0, 1);
    Sem_init(&cache.writer, 0, 1);
    for(int i = 0; i < CACHELINE; i ++) {
        cache.line[i].valid = 0;
        cache.line[i].timestamp = 0;
        cache.line[i].size = 0;
    }
}

We match the url against every line, so we have to scan all lines of the cache. Given a url, we return the index idx of the matching line

//return idx in cache if success, -1 if the url is not cached
int getCacheIndex(char* url)
{
    int ret = -1;
    for(int i = 0; i < CACHELINE; i ++) {
        if(cache.line[i].valid && !strcmp(cache.line[i].url, url)) {
            ret = i;
        }
    }
    return ret;
}

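Before forwarding an HTTP request, a thread first checks whether the url is already in the cache. If it is, the response is served straight from the cache (the thread acts as a reader)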

/* enter as a reader before looking at the cache, so no writer can change
   a line between the lookup and the copy to the client */
P(&cache.mutex);
cache.readcnt++;
if(cache.readcnt == 1)
    P(&cache.writer);
V(&cache.mutex);

int idx = getCacheIndex(uri);
if(idx != -1)
    Rio_writen(fd, cache.line[idx].buf, cache.line[idx].size);

P(&cache.mutex);
cache.readcnt--;
if(cache.readcnt == 0)
    V(&cache.writer);
V(&cache.mutex);

if(idx != -1) {
    printf("Cached\n");
    return;
}

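To write data into the cache, while forwarding the server's response we additionally copy every chunk the proxy receives into a local buffer, and then store that buffer in the cache in one go

The buffer has to hold up to MAX_OBJECT_SIZE bytes, and chunks have to be appended with memcpy rather than strcat, because the response may contain zero bytes. Once the object grows past MAX_OBJECT_SIZE we stop copying and it is not cached

char objBuf[MAX_OBJECT_SIZE]; /* local copy of the whole response */
int cacheSize = 0;
while((n = Rio_readlineb(&server_rio, forwardBuf, MAXLINE)) != 0) {
    fprintf(stdout, "proxy received %zu bytes\n", n);
    Rio_writen(fd, forwardBuf, n);
    if(cacheSize <= MAX_OBJECT_SIZE) {
        /* copy only while the object still fits into one cache line */
        if(cacheSize + n <= MAX_OBJECT_SIZE)
            memcpy(objBuf + cacheSize, forwardBuf, n);
        cacheSize += n; /* a value above MAX_OBJECT_SIZE marks "too big" */
    }
}

/* cacheWrite ignores objects larger than MAX_OBJECT_SIZE */
cacheWrite(objBuf, uriBackup, cacheSize);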

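When writing to the cache, an object larger than MAX_OBJECT_SIZE is not cached. The rest follows the same steps as in Cache Lab:

  • First check whether the cache has a free line. If it does, use it; otherwise pick a line with LRU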
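The selection step can be sketched on its own, separate from the locking. The type slot_meta and the functions pick_victim and touch are names invented for this example. touch shows what refreshing a timestamp on every hit would look like, which is the step that makes the policy true LRU rather than eviction by insertion time.

```c
#include <assert.h>
#include <stddef.h>

/* Only the bookkeeping fields of a cache line matter for eviction */
typedef struct {
    int valid;      /* 1 if the line holds an object */
    int timestamp;  /* logical time of the last use */
} slot_meta;

/* Return the index of a free line if there is one, otherwise the index of
 * the valid line with the smallest (oldest) timestamp. */
int pick_victim(const slot_meta *slots, int n)
{
    assert(slots != NULL && n > 0);
    int victim = 0;
    for (int i = 0; i < n; i++) {
        if (!slots[i].valid)
            return i;
        if (slots[i].timestamp < slots[victim].timestamp)
            victim = i;
    }
    return victim;
}

/* Mark a line as just used by giving it the newest timestamp */
void touch(slot_meta *slot, int *clock)
{
    assert(slot != NULL && clock != NULL);
    slot->timestamp = ++*clock;
}
```

With three valid lines stamped 5, 2 and 9, pick_victim returns index 1. After touch on that line it returns index 0 instead, because the line stamped 5 is now the oldest.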

Filling in the chosen line is where the thread acts as a writer. The code:

void cacheWrite(char* buf, char* url, int size)
{
    if(size > MAX_OBJECT_SIZE) return;

    /* take the writer lock before scanning, so that choosing a line and
       filling it happen as one step */
    P(&cache.writer);
    int idx = -1;
    for(int i = 0; i < CACHELINE; i ++) {
        if(cache.line[i].valid == 0) {
            idx = i;
            break;
        }
    }
    if(idx == -1) {
        //LRU: evict the line with the oldest timestamp
        int mxTime = -1;
        for(int i = 0; i < CACHELINE; i ++) {
            if(cache.line[i].valid && cache.currentTime - cache.line[i].timestamp > mxTime) {
                mxTime = cache.currentTime - cache.line[i].timestamp;
                idx = i;
            }
        }
    }
    /* the object may be binary, so copy by size instead of strcpy */
    memcpy(cache.line[idx].buf, buf, size);
    strcpy(cache.line[idx].url, url);
    cache.line[idx].size = size;
    cache.line[idx].timestamp = ++cache.currentTime;
    cache.line[idx].valid = 1;
    V(&cache.writer);
}

Result:

*** Cache ***
Starting tiny on port 7126
Starting proxy on port 11498
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Killing tiny
Fetching a cached copy of ./tiny/home.html into ./.noproxy
Success: Was able to fetch tiny/home.html from the cache.
Killing proxy
cacheScore: 15/15

Complete code

#include <strings.h>
#include "csapp.h"
#include "sbuf.h"
//#include <stdio.h>

/* Recommended max cache and object sizes */
#define MAX_CACHE_SIZE 1049000
#define MAX_OBJECT_SIZE 102400

//the number of threads
#define NTHREAD 16
#define SBUF_SIZE 32

//number of cache lines
#define CACHELINE 10

/* You won't lose style points for including this long line in your code */
static const char *user_agent_hdr = "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3\r\n";

typedef struct requestHeaders {
    char hostname[MAXLINE];
    char port[MAXLINE];
    char filename[MAXLINE];
} requesthdrs;

void doit(int fd);
void prase_url(char* uri, requesthdrs* header);
void read_requesthdrs(rio_t* rp);
void clienterror(int fd, char *cause, char *errnum, char *shortmsg, char *longmsg);
void forwardMessage(char* buf, requesthdrs* headers, rio_t* rp);
void* thread(void* vargp);
void sighandler(int sig) { ; }

//global variable
sbuf_t sbuf;

typedef struct {
    char buf[MAX_OBJECT_SIZE];
    char url[MAXLINE];
    int size;//cache block size
    int valid;//1 or 0
    int timestamp;
} cacheLine;

typedef struct {
    cacheLine line[CACHELINE];
    int readcnt, currentTime;
    sem_t mutex, writer;
} cache_t;

cache_t cache;

void cache_init()
{
    cache.readcnt = 0;
    cache.currentTime = 0;
    Sem_init(&cache.mutex, 0, 1);
    Sem_init(&cache.writer, 0, 1);
    for(int i = 0; i < CACHELINE; i ++) {
        cache.line[i].valid = 0;
        cache.line[i].timestamp = 0;
        cache.line[i].size = 0;
    }
}

//return idx in cache if success, -1 if the url is not cached
int getCacheIndex(char* url)
{
    int ret = -1;
    for(int i = 0; i < CACHELINE; i ++) {
        if(cache.line[i].valid && !strcmp(cache.line[i].url, url)) {
            ret = i;
        }
    }
    return ret;
}

void cacheWrite(char* buf, char* url, int size)
{
    if(size > MAX_OBJECT_SIZE) return;

    /* take the writer lock before scanning, so that choosing a line and
       filling it happen as one step */
    P(&cache.writer);
    int idx = -1;
    for(int i = 0; i < CACHELINE; i ++) {
        if(cache.line[i].valid == 0) {
            idx = i;
            break;
        }
    }
    if(idx == -1) {
        //LRU: evict the line with the oldest timestamp
        int mxTime = -1;
        for(int i = 0; i < CACHELINE; i ++) {
            if(cache.line[i].valid && cache.currentTime - cache.line[i].timestamp > mxTime) {
                mxTime = cache.currentTime - cache.line[i].timestamp;
                idx = i;
            }
        }
    }
    /* the object may be binary, so copy by size instead of strcpy */
    memcpy(cache.line[idx].buf, buf, size);
    strcpy(cache.line[idx].url, url);
    cache.line[idx].size = size;
    cache.line[idx].timestamp = ++cache.currentTime;
    cache.line[idx].valid = 1;
    V(&cache.writer);
}

int main(int argc, char* argv [])
{
    //printf("%s", user_agent_hdr);
    if(argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;
    char hostname[MAXLINE], port[MAXLINE];
    pthread_t tid;

    Signal(SIGPIPE, sighandler);
    cache_init();
    listenfd = Open_listenfd(argv[1]);

    sbuf_init(&sbuf, SBUF_SIZE);
    for(int i = 0; i < NTHREAD; i ++) {
        Pthread_create(&tid, NULL, thread, NULL);
    }

    while(1) {
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd, (SA*)&clientaddr, &clientlen);
        Getnameinfo((SA*)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0);
        printf("Accept Connection from (%s, %s)\n", hostname, port);
        sbuf_insert(&sbuf, connfd);
    }
    return 0;
}

void* thread(void* vargp)
{
    Pthread_detach(Pthread_self());
    /* a worker never returns: it keeps serving one connection after another */
    while(1) {
        int connfd = sbuf_remove(&sbuf);
        doit(connfd);
        Close(connfd);
    }
    return NULL;
}

void doit(int fd)
{
    char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];
    char forwardBuf[MAXLINE], uriBackup[MAXLINE];
    char objBuf[MAX_OBJECT_SIZE]; /* local copy of the response for the cache */
    requesthdrs header;
    int forwardfd;
    rio_t client_rio, server_rio;
    Rio_readinitb(&client_rio, fd);

    /* read the request line; give up if the client sent nothing */
    if(Rio_readlineb(&client_rio, buf, MAXLINE) <= 0)
        return;

    //printf("received header: %s\n", buf);

    if(sscanf(buf, "%s %s %s", method, uri, version) != 3)
        return;
    strcpy(uriBackup, uri); /* prase_url modifies uri, keep the cache key */
    //ignore the case of characters
    if(strcasecmp(method, "GET")) {
        clienterror(fd, method, "501", "Not implemented", "Proxy does not implement this method\n");
        fprintf(stderr, "%s: Proxy does not implement this method\n", method);
        return;
    }

    /* enter as a reader before looking at the cache, so no writer can
       change a line between the lookup and the copy to the client */
    P(&cache.mutex);
    cache.readcnt++;
    if(cache.readcnt == 1)
        P(&cache.writer);
    V(&cache.mutex);

    int idx = getCacheIndex(uri);
    if(idx != -1)
        Rio_writen(fd, cache.line[idx].buf, cache.line[idx].size);

    P(&cache.mutex);
    cache.readcnt--;
    if(cache.readcnt == 0)
        V(&cache.writer);
    V(&cache.mutex);

    if(idx != -1) {
        printf("Cached\n");
        return;
    }

    prase_url(uri, &header);
    forwardMessage(forwardBuf, &header, &client_rio);

    // printf("-----------------------------------------\n");
    // printf("%s\n", forwardBuf);
    // printf("host: %s, port: %s, file: %s\n", header.hostname, header.port, header.filename);

    forwardfd = Open_clientfd(header.hostname, header.port);
    Rio_readinitb(&server_rio, forwardfd);
    Rio_writen(forwardfd, forwardBuf, strlen(forwardBuf));

    size_t n;
    int cacheSize = 0;
    while((n = Rio_readlineb(&server_rio, forwardBuf, MAXLINE)) != 0) {
        fprintf(stdout, "proxy received %zu bytes\n", n);
        Rio_writen(fd, forwardBuf, n);
        if(cacheSize <= MAX_OBJECT_SIZE) {
            /* copy only while the object still fits into one cache line */
            if(cacheSize + n <= MAX_OBJECT_SIZE)
                memcpy(objBuf + cacheSize, forwardBuf, n);
            cacheSize += n; /* a value above MAX_OBJECT_SIZE marks "too big" */
        }
    }

    /* cacheWrite ignores objects larger than MAX_OBJECT_SIZE */
    cacheWrite(objBuf, uriBackup, cacheSize);

    Close(forwardfd);
}

void read_requesthdrs(rio_t* rp)
{
    char buf[MAXLINE];
    Rio_readlineb(rp, buf, MAXLINE);
    while(strcmp(buf, "\r\n")) {
        Rio_readlineb(rp, buf, MAXLINE);
    }
    return;
}

void forwardMessage(char* buf, requesthdrs* headers, rio_t* rp)
{
    char tmp[MAXLINE];
    char* ptr = buf;

    /* request line plus the four headers the proxy always sets itself */
    ptr += sprintf(ptr, "GET %s HTTP/1.0\r\n", headers->filename);
    ptr += sprintf(ptr, "Host: %s\r\n", headers->hostname);
    /* user_agent_hdr already contains "User-Agent: " and the trailing \r\n */
    ptr += sprintf(ptr, "%s", user_agent_hdr);
    ptr += sprintf(ptr, "Connection: close\r\n");
    ptr += sprintf(ptr, "Proxy-Connection: close\r\n");

    /* copy the remaining client headers, skipping the ones set above;
       stop at the empty line or when the client closes the connection */
    while(Rio_readlineb(rp, tmp, MAXLINE) > 0 && strcmp(tmp, "\r\n")) {
        if(!strncasecmp(tmp, "Host", strlen("Host")) || !strncasecmp(tmp, "User-Agent", strlen("User-Agent"))
            || !strncasecmp(tmp, "Connection", strlen("Connection"))
            || !strncasecmp(tmp, "Proxy-Connection", strlen("Proxy-Connection"))) {
            continue;
        }
        /* never pass client data as the format string */
        ptr += sprintf(ptr, "%s", tmp);
    }
    sprintf(ptr, "\r\n");
}

void prase_url(char* uri, requesthdrs* header)
{
    /* defaults: port 80, root path, empty host */
    strcpy(header->port, "80");
    strcpy(header->filename, "/");
    header->hostname[0] = '\0';

    //example: GET http://www.cmu.edu/hub/index.html HTTP/1.1
    char* ptr = strstr(uri, "//");
    if(ptr == NULL) {
        //example: GET /index.html HTTP/1.1 or GET / HTTP/1.1
        char* idx = index(uri, '/');
        if(idx)
            strcpy(header->filename, idx);
        return;
    }

    //http://www.cmu.edu/hub/index.html or http://www.cmu.edu:80/hub/index.html
    char* host = ptr + 2;
    char* idx = index(host, '/');
    if(idx) {
        strcpy(header->filename, idx);
        *idx = '\0';    /* cut the path off, leaving host[:port] */
    }
    /* look for the port only inside host[:port], never inside the path */
    char* port = index(host, ':');
    if(port) {
        if(port[1] != '\0')
            strcpy(header->port, port + 1);
        *port = '\0';
    }
    strcpy(header->hostname, host);
}

void clienterror(int fd, char *cause, char *errnum,
                 char *shortmsg, char *longmsg)
{
    char buf[MAXLINE];

    /* Print the HTTP response headers */
    sprintf(buf, "HTTP/1.0 %s %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-type: text/html\r\n\r\n");
    Rio_writen(fd, buf, strlen(buf));

    /* Print the HTTP response body */
    sprintf(buf, "<html><title>Tiny Error</title>");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<body bgcolor=""ffffff"">\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "%s: %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<p>%s: %s\r\n", longmsg, cause);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "<hr><em>The Tiny Web server</em>\r\n");
    Rio_writen(fd, buf, strlen(buf));
}

Results

Running the complete code gives:

$ ./driver.sh 
*** Basic ***
Starting tiny on 29159
Starting proxy on 4721
1: home.html
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
2: csapp.c
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
3: tiny.c
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/tiny.c into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
4: godzilla.jpg
Fetching ./tiny/godzilla.jpg into ./.proxy using the proxy
Fetching ./tiny/godzilla.jpg into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
5: tiny
Fetching ./tiny/tiny into ./.proxy using the proxy
Fetching ./tiny/tiny into ./.noproxy directly from Tiny
Comparing the two files
Success: Files are identical.
Killing tiny and proxy
basicScore: 40/40

*** Concurrency ***
Starting tiny on port 4453
Starting proxy on port 15584
Starting the blocking NOP server on port 1357
Trying to fetch a file from the blocking nop-server
Fetching ./tiny/home.html into ./.noproxy directly from Tiny
Fetching ./tiny/home.html into ./.proxy using the proxy
Checking whether the proxy fetch succeeded
Success: Was able to fetch tiny/home.html from the proxy.
Killing tiny, proxy, and nop-server
concurrencyScore: 15/15

*** Cache ***
Starting tiny on port 7126
Starting proxy on port 11498
Fetching ./tiny/tiny.c into ./.proxy using the proxy
Fetching ./tiny/home.html into ./.proxy using the proxy
Fetching ./tiny/csapp.c into ./.proxy using the proxy
Killing tiny
Fetching a cached copy of ./tiny/home.html into ./.noproxy
Success: Was able to fetch tiny/home.html from the cache.
Killing proxy
cacheScore: 15/15

totalScore: 70/70

CSAPP_lab ProxyLab
https://nishikichisato.github.io/2023/08/21/CSAPP_LAB/ProxyLab/
Author: Nishiki Chisato
Posted on: August 21, 2023
Updated on: August 21, 2023
Licensed under