MIT 6.S081 Lab1 util：Unix utilities | UNIX 实用程序

前置准备

主要内容为如何使用已有的系统调用编写用户程序, 例如ls、xargs、find等常用命令的实现。

根据课程官网的要求，需要看完Lecture 1,阅读完教材第一章$Operation \space System \space interfaces$。

此外，实验手册中的Hints很重要，大多数实验跟着提示一步一步实现都能完成。

Sleep (easy)

Statement

为xv6实现 $UNIX$ 中的sleep程序。sleep程序应该暂定用户指定的tick数量。tick是xv6内核定义的时间概念，即定时器两次中断之间的间隔时间。你的solution应该存储在user/sleep.c文件下。

Hints

在开始实现前，确保你阅读了xv6 book的第一章。
查看其他xv6中实现的其他用户程序(例如user/echo.c、user/grep.c、user/rm.c)，了解如何从终端传递参数到程序中。
如果用户忘记传递参数，sleep应该显示错误消息并中断。
终端参数作为字符串传递，你需要用atoi(在user/ulib.c中实现)将其转换为int类型。
使用sleep系统调用。
有关实现sleep系统调用（见sys_sleep）的 xv6 内核代码，请参见 kernel/sysproc.c。有关可从用户程序调用sleep系统调用的 C 语言定义，请参见user/user.h；有关从用户代码跳转到内核中sleep的汇编代码，请参见 user/usys.S。
确保main函数中使用exit()来退出程序。
将sleep.c添加到Makefile的 UPROGS 中；完成后，在终端输入 make qemu 来编译你的程序，你就可以在 xv6 shell 中运行它了。

Analysis & Solution

首先按照Hint-1的提示看一眼user/echo.c是什么，可以看到下面的一段代码：

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int main(int argc, char* argv[]) {
    int i;

    for (i = 1; i < argc; i++) {
        write(1, argv[i], strlen(argv[i]));
        if (i + 1 < argc) {
            write(1, " ", 1);
        } else {
            write(1, "\n", 1);
        }
    }
    exit(0);
}

argc存储终端传入的参数数量，argv存储传入的参数的具体内容。

于是可以写出下面的代码：

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int main(int argc, char* argv[]) {
  if (argc != 2) {
    fprintf(2, "Usage: sleep <ticks>\n");
    exit(1);
  }
  int time = atoi(argv[1]);
  sleep(time);

  exit(0);
}

然后在Makefile中的UPROGS添加一行$U/_sleep\，

运行qemu make，编译完成后输入sleep 10, 可以看到成功运行。

运行./grade-lab-util sleep测试一下。

PingPong (easy)

Statement

编写一个程序，使用 UNIX 系统调用在两个进程之间通过一对管道 “ping-pong” 传送一个字节，每个方向发送一次。父进程应向子进程发送一个字节，子进程应打印<pid>: received ping（<pid>是其进程 ID），并将一个字节写入父进程的管道，然后退出；父进程应从子进程读取该字节，打印<pid>: received pong，然后退出。您的解决方案应放在 user/pingpong.c 文件中。

Hints

使用pipe创建管道
使用fork创建子进程
使用read来向管道中的读取端读取数据，write来向管道中的写入端写入数据。
使用getpid来获取当前进程的PID。
添加程序到Makefile里的UPROGS
xv6的用户程序可以使用有限的库函数，列出在user/user.h中。源代码(包括其他的系统调用)在user/ulib.c、user/printf.c、user/umalloc.c。

Analysis & Solution

和书上1.3的例子差不多，用大小为2的数组创建管道，其中p[0]是读取端，p[1]是写入端，然后fork分别发送读取。

write和read的第二个参数是一个地址，用char存储的话需要取地址，用char[]就没这么多事了。

需要注意的是，子进程和父进程的文件描述符是一样的，但并不是引用，在父进程关闭fd不会影响子进程。

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int main(int argc, char* argv[]) {
    int p[2];
    pipe(p);

    char msg[2] = "@";
    char buf[2];

    if (fork() != 0) {
        if (write(p[1], msg, 1) != 1) {
            fprintf(2, "cannot write a byte to child.\n");
            exit(1);
        }
        close(p[1]);

        if (read(p[0], buf, sizeof(buf))
        close(p[0]);
        fprintf(1, "%d: received pong\n", getpid());
    } else {
        if (read(p[0], buf, sizeof(buf)) != 1) {
            fprintf(2, "cannot read a byte from parent.\n");
            exit(1);
        }
        close(p[0]);
        fprintf(1, "%d: received ping\n", getpid());

        if (write(p[1], msg, 1) != 1) {
            fprintf(2, "cannot write a byte to parent.\n");
            exit(1);
        }
        close(p[1]);
    }
e

运行结果：

Primes (moderate / hard)

Statement

通过pipe和fork编写一个并发版本的素数筛。这个想法来自与$Doug \space Mcllroy$，Unix管道的发明人。本页下半部分的图片和周围的文字说明了如何操作。你的解决方案应该存放在user/primes.c文件下。

第一个进程将$2$到$35$输入管道。对于每个质数，你都需要创建一个进程，该进程从左边的邻居管道中读取数据，并通过另一个管道向右边的邻居写入数据。由于xv6的文件描述符是有限的，我们只需要筛出35以内的质数。

Hints

及时关闭进程不需要的文件描述符，不然程序会用完有限的文件描述符。
一旦第一个进程发送了35，它应该直到所有的管道都关闭才退出，包括所有子进程、孙进程等。
提示：当写入端的管道被关闭时，read返回$0$。
最简单的方式是直接向管道写入32位的int数据，而不是用ASCII码来处理。
确保只在需要创建管道的时候创建管道。
添加程序到UPROGS。

Analysis & Solution

其实就是通过管道来实现埃氏筛(wiki打不开点这个)。

简单解释一下。第一个进程传入$2 \rightarrow 35$给第二个进程，第二个进程输出$2$, 之后把所有没有因子$2$的数都传递给下一个进程。同样，第三个进程输出$3$，然后将所有没有因子$3$的数都传给第$4$个进程…。当管道中没有数的时候，说明所有质数都被筛出来了。整个过程就是参考资料中给的这张图。

很显然这是一个递归，还是一个线性的递归，读入完关闭管道就可以了。

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

void primes(int left[2], int base) {
    fprintf(1, "prime %d\n", base);

    int right[2];
    pipe(right);

    // 将剔除后的数写入缓冲区
    int cur, first = -1;
    while (read(left[0], &cur, sizeof(cur)) == sizeof(cur)) {
        if (cur % base == 0) continue;
        if (first == -1) first = cur;
        if (write(right[1], &cur, sizeof(cur)) != sizeof(cur)) {
            fprintf(2, "pid %d: cannot wirite %d bytes to right.\n", getpid(), sizeof(cur));
            exit(1);
        }
    }
    close(left[0]);

    if (first == -1) {
        exit(0);
    }

    if (fork() != 0) {
        close(right[1]);
        close(right[0]);
        // 第i个子进程等待第i+1个子进程结束
        wait((int*)0);
    } else {
        close(right[1]);
        primes(right, first);
    }

    return ;
}

int main(int argc, char* argv[]) {
    int p[2];
    pipe(p);

    // 主进程 所有数都写入管道
    for (int i = 2; i <= 35; i++) {
        if (write(p[1], &i, sizeof(int)) != sizeof(int)) {
            fprintf(2, "cannot write %d bytes.\n", sizeof(int));
        }
    }
    
    if (fork() != 0) {
        close(p[1]);
        close(p[0]);
        // 主进程等待第一个子进程结束
        wait((int*)0);
    } else {
        close(p[1]);
        primes(p, 2);
    }

    exit(0);
}

运行结果：

find (moderate)

Statement

写一个简单版本的UNIX find程序：在文件树中找到有特定文件名的文件。你的解决方案应该放在user/find.c下。

Hints

查看user/ls.c学习如何读取目录。
通过递归来允许查找子目录。
不要递归进'.'和'..'中。
对文件系统的更改会在运行qemu是持续存在，要恢复感觉的文件系统，请运行make clean，然后再运行make qemu。
你将会用到C strings。请参阅 $K & R$第5.5节。
注意不能像 Python 那样使用 == 比较字符串。请使用 strcmp() 代替。
将程序添加到 Makefile 的 UPROGS 中。

Analysis & Solution

提示中提到了user/ls.c文件，我们先看一下ls命令的源码。

一开始是一个fmtname的函数，传入一个字符串，然后限制字符串长度为DIRSIZ。

之后就是ls函数的主体。先看一下dirent和stat两个结构体是什么。

在kernel/fs.h的末尾有这么两段代码

// Directory is a file containing a sequence of dirent structures.
#define DIRSIZ 14

struct dirent {
    ushort inum;
    char name[DIRSIZ];
};

注释中明确说了"目录是一个包括一系列dirent结构体的文件’"。并且name字段的大小是DIRSIZ，这也解释了为什么fmtname要限制字符串长度。

在kernel/stat.h中可以看到下面这段代码

#define T_DIR 1      // Directory
#define T_FILE 2     // File
#define T_DEVICE 3   // Device

struct stat {
    int dev;       // File system's disk device
    uint ino;      // Inode number
    short type;    // Type of file
    short nlink;   // Number of links to file
    uint64 size;   // Size of file in bytes
};

结合user/ls.c中两个if中的处理。

if ((fd = open(path, 0)) < 0) {
        fprintf(2, "ls: cannot open %s\n", path);
        return;
    }

    if (fstat(fd, &st) < 0) {
        fprintf(2, "ls: cannot stat %s\n", path);
        close(fd);
        return;
    }

至此文件系统就能看出个大概了。在XV6中，一切都是以文件的形式来存储的，无论是目录、文本、设备等等所有东西，都被内核视为文件，只不过按不同的格式读取，或者以stat等结构体来标明他是什么。

正因为一切都是文件，所以我们可以用open系统调用来打开任意对象(例如上面的open(path， 0))，并返回一个文件描述符，通过这个文件描述符，我们可以按照不同的结构来读出对象里的内容(可以从path中读出dirent, 即目录中的文件名、路径名)。

对于每个读出的dirent对象，可以用fstat系统调用来获取他的信息，存储在一个stat类型的结构体中。例如，stat.type中指定了当前dirent是文件、目录、还是设备。

于是这道题的思路就很清晰了，不停从用户指定的路径中读出dirent, 如果当前dirent是文件，则判断dirent.name是否和要查找的文件相同。如果当前dirent是目录，则递归遍历这个子目录。

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"
#include "kernel/fs.h"

void find(char* path, char* filename) {
    int fd;
    if ((fd = open(path, 0)) < 0) {
        fprintf(2, "find: cannot open %s\n", path);
        return;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        fprintf(2, "find: cannot stat %s\n", path);
        close(fd);
        return;
    }

    if (st.type != T_DIR) {
        fprintf(2, "find: %s should be a dir, but found a file.\n", path);
        exit(1);
    }

    char buf[512];
    
    // strlen(path) + 1 表示path的长度 + '\0'
    // DIRSIZ 是文件名最大长度, 见kernel/fs.h 54行
    // 最后还要在末位加上`/`，e.g. code/user -> code/user/
    if (strlen(path) + 1 + DIRSIZ + 1 > sizeof buf) {
        fprintf(2, "find: path too long.\n");
        exit(1);
    }

    strcpy(buf, path);
    char *p = buf + strlen(buf);
    *p ++ = '/';
    // 运算优先级: *p ---> p ++. 即p指向'/'的后一个位置

    struct dirent de;
    while (read(fd, &de, sizeof(de)) == sizeof(de)) {
        if (de.inum == 0) continue;
        // 将de.name复制到p后面,最大长度为DIRSIZ
        memmove(p, de.name, DIRSIZ);
        p[DIRSIZ] = 0;
        if (stat(buf, &st) < 0) {
            fprintf(2, "find: cannot stat %s\n", buf);
            continue;
        }
        switch (st.type) {
        case T_FILE:
            if (strcmp(de.name, filename) == 0) fprintf(1, "%s\n", buf);
            break;
        case T_DIR:
            if (strcmp(de.name, ".") == 0) continue;
            if (strcmp(de.name, "..") == 0) continue;
            find(buf, filename);
        }
    }
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        fprintf(2, "Usage: find [path] <filename>\n");
        exit(1);
    }

    if (argc == 2) {
        find(".", argv[1]);
    } else {
        find(argv[1], argv[2]);
    }

    exit(0);
}

运行结果：

xargs (moderate)

Statement

编写一个简单版本的 UNIX xargs 程序：从标准输入中读取行，并为每一行运行一个命令，同时将该行作为参数提供给命令。你的解决方案应放在 user/xargs.c 文件中。

下面的示例说明了 xarg 的行为：

1
2
3

$ echo hello too | xargs echo bye
bye hello too
$

请注意，这里的命令是 “echo bye”，而附加参数是 “hello too”，因此命令是 “echo bye hello too”，输出结果是 "bye hello too”

请注意，UNIX 上的 xargs 会进行优化，一次向命令提供多个参数。我们不希望你进行这种优化。要使 UNIX 上的 xargs 按我们希望的方式运行，请在运行时将 -n 选项设为 1。例如

$ echo "1\n2" | xargs -n 1 echo line
line 1
line 2
$

Hints

使用 fork 和 exec 在每一行输入中调用命令。在父进程中使用 wait 等待子进程完成命
令。
要读取单行输入内容，每次读取一个字符(read)，直到出现换行符（‘\n’）。
kernel/param.h 声明了 MAXARG，这在需要声明 argv 数组时可能有用。
将程序添加到 Makefile 的 UPROGS 中。
对文件系统的更改会在运行 qemu 时持续存在；要获得一个干净的文件系统，请运行 make clean，然后再运行 make qemu。

Analysis & Solution

xargs的功能是给定一个命令，例如上面的echo line,然后从 标准输入流 中读出许多行，把每一行的输入作为参数附加到给定的命令中。

比如

1
2
3

$  find . b | xargs echo hello
hello ./b
hello ./a/b

对每一行读入都fork然后exec，模拟这个过程就行(指针指晕了)。

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"
#include "kernel/param.h"

int getline(char* buf, const int max) {
    int cnt = 0;
    while (read(0, buf, sizeof(char)) > 0) {
        if (++cnt > max) {
            fprintf(2, "xargs: input lines are too long.\n");
            exit(1);
        }
        if (*buf++ == '\n') break;
    }
    *(buf - 1) = 0;
    return cnt;
}

int parse(char** args, char* buf, const int max) {
    int cnt = 0;
    while (*buf) {
        if (cnt >= max) {
            fprintf(2, "xargs: args are too many.\n");
            exit(1);
        }
        while (*buf && *buf == ' ') buf ++;
        if (*buf == 0) break;
        args[cnt ++] = buf;
        while (*buf && *buf != ' ') buf ++;
        if (*buf) *buf++ = '\0';
    }
    return cnt;
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        fprintf(2, "Usage: xargs <command> [arguments]\n");
        exit(1);
    }

    char* args[MAXARG] = {0};
    for (int i = 1; i < argc; i++) {
        args[i - 1] = argv[i];
    }

    char buf[512] = {0};
    while (getline(buf, sizeof buf)) {
        parse(args + argc - 1, buf, MAXARG - argc + 1);
        int pid = fork();
        if (pid > 0) {
            wait((int*)0);
        } else if (pid == 0) {
            char cmd[128];
            strcpy(cmd, args[0]);
            exec(cmd, args);
        } else {
            fprintf(2, "xargs: fork error.\n");
            exit(1);
        }
    }

    exit(0);
}