| Commit message | Author | Files | Lines |
|
In the listener, I change the order in which coroutines are started to
avoid 'send on a closed channel'. Besides, the method used to get syscall
names and numbers is not so universal, so let's go back to checking
unistd.h.
In the filter, the output is written to the ./logs dir. The pid tree is
shown in logs/tree.log, detailed info in pids.log, and file info in
logs/files.log. tree.log shows a tree just like the `tree` command; the
other two files are written in JSON.
What's more, the flags used when opening files are also checked and shown
in files.log.
|
|
Add README.md on the design of the whole program, how each of its parts
(listener, filter) works, and finally how to compile and use them.
Besides, notes.md records the things and techniques learned during this
project, such as how to read the kernel src, how the pthread_create/fork/
clone syscalls act on processes and threads, the tricks used to make the
docker container work well, and books to be read. Good good study,
day day up.
|
|
The reason why the connector can't be used by the listener immediately is
that the listener needs to register with the connector so that the
connector gets pulled up. Now it's fixed in the netlink submodule.
|
|
**1. About root fs**
The setns syscall is used by a process (for example, process 12345) to
enter a namespace of another process (say, process 12000). Process 12345
opens the virtual file /proc/12000/ns/xxx, gets an fd, then calls
setns(fd, nstype). Here xxx stands for a specific type of namespace such
as mnt or ipc. The nstype param can be found in the manual.
In short, switching namespaces uses not a file name but a file descriptor,
which makes it very hard to listen to setns, because the fd info may have
been lost on the way, or may still be on the way and not yet in the db.
That would cause significant errors!
So, in this commit, I check /proc/pid/cgroup. Although it has nothing to
do with the root filesystem, it contains the docker id. Record it, and
deal with it in the filter: for each process that called pivot_root,
record its docker id, and remember the map from docker id to rootfs; then
check all processes in the tree, and if a process has a docker id, attach
the corresponding rootfs.
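A rough sketch of the cgroup check (not the real code): it assumes the
cgroup path contains a docker/<64-hex-id> segment, which depends on the
docker version and cgroup driver, and the helper name dockerID is made up.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// dockerIDRe assumes paths like ".../docker/<64-hex-id>" in /proc/<pid>/cgroup.
var dockerIDRe = regexp.MustCompile(`docker[/-]([0-9a-f]{64})`)

// dockerID reads /proc/<pid>/cgroup and extracts the container id, if any.
func dockerID(pid int) (string, bool) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid))
	if err != nil {
		return "", false
	}
	if m := dockerIDRe.FindSubmatch(data); m != nil {
		return string(m[1]), true
	}
	return "", false
}

func main() {
	if id, ok := dockerID(os.Getpid()); ok {
		fmt.Println("docker id:", id)
	}
}
```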
**2. Exit time of pids being zero**
Besides, I fix the exit times of pids in this commit. After merging
duplicate processes, sort them in ascending order, so that in each tgid
node the main pid is always the first thread. Then check the other pids'
exit times; if one is zero, assume its exit time is the same as the main
pid's, which means the process exited while the thread was still running.
**3. Wrong parent**
I fix the ppid of threads. For example, process 10 has a child process 20,
and 20 has threads 20 and 23. When pid 20 is received, the ppid and
parentTgid in the message are 10. But then 10 exits, the parent of 20
becomes 1, and afterwards 20 creates thread 23. When pid 23 is received,
its ppid and parentTgid are 1, which is totally wrong!
Using the sorted process array, we can easily find the main thread, get
the real parent from it, and correct the ppid of the other threads.
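A hedged sketch of fixes 2 and 3; the Process type and its field names are
made up for illustration, assuming the main thread has the lowest pid in
its tgid node.

```go
package filter

import "sort"

type Process struct {
	Pid, Tgid, Ppid, ParentTgid int
	ExitTimestamp               int64
}

// fixTgidNode sorts the threads of one tgid node ascending by pid so the
// main thread comes first, then patches missing exit times and wrong
// parents on the other threads.
func fixTgidNode(threads []Process) {
	sort.Slice(threads, func(i, j int) bool { return threads[i].Pid < threads[j].Pid })
	if len(threads) == 0 {
		return
	}
	mainThread := threads[0]
	for i := 1; i < len(threads); i++ {
		t := &threads[i]
		if t.ExitTimestamp == 0 {
			// the process exited while this thread was still running
			t.ExitTimestamp = mainThread.ExitTimestamp
		}
		// all threads share the real parent of the main thread
		t.Ppid, t.ParentTgid = mainThread.Ppid, mainThread.ParentTgid
	}
}
```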
**4. Clean file name**
The original file name in the database may be messy (such as
"/1/2/./../3"). Clean it with the go pkg "path".
**5. Next step**
TODO: Fix the netlink connector, to make it usable immediately after
powering on. Then the viewer.
|
|
In this commit I make some changes:
- The filter is mostly finished.
- Build a big node for each tgid, and use the tgid nodes to build the
tree we need by BFS.
- Filter the relevant files, and for files that were never closed, add a
close timestamp according to the exit time of their pids.
- Put all the results into the database.
Besides, I enlarge the buffer sizes of the netlink connector and of the
channels in the listener.
TODO:
- The pivot_root syscall is used only by the initial shell (`docker
start` makes a shell); other processes of the shell change their root
by switching namespace (mnt ns?) with the setns syscall. So fix it.
- It's time to fix the netlink connector socket.
|
|
First of all, fix something in the listener to fit the needs of the
filter. Originally, the listener marked the /usr/bin/containerd process id
with a star, but the children in the db are updated by ppid, which is the
pid of the parent and not the tgid, so the starred pid had no children. To
fix this, we add all the pids of /usr/bin/containerd into the db and set
their ptgid/tgid, so that they're just normal processes like the others.
Maybe we should also complete the info of these processes? haha.
Then, the filter of pids. There are several designed steps, and their
methods are as follows:
- Initially, because of the multithreaded execution of the listener,
there may be several entries for the same process, and we should merge
them. Extract the data from the database into a slice, and use a map
to record the process info. Iterate the slice: if the pid is already
in the map, merge the entries, otherwise insert it into the map.
- Then we should build the process tree, but what we have are pids. So
use another data structure: iterate the merged process map and build
a map from tgid to a slice of processes. Find the starred one. Build
a map from pid to its tgid.
- BFS. Use a simple queue and build the tree from the root (the starred
tgid), recording every visited tgid in another map. That's the tree
(see the sketch after this list).
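A hedged sketch of that BFS; the map shapes and names (tgidNodes,
rootTgid) are stand-ins for the real ones, and each tgid node is assumed
to know its parent tgid.

```go
package filter

type Process struct {
	Pid, Tgid, ParentTgid int
}

// buildTree walks the tgid graph breadth-first from the starred root tgid
// and returns, for every visited tgid, the list of its child tgids.
func buildTree(tgidNodes map[int][]Process, rootTgid int) map[int][]int {
	// group child tgids under their parent tgid
	children := make(map[int][]int)
	for tgid, procs := range tgidNodes {
		if tgid == rootTgid || len(procs) == 0 {
			continue
		}
		parent := procs[0].ParentTgid
		children[parent] = append(children[parent], tgid)
	}

	tree := make(map[int][]int)
	visited := map[int]bool{rootTgid: true}
	queue := []int{rootTgid}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range children[cur] {
			if !visited[c] {
				visited[c] = true
				tree[cur] = append(tree[cur], c)
				queue = append(queue, c)
			}
		}
	}
	return tree
}
```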
As usual, let's talk about the remaining issues:
- Some pids never received an exit message. Check the exit time of
their tgid, or even of their ppid.
- Optimize the data structure so that it records the tree by itself.
Right now the tree is recorded not only by the final helloTree map
from tgid to slice but also by the map from pid to tgid, which is hard
to store in the database. Design a better data structure, so the
viewer can rebuild the tree quickly from the data in the db.
- For the future file filter, much attention should be paid to the close
time, to the same file opened by the same pid, and to the pathName of
a file.
Fighting!
|
|
In this commit I made several changes:
- Use structs instead of plain bson.M (interface{}). bson.M has some
shortcomings: 1) it makes the database chaotic and hard to read, but
that's not important; 2) some entries may have more or less content
than others, which makes them hard to decode and filter. So I design
new data structures for encoding and decoding. Hopefully there are no
bugs.
- Fix the way the file path is calculated. The original method was to
concatenate all the PATH entries together, which is totally wrong! A
PATH entry has one of several types, as shown in its "objtype" field.
I can't find it in the kernel src code, so all I know is that "PARENT"
means the directory the file is in, while the filename entry itself
already carries the path, so we should ignore all "PARENT" entries.
When the src code is found, we should check this again.
- Fix bugs in updating. The update functions of mongodb require an
operator starting with '$' such as '$set'/'$push', so when we replace
a whole doc, we should use the replace function rather than update.
And we should never ignore the error information it gives us (see the
sketch after this list).
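A minimal sketch of the replace-vs-update distinction with the official
go.mongodb.org/mongo-driver package; the collection, filter and document
here are made up for illustration.

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

type PidDoc struct {
	Pid  int    `bson:"pid"`
	Comm string `bson:"comm"`
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	coll := client.Database("test").Collection("pids")

	// UpdateOne needs a '$' operator in the update document ...
	if _, err := coll.UpdateOne(ctx, bson.M{"pid": 42},
		bson.M{"$set": bson.M{"comm": "bash"}}); err != nil {
		log.Fatal(err) // never ignore the returned error
	}
	// ... while ReplaceOne takes the whole new document.
	if _, err := coll.ReplaceOne(ctx, bson.M{"pid": 42},
		PidDoc{Pid: 42, Comm: "bash"}); err != nil {
		log.Fatal(err)
	}
}
```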
Hope there are no more bugs after this Big Change.
Now it's time to write the filter as well as the viewer. Best wishes with
NO BUGS!
|
|
When I use godo, error information comes mixed with the other output, so
I change all error reporting to go to stderr. I also listen to the
`pivot_root` syscall to find out the root file system of the dockers.
However, I'm afraid of causing too much delay, so I don't check the
rootfs of the ppid and record it on the pid. Besides, the way pivot_root
is handled is hardcoded, which may cause crashes.
Shall I listen to the chdir syscall to find out the exact cwd? Maybe it's
useful for pivot_root?
Next step: find an appropriate data structure, and add more file
operations to be watched. This task must be completed this week.
|
|
As previously envisioned, the loss is caused by slow consumption. So
I make several changes:
- Delete all the mutexes, especially those around mongodb. There seems
to be no need for a thread mutex, because execve, fork and exit have
no conflicts (really?).
- Insert all pid info into the db, just as we do for file info. We then
have to filter the useful info out of it, but this does work to reduce
the amount of lost info.
Besides, the problem that recvfrom is always blocked may be solved.
When the machine has just started, it's blocked; but after I run the
C program that connects to the netlink connector and listens to it, both
the C program and godo receive info fine.
Also, remaining questions:
- I now spawn many goroutines inside the 2nd and 3rd coroutines in the
hope that less time is spent handling info, so we can get back to
listening to the kernel as quickly as possible. But does it work? As
we know, too many threads slow a program down, because of too frequent
switches between threads or processes.
- Sometimes the eventTable has bugs: when eoe is received, the pointer
in it is null. Could it come from a conflict between threads? But that
seems unreasonable; there's only one place where the event is deleted,
namely on eoe, after it's sent. Or is the eoe info received more than
once?
- For some processes, when I look into /proc to find cwd and cmdline,
the process has already exited. If I go back to using audit for pid
info, it will be hard to distinguish between thread and process.
Anyway, it doesn't matter now, but what if?
Next step: figure out the root fs of a docker, and its name.
|
|
In this commit I make several changes; the reasons are as follows:
- Expand the netlink connector buffer size. Originally it used only one
page of memory, but sometimes that caused a "not enough buffer" errno,
or the socket stayed blocked for a long time. Now it's 20 pages.
- All file info is thrown into the database. As the last commit message
says, there are 2 collections, "fds" and "files". When a file
descriptor is closed, its info is looked up in "fds", deleted, and put
into the "files" collection with its close time (see the sketch below).
Left questions:
- The netlink connector is often found blocked for no apparent reason.
Fix it, or replace the golang-coded connector with a C program? The
key question is why it's blocked. Maybe the answer is in the kernel
src code.
- Sometimes audit still loses info (not too much). For instance, when I
use vim in the docker to change hello.c, hello.c may be opened but no
close info is received. Or the vim swap file, such as .hello.c.swp or
.hello.c.swx, is never closed. What's more, hello.c is never written,
but the swap files are. Does vim write to the swap files and then
replace the original file? Let's check it.
- Besides, when a pid exits, we should check its file descriptors and
close them all.
|
|
|
|
There are some possible reasons I have thought of:
- auditd loss. Each time I use `auditctl -b xxx` or `auditctl
--reset-lost`, there is always a big lost count. At first I thought it
meant how much audit info was lost over the net, or how much was
dropped because the audit info queue in the kernel was full. However,
from the kernel src code, it actually means how much is thrown away
because there is no listener for the audit info. In other words, audit
is a combined userspace-kernel facility, not two independent parts.
- audit backlog size. Same as the above.
But when I only listen to the syscall "open", I can almost always hear
the info from the docker. So I think this may be because the audit info
is produced in a flood, while this program checks this and that, which
costs too much time, so consumption is far slower than production.
Next step, I will use MVC: all received info will be pushed into the
database, and a new independent part will keep the database clean and
clear.
The key problem is that a process can open file1 as fd 3, write, close,
and then open file2 as fd 3, write, close: which means I must figure out
which file is being written when a "write" event comes. Now I check
pid/fd/close_time in the database to decide which one is written, but
finding and checking docs also takes a lot of time. Maybe use two
collections, one ("fds") recording files not yet closed, the other
recording closed files?
Besides, clone/fork/pthread_create all use the clone syscall, but with
different flags. Maybe I can also use the `pid/tgid` pair to distinguish
between process and thread (a sketch follows). Good idea.
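A hedged sketch of that pid/tgid idea; the event struct is made up, and
the CLONE_THREAD check is an alternative signal for the same thing.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

type CloneEvent struct {
	ChildPid  int
	ChildTgid int
	Flags     uint64 // clone flags, if the event source provides them
}

// isThread reports whether the new task is a thread rather than a process:
// a thread's pid differs from its tgid (or CLONE_THREAD was set).
func isThread(ev CloneEvent) bool {
	return ev.ChildPid != ev.ChildTgid || ev.Flags&unix.CLONE_THREAD != 0
}

func main() {
	fmt.Println(isThread(CloneEvent{ChildPid: 23, ChildTgid: 20})) // true: a thread
	fmt.Println(isThread(CloneEvent{ChildPid: 24, ChildTgid: 24})) // false: a new process
}
```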
Be quick, your internship is half over. What kind of answer will you
hand in?
|
|
It's the check(cookedEvent) function that causes fileopen to crash, and
I'm sorry to say I've forgotten why I added this function; maybe to check
ppid and pid in the database in just one place rather than at the head of
each function. However, the check in each function was never deleted. I
discovered this by comparing the source code with 5d244e3. In theory this
should only increase the delay. How does it affect fileopen and cause a
failure? No one knows.
The same goes for the kernel connector. If we keep the delay when a pid
exits, the connector says "Error recv: no enough buffer space", but if we
delete the delay, all modules work well. Why exactly does the delay on
pid exit exhaust the connector's buffer? How outrageous!
Now I've come back to the original question: when I start and use docker
quickly (`start && exec && exit` in just one command), the file
open/write/close is faithfully recorded; but if I use an interactive
shell and use vim to change a file in the docker, nothing happens.
Why? Why? Why?
|
|
For some reason, the kernel connector can catch the exec event, but it
doesn't tell me what the process execs or what its args are. So we use
audit to collect this information and complete it in the database.
However, there are different delays between connector and audit, even
though they both use netlink sockets, as a result of which exec may come
before fork. We deal with that the same way as before. But exec events
also get lost, maybe because of the check on ppid in the exec event; that
check is necessary though, and if it's deleted, too much irrelevant
information floods into the database, I've tried. So leave it there and
just move forward.
Besides, what's newly discovered is that pthread_create also uses the
clone syscall, but if pid 1 has a thread 2, the exec info will say that
pid 2 execs. So I shouldn't ignore connector msgs where childPid !=
childTgid.
This is my first attempt to use the git submodule feature in my own
project, and also golang local packages. Congratulations!
Now, on to fixing the file operations. Hope there won't be too many
fucking bugs.
|
|
|
|
|
|
|
|
In this commit I successfully catch the open/close syscalls, and insert
them into an independent collection in mongodb instead of along with the
pids. And I now record opens with the "O_TRUNC" flag as writes.
|
|
There are 2 bugs from ancestor commits:
- In the 'things_left' tag commit (the grandparent of this commit), we
added logic that allows execve to come before fork, but when that
happens, I forgot to insert the basic info (pid, ppid, etc.), as a
result of which it didn't work as designed. Now it's fine: insert the
execve with pid and ppid, so that the fork event can find it and fill
in the other info. However, we shouldn't set start_stamp in this case,
so it also serves as a flag. I haven't removed the unused execve info;
that's left for the future.
- In the parent commit, the syscallRegex was changed, because when we
add more syscalls to be watched, we need more of their params, not
only the first one. Instead of keeping a single a0 to get the first
param, I use argsRegex for all the params (a sketch follows). But this
change broke the matching of syscallRegex. Now it's fixed.
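For the record, matching all aN= params of an audit SYSCALL line can look
roughly like this; the sample line is shortened and this regex is only an
illustration, not the project's actual argsRegex.

```go
package main

import (
	"fmt"
	"regexp"
)

var argsRegex = regexp.MustCompile(`\ba(\d+)=([0-9a-fx]+)`)

func main() {
	line := `type=SYSCALL msg=audit(1629300000.123:456): syscall=257 a0=ffffff9c a1=55d2 a2=241 a3=1b6`
	// FindAllStringSubmatch collects every argument, not just a0.
	for _, m := range argsRegex.FindAllStringSubmatch(line, -1) {
		fmt.Printf("a%s = %s\n", m[1], m[2])
	}
}
```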
|
|
To record it, we must listen to open/write and several other syscalls,
and now I've added open to the 2nd coroutine. For the open syscall, what
we should do is examine the flags argument (the 2nd param of the syscall)
to find out whether the file can be written. If so, the exit value is the
file descriptor, and when write is later called, audit shows only the
file descriptor but no file name (a sketch of the flag check follows).
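A hedged sketch of that flag check; the constants come from
golang.org/x/sys/unix and the helper name is made up.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// canWrite reports whether the flags of an open/openat call allow writing
// to the file (O_WRONLY or O_RDWR, or O_TRUNC which truncates it anyway).
func canWrite(flags int) bool {
	switch flags & unix.O_ACCMODE {
	case unix.O_WRONLY, unix.O_RDWR:
		return true
	}
	return flags&unix.O_TRUNC != 0
}

func main() {
	fmt.Println(canWrite(unix.O_RDONLY))                // false
	fmt.Println(canWrite(unix.O_WRONLY | unix.O_CREAT)) // true
}
```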
So the next step is to add things to the 3rd coroutine, to get the whole
program running again, and to find the bugs.
|
|
The most important work during this period was to find a solution to the
out-of-order bug. Let me describe it in detail: info from audit may be
out of order, which means fork may come after execve, or even after exit.
What an absurd phenomenon, to see a process work or exit before it has
been created!
To deal with this problem, I've tried several ways:
- In the 2nd coroutine, when the EOE msg comes, if it's a fork/clone
event, send it immediately, otherwise wait for some time (such as
100 ms). But in the end it delays too long, and has other problems.
- The 2nd coroutine doesn't send directly, but records all the finished
event ids in a slice, and another thread checks once every second; if
there's something in the slice, it sends the corresponding events in
order of event id. But: an event that happens first doesn't always
have a lower id or timestamp. For example, 1 forks 2, then 2 execves;
the audit code in the kernel itself may record the execve before the
fork (maybe fork does some extra setup), which means the execve has an
earlier timestamp and a lower event id. So the out-of-order problem is
not completely resolved. If we then add delays to non-clone events, a
more serious problem appears: we must use a mutex around the slice of
finished event ids to prevent clashes between the send thread and the
wait thread, but the wait thread can never get the mutex again, because
there are too many clone events and sends are too frequent!
- So I use no delay but mongodb: when an execve comes, if its pid is not
recorded, just insert it and wait for the fork (see the sketch after
this list). It does work, but some other work is still left to do:
- What should I do if "2 forks 3" comes before "1 forks 2"? For now I
assume it doesn't happen, but what if?
- When execve comes before fork, I record it; but if this process turns
out to have a parent I don't care about, should it be deleted or stay
there?
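A hedged sketch of that mongodb approach (the third bullet above); the
collection layout and field names are assumptions.

```go
package listener

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// onExecve upserts a stub doc if the pid is unknown and appends the execve
// record, so an execve that arrives before its fork is not lost.
func onExecve(ctx context.Context, pids *mongo.Collection, pid int, args []string, ts int64) error {
	_, err := pids.UpdateOne(ctx,
		bson.M{"pid": pid},
		bson.M{"$push": bson.M{"execve": bson.M{"time": ts, "args": args}}},
		options.Update().SetUpsert(true))
	return err
}

// onFork fills in the basic info; if execve already created the stub, this
// completes it, otherwise it creates the doc from scratch.
func onFork(ctx context.Context, pids *mongo.Collection, pid, ppid int, ts int64) error {
	_, err := pids.UpdateOne(ctx,
		bson.M{"pid": pid},
		bson.M{"$set": bson.M{"ppid": ppid, "start_stamp": ts}},
		options.Update().SetUpsert(true))
	return err
}
```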
Also, as mentioned above, I've added an EXECVE field to the process doc
in the db, which records all the execves (time and args) from the same
process. Besides, exit_timestamp and exit_code can be caught now, but too
many processes have no exit info. This is also to be fixed.
Now, let's listen for the files changed by each process. Don't forget the
to-do items listed above!
|
|
I failed to print the process tree out. While I'm printing the tree, the
tree itself gets changed, maybe deleted. What's more, the output shows 4
lines with the same ppid and pid, what an absurd result! It may be caused
by multi-threading. So, use a database instead. Mongodb stores data as
bson (binary json) rather than being a relational database like mysql,
which means it's easier to use. (?)
Besides inserting, I've also solved the issue that "fork" is called once
but returns twice. For instance, pid 1 forks pid 2; in the audit log this
is not one event "syscall=clone,ppid=1,pid=2", but actually two events
"syscall=clone,exit=0,ppid=0,pid=1" and "syscall=clone,exit=2,ppid=0,
pid=1", which is just what we see in sys_fork in the kernel source. To
deal with this, when the syscall is clone and exit is 0, we just drop it
(see the sketch below).
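A hedged sketch of that filter; the event struct is a stand-in for
whatever the 2nd coroutine actually produces.

```go
package main

import "fmt"

type SyscallEvent struct {
	Syscall string
	Exit    int // return value of the syscall
	Pid     int
	Ppid    int
}

// keepCloneEvent drops the child-side record of clone (exit == 0) and keeps
// the parent-side record, whose exit value is the new child's pid.
func keepCloneEvent(ev SyscallEvent) bool {
	return !(ev.Syscall == "clone" && ev.Exit == 0)
}

func main() {
	fmt.Println(keepCloneEvent(SyscallEvent{Syscall: "clone", Exit: 0, Pid: 1})) // false: drop
	fmt.Println(keepCloneEvent(SyscallEvent{Syscall: "clone", Exit: 2, Pid: 1})) // true: keep
}
```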
Left question: find out the exit code when a process calls
exit/exit_group, and finish the code that records it in the database.
|
|
Putting all the src code in only one file is too ugly, so divide it! And
mv the files into the src dir to keep the whole repo clean.
|
|
For some reason, the linux kernel has made some changes to its syscalls.
As shown in the src code, we pay attention to fork/vfork/clone for
creating a process, and to exit/exit_group for killing it. In my opinion,
the fork and clone syscalls should be totally different, otherwise there
would be only one syscall. However, according to the logs, I heard only
clone but no fork, exit_group but no exit. In fact, fork calls clone and
then does some special setup. They're different: fork is about parent and
child, while clone is about caller and callee, and allows sharing things
between caller and callee. Both fork and clone make a new process;
pthread makes tasks, which are called threads. A pid is in fact a task
id.
Now the 3 coroutines work well, and I've got a process tree as a
map[int]*process. Some hidden questions:
- Is it right for the 2nd coroutine to send to the 3rd as soon as eoe
arrives?
- How to make the delay between exit_group and deletePid clear and
suitable?
Next works (a sketch of the exit-tag idea follows):
- Change the pids from a map into the database, which means we should
divide front-end and back-end. Besides, when you delete something
(such as on process exit), don't delete it from the database; instead
just set a tag and record its exit code. In other words, we judge
whether it's alive not by entry existence but by the exit tag.
- Record the containers, for instance rootFS, root process, name, id,
etc., keep them in a map, and maintain that database table.
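A hedged sketch of the map[int]*process tree with an exit tag instead of
deletion; the field names are assumptions.

```go
package listener

type process struct {
	Pid, Ppid     int
	Children      []*process
	Exited        bool // exit tag: never delete the entry, just mark it
	ExitCode      int
	ExitTimestamp int64
}

var pids = map[int]*process{}

// markExit records the exit instead of removing the entry, so history is kept.
func markExit(pid, code int, ts int64) {
	if p, ok := pids[pid]; ok {
		p.Exited, p.ExitCode, p.ExitTimestamp = true, code, ts
	}
}

// alive is judged by the exit tag, not by whether the entry exists.
func alive(pid int) bool {
	p, ok := pids[pid]
	return ok && !p.Exited
}
```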
|
|
As planned, the first coroutine throws raw event information to the
second, which organizes all the info for the same event according to the
event id, which is unique as long as the computer isn't shut down.
There are several difficulties I've encountered, so I list their
solutions here to remember:
- Raw info from the 1st coroutine is correct, but wrong when the 2nd gets
it; or it's correct when received, then the regular expr goes to match
it, the first match meets expectations, but the next match goes totally
wrong, and the info differs from what was received. Looking into the
src of go-libaudit, we find that when reading from the netlink socket,
the read buffer is always the same slice: it first receives a long
datum, then **passes the original slice to rawEvent.Data**, and then
receives a shorter one. rawEvent.Data is passed to the 2nd coroutine as
**a pointer to rawEvent**, which means all three places use the same
piece of memory. Then, when shorter info comes from the socket, the
slice is not reallocated; the socket writes into that same memory
again, and coroutine 2 gets dirty data.
To deal with it, we change the channel type from pointer to interface,
and make a deep copy of rawEvent before passing it down (see the sketch
after this list). As a result, the 2nd coroutine gets a copy of the
message instead of the original, and it finally comes out right.
- While designing a regular expr, it may look correct but still fail to
match the right string. There may be something wrong that can't be
spotted by eye; try rewriting the expr, and it may be fixed.
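A hedged sketch of the fix; the RawEvent type is a stand-in, not
go-libaudit's actual definition, and this version sends by value instead
of through an interface channel, but the key point (the deep copy before
the send) is the same.

```go
package listener

type RawEvent struct {
	Type uint16
	Data []byte // may alias the netlink read buffer
}

var rawChan = make(chan RawEvent, 1024)

// forward copies the payload into fresh memory before sending, so the next
// read into the shared socket buffer cannot corrupt what coroutine 2 sees.
func forward(ev *RawEvent) {
	cp := RawEvent{Type: ev.Type, Data: make([]byte, len(ev.Data))}
	copy(cp.Data, ev.Data)
	rawChan <- cp
}
```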
Also, there are some hidden dangers:
- The 2nd coroutine has no error checks, although the err variable is
set and caught to satisfy the compiler. We **shall** add them later.
- Is it reasonable to pass cooked event info to the 3rd coroutine
immediately, without waiting some time? Info from the network is out
of order after all.
Fight! Fight! Fight!
|