CMU-15-213-Intro-to-Computer-Systems

Notes and labs for the course 15-213 Introduction to Computer Systems at CMU

Integer representation

Data types
- char, short, int, long, float, double, pointer
- Word size equals length of pointer datatype
Bit-level operations
Signed / unsigned conversion
Byte ordering
- Big Endian: Sun, PPC Mac, Internet
- Little Endian: x86, ARM

IEEE 754

Numeric form: (-1)^s*M*(2^E)
Encoding
- s: sign bit
- exp: E
- frac: M
Three kinds of values
- Denormalized: exp = 0
- Normalized: 0 < exp < 11..11
- Special: exp = 11...11 (e.g. inf & NaN)
Roundings

x86-64

History
- 8086 -> 386 -> Pentium 4E -> Core 2 -> Core i7
- IA32 -> x86-64
- CISC architecture
- Case study: Core-i7 6700K Skylake
  - Shared L3 cache (LLC)
  - Hyper-Threading: Shared caches, buses & ALUs
Registers
- Can reference low-order 4 bytes (backwards compatibility)
- %rsp is stack top, others are general-purpose
- %rip is instruction pointer (not listed)
- A register for implicitly-set condition codes

Machine-level Programming

Addressing modes
- Normal: (R) -> Mem[Reg[R]]
- Displacement: D(R) -> Mem[Reg[R] + D]
- Complete: D(Rb,Ri,S) -> Mem[Reg[Rb] + S*Reg[Ri] + D]
  - (Rb,Ri) -> Mem[Reg[Rb] + Reg[Ri]]
  - D(Rb,Ri) -> Mem[Reg[Rb] + Reg[Ri] + D]
  - (Rb,Ri,S) -> Mem[Reg[Rb] + S*Reg[Ri]]
Some instructions
- movq Src, Dst
  - Cannot do memory-memory transfer with a single instruction
  - Intel docs use mov Dst, Src
- leaq Src, Dst
  - Src is address mode expression, set Dst to address denoted by expression
  - Similar to p = &x[i]
  - Used for arithmetics for form like x + k * y
  - Does not change condition codes
- addq/subq Src, Dst
- imulq Src, Dst
- salq/sarq/shrq Src, Dst
- xorq/andq/orq Src, Dst
- pushq src
- popq dest
- incr dest
Compiler, Assembler, Linker & Loader
1. Compiler
  - Translates C files (.c) into assembly files (.s)
2. Assembler
  - Translates assembly files (.s) into object files (.o)
  - Missing linkage between compilation units
3. Linker
  - Resolve references between object files
  - Combine with static libraries (malloc, printf, etc)
4. Dynamic linked libraries
  - Linking occurs at runtime
  - Does not take too much disk space
Controls
- Jumping
  - jmp, je, jne, js ...
  - However, branches are very disruptive to instruction flow through pipelines
- Conditional moves: cmovle
  - Do not require control transfer
  - Use -fno-if-conversion flag to supress implicit conversion
- Switch statements
  - Jump table structure
  - Make use of "fall through"
Procedures
- Passing control
  - Procedure call: call label
    - Push return address into stack
    - Jump to label
  - Procedure return: ret
    - Pop return address from stack
    - Jump to this address
  - Return address: Address of next instruction after the call statement
- Passing data
  - First 6 arguments: %rdi, %rsi, %rdx, %rcx, %r8, %r9
  - Other arguments passed using stack
  - Return value: %rax
  - IA-32 pass all arguments in stack
  - Concept of stack frames:
    - Marked by %rbp (optional) and %rsp
    - No additional mechanism for recursion is needed
  - Register saving conditions
    - Caller saved
      - %rdi, %rsi, %rdx, %rcx, %r8, %r9, %rax, %r10, %r11
    - Callee saved
      - %rbx, %r12, %r13, %r14, %rbp
      - %rsp is also a special form of callee-saved
- Memory management
- ABI: Application Binary Interface
Data
1. Arrays
  - 1D arrays
  - Nested 2D arrays: int A[R][C]
  - Multi-level 2D arrays:
2. Structs
  - Represented as block of memory
  - Fields are ordered according to declaration
  - Alignment:
    - Within struct: Each element has alignment requirement K, where K is the size of this element
    - Overall: Each struct has alignment requirement K, where K is the largest alignment of any element in struct
    - To save space, put large data types first
3. Float operations
  - Arguments passed in %xmm0, %xmm1, ...
  - Result returned in %xmm0
  - Different mov instructions are used to move floats
Address space
- Currently using 47-bit addresses (highest address of 0x7fffffffffff)
- Maximum stack size of 8MB on most machines
Vulnerablities
1. Buffer overflow
  - Triggered by functions manipulating strings of arbitrary length
  - gets, strcpy, strcat, scanf, fscanf, sscanf
2. Return-oriented programming (ROT)
  - Make use of "gadgets" in text segment
  - Trigger with ret instruction
Protection
1. Use routines limiting string lengths (user-level)
2. Randomized stack offsets
3. Nonexecutable code segments
4. Stack canaries

Code optimization

Optimization by programmer
1. Code motion: Reduce frequency of computations performed
  
  GCC will do this with -O1
2. Reduction in strength: Reduce costly operation with simpler one
  
  Here, int mul requires 3 clock cycles, int add requires 1 clock cycle
3. Share common subexpressions
Optimization blockers
1. Procedures: Seen as a "black box"
  - Procedures may have side effects
  - May not return same result with same argument
  - Fix: Use inline functions (GCC with -O1 within single file)
2. Memory aliasing: Two memory references specify single location
  - The following code does memory load and store every time, because compiler assume possibility of memory aliasing:
  - Load and store take multiple clock cycles
  - Easily caused by direct access to storage structures
  - Fix: Define local variable to tell compiler not to check for aliasing
  - Get in habit of introducing local variables accumulating within loops
Optimization (by programmer) limitations
1. Most performed within procedures. Newer versions of GCC do interprocedual optimization, but not between codes in different files
2. Based on static information
3. Conservative: Must not change program behavior
Instruction-level parallelism
- Superscalar processor: Issue and execute multuple instructions per cycle, and instructions are scheduled dynamically
- Some instruction have >1 clock cycle latency, but can be pipelined:
- Unrolling
  - Break sequential dependency to break through latency bound (to approach throughput bound)
```
for(int i = 0; i < limit; ++i)
    x = x + d[i];
```
    can be optimized to:
```
for(int i = 0; i < limit; i += 2)
    x = (x + d[i]) + d[i + 1];
```
    but to break sequential dependency:
```
for(int i = 0; i < limit; i += 2)
    x = x + (d[i] + d[i + 1]);
```
  - adding separate accumulators
- Branch prediction
  - Backward branches are often loops, predict taken
  - Forward branches are often if, predict not taken
  - Average better than 95% accuracy

Memory

Storage technologies
1. RAMs
  - Volatile: SRAM & DRAM (caches & main memories)
  - Nonvolatile: ROM, PROM, EPROM, EEPROM (firmware, ssd & disk caches)
2. Rotating disks
3. SSDs
  - Page can be written only after its block has been erased
Locality
- Temporal locality
- Spatial locality
Hierarchy
Caches
- Each level in hierarchy serves as cache for the level below
- Types of cache misses
  1. Cold miss: "Warm up" cache
  2. Capacity miss: Working set larger than cache size
  3. Conflict miss: Limited by positioning restrictions imposed by hardware
- Examples of cache
Cache memories
- Concept of locality
- General organization
  1. Direct mapped cache has (E / associativity = 1)
  2. E-way set associative cache (Here E / associativity = 2)
- Metrics
  1. Miss rate
  2. Hit time
  3. Miss penalty
- Write cache-friendly code
  1. Make the common cases go first
  2. Minimize the misses in inner loops
  3. Try to maximize spatial locality by reading objects sequentially with stride 1
  4. Try to maximize temporal locality by using an object as often as possible once it's read from memory
- Example of matrix multiplication
  - In which order to arrange the loops? Do miss rate analysis!
  - It turns out: kij/ikj > ijk/jik > jki/kji
  - Use blocking: multiplying by sub-matrices

Linking

Why linkers?
1. Modularity
2. Efficiency (separate complilation)
Two kind of linking
1. Static linking
2. Dynamic linking
What does linker do?
1. Symbol resolution
  - Functions, global vars, static vars
  - Definitions are stored in symbol table, an array of entries (name, size, location)
  - Three kind of symbols:
    1. Global symbols: non-static functions and non-static vars
    2. External symbols: defined in other modules
    3. Local symbols: static functions and static vars
    - Note: Do not confuse local symbols with local variables. Local variables are allocated in stack at runtime, and have nothing to do with linker.
  - Symbol resolution
    - Symbols are strong or weak:
      1. Strong: functions and initialized globals
      2. Weak: uninitialized globals
    - Multiple strong symbols are not allowed
    - Choose the strong symbol over weak symbols
    - If there are multiple weak symbols, choose arbitrary one
      - May cause undefined behavior over different compilers
      - Fix: use static and explicit extern
2. Relocation
  - Merge text and data segment
  - Relative location -> absolute location
  - Updates symbol table
    - Relocation entries are used to aid symbol resolving:
      a: R_X86_64_32 array
Three kinds of object files
1. Relocatable object file (.o file)
2. Executable object file (a.out file)
3. Shared object file (.so file or .dll file)
ELF format (Executable and Linkable Format)
- All 3 object files use ELF format
Static libraries (.a archive files)
- Concatenate related relocatable object files into a single file with an index (called an archive)
- During linking, only referenced .o files are linked
- Command line order matters!
  - During scan, keep a list of currently unresolved references
  - If any entries in the unresolved list at end of scan, then error
  - Fix: put libraries at the end of command line
- Commonly used libraries:
  - libc.a (the C standard library)
  - limb.a (the C math library)
- Disadvantages
  - Duplication in storage
  - Bug fixes require relink
  - Fix: shared libraries
Shared libraries
- Dynamic linking can happen at:
  1. Load time
    - Handled by the dynamic linker
    - libc.so usually dynamically linked
  2. Run time
    - dlopen() interface in linux
Library interpositioning
- Can happen at:
  1. Compile time
  2. Link time
  3. Load/run time
- Can be used for:
  1. Detecting memory leaks
  2. Generating address traces

Exception Control Flows (ECF)

ECFs exists in all levels:
1. Exceptions (low level)
  - Processor responses to external events
  - Exception tables
2. Context switch
3. Signals
4. Nonlocal jumps
Exceptions (equivalent to user-kernel transition)
1. Asynchronous (Interrupts)
  - Indicated by INT pin
  - Control flow returns to next instruction
2. Synchronous
  1. Traps
    - Intentional (syscall, breakpoints)
    - Control flow returns to next instruction
  2. Faults
    - Unintentional but possibly recoverable
    - Control flow returns to current instruction or aborts
  3. Aborts
    - Unintentional and unrecoverable
Context switches

Processes

From a programmer's perspective, a process can be:
1. Running: Executing or will be scheduled
2. Stopped: Suspended and will not be scheduled until further notice
3. Terminated: Stopped permanently (zombie)
  - Process terminates when:
    1. SIGTERM received
    2. Return from main()
    3. Called exit()
Creating process: fork()
- fork() called once but returns twice
- exit() and execve() called once but possibly never returns
- Control flow can be modelled with process graphs via toposort:
Reaping child processes: wait()
- Terminated processes become zombies, because its parent may use its exit status or OS tables
- wait() and waitpid() reap zombie child processes
- If parent don't reap:
  1. If parent doesn't terminate: Never diminishes (a kind of memory leak)
  2. If parent does terminate: Reaped by init process (pid == 1)
  - So only need to explicitly reap long-running processes
Loading and running processes: execve()
- int execve(char *filename, char *argv[], char *envp[])
- Loads and runs in the current process
- Overwrites code, data and stack
- Retains PID, open files (e.g. stdout), and signal context
- Called once and never return (except error)
Process groups
- Can be get and set by getpgrp() and setpgid()
- Kill all process in a group with kill -n -<pid>

Signals

Unix shell: An application that runs program on behalf of the user
- Shell contains a basic loop and a eval() function
- Two cases in eval():
  1. Shell built-in command
  2. Not build-in, use fork() and execve()
- Motivation: How to reap both foreground and background jobs?
  - Basic loop: Only reaps foreground jobs
  - Fix: Signals
Signals
- Akin to exceptions and interrupts
- Sent from signal (sometimes at the request of another process via kill)
- Identified by an integer
- Controlled by per-process pending and blocked bit vectors
  - pending vector set and cleared by kernel when signals is sent or received
  - blocked vector can be manipulated by sigprocmask() function
  - So, signals cannot be queued
- Send: pending bit set
- Receive: process reacts to the signal, clears pending bit
  1. Ignore
  2. Terminate
  3. Catch (using user-level function called signal handler)
- Kernels checks for pnb = pending & ~blocked at beginning of a time-slice
  - If pnb == 0:
    - Pass control to next instruction in the process logical flow
  - Else
    1. Choose lease non-zero bit in pnb and forces the process to receive the signal
    2. The receipt of the signal triggers some action by the process (clears pending bit)
    3. Repeat for all remaining nonzero bits
    4. Pass control to next instruction in the process logical flow
- Default action can be one of:
  1. Termination
  2. Stop until restarted by SIGCONT
  3. Ignore
- Override default action by installing signal handlers:
  - handler_t *signal(int signum, handler_t *handler)
  - handler can be one of:
    1. SIG_IGN: Ignore
    2. SIG_DFL: Revert to default
    3. Function pointer to a user-level signal handler
- Signal handlers are a form of concurrency
- Signal handlers can be nested
  - So we need blocking
    1. Implicit blocking: blocks pendings signals of same type
    2. Explicit blocking: sigprogmask() with supporting functions of:
      - sigemptyset()
      - sigfillset()
      - sigaddset()
      - sigdelset()
- How to write safe handlers?
  1. Keep handlers as simple as possible
  2. Call only async-signal-safe function in handlers
    - async-signal-safe functions are reentrent (access only local variables on stack), or cannot be interrupted by another signal handler
    - printf(), malloc() and exit() are not safe
    - write() is the only signal-safe output function
  3. Save and restore errno on entry and exit
  4. Protect accesses to shared data structures by temporarily blocking all signals in both handler and main()
  5. Declare global variables to be volatile, to prevent from being optimized into registers
  6. Declare global flags as volatile sig_atomic_t
    - Flag: variable only read or written (not flag++ or flag+=10)
    - volatile sig_atomic_t are ints on most systems
- Avoid race conditions
  - Cannot make any assumption regarding execution order
  - However, we can control when handlers run by blocking
- Explicitly waiting for signals: suppose handler sets global variable pid:
  - Spin wait: while(!pid) {}
    - Wasteful
  - Pause: while(!pid) pause()
    - Race condition
  - Sleep: while(!pid) sleep(1)
    - Too slow
  - Solution: sigsuspend
    - int sigsuspend(const sigset_t *mask)
    - Equivalent to atomic:
      sigprocmask(SIG_BLOCK, &mask, &prev); pause() sigprocmask(SIG_BLOCK, &prev, NULL);
- Portable signal handling
  - Problem: Different versions of unix have different signal handling semantics
  - Solution: Use sigaction

Virtual Memory

Physical Addressing: Used in microcontrollers, embedded systems, etc.
Mentality: Main memory is a fully-associative cache for disk
- Load doesn't necessarily happen with execve(). It only allocates virtual address space with valid bit of 0
- Loading is a result of a page fault (demand paging)
Kernel memory invisible to application program. Kernel's address space starts with 1.
Every memory access go through cache memory:
- Both memory and cache gets updated after page fault
Address translation: Multi-level page tables
TLB: Small set-associative hardware cache in MMU
Works only because of locality

System-Level I/O

Unix I/O
1. Opening and closing files: open(), close()
2. Reading and writing files: read(), write()
3. Changing file position: lseek()
4. View file metadata: stat()
  - stat() are both a syscall and a linux program
  - Syscalls are in second section of man: man 2 stat
- Always check return codes for these syscalls
File types: Regular, directory, socket, named pipes, symlinks, character and block devices
Short counts: (nbytes < sizeof(buf)) are possible
Wrapper: RIO (robust I/O) package
1. Unbuffered I/O of binary data: rio_readn() and rio_writen()
2. Buffered I/O of text or binary: rio_readlineb() and rio_readnb()
- RIO package is better for input and output on network sockets
Standard I/O
1. Opening and closing: fopen() and fclose()
2. Reading and writing bytes: fread() and fwrite()
3. Reading and writing text lines: fgets() and fputs()
4. Formatted reading and writing: fscanf() and fprintf()
- C program begin with 3 open files:
  1. stdin (descriptor 0)
  2. stdout (descriptor 1)
  3. stderr (descriptor 2)
Trace syscalls with the Linux strace program
Choosing I/O functions
- General: Use highest-level functions
- When to use Unix I/O: Signal handlers because unix I/O functions are async-signal-safe
- When to use standard I/O: Disks, terminals
- When to use RIO: Network sockets
How kernel represents open files
- Open file table: An instance of opening file
  - If a process opens a file twice, there are two open file tables pointing to the same v-node table
- V-node table: File metadata (regardless of whether file is open)
- After fork(), refcnt is incremented:
  - Two processes share a same instance of opened file (including file position)
- dup2(int oldfd, int newfd): Used for I/O redirection
  - After calling dup2(4, 1):
Recommended references:
- W. Richard Stevens & Stephen A. Rago, Advanced Programming in the Unix Environment, 2 nd Edition, Addison Wesley, 2005

Virtual Memory: Systems

End-to-end Core i7 Address Translation
L1 d-cache index and offset have 12 bits is NOT a coincidence: Speed up address translation
Linux organizes VM as collections of areas:
- Fault handling: Traverse the vm_area_structs to check if page is allocated
Private Copy-on-write (COW)
Memory Mapping: void *mmap(void *start, int len, int prot, int flags, int fd, int offset)
- start: A hint address
- prot: PROT_READ, PROT_WRITE, PROT_EXEC
- flags: MAP_ANON, MAP_PRIVATE, MAP_SHARED
- Returns a pointer to the start of mapped area (may not be start)

Dynamic Memory Allocation

Allocators: Maintain the heap as a collection of variable sized blocks, which are either allocated or free
- Explicit allocator: Application allocates and frees
- Implicit allocator: Application allocates but not frees
The malloc package
- void *malloc(size_t size)
- void free(void *p)
- calloc: Initializes allocated blocks to 0
- realloc: Changes size of previously allocated block
- sbrk: Used internally by allocators to grow and shrink the heap
Constraints:
- Applications have few constraints
- Allocators have many constraints:
  1. Can't assume allocation patterns
  2. Must respond immediately to malloc (can't defer allocation)
  3. Can't relocate allocated memory
Performance goal (2 conflicting goals)
- Throughput: Number of completed requests per unit time
- Peak memory utilization: How to efficiently use memory
Fragmentation
- Internal fragmentation:
  - Payload smaller than block size
  - Easy to measure
- External fragmentation:
  - Enough aggregate heap memory, but no single free block large enough
  - Difficult to measure
Keeping track of free blocks
How to find a free block
1. First fit
2. Next fit
3. Best fit
Know how much to free: Header
- Encodes block size (including the header and any padding)
- Alignment means lower-bits of size are 0, used to encode allocated bit
Implicit list
- Allocating a free block: May need to split the block
- Freeing a block: Have to coalesce free blocks (4 cases):
- Singly-linked list cannot free previous block in constant time
  - Fix: Doubly-linked list (head and footer)
  - Optimization:
    - Allocated blocks doesn't need coalescing
    - We have extra bits to encode whether previous block is allocated
    - So, allocated blocks doesn't need footer
- Implicit lists are not commonly used because of linear time. However, the concepts of splitting and coalescing are general to all allcators
Explicit free list
- Maintain list of free blocks using payload area
- Blocks can be in any order (depending on insertion policy)
  - Unordered: LIFO, FIFO
  - Address-ordered
- Much faster than implicit lists when memory is full
Segregated list
Garbage collection
- Mark and sweep collecting
  - Allocate using malloc until run out of space
  - Use extra mark bit for each block
  - Root nodes: Pointers in stack/data section
  - Does not distinguish between pointers/non-pointers, thus "safe"
  - Mark: Start at root nodes and do DFS
  - Sweep: Start at beginning of VM, and free unmarked blocks
  - How to find beginning of block? -- Use a balanced tree

Network Programming

Client-Server Architecture
Network Architecture
- Routers set borders of LANs
- Conceptual view of LANs:
Socket
- To the kernel: An endpoint of communication
- To application: A file descriptor to write/read
- Generic socket address:
- IPv4 specific socket address:
Host and Service Conversion: getaddrinfo
- Convert string representations of hostnames, host addresses, ports and service names to socket address structures
- getnameinfo is the inverse of getaddrinfo
Client/Server interface
telnet: Testing servers
- Creates TCP connection with a server (starts a session)
- Since the encoding of HTTP is ascii, we can hard-code http requests
HTTP
- Content: A sequence of bytes in MIME (Multipurpose Internet Mail Extension) type
- The contents can be either static or dynamic
Dynamic content:
- Produced by server-side program
- If URI containts cgi-bin then serve dynamic content
- Use fork() and exec() to execute new program
- Use env-var QUERY_STRING to pass parameters

Concurrency

Iterative servers have serious flaws.
- Easily get blocked by single misbehaving client
  - Note: Blocking does not happen upon client calling connect() or write(), but upon read(). This is because server's kernel provides buffering
- So we need concurrent servers

Process-based servers
- Parent must close connected socket (parent doesn't get reaped)
- Child should close listening socket (child gets reaped)
- Reap child with SIGCHLD handler
Event-based servers
- Manage multiple connections in user space
- Determine events using select() or epoll()
- Design of choice for high-performance web servers
- However, hard to provide find-grained concurrency
- Cannot take advantage of multi-core
Thread-based servers
- Can run threads in detached mode. It will run independently, and get reaped automatically
- Possible race conditions when passing parameters to new thread in pthread_create()

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
LAB0_cprogramminglab		LAB0_cprogramminglab
LAB1_datalab		LAB1_datalab
LAB2_bomblab		LAB2_bomblab
LAB3_attacklab		LAB3_attacklab
LAB4_cachelab		LAB4_cachelab
LAB5_tshlab		LAB5_tshlab
LAB6_malloclab		LAB6_malloclab
LAB7_proxylab		LAB7_proxylab
Note_Images		Note_Images
Slides		Slides
.gitignore		.gitignore
.travis.yml		.travis.yml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CMU-15-213-Intro-to-Computer-Systems

Integer representation

IEEE 754

x86-64

Machine-level Programming

Code optimization

Memory

Linking

Exception Control Flows (ECF)

Processes

Signals

Virtual Memory

System-Level I/O

Virtual Memory: Systems

Dynamic Memory Allocation

Network Programming

Concurrency

About

Releases

Packages

Languages

CWEzio/CMU-15-213-Intro-to-Computer-Systems

Folders and files

Latest commit

History

Repository files navigation

CMU-15-213-Intro-to-Computer-Systems

Integer representation

IEEE 754

x86-64

Machine-level Programming

Code optimization

Memory

Linking

Exception Control Flows (ECF)

Processes

Signals

Virtual Memory

System-Level I/O

Virtual Memory: Systems

Dynamic Memory Allocation

Network Programming

Concurrency

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages