Computer Architecture (February 1)

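Single-cycle processor design

ISA: the hardware/software interface

MIPS, x86, IBM 360, JVM

Processor performance: execution time is the measure of how good a system is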

| Microarchitecture | CPI | Cycle time |
| --- | --- | --- |
| Microcoded | >1 | short |
| Single-cycle unpipelined | 1 | long |
| Pipelined | 1 | short |
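
The comparison above follows from the usual relation: execution time = instruction count x CPI x cycle time. A minimal sketch of that arithmetic follows; the instruction count, CPI values, and cycle times are hypothetical numbers chosen only to match the qualitative entries (">1", "1", "short", "long"), not measurements.

```python
def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical figures for illustration only.
designs = {
    "microcoded": {"cpi": 4.0, "cycle_time_ns": 2.0},                 # CPI > 1, short cycle
    "single-cycle unpipelined": {"cpi": 1.0, "cycle_time_ns": 10.0},  # CPI = 1, long cycle
    "pipelined": {"cpi": 1.0, "cycle_time_ns": 2.0},                  # CPI = 1, short cycle
}

times = {
    name: execution_time_ns(1_000_000, d["cpi"], d["cycle_time_ns"])
    for name, d in designs.items()
}
for name, t in times.items():
    print(f"{name}: {t / 1e6:.1f} ms")
```

Under these assumed numbers the pipelined design wins because it keeps both factors small at once.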

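MIPS instruction set (DLX)

1. 32 general-purpose registers, each 32 bits wide; R0 is always 0

2. 32 single-precision floating-point registers, which can also be used as 16 double-precision registers

3. PC and other special registers

4. Data types: 8-bit byte, 16-bit halfword, 32-bit word, 32-bit single-precision float, 64-bit double-precision float

5. Load/store architecture

6. Addressing modes: register indirect, PC-relative

7. Byte addressing

All instructions are 32 bits long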

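Single-cycle implementation

Datapath: the elements that hold state as data flows from input to output, such as the computation units, cache, and memory

Design the datapath first, then add the control logic

The computer as a finite state machine

The registers, memory, PC, and so on make up the machine state. The machine processes this state and computes a new state, over and over, which makes it a finite state machine

Von Neumann: the stored-program concept

Stored program: fetch an instruction from memory at the address held in the PC, then decode the instruction to determine which operation to perform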
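
The state-machine view and the stored-program idea can be sketched together as a toy interpreter. This is an illustration under stated assumptions, not a MIPS model: the four operations (`addi`, `add`, `bne`, `halt`), the tuple encoding of instructions, and the absolute branch target are all hypothetical simplifications.

```python
def step(state):
    """One state transition: fetch at PC, decode, execute, produce the next state."""
    op, a, b, c = state["memory"][state["pc"]]  # fetch and decode
    regs = state["regs"]
    next_pc = state["pc"] + 1
    if op == "addi":    # regs[a] = regs[b] + constant c
        regs[a] = regs[b] + c
    elif op == "add":   # regs[a] = regs[b] + regs[c]
        regs[a] = regs[b] + regs[c]
    elif op == "bne":   # jump to instruction index c if regs[a] != regs[b]
        if regs[a] != regs[b]:
            next_pc = c
    elif op == "halt":
        state["halted"] = True
    regs[0] = 0         # R0 always reads as zero
    state["pc"] = next_pc
    return state

# The program lives in the same state the machine operates on: sum 5+4+3+2+1 into R2.
program = [
    ("addi", 1, 0, 5),   # R1 = 5 (loop counter)
    ("addi", 2, 0, 0),   # R2 = 0 (accumulator)
    ("add", 2, 2, 1),    # R2 += R1
    ("addi", 1, 1, -1),  # R1 -= 1
    ("bne", 1, 0, 2),    # loop while R1 != 0
    ("halt", 0, 0, 0),
]
state = {"pc": 0, "regs": [0] * 4, "memory": program, "halted": False}
while not state["halted"]:
    state = step(state)
print(state["regs"][2])  # prints 15
```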

memory

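Harvard architecture: instructions and data are stored separately, which avoids resource conflicts

Data memory supports both reads and writes; instruction memory is read-only and generally cannot be written

Instruction encoding:

R-type instructions:

I-type instructions: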
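
The notes do not preserve the field layouts, so the sketch below assumes the standard MIPS32 formats: R-type is opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6), and I-type is opcode(6) rs(5) rt(5) immediate(16). DLX arranges some fields differently, and J-type instructions are ignored here. The function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def decode(word):
    """Split a 32-bit instruction word into fields (R-type or I-type only)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if opcode == 0:  # in MIPS32, opcode 0 selects the R-type format
        return {
            "type": "R", "opcode": opcode, "rs": rs, "rt": rt,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    return {"type": "I", "opcode": opcode, "rs": rs, "rt": rt,
            "imm": sign_extend16(word & 0xFFFF)}

r = decode(0x012A4020)  # add $t0, $t1, $t2
i = decode(0x8FA80004)  # lw $t0, 4($sp)
print(r)
print(i)
```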

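Data memory

LOAD involves two registers: a base register and a destination register

ALUSrc

MemToReg: takes the data read from memory and sends it to the register file

Branch instructions:

Control:

R-type instructions: register-type instructions are identified by ALUOp

Generating the control signals: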
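
A minimal sketch of a main control table, assuming the textbook single-cycle MIPS datapath. Only ALUSrc, MemToReg, and ALUOp are named in these notes; RegDst, RegWrite, MemRead, MemWrite, Branch, the two-bit ALUOp encoding, and the opcode/funct values are assumptions taken from the standard MIPS32 design. `None` marks a don't-care signal.

```python
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemToReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw": dict(RegDst=0, ALUSrc=1, MemToReg=1, RegWrite=1,
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw": dict(RegDst=None, ALUSrc=1, MemToReg=None, RegWrite=0,
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq": dict(RegDst=None, ALUSrc=0, MemToReg=None, RegWrite=0,
                MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}
OPCODES = {0x00: "R-type", 0x23: "lw", 0x2B: "sw", 0x04: "beq"}
FUNCT = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}

def control_signals(opcode):
    """Main control: the opcode alone selects the datapath control signals."""
    return CONTROL[OPCODES[opcode]]

def alu_control(alu_op, funct):
    """Second-level decode: ALUOp, plus funct for R-type, picks the ALU operation."""
    if alu_op == 0b00:
        return "add"      # lw/sw: address = base + offset
    if alu_op == 0b01:
        return "sub"      # beq: compare by subtracting
    return FUNCT[funct]   # R-type: the funct field decides

lw = control_signals(0x23)
print(lw["ALUSrc"], lw["MemToReg"], alu_control(lw["ALUOp"], 0))
```

Splitting the decode in two levels keeps the main control small: it only has to look at the opcode, and the funct field is consulted only when ALUOp says the instruction is R-type.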

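Datapath: made up of the elements that hold state (registers, memory), the elements that move data (buses, including intermediate elements such as MUXes), and the computation units (ALU)

Basic pipelining technique: pipelining

Resources are fully utilized by time-sharing them

Throughput (overall efficiency) and latency

Pipelining does not aim to reduce latency. Because of the extra mechanisms it adds, the latency of a single task usually gets slightly longer

The slowest stage becomes the bottleneck

There should be as many tasks as possible, and they should arrive back to back

The ideal speedup equals the number of pipeline stages

A pipeline takes time to fill and to drain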
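
The points about fill and drain time and the ideal speedup can be checked with a little arithmetic. This sketch assumes k equal-length stages, no stalls, and no pipeline-register overhead: n tasks then take n x k stage times unpipelined and k + n - 1 stage times pipelined.

```python
def pipelined_cycles(n_tasks, n_stages):
    """Fill the pipeline once (n_stages cycles), then finish one task per cycle."""
    return n_stages + n_tasks - 1

def speedup(n_tasks, n_stages):
    """Unpipelined time divided by pipelined time, in units of one stage time."""
    return (n_tasks * n_stages) / pipelined_cycles(n_tasks, n_stages)

for n in (1, 10, 1000):
    print(n, round(speedup(n, 5), 2))
```

With a single task there is no gain at all, and the speedup only approaches the stage count (5 here) when many tasks flow through without gaps.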

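Instruction execution: fetch, decode, execute, memory, writeback

Break datapath into 5 stages:

1. Insert pipeline registers between stages; they hold intermediate results, the instruction, and control signals

2. Each stage has its own functional units

3. Each stage can execute in 2ns

Some instructions need only 4 stages and others need 5, so an extra stage is added to the 4-stage instructions so that every instruction passes through the same number of stages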
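
A small sketch of how instructions occupy the five stages cycle by cycle, assuming no hazards and one instruction entering per cycle. The stage abbreviations (IF, ID, EX, MEM, WB) and the function name are labels chosen here for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Return one row per instruction; each row lists the stage held in each cycle."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters IF in cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(row)
    return rows

rows = pipeline_diagram(3)
for i, row in enumerate(rows):
    print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

Three instructions occupy 7 cycles in total, so at 2 ns per stage they complete in 14 ns, while each individual instruction still spends 10 ns in the pipeline.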

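Pipelined datapath and control issues:

Pipelining does not shorten the execution time of an individual instruction; it raises throughput

Structural hazard: not enough resources, so several instructions compete for the same one

Data hazard (for scalar processing): 1. read after write (RAW), a true data dependence; 2. write after read (WAR), an anti-dependence; 3. write after write (WAW), an output dependence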
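
The three kinds of data hazard can be told apart by comparing the registers two instructions read and write. A minimal sketch, assuming each instruction is described as a hypothetical `(dest, sources)` pair and that `first` precedes `second` in program order:

```python
def classify(first, second):
    """Return the set of data dependences of `second` on the earlier `first`."""
    dest1, sources1 = first
    dest2, sources2 = second
    found = set()
    if dest1 is not None and dest1 in sources2:
        found.add("RAW")  # second reads what first writes
    if dest2 is not None and dest2 in sources1:
        found.add("WAR")  # second overwrites what first still has to read
    if dest1 is not None and dest1 == dest2:
        found.add("WAW")  # both write the same register
    return found

add = ("R1", ["R2", "R3"])                 # ADD R1, R2, R3
print(classify(add, ("R4", ["R1", "R5"])))  # SUB R4, R1, R5
print(classify(add, ("R2", ["R4", "R5"])))  # SUB R2, R4, R5
print(classify(add, ("R1", ["R4", "R5"])))  # SUB R1, R4, R5
```

The three calls report RAW, WAR, and WAW respectively.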

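Control hazard: a hazard caused by a change in program flow, such as a jump

Branch instructions: hazards from jumps and branches

30% of operations are branches, and each one wastes 3 cycles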
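
The cost of these figures can be worked out directly. The sketch assumes an ideal base CPI of 1 with no other stalls, which is an assumption and not something the notes state.

```python
def effective_cpi(base_cpi, branch_fraction, penalty_cycles):
    """Average CPI once every branch pays a fixed stall penalty."""
    return base_cpi + branch_fraction * penalty_cycles

cpi = effective_cpi(1.0, 0.30, 3)
print(f"{cpi:.2f}")  # prints 1.90
```

Under that assumption the pipeline runs at roughly half its ideal rate, which is why branch handling gets so much attention.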

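Ways to handle branch hazards: 1. wait (stall); 2. predict the outcome; 3. assume every branch is taken; 4. delayed branch

Delay slot: move an independent instruction from before the branch into the delay slot; if there is none, insert a no-op

Delayed branch still carries a sizable cost: only about 60% of delay slots can be filled

stalls and flushes

Forwarding: forward data from the pipeline registers instead of waiting for the writeback stage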
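
A minimal sketch of the forwarding decision for one ALU operand, assuming the textbook five-stage MIPS pipeline. The register names EX/MEM and MEM/WB, the `reg_write` and `rd` fields, and the function name are assumptions for illustration; the notes only say that data comes from the pipeline registers.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Pick where the EX stage should take operand `src_reg` from."""
    # Prefer the newest value: the result of the instruction just ahead.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"
    # Otherwise take the value about to be written back.
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"  # no hazard: the register file value is up to date

# ADD R1, R2, R3 sits in EX/MEM while SUB R4, R1, R5 is in EX.
ex_mem = {"reg_write": True, "rd": 1}
mem_wb = {"reg_write": False, "rd": 0}
print(forward_select(1, ex_mem, mem_wb), forward_select(5, ex_mem, mem_wb))
```

The `rd != 0` check keeps R0 from ever being forwarded, since it always reads as zero.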

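A flush is handled in much the same way as a stall

Dynamic pipeline adjustment

Structural hazards: wait, or add resources

Data hazards: RAW, WAR, WAW

Control hazards: compute the condition code early, delayed branch, branch prediction, loop unrolling...

Dynamic scheduling algorithm: Tomasulo's algorithm (IBM 360/91)

Dynamic pipeline scheduling: ILP Dynamic Exploitation

Dynamic techniques: hardware keeps the likelihood of hazards to a minimum

Static techniques: rely on the compiler, which schedules the program to avoid hazards

Instruction-level parallelism: a basic block is a stretch of code with no branch instructions

Example: GCC (GNU C Compiler): 17% control transfer, 5 or 6 instructions + 1 branch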

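Software pipelining: the software counterpart of Tomasulo's algorithm

Data dependence: an inherent property of the program

Scheduling the program keeps dependences from turning into pipeline hazards

Name dependences: anti-dependence, output dependence

Control dependence:

Dynamically Scheduled Pipelines -- Scoreboarding

Instructions issue in order and execute out of order

Decode is split into two stages: 1. Issue: decode instructions, check for structural hazards; 2. Read operands: wait until no data hazards, then read operands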

CDC 6600 (1963):

1. in-order issue, out-of-order execution, out-of-order commit (or completion)

2. no forwarding

3. imprecise interrupt/exception model for now

Four Stages of Scoreboard Control

Issue - decode instructions & check for structural hazards

instructions issued in program order (for hazard checking)
don't issue if structural hazard
don't issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)

Read operands - wait until no data hazards, then read operands

all real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data

no forwarding of data in this model

Execution - operate on operands (EX)

The functional unit begins execution upon receiving operands

when the result is ready, it notifies the scoreboard that it has completed execution

Write result - finish execution (WB)

Stall until no WAR hazards with previous instructions

CDC 6600 scoreboard would stall SUBD until ADDD reads operands
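
The issue and write-result checks can be sketched in a few lines. This is a simplified illustration, not the CDC 6600's actual bookkeeping: the `Active` record, the unit names, and the register numbers are hypothetical, and the instruction sequence (DIVD, ADDD, SUBD) is an assumed example in which ADDD is still waiting for an operand.

```python
from dataclasses import dataclass

@dataclass
class Active:
    """An issued but not yet completed instruction, kept in program order."""
    name: str
    unit: str
    dest: str
    sources: tuple
    operands_read: bool = False

def can_issue(active, unit, dest):
    """Issue stage: no structural hazard and no WAW hazard."""
    unit_busy = any(a.unit == unit for a in active)  # structural hazard
    waw = any(a.dest == dest for a in active)        # output dependence
    return not unit_busy and not waw

def can_write_result(active, instr):
    """Write-result stage: stall while an earlier instruction still has to read instr.dest."""
    for earlier in active:
        if earlier is instr:
            break
        if not earlier.operands_read and instr.dest in earlier.sources:
            return False  # WAR hazard
    return True

divd = Active("DIVD", "div", "F0", ("F2", "F4"), operands_read=True)
addd = Active("ADDD", "add1", "F10", ("F0", "F8"))  # waits for F0 from DIVD (RAW)
subd = Active("SUBD", "add2", "F8", ("F8", "F14"), operands_read=True)
active = [divd, addd, subd]

before = can_write_result(active, subd)  # ADDD has not read F8 yet
addd.operands_read = True
after = can_write_result(active, subd)
print(before, after)  # prints False True
```

SUBD finishes executing early but may not overwrite F8 until ADDD has read its operands, which is the WAR stall described above.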

In-order issue; out-of-order execute & commit

CDC 6600 scoreboard: the compiler gives a speedup of 1.7, and hand optimization gives a speedup of 2.5, but slow memory (no cache) limits the benefit

No forwarding hardware

limited to instructions in basic block(small window)

small number of functional units(structural hazards),especially integer/load store units

Do not issue on structural hazards

wait for WAR hazards

prevent WAW hazards

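The CDC 6600 gained about 70% in performance, without software pipelining and with a compiler that was not yet very smart

Still to be solved: 1. data dependences; 2. reducing stalls through dynamic scheduling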
