Graphics Programming Black Book by Michael Abrash.
This repository contains a mirrored copy of the book in its entirety, as well as the original source code written by the author himself. All rights are reserved by the author, Michael Abrash.
- The Human Element of Code Optimization
- Understanding High Performance
- When Fast Isn't Fast
- Rules for Building High-Performance Code
- Know Where You're Going
- Make a Big Map
- Make Lots of Little Maps
- Know the Territory
- Know When It Matters
- Always Consider the Alternatives
- Know How to Turn On the Juice
- The Unique Nature of Assembly Language Optimization
- Instructions: The Individual versus the Collective
- Assembly Is Fundamentally Different
- Transformation Inefficiencies
- Self-Reliance
- Knowledge
- The Flexible Mind
- Where to Begin?
- Understanding and Using the Zen Timer
- The Costs of Ignorance
- The Zen Timer
- The Zen Timer Is a Means, Not an End
- Starting the Zen Timer
- Time and the PC
- Stopping the Zen Timer
- Reporting Timing Results
- Notes on the Zen Timer
- A Sample Use of the Zen Timer
- The Long-Period Zen Timer
- Stopping the Clock
- Example Use of the Long-Period Zen Timer
- Using the Zen Timer from C
- Watch Out for Optimizing Assemblers!
- Further Reading
- Armed with the Zen Timq Onward and Upward
- How the PC Hardware Devours Code Performance
- Cycle-Eaters
- The Nature of Cycle-Eaters
- The 8088's Ancestral Cycle-Eaters
- The 8-Bit Bus Cycle-Eater
- The Impact of the 8-Bit Bus Cycle-Eater
- What to Do about the 8-Bit Bus Cycle-Eater?
- The Prefetch Queue Cycle-Eater*
- Official Execution Times Are Only Part of the Story
- There Is No Such Beast as a True Instruction Execution Time
- Approximating Overall Execution Times
- What to Do about the Prefetch Queue Cycle-Eater?
- Holding Up the 8088
- Dynamic RAM Refresh: The Invisible Hand
- How DRAM Refresh Works in the PC
- The Impact of DRAM Refresh
- What to Do About the DRAM Refresh Cycle-Eater?
- Wait States
- The Display Adapter Cycle-Eater
- The Impact of the Display Adapter Cycle-Eater
- What to Do about the Display Adapter Cycle-Eater?
- Cycle-Eaters: A Summary
- What Does It All Mean?
- Searching Files with Restartable Blocks
- Searching for Text
- Avoiding the String Trap
- Brute-Force Techniques
- Using memchr()
- Making a Search Restartable
- Interpreting Where the Cycles Go
- Knowing When Assembly Is Pointless
- Always Look Where Execution Is Going
- How Machine Instructions May Do More Than You Think
- Memory Addressing and Arithmetic
- Math via Memory Addressing
- The Wonders of LEA on the 386
- Multiplication with LEA Using Non-Powers of Two
- Optimizing Halfway between Algorithms and Cycle Counting
- When LOOP Is a Bad Idea -The Lessons of LOOP and JCXZ
- Avoiding LOOPS of Any Stripe
- Local Optimization
- Unrolling Loops
- Rotating and Shifting with Tables
- NOT Flips Bits -- Not Flags
- Incrementing with and without Carry
- Jumping Languages When You Know It'll Help
- Billy, Don't Be a Compiler
- Don't Call Your Functions on Me, Baby
- Stack Frames Slow So Much
- Torn Between Two Segments
- Why Speeding Up Is Hard to Do
- Taking It to the Limit
- A C-to-Assembly Case Study
- Optimization Odds and Ends from the Field
- Another Look at LEA
- The Kennedy Portfolio
- Speeding Up Multiplication
- Optimizing Optimized Searching
- Short Sorts
- Full 32-Bit Division
- Sweet Spot Revisited
- Hard-core Cycle Counting
- Hardwired Far Jumps
- Setting 32-Bit Registers: Time versus Space
- How Working Quickly Can Bring Execution to a Crawl
- The Case for Delayed Gratification
- The Brute-Force Syndrome
- Wasted Breakthroughs
- Recursion
- Patient Optimization
- New Registers, NewInstructions, New Timings, New Complications
- Family Matters
- Crossing the Gulf to the 286 and the 386
- In the Lair of the Cycle-Eaters, Part II
- System Wait States
- Data Alignment
- Code Alignment
- Alignment and the 386
- Alignment and the Stack
- The DRAM Refresh Cycle-Eater: Still an Act of God
- The Display Adapter Cycle-Eater
- New Instructions and Features: The 286
- New Instructions and Features: The 386
- Optimization Rules: The More Things Change...
- Detailed Optimization
- popf and the 286
- It's Not Just a Bigger 386
- Enter the 486
- Rules to Optimize By
- The Hazards of Indexed Addressing
- Calculate Memory Pointers Ahead of Time
- Caveat Programmor
- Stack Addressing and Address Pipelining
- Problems with Byte Registers
- More Fun with Byte Registers
- Timing Your Own 486 Code
- The Story Continues
- Pipelines and Other Hazards of the High End
- 486 Pipeline Optimization
- BSWAP: More Useful Than You Might Think
- Pushing and Popping Memory
- Optimal 1-Bit Shifts and Rotates
- 32-Bit Addressing Modes
- Optimizing a Pretty Optimum Search Algorithm
- String Searching Refresher
- The Boyer-Moore Algorithm
- Boyer-Moore: The Good and the Bad
- Further Optimization of Boyer-Moore
- Know What You Know
- Unfamiliar Problems with Familiar Data Structures
- Linked Lists
- Dummies and Sentinels
- Circular Lists
- Hi/Lo in 24 Bytes
- Lessons Learned in the Pursuit of the Ultimate Word Counter
- Counting Words in a Hurry
- Which Way to Go from Here?
- Challenges and Hazards
- Blinding Yourself to a Better Approach
- Watch Out for Luggable Assumptions!
- The Astonishment of Right-Brain Optimization
- Levels of Optimization
- Optimization Level 1: Good Code
- Level 2: A New Perspective
- Level 3: Breakthrough
- Enough Word Counting Already!
- The Triumph of Algorithmic Optimization in a Cellular Automata Game
- Conway's Game
- The Rules of the Game
- Where Does the Time Go?
- The Hazards and Advantages of Abstraction
- Heavy-Duty C++ Optimization
- Bringing In the Right Brain
- Re-Examining the Task
- Acting on What We Know
- The Challenge That Ate My Life
- Optimization beyond the Pale
- Breaking the Rules
- Table-Driven Magic
- Keeping Track of Change with a Change List
- A Layperson's Overview of QLIFE
- Learning a Whole Different Set of Optimization Rules
- The Return of Optimization as Art
- The Pentium: An Overview
- Crossing Cache Lines
- Cache Organization
- Faster Addressing and More
- Branch Prediction
- Miscellaneous Pentium Topics
- 486 versus Pentium Optimization
- Going Superscalar
- How Your Carbon-Based Optimizer Can Put the "Super" in Superscalar
- An Instruction in Every Pipe
- V-Pipe-Capable Instructions
- Lockstep Execution
- Superscalar Notes
- Register Starvation
- Focusing on Keeping Both Pentium Pipes Full
- Address Generation Interlocks
- Register Contention
- Exceptions to Register Contention
- Who's in First?
- Pentium Optimization in Action
- A Quick Note on the 386 and 486
- Taking a Spin through What You've Learned
- Zenning
- At the Very Heart of Standard PC Graphics
- The VGA
- An Introduction to VGA Programming
- At the Core
- Linear Planes and True VGA Modes
- Smooth Panning
- Color Plane Manipulation
- Page Flipping
- The Hazards of VGA Clones
- Just the Beginning
- The Macro Assembler
- Taking on Graphics Memory Four Bytes at a Time
- VGA Programming: ALUs and Latches
- Notes on the ALU/Latch Demo Program
- The Barrel Shifter, Bit Mask, and Set/Reset Mechanisms
- VGA Data Rotation
- The BitMask
- The VGA's Set/Reset Circuitry
- Setting All Planes to a Single Color
- Manipulating Planes Individually
- Notes on Set/Reset
- A Brief Note on Word OUTs
- The Write Mode That Grows on You
- A Mode Born in Strangeness
- A Note on Preserving Register Bits
- Write Mode 2, Chunky Bitmaps, and Text-Graphics Coexistence
- Write Mode 2 and Set/Reset
- A Byte's Progress in Write Mode 2
- Copying Chunky Bitmaps to VGA Memory Using Write Mode 2
- Drawing Color-Patterned Lines Using Write Mode 2
- When to Use Write Mode 2 and When to Use Set/Reset
- Mode 13H--320x200 with 256 Colors
- Flipping Pages from Text to Graphics and Back
- Read Modes 0 and 1, and the Color Don't Care Register
- Read Mode 0
- Read Mode 1
- When all Planes "Don't Care"
- Useful Nuggets from the VGA Zen File
- Saving and Restoring EGA and VGA Screens
- 16 Colors out of 64
- Overscan
- A Bonus Blanker
- Modifying VGA Registers
- The Joys and Galling Problems of Using Split Screens on the EGA and VGA
- How the Split Screen Works
- The Split Screen in Action
- VGA and EGA Split-Screen Operation Don't Mix
- Setting the Split-Screen-Related Registers
- The Problem with the EGA Split Screen
- Split Screen and Panning
- The Split Screen and Horizontal Panning: An Example
- Notes on Setting and Reading Registers
- Split Screens in Other Modes
- How Safe?
- When Is 320x200 Really 320x400?
- Why 320x200? Only IBM Knows for Sure
- 320x400 256-Color Mode
- Display Memory Organization in 320x400 Mode
- Reading and Writing Pixels
- Two 256-Color Pages
- Something to Think About
- Taking 256-Color Modes About as Far as the Standard VGA Can Take Them
- Extended 256-Color Modes: What's Not to Like?
- 360x480 256-Color Mode
- How 360x480 256-Color Mode Works
- 480 Scan Lines per Screen: A Little Slower, But No Big Deal
- 360 Pixels per Scan Line: No Mean Feat
- Accessing Display Memory in 360x480 256-Color Mode
- The Basics of VGA Color Generation
- VGA Color Basics
- The Palette RAM
- The DAC
- Color Paging with the Color Select Register
- 256-color Mode
- Setting the Palette RAM
- Setting the DAC
- If You Can't Call the BIOS, Who Ya Gonna Call?
- An Example of Setting the DAC
- Special Effects through Realtime Manipulation of DAC Colors
- Color Cycling
- The Heart of the Problem
- Loading the DAC via the BIOS
- Loading the DAC Directly
- A Test Program for Color Cycling
- Color Cycling Approaches that Work
- Odds and Ends
- The DAC Mask
- Reading the DAC
- Cycling Down
- Implementing and Optimizing Bresenham's Line-Drawing Algorithm
- The Task at Hand
- Bresenham's Line-Drawing Algorithm
- Strengths and Weaknesses
- An Implementation in C
- Looking at EVGALine
- Drawing Each Line
- Drawing Each Pixel
- Comments on the C Implementation
- Bresenham's Algorithm in Assembly
- Faster Bresenham Lines with Run-Length Slice Line Drawing
- Run-Length Slice Fundamentals
- Run-Length Slice Implementation
- Run-Length Slice Details
- Optimizing Run-Length Slice Line Drawing in a Major Way
- Fast Run-Length Slice Line Drawing
- How Fast Is Fast?
- Further Optimizations
- Drawing Polygons Efficiently and Quickly
- Filled Polygons
- Which Side Is Inside?
- How Do You Fit Polygons Together?
- Filling Non-Overlapping Convex Polygons
- Oddball Cases
- Filling Polygons in a Hurry
- Fast Convex Polygon Filling
- Fast Drawing
- Fast Edge Tracing
- The Finishing Touch: Assembly Language
- Maximizing REP STOS
- Faster Edge Tracing
- Dealing with Irregular Polygonal Areas
- Filling Arbitrary Polygons
- Active Edges
- Complex Polygon Filling: An Implementation
- More on Active Edges
- Performance Considerations
- Nonconvex Polygons
- Details, Details
- Names Do Matter when You Conceptualize a Data Structure
- Nomenclature in Action
- Fast Antialiased Lines Using Wu's Algorithm
- Wu Antialiasing
- Tracing and Intensity in One
- Sample Wu Antialiasing
- Notes on Wu Antialiasing
- A Simple and Extremely Fast Animation Method for Limited Color
- Bit-Planes: The Basics
- Stacking the Palette Regsters
- Bit-Plane Animation in Action
- Limitations of Bit-Plane Animation
- Shearing and Page Flipping
- Beating the Odds in the Jaw-Dropping Contest
- 640x480 Page Flipped Animation in 64K...Almost
- A Plethora of Challenges
- A Page Flipping Animation Demonstration
- Write Mode 3
- Drawing Text
- Page Flipping
- Knowing When to Flip
- Enter the Split Screen
- Different Angles on Animation
- Plus Ça Change
- VGA Access Times
- Dirty-Rectangle Animation
- So Why Not Use Page Flipping?
- Dirty Rectangles in Action
- Hi-Res VGA Page Flipping
- Another Interesting Twist on Page Flipping
- Optimizing Dirty-Rectangle Animation
- Dirty-Rectangle Animation, Continued
- Masked Images
- Internal Animation
- Dirty-Rectangle Management
- Drawing Order and Visual Quality
- Introducing the VGA's Undocumented "Animation-Optimal" Mode
- What Makes Mode X Special?
- Selecting 320x240 256-Color Mode
- Designing from a Mode X Perspective
- Hardware Assist from an Unexpected Quarter
- The Internals of Animation's Best Video Display Mode
- Allocating Memory in Mode X
- Copying Pixel Blocks within Display Memory
- Copying to Display Memory
- Who Was that Masked Image Copier?
- How to Make the VGA Really Get up and Dance
- Masked Copying
- Faster Masked Copying
- Notes on Masked Copying
- Animation
- Mode X Animation in Action
- Works Fast, Looks Great
- 3-D Animation Using Mode X
- References on 3-D Drawing
- The 3-D Drawing Pipeline
- Projection
- Translation
- Rotation
- A Simple 3-D Example
- Notes on the 3-D Animation Example
- An Ongoing Journey
- Using Backface Removal to Eliminate Hidden Surfaces
- One-sided Polygons: Backface Removal
- Backface Removal in Action
- Incremental Transformation
- A Note on Rounding Negative Numbers
- Object Representation
- The First Iteration of a Generalized 3-D Animation Package
- This Chapter's Demo Program
- A New Animation Framework: X-Sharp
- Three Keys to Realtime Animation Performance
- Drawbacks
- Where the Time Goes
- The Naked Truth About Speed in 3-D Animation
- Raw Speed, Part 1: Assembly Language
- Raw Speed, Part 11: Look it Up
- Hidden Surfaces
- Rounding
- Having a Ball
- Putting Realistic Surfaces on Animated 3-D Objects
- Support for Older Processors
- Shading
- Ambient Shading
- Diffuse Shading
- Shading: Implementation Details
- Pondering X-Sharp's Color Model in an RGB State of Mind
- A Color Model
- A Bonus from the BitMan
- Using Fast Texture Mapping to Place Pooh on a Polygon
- Principles of Quick-and-Dirty Texture Mapping
- Mapping Textures Made Easy
- Notes on DDA Texture Mapping
- Fast Texture Mapping: An Implementation
- The Critical Role of Experience in Implementing Fast, Smooth Texture Mapping
- Visual Quality: A Black Hole ... Er, Art
- Fixed-Point Arithmetic, Redux
- Texture Mapping: Orientation Independence
- Mapping Textures across Multiple Polygons
- Fast Texture Mapping
- Using the Whole-Brain Approach to Accelerate Texture Mapping
- Texture Mapping Redux
- Left-Brain Optimization
- A 90-Degree Shift in Perspective
- That's Nice--But it Sure as Heck Ain't 9 Cycles
- Don't Stop Thinking about Those Cycles
- Texture Mapping Notes
- What BSP Trees Are and How to Walk Them
- BSP Trees
- Visibility Determination
- Limitations of BSP Trees
- Building a BSP Tree
- Visibility Ordering
- Inorder Walks of BSP Trees
- Know It Cold
- Measure and Learn
- Surfing Amidst the Trees
- Related Reading
- Taking BSP Trees from Concept to Reality
- Compiling BSP Trees
- Parametric Lines
- Parametric Line Clipping
- The BSP Compiler
- Optimizing the BSP Tree
- BSP Optimization: an Undiscovered Country
- The Fundamentals of the Math behind 3-D Graphics
- 3-D Math
- Foundation Definitions
- The Dot Product
- Dot Products of Unit Vectors
- Cross Products and the Generation of Polygon Normals
- Using the Sign of the Dot Product
- Using the Dot Product for Projection
- Rotation by Projection
- Taking a Compiled BSP Tree from Logical to Visual Reality
- BSP-based Rendering
- The Rendering Pipeline
- Moving the Viewer
- Transformation into Viewspace
- Clipping
- Projection to Screenspace
- Walking the Tree, Backface Culling and Drawing
- Notes on the BSP Renderer
- Knowing When to Hurl Conventional Math Wisdom Out the Window
- Not Your Father's Floating-point
- Pentium Floating-point Optimization
- Pipelining, Latency, and Throughput
- FXCH
- The Dot Product
- The Cross Product
- Transformation
- Projection
- Rounding Control
- A Farewell to 3-D Fixed-point
- The Challenge of Separating All Things Seen from All Things Unseen
- VSD: The Toughest 3-D Challenge of All
- The Structure of Quake Levels
- Culling and Visible Surface Determination
- Nodes Inside and Outside the View Frustum
- Overdraw
- The Beam Tree
- 3-D Engine du Jour
- Subdividing Raycast
- Vertex-Free Surfaces
- The Draw-Buffer
- Span-Based Drawing
- Portals
- Breakthrough!
- Simplify, and Keep on Trying New Things
- Learn Now, Pay Forward
- References
- Determining What's Inside Your Field of View
- 3-D Clipping Basics
- Intersecting a Line Segment with a Plane
- Polygon Clipping
- Clipping to the Frustum
- The Lessons of Listing 65.3
- Advantages of Viewspace Clipping
- Further Reading
- Struggling with Z-Order Solutions to the Hidden Surface Problem
- Creative Flux and Hidden Surfaces
- Drawing Moving Objects
- Performance Impact
- Leveling and Improving Performance
- Sorted Spans
- Edges versus Spans
- Edge-Sorting Keys
- Where That l/Z Equation Comes From
- Quake and Z-Sorting
- Decisions Deferred
- Implementing Independent Span Sorting for Rendering without Overdraw
- Quake and Sorted Spans
- Types of l/z Span Sorting
- Intersecting Span Sorting
- Abutting Span Sorting
- Independent Span Sorting
- l/z Span Sorting in Action
- Implementation Notes
- A Radically Different Approach to Lighting Polygons
- Problems with Gouraud Shading
- Perspective Correctness
- Decoupling Lighting from Rasterization
- Size and Speed
- Mipmapping To The Rescue
- Two Final Notes on Surface Caching