Fix the tensor-slicing copy for qkv parameters #2198
Merged
Conversation
RezaYazdaniAminabadi requested review from jeffra, samyam, tjruwase, ShadenSmith, conglongli, awan-10, cli99, eltonzheng, minjiaz, duli2012, mrwyattii, yaozhewei, arashb, xiaoxiawu-microsoft, and samadejacobs as code owners on August 9, 2022 02:12.
tjruwase approved these changes on Aug 9, 2022.
delock added a commit to delock/DeepSpeedSYCLSupport that referenced this pull request on Nov 8, 2022. The referenced commit's message follows:
* Fix the layer-past for GPT based models (microsoft#2196)
* Add gradient_average flag support for sparse grads (microsoft#2188)
* Add gradient_average flag support for sparse grads
* formatting fixes
* Add tests Co-authored-by: Olatunji Ruwase <[email protected]>
* Adding additional instructions in the compression tutorial on pre-training distillation and quantization for GPT (microsoft#2197) Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* Log user config exactly (microsoft#2201)
* Fix the tensor-slicing copy for qkv parameters (microsoft#2198) Co-authored-by: Olatunji Ruwase <[email protected]>
* Refactor Distributed Tests (microsoft#2180) Refactor Distributed unit tests
* fix table syntax (microsoft#2204) Co-authored-by: Conglong Li <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
* Correctly detect offload configuration (microsoft#2208) Co-authored-by: Jeff Rasley <[email protected]>
* add cuda 11.7 (microsoft#2211)
* add cuda 11.7
* formatting
* use torch 1.9 (microsoft#2215)
* [zero-3] print warning once and support torch parameter (microsoft#2127)
* print warning only once.
* add support for torch param and only warn on gpu 0
* remove type checking. will be done on a new PR with more tests. Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
* Add support of OPT models (microsoft#2205)
* add opt replace policy
* simplify inf. api
* fix opt replace policy
* fix use-cash & add relu
* Add support of custom MLP act. function
* Revert "simplify inf. api" This reverts commit 9e910fc.
* fix the inference API (temp. solution)
* fix code formatting
* add unit tests for OPT models.
* refactor pre-attention layer norm configuration
* add support of opt-350m model
* refactor the HF model config initialization
* fix hf model config issue Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Reza Yazdani <[email protected]>
* fix typos in readme. (microsoft#2218) Co-authored-by: Olatunji Ruwase <[email protected]>
* [device abstraction] add device abstraction to allow other device than CUDA be used
* Fix regression w. dist_init_required (microsoft#2225)
* add doc for new bert example (microsoft#2224)
* Remove the random-generator from context during inference (microsoft#2228)
* Fix the tensor-slicing copy for qkv parameters
* remove the random-generator from context during inference
* formatting Co-authored-by: Jeff Rasley <[email protected]>
* allow saving ckpt w/o ckpt json + bloom copy fix (microsoft#2237)
* Correctly detect zero_offload (microsoft#2213)
* Correctly detect offload configuration
* Correctly detect offload configuration
* Handle deprecated cpu offload setting
* Correctly detect zero_offload setting
* Minor tweak Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
* update videos (microsoft#2249)
* Refactor dist tests: Checkpointing (microsoft#2202) Refactor distributed tests: checkpointing Co-authored-by: Michael Wyatt <[email protected]>
* Make OPT policy backward compatible with pre-OPT transformers versions (microsoft#2254)
* fix ds-inference without policy (microsoft#2247) Co-authored-by: Jeff Rasley <[email protected]>
* bump to 0.7.2
* Enable contiguous gradients with Z1+MoE (microsoft#2250) MoE training with zero stage 1 only works with `contiguous gradients=True`.
* [rebase-202208] additional changes needed when rebase to 202208
* [rebase] cleanup direct cuda usage after merge
* Correctly detect CPU optimizer usage (microsoft#2257)
* Correctly detect CPU optimizer usage
* Update nv-transformers-v100.yml (microsoft#2259) Co-authored-by: Jeff Rasley <[email protected]>
* [precommit] fix pre-commit issues
* Update half precision header guards (microsoft#2261)
* fix microsoft#2240: wrong time unit in flops_profiler (microsoft#2241) Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* bump to 0.7.3
* Add blob storage to CI runners (microsoft#2260) Add blob storage to CI runners and enable for transformers cache on inference tests
* Update replace_module.py, test-gptj.py related fix (microsoft#2269) Fix RuntimeError: Boolean value of Tensor with more than one value is ambiguous when running test-gptj.py
* Fix OrderedDict import for python3.6 (microsoft#2267) Co-authored-by: Olatunji Ruwase <[email protected]>
* Ds inference/fix mp2 (microsoft#2270)
* Trajepl: nebula load fix (microsoft#2182) Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: chenguo <[email protected]>
* prevent torch ext folder mkdir at tmp (microsoft#2274)
* Ds-inference Int8 support through ZeroQuant technology (microsoft#2217) Co-authored-by: Jeff Rasley <[email protected]>
* add a new unit test for cuda ops (microsoft#2278) Co-authored-by: cmikeh2 <[email protected]>
* Add to codeowners file (microsoft#2279)
* [pin_memory] make pin_memory select device type
* Memory Access Utility (microsoft#2276) Co-authored-by: Ammar Ahmad Awan <[email protected]>
* Fp32 accuracy bug fix (microsoft#2285) Co-authored-by: Arash Bakhtiari <[email protected]> Co-authored-by: Arash Bakhtiari <[email protected]>
* Refactor universal checkpointing and tensor fragments (microsoft#2253)
* Refactor universal checkpointing and tensor fragments
* Formatting
* [ds-inference] fix progress bar (microsoft#2286) when loading the non-sharded checkpoint update the progress bar (fix by @RezaYazdaniAminabadi) - I've just tested it to work. Co-authored-by: Olatunji Ruwase <[email protected]>
* Offload all gradients to nvme (microsoft#2282)
* fused bias relu unittest (microsoft#2297)
* fix for pytest picking up local deepspeed dir instead of installed deepspeed (microsoft#2299)
* Fix for Zero3 when MP>1 and at least one batch param undefined (microsoft#2289) Co-authored-by: anthony.301 <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* [downstream] merge from xpu support downstream
* Unit test for bias add kernel (microsoft#2298)
* added unit test
* Update pt_binding.cpp
* formatting
* Update test_bias_add.py
* Update relu.cu with mem_access_utils (microsoft#2306)
* Add tensor parallel inference unit tests (microsoft#2232) Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Sam Ade Jacobs <[email protected]>
* Fix the residual add mp scaling for GPTNeoX (microsoft#2310)
* Add unit tests for residual_add kernels (microsoft#2307)
* add inference eval scripts (microsoft#2303)
* Upgrade P40 tests to torch 1.8 (microsoft#2316) Co-authored-by: Jeff Rasley <[email protected]>
* ZeRO-Inference blog (microsoft#2271)
* ZeRO-Inference blog
* ZeRO-Inference blog
* Format fixes
* Apply feedback
* Feedback
* Update docs/_posts/2022-08-27-zero-inference.md Co-authored-by: Stas Bekman <[email protected]>
* Update docs/_posts/2022-08-27-zero-inference.md Co-authored-by: Stas Bekman <[email protected]>
* Address feedback
* Format fixes
* More tweaks
* long sequence, nvme offload
* Add image Co-authored-by: Stas Bekman <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* ZeRO-Inference blog - wrap up (microsoft#2321)
* ZeRO-Inference blog - Update README (microsoft#2322)
* refactor to use mem_access (microsoft#2317)
* add quant unit test (microsoft#2315)
* add quant unit test
* add codeowner
* format fix
* fix undefined symbol: curandSetPseudoRandomGeneratorSeed
* modify ref fn name and add comment
* add comments
* add 4bit quant 16groups
* fix
* modify groups in ref code
* parameterize tensor shape
* single param
* detach tensor
* remove -lcurand flag
* add back -lcurand flag Co-authored-by: Ammar Ahmad Awan <[email protected]>
* only override forward if using cuda-graph (microsoft#2291)
* Add more options to inference benchmark (microsoft#2325)
* bump to 0.7.4
* MOE residual matmult unit test (microsoft#2323) MOE residual matmul unit tests Co-authored-by: Michael Wyatt <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
* [device] port cuda device to literal_device() in new tests
* MOE matmult with memaccess (microsoft#2336)
* Fix formatting
* Remove redundant variable
* Refactor residual add kernels (microsoft#2333) Co-authored-by: Ammar Ahmad Awan <[email protected]>
* [accel_runtime] add pin_memory to accelerator runtime interface.
* mem access for quantize kernel (microsoft#2331)
* mem access for quantize kernel
* format
* format fp32
* modify quant kernel
* modify quant kernel2
* modify format
* format
* fix comments in pytest
* fix comments in pytest
* format
* rerun Co-authored-by: Ammar Ahmad Awan <[email protected]> Co-authored-by: Connor Holmes <[email protected]>
* increase min pre-commit versions (microsoft#2346)
* Extend scratch buffer for long prompts (microsoft#2212) Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* fix zero docs (microsoft#2350)
* Inference profiling updates/fixes (microsoft#2348) (microsoft#2349) Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
* Kernel Data Conversion Utility (microsoft#2327)
* Unify macro definitions and constants in a single file
* Conversion utility implementation.
* Fix reversion from formatting
* Bugfixes after testing with correct DeepSpeed
* Inline markers are available on both HIP + CUDA
* Add Onebit Optimizers in __init__ (microsoft#2340) Co-authored-by: Saeyeol Lee <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
* [accelerator abstraction] merge from microsoft#2320
* docs(mixture-of-experts-inference): fix typo in tuto (microsoft#2345) Co-authored-by: Olatunji Ruwase <[email protected]>
* download cifar to blob storage (microsoft#2342) Co-authored-by: Olatunji Ruwase <[email protected]>
* Refactor gptj_residual_add kernels for better readability (microsoft#2358) Co-authored-by: Reza Yazdani <[email protected]>
* Updated issue templates (microsoft#2363)
* Update issue templates
* fix cuda invalid config error in dequant kernel (microsoft#2362)
* format
* remove round fn
* Add missing pytest fixture scope (microsoft#2353) Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
* Extend residual_add kernel tests to cover pre_attn_norm (microsoft#2354) Co-authored-by: Jeff Rasley <[email protected]>
* Refactor fused_bias_residual kernels for better readability (microsoft#2356) Co-authored-by: Olatunji Ruwase <[email protected]>
* Capture error message during sweep tests (microsoft#2351)
* Collect error messages in results.csv Co-authored-by: Michael Wyatt <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
* fix an exception when recursively casting dicts to fp16 (microsoft#2370)
* Refactor remaining distributed tests (microsoft#2216)
* batch of refactored tests
* more test refactoring
* fp16 test refactor
* more refactors
* added DistributedFixture class
* applied DistributedFixture to first batch of tests as a trial
* added DistributedFixture test and documentation
* last tests
* fixes for refactored tests
* remove subdirs in workflow files
* fix pytest syntax error
* fix another syntax error
* update imports
* use DistFixture with elastic checkpoint test
* missing import
* update to shared class tmpdir for elastic test
* moved test files
* avoid duplicate test file name
* last refactor and moving test files
* formatting
* fix broken import
* testing forked AMD tests
* update abstract method
* use blob storage for accelerate and transformers tests
* upgrade torch for accelerate CI Co-authored-by: Olatunji Ruwase <[email protected]>
* Fix the MLP output tensor's shape (microsoft#2380)
* allow building with latest CUDA (11.8), it is backwards compatible (microsoft#2390)
* pin transformers version for unit tests (microsoft#2402)
* Change type to tuple in replace_wo_policy isinstance check (microsoft#2387) Update the isinstance check inside the `replace_wo_policy` function to `tuple` and `str` instead of `dict`, since the layers are provided as a `tuple` type. Co-authored-by: Lev Kurilenko <[email protected]> Co-authored-by: Molly Smith <[email protected]> Co-authored-by: Lok Chand Koppaka <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
* Checkpoint backwards-compatibility workaround (microsoft#2384)
* Add predicated global load (microsoft#2373) Co-authored-by: Reza Yazdani <[email protected]>
* change call site of literal_device, on_accel_device and accel_runtime to get_accelerator() call
* add new interface definition from olruwase/accelerator_abstraction
* MII blog post (microsoft#2418) Co-authored-by: Samyam Rajbhandari <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
* Fix figure reference (microsoft#2419)
* [docs] update news items
* [docs] add mii repo link
* Add SLURM Multinode Runner (microsoft#2404) Signed-off-by: Dashiell Stander <[email protected]> Co-authored-by: Dashiell Stander <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* Fix issue with corrupted output on long generation for GPT (microsoft#2359) Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
* MII blog title update on Readme
* DeepSpeed-MII title change in website
* Fix GPT Neo-X multi-gpu inference (microsoft#2401) Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* MII-Public and MII-Azure subheading in mii post
* CI fixes related to triton (microsoft#2422)
* [docs] update mii blog title (microsoft#2423)
* add SD injection policy (microsoft#2381) Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Reza Yazdani <[email protected]>
* [accelerator abstraction] remove name() from interface, device_name() should be used.
* merge with master (ec13da6)
* fix checkpoint loading when it is a dictionary (microsoft#2425)
* Make error regex more generic in collect_results.py (microsoft#2415) Co-authored-by: Jeff Rasley <[email protected]>
* fixes microsoft#2389 (microsoft#2411) truncating expert param storage for checkpointing Co-authored-by: Alexander Jipa <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
* Fix for inference gpt-j test (microsoft#2430)
* fix for gpt-j failing due to tokenizer error
* limit number of gpt-j tokens generated due to low memory
* Fixing bug 2361 (microsoft#2410)
* fixing bug 2361
* adding pytest for config initialization
* changing expected output to FusedAdam
* remove print statement
* running yapf on modified files
* running pre-commit formatting Co-authored-by: Olatunji Ruwase <[email protected]>
* Universal checkpoint for zero stage 1 (microsoft#2284)
* Refactor universal checkpointing and tensor fragments
* Formatting
* Support zero stage1; Expand TP dim
* Remove debug prints
* Detect sharded optimizer state
* Format fixes
* Encode reshaping guide
* More symbolic constants Co-authored-by: Michael Wyatt <[email protected]>
* only add deps if extra is explicitly called (microsoft#2432)
* Add TestInjectionPolicy inference unittest class for testing custom injection policies (microsoft#2426) This PR adds a TestInjectionPolicy inference unittest class for testing custom injection policies. This test differs from the existing tests in that the injection_policy dictionary is explicitly specified when calling the DeepSpeed init_inference API. The google/t5-v1_1-small text2text-generation model and the roberta-large fill-mask model are added as tests with the injection policy explicitly specified. This is done to expand our unittest coverage to test the path where the replace_wo_policy function is invoked (see microsoftGH-2387). Co-authored-by: Lev Kurilenko <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
* [memory estimators] new config args sync (microsoft#2431) Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
* parallelize writing of layer checkpoint files across data parallel instances (microsoft#1419)
* parallelize layer checkpoints across data parallel groups
* use partition_uniform to determine start/end index values
* formatting fix
* config: add option for parallel write of layer checkpoints in pipeline stage
* yapf fixes
* enable parallel layer write according to config param
* avoid extraneous makedir when rank 0 writes all layers Co-authored-by: Olatunji Ruwase <[email protected]>
* Fix broken link to DeepSpeed Megatron fork (microsoft#2440) Co-authored-by: Lev Kurilenko <[email protected]>
* bump to 0.7.5
* [OpBuilder] Add op builder abstraction
* convert op builder usage in merged code
* merge diff files from upstream
* [OpBuilder] add create_op_builder interface in abstract_accelerator.py
* remove files that is deleted from upstream
* [OpBuilder] add left over op builder usage in tests
* [OpBuilder] fix op builder usage in tests
* [OpBuilder] fix <op builder>.NAME usage in tests to follow op builder abstraction design
* import get_accelerator from deepspeed.accelerator directly
* [OpBuilder] remove unused function and sync with main
* add missing import
* revert changes in device.py to avoid conflict with main
* fix alexnet_model to use /tmp instead of /blob
* Mingzhi/solve pr108 b (microsoft#115)
* move ALL_OPs from __init__.py to all_Op.py to solve circular import
* delete deepspeedexamples
* fix import
* fix regression (microsoft#117)
* fix pin_memory
* fix regression
* fix error

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Mikhail Druzhinin <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Minjia Zhang <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Kamal Raj <[email protected]>
Co-authored-by: Conglong Li <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Arash Bakhtiari <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Zhihong Chen <[email protected]>
Co-authored-by: Siddharth Singh <[email protected]>
Co-authored-by: Connor Holmes <[email protected]>
Co-authored-by: 叶志晟 <[email protected]>
Co-authored-by: Molly Smith <[email protected]>
Co-authored-by: trajep <[email protected]>
Co-authored-by: chenguo <[email protected]>
Co-authored-by: Arash Bakhtiari <[email protected]>
Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
Co-authored-by: anthony.301 <[email protected]>
Co-authored-by: Sam Ade Jacobs <[email protected]>
Co-authored-by: Guanhua Wang <[email protected]>
Co-authored-by: Saeyeol Lee <[email protected]>
Co-authored-by: Saeyeol Lee <[email protected]>
Co-authored-by: Jean-Louis Queguiner <[email protected]>
Co-authored-by: Matt Smith <[email protected]>
Co-authored-by: Thomas-MMJ <[email protected]>
Co-authored-by: lekurile <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Molly Smith <[email protected]>
Co-authored-by: Lok Chand Koppaka <[email protected]>
Co-authored-by: Samyam Rajbhandari <[email protected]>
Co-authored-by: Dashiell Stander <[email protected]>
Co-authored-by: Dashiell Stander <[email protected]>
Co-authored-by: Andrey Chernykh <[email protected]>
Co-authored-by: Alexander Jipa <[email protected]>
Co-authored-by: Alexander Jipa <[email protected]>
Co-authored-by: Joe Mayer <[email protected]>
Co-authored-by: Adam Moody <[email protected]>
Co-authored-by: AGUL <[email protected]>
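One item in the commit list above (microsoft#2250) notes that MoE training with ZeRO stage 1 only works with contiguous gradients enabled. As a rough illustration of where that flag lives, the config fragment below shows the `zero_optimization` block; the batch-size and fp16 values are placeholders, not settings taken from this PR.

```python
# Illustrative DeepSpeed config fragment for ZeRO stage 1 + MoE training.
# Only the zero_optimization block matters here; the other values are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,
        "contiguous_gradients": True,  # per microsoft#2250, keep enabled when combining ZeRO-1 with MoE
    },
}

# The dict would then be passed to deepspeed.initialize(..., config=ds_config).
```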
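Another item above (microsoft#2387) changes an isinstance check in `replace_wo_policy` from `dict` to `tuple`/`str` because the layers arrive as a tuple of names. The helper below is a hypothetical sketch of that kind of type check, not the actual DeepSpeed code.

```python
# Hypothetical sketch of the tuple/str check described in microsoft#2387:
# the layer container is a tuple of layer names (or a single name), not a dict.
def normalize_layers(layers):
    if isinstance(layers, str):    # a single layer name
        return (layers,)
    if isinstance(layers, tuple):  # the usual case: layers arrive as a tuple
        return layers
    raise TypeError(f"expected str or tuple of layer names, got {type(layers).__name__}")
```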
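The TestInjectionPolicy item (microsoft#2426) exercises the path where `injection_policy` is passed explicitly to `deepspeed.init_inference`. A minimal sketch of such a call for a T5 model follows; the tensor-parallel degree and the exact layer names in the policy tuple are assumptions for illustration, not the test code from that PR.

```python
import torch
import deepspeed
from transformers import T5ForConditionalGeneration
from transformers.models.t5.modeling_t5 import T5Block

model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-small")

# Explicit injection policy: map the transformer block class to the names of the
# linear layers whose outputs must be all-reduced across tensor-parallel ranks.
engine = deepspeed.init_inference(
    model,
    mp_size=2,           # assumed tensor-parallel degree for this example
    dtype=torch.float16,
    injection_policy={T5Block: ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")},
)
model = engine.module
```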
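The parallel layer-checkpoint item (microsoft#1419) describes splitting the layer files uniformly across data-parallel ranks so each rank writes its own slice instead of rank 0 writing everything; the commit notes it uses partition_uniform for the start/end indices. The sketch below illustrates that partitioning idea with a hand-rolled helper and a hypothetical save call, under the assumption of evenly split layer ranges.

```python
# Sketch of uniform layer partitioning across data-parallel ranks, in the spirit of
# microsoft#1419. The helper is illustrative, not DeepSpeed's partition_uniform.
def uniform_bounds(num_layers: int, num_ranks: int) -> list:
    """Return num_ranks + 1 boundary indices that split num_layers as evenly as possible."""
    base, extra = divmod(num_layers, num_ranks)
    bounds = [0]
    for r in range(num_ranks):
        bounds.append(bounds[-1] + base + (1 if r < extra else 0))
    return bounds

num_layers, dp_world_size, dp_rank = 24, 4, 1
bounds = uniform_bounds(num_layers, dp_world_size)
start, end = bounds[dp_rank], bounds[dp_rank + 1]
print(f"data-parallel rank {dp_rank} writes layer checkpoints {start}..{end - 1}")
# for layer_idx in range(start, end):
#     save_layer_checkpoint(layer_idx)  # hypothetical per-layer save call
```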
This addresses #2184 and #2113
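For background on the fix in this pull request: a fused QKV parameter stacks the query, key, and value projections into one tensor, so copying tensor-parallel slices has to shard each of the three sub-matrices separately and re-concatenate the per-rank pieces; slicing the fused dimension as a whole would hand the first rank only query rows. The sketch below illustrates that idea under the assumption of a [3 * hidden, hidden] fused layout; it is not the code from this PR.

```python
import torch

def slice_fused_qkv(qkv_weight: torch.Tensor, mp_size: int, rank: int) -> torch.Tensor:
    """Illustrative tensor-slicing copy of a fused QKV weight of shape [3 * hidden, hidden].

    Q, K, and V are each split along the output dimension and the per-rank shards are
    re-concatenated, so every rank keeps matching Q, K, and V rows.
    """
    three_h, _hidden = qkv_weight.shape
    assert three_h % 3 == 0 and (three_h // 3) % mp_size == 0
    q, k, v = torch.chunk(qkv_weight, 3, dim=0)                          # undo the Q/K/V fusion
    q_s, k_s, v_s = (torch.chunk(t, mp_size, dim=0)[rank] for t in (q, k, v))
    return torch.cat((q_s, k_s, v_s), dim=0)                             # re-fuse this rank's shard

# Example: a 2-way split of a fused weight for hidden size 8.
fused = torch.randn(3 * 8, 8)
rank0_shard = slice_fused_qkv(fused, mp_size=2, rank=0)  # shape [12, 8]: 4 Q, 4 K, 4 V rows
```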