Merge pull request openucx#7 from avildema/int4_ci_call
Int4 ci call
avildema authored May 19, 2021
2 parents 890b384 + 1e36529 commit f1a3767
Showing 578 changed files with 40,186 additions and 18,184 deletions.
135 changes: 135 additions & 0 deletions .clang-format
@@ -0,0 +1,135 @@
# C
BasedOnStyle: LLVM
AlignEscapedNewlines: Indent
AlignConsecutiveAssignments: true
AlignConsecutiveDeclarations: false
AlignConsecutiveStructMembers: true
AlignConsecutiveMacros: true
AlignDeclarationByPointer: true
AlignAfterOpenBracket: true
AlignOperands: true
PointerAlignment: Right
DerivePointerAlignment: false
AlignTrailingComments: false
AllowAllArgumentsOnNextLine: false
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: false
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AllowShortEnumsOnASingleLine: false
AllowDesignatedInitializersOnASingleLine: false
AlwaysBreakAfterReturnType: None
PenaltyReturnTypeOnItsOwnLine: 20
PenaltyBreakAssignment: 100
PenaltyExcessCharacter: 100
PenaltyBreakBeforeFirstCallParameter: 100
PenaltyBreakMemberAccess: 250
PenaltyBreakLastMemberAccess: 300
PenaltyIndentedWhitespace: 0
ColumnLimit: 80
AlwaysBreakBeforeMultilineStrings: false
BinPackArguments: true
BinPackParameters: true
BreakBeforeBraces: Custom
BraceWrapping:
    AfterClass: false
    AfterControlStatement: false
    AfterEnum: false
    AfterFunction: true
    AfterNamespace: false
    AfterObjCDeclaration: false
    AfterStruct: false
    AfterUnion: false
    AfterExternBlock: false
    BeforeCatch: false
    BeforeElse: false
    IndentBraces: false
    SplitEmptyFunction: true
    SplitEmptyRecord: true
    SplitEmptyNamespace: true
BreakBeforeBinaryOperators: false
BreakBeforeTernaryOperators: false
BreakStringLiterals: true
ContinuationIndentWidth: 8
IncludeBlocks: Regroup
IndentCaseLabels: false
IndentWidth: 4
KeepEmptyLinesAtTheStartOfBlocks: false
IndentPPDirectives: None
MaxEmptyLinesToKeep: 2
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
SpaceInEmptyParentheses: false
SpaceBeforeParens: ControlStatementsExceptForEachMacros
SpaceBeforeAssignmentOperators: true
SpaceAfterCStyleCast: false
SortIncludes: false
ForEachMacros: ['_UCS_BITMAP_FOR_EACH_WORD',
                'FOR_EACH_ENTITY',
                'kh_foreach',
                'kh_foreach_key',
                'kh_foreach_value',
                'ucp_unpacked_address_for_each',
                'ucs_array_for_each',
                'UCS_BITMAP_FOR_EACH_BIT',
                'ucs_for_each_bit',
                'ucs_for_each_submask',
                'ucs_hlist_for_each',
                'ucs_hlist_for_each_extract',
                'ucs_hlist_for_each_extract_if',
                'ucs_list_for_each',
                'ucs_list_for_each_safe',
                'ucs_memory_type_for_each',
                'UCS_PP_FOREACH',
                'UCS_PP_FOREACH_SEP',
                'ucs_profile_for_each_location',
                'ucs_ptr_array_for_each',
                'ucs_ptr_array_locked_for_each',
                'ucs_queue_for_each',
                'ucs_queue_for_each_extract',
                'ucs_queue_for_each_safe',
                'ucs_timerq_for_each_expired',
                'UCT_IB_IFACE_VERBS_FOREACH_RXWQE',
                'UCT_RC_VERBS_IFACE_FOREACH_TXWQE',
                'UCS_INIT_ONCE',
                'UCS_TEST_F',
                'UCX_PERF_TEST_FOREACH']
StatementMacros : []
TypenameMacros: ['khash_t', 'ucs_array_t']
WhitespaceSensitiveMacros: []

# CPP
Standard: Cpp11
AccessModifierOffset: -4
AlwaysBreakTemplateDeclarations: false
BreakBeforeInheritanceComma: false
BreakInheritanceList: AfterColon
BreakConstructorInitializers: AfterColon
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 4
Cpp11BracedListStyle: true
Cpp11BracedListLineBreak: true
FixNamespaceComments: true
NamespaceIndentation: None
UseTab: Never
ReflowComments: false
SortIncludes: false
IncludeCategories:
  - Regex: '^"'
    Priority: 1
  - Regex: '^<'
    Priority: 2
SortUsingDeclarations: true
TabWidth: 4
SpacesInAngles: false
SpacesBeforeTrailingComments: 1
SpaceAfterTemplateKeyword: false
SpacesInContainerLiterals: false
---
# Java
Language: Java
DisableFormat: true
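
As a rough illustration of what the C section of this `.clang-format` asks for, the sketch below is hand-formatted to follow a few of the options above: `AfterFunction: true`, `IndentWidth: 4`, `PointerAlignment: Right`, and `SpaceBeforeParens: ControlStatementsExceptForEachMacros`. The macro `example_for_each` and the function `example_sum` are hypothetical names made up for this example; the macro is not in the `ForEachMacros` list and is assumed here to have been added to it. Actual clang-format output may differ between versions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical foreach-style macro, for illustration only. It is NOT one of
 * the UCX macros listed in ForEachMacros; assume it was added to that list. */
#define example_for_each(_i, _n) for ((_i) = 0; (_i) < (_n); ++(_i))

/* AfterFunction: true puts the function's opening brace on its own line.
 * PointerAlignment: Right attaches '*' to the parameter name. */
static int example_sum(const int *values, size_t count)
{
    size_t i;
    int sum = 0;

    assert((values != NULL) || (count == 0));

    /* Treated as a foreach macro: no space before '(' and, because
     * AfterControlStatement is false, the brace stays on the same line. */
    example_for_each(i, count) {
        sum += values[i];
    }

    /* Ordinary control statements keep the space before '('. */
    if (sum < 0) {
        sum = 0;
    }

    return sum;
}
```

The function itself only sums an array and clamps a negative total to zero; it exists to give the formatter rules something to act on.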
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -18,6 +18,7 @@ A clear and concise description of what the bug is.
### Setup and versions
- OS version (e.g. Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
  - `cat /etc/issue` or `cat /etc/redhat-release` + `uname -a`
  - For Nvidia Bluefield SmartNIC include `cat /etc/mlnx-release` (the string identifies software and firmware setup)
- For RDMA/IB/RoCE related issues:
  - Driver version:
    - `rpm -q rdma-core` or `rpm -q libibverbs`
5 changes: 5 additions & 0 deletions .gitignore
@@ -88,3 +88,8 @@ GTAGS
*.swp
compile_commands.json
.idea/
.externalToolBuilders
.classpath
.vscode
src/tools/vfs/ucx_vfs
test/apps/test_init_mt
8 changes: 7 additions & 1 deletion Makefile.am
@@ -13,7 +13,12 @@
EXTRA_DIST =
ACLOCAL_AMFLAGS = -I config/m4

noinst_HEADERS = src/uct/api/uct.h src/uct/api/uct_def.h src/uct/api/tl.h
noinst_HEADERS = \
        src/uct/api/uct.h \
        src/uct/api/v2/uct_v2.h \
        src/uct/api/uct_def.h \
        src/uct/api/tl.h

doxygen_doc_files = $(noinst_HEADERS)

doc_dir = $(pkgdatadir)/doc
@@ -37,6 +42,7 @@ SUBDIRS += $(UCG_SUBDIR)
endif

SUBDIRS += \
src/tools/vfs \
src/tools/info \
src/tools/perf \
src/tools/profile \
181 changes: 175 additions & 6 deletions NEWS
@@ -1,17 +1,186 @@
#
## Copyright (C) Mellanox Technologies Ltd. 2001-2020. ALL RIGHTS RESERVED.
## Copyright (C) Mellanox Technologies Ltd. 2001-2021. ALL RIGHTS RESERVED.
## Copyright (C) UT-Battelle, LLC. 2014-2019. ALL RIGHTS RESERVED.
## Copyright (C) ARM Ltd. 2017-2020. ALL RIGHTS RESERVED.
## Copyright (C) ARM Ltd. 2017-2021. ALL RIGHTS RESERVED.
##
## See file LICENSE for terms.
##
#

## Current
### Features: TBD
#### UCX Core TBD
#### UCX Java (API Preview) TBD
### Bugfixes: TBD
### Features:
#### UCP
* Added API for querying UCP library attributes
### Bugfixes:

## 1.10.0 (March 9, 2021)
### Features:
#### Core
* Added support for Nvidia HPC SDK
* Added support for latest PGI and Clang
* Added support for ROCM-3.7+ (warning generated if older version detected)
* Added support for GCC11
#### Architecture
* Added Arm SVE memcpy()
* Redesigned Arm WFE support
* Improved clear_cache performance for Arm
* Added architecture detection for Zhaoxin CPU
#### CI
* Added release builds on CUDA 11
* Enabled performance validation in gtest
* Added new OS for release CI
#### UCP
* Added locality awareness to the transport selection logic for GPU devices
* Added put/offload/short and put/offload/zcopy protocols
* Added receive message nbx routine
* Reworked AM implementation and API, which adds support for RNDV semantics
* Added support for multi-lane connection manager over TCP
* Added support for printing AM tls with info log level
* Implemented flush and destroy for UCT EPs on UCP worker
* Reduced UCP request size
* Added support for keepalive protocol
* Added support for multi-fragment protocol
* Added implementation for protocol progress for eager, bcopy, and multicopy
* Improved selection logic for protocol selection
* Added new protocols for UCP get operation
* Added bcopy protocols with support for GPU memory
* Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
* Set SOCKADDR_CM_ENABLE=y by default
* Added support for fast-path short with new tag protocols
* Added a new parameter to control the CM listener's backlog
* Added support for sending AM RTS over short message protocol
* Added support for shared memory multi-lane when CM is used
* Added missing async locks
#### UCT
* Added API for keepalive_timeout value
* Added uct_completion.status
* Allowed transports to access multiple mem_types
* Removed status arg from uct_completion_callback_t
* Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
* Updated documentation for uct_listener_params
* Lowered the log level for certain network errors
* Added cuda_copy wakeup feature
* Added wakeup support for shared memory
#### UCS
* Added "inf" and "auto" values to time units
* Added on-stack constructors for array and string buffer
* Added ucs_ptr_map_t data structure
* Added bool CSWAP
* Improved logging
* Added optimization for namespace processing
* Fixes for connection matching functionality
#### CUDA
* Added support for global IPC cache
#### RDMA CORE (IB, ROCE, etc.)
* Added support for auto detection of adaptive routing settings
* Added an option to poll TX CQ every progress iteration
* Added local and remote addresses to the reject error message
* Added support for UAR allocation with non-cacheable memory type
* Added support for multiple flush cancel without completion
* Added async events callback support
* Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
* Added support for connection matching for UD
* Added a check for AM ordering
* Added better support for non-4K MTU values
#### Java (preview)
* Added support for a different javadoc executable path for different java versions
* Added UCS memory type constants
* Added support for building on Java 10+
* Added support for io-vector datatype
* Removed libjucx from packages
#### Tests
* Added CI for CUDA 11
* Added test_ucp_sockaddr_protocols.stream_short
* Reimplemented tests using NBX API
* Added flush(cancel) test
* Added memory_wait mode to perftest
* Added support for clang 10
* Refactored RMA and atomic tests, added memtype support
* Added test for uct_md_mem_query()
* Added request interrupt support
* Added support for connection manager fallbacks
* Added new ucp request test checking for leaks from the ptr_map
#### Documentation
* Added glossaries

### Bugfixes:
#### Portability
* Fixes in print functions to use format string like PRIx64, etc.
* Fixes for Arm v8 cross compilation support
#### Continuous Integration
* Fixes in Github release flow
* Fixes in docker image
#### Packaging
* Removed deb package dependencies
* Fixes in SPEC to make the RPM relocatable
#### Documentation
* Fixes in documentation for ucp_am_recv_data_nbx
* Fixes in quick start example
* Fixes in installation instruction
* Fixes and updates in author list
#### Tests
* Fixes for failures under valgrind runtime
* Fixes in mmap tests for 0-length RMA
* Fixes in definition of LAST_WQE wait timeout
* Fixes in ROCm for mem_buffer test
* Fixes in test name printing format
* Fixes in tcp_sockcm test
#### UCP
* Fixes in worker cleanup flow
* Fixes in RNDV RTS flow
* Fix in length check condition for RMA PUT short
* Fixes in handling failures from AM Bcopy
* Fix in a release flow of deferred data
* Fixes for invalid ID and handling of status in RNDV
* Fixes in short active message reply protocol
#### CUDA
* Fixes in managed memory support
* Fixes in topology detection
#### RDMA CORE (IB, ROCE, etc.)
* Fixes in assert definitions
* Fixes in printing an error about invalid AM Bcopy length for UD
* Fixes for thread safety support
* Fixes to get ROCE device name according to GID
* Fixes for SL selection
* Fixes in create STRICT_ORDER key
* Fixes addressing performance degradation in UD transport due to excess async events
* Fixes in QP destroy
* Fixes for CQ creation failure using old Verbs API
#### UGNI
* Fixes in disable logic in config
* Fixes for clang 11 warnings
#### Java
* Fixes in build dependencies
* Fixes in constructing UcpRequest object on error
* Fixes in exception handling on endpoint closure request
* Fixes for segfault in UcpErrorHandler
#### UCP
* Fixes in datatype support for get_zcopy RNDV
* Fixes in connection manager disconnect
* Fixes in assert definitions
* Fixes in completion flow for failed EP
* Fixes in flush error handling flow
* Fixes in latency calculations for wireup protocol
* Fixes in offload completion with inlined data
* Fixes in unpacking flow
* Fixes in error handling for various protocols
#### UCT
* Fixes in flush TX
* Fixes in checks for enabling GPU Direct RDMA
#### UCS
* Fixes for crashes on incorrect value set in config
* Fixes in ptr_array
* Fixes in maximal size for ucs_snprintf_safe()
* Fixes in compilation warning
* Fixes in ucs_aarch64_dsb(_op) definition
#### TCP
* Fixes in default route interface confirmation flow
* Fixes in PUT protocol
* Fixes in max connection limit and improved error reporting
#### UCM
* Fixes for crash on prevent unload
* Fixes in libucm_rocm
* Fixes for a few race conditions

## 1.9.0 (September 19, 2020)
### Features: