Elliott Slaughter committed on 2023-06-26 10:59. Update CHANGES.txt.
This file lists the major changes as they appear in the stable branch. No
attempt is made to keep this list accurate for the master branch.
Version 23.06.0 (June 28, 2023)
* Build
- Fixes for CMake build on macOS
- Fixes for HIP build when arch is specified
* Realm
- Support for better backtraces via libdw and libunwind
- Improve scalability and performance in task spawning by caching
the triggering operation of an event if one is provided
- Fix a minor issue with affinity queries to properly clear the
user-provided vector before populating it
- Add more accurate GPU memory bandwidth affinity calculations if
NVML is available
- Refactor CPU core topology enumeration to serve systems without
NUMA capabilities (like Jetson ARM systems)
- Improve scalability and performance of task spawning by moving event
reuse freelists to be per-processor, reducing lock contention
- Add a microbenchmark for measuring task throughput more accurately
- Add a series of Realm API tutorials
- Replace `CU_EVENT_DEFAULT` with `CU_EVENT_DISABLE_TIMING` for better
performance of CUDA events
- Support Kokkos interop for the HIP module
- Fixes for Realm tests on macOS
* Tools
- Legion Prof now supports search in the new profiler UI
- Legion Prof now supports an HTTP client/server interface. Launch the
server with `--serve` (on port 8080 by default) and attach a client to
it with `--attach http://127.0.0.1:8080`
- Legion Prof now supports a new archival mode via the `--archive`
flag. Generate an offline profile and view it either via `--attach` or
by uploading it to a server and navigating to
`https://legion.stanford.edu/prof-viewer/?url=...`
- Legion Prof modes (client/server/viewer) are now parallel by
default, and perform heavy computations off the UI thread for
better responsiveness
- Add support for rendering indirect copies (i.e., gather/scatter)
- Fix rendering of profiles over HTTP with old profiler UI
- Fix profiling of copies with different numbers of hops between instances
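The client/server entry in the Tools list above can be sketched as a small helper that assembles the two command lines. This is only an illustration: it assumes the profiler is on the PATH as `legion_prof` and that log files are passed as positional arguments, and the log file names are hypothetical. Only `--serve`, `--attach` and the default address come from the entry itself.

```python
import shlex

# Assumption: the profiler binary is reachable as `legion_prof`.
PROF = "legion_prof"
# Default address from the changelog entry (port 8080 on localhost).
DEFAULT_URL = "http://127.0.0.1:8080"


def serve_command(logs):
    """Command line that starts the profile server.

    Assumption: profile logs are given as positional arguments.
    """
    return [PROF, "--serve", *logs]


def attach_command(url=DEFAULT_URL):
    """Command line that attaches a client to a running server."""
    return [PROF, "--attach", url]


if __name__ == "__main__":
    # Hypothetical log file names, for illustration only.
    print(shlex.join(serve_command(["prof_0.gz", "prof_1.gz"])))
    print(shlex.join(attach_command()))
```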
Version 23.03.0 (March 27, 2023)
* Build
- Minimum supported CMake version is now 3.16. (Some optional features may
continue to require even newer versions.)
- Minimum supported GCC version is now 8.
- Minimum supported CUDA version is now 10.
* Legion
- Added support for padded layout constraints to provide scratch space
in instances for tasks to use (see examples/padded_instances).
- Added support for tiled layout constraints to provide an ability to
lay out instances by breaking down dimensions (see examples/tiling).
* Realm
- An experimental UCX network backend has been added.
- Updated the Kokkos interop to support Kokkos 4.0.
* Python
- Support loading Legion as a library from a stock Python interpreter.
* Regent
- Fixes to avoid leaking futures.
- Improvements to Regent's predicate optimization.
* Tools
- Legion Prof now supports a native viewer UI. Enable it with the `viewer`
feature (e.g., `cargo run --features=viewer`) and use the flag
`--view`.
- Legion Prof now has better support for rendering a subset of available
nodes. Pass all log files (from all nodes) into Legion Prof and add the
`--subnodes` flag to specify which ones to render. This ensures all
copies in/out of those nodes will be shown correctly.
Version 22.12.0 (December 30, 2022)
* Regent
- Support for nested predication of `if` and `while` statements
* Realm
- Support priorities for Copy operations
- Support building with multiple network backends enabled, and use
-ll:networks (gasnetex/gasnet1/mpi/none) to pick which one to use
during runtime
- Separate CUDA runtime from Realm by removing all references to CUDA
runtime and relying only on driver API, which fixes an issue when
mixing static and dynamic cudart across an application and improves
Realm’s compatibility across driver versions
* Tools
- Legion Prof now supports visualization of channels for indirect copies,
and of instances used by different operations, including Task, Copy
and Fill
Version 22.09.0 (September 30, 2022)
* Python
- Support for running packages via `legion_python -m`
- Support for Jupyter Notebook on single node execution.
* Regent
- Deprecated support for LLVM versions less than 11 in
`setup_env.py`. These versions will be removed in the next
release. LLVM 13 is recommended, except on ARM where LLVM 11 is
currently required
- Added support for provenance for all launcher operations
- Debug info is no longer generated by default in order to
optimize compile times. To re-enable it, run with
`-fdebuginfo 1`
* Legion
- Most Legion APIs now support passing a provenance string.
This provenance information is passed through to tools like
Legion Spy and Legion Prof so users can map what they are
seeing back to their source code. In the future, provenance
strings will also be used by all Legion error messages.
* Realm
- Support for fills of arbitrary instances (via multi-hop paths where
needed)
- Fixed crashes when using external instances and network-registered
memory at the same time
- Removed all direct references to CUDA runtime library in CUDA module
- Caching of minimum-cost data transfer path for repeated copies
- Dependent partitioning support for image and preimage using structured
(~affine) transforms in addition to existing unstructured (field-based)
images/preimages
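To make the distinction in the dependent partitioning entry above concrete, here is a conceptual sketch of image and preimage over plain Python sets. It does not use the Realm API; the function names and the sample data are hypothetical. The "unstructured" case looks each point up in a field of stored indices, while the "structured" case applies an affine function directly.

```python
def image(points, f):
    """Set of targets reached by applying f to every source point."""
    return {f(p) for p in points}


def preimage(domain, targets, f):
    """Set of points in domain that f maps into targets."""
    return {p for p in domain if f(p) in targets}


# Unstructured (field-based): the mapping is data stored per point.
ptr_field = {0: 5, 1: 7, 2: 5, 3: 9}  # hypothetical pointer field


def unstructured(p):
    return ptr_field[p]


# Structured (affine): the mapping is a formula, so no field is needed.
def affine(p):
    return 2 * p + 1


if __name__ == "__main__":
    print(image({0, 1, 2}, unstructured))
    print(image({0, 1, 2}, affine))
    print(preimage(range(4), {5}, unstructured))
```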
Version 22.06.0 (June 29, 2022)
* Regent
- Support for cross-products in index launches, as well as
multi-level projection functors.
- Support for HIP on AMD GPUs has been added. All tasks marked with
`__demand(__cuda)` are automatically eligible. Note that the name of
the annotation may change in the future to something more general, but
for now no change is being made. Some CUDA flags have migrated to more
general names. See below.
- The flag `-fcuda 1` is deprecated. Use `-fgpu cuda` instead.
- The flag `-fcuda-offline` is deprecated. Use `-fgpu-offline` instead.
- The flag `-fcuda-arch` is deprecated. Use `-fgpu-arch` instead.
- Enable HIP support with `-fgpu hip` and use the `-fgpu-offline` and
`-fgpu-arch` flags as necessary/appropriate.
- Support for new flag `-ffast-math 1` which enables fast-math
optimizations on CPU and GPU. By default, CPU code has this
disabled, and GPU code uses only the `contract` flag in LLVM
to generate FMA instructions. For compute-intensive
applications, additional performance can sometimes be unlocked
by enabling the full suite of optimizations with `-ffast-math 1`,
at the cost of numerical accuracy.
- Performance improvements for CUDA allow recent LLVM versions
(e.g., 13) to match or exceed the performance of LLVM
3.8. Previously, performance regressions made LLVM 3.8 the
most performant version for use with CUDA. The recommended
LLVM version moving forward is 13, and `setup_env.py` has been
updated to set this on all platforms.
- The versions of GASNet and Terra are now pinned by default in
`setup_env.py`. You can choose versions explicitly with
`GASNET_VERSION` (as before, though the previous default was
unpinned) and `--terra-branch`, respectively.
* Realm
- Allow use of system OpenMP runtime (instead of Realm-provided one) with
`-DLegion_OpenMP_SYSTEM_RUNTIME=ON`. This allows inter-operation with
libraries that have already been linked to the system runtime, but
limits each process to a single OMP processor.
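The Regent flag renames listed in this release can be applied mechanically to an existing argument list. The helper below is a sketch, not part of Regent: `migrate_flags` is a hypothetical name, and it only rewrites the three spellings named above (`-fcuda 1`, `-fcuda-offline`, `-fcuda-arch`), leaving everything else, including any flag values, untouched.

```python
# Direct renames taken from the changelog entries above.
RENAMED = {
    "-fcuda-offline": "-fgpu-offline",
    "-fcuda-arch": "-fgpu-arch",
}


def migrate_flags(argv):
    """Rewrite deprecated CUDA flag spellings to their -fgpu equivalents."""
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        # `-fcuda 1` is a two-token flag that becomes `-fgpu cuda`.
        if arg == "-fcuda" and i + 1 < len(argv) and argv[i + 1] == "1":
            out += ["-fgpu", "cuda"]
            i += 2
            continue
        # Other forms (e.g. `-fcuda 0`) have no documented replacement
        # in the entry above, so they pass through unchanged.
        out.append(RENAMED.get(arg, arg))
        i += 1
    return out


if __name__ == "__main__":
    print(migrate_flags(["-fcuda", "1", "-fcuda-arch", "volta"]))
```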
Version 22.03.0 (March 27, 2022)
* Build
- Minimum supported cmake version is now 3.7. (Some optional features
continue to require even newer versions.)
* Realm
- Numerous bug fixes in the `gasnetex` network layer
- CUDA and HIP support allow direct specification of which GPUs to
use via the `-ll:gpu_ids` command-line option
- Added support for copy paths using CUDA IPC between GPUs on the same
physical node
- For applications using CUDA without the runtime API hijack AND only
submitting work to the default CUDA stream, `-cuda:legacysync 1`
reduces the overhead of detecting the completion of device-side work
launched by a task
- Realm reduction copies may now indicate exclusive access to the
destination instance, improving performance by allowing simple
load/store instead of atomic operations
- Custom reduction operations (including Legion's built-in ones) can
provide HIP implementations, permitting in-place reductions in
HIP device memory
* Regent
- Support for custom serialization of types in task parameters and results
- New experimental timing library under std/timing
Version 21.12.0 (December 31, 2021)
* Realm
- Performance improvements for multi-dimensional copies, especially
inter-process transfers
- Support for loading CUDA driver (if present) at runtime instead of
link time, allowing same binary to be used on systems with and without
CUDA-capable GPUs (enabled with -DLegion_CUDA_DYNAMIC_LOAD=ON in
cmake build)
- A separate `Memory` is now created per process for external (system)
memory instances. This memory has no capacity for creating instances
and can confuse applications or Legion mappers that assume exactly
one Memory of kind `SYSTEM_MEM` exists. Old behavior can be obtained
with `-ll:ext_sysmem 0`, but this can fail for configurations that
register system memory with the network and/or GPUs
- The `MemoryQuery` now supports a `has_capacity` predicate to restrict
results to just memories with sufficient total (not current!) capacity
to allocate an instance of a specified size
* Build
- Cmake allows control of max nodes (-DLegion_MAX_NUM_NODES=...) and
max processors/node (-DLegion_MAX_NUM_PROCS=...) supported by
Legion build
- Added dependency tracking to make-based builds
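The `has_capacity` entry in the Realm list above stresses that the predicate looks at total capacity, not at what is currently free. The sketch below models that distinction with plain Python objects; it is not the Realm `MemoryQuery` API, and the `Mem` class, field names and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mem:
    name: str
    total: int      # total capacity in bytes
    used: int = 0   # bytes currently allocated (ignored by the filter)


def has_capacity(memories, size):
    """Keep memories whose TOTAL capacity could hold an instance of `size`.

    Current usage is deliberately not consulted, mirroring the
    "total (not current!)" wording of the changelog entry.
    """
    return [m for m in memories if m.total >= size]


if __name__ == "__main__":
    mems = [
        Mem("sysmem", total=1024, used=1024),  # full, but large enough in total
        Mem("ext_sysmem", total=0),            # external memory, no capacity
        Mem("fbmem", total=4096, used=100),
    ]
    print([m.name for m in has_capacity(mems, 512)])
```

A zero-capacity memory, like the per-process external memory described in the same release, is filtered out by such a predicate, while a full memory is not.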
Version 21.09.0 (September 28, 2021)
* Realm
- Numerous bug fixes in the `gasnetex` network layer
- Support for HIP memory type registration with GASNet (with
GASNet version 2021.9.0+)
- Arguments to spawned tasks may now be arbitrarily large (network-
specific limits have been eliminated)
* Regent
- Improved support for dynamic checks on index launches with
potential interference between different region arguments
- Extensive fixes for separate compilation. This mode has now
been verified to work with large-scale applications
- Removed long-obsolete support for `__demand(__external)`
* Pygion
- Add support for layout constraints
Version 21.06.0 (June 24, 2021)
* Build
- Version information is now compiled into Realm and Legion. This takes
the form of a string (e.g. "legion-21.06.0") rather than anything
that can be compared (i.e. no semantic versioning here). Compile-time
defines `REALM_VERSION` and `LEGION_VERSION` are available as well as
run-time calls `Realm::Runtime::get_library_version` and
`Legion::Runtime::get_library_version`.
* Regent
- Support for dynamic checks on projection functors, enabling a
much larger class of loops to be supported as index launches
- Support for local tasks (i.e., without going through the
runtime) via `__demand(__local)`
* Realm
- Windows (MSVC) builds are now tested in CI and therefore more likely
to work
- Realm runtime can now be shutdown and reinitialized in the same process.
(Exception: GASNet-based network layers do not support this.)
- Registration of host memory with CUDA driver is skipped for host
memories larger than 1GB by default due to CUDA driver overhead.
This threshold can be increased (or decreased) with `-cuda:hostreg`
* Tools
- New Rust implementation of Legion Prof is 5-15x faster than the
original (even with PyPy). For more details, see:
https://legion.stanford.edu/profiling/#rust-legion-prof
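Because the version information in the Build entry above is a plain string rather than something comparable, callers that want ordering have to parse it themselves. The following sketch assumes the string has exactly the form shown in the example ("legion-21.06.0"); any other shape is treated as unparseable. `parse_version` is a hypothetical helper, not a Legion API.

```python
import re

# Assumption: release strings look exactly like "legion-YY.MM.P".
_PATTERN = re.compile(r"legion-(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Return (year, month, patch) as ints, or None if the shape differs."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    print(parse_version("legion-21.06.0"))
    # Tuples compare element by element, which gives release ordering.
    print(parse_version("legion-23.06.0") > parse_version("legion-21.06.0"))
```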
Version 21.03.0 (March 30, 2021)
* Build
- Cmake can build an embedded copy of GASNet as part of the Legion build
with `-DLegion_EMBED_GASNet=ON`
* Regent
- Contains three breaking changes to the Regent calling convention:
- Reductions are now aggregated into region requirements and
sorted by the index of the first field in the field space
among the set of fields for each reduction.
- Task arguments may be passed through either `args` or
`local_args` for index launched tasks. (Previously Regent
only used `local_args`.)
- Region values passed via `args` to an index-launched task may
be *bogus*. Instead the region requirement should be used to
obtain the original region.
- Support for constant time index launches. These are enabled
automatically, but can be forced on or off with `__demand` or
`__forbid` with `__constant_time_launches`. This should
improve scalability at extreme node counts.
- Support for `rescape` and `remit` to generate metaprogrammed
code more easily.
- Experimental support for separate compilation via `-fseparate 1`
allows Regent programs to be compiled in parts (potentially in
parallel). Note that separate compilation currently cannot be
used with Bishop and requires one of either parallel or
incremental compilation if `regentlib.start` is used (does not
apply to `regentlib.saveobj` or `regentlib.save_tasks`).
* Legion
- In the control replication branch users will find a new implementation
of Legion's physical analysis that uses heuristics to select which
sub-trees should be used for performing the analysis. Disjoint and
complete partitions are especially helpful in aiding the runtime.
- There is a new implementation of the index space math inside of the
runtime that now soundly and precisely detects congruences between
index space math operations. This fixes a long-running class of bugs
that would cause memory explosions in the physical analysis.
- In the control replication branch users can now map future values into
memories the same as they do with regions. This means that future
payloads can be placed directly on devices like GPUs. Similarly, the
runtime now accepts future data from tasks that also reside in any
memory in the machine including device memories.
- Both the master and control replication branches have support for
index space attach operations.
- Expensive transitive reductions on traces are now computed in the
background allowing trace replays to begin replaying immediately
with only partial optimizations.
* Realm
- Custom reduction operations (including Legion's built-in ones) can
provide CUDA implementations, permitting in-place reductions in
CUDA device memory
- Support for CUDA managed memory (via `-ll:msize`) that is coherent for
both host and device access. Includes support for `__managed__`
variables (only single-GPU if using CUDA runtime hijack mode)
- `Event::wait` may be called outside of Realm tasks, having the same
thread-blocking behavior as `Event::external_wait`
- Experimental support for AMD HIP. Note that testing coverage is
incomplete, and breakages may occur in between releases. For more
details, see:
https://github.com/StanfordLegion/legion/issues/1028
Version 20.12.0 (December 28, 2020)
* Build
- Legion and Realm now require a compiler with (at least) C++11 support
- Python scripts (e.g. legion_prof and legion_spy) require Python 3.5
* Realm
- Improved performance of inter-node instance copies when data is not
contiguous in source and/or destination
- Improved responsiveness of utility processors by not using them for
background work by default
- Experimental support for building on Windows with MSVC
- Improved performance (and correctness) when running CUDA tasks without
the runtime hijack enabled
- Added `gasnetex` network layer that uses GASNet-EX's native API (instead
of the legacy GASNet-1 API support). Requires GASNet version 2020.11.0
or newer. For more details, see:
https://github.com/StanfordLegion/legion/issues/986
* Legion
- The mapping interface no longer requires the runtime to return valid
instances for empty regions (e.g. regions with no points in their index space)
* Tools
- Legion Spy now has support for arbitrary number of dimensions
* Examples
- `examples/nccl` gives a simple example of using NCCL with Legion
Version 20.09.0 (September 28, 2020)
* Legion
- Support for mapper-controlled reuse of reduction instances. See:
https://github.com/StanfordLegion/legion/issues/545
- Support for creating compact instances of sparse index spaces. See:
https://github.com/StanfordLegion/legion/issues/624
* Realm
- Switched from function-specific internal threads to generic "background
workers" that are shared by all subsystems. The number of workers is
controlled by `-ll:bgwork` (default=2). For further details, see:
https://github.com/StanfordLegion/legion/issues/662
- Numerous bug/performance/memory leak fixes
- Support for OpenMP-enabled code running on a Python processor. The
total number of threads available to the processor is set with
`-ll:pyomp` (default=1 - i.e. just the initial thread)
- Support for C++ tasks on Python processors. A C++ task does NOT take
the Python GIL by default - the task body should call
`PyGILState_{Ensure,Release}` as needed
- Increased the maximum number of instances in a single memory from 64K
to 4 million.
- Improved performance of concurrent CUDA GPU->GPU copies with 3+ GPUs
* Tools
- An installed version of Legion now includes legion_spy, legion_prof
scripts
Version 20.06.0 (June 29, 2020)
* Regent
- Support for `std/format` module for type-safe formatted printing
- Support for documentation with LDoc
- Support for `__future` operator to import a C API future
* Legion
- Support for inlining tasks into leaf contexts
- Support for global registration callbacks inside of tasks
- Added semantic tags for source file and line location
- Support for multi-region accessors for region requirements with
co-location constraints
- Changes to semantics of deletion for index spaces, field spaces, and
logical regions. For details, see:
https://github.com/StanfordLegion/legion/issues/812
- Support for creating field spaces with initial fields
* Realm
- Subgraphs can be used to capture a template of Realm operations
that will be executed repeatedly. Subgraph definitions include
support for "interpolating" values into individual operations'
arguments on each instantiation of the subgraph template
- `create_weighted_subspaces` supports `size_t` weights for precise
control over the size of each subspace
- Added support for `omp critical` constructs and dynamic loop
schedules in OpenMP tasks
- Added support for `cudaStreamLegacy` and `cudaStreamPerThread` in
CUDA tasks
- Realm logs now include a timestamp (relative to runtime init)
by default. This behavior can be disabled with `-logtime 0`
- Performance improvements for copies/fills of 3D instances in
GPU device memory
- Added ability to compute a set of "covering rectangles" for sparse
index spaces, allowing more compact representation in memory
- Added `MultiAffineAccessor` for accessing compact instances
- Added ability to delete a `ProcessorGroup`
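The `create_weighted_subspaces` entry above mentions integer (`size_t`) weights for precise control over subspace sizes. The sketch below shows one way integer weights can split a 1D range exactly, using cumulative integer arithmetic so that no points are lost to rounding. It is an illustration of the idea only, not Realm's algorithm, and `weighted_split` is a hypothetical name.

```python
def weighted_split(lo, hi, weights):
    """Split [lo, hi) into contiguous pieces proportional to integer weights.

    Boundaries are computed from cumulative weights with integer division,
    so the pieces always tile the whole range with no gaps or overlaps.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    count = hi - lo
    pieces = []
    cumulative = 0
    start = lo
    for weight in weights:
        cumulative += weight
        end = lo + (count * cumulative) // total
        pieces.append((start, end))
        start = end
    return pieces


if __name__ == "__main__":
    print(weighted_split(0, 10, [1, 1, 2]))
```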
Version 20.03.0 (March 31, 2020)
* Regent
- Behavior change: `__fields` and `__physical` now both require
explicit field names, i.e., `__fields(r.{x, y})` rather than
`__fields(r)`. This makes the behavior more unambiguous and
helps to avoid bugs
- Added `complete` and `incomplete` keywords that can be used to
mark partitions as such
- Added support for setting mapper ID and tag via
`t:set_mapper_id()` and `t:set_mapping_tag_id()`
- Initial support for predicated execution of `if` and `while`
statements
- Fixed several bugs, memory leaks and improved compile times
* Legion
- Introduction of Fortran bindings for Legion
- Support for creating deferred index spaces from future values
- Support for construction of partitions from a map of domains or
from a future map
- Support for reducing a future map to a single future asynchronously
* Realm
- Support for Kokkos parallel launch constructs in Realm (and therefore
Legion) tasks. Currently supported Kokkos execution spaces
are: Serial, OpenMP, CUDA. Application data remains in logical
regions, but accessors can be converted to Kokkos (unmanaged) Views
if needed. See the `kokkos_interop` example
- Introduction of experimental MPI-based network layer, enabled with
`REALM_NETWORKS=mpi` (make) or `-DRealm_NETWORKS=mpi` (cmake).
Use `REALM_NETWORKS=gasnet1` (or USE_GASNET=1, which still works)
for the GASNet-based network layer (which works with GASNet-1 or
GASNet-EX)
- CUDA Runtime API interposer (a.k.a. "hijack") can now be disabled with
`USE_CUDART_HIJACK=0` (make) or `-DLegion_HIJACK_CUDART=OFF` (cmake).
This can reduce effectiveness of task-parallelism for CUDA tasks, so
use only if needed
- More control over GPU selection via: `-cuda:skipgpus N` which leaves the
first N GPUs available for other uses, `-cuda:skipbusy` which skips
over busy GPUs, and `-cuda:minavailmem M` which skips GPUs with less
than M device memory available
- Reduction in memory usage of Realm internal data structures
* Tools
- There is now a generic launcher script for running Python code
with Legion that will execute an arbitrary Python program in the
top-level task of a Legion program. This script mirrors the interface
to CPython as closely as possible.
- Legion Spy now supports verification and rendering of indirection copies
- Legion Prof supports Instance layout constraints related to dimension
ordering and field alignment
- Legion Prof contains a menu option for viewing ready state of operations
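The three GPU selection options in the Realm list above (`-cuda:skipgpus N`, `-cuda:skipbusy`, `-cuda:minavailmem M`) combine naturally as successive filters. The sketch below models that behavior on plain Python objects. It is an assumption about how the options compose, not Realm's implementation; the `Gpu` class, the order of the checks, and the memory units are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    index: int
    busy: bool
    avail_mem: int  # available device memory, in arbitrary units


def select_gpus(gpus, skip_gpus=0, skip_busy=False, min_avail_mem=0):
    """Return indices of GPUs that survive all three filters."""
    chosen = []
    # skip_gpus leaves the first N GPUs available for other uses.
    for gpu in gpus[skip_gpus:]:
        if skip_busy and gpu.busy:
            continue
        if gpu.avail_mem < min_avail_mem:
            continue
        chosen.append(gpu.index)
    return chosen


if __name__ == "__main__":
    gpus = [Gpu(0, False, 16), Gpu(1, True, 16), Gpu(2, False, 4), Gpu(3, False, 16)]
    print(select_gpus(gpus, skip_gpus=1, skip_busy=True, min_avail_mem=8))
```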
Version 19.12.0 (December 31, 2019)
* Build
- Both builds (Make and CMake) now generate `legion_defines.h` and
`realm_defines.h`. By default these headers are generated in
the source directory (Make) or build directory (CMake). This
means that languages such as Regent and Python no longer
require MAX_DIM to be specified explicitly
* Regent
- Support for CUDA 10
- Support for field polymorphic tasks
- Substantially improved the generality of the index launch
optimization. Task arguments of the form p[i+k] may now be
used, where k is a variable defined outside of the loop
- Add flag `-foverride-demand-index-launch` which can be used to
force loops to be index launched in cases where the compiler
cannot prove the disjointness of read-write region
arguments
- Added reductions for complex64
- The scripts `install.py` and `setup_env.py` now use CMake to
build Terra by default, which should improve portability on
most machines
- The behavior of `-fcuda 1` has changed: this flag will now issue
an error if CUDA cannot be enabled (e.g. because the build
does not support CUDA, or because the machine has no
GPUs). Omitting this flag will now enable CUDA if it is
available (and will not error if it is not available).
The behavior of `-fopenmp 1` has changed similarly.
- The behavior of `__demand(__cuda)` has changed. This will now
issue an error if a loop is not eligible for the CUDA
transformation, regardless of whether CUDA is actually
available on the current machine or not. The behavior of
`__demand(__openmp)` has changed similarly.
- The annotation `__allow(__cuda)` is now permitted, and permits
(but does not require) tasks to be optimized with CUDA.
- Experimental support for 2D kernel launch in the CUDA code generation
* Python
- Add support for copies
- Copies and fills now support multiple fields
- Tasks (including index launches) now support setting the mapper
ID and tag
* Legion
- A major overhaul of the Legion physical analysis to use an
approach based on bounding volume hierarchies. The change is
not visible to users, but will likely impact performance. Most
programs will get faster; programs that create many partitions
frequently on the fly may get slower. The latter case will be fixed
in an upcoming release.
- Added support for indirect copy operations such as gather and
scatter onto existing copy launchers
* Realm
- `Event::subscribe` allows polling via `Event::has_triggered` to
(eventually) succeed
- Addition of `CompletionQueue` objects that allow multiple unordered
`Event` triggers to be efficiently handled by a single consumer
- Support for `omp_get_level`, `omp_in_parallel`, and
`omp_set_num_threads` in tasks running on OpenMP processors
- Support for unstructured scatter and/or gather in copies. (Handling
structured cases as well as fills/reductions remains a work in
progress.)
- Removed all calls to `Event::wait` from inside other Realm API calls.
Applications now must make sure that index spaces and instance
metadata are valid before use. For details, see:
https://github.com/StanfordLegion/legion/issues/465
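The gather and scatter copies mentioned in the Legion and Realm entries of this release differ only in which side of the copy is indexed indirectly. The sketch below shows the two patterns on Python lists; it is a conceptual illustration with hypothetical function names, not the copy launcher API.

```python
def gather(src, indices):
    """dst[i] = src[indices[i]]: the SOURCE is addressed indirectly."""
    return [src[i] for i in indices]


def scatter(src, indices, dst):
    """dst[indices[i]] = src[i]: the DESTINATION is addressed indirectly.

    Returns a new list and leaves `dst` unmodified. If an index repeats,
    the last write wins in this sequential sketch.
    """
    out = list(dst)
    for value, i in zip(src, indices):
        out[i] = value
    return out


if __name__ == "__main__":
    print(gather([10, 20, 30, 40], [3, 0, 0]))
    print(scatter([1, 2], [2, 0], [0, 0, 0]))
```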
Version 19.09.1 (September 13, 2019)
* Regent
- Fix for correctness bug in task inlining. See:
https://github.com/StanfordLegion/legion/issues/582
Version 19.09.0 (September 9, 2019)
* Regent
- __demand(__index_launch) has been added as an alternative to
__demand(__parallel) on for loops that avoids confusion with the
auto-parallelizer. __demand(__parallel) on for loops is deprecated and
now issues a warning; in a future release this warning will be
upgraded to an error. For details, see:
https://github.com/StanfordLegion/legion/issues/520
- Multi-field expansion is deprecated and now issues an error. The error
can be temporarily downgraded to a warning, but it is advised that
users migrate codes away from this syntax as it will become a hard
error in a future release. For details, see:
https://github.com/StanfordLegion/legion/issues/501
* Legion
- Support for a built-in collection of reduction operators including
sum, product, max, and min over a variety of types for CPUs and GPUs
* Realm
- Assorted bug, performance, and memory leak fixes
- Fills to attached HDF5 instances are orders of magnitude faster
- Support for reusing HDF5 file handles with `-hdf5:openfiles` option
- Control which rank opens an HDF5 file with a `rank=nnn:` filename prefix
* Build System
- Makefile-based flow attempts to detect CUDA location and GASNet conduit
if they are not specified
- Makefile-based flow defaults to building CUDA fat binaries, but can still
be overridden with the `GPU_ARCH` setting, which now accepts SM arch
numbers (e.g. "70") as well as names (e.g. "volta")
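The built-in reduction operators listed under Legion in this release (sum, product, max, min) each pair a binary fold with an identity value. The sketch below illustrates that pairing in plain Python. It is not Legion's implementation: the `REDUCTIONS` table and `fold` helper are hypothetical, and the infinities used as identities for max and min are a choice made for this sketch.

```python
from functools import reduce
import operator

# Each reduction is (binary operator, identity element).
REDUCTIONS = {
    "sum": (operator.add, 0),
    "product": (operator.mul, 1),
    "max": (max, float("-inf")),
    "min": (min, float("inf")),
}


def fold(name, values):
    """Fold `values` with the named reduction, starting from its identity."""
    op, identity = REDUCTIONS[name]
    return reduce(op, values, identity)


if __name__ == "__main__":
    data = [3, 9, 2]
    for name in REDUCTIONS:
        print(name, fold(name, data))
```

Starting from the identity is what lets partial results be combined in any grouping, which is the property a runtime needs to apply reductions in parallel.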
Version 19.06.0 (June 27, 2019)
* Legion
- All tools (Legion Prof, Legion Spy, etc.) now support Python 2 and 3
- The flag -lg:warn_backtrace prints a backtrace on each warning
to allow easier pinpointing of problematic code
* Realm
- Support for building against debug versions of GASNet
- Significantly reduced runtime overhead for small Realm tasks
- External HDF5 instances work with datasets in groups
- Scheduler locking allows spin-waiting for non-reentrant
operations (e.g. Python module imports)
- Memory size (e.g. "-ll:csize") arguments accept k/m/g/t
size suffixes
- Better error messages when Realm memory sizes are too large
* Regent
- The image, preimage and restrict partitioning operators now
accept an optional disjoint or aliased keyword to specify the
disjointness of the resulting partition
- The address-of operator (&) is now supported
- Support for explicit field maps for HDF5
* Legion Prof
- Menu option to select a subset of the profile information
for viewing
- Grouping of memory channels, utilization and additional details
such as source and destination nodes/processors associated with
the channels
- Physical instances contain additional information about the regions
they belong to
* Python
- Support for partitioning operators equal and restriction
- Support for bool and complex types
- Support for must epoch launches
- Support for returning a future out of a fence
- Fixes for macOS
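The k/m/g/t suffixes accepted by memory size arguments (see the Realm entry above) can be illustrated with a small parser. This is a sketch under stated assumptions, not Realm's parser: it assumes the suffixes are case-insensitive powers of 1024 and that a bare number is scaled by a caller-supplied default multiplier. `parse_size` is a hypothetical name.

```python
# Assumption: suffixes denote binary multiples (powers of 1024).
_MULTIPLIERS = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}


def parse_size(text, default_multiplier=1):
    """Convert a string such as "4g" to a byte count.

    A value without a suffix is multiplied by `default_multiplier`,
    since the unit of a bare number depends on the option in question.
    """
    text = text.strip().lower()
    if text and text[-1] in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[text[-1]]
    return int(text) * default_multiplier


if __name__ == "__main__":
    print(parse_size("4g"))
    print(parse_size("16K"))
```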
Version 19.04.0 (April 30, 2019)
* Legion
- Support for dimensions > 3. Set MAX_DIM at build time
(or -DLegion_MAX_DIM in CMake) to build with any number of
dimensions up to 9.
- Change VariantID to 32 bits to match AUTO_GENERATE_ID
- Improved mapper interfaces for instance allocation and
failed instance allocation due to layout constraint conflicts
* Regent
- Support for index fills
- Support for disabling structure-slicing on structs by setting
__no_field_slicing on the struct type
- Substantial improvements to the auto-parallelizer, CUDA and
OpenMP code generators
- Substantial improvements in compile time for tasks with large
numbers of fields
- Build fixes for macOS
- setup_env.py now works on macOS
* Realm
- Support for #pragma omp single sections in OpenMP processors
- Realm IDs use explicit bit packing instead of fragile C bit fields
- Numerous fixes for create_equal_subspace deppart operations
- Support for CUDA 10
* Legion Prof
- Added support for recording GPU processor times
Version 18.12.0 (December 27, 2018)
* Realm
- More assorted bug fixes
- Minor performance improvements in logging and accessor code
- Handle signals on an alternate stack for better debugging/backtraces
* Regent
- Added a new built-in complex type
- Experimental support for building with PUC Lua
- Multiple fixes to CUDA code generation, vectorization,
auto-parallelization, and mapping optimization
- Better error messages for __demand(__leaf) and so on
* Python
- Use PyGILState for threading for compatibility with modules (e.g. numpy)
- Support for calling tasks written in Regent
Version 18.09.0 (September 19, 2018)
* Legion
- Support for physical tracing, which can provide up to 7x improvement in
loops with very small tasks. Can be enabled in the mappers that
inherit from DefaultMapper using -dm:memoize 1
* Realm
- Assorted minor bug fixes
- Support for development snapshots of GASNet-EX (using GASNet-1
compatibility interfaces for now)
* Regent
- Changed precedence of logical operators (and, or) to match that of
Lua and Terra (or is now lower-precedence than and)
- Full support for accessing sparse multi-dimensional regions
- Initial support for incremental compilation. Enable with
REGENT_INCREMENTAL=1
- Changes to make compilation entirely deterministic
- Multiple compilation speed improvements
- Support for CUDA scalar reductions
- Experimental support for parallel prefix operators, including CUDA
* Python
- Support for defining methods as tasks
- Support for passing futures to tasks and index tasks
- Support for explicit return types on extern tasks
- Improved support for Futures with encodings other than pickle
Version 18.05.0 (May 31, 2018)
* Legion
- Migrated all node-local Legion reservations to use Realm
fast reservations and removed no longer necessary continuations
- Added support for mapper attached data to all Mappable types
- Added support for assigning a block of IDs to a library in a consistent
way across nodes via generate_library_task_ids and friends
* Realm
- Added support for "fast" reservations that have better
performance characteristics for reservations local to a node
* C API
- Updated projection functor API to match Legion C++ API
* Regent
- Regent now generates disjointness constraints for affine
expressions in partition accesses. E.g. p[i] and p[i+1] are
now known to be disjoint at compile time as long as p is a
disjoint partition
- Support for non-trivial projection functors in index space launches
such as f(p[i+1])
- Improvements to compile time spent in various optimization passes
- Support for parallel compilation with the flag -fjobs N
- Miscellaneous fixes
Version 18.02.0 (February 2, 2018)
* Legion
- Support for PowerPC vector intrinsics
- FieldAccessors support "view" coordinates and equivalent bounds checks
- Improved schedule priorities for Legion meta-tasks
* Realm
- Operation priority can now be adjusted after a task/copy is launched
- Assorted bug/memory leak fixes
- AffineAccessors support an optional translation from "view" coordinates
to actual coordinates in the instance being accessed
* Regent
- Experimental support for calling Regent tasks from C/C++
- Support for building with CMake
- Support for running on PowerPC
* Bindings
- Obsolete Lua and Terra bindings have been removed. The remaining Terra
bindings have been renamed to Regent and now produce libregent.so
Version 17.10.0 (October 27, 2017)
* Legion
- Introduction of new partitioning API based on dependent partitioning
- Deprecation of old partitioning API, LegionRuntime::{Arrays,Accessors}
namespaces
* Realm
- Dependent partitioning API, including dimension-aware IndexSpace
- Point/Rect types moved to Realm namespace
- Instance creation allows caller to choose precise memory layout
- Accessors moved to Realm namespace, changed to match new instance layouts
* C API
- The C API is now accessed via the `legion.h` header file. Note that this
is still a redirect back to the current `legion/legion_c.h` header
* Legion Prof
- Added support for minimally invasive dumping of intermediate
profiling data while the application is still running for long runs
* Python
- New Python API bindings and native support for Python processors.
Compile with USE_PYTHON=1 and run with -ll:py 1 to enable Python.
Also see examples/python_interop for an example
Version 17.08.0 (August 24, 2017)
* Build system
- Added HDF_ROOT variable to customize HDF5 install location
* Legion
- New error message format and online reference at
http://legion.stanford.edu/messages
* Legion Prof
- Added new compact binary format for profile logs
- Added flag: -hl:prof_logfile prof_%.gz
* Realm
- Fixes to support big-endian systems
- Several performance improvements to DMA subsystem
- Added REALM_DEFAULT_ARGS environment variable
containing flags to be inserted at front of command line
* Regent
- Removed new operator. Unstructured regions are now
fully allocated by default
- Added optimization to automatically skip empty tasks
- Initial support for extern tasks that are defined elsewhere
- Tasks that use __demand(__openmp) are now constrained
to run on OpenMP processors by default
- RDIR: Better support for deeper nested region trees
Version 17.05.0 (May 26, 2017)
* Build system
- Finally removed long-obsolete SHARED_LOWLEVEL flag
* Legion
- Added C++14 [[deprecated]] attribute to existing deprecated APIs.
All examples should compile without deprecation warnings
- Added Legion executor that enables support for interoperating
with Agency inside of Legion tasks
* Realm
- Switched to new DMA engine
- Initial support for OpenMP "processors". Compile with USE_OPENMP
and run with flags -ll:ocpu and -ll:othr.
* Regent
- Added support for running normal tasks on I/O processors
- Added support for OpenMP code generation via __demand(__openmp)
* C API
- Removed the following deprecated types:
legion_task_result_t
(obviated by the new task preamble/postamble)
- Removed the following deprecated APIs:
legion_physical_region_get_accessor_generic
legion_physical_region_get_accessor_array
(use legion_physical_region_get_field_accessor_* instead)
legion_runtime_set_registration_callback
(use legion_runtime_add_registration_callback instead)
legion_runtime_register_task_void
legion_runtime_register_task
legion_runtime_register_task_uint32
legion_runtime_register_task_uint64
(use legion_runtime_preregister_task_variant_* instead)
legion_future_from_buffer
legion_future_from_uint32
legion_future_from_uint64
legion_future_from_bytes
(use legion_future_from_untyped_pointer instead)
legion_future_get_result
legion_future_get_result_uint32
legion_future_get_result_uint64
legion_future_get_result_bytes
(use legion_future_get_untyped_pointer instead)
legion_future_get_result_size
(use legion_future_get_untyped_size instead)
legion_future_map_get_result
(use legion_future_map_get_future instead)
Version 17.02.0 (February 14, 2017)
* General
- Bumped copyright dates
* Legion
- Merged versioning branch with support for a higher performance
version numbering computation
- More efficient analysis for index space task launches
- Updated custom projection function API
- Added support for speculative mapping of predicated operations
- Added index space copy and fill operations
* Legion Prof
- Added a stats view of processors grouped by node and processor type
- Added ability to collapse/expand each processor/channel/memory in
a timeline. To collapse/expand a row, click the name. To
collapse/expand the children of a row, click on the triangle
next to the name.
- Grouped the processor timelines to be child elements under the stats
views
- Added on-demand loading of each processor/stats in a timeline.
Elements are only loaded when you expand them, saving bandwidth
* CMake
- Switched to separate flags for each of the Legion extras directories:
-DLegion_BUILD_APPS (for ./apps)
-DLegion_BUILD_EXAMPLES (for ./examples)
-DLegion_BUILD_TUTORIAL (for ./tutorial)
-DLegion_BUILD_TESTS (for ./test)
Version 16.10.0 (October 7, 2016)
* Realm
- HDF5 support: moved to Realm module, added DMA channels
- PAPI support: basic profiling (instructions, caches, branches) added
* Build flow
- Fixes to support compilation in 32-bit mode
- Numerous improvements to CMake build
* Regent
- Improvements to vectorization of structured codes
* Apps
- Removed bit-rotted applications - some have been replaced by examples
or Regent applications
* Tests
- New test infrastructure and top-level test script `test.py`
Version 16.08.0 (August 30, 2016)
* Realm
- Critical-enough ("error" and "fatal" by default, controlled with
-errlevel) logging messages are mirrored to stderr when -logfile is
used
- Command-line options for logging (-error and new -errlevel) support
English names of logging levels (spew, debug, info, print,
warn/warning, error, fatal, none) as well as integers
* Legion
- Rewrite of the Legion shutdown algorithm for improved scalability
and avoiding O(N^2) behavior in the number of nodes
* Regent
- Installer now prompts for RDIR installation
* Tools
- Important Legion Spy performance improvements involving transitive
reductions
Version 16.06.0 (June 15, 2016)
* Legion
- New mapper API:
use ShimMapper for limited backwards compatibility
- New task variant registration API
supports specifying layout constraints for region requirements
old interface is still available but deprecated
- Several large bug fixes for internal version numbering computation
* C API
- The context parameter for many API calls has been removed
* Tools
- Total re-write of Legion Spy
Version 16.05.0 (May 2, 2016)
* Lots of stuff - we weren't itemizing things before this point.