RELEASE NOTES FOR SLURM VERSION 17.02
13 July 2016
IMPORTANT NOTES:
THE MAXJOBID IS NOW 67,108,863. ANY PRE-EXISTING JOBS WILL CONTINUE TO RUN BUT
NEW JOB IDS WILL BE WITHIN THE NEW MAXJOBID RANGE. Adjust your configured
MaxJobID value as needed to eliminate any confusion.
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
The 17.02 slurmdbd will work with Slurm daemons of version 15.08 and above.
You will not need to update all clusters at the same time, but it is very
important to update the slurmdbd first and have it running before updating
any other clusters that make use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do. Also, at least the first time you run the slurmdbd, make sure
your my.cnf file has innodb_buffer_pool_size set to at least 64M.
You can accomplish this by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] reference in the my.cnf file and restarting the mysqld. The
buffer pool size must be smaller than the size of the MySQL tmpdir. This is
needed when converting large tables over to the new database schema.
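For reference, the relevant my.cnf fragment would look like this:

    [mysqld]
    innodb_buffer_pool_size=64M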
Slurm can be upgraded from version 15.08 or 16.05 to version 17.02 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.
If using SPANK plugins that use the Slurm APIs, they should be recompiled when
upgrading Slurm to a new major release.
NOTE: If you are not using Munge, but are using the "service" scripts to
start Slurm daemons, then you will need to remove this check from the
etc/slurm*service scripts.
HIGHLIGHTS
==========
-- In order to support federated jobs, the MaxJobID configuration parameter
default value has been reduced from 2,147,418,112 to 67,043,328 and its
maximum value is now 67,108,863. Upon upgrading, any pre-existing jobs that
have a job ID above the new range will continue to run and new jobs will get
job IDs in the new range.
-- The database index for jobs is now 64 bit. If you happen to be close to
4 billion jobs in your database you will want to update your slurmctld at
the same time as your slurmdbd to prevent rollover of this variable, as
it was 32 bit in previous versions of Slurm.
-- All memory values (in MB) are now 64 bit. Previously, nodes with > 2TB of
memory would not schedule or enforce memory limits correctly.
-- Removed AIX, BlueGene/L and BlueGene/P support.
RPMBUILD CHANGES
================
CONFIGURATION FILE CHANGES (see appropriate man page for details)
=================================================================
-- Add new TRESWeights option to the NodeName line for calculating how busy a
node is. Currently only used for federation configurations; a sample line is
sketched below.
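As a rough illustration, a NodeName line using the new option might look like
the following (the node name, counts, and weight values here are made up;
see the slurm.conf man page for the authoritative syntax):

    NodeName=node[1-4] CPUs=16 RealMemory=65536 TRESWeights="CPU=1.0,Mem=0.25G"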
COMMAND CHANGES (see man pages for details)
===========================================
-- Add commands to sacctmgr for managing and displaying federations.
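For example, creating and inspecting a federation might look like the
following (hypothetical federation and cluster names; see the sacctmgr man
page for the full syntax):

    sacctmgr add federation myfed clusters=cluster1,cluster2
    sacctmgr show federation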
OTHER CHANGES
=============
API CHANGES
===========
Changed members of the following structs
========================================
Memory conversion to uint64_t in: job_descriptor, job_info, node_info,
partition_info, resource_allocation_response_msg, slurm_ctl_conf,
slurmd_status_msg, slurm_step_ctx_params_t
Variables converted are: actual_real_mem, def_mem_per_cpu, free_mem,
max_mem_per_cpu, mem_spec_limit, pn_min_memory, real_memory, req_mem.
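Code that previously read these fields into 32-bit variables must switch to
64-bit types. A minimal sketch (not Slurm source; the field name is taken
from the list above and the MEM_PER_CPU value from the defines section
below):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MEM_PER_CPU 0x8000000000000000ULL  /* was 0x80000000 before 17.02 */

    /* pn_min_memory is in MB; the MEM_PER_CPU bit marks a per-CPU request. */
    static void print_mem_request(uint64_t pn_min_memory)
    {
            if (pn_min_memory & MEM_PER_CPU)
                    printf("mem-per-cpu: %" PRIu64 " MB\n",
                           pn_min_memory & ~MEM_PER_CPU);
            else
                    printf("mem-per-node: %" PRIu64 " MB\n", pn_min_memory);
    }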
Added members to the following struct definitions
=================================================
In slurmdb_cluster_cond_t: Added List federation_list
In slurmdb_cluster_rec_t: Added fed, lock, sockfd
In struct job_record: Added fed_details
In connection_arg_t: Added bool persist
In slurmctld_lock_t: Added federation
In will_run_response_msg_t: Added double sys_usage_per to report back how busy a
cluster is.
In slurm_ctl_conf: Added mail_domain.
In slurm_msg_t: Added buffer to keep received message buffer to use for later
purposes.
In job_desc_msg_t: Added fed_siblings to track which clusters have sibling jobs.
In slurm_job_info_t: Added fed_origin_str, fed_siblings, fed_siblings_str to
display job federation information.
Added the following struct definitions
======================================
Added slurmdb_cluster_fed_t to store federation information on
slurmdb_cluster_rec_t.
Added slurmdb_federation_cond_t for selecting federations from db.
Added slurmdb_federation_rec_t to represent federation objects.
Added job_fed_details_t for storing federated job information.
Added sib_msg_t for sending messages to siblings.
Removed members from the following struct definitions
=====================================================
Changed the following enums and #defines
========================================
Added MAX_JOB_ID (0x03FFFFFF)
Added DEBUG_FLAG_FEDR flag for federation debugging.
Added cluster_fed_states enum and defines for federation states.
Changed DEFAULT_MAX_JOB_ID from 0x7fff0000 to 0x03ff0000.
Added SELECT_NODEDATA_TRES_ALLOC_FMT_STR to select_nodedata_type.
Added SELECT_NODEDATA_TRES_ALLOC_WEIGHTED to select_nodedata_type.
Changed MEM_PER_CPU flag to 0x8000000000000000 from 0x80000000.
Added SLURM_MSG_KEEP_BUFFER msg flag to instruct slurm_receive_msg() to save the
buffer ptr.
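In decimal, the job ID values above work out to:

    MAX_JOB_ID          0x03FFFFFF = 67,108,863   (2^26 - 1)
    DEFAULT_MAX_JOB_ID  0x03ff0000 = 67,043,328
    (previous default)  0x7fff0000 = 2,147,418,112

matching the MaxJobID figures quoted under HIGHLIGHTS.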
Added the following API's
=========================
Added slurm_load_federation() to retrieve federation info from cluster.
Added slurm_print_federation() to print federation info retrieved from cluster.
Added slurm_get_priority_flags() to retrieve priority flags from slurmctld_conf.
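A minimal usage sketch of the new federation calls (the prototypes shown,
slurm_load_federation(void **) and slurm_print_federation(void *), along
with the cleanup routine, should be verified against slurm/slurm.h):

    #include <stdio.h>
    #include <slurm/slurm.h>

    int main(void)
    {
            void *fed = NULL;

            /* Fetch federation info from the local slurmctld. */
            if (slurm_load_federation(&fed) != SLURM_SUCCESS) {
                    slurm_perror("slurm_load_federation");
                    return 1;
            }
            /* Print what was retrieved. */
            slurm_print_federation(fed);
            /* Assumed cleanup call; check slurm.h for the exact routine. */
            slurm_destroy_federation_rec(fed);
            return 0;
    }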
Changed the following API's
===========================