Design Proposal: Server suspend/resume (AKA Graceful Shutdown)


Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas-2
Server suspend and resume is a feature that allows a running server to
gracefully finish off all running requests. The most common use case for
this is graceful shutdown, where you would like a server to complete all
running requests, reject any new ones, and then shut down. However, there
are also plenty of other valid use cases (e.g. suspend the server,
modify a data source or some other config, then resume).

User View:

From the user's point of view, two new operations will be added to the server:

suspend(timeout)
resume()

A runtime-only attribute suspend-state (is this a good name?) will also
be added, which can take one of three possible values: RUNNING,
SUSPENDING, SUSPENDED.
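
As a rough illustration of how the three suspend-state values could relate to the two operations, here is a minimal Java sketch. The class and method names (SuspendStateHolder, beginSuspend, completeSuspend) are hypothetical and not part of any existing API, and the sketch assumes that resume() returns the server to RUNNING from either of the other two states.

```java
import java.util.concurrent.atomic.AtomicReference;

/** The three values proposed for the suspend-state attribute. */
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

/** Hypothetical holder for the server-wide state; names are illustrative only. */
final class SuspendStateHolder {
    private final AtomicReference<SuspendState> state =
            new AtomicReference<>(SuspendState.RUNNING);

    SuspendState get() {
        return state.get();
    }

    /** suspend(): RUNNING -> SUSPENDING. Returns false if not currently RUNNING. */
    boolean beginSuspend() {
        return state.compareAndSet(SuspendState.RUNNING, SuspendState.SUSPENDING);
    }

    /** Called once outstanding work has drained: SUSPENDING -> SUSPENDED. */
    boolean completeSuspend() {
        return state.compareAndSet(SuspendState.SUSPENDING, SuspendState.SUSPENDED);
    }

    /** resume(): assumed to return to RUNNING from any state. */
    void resume() {
        state.set(SuspendState.RUNNING);
    }
}
```

The compare-and-set transitions mean that a late completeSuspend() arriving after a resume() has no effect, since the state is no longer SUSPENDING.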

A timeout parameter will also be added to the shutdown operation. If
this is present then the server will first be suspended, and the server
will not shut down until either the suspend is successful or the timeout
occurs. If no timeout parameter is passed to the operation then a normal
non-graceful shutdown will take place.
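
The shutdown-with-timeout behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions: ShutdownSketch and its Outcome enum are hypothetical names, the timeout unit is assumed to be milliseconds (the proposal does not specify one), and a CountDownLatch stands in for whatever signal reports that the suspend has completed.

```java
import java.util.OptionalLong;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the decision made by shutdown(timeout). */
final class ShutdownSketch {
    enum Outcome { IMMEDIATE, DRAINED, TIMED_OUT }

    /**
     * @param startSuspend  begins the suspend (e.g. notifies subsystems)
     * @param suspended     released when the suspend has completed
     * @param timeoutMillis empty means no timeout was passed (non-graceful)
     */
    static Outcome shutdown(Runnable startSuspend,
                            CountDownLatch suspended,
                            OptionalLong timeoutMillis) {
        if (!timeoutMillis.isPresent()) {
            return Outcome.IMMEDIATE; // normal non-graceful shutdown
        }
        startSuspend.run();
        try {
            // Wait until the suspend succeeds or the timeout occurs.
            boolean drained = suspended.await(timeoutMillis.getAsLong(), TimeUnit.MILLISECONDS);
            return drained ? Outcome.DRAINED : Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.TIMED_OUT; // assumption: treat interruption like a timeout
        }
    }
}
```

In every outcome the caller would then go on to stop the server; the outcome only records whether outstanding work had finished first.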

In domain mode these operations will be added to both individual servers
and complete server groups.

Implementation Details

Suspend/resume operates on entry points to the server. Any request that
is currently running must not be affected by the suspend state, however
any new request should be rejected. In general subsystems will track the
number of outstanding requests, and when this hits zero they are
considered suspended.

We will introduce the notion of a global SuspendController that manages
the server's suspend state. All subsystems that wish to do a graceful
shutdown register callback handlers with this controller.

When the suspend() operation is invoked the controller will invoke all
these callbacks, letting the subsystem know that the server is suspending,
and providing the subsystem with a SuspendContext object that the
subsystem can then use to notify the controller that the suspend is
complete.
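
To make the callback flow above more concrete, here is a minimal Java sketch. Only the names SuspendController and SuspendContext come from this proposal; the SuspendCallback interface and every method name (register, suspendStarted, suspendComplete, resumed, getState) are assumptions made for illustration. Timeout handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Handed to each subsystem so it can report that it has finished suspending. */
interface SuspendContext {
    void suspendComplete(); // hypothetical method name
}

/** Hypothetical callback that a subsystem registers with the controller. */
interface SuspendCallback {
    void suspendStarted(SuspendContext context);
    void resumed();
}

final class SuspendController {
    enum State { RUNNING, SUSPENDING, SUSPENDED }

    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private State state = State.RUNNING;
    private Object round; // identifies the current suspend attempt

    synchronized void register(SuspendCallback callback) {
        callbacks.add(callback);
    }

    synchronized State getState() {
        return state;
    }

    synchronized void suspend() {
        if (state != State.RUNNING) {
            return; // already suspending or suspended
        }
        state = State.SUSPENDING;
        final Object thisRound = new Object();
        round = thisRound;
        List<SuspendCallback> snapshot = new ArrayList<>(callbacks);
        if (snapshot.isEmpty()) {
            state = State.SUSPENDED; // nothing to wait for
            return;
        }
        final AtomicInteger remaining = new AtomicInteger(snapshot.size());
        for (SuspendCallback callback : snapshot) {
            final AtomicBoolean reported = new AtomicBoolean();
            callback.suspendStarted(() -> {
                // Count each subsystem once, even if it reports twice.
                if (reported.compareAndSet(false, true) && remaining.decrementAndGet() == 0) {
                    markSuspended(thisRound);
                }
            });
        }
    }

    synchronized void resume() {
        if (state == State.RUNNING) {
            return;
        }
        state = State.RUNNING;
        round = null; // late reports from the abandoned attempt are ignored
        for (SuspendCallback callback : new ArrayList<>(callbacks)) {
            callback.resumed();
        }
    }

    private synchronized void markSuspended(Object finishedRound) {
        if (state == State.SUSPENDING && round == finishedRound) {
            state = State.SUSPENDED;
        }
    }
}
```

The sketch invokes callbacks while holding the controller lock to keep it short; a real implementation would probably avoid that and would also need to deal with the suspend timeout.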

What the subsystem does when it receives a suspend command, and when it
considers itself suspended will vary, but in the common case it will
immediately start rejecting external requests (e.g. Undertow will start
responding with a 503 to all new requests). The subsystem will also
track the number of outstanding requests, and when this hits zero then
the subsystem will notify the controller that it has successfully
suspended.

Some subsystems will obviously want to do other actions on suspend, e.g.
clustering will likely want to fail over, mod_cluster will notify the
load balancer that the node is no longer available etc. In some cases we
may want to make this configurable to an extent (e.g. Undertow could be
configured to allow requests with an existing session, and not consider
itself suspended until all sessions have either timed out or been
invalidated, although this will obviously take a while).
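
The common case described above (reject new requests, count outstanding ones, notify when the count reaches zero) can be sketched as a small entry-point gate. RequestGate and its methods are hypothetical names, not Undertow or WildFly API; a web subsystem would map a false return from tryBegin() to a 503 response, and the Runnable stands in for the notification sent to the controller.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical gate placed at a subsystem entry point. */
final class RequestGate {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicReference<Runnable> drainedCallback = new AtomicReference<>();
    private volatile boolean rejecting;

    /** Returns true if the request may proceed; false means reject it (e.g. with a 503). */
    boolean tryBegin() {
        // Count first, then check the flag, so suspend() cannot miss this request.
        active.incrementAndGet();
        if (rejecting) {
            end();
            return false;
        }
        return true;
    }

    /** Must be called exactly once for every request that tryBegin() admitted. */
    void end() {
        if (active.decrementAndGet() == 0 && rejecting) {
            fireDrained();
        }
    }

    /** Start rejecting new requests; onDrained runs once no requests remain. */
    void suspend(Runnable onDrained) {
        drainedCallback.set(onDrained);
        rejecting = true;
        if (active.get() == 0) {
            fireDrained();
        }
    }

    /** Accept requests again straight away. */
    void resume() {
        rejecting = false;
        drainedCallback.set(null);
    }

    private void fireDrained() {
        // getAndSet(null) ensures the callback runs at most once per suspend.
        Runnable callback = drainedCallback.getAndSet(null);
        if (callback != null) {
            callback.run();
        }
    }
}
```

The callback runs on whichever thread finishes the last outstanding request, or on the suspending thread if nothing was in flight. The counter is also the per-request overhead that a switch to disable tracking would avoid.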

If anyone has any feedback let me know. In terms of implementation my
basic plan is to get the core functionality and the Undertow
implementation into WildFly, and then work with subsystem authors to
implement subsystem-specific functionality once the core is in place.

Stuart







_______________________________________________
wildfly-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/wildfly-dev

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Andrig Miller
What I am bringing up is more subsystem-specific, but it might be valuable to think about. If the graceful shutdown times out, what behavior would we consider correct for an in-flight transaction?

Should it be a forced rollback, so that when the server is started back up, the transaction manager will not find a transaction to be recovered in the log?

Or should it be treated the same as a crash, where transactions should be recoverable and the recovery manager would try to recover the transaction?

I would lean towards the first, as the administrator would consider this a graceful shutdown, and leaving a transaction in a state where it has to be recovered on restart doesn't seem graceful to me.

Andy


Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas
Something I forgot to mention is that we will need a switch to turn this off, as there is a small but noticeable cost to tracking in-flight requests.


> On 9 Jun 2014, at 18:04, Andrig Miller <[hidden email]> wrote:
>
> What I am bringing up is more subsystem specific, but it might be valuable to think about.  In case of the time out of the graceful shutdown, what behavior would we consider correct in terms of an inflight transaction?

It waits for the transaction to finish before shutting down.

Stuart


Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Jason T. Greene

IIRC the behavior for a tx timeout is a rollback, but we should check that.


Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Scott Marlow
In reply to this post by Stuart Douglas-2
On 06/09/2014 06:38 PM, Stuart Douglas wrote:

> User View:
>
> From the user's point of view, two new operations will be added to the server:
>
> suspend(timeout)
> resume()
>
> A runtime only attribute suspend-state (is this a good name?) will also
> be added, that can take one of three possible values, RUNNING,
> SUSPENDING, SUSPENDED.

The SuspendController "state" might be a shorter attribute name and just
as meaningful.

When are we in the RUNNING state?  Is that simply the pre-state for
SUSPENDING?

>
> A timeout attribute will also be added to the shutdown operation. If
> this is present then the server will first be suspended, and the server
> will not shut down until either the suspend is successful or the timeout
> occurs. If no timeout parameter is passed to the operation then a normal
> non-graceful shutdown will take place.

Will non-graceful shutdown wait for non-daemon threads or terminate
immediately (call System.exit())?


Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas


Scott Marlow wrote:

> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>> User View:
>>
>> From the user's point of view, two new operations will be added to the server:
>>
>> suspend(timeout)
>> resume()
>>
>> A runtime only attribute suspend-state (is this a good name?) will also
>> be added, that can take one of three possible values, RUNNING,
>> SUSPENDING, SUSPENDED.
>
> The SuspendController "state" might be a shorter attribute name and just
> as meaningful.

This will be in the global server namespace (i.e. from the CLI
:read-attribute(name="suspend-state")).

I think the name 'state' is just too generic; which kind of state are we
talking about?

>
> When are we in the RUNNING state?  Is that simply the pre-state for
> SUSPENDING?

99.99% of the time. Basically, servers are always running unless they
have been explicitly suspended, and then they go from suspending to
suspended. Note that if resume is called at any time the server goes to
RUNNING again immediately, as when subsystems are notified they should
be able to begin accepting requests again straight away.

We also have admin only mode, which is a kinda similar concept, so we
need to make sure we document the differences.

>
>> A timeout attribute will also be added to the shutdown operation. If
>> this is present then the server will first be suspended, and the server
>> will not shut down until either the suspend is successful or the timeout
>> occurs. If no timeout parameter is passed to the operation then a normal
>> non-graceful shutdown will take place.
>
> Will non-graceful shutdown wait for non-daemon threads or terminate
> immediately (call System.exit())?

It will execute the same way it does today (all services will shut down
and then the server will exit).

Stuart


Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Jaikiran Pai-2
In reply to this post by Stuart Douglas-2
This is more of a subsystem-specific question: how are internal operations (i.e. something that doesn't exactly have an entry point into the server) handled when the server is in a suspended state or when it is suspending? One such example is an EJB application which might have scheduled timer tasks associated with it. Are such timer tasks supposed to continue to run even when the server is suspended or when it is suspending? Or is the subsystem expected to shut those down too?

-Jaikiran
On Tuesday 10 June 2014 04:08 AM, Stuart Douglas wrote:
Server suspend and resume is a feature that allows a running server to 
gracefully finish of all running requests. The most common use case for 
this is graceful shutdown, where you would like a server to complete all 
running requests, reject any new ones, and then shut down, however there 
are also plenty of other valid use cases (e.g. suspend the server, 
modify a data source or some other config, then resume).

User View:

For the users point of view two new operations will be added to the server:

suspend(timeout)
resume()

A runtime only attribute suspend-state (is this a good name?) will also 
be added, that can take one of three possible values, RUNNING, 
SUSPENDING, SUSPENDED.

A timeout attribute will also be added to the shutdown operation. If 
this is present then the server will first be suspended, and the server 
will not shut down until either the suspend is successful or the timeout 
occurs. If no timeout parameter is passed to the operation then a normal 
non-graceful shutdown will take place.

In domain mode these operations will be added to both individual server 
and a complete server group.

Implementation Details

Suspend/resume operates on entry points to the server. Any request that 
is currently running must not be affected by the suspend state, however 
any new request should be rejected. In general subsystems will track the 
number of outstanding requests, and when this hits zero they are 
considered suspended.

We will introduce the notion of a global SuspendController, that manages 
the servers suspend state. All subsystems that wish to do a graceful 
shutdown register callback handlers with this controller.

When the suspend() operation is invoked the controller will invoke all 
these callbacks, letting the subsystem know that the server is suspend, 
and providing the subsystem with a SuspendContext object that the 
subsystem can then use to notify the controller that the suspend is 
complete.

What the subsystem does when it receives a suspend command, and when it 
considers itself suspended will vary, but in the common case it will 
immediatly start rejecting external requests (e.g. Undertow will start 
responding with a 503 to all new requests). The subsystem will also 
track the number of outstanding requests, and when this hits zero then 
the subsystem will notify the controller that is has successfully 
suspended.

Some subsystems will obviously want to take other actions on suspend,
e.g. clustering will likely want to fail over, and mod_cluster will
notify the load balancer that the node is no longer available. In some
cases we may want to make this configurable to an extent (e.g. Undertow
could be configured to allow requests with an existing session, and not
consider itself suspended until all sessions have either timed out or
been invalidated, although this will obviously take a while).

If anyone has any feedback, let me know. In terms of implementation, my
basic plan is to get the core functionality and the Undertow
implementation into WildFly, and then work with subsystem authors to
implement subsystem-specific functionality once the core is in place.

Stuart







_______________________________________________
wildfly-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/wildfly-dev



Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Jaikiran Pai-2
One other question - When a server is put into suspend mode, is it going to trigger undeployment of certain deployed deployments? And would that be considered the expected behaviour? More specifically, when the admin triggers a suspend, are the subsystems allowed to trigger certain operations which might stop the services that back the currently deployed deployments or are they expected to keep those services in a started/UP state?

-Jaikiran
On Tuesday 10 June 2014 10:16 AM, Jaikiran Pai wrote:
This is more of a subsystem specific question - How are internal operations (i.e. something that doesn't exactly have an entry point into the server) handled when the server is in a suspended state or when it is suspending. One such example is, an EJB application which might have scheduled timer tasks associated with it. Are such timer tasks supposed to continue to run even when server is suspended or when it is suspending? Or is the subsystem expected to shut those down too?

-Jaikiran

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Michael Musgrove
In reply to this post by Jason T. Greene
I agree with Stuart, it should wait for the transaction to finish before
shutting down.

And yes (with caveats), Jason, when the timeout is reached our transaction reaper will abort the transaction. However, if the transaction was started with a timeout value of 0 it will never abort. Also, if the suspend happens when there are prepared transactions then it's too late to cancel the transaction and they will be recovered when the system is resumed.

Note also that suspending before an in-flight transaction has prepared is probably safe, since the resource will either:

- roll back the branch if all connections to the db are closed (when the system suspends); or
- roll back the branch if the XAResource timeout (set via XAResource.setTransactionTimeout()) is reached

[And since it was never prepared we have no log record for it, so we would not do anything on resume]

Mike

> IIRC the behavior for a tx timeout is a rollback, but we should check that.
>
>> On Jun 9, 2014, at 6:50 PM, Stuart Douglas <[hidden email]> wrote:
>>
>> Something I forgot to mention is that we will need a switch to turn this off, as there is a small but noticeable cost with tracking in flight requests.
>>
>>
>>> On 9 Jun 2014, at 18:04, Andrig Miller <[hidden email]> wrote:
>>>
>>> What I am bringing up is more subsystem specific, but it might be valuable to think about.  In case of the time out of the graceful shutdown, what behavior would we consider correct in terms of an inflight transaction?
>> It waits for the transaction to finish before shutting down.
>>
>> Stuart
>>
>>> Should it be a forced rollback, so that when the server is started back up, the transaction manager will not find in the log a transaction to be recovered?
>>>
>>> Or, should it be considered the same as a crashed state, where transactions should be recoverable, and the recovery manager would try to recover the transaction?
>>>
>>> I would lean towards the first, as this would be considered graceful by the administrator, and having a transaction be in a state where it would be recovered on a restart, doesn't seem graceful to me.
>>>
>>> Andy
>>>


--
Michael Musgrove
Transactions Team
e: [hidden email]
t: +44 191 243 0870


Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas
In reply to this post by Jaikiran Pai-2
They still have an entry point: the subsystem tracks outstanding timers and will not execute new ones while suspended.

Stuart

On 9 Jun 2014, at 23:46, Jaikiran Pai <[hidden email]> wrote:

This is more of a subsystem specific question - How are internal operations (i.e. something that doesn't exactly have an entry point into the server) handled when the server is in a suspended state or when it is suspending. One such example is, an EJB application which might have scheduled timer tasks associated with it. Are such timer tasks supposed to continue to run even when server is suspended or when it is suspending? Or is the subsystem expected to shut those down too?

-Jaikiran

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas
In reply to this post by Jaikiran Pai-2
Nothing is undeployed and all services remain up. If the server is resumed it should be able to begin processing requests immediately.

Stuart

On 10 Jun 2014, at 0:02, Jaikiran Pai <[hidden email]> wrote:

One other question - When a server is put into suspend mode, is it going to trigger undeployment of certain deployed deployments? And would that be considered the expected behaviour? More specifically, when the admin triggers a suspend, are the subsystems allowed to trigger certain operations which might stop the services that back the currently deployed deployments or are they expected to keep those services in a started/UP state?

-Jaikiran

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Scott Marlow
In reply to this post by Stuart Douglas
On 06/09/2014 10:40 PM, Stuart Douglas wrote:

>
>
> Scott Marlow wrote:
>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>> Server suspend and resume is a feature that allows a running server to
>>> gracefully finish of all running requests. The most common use case for
>>> this is graceful shutdown, where you would like a server to complete all
>>> running requests, reject any new ones, and then shut down, however there
>>> are also plenty of other valid use cases (e.g. suspend the server,
>>> modify a data source or some other config, then resume).
>>>
>>> User View:
>>>
>>> From the user's point of view, two new operations will be added to the
>>> server:
>>>
>>> suspend(timeout)
>>> resume()
>>>
>>> A runtime only attribute suspend-state (is this a good name?) will also
>>> be added, that can take one of three possible values, RUNNING,
>>> SUSPENDING, SUSPENDED.
>>
>> The SuspendController "state" might be a shorter attribute name and just
>> as meaningful.
>
> This will be in the global server namespace (i.e. from the CLI
> :read-attribute(name="suspend-state")).
>
> I think the name 'state' is just too generic: which kind of state are we
> talking about?

I was thinking suspend-state was an attribute of SuspendController.
Thanks for the explanation.

>
>>
>> When are we in the RUNNING state?  Is that simply the pre-state for
>> SUSPENDING?
>
> 99.99% of the time. Basically servers are always running unless they
> have been explicitly suspended, and then they go from suspending to
> suspended. Note that if resume is called at any time the server goes to
> RUNNING again immediately, as when subsystems are notified they should
> be able to begin accepting requests again straight away.
>
> We also have admin only mode, which is a kinda similar concept, so we
> need to make sure we document the differences.
>
>>
>>> A timeout attribute will also be added to the shutdown operation. If
>>> this is present then the server will first be suspended, and the server
>>> will not shut down until either the suspend is successful or the timeout
>>> occurs. If no timeout parameter is passed to the operation then a normal
>>> non-graceful shutdown will take place.
>>
>> Will non-graceful shutdown wait for non-daemon threads or terminate
>> immediately (call System.exit())?
>
> It will execute the same way it does today (all services will shut down
> and then the server will exit).
>
> Stuart
>

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Andrig Miller
In reply to this post by Michael Musgrove


----- Original Message -----

> From: "Michael Musgrove" <[hidden email]>
> To: "Jason T. Greene" <[hidden email]>, "Stuart Douglas" <[hidden email]>
> Cc: "Wildfly Dev mailing list" <[hidden email]>, "Stuart Douglas" <[hidden email]>
> Sent: Tuesday, June 10, 2014 3:51:02 AM
> Subject: Re: [wildfly-dev] Design Proposal: Server suspend/resume (AKA Graceful Shutdown)
>
> I agree with Stuart, it should wait for the transaction to finish
> before
> shutting down.
>
> And yes (with caveats), Jason, when the timeout is reached our
> transaction reaper will abort the transaction. However, if the
> transaction was started with a timeout value of 0 it will never
> abort. Also, if the suspend happens when there are prepared
> transactions then it's too late to cancel the transaction and they
> will be recovered when the system is resumed.
>

If the transaction will never abort, can we force a rollback?  This could lead to a never ending "graceful" shutdown.

Andy

> Note also that suspending before an in-flight transaction has
> prepared is probably safe since the resource will either:
>
> - rollback the branch if all connections to the db are closed (when
> the system suspends); or
> - rollback the branch if the XAResource timeout (set via the
> XAResource.setTransactionTimeout()) value is reached
>
> [And since it was never prepared we have no log record for it so we
> would not do anything on resume]
>
> Mike
>
> > IIRC the behavior for a tx timeout is a rollback, but we should
> > check that.
> >
> >> On Jun 9, 2014, at 6:50 PM, Stuart Douglas
> >> <[hidden email]> wrote:
> >>
> >> Something I forgot to mention is that we will need a switch to
> >> turn this off, as there is a small but noticeable cost with
> >> tracking in flight requests.
> >>
> >>
> >>> On 9 Jun 2014, at 18:04, Andrig Miller <[hidden email]>
> >>> wrote:
> >>>
> >>> What I am bringing up is more subsystem specific, but it might be
> >>> valuable to think about.  In case of the time out of the
> >>> graceful shutdown, what behavior would we consider correct in
> >>> terms of an inflight transaction?
> >> It waits for the transaction to finish before shutting down.
> >>
> >> Stuart
> >>
> >>> Should it be a forced rollback, so that when the server is
> >>> started back up, the transaction manager will not find in the
> >>> log a transaction to be recovered?
> >>>
> >>> Or, should it be considered the same as a crashed state, where
> >>> transactions should be recoverable, and the recovery manager
> >>> would try to recover the transaction?
> >>>
> >>> I would lean towards the first, as this would be considered
> >>> graceful by the administrator, and having a transaction be in a
> >>> state where it would be recovered on a restart, doesn't seem
> >>> graceful to me.
> >>>
> >>> Andy
> >>>
> >>> ----- Original Message -----
> >>>> From: "Stuart Douglas" <[hidden email]>
> >>>> To: "Wildfly Dev mailing list" <[hidden email]>
> >>>> Sent: Monday, June 9, 2014 4:38:55 PM
> >>>> Subject: [wildfly-dev] Design Proposal: Server suspend/resume
> >>>> (AKA Graceful    Shutdown)
> >>>>
> >>>> Server suspend and resume is a feature that allows a running server
> >>>> to gracefully finish off all running requests. The most common use
> >>>> case for this is graceful shutdown, where you would like a server to
> >>>> complete all running requests, reject any new ones, and then shut
> >>>> down; however, there are also plenty of other valid use cases (e.g.
> >>>> suspend the server, modify a data source or some other config, then
> >>>> resume).
> >>>>
> >>>> User View:
> >>>>
> >>>> From the user's point of view, two new operations will be added to
> >>>> the server:
> >>>>
> >>>> suspend(timeout)
> >>>> resume()
> >>>>
> >>>> A runtime-only attribute, suspend-state (is this a good name?), will
> >>>> also be added. It can take one of three possible values: RUNNING,
> >>>> SUSPENDING, or SUSPENDED.
> >>>>
> >>>> A timeout parameter will also be added to the shutdown operation. If
> >>>> it is present, the server will first be suspended and will not shut
> >>>> down until either the suspend succeeds or the timeout expires. If no
> >>>> timeout is passed to the operation, a normal non-graceful shutdown
> >>>> will take place.
> >>>>
> >>>> In domain mode these operations will be added both to individual
> >>>> servers and to complete server groups.
> >>>>
> >>>> Implementation Details
> >>>>
> >>>> Suspend/resume operates on entry points to the server. Any request
> >>>> that is currently running must not be affected by the suspend state;
> >>>> any new request, however, should be rejected. In general, subsystems
> >>>> will track the number of outstanding requests, and when this hits
> >>>> zero they are considered suspended.
> >>>>
> >>>> We will introduce the notion of a global SuspendController that
> >>>> manages the server's suspend state. All subsystems that wish to take
> >>>> part in a graceful shutdown register callback handlers with this
> >>>> controller.
> >>>>
> >>>> When the suspend() operation is invoked, the controller will invoke
> >>>> all of these callbacks, letting each subsystem know that the server
> >>>> is suspending and providing it with a SuspendContext object that the
> >>>> subsystem can then use to notify the controller that its suspend is
> >>>> complete.
> >>>>
> >>>> What the subsystem does when it receives a suspend command, and
> >>>> when
> >>>> it
> >>>> considers itself suspended will vary, but in the common case it
> >>>> will
> >>>> immediatly start rejecting external requests (e.g. Undertow will
> >>>> start
> >>>> responding with a 503 to all new requests). The subsystem will
> >>>> also
> >>>> track the number of outstanding requests, and when this hits
> >>>> zero
> >>>> then
> >>>> the subsystem will notify the controller that is has
> >>>> successfully
> >>>> suspended.
> >>>> Some subsystems will obviously want to do other actions on
> >>>> suspend,
> >>>> e.g.
> >>>> clustering will likely want to fail over, mod_cluster will
> >>>> notify the
> >>>> load balancer that the node is no longer available etc. In some
> >>>> cases
> >>>> we
> >>>> may want to make this configurable to an extent (e.g. Undertow
> >>>> could
> >>>> be
> >>>> configured to allow requests with an existing session, and not
> >>>> consider
> >>>> itself timed out until all sessions have either timed out or
> >>>> been
> >>>> invalidated, although this will obviously take a while).
> >>>>
> >>>> If anyone has any feedback let me know. In terms of
> >>>> implementation my
> >>>> basic plan is to get the core functionality and the Undertow
> >>>> implementation into Wildfly, and then work with subsystem
> >>>> authors to
> >>>> implement subsystem specific functionality once the core is in
> >>>> place.
> >>>>
> >>>> Stuart
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> The
> >>>>
> >>>> A timeout attribute will also be added to the shutdown command,
> >>>> _______________________________________________
> >>>> wildfly-dev mailing list
> >>>> [hidden email]
> >>>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
> >>> _______________________________________________
> >>> wildfly-dev mailing list
> >>> [hidden email]
> >>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
> >> _______________________________________________
> >> wildfly-dev mailing list
> >> [hidden email]
> >> https://lists.jboss.org/mailman/listinfo/wildfly-dev
> > _______________________________________________
> > wildfly-dev mailing list
> > [hidden email]
> > https://lists.jboss.org/mailman/listinfo/wildfly-dev
>
>
> --
> Michael Musgrove
> Transactions Team
> e: [hidden email]
> t: +44 191 243 0870
>
> Registered in England and Wales under Company Registration No.
> 03798903
> Directors: Michael Cunningham (US), Charles Peters (US), Matt Parson
> (US), Michael O'Neill(Ireland)
>

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas-2

>
> If the transaction will never abort, can we force a rollback?  This could lead to a never ending "graceful" shutdown.
>

That is why we have a timeout: once the timeout expires, the server will
shut down anyway. We should probably have some kind of post-timeout
callback that gets invoked, so the TX subsystem could potentially take
some action.

I'm not sure what the best action would be; the subsystem-specific
details will be up to the team that maintains the subsystem.
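
As a rough illustration of the timeout plus post-timeout callback idea, here is a minimal Java sketch. Everything in it is hypothetical: Participant, timedOut() and suspendAll() are made-up names for this example, not existing WildFly API, and what a subsystem does in timedOut() is deliberately left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative only: every name here is hypothetical, not WildFly API. */
final class SuspendTimeoutSketch {

    /** Hypothetical per-subsystem callback. */
    interface Participant {
        /** Start suspending; run {@code done} once in-flight work has finished. */
        void suspend(Runnable done);

        /** Hypothetical post-timeout hook, e.g. where a TX subsystem could act. */
        void timedOut();
    }

    /** Returns true if every participant finished before the timeout expired. */
    static boolean suspendAll(List<Participant> participants, long timeout, TimeUnit unit) {
        CountDownLatch pending = new CountDownLatch(participants.size());
        List<AtomicBoolean> finished = new ArrayList<>();
        for (Participant p : participants) {
            AtomicBoolean flag = new AtomicBoolean();
            finished.add(flag);
            // compareAndSet guards against a participant reporting completion twice.
            p.suspend(() -> {
                if (flag.compareAndSet(false, true)) {
                    pending.countDown();
                }
            });
        }
        boolean clean;
        try {
            clean = pending.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            clean = false;
        }
        if (!clean) {
            // Timeout expired: notify only the participants that are still busy.
            for (int i = 0; i < participants.size(); i++) {
                if (!finished.get(i).get()) {
                    participants.get(i).timedOut();
                }
            }
        }
        return clean; // the caller proceeds with shutdown either way
    }

    public static void main(String[] args) {
        Participant quick = new Participant() {
            public void suspend(Runnable done) { done.run(); }
            public void timedOut() { System.out.println("quick: timed out"); }
        };
        Participant stuck = new Participant() {
            public void suspend(Runnable done) { /* never reports completion */ }
            public void timedOut() { System.out.println("stuck: post-timeout callback"); }
        };
        boolean clean = suspendAll(Arrays.asList(quick, stuck), 100, TimeUnit.MILLISECONDS);
        System.out.println("clean suspend: " + clean);
    }
}
```

The point of the sketch is only that the shutdown path is the same whether or not the suspend completed; the callback gives a subsystem one last chance to react before that happens.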

Stuart

> Andy
>
>> Note also that suspending before an in-flight transaction has
>> prepared is probably safe, since the resource will either:
>>
>> - roll back the branch if all connections to the db are closed (when
>> the system suspends); or
>> - roll back the branch if the XAResource timeout value (set via
>> XAResource.setTransactionTimeout()) is reached
>>
>> [And since it was never prepared we have no log record for it, so we
>> would not do anything on resume]
>>
>> Mike
>>
>>> IIRC the behavior for a tx timeout is a rollback, but we should
>>> check that.

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas
In reply to this post by Stuart Douglas-2
Note that none of this really has anything to do with services.
Suspending a server should not involve any services changing state, so
that when a server is resumed it can start handling requests immediately.
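
To make that concrete, here is a small Java sketch of an entry-point gate, assuming the request-counting approach from the proposal. RequestGate and its methods are hypothetical names for illustration, not WildFly or Undertow API. Suspending only flips a flag and waits for the in-flight count to reach zero, so nothing is stopped or restarted and resume takes effect immediately.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative only: a hypothetical entry-point gate, not WildFly API. */
final class RequestGate {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicReference<Runnable> drainedListener = new AtomicReference<>();
    private volatile boolean suspended;

    /** Called at the entry point; returns false if the request must be rejected. */
    boolean tryBegin() {
        // Count first, then check the flag, so suspend() cannot miss this request.
        inFlight.incrementAndGet();
        if (suspended) {
            end(); // undo the increment; the caller rejects (e.g. with a 503)
            return false;
        }
        return true;
    }

    /** Called when an accepted request completes. */
    void end() {
        if (inFlight.decrementAndGet() == 0 && suspended) {
            notifyDrained();
        }
    }

    /** Stop accepting new requests; onDrained runs once nothing is in flight. */
    void suspend(Runnable onDrained) {
        drainedListener.set(onDrained);
        suspended = true;
        if (inFlight.get() == 0) {
            notifyDrained();
        }
    }

    /** Accept requests again straight away; no services were touched. */
    void resume() {
        suspended = false;
        drainedListener.set(null);
    }

    private void notifyDrained() {
        // getAndSet makes the notification fire at most once per suspend().
        Runnable listener = drainedListener.getAndSet(null);
        if (listener != null) {
            listener.run();
        }
    }

    public static void main(String[] args) {
        RequestGate gate = new RequestGate();
        gate.tryBegin();                                   // one request in flight
        gate.suspend(() -> System.out.println("drained"));
        System.out.println("accepted while suspending: " + gate.tryBegin());
        gate.end();                                        // last request finishes
        gate.resume();
        System.out.println("accepted after resume: " + gate.tryBegin());
    }
}
```

Incrementing the counter before checking the flag is one way to close the race between a request arriving and suspend() being called; other designs are possible.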

Stuart


Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Dimitris Andreadis
In reply to this post by Stuart Douglas
Why not extend the states of the existing 'server-state' attribute to:

(STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)

http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html

On 10/06/2014 04:40, Stuart Douglas wrote:

>
>
> Scott Marlow wrote:
>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>> Server suspend and resume is a feature that allows a running server to
>>> gracefully finish of all running requests. The most common use case for
>>> this is graceful shutdown, where you would like a server to complete all
>>> running requests, reject any new ones, and then shut down, however there
>>> are also plenty of other valid use cases (e.g. suspend the server,
>>> modify a data source or some other config, then resume).
>>>
>>> User View:
>>>
>>> For the users point of view two new operations will be added to the server:
>>>
>>> suspend(timeout)
>>> resume()
>>>
>>> A runtime only attribute suspend-state (is this a good name?) will also
>>> be added, that can take one of three possible values, RUNNING,
>>> SUSPENDING, SUSPENDED.
>>
>> The SuspendController "state" might be a shorter attribute name and just
>> as meaningful.
>
> This will be in the global server namespace (i.e. from the CLI,
> :read-attribute(name="suspend-state")).
>
> I think the name 'state' is just too generic: which kind of state are we
> talking about?
>
>>
>> When are we in the RUNNING state?  Is that simply the pre-state for
>> SUSPENDING?
>
> 99.99% of the time. Basically servers are always running unless they
> have been explicitly suspended, and then they go from suspending to
> suspended. Note that if resume is called at any time the server goes to
> RUNNING again immediately, as when subsystems are notified they should
> be able to begin accepting requests again straight away.
>
> We also have admin only mode, which is a kinda similar concept, so we
> need to make sure we document the differences.
>
>>
>>> A timeout attribute will also be added to the shutdown operation. If
>>> this is present then the server will first be suspended, and the server
>>> will not shut down until either the suspend is successful or the timeout
>>> occurs. If no timeout parameter is passed to the operation then a normal
>>> non-graceful shutdown will take place.
>>
>> Will non-graceful shutdown wait for non-daemon threads or terminate
>> immediately (call System.exit())?
>
> It will execute the same way it does today (all services will shut down
> and then the server will exit).
>
> Stuart
>

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas
They are actually orthogonal: a server can be in both RESTART_REQUIRED
and any one of the suspend states.

RESTART_REQUIRED is very much tied to services and the management model,
while suspend/resume is a runtime only thing that should not touch the
state of services.
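
One way to picture the two being independent is the following Java sketch. It is a hypothetical model, not WildFly code: ServerStatusSketch is a made-up class, and the management-model side is reduced to a single restartRequired flag rather than guessing at the full set of server-state values. The suspend states and the rule that resume goes straight back to RUNNING are taken from this thread.

```java
/** Illustrative only: a hypothetical model of two independent pieces of state. */
final class ServerStatusSketch {

    /** The three values proposed for the suspend-state attribute. */
    enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

    // Management-model side, simplified to one flag for this sketch.
    private boolean restartRequired;

    // Runtime-only side; never touches services or the flag above.
    private SuspendState suspendState = SuspendState.RUNNING;

    synchronized void markRestartRequired() {
        restartRequired = true;
    }

    synchronized void suspend() {
        if (suspendState == SuspendState.RUNNING) {
            suspendState = SuspendState.SUSPENDING;
        }
    }

    /** Called once all in-flight requests have finished. */
    synchronized void drained() {
        if (suspendState == SuspendState.SUSPENDING) {
            suspendState = SuspendState.SUSPENDED;
        }
    }

    /** Resume is allowed at any time and goes straight back to RUNNING. */
    synchronized void resume() {
        suspendState = SuspendState.RUNNING;
    }

    synchronized boolean isRestartRequired() {
        return restartRequired;
    }

    synchronized SuspendState getSuspendState() {
        return suspendState;
    }

    public static void main(String[] args) {
        ServerStatusSketch server = new ServerStatusSketch();
        server.markRestartRequired();
        server.suspend();
        server.drained();
        System.out.println(server.getSuspendState() + ", restart required: " + server.isRestartRequired());
        server.resume();
        System.out.println(server.getSuspendState() + ", restart required: " + server.isRestartRequired());
    }
}
```

Folding both into a single attribute would need a value for every combination, which is the main argument for keeping them separate.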


Stuart

Dimitris Andreadis wrote:

> Why not extend the states of the existing 'server-state' attribute to:
>
> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>
> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Dimitris Andreadis
Isn't RESTART_REQUIRED also orthogonal to RUNNING?

On 10/06/2014 17:17, Stuart Douglas wrote:

> They are actually orthogonal: a server can be in both RESTART_REQUIRED and any one of the
> suspend states.
>
> RESTART_REQUIRED is very much tied to services and the management model, while
> suspend/resume is a runtime only thing that should not touch the state of services.
>
>
> Stuart
>
> Dimitris Andreadis wrote:
>> Why not extend the states of the existing 'server-state' attribute to:
>>
>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>
>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>
>> On 10/06/2014 04:40, Stuart Douglas wrote:
>>>
>>> Scott Marlow wrote:
>>>> On 06/09/2014 06:38 PM, Stuart Douglas wrote:
>>>>> Server suspend and resume is a feature that allows a running server to
>>>>> gracefully finish of all running requests. The most common use case for
>>>>> this is graceful shutdown, where you would like a server to complete all
>>>>> running requests, reject any new ones, and then shut down, however there
>>>>> are also plenty of other valid use cases (e.g. suspend the server,
>>>>> modify a data source or some other config, then resume).
>>>>>
>>>>> User View:
>>>>>
>>>>> For the users point of view two new operations will be added to the server:
>>>>>
>>>>> suspend(timeout)
>>>>> resume()
>>>>>
>>>>> A runtime only attribute suspend-state (is this a good name?) will also
>>>>> be added, that can take one of three possible values, RUNNING,
>>>>> SUSPENDING, SUSPENDED.
>>>> The SuspendController "state" might be a shorter attribute name and just
>>>> as meaningful.
>>> This will be in the global server namespace (i.e. from the CLI
>>> :read-attribute(name="suspend-state").
>>>
>>> I think the name 'state' is just two generic, which kind of state are we
>>> talking about?
>>>
>>>> When are we in the RUNNING state?  Is that simply the pre-state for
>>>> SUSPENDING?
>>> 99.99% of the time. Basically servers are always running unless they are
>>> have been explicitly suspended, and then they go from suspending to
>>> suspended. Note that if resume is called at any time the server goes to
>>> RUNNING again immediately, as when subsystems are notified they should
>>> be able to begin accepting requests again straight away.
>>>
>>> We also have admin only mode, which is a kinda similar concept, so we
>>> need to make sure we document the differences.
>>>
>>>>> A timeout attribute will also be added to the shutdown operation. If
>>>>> this is present then the server will first be suspended, and the server
>>>>> will not shut down until either the suspend is successful or the timeout
>>>>> occurs. If no timeout parameter is passed to the operation then a normal
>>>>> non-graceful shutdown will take place.
>>>> Will non-graceful shutdown wait for non-daemon threads or terminate
>>>> immediately (call System.exit()).
>>> It will execute the same way it does today (all services will shut down
>>> and then the server will exit).
>>>
>>> Stuart
>>>
>>>>> In domain mode these operations will be added to both individual servers
>>>>> and complete server groups.
>>>>>
>>>>> Implementation Details
>>>>>
>>>>> Suspend/resume operates on entry points to the server. Any request that
>>>>> is currently running must not be affected by the suspend state, however
>>>>> any new request should be rejected. In general subsystems will track the
>>>>> number of outstanding requests, and when this hits zero they are
>>>>> considered suspended.
>>>>>
>>>>> We will introduce the notion of a global SuspendController, that manages
>>>>> the server's suspend state. All subsystems that wish to do a graceful
>>>>> shutdown register callback handlers with this controller.
>>>>>
>>>>> When the suspend() operation is invoked the controller will invoke all
>>>>> these callbacks, letting the subsystem know that the server is suspending,
>>>>> and providing the subsystem with a SuspendContext object that the
>>>>> subsystem can then use to notify the controller that the suspend is
>>>>> complete.
>>>>>
>>>>> What the subsystem does when it receives a suspend command, and when it
>>>>> considers itself suspended, will vary, but in the common case it will
>>>>> immediately start rejecting external requests (e.g. Undertow will start
>>>>> responding with a 503 to all new requests). The subsystem will also
>>>>> track the number of outstanding requests, and when this hits zero
>>>>> the subsystem will notify the controller that it has successfully
>>>>> suspended.
>>>>> Some subsystems will obviously want to do other actions on suspend, e.g.
>>>>> clustering will likely want to fail over, mod_cluster will notify the
>>>>> load balancer that the node is no longer available etc. In some cases we
>>>>> may want to make this configurable to an extent (e.g. Undertow could be
>>>>> configured to allow requests with an existing session, and not consider
>>>>> itself suspended until all sessions have either timed out or been
>>>>> invalidated, although this will obviously take a while).
>>>>>
>>>>> If anyone has any feedback let me know. In terms of implementation my
>>>>> basic plan is to get the core functionality and the Undertow
>>>>> implementation into WildFly, and then work with subsystem authors to
>>>>> implement subsystem specific functionality once the core is in place.
>>>>>
>>>>> Stuart
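
For what it's worth, here is a minimal sketch of how the callback registration and request counting described in the quoted proposal could fit together. Only SuspendController and SuspendContext are names taken from the proposal; SuspendCallback, RequestTracker and every method name below are assumptions made up for illustration and are not WildFly API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// All method names here are hypothetical; the proposal does not define them.
enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

interface SuspendContext {
    void suspendComplete(); // called by a subsystem once it has drained
}

interface SuspendCallback {
    void suspend(SuspendContext context);
    void resume();
}

class SuspendController {
    private final List<SuspendCallback> callbacks = new ArrayList<>();
    private SuspendState state = SuspendState.RUNNING;
    private int pending;     // subsystems that have not yet reported completion
    private int generation;  // bumped on every suspend/resume to ignore stale contexts

    synchronized void register(SuspendCallback callback) { callbacks.add(callback); }

    synchronized SuspendState getState() { return state; }

    synchronized void suspend() {
        if (state != SuspendState.RUNNING) return;
        state = SuspendState.SUSPENDING;
        final int cycle = ++generation;
        pending = callbacks.size();
        if (pending == 0) { state = SuspendState.SUSPENDED; return; }
        for (SuspendCallback cb : callbacks) {
            AtomicBoolean done = new AtomicBoolean(); // each context counts once
            cb.suspend(() -> { if (done.compareAndSet(false, true)) completed(cycle); });
        }
    }

    private synchronized void completed(int cycle) {
        if (cycle != generation || state != SuspendState.SUSPENDING) return;
        if (--pending == 0) state = SuspendState.SUSPENDED;
    }

    synchronized void resume() {
        if (state == SuspendState.RUNNING) return;
        state = SuspendState.RUNNING; // RUNNING again immediately, as proposed
        generation++;                 // invalidate contexts from the old cycle
        for (SuspendCallback cb : callbacks) cb.resume();
    }
}

// A subsystem entry point that counts outstanding requests.
class RequestTracker implements SuspendCallback {
    private int active;
    private boolean accepting = true;
    private SuspendContext context; // non-null only while draining

    // Returns false when the request must be rejected (e.g. with a 503).
    synchronized boolean begin() {
        if (!accepting) return false;
        active++;
        return true;
    }

    void end() {
        SuspendContext toNotify = null;
        synchronized (this) {
            if (--active == 0 && context != null) { toNotify = context; context = null; }
        }
        if (toNotify != null) toNotify.suspendComplete(); // notify outside our lock
    }

    public void suspend(SuspendContext ctx) {
        boolean idle;
        synchronized (this) {
            accepting = false;
            idle = active == 0;
            if (!idle) context = ctx;
        }
        if (idle) ctx.suspendComplete();
    }

    public synchronized void resume() { accepting = true; context = null; }
}
```

In this sketch a request that is already running is never interrupted: the tracker only stops admitting new ones and reports completion when its counter reaches zero. A shutdown with a timeout could then wait for getState() to reach SUSPENDED or for the timeout to expire, whichever comes first.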

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Stuart Douglas
I don't think so. I think RESTART_REQUIRED means the server is running but
needs a restart to apply management changes (I think that attribute can also
be RELOAD_REQUIRED; the description may be a bit out of date).

To accurately reflect all the possible states you would need something like:

RUNNING
PAUSING
PAUSED
RESTART_REQUIRED
PAUSING_RESTART_REQUIRED
PAUSED_RESTART_REQUIRED
RELOAD_REQUIRED
PAUSING_RELOAD_REQUIRED
PAUSED_RELOAD_REQUIRED

Which does not seem great, and may introduce compatibility problems for
clients that are not expecting these new values.
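
To make the growth concrete, a small sketch that derives the list above as the cross product of two independent dimensions. The enum and class names are hypothetical and are not part of the existing server-state attribute.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names, used only to illustrate the combinatorics.
class CombinedStates {
    enum Suspend { RUNNING, PAUSING, PAUSED }
    enum Pending { NONE, RESTART_REQUIRED, RELOAD_REQUIRED }

    // Flattens both dimensions into one list of attribute values.
    static List<String> combined() {
        List<String> names = new ArrayList<>();
        for (Pending p : Pending.values()) {
            for (Suspend s : Suspend.values()) {
                if (p == Pending.NONE) {
                    names.add(s.name());
                } else if (s == Suspend.RUNNING) {
                    names.add(p.name());
                } else {
                    names.add(s.name() + "_" + p.name());
                }
            }
        }
        return names;
    }
}
```

CombinedStates.combined() yields the nine values listed above, in the same order. Every value added to either dimension multiplies the size of the flattened list, whereas two separate attributes only grow by one value each.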

Stuart



Dimitris Andreadis wrote:

> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>
> On 10/06/2014 17:17, Stuart Douglas wrote:
>> They are actually orthogonal: a server can be in both RESTART_REQUIRED and
>> any one of the suspend states.
>>
>> RESTART_REQUIRED is very much tied to services and the management model,
>> while suspend/resume is a runtime-only thing that should not touch the state
>> of services.
>>
>>
>> Stuart
>>
>> Dimitris Andreadis wrote:
>>> Why not extend the states of the existing 'server-state' attribute to:
>>>
>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>
>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>>

Re: Design Proposal: Server suspend/resume (AKA Graceful Shutdown)

Dimitris Andreadis
It seems to me RESTART_REQUIRED (or RELOAD_REQUIRED) should be a boolean on its own to
simplify the state diagram.
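
Roughly what I have in mind, as a sketch only: the suspend state and the restart-required flag are held side by side, so a change to one never touches the other. ServerStatus and its members are hypothetical names, not existing WildFly types or attributes.

```java
// Hypothetical value type; not an existing WildFly class.
final class ServerStatus {
    enum SuspendState { RUNNING, SUSPENDING, SUSPENDED }

    final SuspendState suspendState;
    final boolean restartRequired; // independent of the suspend state

    ServerStatus(SuspendState suspendState, boolean restartRequired) {
        this.suspendState = suspendState;
        this.restartRequired = restartRequired;
    }

    // Changing one dimension leaves the other untouched.
    ServerStatus withSuspendState(SuspendState next) {
        return new ServerStatus(next, restartRequired);
    }

    ServerStatus withRestartRequired(boolean required) {
        return new ServerStatus(suspendState, required);
    }
}
```

The state diagram then has only the three suspend states, and existing clients never see a new combined value.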

On 10/06/2014 17:40, Stuart Douglas wrote:

> I don't think so. I think RESTART_REQUIRED means the server is running but needs a restart
> to apply management changes (I think that attribute can also be RELOAD_REQUIRED; the
> description may be a bit out of date).
>
> To accurately reflect all the possible states you would need something like:
>
> RUNNING
> PAUSING
> PAUSED
> RESTART_REQUIRED
> PAUSING_RESTART_REQUIRED
> PAUSED_RESTART_REQUIRED
> RELOAD_REQUIRED
> PAUSING_RELOAD_REQUIRED
> PAUSED_RELOAD_REQUIRED
>
> Which does not seem great, and may introduce compatibility problems for clients that are not
> expecting these new values.
>
> Stuart
>
>
>
> Dimitris Andreadis wrote:
>> Isn't RESTART_REQUIRED also orthogonal to RUNNING?
>>
>> On 10/06/2014 17:17, Stuart Douglas wrote:
>>> They are actually orthogonal: a server can be in both RESTART_REQUIRED and
>>> any one of the suspend states.
>>>
>>> RESTART_REQUIRED is very much tied to services and the management model,
>>> while suspend/resume is a runtime-only thing that should not touch the state
>>> of services.
>>>
>>>
>>> Stuart
>>>
>>> Dimitris Andreadis wrote:
>>>> Why not extend the states of the existing 'server-state' attribute to:
>>>>
>>>> (STARTING, RUNNING, SUSPENDING, SUSPENDED, RESTART_REQUIRED RUNNING)
>>>>
>>>> http://wildscribe.github.io/Wildfly/8.0.0.Final/index.html
>>>>