Suspendable Interpreters
Balancing load in Janet is done by suspending executing interpreters and evicting their commands to nodes with a matching capability. When defining an interpreter, the user can implement either the regular IInterpreter interface provided by Janet.CAS or the derived IWorkloadBalancingInterpreter interface provided by Janet.ADÉ. Only interpreters that implement the IWorkloadBalancingInterpreter interface can be suspended and restarted. Janet.ADÉ detects load imbalances and levels them out by evicting commands of suspendable interpreters as required. This process is automatic and transparent. However, the user has to implement the functionality to suspend an interpreter. For this reason, load balancing in Janet is said to be semi-transparent. When the system requests an interpreter to suspend, the user has to terminate execution and write context information back into the interpreter's command. The command is then evicted to some other executor, where a different interpreter, defined at the destination executor to execute the received command, resumes execution with the context stored in the command.
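To make the suspend protocol more concrete, the following sketch shows roughly what a suspendable interpreter could look like. Only the interface names IInterpreter and IWorkloadBalancingInterpreter are taken from the text above; the method signatures, the ICommand accessors, and the prime-counting workload are assumptions made purely for illustration.

// Assumed shapes of the Janet interfaces: only the names IInterpreter and
// IWorkloadBalancingInterpreter appear in the text, the methods are guesses.
interface ICommand {
    Object get(String key);             // read a parameter or context value
    void put(String key, Object value); // write context or results back into the command
    boolean contains(String key);
}

interface IInterpreter {
    void interpret(ICommand command);
}

interface IWorkloadBalancingInterpreter extends IInterpreter {
    void suspend(); // called by the system when the command should be evicted
}

// Illustrative suspendable interpreter: counts primes up to a limit and can be
// suspended after any iteration.
class PrimeCountInterpreter implements IWorkloadBalancingInterpreter {

    private volatile boolean suspendRequested = false;

    public void suspend() {
        suspendRequested = true;
    }

    public void interpret(ICommand command) {
        long limit   = (Long) command.get("limit");
        long current = command.contains("current") ? (Long) command.get("current") : 2L;
        long found   = command.contains("found")   ? (Long) command.get("found")   : 0L;

        while (current <= limit) {
            if (suspendRequested) {
                // Terminate execution and store the context in the command so that
                // a matching interpreter on the destination executor can resume.
                command.put("current", current);
                command.put("found", found);
                return;
            }
            if (isPrime(current)) {
                found++;
            }
            current++;
        }
        command.put("result", found);
    }

    private static boolean isPrime(long n) {
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return n >= 2;
    }
}

The pattern is what matters here: on a suspend request the interpreter stops, writes its progress into the command, and returns, so that an interpreter of the same capability at the destination executor can continue from that context.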
Janet does not level out load imbalances by migrating agents. Instead, it evicts commands. This may appear surprising at first sight, but it has many advantages. There is no severe security problem as with mobile agents: to make migration of mobile agents as safe as theoretically possible, a lot of CPU power has to be spent on encrypting and decrypting the agent. Furthermore, evicting entire agents is costly in many ways. Firstly, the agent has to be transferred with all its commands and permanent data. Secondly, it has to be deregistered from the origin node and registered at the destination node. This is a time-consuming process which makes the response to load changes unnecessarily slow. In addition, evicting individual commands instead of entire agents allows load to be balanced at a finer level of granularity.
Executor-Observer-Distributor Triad
Three different kinds of nodes are defined to accomplish load balancing. The roles of these nodes are defined by adding an extra capability to their system application. An executor node carries user-defined applications. If the user wants her application to benefit from agent load balancing, she adds suspendable interpreters to her application's capabilities. Her interpreters are then visible to the load balancing system and will be considered for eviction.
Every workstation in a cluster needs to have a single observer node. The sole purpose of an observer is to observe the workstation's CPU load. If the CPU load exceeds or falls below a certain threshold value, the distributor node is notified. In the former case, the distributor has all commands of all executor nodes on the workstation evicted to executor nodes with matching capabilities on other workstations (full eviction). In the latter case, all executor nodes on the workstation the observer resides on are reconsidered for hosting commands evicted from other workstations' executor nodes.
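As a rough illustration of the observer's threshold behaviour, the following sketch reacts to CPU load samples crossing an upper or a lower threshold. All class, method, and threshold names are invented for this example; the actual notification mechanism between observer and distributor is not described here (and, as noted under Open Issues, observers currently run in simulated mode).

// Sketch of an observer's threshold logic; every name is illustrative.
class ObserverSketch {

    // Hypothetical callback interface towards the distributor.
    interface DistributorStub {
        void workstationOverloaded(); // triggers full eviction of the local executors' commands
        void workstationAvailable();  // local executors may host evicted commands again
    }

    private final double lowerThreshold; // e.g. 0.30
    private final double upperThreshold; // e.g. 0.90
    private boolean overloaded = false;

    ObserverSketch(double lowerThreshold, double upperThreshold) {
        this.lowerThreshold = lowerThreshold;
        this.upperThreshold = upperThreshold;
    }

    // Called periodically with the workstation's CPU load in the range 0.0 to 1.0;
    // in the current system the samples would come from a simulation.
    void onLoadSample(double cpuLoad, DistributorStub distributor) {
        if (!overloaded && cpuLoad > upperThreshold) {
            overloaded = true;
            distributor.workstationOverloaded();
        } else if (overloaded && cpuLoad < lowerThreshold) {
            overloaded = false;
            distributor.workstationAvailable();
        }
    }
}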
An executor informs the distributor node whenever its load changes. When the distributor sees that an executor is about to become idle, it evicts a suspendable interpreter from a different executor on another node that has executing or waiting suspendable interpreters of matching capability (partial eviction). This approach is called recipient-initiated eviction (the opposite is called sender-initiated eviction). The executor receives a command from the distributor to evict an interpreter of a specific capability. The executor's interpreter that suspends the user's interpreter is added to the executor's CORE capability to make sure it is executed with the highest priority and is thus able to interrupt the user's interpreter in order to suspend and evict it.
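The distributor's side of recipient-initiated partial eviction might be sketched as follows: when an executor reports that it is about to become idle, pick a more heavily loaded executor that has executing or waiting commands of a capability the receiver also provides. The data model and the selection rule below are assumptions made for illustration, not Janet.ADÉ's actual algorithm.

import java.util.*;

// Sketch of recipient-initiated partial eviction; ExecutorInfo and the
// selection rule are invented for this example.
class PartialEvictionSketch {

    static class ExecutorInfo {
        String id;
        int qsc;                                    // current queue size category
        Map<String, Integer> cqs = new HashMap<>(); // commands per suspendable capability
    }

    // Called when 'receiver' reports QSC1, i.e. it is about to become idle.
    static Optional<String> chooseDonor(ExecutorInfo receiver, List<ExecutorInfo> all) {
        ExecutorInfo best = null;
        String bestCapability = null;
        for (ExecutorInfo donor : all) {
            if (donor == receiver) continue;
            for (Map.Entry<String, Integer> entry : donor.cqs.entrySet()) {
                // Candidate: the donor has executing or waiting commands of a
                // capability that the receiver also provides.
                if (entry.getValue() > 0 && receiver.cqs.containsKey(entry.getKey())) {
                    if (best == null || donor.qsc > best.qsc) {
                        best = donor;
                        bestCapability = entry.getKey();
                    }
                }
            }
        }
        // The distributor would now tell the chosen donor to suspend an interpreter
        // of the chosen capability and evict its command to the receiver.
        return best == null ? Optional.empty() : Optional.of(best.id + ":" + bestCapability);
    }
}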
Load Determination
There are several algorithms for detecting load in a cluster; a number of them are discussed in the thesis. One way to quantify a system's load is to determine the length of the list of running processes. This approach is called ShortestQueue. Since commands in Janet are placed in queues and then processed sequentially, it is easy to determine the overall queue length of an executor. This makes applying the ShortestQueue algorithm in Janet straightforward, which is one reason this algorithm was chosen. Another reason is that it delivers good results and does not require additional information from the Java virtual machine that would be difficult to obtain without changing the virtual machine or the host operating system.
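As a toy illustration of the ShortestQueue idea, assuming that each executor's load is summarized as its total number of waiting and executing commands (the map layout and names are invented for this example):

import java.util.*;

// Toy ShortestQueue selection: the executor with the smallest total number of
// waiting and executing commands is considered the least loaded one.
class ShortestQueueSketch {

    // queueLengths maps executor id -> waiting + executing commands
    static Optional<String> leastLoaded(Map<String, Integer> queueLengths) {
        return queueLengths.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Integer> lengths = Map.of("executorA", 7, "executorB", 2, "executorC", 4);
        System.out.println(leastLoaded(lengths).orElse("none")); // prints executorB
    }
}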
Queue Size Categories and Capability Queue Sizes
As mentioned earlier in the text, an executor notifies the distributor whenever its load changes. Notifying the distributor whenever any of the executor's queues changes in length would result in considerable command traffic in the cluster. To reduce command traffic, Queue Size Categories (QSCs) are introduced, which allow classification of queue lengths. The distributor only receives a load change notification from an executor when the executor changes QSC. The user may define several QSCs for her executor node.
<executor>
<queueSizeCategories>
<queueSizeCategory name="0" maxWaitingCommands="0"
maxExecutingCommands="0"
concatenationOperator="and" />
<queueSizeCategory name="1" maxWaitingCommands="0"
maxExecutingCommands="1"
concatenationOperator="and" />
<queueSizeCategory name="2" maxWaitingCommands="2"
maxExecutingCommands="1"
concatenationOperator="or" />
<queueSizeCategory name="3" maxWaitingCommands="3"
maxExecutingCommands="1"
concatenationOperator="and" />
<queueSizeCategory name="4" maxWaitingCommands="20"
maxExecutingCommands="3"
concatenationOperator="or" />
<queueSizeCategory name="5" maxWaitingCommands="20"
maxExecutingCommands="3"
concatenationOperator="and" />
</queueSizeCategories>
</executor>
Figure 11: Sample executor's QSC definition
Figure 11 shows a sample executor's QSC definition. A QSC definition must always have a QSC named "0" (QSC0), which the system considers to represent the idle state, and a QSC named "1" (QSC1), which the system considers the state where the executor is just about to become idle. Additional QSCs are optional, but it is recommended to define more QSCs than only QSC0 and QSC1. QSCs can be defined individually per node and workstation. By defining QSCs individually for different executors or workstations, the user has a means to reflect the different performance of different hardware.
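One plausible reading of Figure 11, which is not confirmed by the text, is that an executor falls into the first (lowest-named) category whose bounds on the number of waiting and executing commands hold, with the two bounds combined by the given operator. The following sketch encodes Figure 11 under that assumption.

import java.util.*;

// Sketch of one possible QSC classification rule for the definition in
// Figure 11; the semantics assumed here are a guess, not taken from the text.
class QscSketch {

    static class Qsc {
        final String name;
        final int maxWaiting;
        final int maxExecuting;
        final boolean andOperator; // true = "and", false = "or"

        Qsc(String name, int maxWaiting, int maxExecuting, boolean andOperator) {
            this.name = name;
            this.maxWaiting = maxWaiting;
            this.maxExecuting = maxExecuting;
            this.andOperator = andOperator;
        }
    }

    // Return the first (lowest-named) QSC whose condition holds for the given
    // numbers of waiting and executing commands.
    static String classify(List<Qsc> categories, int waiting, int executing) {
        for (Qsc q : categories) {
            boolean w = waiting <= q.maxWaiting;
            boolean e = executing <= q.maxExecuting;
            if (q.andOperator ? (w && e) : (w || e)) {
                return q.name;
            }
        }
        return categories.get(categories.size() - 1).name; // fall back to the last QSC
    }

    public static void main(String[] args) {
        List<Qsc> figure11 = Arrays.asList(
                new Qsc("0", 0, 0, true),
                new Qsc("1", 0, 1, true),
                new Qsc("2", 2, 1, false),
                new Qsc("3", 3, 1, true),
                new Qsc("4", 20, 3, false),
                new Qsc("5", 20, 3, true));
        // One waiting and one executing command would give QSC "2" under this reading.
        System.out.println(classify(figure11, 1, 1));
    }
}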
Applications and capabilities are convenient for partitioning an agent's executable part. However, these elements make deciding how to react to a load change more involved. To find the executor with the least load to accept an evicted command, the distributor needs to consider all nodes' QSCs, and it needs to make sure that candidate executor nodes provide the required capabilities. To address the latter issue, the concept of Capability Queue Size (CQS) has been introduced. The CQS of an executor for a specific capability is the number of all executing or waiting commands of that capability, summed over all agents on the executor. Whenever an executor changes QSC, it attaches a list with the current CQS values of all its capabilities to the notification command sent to the distributor. When looking for the least loaded suitable executor, the distributor combines the QSCs and CQSs of the cluster's executors to make an eviction decision. For further details the interested reader is referred to the thesis document.
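A CQS value is thus simply a per-capability count over the executor's queues. A minimal sketch of how an executor might compute the CQS list it attaches to a QSC-change notification (the input representation is invented for this example):

import java.util.*;

// Counts executing or waiting commands per capability across all agents of an
// executor; the flat list of capability names is an assumed input format.
class CqsSketch {

    static Map<String, Integer> computeCqs(List<String> capabilitiesOfQueuedCommands) {
        Map<String, Integer> cqs = new HashMap<>();
        for (String capability : capabilitiesOfQueuedCommands) {
            cqs.merge(capability, 1, Integer::sum);
        }
        return cqs;
    }

    public static void main(String[] args) {
        // e.g. three queued "imageFilter" commands and one "reportWriter" command
        System.out.println(computeCqs(Arrays.asList(
                "imageFilter", "imageFilter", "reportWriter", "imageFilter")));
        // prints {reportWriter=1, imageFilter=3} (order may vary)
    }
}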
Load Balancing vs. Load Sharing
Janet.ADÉ supports both load balancing and load sharing. Load balancing means that a suspendable interpreter can be suspended during execution and restarted at some other location whenever the system wants to balance load. Load sharing means that an interpreter is sent by the distributor, before it is executed, to a node with the lowest load and a matching capability, where it is started and then runs to completion (initial placement). The system applies load balancing to an interpreter when the user implements the IWorkloadBalancingInterpreter interface for it. It applies load sharing to an interpreter when the user implements the IWorkloadSharingInterpreter interface. If several distributors exist in a cluster, the distribution process for load sharing interpreters is sped up. At the time of writing, load balancing does not benefit from more than one distributor in the cluster, since the required coordination between distributors for load balancing has not been implemented so far.
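The choice between the two modes is thus made solely through the interface an interpreter implements. Below is a minimal sketch of the load sharing case, reusing the assumed ICommand and IInterpreter shapes from the sketch in the Suspendable Interpreters section; the shape of IWorkloadSharingInterpreter is likewise an assumption.

// Only the name IWorkloadSharingInterpreter comes from the text; treating it
// as a marker interface without a suspend method is a guess.
interface IWorkloadSharingInterpreter extends IInterpreter {
    // no suspend method: a load sharing interpreter is never interrupted
}

class ReportInterpreter implements IWorkloadSharingInterpreter {

    public void interpret(ICommand command) {
        // Runs entirely on the executor the distributor selected at dispatch
        // time (initial placement); it is never suspended or evicted.
        command.put("report", "done");
    }
}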
Open Issues
There are several open issues that need to be closed before the first version of the system can leave the beta state (besides doing some more testing). The most important issues are listed below:
- Agent aliasing: In the current state of the system a remote agent has to be narrowed by specifying its complete physical location, consisting of node location, application name, capability name, and agent name. As an alternative, an agent should be addressable through an alias.
- There are scalability issues that need to be dealt with, because in Janet every node keeps full information in its registry about all applications, with their capabilities and agents, that exist on other nodes. Whenever a node's registry information changes, the change has to be propagated to all other nodes. Since every node in Janet is connected with every other node, the message traffic for propagating registry changes grows factorially with the number of nodes. Some kind of page-fault mechanism for agent lookup is being considered to improve scalability.
- Observers currently run in simulated mode; they do not retrieve the workstation's CPU load from the host operating system. Using simulated observers is very effective when testing the system, since it is easy to create any kind of load change scenario. Nevertheless, for load balancing to be based on true load changes caused by threads outside of Janet, "real" observers have to be implemented.
- At the moment, interpreters run to completion once their execution has been started. An agent can therefore not respond to an external event until the currently executing interpreter has finished. There are several possible approaches to address this issue; for example, JADE applies "interleaving of conversations". The mechanism for Janet is not yet clear. One solution could be that interpreters executed in response to exported events are processed by an agent using a higher-priority scheduler.
Future Directions
The future plan for Janet is to publish it and look for developers who would like to join in to add new functionality or improve conceptual aspects. It is planned to make Janet ready for J2EE by using technologies such as JMS (and eventually XML-RPC for interoperability with non-Java systems), JNDI, and JMX. Janet could be integrated with an open source application server such as JBoss. A true clustering solution that supports failover could be implemented as a research project. Security has to be added so that commands are only accepted from trusted sources and can only be sent to trusted destinations over a secure connection. Functionality could be added to make agents migrate as well; this would make sense for reacting to imbalances in memory usage rather than in CPU consumption. There are plenty of ideas for how to continue with the development of Janet. Only time and resources are needed ...