Using GPUBox

The GPUBox software lets users attach any number of GPU devices on demand.

There are two basic operations available to users.

GPU allocation Upon request, OServer checks the availability of the GPUs and assigns them to the requester. The process can be compared to plugging a physical graphics card into a PCI slot without powering the machine down: allocated GPUs are recognized by supported applications as if they were installed in the system. However, the allocated devices are not visible in the operating system as regular PCI devices.
GPU drop Upon request, OServer removes the GPUs from the user's allocation list; this process can be compared to removing a PCI card from the computer. After the drop operation, the application can no longer recognize the GPU.

How to run a program under GPUBox

The following chapter shows how to run a program within the GPUBox infrastructure in only a few steps. In order to run one of the programs compatible with GPUBox, the user first has to sign in and download a token. The token is saved in the Client's configuration file and used to authorize every GPUBox operation.

In this chapter, we will go through a scenario in which the user bob runs Blender under the Linux operating system, connected to OServer at the http://203.0.113.1:8081 address. Similar steps can be taken to run Blender on Windows.

We encourage starting programs from the terminal, both on Linux and Windows. Otherwise, the user might miss important messages that could help with troubleshooting.

Sign in

$ gpubox token -u=bob -o=http://203.0.113.1:8081
Enter password: ******
Configuration file /home/bob/.gpubox already exists, overwrite it? [y/n]y
Login successful.

Allocate GPU

$ gpubox free
+---------+----------------------------------+-------------------+
|DID      |GPU name                          |Memory (GB)        |
+---------+----------------------------------+-------------------+
|1        |GeForce GTX 690                   |2                  |
+---------+----------------------------------+-------------------+

The above command displays the available GPU devices. To use the GPUs, you have to allocate devices by issuing the following command, where 1 is the Device ID (column DID in the gpubox free subcommand output) and 12 is the number of requested GPUs.

$ gpubox a 1 12
Created
+---+----------------+-------------+-------------+-------+---------+
|ID |GPU name        |PCI          |Client's IP  |Status |Since    |
+---+----------------+-------------+-------------+-------+---------+
|1  |GeForce GTX 690 |710B:01:00.0 |203.0.113.51 |SHARED |2013-... |
|2  |GeForce GTX 690 |710B:02:00.0 |203.0.113.51 |SHARED |2013-... |
|3  |GeForce GTX 690 |710B:03:00.0 |203.0.113.51 |SHARED |2013-... |
|4  |GeForce GTX 690 |710C:01:00.0 |203.0.113.51 |SHARED |2013-... |
|5  |GeForce GTX 690 |710C:02:00.0 |203.0.113.51 |SHARED |2013-... |
|6  |GeForce GTX 690 |710C:03:00.0 |203.0.113.51 |SHARED |2013-... |
|7  |GeForce GTX 690 |710D:01:00.0 |203.0.113.51 |SHARED |2013-... |
|8  |GeForce GTX 690 |710D:02:00.0 |203.0.113.51 |SHARED |2013-... |
|9  |GeForce GTX 690 |710D:03:00.0 |203.0.113.51 |SHARED |2013-... |
|10 |GeForce GTX 690 |710E:01:00.0 |203.0.113.51 |SHARED |2013-... |
|11 |GeForce GTX 690 |710E:02:00.0 |203.0.113.51 |SHARED |2013-... |
|12 |GeForce GTX 690 |710E:03:00.0 |203.0.113.51 |SHARED |2013-... |
+---+----------------+-------------+-------------+-------+---------+

Run program

Once the devices are allocated, the next step is to run a program compatible with GPUBox.

$ blender

Go to File, User Preferences and open the System tab. The allocated devices should now be visible. There should be exactly as many devices visible as you have allocated in the previous steps. When the GPUs are no longer needed, you can simply drop them.

$ gpubox drop all
GPUs dropped

gpubox command on GPUBox Client for Windows

Every command described in this section also applies to the GPUBox Client for Windows. To use the gpubox command on Windows, the user can launch the gpubox.bat script from the Client's installation directory or from the Start menu, and then issue the commands according to the instructions in this section.

Security

Authorization in the GPUBox infrastructure is based on passing security tokens. In order to obtain a token tied to a particular user's account, the user must sign in. You can find more details about the login process in the Login to GPUBox Infrastructure chapter.

We encourage the use of the HTTPS protocol instead of HTTP. Otherwise, passwords and tokens can be visible to third parties.

The password should be changed periodically. The same applies to tokens, which should be regenerated on a regular basis if the network is not properly secured.

$ gpubox gentoken --no

A new token is generated immediately, but not saved in the Client's configuration file. This prevents unwanted access to the user's account in the GPUBox infrastructure.

Login to GPUBox Infrastructure

In order to use the GPUBox infrastructure, the user must sign in to OServer to obtain the token that is used for authorizing the Client's operations.

$ gpubox token -u=gpubox -o=https://203.0.113.1:8082
Enter password: ******
Configuration file /home/bob/.gpubox already exists, overwrite it? [y/n]y
Login successful.

Bob has signed in to the GPUBox infrastructure as the gpubox user. First, the user is asked for the password (default gpubox), and the request is then sent to OServer at https://203.0.113.1:8082. After a successful login, Bob is asked whether or not to save the token into the Client's configuration file.

OServer's address, used for all of the subcommands of the gpubox command, must be a full HTTP or HTTPS address. It is provided to users by an administrator and has one of the following formats:

  • http://[OSERVER_IP]:[PORT] for a non-encrypted connection, for example, http://203.0.113.1:8081 or,
  • https://[OSERVER_IP]:[PORT] for the SSL connection, for example, https://203.0.113.1:8082.
    After successfully signing in, OServer's address and token are saved in the Client's configuration file:

  • $HOME/.gpubox (Linux)
  • %LOCALAPPDATA%\gpubox.config (Windows)

    You can check whether OServer is available under a certain address:

    $ gpubox ping http://203.0.113.1:8081
    OK
    

    When the ping fails, you will receive an appropriate error message instead:

    *** ERROR: Connection to OServer failed
    OServer unavailable
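Conceptually, the ping subcommand is just an HTTP round-trip to OServer. The sketch below is an illustrative stand-in, not the real client code; it only assumes that OServer answers HTTP(S) requests on the configured address:

```python
import urllib.request
import urllib.error

def oserver_reachable(address: str, timeout: float = 3.0) -> bool:
    """Illustrative model of `gpubox ping`: True if a server answers
    at `address`, False if the connection fails.

    The real client may use a dedicated endpoint and inspect the
    payload; this sketch only tests reachability.
    """
    try:
        urllib.request.urlopen(address, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server answered, even if with an HTTP error status.
        return True
    except (urllib.error.URLError, OSError):
        return False
```

A failed check here corresponds to the *** ERROR: Connection to OServer failed message above.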
    

    You can always check which OServer you are currently connected to by issuing the gpubox whoami|w command and finding the address indicated as Connected to:

    $ gpubox whoami
    ID                 :1
    UserID             :gpubox
    Username           :GPUBox administrator
    Valid from         :2013-05-19 13:48:44
    Valid to           :2999-12-13 00:59:59
    Attempted login     :0
    Last successful login :2013-05-24 08:35:43
    Last failed login  :0001-01-01 01:01:01
    Group ID           :0
    Max GPU            :20
    Connected to       :http://203.0.113.1:8081
    

    GPU status

    A user can see a GPU in one of many statuses. The three basic statuses are:

    FREE The device is available and it is not allocated to any user. Devices with FREE status are seen by a user in a listing generated by the gpubox free command.
    SHARED The device is allocated by a user in the SHARED usage mode, so the user can expect other users to allocate the same device. As long as the number of users using a GPU in the SHARED usage mode is lower than the value of the oserver_max_users_per_gpu parameter in the OServer configuration, the device remains visible to other users as ‘available’ for allocation when they issue the gpubox free command.
    EXCLUSIVE The device is allocated exclusively by one user. No other user can allocate the same device.

    The users can also see other statuses related to the specific circumstances that might occur in the GPUBox infrastructure:

    STOPPED
  • an administrator has stopped GPUServer, and all of the GPUs attached to this server become STOPPED.
  • the GPUs in the STOPPED status are not considered in further processing until they return to their previous status.
  • the GPU can be dropped at any time. The user will then receive the *** "GPU in status BROKEN or STOPPED was dropped" message.
  • when GPUServer is restarted, the STOPPED GPUs are recovered to their previous status.

    BROKEN
  • the most probable reason for this status is a failure of GPUServer or of other infrastructure elements.
  • the GPUs in the BROKEN status are not considered in further processing until they are recovered and return to their previous status.
  • the GPU can be dropped at any time. The user will then receive the *** "GPU in status BROKEN or STOPPED was dropped" message.
  • when GPUServer is restarted, the BROKEN GPUs are recovered to their previous status.

    OFF-PENDING
  • an administrator has requested to switch the GPU to the OFF status. The request remains pending as long as the GPU is allocated by at least one user.
  • the GPU can be dropped at any time.

    EXCLUSIVE-PENDING
  • an administrator has requested to switch the GPU to the EXCLUSIVE status. The GPU stays in the EXCLUSIVE-PENDING status as long as it is allocated by more than one user.
  • the GPU can be dropped at any time.

    Tables displayed for particular commands are described in detail in the command references.
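The visibility rule for SHARED devices described above can be modeled in a few lines. This is a simplified sketch, not GPUBox code; only the parameter name oserver_max_users_per_gpu comes from the text, everything else is illustrative:

```python
def visible_as_free(status: str, current_users: int, max_users_per_gpu: int) -> bool:
    """Decide whether a device would show up in `gpubox free` output.

    Simplified model of the rules above:
    - FREE devices are always listed;
    - SHARED devices are listed while fewer than the
      oserver_max_users_per_gpu limit of users hold them;
    - all other statuses (EXCLUSIVE, STOPPED, BROKEN, the pending
      states) are not offered for allocation.
    """
    if status == "FREE":
        return True
    if status == "SHARED":
        return current_users < max_users_per_gpu
    return False
```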

    List available GPUs

    You can list the GPUs that are available for allocation:

    gpubox f|free
    
    $ gpubox f
    +------+------------------------------+-------------+-------------+
    |ID    |GPU name                      |Memory (GB)  |Available GPU|
    +------+------------------------------+-------------+-------------+
    |1     |GeForce GTX 690               |2            |12           |
    |2     |GeForce GTX 780               |3            |5            |
    +------+------------------------------+-------------+-------------+
    

    The content of this table means that, at the moment, a user can allocate a maximum of 17 GPUs (unless limited in OServer's configuration): 12 GTX 690s and 5 GTX 780s.

    The Available GPU column may or may not be displayed to users, depending on the OServer configuration (parameter oserver_gpu_count_info).

    Allocation modes

    A user has a few options when allocating GPUs:

    Allocation mode The requested number of GPUs can be allocated in either:
  • loose mode (default) - if fewer GPUs are free than requested, as many as possible are allocated
  • strict mode - only the exact requested number of GPUs can be allocated; if that many are not available, none are allocated

    Usage mode The GPUs can be used in either:
  • shared mode (default) - the device is shared between users, which means that they can use it simultaneously
  • exclusive mode - the GPU device is allocated exclusively to a single user.
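The two allocation modes reduce to a simple rule for how many GPUs a request yields. The helper below is a minimal illustrative sketch, not the real OServer logic:

```python
def gpus_to_allocate(requested: int, free: int, mode: str = "loose") -> int:
    """How many GPUs an allocation request yields under each mode.

    loose (default): allocate as many as possible, up to `requested`.
    strict: allocate exactly `requested` GPUs, or none at all.
    """
    if mode == "loose":
        return min(requested, free)
    if mode == "strict":
        return requested if free >= requested else 0
    raise ValueError(f"unknown allocation mode: {mode}")
```

For example, requesting 8 GPUs when only 5 are free yields 5 in loose mode and 0 in strict mode.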
    Allocate GPU

    If you already know what GPUs you would like to use, then you can issue the following command to allocate the chosen GPUs of a specific number and mode:

    gpubox add|a  id  [n] [shared|s|exclusive|e] [loose|l|strict|t]

    Parameters available for this subcommand:
    id ID of the chosen GPU (column “ID” from the list of available GPUs)
    n Number of GPUs to be allocated; optional parameter, default = 1.
    shared|s Allocate the GPU in the shared usage mode (default).
    exclusive|e Allocate the GPU in the exclusive usage mode.
    loose|l Fewer GPUs may be allocated if they are not available in the requested amount (default).
    strict|t Allocate only the requested number of GPUs, or none.
    Usage example:
    $ gpubox add 1 8
    Considering the exemplary list of available GPUs given in the List available GPUs section, issuing this command will result in allocating 8 GPUs with ID 1 (in this example, GeForce GTX 690). The default mode in which GPUs are added is shared, which means that the allocated GPUs can be used at the same time by other users on their systems. To use devices in exclusive mode, the e|exclusive parameter must be applied. To allocate 5 GTX 780 GPUs in exclusive mode, the user has to issue:
    $ gpubox add 2 5 e
    This will result in adding 5 GPUs with ID 2 (in this case, GeForce GTX 780) in the exclusive usage mode, which means that no other user will be able to perform any actions on the allocated GPUs. After every operation of adding new GPUs, a list of currently allocated devices is displayed.

    List allocated GPUs

    In order to display the list of the GPU devices currently allocated to a user, the following command must be issued:
    gpubox l|list
    No additional parameters are required for this subcommand. For example, if a user has already allocated 8 GTX 690s in the default shared mode and 5 GTX 780s in exclusive mode, the list of currently allocated GPUs will look like this:
    $ gpubox list
    +----+----------------+------------+--------------+---------+-------------------+
    |ID  |GPU name        |PCI         |Client's IP   |Status   |Since              |
    +----+----------------+------------+--------------+---------+-------------------+
    |1   |GeForce GTX 690 |710B:01:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |2   |GeForce GTX 690 |710B:02:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |3   |GeForce GTX 690 |710B:03:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |4   |GeForce GTX 690 |710C:01:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |5   |GeForce GTX 690 |710C:02:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |6   |GeForce GTX 690 |710C:03:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |7   |GeForce GTX 690 |710D:01:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |8   |GeForce GTX 690 |710D:02:00.0|203.0.113.150 |SHARED   |2013-10-02 15:28:11|
    |9   |GeForce GTX 780 |710D:03:00.0|203.0.113.150 |EXCLUSIVE|2013-10-02 15:32:20|
    |10  |GeForce GTX 780 |710E:01:00.0|203.0.113.150 |EXCLUSIVE|2013-10-02 15:32:20|
    |11  |GeForce GTX 780 |710E:02:00.0|203.0.113.150 |EXCLUSIVE|2013-10-02 15:32:20|
    |12  |GeForce GTX 780 |710E:03:00.0|203.0.113.150 |EXCLUSIVE|2013-10-02 15:32:20|
    |13  |GeForce GTX 780 |710F:01:00.0|203.0.113.150 |EXCLUSIVE|2013-10-02 15:32:20|
    +----+----------------+------------+--------------+---------+-------------------+
    

    If a user does not have any GPU allocated to his system, the No content message will be displayed.

    OServer tries to keep the numeric ID values low. Each newly allocated GPU receives an ID equal to the current highest ID + 1.

    Drop GPU

    In order to remove allocated GPUs from the user’s system, a command with this structure must be executed:

    gpubox drop|d <id>|<id1,id2,id3...>|<id1-idn>
    where the parameter forms result in:
    <id> …removing a particular GPU with ID=<id>.
    <id1,id2,id3...> …removing the listed GPUs, separated with commas.
    <id1-idn> …removing the GPUs within the range <id1> to <idn>.
    Knowing the IDs of the GPUs allocated to the system, a user can delete a chosen single GPU, multiple listed GPUs, or the GPUs with IDs within a specified range. For instance, if a user wants to delete one of the GTX 690s allocated to his system, let’s say the one with ID 5, then he has to issue the following command:
    $ gpubox drop 5
    If a user wants to remove the GPUs with IDs from 7 to 10, the command will be:
    $ gpubox drop 7-10
    If the GPUs to be dropped are those with IDs 2, 3, and 11, the command will look like:
    $ gpubox drop 2,3,11
    If a user wants to delete the same GPUs that were removed in the examples above (2, 3, 5, 7, 8, 9, 10, 11) in one command, then he can do it by listing them one by one:
    $ gpubox drop 2,3,5,7,8,9,10,11 
    as well as by combining ranges and single IDs:
    $ gpubox drop 2,3,5,7-11
    After each gpubox drop action, a user will be informed about the result of the operation. The displayed message communicating the successful drop action will look like this:
    The following GPUs were successfully dropped: 2 , 3 , 5 , 7 , 8 , 9 , 10 , 11
    A failed drop operation (which might result, for example, from GPUs having already been dropped or from invalid IDs) will be communicated by this type of message:
    The following GPUs were not dropped: 2 , 3 , 5 , 7 , 8 , 9 , 10 , 11
    If the drop operation is partially successful, that will also be communicated:
    The following GPUs were successfully dropped: 8 , 9 , 10 , 11
    The following GPUs were not dropped: 2 , 3 , 5 , 7
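The accepted ID forms (a single ID, a comma-separated list, a range, or a combination of these) can be expanded with a short helper. This is an illustrative sketch of the parsing, not the Client's actual implementation:

```python
def parse_drop_ids(spec: str) -> list[int]:
    """Expand a drop argument like "2,3,5,7-11" into sorted GPU IDs.

    Mirrors the three forms accepted by `gpubox drop`:
    a single <id>, a comma-separated list, and an <id1-idn> range;
    the forms may be freely combined.
    """
    ids: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)
```

With this rule, drop 2,3,5,7-11 addresses exactly the same GPUs as drop 2,3,5,7,8,9,10,11.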

    Reallocate GPU

    The GPUs are assigned to a particular IP address. In order to use GPUs that were allocated from another IP address, a user has two options: drop and allocate them again, or use the here subcommand:

    $ gpubox here    
    

    All of the GPUs will be reallocated to the current IP address. The maximum number of allowed GPU allocations per IP address is limited by the administrator. The here subcommand reallocates GPUs up to the number allowed by the GPUBox infrastructure configuration.
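The behavior of here can be modeled as follows. This is a simplified sketch under stated assumptions: allocations are represented as records with an "ip" field, and max_per_ip stands in for the administrator-configured per-IP limit; the real OServer logic may differ:

```python
def reallocate_here(allocations: list[dict], current_ip: str, max_per_ip: int) -> list[dict]:
    """Illustrative model of `gpubox here`: move allocations to the
    caller's IP address without exceeding the per-IP limit.

    `allocations` is a list of {"id": ..., "ip": ...} records.
    Allocations already on `current_ip` count against the limit;
    remaining budget is spent moving the others, in order.
    """
    already_here = sum(1 for a in allocations if a["ip"] == current_ip)
    budget = max_per_ip - already_here
    moved = []
    for alloc in allocations:
        if alloc["ip"] != current_ip and budget > 0:
            alloc = {**alloc, "ip": current_ip}
            budget -= 1
        moved.append(alloc)
    return moved
```

With a high enough limit, every allocation ends up on the current IP address, as in the listing below.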

    In the example below, the user is working on the system with the 203.0.113.156 IP address, so the GPUs previously allocated by him on the system with the 203.0.113.151 address cannot be used anymore.

    $ gpubox list
    +---+----------------+-------------+--------------+--------+----------------+
    |ID |GPU name        |PCI          |Client's IP   |Status  |Since           |
    +---+----------------+-------------+--------------+--------+----------------+
    |1  |GeForce GTX 690 |7197:03:00.0 |203.0.113.151 |SHARED  |2013-05-25 09:18|
    |2  |GeForce GTX 690 |7197:04:00.0 |203.0.113.151 |SHARED  |2013-05-25 09:18|
    |3  |GeForce GTX 690 |7197:07:00.0 |203.0.113.151 |SHARED  |2013-05-25 09:18|
    |4  |GeForce GTX 690 |7197:08:00.0 |203.0.113.151 |SHARED  |2013-05-25 09:18|
    |5  |GeForce GTX 690 |719C:03:00.0 |203.0.113.156 |SHARED  |2013-05-25 09:19|
    |6  |GeForce GTX 690 |719C:04:00.0 |203.0.113.156 |SHARED  |2013-05-25 09:19|
    |7  |GeForce GTX 690 |719C:07:00.0 |203.0.113.156 |SHARED  |2013-05-25 09:19|
    |8  |GeForce GTX 690 |719C:08:00.0 |203.0.113.156 |SHARED  |2013-05-25 09:19|
    +---+----------------+-------------+--------------+--------+----------------+
    

    The user can reallocate all of the GPUs to his current IP address:

    $ gpubox here
    $ gpubox list
    +---+----------------+-------------+--------------+-------+----------------+
    |ID |GPU name        |PCI          |Client's IP   |Status |Since           |
    +---+----------------+-------------+--------------+-------+----------------+
    |1  |GeForce GTX 690 |7197:03:00.0 |203.0.113.156 |SHARED |2013-05-25 09:18|
    |2  |GeForce GTX 690 |7197:04:00.0 |203.0.113.156 |SHARED |2013-05-25 09:18|
    |3  |GeForce GTX 690 |7197:07:00.0 |203.0.113.156 |SHARED |2013-05-25 09:18|
    |4  |GeForce GTX 690 |7197:08:00.0 |203.0.113.156 |SHARED |2013-05-25 09:18|
    |5  |GeForce GTX 690 |719C:03:00.0 |203.0.113.156 |SHARED |2013-05-25 09:19|
    |6  |GeForce GTX 690 |719C:04:00.0 |203.0.113.156 |SHARED |2013-05-25 09:19|
    |7  |GeForce GTX 690 |719C:07:00.0 |203.0.113.156 |SHARED |2013-05-25 09:19|
    |8  |GeForce GTX 690 |719C:08:00.0 |203.0.113.156 |SHARED |2013-05-25 09:19|
    +---+----------------+-------------+--------------+-------+----------------+
    

    InfiniBand communication

    In order to start communication over InfiniBand, the GPUBox Client requires:

    InfiniBand adapter The adapter can be installed either as a virtual function on a virtual system or as a physical card.
    ibverbs library A low-level programming interface for InfiniBand communication. The library is delivered with the InfiniBand driver.
    libInfiniBand-gpubox.so (Linux) The library should be installed in the [CLIENT_INSTALLATION_DIR]/lib64 directory and be accessible via the $LD_LIBRARY_PATH environment variable.
    InfiniBand-gpubox.dll (Windows) The library should be installed in the %WINDIR%\system32 directory.
    Enabled in the GPUBox infrastructure The InfiniBand communication must be enabled by an administrator.

    You can verify the status and the number of the available InfiniBand devices in the system by issuing the command:

    $ ibstatus
    Infiniband device 'mlx4_0' port 1 status:
    	default gid:	 fe80:0000:0000:0000:0014:0500:0000:0001
    	base lid:	 0x9
    	sm lid:		 0x4
    	state:		 4: ACTIVE
    	phys state:	 5: LinkUp
    	rate:		 40 Gb/sec (4X QDR)
    	link_layer:	 InfiniBand
    
    Infiniband device 'mlx4_0' port 2 status:
    	default gid:	 fe80:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 10 Gb/sec (4X)
    	link_layer:	 InfiniBand
    

    Here, we have only one device with one active, 40 Gb/sec link.

    Environment variables

    The Client recognizes the following environment variables.

    GPUBOX_CLIENT Path to the Client's configuration file. By default, the file is placed in:
  • $HOME/.gpubox (Linux)
  • %LOCALAPPDATA%\gpubox.config (Windows)

    GPUBOX_IBDEV If more than one InfiniBand adapter is installed, a user can set the name of the required device here. For example, GPUBOX_IBDEV=mlx4_0

    GPUBOX_IBPORTS If the InfiniBand adapter has more than one port, the variable sets which ports are used by GPUServer and the Client.
  • The first port has number 1.
  • More ports can be specified as an array.
  • When more ports are specified, threads use the ports in a round-robin-like algorithm.
  • For example: GPUBOX_IBPORTS=1 or GPUBOX_IBPORTS=[1,2]
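The round-robin port usage can be sketched with a cyclic iterator. The document only says threads use the configured ports in a round-robin-like fashion; the helper below is an illustration of that idea, not the GPUServer implementation:

```python
import itertools

def port_cycle(ibports):
    """Round-robin port selector for a GPUBOX_IBPORTS-style setting.

    Hands out the configured ports in order, wrapping around, so that
    successive threads spread their traffic across the ports,
    e.g. 1, 2, 1, 2, ... for GPUBOX_IBPORTS=[1,2].
    """
    return itertools.cycle(ibports)

# Each thread would take the next port from the shared cycle:
ports = port_cycle([1, 2])
```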
    Verbose mode

    The gpubox command can be executed in verbose mode by using the -v switch. An exemplary regular output of the $ gpubox free command looks like this:
    $ gpubox f
    +-------+------------------+------------+---------------+
    |DID    |GPU name          |Memory (GB) |Available GPU  |
    +-------+------------------+------------+---------------+
    |1      |GeForce GTX 780   |3           |22             |
    +-------+------------------+------------+---------------+
    Using the -v switch will provide additional details preceding the regular command output:
    $ gpubox f -v
    Verbose mode is on
    >>>Processing FREE
    Reading configuration... GPUBOX_CLIENT environment variable does not exist or it's empty...
    trying to read file .gpubox in home directory
    home directory: /home/mary
    File opened successfully at location:/home/mary/.gpubox
    File size: 61
    Configuration OK.
    Free GPUs from http://203.0.113.1:8081/gpu/free
    
    ...
    
    Parsing returned JSON data...OK.
    Displaying data...
    Number of GPU device types (free): 1
    +-------+------------------+------------+---------------+
    |DID    |GPU name          |Memory (GB) |Available GPU  |
    +-------+------------------+------------+---------------+
    |1      |GeForce GTX 780   |3           |22             |
    +-------+------------------+------------+---------------+
    

    Using the verbose mode does not affect the execution of a command.
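The configuration-file lookup shown in the verbose output above (GPUBOX_CLIENT first, then the platform default) can be sketched as follows. This is an illustrative model; the real Client may resolve paths differently:

```python
import os
import sys

def config_path() -> str:
    """Resolve the Client configuration file location.

    Mirrors the lookup order seen in the verbose output: honour the
    GPUBOX_CLIENT environment variable when set and non-empty,
    otherwise fall back to the platform default ($HOME/.gpubox on
    Linux, %LOCALAPPDATA%\\gpubox.config on Windows).
    """
    explicit = os.environ.get("GPUBOX_CLIENT", "")
    if explicit:
        return explicit
    if sys.platform.startswith("win"):
        return os.path.join(os.environ.get("LOCALAPPDATA", ""), "gpubox.config")
    return os.path.join(os.path.expanduser("~"), ".gpubox")
```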

    Troubleshooting

    I cannot see the GPUs in the program

    Verify that you are signed in Issue the $ gpubox whoami command. If you see the Configuration reading error message, then you are not signed in. If you are signed in, output like the one shown below will be displayed:
    ID                 :1
    UserID             :gpubox
    Username           :GPUBox administrator
    Valid from         :2013-05-19 13:48:44
    Valid to           :2999-12-13 00:59:59
    Attempted login     :0
    Last successful login :2013-05-26 08:44:40
    Last failed login  :0001-01-01 01:01:01
    Group ID           :0
    Max GPU            :4
    Connected to       :http://203.0.113.1:8081
    
    In this example, the user is signed in to OServer at the http://203.0.113.1:8081 address.
    Verify that you have GPUs allocated Issue the $ gpubox list command; if you see the No Content message, then you do not have any GPUs allocated.
    Additionally, when the program starts, you will be notified by the GPUs are not allocated message.
    Verify if the libraries are installed

    Linux

    Once you issue the $ ls <CLIENT_INSTALLATION_DIR>/lib64/ | grep cuda command, you should see:
    libcuda.so
    libcuda.so.1
    libcuda.so.1.0.0
    
    otherwise, reinstall the GPUBox Client.
    Verify if the libraries are installed

    Windows

    Version 0.8.833 Check that the <CLIENT_INSTALLATION_DIR>\lib\nvcuda.dll library has been copied to the folder with the application executables.
    Verify if the libraries are installed

    Windows

    Version 0.8.801 Check that the correct nvcuda.dll library is installed in the %WINDIR%\system32 folder or in the program installation folder.
    Verify if the library is accessible

    Linux

    To verify that the LD_LIBRARY_PATH environment variable is set correctly, issue the command:
    env | grep LD
    LD_LIBRARY_PATH=/home/bob/gpubox-client/lib64
    

    I do not see all of the allocated GPUs in the program

    Verify the origin of the GPU allocations Issue the $ gpubox list command and verify that all IP addresses in the Client's IP column are the same. If they are not, reallocate the GPUs. Refer to the Reallocate GPU chapter.

    I cannot allocate more GPUs than...

    The number of allowed GPU allocations is constrained by a GPUBox infrastructure configuration parameter and can also be individually limited for each user by an administrator.
    Issue the $ gpubox whoami command to verify the maximum number of GPUs (Max GPU) that you can allocate.