GPUBox Starter for Windows

GPUBox Starter is responsible for managing the components of GPUBox on the Windows operating system.

Roadmap

GPUBox Starter for Windows contains only basic information about managing GPUBox on the Windows operating system. For more details, refer to the document structure.

Software requirements and recommendations

Infrastructure running the GPUBox software must meet certain hardware and software requirements.

Software requirements

64-bit operating system

  • Windows 7, Windows 8, Windows 10, Windows Server 2008, Windows Server 2012

Drivers

  • CUDA driver version 6.0 or higher.
  • The required CUDA libraries are included in the graphics card driver; download the NVIDIA driver.

Network

  • TCP/IP
  • InfiniBand

For InfiniBand support

InfiniBand is optional and requires additional hardware and software.
InfiniBand is based on RDMA technology, which features very high throughput and very low latency. It allows configuring up to 100Gb/s as a native InfiniBand and/or Ethernet network.

InfiniBand cards

  • Visit the Mellanox website to install the required drivers.
  • From a console window, issue the command ibstat to verify that the hardware and software are installed properly.

Optional Software
GPU Deployment Kit

  • Needed to read information about a GPU's temperature and fan speed.
  • GPUServer requires the nvml.dll library to extract the measurements. The library's default location is C:\Program Files\NVIDIA Corporation\NVSMI\

The GPUBox software does not support 32-bit operating systems.

Hardware recommendations

For better utilization of the GPUBox software, we recommend the following system setups:

OServer
Processor 64-bit CPU with at least 4 cores; the requirement depends on the number of Clients.
Memory Minimum 2GB
Network At least 1Gb/s TCP/IP network. In the GPUBox infrastructure, OServer is the least network-consuming component.
GPUServer
Processor 64-bit CPU with at least 4 cores
Communication with Clients and copying data between system memory and GPU memory can be CPU-intensive.
A TCP/IP network adapter without an offload engine (TCP Offload Engine, TOE) can consume a large share of CPU cycles during transmission.
Memory Minimum 16GB, but it is good practice to keep it twice as large as the total amount of GPU memory. For example, a GPUServer providing four GPUs with 3GB of GPU memory each should have at least 24GB of RAM, which is 2 x (4 x 3GB).
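The sizing rule above can be expressed as a short calculation (a sketch; recommended_ram_gb is a hypothetical helper name, not part of GPUBox):

```python
def recommended_ram_gb(gpu_count: int, gpu_mem_gb: float, minimum_gb: float = 16.0) -> float:
    """Suggested GPUServer RAM: twice the total GPU memory, never below the 16GB minimum."""
    return max(minimum_gb, 2 * gpu_count * gpu_mem_gb)

# The example from the text: four GPUs with 3GB each -> 2 x (4 x 3GB) = 24GB
print(recommended_ram_gb(4, 3))  # 24.0
```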
Network At least 10Gb/s, network adapter with an offload engine or InfiniBand.
We recommend InfiniBand communication between GPUServer and Client, as it offers up to 100Gb/s throughput with very low latency and low CPU overhead.
Client and GPUBox Starter
Processor 64-bit CPU with at least 4 cores (virtual or physical)
Memory Minimum 2GB
Network At least 10Gb/s, network adapter with an offload engine or InfiniBand.
When the Client is installed on a virtual system, the network adapters (Ethernet and/or InfiniBand) should be exposed via PCI passthrough or SR-IOV technology.

Throughout the installation and configuration process, we advise using full absolute paths for directories and files.

Installation

To start, download the installation package: gpubox-install-<version>.exe

GPUBox requires Internet access and ports 6555 and 53 to be open.
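To verify beforehand that a required port accepts connections, a plain TCP probe is enough; this is a generic sketch, not a GPUBox tool, and it assumes the service in question listens on TCP:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("oserver.example", 6555)
```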

Please be aware that the web browser may block the file download; downloading the installation materials may require additional steps.

Windows, via SmartScreen, may also ask whether to run the application. Click More info to show additional information, then click the Run anyway button to continue the installation.

The GPUBox installer requires stopping OServer, GPUServer and GPUBox Starter if they are already running.

During the installation process, pay particular attention to the component-selection step.

The installer will install all GPUBox components, i.e. OServer, GPUServer and GPUBox Client; only the InfiniBand component is optional.
In most cases you will leave Base Components selected.
InfiniBand Support is the component responsible for communication over the native InfiniBand protocol. OFED for Windows is required to install this component.

If you do not have OFED for Windows installed on your system, do not select InfiniBand Support; otherwise the system will keep notifying you about missing libraries.
Verify that OFED for Windows is installed in Control Panel, or issue the command ibstat. If the command is not found, the required InfiniBand drivers are likely not installed on your system.

After each installation completes, we highly recommend running GPUBox Starter and configuring GPUBox's components.

Setup Wizard

The Wizard starts:

  • after each installation of GPUBox,
  • when GPUBox Starter detects the default GPUServer configuration file.

The Setup Wizard is also always available from the Tools option in the main menu.

    GPUBox has three main components:

  • OServer
  • GPUServer
  • GPUBox Client

    The Wizard will help you configure all of the components or just some of them:

  • start OServer, start GPUServer and configure the client
  • connect to an already running OServer, start GPUServer and configure the client

    If you wish to use only the client and connect to an existing GPUBox infrastructure, you can quit the Wizard and follow the instructions from Login to GPUBox Infrastructure.

    Be aware that an antivirus or firewall program can prevent the entire GPUBox software, or some of its parts, from starting, connecting or being configured. It is highly recommended to disable the antivirus and firewall for the duration of the installation process.

    You have a choice to connect to an already running OServer or start a new instance on your local computer.

    If you only wish to connect to an already running OServer, you can skip this step and go to the Find OServer step.

    Click Start OServer to start OServer.

    After a few seconds OServer will be ready to receive connections.

    In the next step you will connect to a remotely or locally running OServer.

    To connect to OServer you will need to enter a full HTTP or HTTPS address with a port number.
    By default, OServer's interface is HTTP only and is bound to all available IP interfaces. The default port is 8081.
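    Before connecting, the entered address can be checked against the expected shape (scheme, host, port); parse_oserver_address is a hypothetical helper, and the 8081 fallback mirrors the default described above:

```python
from urllib.parse import urlparse

def parse_oserver_address(address: str):
    """Split an OServer address into (scheme, host, port); raise ValueError if malformed."""
    parts = urlparse(address)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected http(s)://<oserver_ip>:<port>, got {address!r}")
    # Fall back to the documented default port 8081 when none is given.
    return parts.scheme, parts.hostname, parts.port or 8081

print(parse_oserver_address("http://192.0.2.10:8081"))
```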

    You have two options to enter OServer's address:

  • automatically, by clicking the Find OServer button,
  • or manually, in the format http://<oserver_ip>:8081

    OServer's discovery mechanism is based on the UDP broadcast protocol. This type of communication, as well as the multicast protocol, can be disabled; please verify this with your administrator or service provider.
    The discovery protocol works only within the local network and the same subnet.
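    The wire format of the discovery messages is not documented here, so the following is only a schematic UDP listener built on the defaults listed in the Options section (port 17400, 6000 ms timeout); it merely reports the first datagram that arrives:

```python
import socket

def wait_for_oserver_broadcast(port: int = 17400, timeout_ms: int = 6000):
    """Listen for one UDP datagram on the discovery port; return (payload, sender) or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))                 # listen on all interfaces
        sock.settimeout(timeout_ms / 1000.0)  # default matches the 6000 ms from Options
        try:
            return sock.recvfrom(4096)        # (payload, (ip, port)) of the first sender
        except socket.timeout:
            return None
```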

    You can use any available IP interface to communicate with OServer; however, we highly recommend using the same IP interface as for the GPUServer bindings.

    After a successful connection to OServer, the indicator will turn green.

    In the next step, select the IP interface with the best possible performance.

    Use loopback only when you intend to use GPUBox on your local computer only.

    We recommend using a network faster than 1Gb/s.

    If you want to work with a remote desktop, only protocols like VNC are compatible with the CUDA driver.
    RDP (Remote Desktop Protocol) does not use the GPU; in such a case GPUServer will display the message:

    GBSC-SC-95A Cannot initialize CUDA environment: 100

    If all GPUs from your system are visible, GPUServer has been initialized successfully.

    At this point, the two main components of GPUBox should be fully initialized.

    On its very first start, OServer creates the security database with a single, already enabled superuser with UserID gpubox, username GPUBox Administrator and password gpubox. The database is recreated in the same way each time the path to the security database is changed in the oserver_security_plugin configuration parameters or the indicated file is deleted.

    An important part of the installation process is copying the DLL library nvcuda.dll into the directory of your CUDA-enabled software.
    For example, if you copy the library into C:\Program Files\Blender Foundation\Blender, Blender will be able to use GPUs from the GPUBox infrastructure.

    Clicking the copy nvcuda button copies the nvcuda.dll library to the clipboard and opens Windows Explorer; then simply paste (Ctrl+V) the library into the desired directory.
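    The manual copy step amounts to placing nvcuda.dll into the application's directory; here is a sketch in which both example paths are illustrative and deploy_nvcuda is a hypothetical helper, not a GPUBox command:

```python
import shutil
from pathlib import Path

def deploy_nvcuda(gpubox_dir: str, app_dir: str) -> Path:
    """Copy GPUBox's nvcuda.dll into a CUDA-enabled application's directory."""
    src = Path(gpubox_dir) / "nvcuda.dll"
    dst = Path(app_dir) / "nvcuda.dll"
    shutil.copyfile(src, dst)
    return dst

# e.g. deploy_nvcuda(r"C:\Program Files\GPUBox", r"C:\Program Files\Blender Foundation\Blender")
```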

    The very last step you have to take in order to use the GPUBox infrastructure is to allocate GPU(s), i.e. assign a GPU to your CUDA-enabled program.
    For more information visit Allocate and drop GPU.

    Interface overview

    1 - OServer panel
    2 - Login panel
    3 - Allocated GPUs
    4 - Free GPUs
    5 - Servers panel
    6 - Components status

  • Red - server is not running
  • Orange - server is starting or stopping
  • Green - server is up and running

    For the client, only the red and green colors are valid:

  • Red - client is logged out
  • Green - client is logged in

    7 - Allocation panel
    8 - Logs

    OServer panel

    1 - Link to GPUBox Web Console
    2 - OServer's RESTful address
    3 - Status showing whether the entered address is valid and connected to OServer:

  • connected - address is valid
  • not connected - address is invalid

    4 - Opens the discovery dialog and finds OServer automatically

    OServer's discovery mechanism is based on the UDP broadcast protocol. This type of communication, as well as the multicast protocol, can be disabled. Please verify this with your administrator or service provider.

    5 - Refreshes everything except the servers' status

    Login panel

    The panel shows whether the user is logged into the GPUBox infrastructure.

    1 - When the user is logged in, it shows OServer's address. Click it to open the GPUBox Web Console.
    2 - Shows the user name and links to the user's details or to the login panel.

    Allocated GPUs

    It shows the list of currently allocated GPUs.

    The panel's columns correspond to the command gpubox list.

    A - Local identification number of the GPU, used for basic user operations on allocations. The ID is generated automatically during GPU allocation and may indicate the order in which the GPUs were added.
    B - Name of the allocated GPU device.
    C - PCI address in the format <domain>:<bus>:<slot>.<function>, where:
      <domain> is 4 characters long and consists of the hexadecimal value of the two last octets of the GPUServer's IP address. For example, the domain in a PCI address for a GPUServer with IP address 203.0.113.12 will be 710C, because 113 in hexadecimal is 0x71 and 12 is 0xC.
      <bus>:<slot>.<function> is taken from the lspci command on the GPUServer.
    D - The IP address of the Client that was used to allocate the GPU. The user can use the command gpubox here to reallocate all of the GPUs to his current IP address.
    E - The status of a particular GPU: EXCLUSIVE, SHARED, OFF, STOPPED or BROKEN.
    F - Timestamp indicating when the GPU was allocated. Format: YYYY-MM-DD hh:mm:ss
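    The <domain> encoding described in column C can be reproduced directly; pci_domain_from_ip is a hypothetical helper, not a GPUBox command:

```python
def pci_domain_from_ip(ip: str) -> str:
    """Encode the last two octets of a GPUServer IP as the 4-character hexadecimal PCI domain."""
    octets = [int(part) for part in ip.split(".")]
    return f"{octets[2]:02X}{octets[3]:02X}"

# The example from the text: 113 -> 0x71, 12 -> 0x0C
print(pci_domain_from_ip("203.0.113.12"))  # 710C
```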

    Free GPUs

    It shows the list of currently available GPUs, ready to be allocated.

    The panel's columns correspond to the command gpubox free.

    A - Device ID indicating the type of GPU that can be allocated.
    B - Name of the free GPU device.
    C - GPU memory expressed in gigabytes.
    D - Number of free GPUs of a particular type.
    1 - When the checkbox is selected, the GPU will be allocated to the user exclusively; otherwise the device is shared with other users.

    Servers panel

    The panel shows the current status of the two main components of the GPUBox infrastructure that are running on your local computer.

    O - OServer
    G - GPUServer
    1 - Status of the servers:

  • running - server is running
  • starting - server is starting
  • stopping - server is stopping
  • not running - server is not running

    2 - Process ID in the system.
    3 - Stops or starts the server. The button is disabled when the status is in the orange state.
    4 - Click to open the server's configuration file in Notepad.
    5 - Select the checkbox if you want the server to start when the user logs into the system.

    Allocation panel

    1 - Number of GPUs to be allocated
    2 - Allocate GPUs selected in the free GPUs panel.
    3 - Release GPUs selected in the allocated GPUs panel.

    Logs

    There are two panels showing logs, for OServer and GPUServer respectively.

    1 - Reloads the log from file.
    2 - Path to the log file.
    3 - If the checkbox is selected, the application automatically reloads the log from the file every few seconds.
    4 - The servers are console-based programs and by default the console window is hidden. Click Show console to unhide it.

    Closing the console window also terminates the server.

    User details

    The dialog shows details of the currently logged-in user.

    The dialog's details correspond to the command gpubox whoami.

    Login dialog

    For a freshly initialized security database, the default user and password are 'gpubox'.
    For more information, refer to Security.

    Options

    We highly recommend not changing these values unless you know what you are doing!

    1 - Paths to the servers' configuration files:

  • C:\ProgramData\GPUBox\OServer\etc\oserver-win.txt (OServer's default)
  • C:\ProgramData\GPUBox\GPUServer\etc\gpuserver-win.txt (GPUServer's default)

    2 - Discovery protocol:

  • Port on which GPUBox Starter listens for broadcast messages from OServers. Default: 17400
  • Discovery timeout. GPUBox Starter waits this number of milliseconds on each available IP interface to find OServer. Default: 6000

    3 - Generates the default OServer configuration file. The current one is backed up in the folder C:\ProgramData\GPUBox\OServer\log. The format of the backup name is oserver-bck-YYYY-MM-DD HH-MM-SS.log.
    4 - Sometimes OServer's database can become corrupted. This button removes the database. It is disabled when OServer is running. The entire database will be rebuilt on OServer's next start; however, all users' allocations will be lost.
    5 - Select the checkbox to start GPUBox Starter when the user logs in.
    6 - Sets all values to their defaults.
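    Assuming a four-digit year, the backup name format from item 3 corresponds to the following timestamp pattern (a sketch, not GPUBox code):

```python
from datetime import datetime

def backup_name(now: datetime) -> str:
    """Build a backup file name in the documented oserver-bck-YYYY-MM-DD HH-MM-SS.log format."""
    return now.strftime("oserver-bck-%Y-%m-%d %H-%M-%S.log")

print(backup_name(datetime(2016, 5, 4, 13, 7, 9)))  # oserver-bck-2016-05-04 13-07-09.log
```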

    Main Menu

    File

    Exit - exit application.

    Tools

    Wizard... - opens the Setup Wizard.
    Options... - opens the options dialog.
    Copy nvcuda.dll - copies nvcuda.dll to the clipboard and opens Windows Explorer.
    Console commands

    These extend the capability of managing the GPUBox infrastructure via terminal commands:

  • gpubox - opens a terminal window and issues the gpubox ? command.
  • agpubox - opens a terminal window and issues the agpubox ? command.

    Kill process

    In some cases OServer or GPUServer may not close properly, for example after clicking the Stop button in the servers panel; the process then needs to be killed.

    Terminating a process immediately has an impact on running tasks: the server does not get a chance to finish all of its tasks in an appropriate manner.
    Basically, all tasks are resistant to this type of corruption except OServer's database.
    If OServer's database is corrupted, OServer will log the event with a database-type message (see OServer messages) or similar. The only way to solve the issue is to delete the database file and restart OServer; refer to Options.

  • OServer - kills the OServer process
  • GPUServer - kills the GPUServer process

    Help

    View help - opens this manual in a web browser. This option requires Internet access.
    About GPUBox - shows detailed information about GPUBox.

    Tray Menu

    Closing the GPUBox Starter window minimizes the application to a tray icon.

    Right-click the icon to show the menu:

    Open GPUBox - restores the application window.
    GPUBox Web Console - opens the GPUBox Web Console in a web browser.
    OServer - stops or starts OServer.
    GPUServer - stops or starts GPUServer.
    Exit - exits the application.

    Connect Client to GPUBox infrastructure

    In order to use GPUs from the GPUBox infrastructure you have to:

  • log in,
  • allocate a GPU,
  • and then copy nvcuda.dll into the CUDA-enabled software's directory.

    For information about the extended command-line interface, visit:

  • Using GPUBox
  • Managing GPUBox

    Login to and Logout from GPUBox Infrastructure

    Login

    Type the address manually or use the find button to connect to OServer.

    After entering OServer's address, when the status indicator turns to connected, click the Login link.

    Type the username and password in the Login dialog.

    When you are logged in successfully, all panels will be refreshed automatically.

    Logout

    Click the link with the username to open the user details dialog.

    Click Logout button.

    Allocate and drop GPU

    Allocate GPU

    1 - Enter the number of required GPUs.
    2 - Select the checkbox if you want to use the GPU exclusively; otherwise the device will be shared among other users of the GPUBox infrastructure.
    3 - Select a device from 'Available GPUs'.
    4 - Click the << button to allocate the GPU(s).
    5 - In less than a second the new GPU(s) will be displayed in the 'Allocated GPUs' table.

    Drop GPU

    1 - Select single or multiple GPUs from the 'Allocated GPUs' table.
    2 - Click the drop button >>.
    3 - In less than a second the device(s) will return to the pool of available GPUs. The number of free GPUs will increase accordingly.