
How to Get Current Available GPUs in TensorFlow?

python
gpu-memory
tensorflow-configuration
distributed-systems
by Nikita Barsukov · Feb 18, 2025
TLDR

Discover available GPUs in TensorFlow by calling tf.config.list_physical_devices('GPU'), which returns a list of the GPU devices TensorFlow can see. Wrap it in len() to get an instantaneous count of available GPUs:

import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Running this snippet will display the number of GPUs TensorFlow has detected. By applying the len() function, you can effortlessly count the total number of accessible GPUs.

Avoiding a GPU Memory Allocation Fiasco

When it comes to GPUs in TensorFlow, you have to ensure judicious memory utilization. By default, TensorFlow tries to allocate all available GPU memory, leading to a potential clash with other apps or subsequent model runs. Here's how to prevent that:

  • Enable memory growth for GPUs. With memory growth enabled (allow_growth in TensorFlow 1.x, tf.config.experimental.set_memory_growth in 2.x), TensorFlow allocates only as much GPU memory as it currently needs and grows the allocation on demand.

In TensorFlow 1.x, configure the memory growth within a session like so:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.InteractiveSession(config=config)

For versions TensorFlow 2.0 and up, apply the following configuration:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Memory growth needs to be consistent across GPUs, else it throws a tantrum.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Raised because no one likes a late memory growth declarer —
        # it must be set before the GPUs are initialized!
        print(e)

Ways around Invisible GPUs

Sometimes TensorFlow cannot, or should not, see every GPU on a machine. When this happens, set the CUDA_VISIBLE_DEVICES environment variable to tweak GPU visibility. For example, launching with CUDA_VISIBLE_DEVICES=0 permits TensorFlow to detect only the first GPU.
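The same masking can be done from Python, as a minimal sketch — the one catch being that CUDA reads the variable only once, so it must be set before TensorFlow is imported (or before the first GPU operation runs):

```python
import os

# Must run BEFORE `import tensorflow`; once the CUDA runtime has
# initialized, changing this variable has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"    # expose only GPU 0
# "0,2" would expose GPUs 0 and 2; "" hides every GPU.

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

After this, tf.config.list_physical_devices('GPU') will report at most the devices named in the variable.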

For occasions where you need GPU information without going through TensorFlow at all, good news: NVIDIA's nvidia-smi comes to the rescue! You can deploy it with Python's inbuilt subprocess module for a wonderful collaboration:

import subprocess

# The '-L' flag lists all GPUs, each with a UUID unique to the device
gpu_info = subprocess.check_output(["nvidia-smi", "-L"]).decode("utf-8")
print(f"Detected GPUs:\n{gpu_info}")

# Each GPU line contains exactly one UUID, so counting UUIDs counts GPUs. Clever, right?
gpu_count = gpu_info.count("UUID")
print(f"Number of GPUs: {gpu_count}")

But hold your horses! This approach requires the NVIDIA driver and nvidia-smi to be installed, works only for NVIDIA GPUs, and reports physical devices regardless of any CUDA_VISIBLE_DEVICES masking, so tread carefully.
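If you want the device names rather than just a count, the `-L` output is simple enough to parse with a regular expression. The sample output below is illustrative (what a typical two-GPU machine might print), not live data:

```python
import re

def parse_nvidia_smi_l(output: str) -> list:
    """Extract GPU names from `nvidia-smi -L` style output.

    Each line looks like: "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-...)"
    """
    return re.findall(r"GPU \d+: (.+?) \(UUID:", output)

# Illustrative sample; on a real machine you would feed in
# subprocess.check_output(["nvidia-smi", "-L"]).decode() instead.
sample = (
    "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-aaaa)\n"
    "GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-bbbb)\n"
)
print(parse_nvidia_smi_l(sample))  # ['NVIDIA A100-SXM4-40GB', 'NVIDIA A100-SXM4-40GB']
```

Parsing names this way is a sketch built on the current `-L` output format; if NVIDIA ever changes it, the regex will need adjusting.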

Cracking the Code behind Device Details

A mere list of accessible GPUs may not suffice. For times when you need comprehensive device specifics, TensorFlow offers the DeviceAttributes protocol buffer, which carries each device's name, type, memory limit, and other configuration data:

from tensorflow.python.client import device_lib

def get_devices():
    return device_lib.list_local_devices()

devices = get_devices()

# Filtering the list to include only GPUs
gpus = [d for d in devices if d.device_type == 'GPU']
for gpu in gpus:
    print(f"Name: {gpu.name}, Type: {gpu.device_type}, Memory: {gpu.memory_limit / (1024 ** 3)} GB")

When working in a distributed ecosystem, you may need to query GPU information across machines or processes. For such applications, consider TensorFlow's suite of distribution strategies (tf.distribute), or synchronise device querying manually across your cluster.
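As a toy illustration of the manual route, each worker can report the GPUs it has been assigned and a coordinator can aggregate the answers. The worker names, the assignment strings, and the thread-pool fan-out below are all hypothetical stand-ins — in a real cluster each worker would answer over RPC (or run nvidia-smi itself), and for training you would normally reach for tf.distribute instead:

```python
from concurrent.futures import ThreadPoolExecutor

def gpus_for_worker(assignment: str) -> list:
    """Parse a worker's CUDA_VISIBLE_DEVICES-style assignment string."""
    return [int(i) for i in assignment.split(",") if i]

# Hypothetical per-worker assignments on a two-machine cluster.
assignments = {"worker-0": "0,1", "worker-1": "2,3"}

# Stand-in for fanning the query out to remote workers.
with ThreadPoolExecutor() as pool:
    counts = dict(zip(assignments, pool.map(gpus_for_worker, assignments.values())))

print(counts)                                   # GPU ids per worker
print("Cluster total:", sum(len(v) for v in counts.values()))
```

The aggregation step is the important part: whatever transport you use, every worker reports locally-visible devices, and only the coordinator sees the cluster-wide picture.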