Getting PyCUDA to work with mpi4py on 2 GPUs

I am trying to run a PyCUDA program across two GPUs. I have read a great post by Talonmies explaining how to do it with the threading library; the post also mentioned that this is possible with mpi4py.

When I run mpi4py with PyCUDA, the program fails with this error:

    self.ctx = driver.Device(gpuid).make_context()
    pycuda._driver.LogicError: cuDeviceGet failed: not initialized

Perhaps this is because I am trying to initialize both GPU devices simultaneously. Does anyone have a very short example of how to get two GPUs working with mpi4py?
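For what it is worth, "not initialized" is the error the CUDA driver API reports when it is used before the driver has been initialized, so one plausible cause is that pycuda.driver.init() was never called in each MPI process (importing pycuda.autoinit would also do it, but it picks the device for you). Below is a minimal sketch, not a definitive implementation: it assumes one MPI rank per GPU on a single node, and the names pick_device, double_on_gpu and main are made up for illustration.

```python
def pick_device(rank, device_count):
    """Map an MPI rank to a GPU index, wrapping around if ranks outnumber GPUs."""
    if device_count < 1:
        raise ValueError("no CUDA devices visible to this process")
    return rank % device_count


def double_on_gpu(host_array):
    """Copy an array to the current GPU, double it there, and copy it back."""
    import pycuda.gpuarray as gpuarray
    return (2 * gpuarray.to_gpu(host_array)).get()


def main():
    # Third-party imports live inside the functions so that pick_device
    # can be checked on a machine without MPI or a GPU.
    import numpy as np
    from mpi4py import MPI
    import pycuda.driver as cuda

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Initialize the driver in every process before touching any Device.
    cuda.init()

    # Assumption: ranks on one node, assigned to GPUs round-robin.
    gpuid = pick_device(rank, cuda.Device.count())
    ctx = cuda.Device(gpuid).make_context()
    try:
        data = np.arange(4, dtype=np.float32) + rank
        result = double_on_gpu(data)
    finally:
        # Release this process's context even if the GPU work fails.
        ctx.pop()

    # Collect every rank's result on rank 0 and report which GPU produced it.
    gathered = comm.gather((rank, gpuid, result), root=0)
    if rank == 0:
        for r, g, res in gathered:
            print("rank", r, "used GPU", g, "and computed", res)
```

To try it, add a final main() call, save it as, say, two_gpus.py (a hypothetical filename) and launch it with something like mpiexec -n 2 python two_gpus.py. Each process creates its own context on its own device, so the two GPUs never share a context.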

Category:python Views:1 Time:2011-07-15

Related post

  • PyCUDA; how to distribute workload to multiple devices dynamically 2011-04-27

    PyCUDA, for all its faults, usually has very good examples provided with it / downloadable from the wiki. But I couldn't find anything in the examples or in the documentation (or a cursory google search) demonstrating the PyCUDA way of dynamically al

  • MPI4Py Scatter sendbuf Argument Type? 2009-05-03

    I'm having trouble with the Scatter function in the MPI4Py Python module. My assumption is that I should be able to pass it a single list for the sendbuffer. However, I'm getting a consistent error message when I do that, or indeed add the other two

  • Problem with MPICH2 & mpi4py Installation 2010-10-07

    I'm on Windows XP2 32-bit machine. I'm trying to install MPICH2 & mpi4py. I've downloaded & installed MPICH2-1.2.1p1 I've downloaded & mpi4py When I run python setup.py install in mpi4pi\ directory. I get running install running build run

  • processing an image using CUDA implementation, python (pycuda) or C++? 2011-02-11

    I am in a project to process an image using CUDA. The project is simply an addition or subtraction of the image. May I ask your professional opinion, which is best and what would be the advantages and disadvantages of those two? I appreciate everyone

  • How to profile PyCuda code in Linux? 2011-03-15

    I have a simple (tested) pycuda app and am trying to profile it. I've tried NVidia's Compute Visual Profiler, which runs the program 11 times, then emits this error: NV_Warning: Ignoring the invalid profiler config option: fb0_subp0_read_sectors Erro

  • mpi4py with processes and threads 2011-04-07

    Hi This is a pretty specific question, so I hope StackOverflow is meant for all programming languages and not just javascript/html I am writing a multi program in MPICH2 (popular message passing interface). My program is written in Python so I use th

  • PyCuda: Can import module, then I can't- (PyCUDA Samples) 2011-04-08

    Example code: import pycuda.autoinit import pycuda.driver as drv import numpy from pycuda.compiler import SourceModule mod = SourceModule(""" __global__ void multiply_them(float *dest, float *a, float *b) { const int i = threadIdx.x; dest[i] = a[i] *

  • PyCUDA: C/C++ includes? 2011-04-12

    Something that isn't really mentioned anywhere (at least that I can see) is what library functions are exposed to inline CUDA kernels. Specifically I'm doing small / stupid matrix multiplications that don't deserve to be individually offloaded to the

  • PyCUDA / Copperhead doesn't appear to recognise 64-bit machines 2011-04-14

    Two problems I'm having with copperhead at the minute, which I suspect are related. Running a sample file (samples/axpy.py) generated lots of little warnings, but this one stood out. g++ -pthread -fno-strict-aliasing -g -O2 -g -fwrapv -O2 -Wall -fPIC

  • PyCUDA kernel timing errors 2011-04-18

    Simple enough start=cuda.Event() func(args,block=blockdims) cuda.memcpy_dtoh(d,h) end=cuda.Event() dur=start.time_till(end) print dur But I'm getting this error File "gpu.py", line 161, in gpu_test dur=start.time_till(end) pycuda._driver.LogicError:

  • PyCUDA GPUArray slice-based operations 2011-04-18

    The PyCUDA documentation is a bit light on examples for those of us in the 'Non-Guru' class, but I'm wondering about the operations available for array operations on gpuarrays, ie. if I wanted to gpuarray this loop; m=np.random.random((K,N,N)) a=np.z

  • PyCUDA Passing variable by value to kernel 2011-04-19

    Should be simple enough; I literally want to send an int to a SourceModule kernel declaration, where the C function __global__......(int value,.....) with the value being declared and called... value = 256 ... ... func(value,...) But I'm getting

  • PyCUDA Memory Addressing: Memory offset? 2011-04-19

    I've got a large chunk of generated data (A[i,j,k]) on the device, but I only need one 'slice' of A[i,:,:], and in regular CUDA this could be easily accomplished with some pointer arithmetic. Can the same thing be done within pycuda? i.e cuda.memcpy_

  • PyCUDA: Querying Device Status (Memory specifically) 2011-04-20

    PyCUDA's documentation mentions Driver Interface calls in passing, but I'm a bit thick and can't see how to get information such as 'SHARED_SIZE_BYTES' out of my code. Can anyone point me to any examples of querying the device in this way? Is it poss

  • What does pycuda.debug actually do? 2011-04-25

    As part of a larger project, I've come across a strangely consistent bug that I can't get my head around, but is an archetypical 'black box' bug; when running with cuda-gdb python -m pycuda.debug prog.py -args, it runs fine, but slow. If I drop pycud

  • PyCUDA/CUDA: Causes of non-deterministic launch failures? 2011-04-29

    Anyone following CUDA will probably have seen a few of my queries regarding a project I'm involved in, but for those who haven't I'll summarize. (Sorry for the long question in advance) Three Kernels, One Generates a data set based on some input vari

  • Python Multiprocessing with PyCUDA 2011-05-05

    I've got a problem that I want to split across multiple CUDA devices, but I suspect my current system architecture is holding me back; What I've set up is a GPU class, with functions that perform operations on the GPU (strange that). These operations

  • PyCUDA+Threading = Invalid Handles on kernel invocations 2011-05-07

    I'll try and make this clear; I've got two classes; GPU(Object), for general access to GPU functionality, and multifunc(threading.Thread) for a particular function I'm trying to multi-device-ify. GPU contains most of the 'first time' processing neede

  • Where can I find a "Cuda/PyCuda for Dummies" tutorial 2011-05-20

    I want to learn how to do GPU programming over the summer, and I'm open to all languages/libraries but most interested in PyCuda. I am not a strong programmer; I can bang out most programs I want in Java, and understand the rudiments of C, but when I

Copyright (C) dskims.com, All Rights Reserved.
