
Alert: This site has migrated to our new home at https://forums.intel.com/s/topic/0TO0P000000PqZDWA0/intel-neural-compute-sticks. Please visit us there!

Multiple Graphs on Movidius NCS and queuing

Hello all,

I have been working for a while now with the Movidius NCS at my institute (Fraunhofer IIS) for testing, building know-how, and implementing neural networks for industry on the NCS. I will share a few of our remarks here for other people interested in the NCS and trying to work with it.

The performance of the NCS (FLOPS/power) is really impressive and enables the implementation of complicated (big) deep NNs on the edge. We were quite happy to see the release of SDK 2 a few weeks ago, which allows running multiple graphs on the same stick using the added FIFO feature. Previously, we had to use multiple sticks (extra hardware) or load/unload the different graphs on the same stick for every inference (which adds a relatively large overhead). We have tested the new SDK, and here are a few remarks on the results:

  • You can now load multiple graphs to the same stick (once) and use them for inference. The only constraint I know of is not to exceed the maximum memory size of the stick (300 MB). This is a huge improvement over SDK 1, since it eliminates the need to load/unload the graphs for every inference when using multiple networks on the same stick, saving a lot of graph-loading overhead.
  • Inference takes place sequentially, i.e. when loading 2 images, the inference of the 2nd image starts only after the inference of the 1st image has finished.
  • Unfortunately, you cannot load a new image to the same FIFO before the inference for that FIFO/graph has finished. Therefore, you cannot pipeline loading and inference. This would be a cool feature to have, since there are already FIFO queues.
  • However, you can load a new image to a FIFO while another graph is being processed (inferred). This can be used to pipeline loading and inference by loading the same network graph multiple times on the stick: while the first image is being inferred on graph 1, a 2nd image can be loaded to graph 2 (the same network, loaded twice) and then queued for inference (sequentially, as explained in point 2).
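To make the pipelining idea in the last two points concrete, here is a minimal pure-Python simulation (no NCS hardware or NCSDK required; the graph names, image names, and timings are all invented for illustration). Each "graph" loads its image independently, but a shared lock serializes inference, just like the stick does:

```python
import threading
import time

log = []                        # ordered record of what happened
log_lock = threading.Lock()
infer_lock = threading.Lock()   # the stick runs only one inference at a time

def record(event):
    with log_lock:
        log.append(event)

def load_and_infer(graph, image, load_s=0.05, infer_s=0.2):
    # Loading into a FIFO is independent per graph, so it can overlap
    # with another graph's inference.
    record(f"load {image} -> {graph}")
    time.sleep(load_s)
    # Inference is serialized on the stick (point 2 above).
    with infer_lock:
        record(f"infer {image} start")
        time.sleep(infer_s)
        record(f"infer {image} done")

# Two copies of the same network, loaded as graph1 and graph2.
t1 = threading.Thread(target=load_and_infer, args=("graph1", "img1"))
t2 = threading.Thread(target=load_and_infer, args=("graph2", "img2"))
t1.start()
time.sleep(0.01)                # img2 starts loading while img1 is busy
t2.start()
t1.join()
t2.join()

# img2 was loaded to graph2 before img1's inference finished:
# loading and inference overlapped, but the inferences themselves
# still ran one after the other.
assert log.index("load img2 -> graph2") < log.index("infer img1 done")
assert log.index("infer img1 done") < log.index("infer img2 start")
```

This is only a model of the scheduling behavior described above, not NCSDK code; on the real device the two graph handles would come from loading the same compiled graph file twice.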

In conclusion, the Movidius NCS is powerful and has so far shown good figures that allow implementing deep NNs at the edge with low power. The new SDK added good features and flexibility. However, there are some features that would be very good to have in the future and would enhance the developer experience and the prototyping process.

Note: These are my own observations, so if you have a different observation please feel free to correct me. I will update with more observations after I test more.

Best regards,
Maen Mallah
Researcher at Fraunhofer IIS,
Germany

Comments

  • 4 Comments
  • Thanks for such a clear description first!
    Here's a question. According to the results you got from the tests above, could I implement multi-threaded inference by loading two graphs into one device, each compiled with a specified SHAVE count? For example, load one graph compiled with -s 4 and another graph compiled with the flag -s 8.
    It would be great if this trick worked~

  • @maenma It should be possible to queue up several frames while the NCS device is processing an inference. You can increase the number of elements a FIFO can hold using the fifo allocate option. See the example code below:

    # allow the input FIFO to buffer two frames
    NUM_ELEMENTS = 2
    input_fifo.allocate(device, input_descs[0], NUM_ELEMENTS)
    
  • @helius, I tried using fewer SHAVEs, but with the same results: it is still sequential. It would be good if that worked, though, since the speed-up is not linear with the number of SHAVEs. However, I think the central shared memory imposes serious restrictions on this feature.

    Best regards,
    Maen Mallah
    Researcher at Fraunhofer IIS,
    Germany

  • @Tome_at_Intel Actually you are right. I am using allocate_with_fifos, which also has input_fifo_num_elem and output_fifo_num_elem options. These have a default value of 2, so I assumed the behavior I was getting (loading waits for the inference to finish) was intended. However, when I increased these options to 3, queuing worked normally: the next 2 images were loaded while the inference of the 1st one was taking place. I checked further and noticed that when num_elem is even, I can only load num_elem-1 images into the queue; loading the num_elem-th image waits for the inference of the first one to finish for some reason. This is not the case for odd num_elem.

    I am not sure whether I would get the same behavior using the plain allocate function, but I thought this was a bug worth reporting.

    Best regards,
    Maen Mallah
    Researcher at Fraunhofer IIS,
    Germany
