
Inference performance

Hi, I successfully ran the ncs-fullcheck example and used it to run inference on several pictures. AlexNet takes around 200 ms and GoogLeNet around 550 ms. However, when I ran the profiler from the toolkit (make example), it reported around 90 ms for both AlexNet and GoogLeNet. There seems to be a gap between the profiled data and the real inference time. Does anyone know where this gap comes from (e.g., transferring the image to the stick and retrieving the result), and how I can get performance matching the profiled numbers?

Another question: the inference results differ from Caffe running the same caffemodel (using the C++ classifier). How do I get the same results as with Caffe?
Caffe: AlexNet
0.3094 - "n02124075 Egyptian cat"
0.1761 - "n02123159 tiger cat"
0.1221 - "n02123045 tabby, tabby cat"
0.1132 - "n02119022 red fox, Vulpes vulpes"
0.0421 - "n02085620 Chihuahua"

NCS: AlexNet
Egyptian cat (69.19%) tabby, tabby cat (6.59%) grey fox, gray fox, Urocyon cinereoargenteus (5.42%) tiger cat (3.93%) hare (3.52%)
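
A likely source of the score mismatch is input preprocessing (on top of the fp16 arithmetic the NCS uses internally): the two pipelines only produce comparable outputs if the image is resized, channel-ordered, and mean-subtracted identically. Below is a minimal sketch of typical AlexNet preprocessing; the 227x227 input size and the ImageNet mean values are common defaults assumed here, not taken from the SDK:

import cv2
import numpy as np

MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)  # assumed ImageNet mean

def preprocess(path, size=227):
    img = cv2.imread(path)                          # OpenCV loads BGR, as Caffe expects
    img = cv2.resize(img, (size, size)).astype(np.float32)
    img -= MEAN_BGR                                 # per-channel mean subtraction
    return img.astype(np.float16)                   # NCS graphs consume fp16 tensors

If the NCS example applies a different mean (or none at all) than the Caffe C++ classifier, the top-5 rankings will diverge exactly as shown above.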

Comments

  • Hi akey,

    We found an issue with our "ncapi/tools/convert_models.sh" script: the argument "-s 12" needs to be added to the mvNCCompile.pyc calls to enable all the vector engines. Please run the updated script to regenerate the graph files, and you should see performance similar to what you were seeing with "make example01".

    Thank You
    Ramana @ Intel

    Before the change

    ubuntu@ubuntu-UP:~/workspace/MvNC_SDK/ncapi/c_examples$ ./ncs-fullcheck ../networks/GoogLeNet/ ../images/512_Amplifier.jpg
    OpenDevice 4 succeeded
    Graph allocated
    radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
    Inference time: 569.302185 ms, total time 575.650308 ms
    radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
    Inference time: 556.881409 ms, total time 562.636079 ms
    Deallocate graph, rc=0
    Device closed, rc=0

    Change

    cd ../tools
    vi convert_models.sh
    ** Add -s 12 to all the compile commands

    #!/bin/sh

    NCS_TOOLKIT_ROOT='../../bin'
    echo $NCS_TOOLKIT_ROOT
    python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/SqueezeNet/NetworkConfig.prototxt -w ../networks/SqueezeNet/squeezenet_v1.0.caffemodel -o ../networks/SqueezeNet/graph -s 12
    python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/GoogLeNet/NetworkConfig.prototxt -w ../networks/GoogLeNet/bvlc_googlenet.caffemodel -o ../networks/GoogLeNet/graph -s 12
    python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/Gender/NetworkConfig.prototxt -w ../networks/Gender/gender_net.caffemodel -o ../networks/Gender/graph -s 12
    python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/Age/deploy_age.prototxt -w ../networks/Age/age_net.caffemodel -o ../networks/Age/graph -s 12
    python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/AlexNet/NetworkConfig.prototxt -w ../networks/AlexNet/bvlc_alexnet.caffemodel -o ../networks/AlexNet/graph -s 12

    Execute the script
    ./convert_models.sh

    cd ../c_examples

    After the change

    ubuntu@ubuntu-UP:~/workspace/MvNC_SDK/ncapi/c_examples$ ./ncs-fullcheck ../networks/GoogLeNet/ ../images/512_Amplifier.jpg
    OpenDevice 4 succeeded
    Graph allocated
    radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
    Inference time: 108.950851 ms, total time 115.101073 ms
    radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
    Inference time: 88.571877 ms, total time 95.765275 ms
    Deallocate graph, rc=0
    Device closed, rc=0

  • Much faster now. Continuous inference speed from a webcam is about 9.5 FPS for GoogLeNet. Thanks!

  • @akey can you tell me how you calculate the FPS for GoogLeNet, please?

  • @ibrahimsoliman in Python you can use:

    from timeit import default_timer as timer

    time_start = timer()
    # CODE: your inference call goes here
    time_end = timer()
    # default_timer() returns seconds, so FPS = 1 / elapsed (not 1000 / elapsed)
    print('FPS: %.2f fps' % (1 / (time_end - time_start)))
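
    A single-frame measurement is noisy; for a steadier figure, average over many frames. A minimal sketch, where run_inference() is a placeholder for whatever NCS call you are timing (e.g., a LoadTensor/GetResult pair):

    from timeit import default_timer as timer

    N = 50                      # number of frames to average over
    start = timer()
    for _ in range(N):
        run_inference()         # placeholder for your inference call
    end = timer()
    print('FPS: %.2f fps' % (N / (end - start)))
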
  • One thing I don't get, though, about NCS speed is why it is not running at the full 100 GOPS advertised. For example, in the SqueezeNet profile below (and in every other network) we can see:
    1. The MFLOPs estimate is 2x the actual op count. Is that because of fp16?
    2. The per-layer MFLOPs are processed at roughly one third of the advertised 100 GOPS; the ratio varies from 1/4 to 1/2 depending on the tensor and convolution type.
    Movidius/Intel guys, could you explain this and maybe give some advice on how to increase NCS efficiency?

    Detailed Per Layer Profile

    #    Layer Name          MFLOPs    Bandwidth (MB/s)   Time (ms)
    ...
    25   fire9/squeeze1x1     12.845    587.19             0.43
    26   fire9/expand1x1       6.423    150.65             0.37
    27   fire9/expand3x3      57.803    318.67             1.57
    28   conv10              200.704    272.92             4.28
    29   pool10                0.392    722.59             0.52
    30   prob                  0.003     10.49             0.18

    Total inference time: 26.89 ms
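
    As a sanity check on those numbers: effective throughput per layer is just MFLOPs divided by time in ms, which gives GFLOPS directly. A minimal sketch, with the values copied from the table above:

    from_table = {                      # (MFLOPs, time in ms) per layer
        'fire9/squeeze1x1': (12.845, 0.43),
        'fire9/expand1x1':  (6.423, 0.37),
        'fire9/expand3x3':  (57.803, 1.57),
        'conv10':           (200.704, 4.28),
    }
    for name, (mflops, ms) in from_table.items():
        # MFLOPs / ms == GFLOPS of effective throughput
        print('%-18s %6.1f GFLOPS' % (name, mflops / ms))

    conv10 comes out near 47 GFLOPS, i.e. roughly half of the advertised 100 GOPS, while the smaller fire9 layers fall well below that, consistent with the ratios noted above.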
