
LattePanda Alpha + OpenVINO + "CPU (Core m3) vs NCS1 vs NCS2", Performance comparison

Hello everyone.
The "UNet" model did not work in NCSDK, but it worked in OpenVINO.
"UNet" is Semantic Segmentation model.
https://github.com/PINTO0309/TensorflowLite-UNet/raw/master/model/semanticsegmentation_frozen_person_32.pb

Interestingly, the CPU had better performance than the "Neural Compute Stick" and the "Neural Compute Stick 2".
For the moment, I do not see much practical value in the NCS2.
◆Japanese Article
Introducing Ubuntu 16.04 + OpenVINO to the LattePanda Alpha 864 (OS not included) and enjoying Semantic Segmentation with the Neural Compute Stick and Neural Compute Stick 2


Comments

  • edited November 2018

    @Gemini91

    I found some free time, so I measured it.
    Unfortunately, the NCS2 produced the following error, so it could not be measured.

    E: [xLink] [         0] dispatcherEventReceive:308  dispatcherEventReceive() Read failed -4 | event 0x7fa9fb7fdef0 USB_READ_REL_RESP
    E: [xLink] [         0] eventReader:254 eventReader stopped
    E: [xLink] [         0] dispatcherWaitEventComplete:694 waiting is timeout, sending reset remote event
    E: [ncAPI] [         0] ncFifoReadElem:2853 Packet reading is failed.
    E: [ncAPI] [         0] ncFifoDestroy:2672  Failed to write to fifo before deleting it!
    

    Again, the CPU is overwhelmingly faster.
    All measurement units are milliseconds.
    (Densenet results table)

  • edited February 13

    @dhoa

    However, I saw you said that the implementation is not beautiful. Does that mean the performance drops a lot when we optimize it with OpenVINO?

    No.
    By "not beautiful" I mean the following:
    1. I replaced the layers that OpenVINO does not support with pure TensorFlow calls.
    2. I call TensorFlow a total of two times, for pre-processing and for post-processing.
    3. Since the OpenVINO tutorial "Offloading Computations to TensorFlow" does not work properly, some layers run inference on the CPU without going through OpenVINO.
    https://software.intel.com/en-us/articles/OpenVINO-ModelOptimizer#offloading-computations-tensorflow

    I would simply like to process all the layers with OpenVINO functions.
    I just meant that the program is cumbersome and not concise.
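    To illustrate, here is a rough Python sketch of that structure (this is not the actual program; the graph file and tensor names are placeholders): TensorFlow handles the unsupported head and tail of the network, and OpenVINO runs the convertible middle.

    import tensorflow as tf
    from openvino.inference_engine import IENetwork, IEPlugin

    # OpenVINO part: the layers the Model Optimizer could convert.
    net = IENetwork(model="middle.xml", weights="middle.bin")
    exec_net = IEPlugin(device="CPU").load(network=net)
    input_blob = next(iter(net.inputs))
    out_blob = next(iter(net.outputs))

    # TensorFlow part: the unsupported layers, called twice per frame.
    graph_def = tf.GraphDef()
    with open("pre_post.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")
    sess = tf.Session()

    def run(frame):
        x = sess.run("preprocessed:0", feed_dict={"input:0": frame})  # TF call 1 (pre)
        y = exec_net.infer({input_blob: x})[out_blob]                 # OpenVINO middle
        return sess.run("output:0", feed_dict={"mid:0": y})           # TF call 2 (post)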

    How about other models like UNet?

    Because UNet runs using only OpenVINO functions, it is very simple.

    I saw you said it is about 1 FPS, but do you have any updates on it?

    No. Since Intel does not publish the "caffemodel" and "solver", I would like to customize it, but I cannot.

  • @dhoa

    It is a difference in the performance of the models.
    accuracy: DeepLab < Mask RCNN
    speed: DeepLab > Mask RCNN

    Japanese article comparing DeepLab vs Mask RCNN:
    https://jyuko49.hatenablog.com/entry/2018/11/17/145904

    However, the improvement in CPU inference performance from OpenVINO (with MKL-DNN) is spectacular.
    It seems to be a mechanism whose performance improves as the number of CPU cores and threads increases.
    I think that performance comparisons between different models have little meaning.
    Accuracy and speed are always a trade-off.

  • @sri

    It just seems that it's better to not use the NCS at all.

    For the NCS (first generation), you are right.

    I don't seem to see any performance improvement by running the NCS with the CPU. Is that something you have observed as well?

    Yes, that's right.
    The Myriad 2 (NCS) is too low in performance.
    On the other hand, CPU optimization with OpenVINO demonstrates very good performance.

  • I bought four NCS2 units; I will verify at a later date how useful the "Multiple NCS Devices" approach below is.

    Multiple NCS Devices
    https://software.intel.com/en-us/articles/transitioning-from-intel-movidius-neural-compute-sdk-to-openvino-toolkit
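    As a starting point, here is a minimal sketch of how I expect this to work with the 2018-era OpenVINO Python API (my assumption, based on the article above: each load() on the MYRIAD plugin claims one free stick; model paths are placeholders):

    from openvino.inference_engine import IENetwork, IEPlugin

    net = IENetwork(model="UNet/FP16/model.xml", weights="UNet/FP16/model.bin")
    plugin = IEPlugin(device="MYRIAD")

    # One executable network per stick (four sticks -> four loads).
    exec_nets = [plugin.load(network=net) for _ in range(4)]

    input_blob = next(iter(net.inputs))
    out_blob = next(iter(net.outputs))

    def infer_round_robin(frames):
        # Feed frames to the sticks in turn (synchronous, for simplicity).
        return [exec_nets[i % 4].infer({input_blob: f})[out_blob]
                for i, f in enumerate(frames)]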

  • Did you compare the performance of the NCS on NCSDK and on OpenVINO? I just ran a customized Densenet on NCS@NCSDK, NCS@OpenVINO, and NCS2@OpenVINO. Only Conv, Concat, Relu, and BatchNorm layers exist in this network, and the results are so incredibly different that I'm wondering if I did something wrong... NCS@NCSDK takes 0.45 s for one inference while NCS@OpenVINO takes 0.65 s, and NCS2@OpenVINO takes 0.001 s!?

  • edited November 2018

    @Gemini91

    Did you compare the performance of the NCS on NCSDK and on OpenVINO?

    No. My "UNet" model did not work in NCSDK.
    Therefore, unfortunately, it cannot be verified with NCSDK.

    NCS, NCS2 = FP16
    CPU = FP32

    I used the conversion script below.

    For FP16 (For NCS/NCS2)

    $ sudo python3 mo_tf.py \
    --input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
    --output_dir 10_lrmodels/UNet/FP16 \
    --input input \
    --output output/BiasAdd \
    --data_type FP16 \
    --batch 1
    

    For FP32 (For CPU)

    $ sudo python3 mo_tf.py \
    --input_model 01_pbmodels/UNet/semanticsegmentation_frozen_person_32.pb \
    --output_dir 10_lrmodels/UNet/FP32 \
    --input input \
    --output output/BiasAdd \
    --data_type FP32 \
    --batch 1
    

    Because your model type and mine are different, I cannot simply compare performance.
    If you can provide your model, I may be able to verify it.

    Same issue
    https://ncsforum.movidius.com/discussion/1320/slow-fps-on-neural-compute-stick-2
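    For anyone who wants to reproduce these numbers, here is a minimal timing sketch using the OpenVINO Python API (assumptions: the IR paths produced by the mo_tf.py commands above, and simplified preprocessing). It reports per-inference latency in milliseconds:

    import time
    import cv2
    import numpy as np
    from openvino.inference_engine import IENetwork, IEPlugin

    model = "10_lrmodels/UNet/FP32/semanticsegmentation_frozen_person_32"
    net = IENetwork(model=model + ".xml", weights=model + ".bin")
    plugin = IEPlugin(device="CPU")  # use "MYRIAD" with the FP16 IR for NCS/NCS2
    exec_net = plugin.load(network=net)

    input_blob = next(iter(net.inputs))
    n, c, h, w = net.inputs[input_blob].shape

    image = cv2.imread("test.jpg")
    image = cv2.resize(image, (w, h)).transpose((2, 0, 1))  # HWC -> CHW
    image = image.reshape((n, c, h, w)).astype(np.float32)

    start = time.time()
    exec_net.infer({input_blob: image})
    print("inference: %.2f ms" % ((time.time() - start) * 1000.0))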

  • I put my model file in Dropbox. Do you mind running a test with it on your hardware? The input node is named "input" and has a shape of (1, 32, 840, 3). The output node is named "output" and has a shape of (1, 1, 794745).
    https://www.dropbox.com/s/snbgwzj9p2xkwpm/densenet_frozen.pb?dl=0

  • @Gemini91

    OK.
    However, it is already late at night in Japan, so I am out of working hours. Please wait a few days.

  • edited November 2018

    My latest test results are pretty much consistent with yours. I think OpenVINO is doing some very tricky CPU-architecture-specific optimization inside, so it benefits Intel's CPUs the most, and the performance boost also depends somewhat on the network structure.

    In fact, the demo programs inside OpenVINO should be an easy and fair test for the NCS2. I ran demo_squeezenet_download_convert_run.sh on one super old CPU, one modern CPU, the NCS, and the NCS2, and the results are as follows:

    Hardware                        | Time Consumption | Command
    Intel® Celeron® Processor J1900 | 42.52 ms         | demo_squeezenet_download_convert_run.sh -d CPU
    Intel® Xeon® CPU E5-1603 v4     | 3.61 ms          | demo_squeezenet_download_convert_run.sh -d CPU
    NCS                             | 28.67 ms         | demo_squeezenet_download_convert_run.sh -d MYRIAD
    NCS2                            | 9.34 ms          | demo_squeezenet_download_convert_run.sh -d MYRIAD

    It seems that a modern CPU with OpenVINO is indeed much faster than the NCS2.

  • @Gemini91

    Thank you for providing detailed information.
    It was very helpful.
    It seems that the NCS2 is most meaningful in combination with a low-performance CPU.

  • @PINTO Thanks for providing info about NCS and NCS2 performance. But power consumption is also important. By the way, may I ask whether the NCS2 can run MTCNN with OpenVINO?

  • @curry_best

    310 mA - 370 mA on a USB 2.0 port.
    Unfortunately, my measuring device does not support USB 3.0.

    May I ask whether the NCS2 can run MTCNN with OpenVINO?

    Since OpenVINO accepts only inputs with a fixed scale and a fixed batch size, I cannot tell whether it will work without trying it.
    Probably, unless you devise some workaround, the standard repository programs below will not work.
    https://github.com/ipazc/mtcnn.git
    https://github.com/AITTSMD/MTCNN-Tensorflow.git
    https://github.com/CongWeilin/mtcnn-caffe.git

    If possible, I would like you to try it.

  • @PINTO thanks, I would like to try it if I get an NCS2.

  • @curry_best

    Here's a demo sample of face landmark detection.
    1. face detection
    2. gender
    3. head pose
    4. emotions
    5. facial landmarks
    https://software.intel.com/en-us/articles/OpenVINO-InferEngine#inpage-nav-7-12

  • Hello.
    I implemented real-time semantic segmentation with OpenVINO and CPU only (LattePanda Alpha).
    0.9 FPS - 1.0 FPS

    OpenVINO + ADAS (Semantic Segmentation) + Python 3.5
    https://github.com/PINTO0309/OpenVINO-ADAS.git

  • So skip the stick and spend more money on an i7?

  • @chicagobob123

    By the way, my CPU is a Core m3, so I think it is on the inexpensive side.
    Based on my survey results, if you want to maximize speed without a GPU, I think you should use an i7 or higher.
    However, I do not really recommend that, because buying both an NCS2 and an i7 is expensive.
    And power consumption also increases with CPU performance.
    I think it is better to wait until OpenVINO supports ARM before purchasing an NCS2.

  • Did you get an UP Board when it was on sale for $40 from Intel? I got two since they are way more powerful than a Pi. Going to see how that works with the NCS.

  • @chicagobob123

    Did you get an UP Board when it was on sale for $40 from Intel?

    $40!? Isn't that a mistake for $170?
    How affordable!!
    It seems I missed the opportunity...

    Going to see how that works with the NCS.

    If possible, please tell us the result.

    Is the CPU an Intel Atom?

  • Yes, it's an Atom processor with connections similar to a Pi.

    4GB DDR3L-1600
    Intel® Atom™ x5-Z8350
    Sadly, they are $89 again.
    https://click.intel.com/aaeon-up-board.html

    Bob

  • @chicagobob123

    Thank you for providing the information, Bob.
    I am very interested in how much performance it gets.

    OpenVINO uses MKL-DNN to run inference in parallel across multiple threads inside the CPU, which seems to be how it achieves its speed.
    Performance appears to scale with the number of CPU cores, the performance of each core, and the total number of threads; see the sketch below.
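    If you want to observe that scaling directly, here is a small sketch (my assumption: the CPU plugin's thread count is exposed as the "CPU_THREADS_NUM" config key in the Python API; model paths are placeholders):

    from openvino.inference_engine import IENetwork, IEPlugin

    net = IENetwork(model="model.xml", weights="model.bin")

    for threads in ("1", "2", "4"):
        plugin = IEPlugin(device="CPU")
        plugin.set_config({"CPU_THREADS_NUM": threads})  # limit MKL-DNN threads
        exec_net = plugin.load(network=net)
        # ...time exec_net.infer(...) here and compare the latencies...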

  • The boards shipped, but now they won't get here until Tuesday, so I should have them by the end of the week.
    I posted here and suddenly got my download link, and grabbed as much as I could while at work. I will try to install the Linux version on my old laptop.

  • I tried implementing semantic segmentation with "OpenVINO + DeeplabV3 + Core m3".
    I got about 4-5 FPS.
    However, the implementation is not beautiful...
    https://github.com/PINTO0309/OpenVINO-DeeplabV3.git

    I referred to the following article.
    https://medium.com/@oleksandrsavsunenko/optimizing-neural-networks-for-production-with-intels-openvino-a7ee3a6883d

  • Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    CPU Only mode.

  • Hi, thank you all for the very detailed information. I am doing a project on image segmentation and tried first with Mask RCNN. However, the inference time is quite long; I get about 0.5 FPS.

    Today I came across the post of @PINTO using DeepLab and was very impressed by its speed. I now get 10 times faster results (5 FPS). However, I saw you said that the implementation is not beautiful. Does that mean the performance drops a lot when we optimize it with OpenVINO? How about other models like UNet? I saw you said it is about 1 FPS, but do you have any updates on it?

    Thank you in advance,

  • Thanks for your response @PINTO. Anyway, at this moment DeepLab is the best choice for speed, right? Does it get this performance compared to other models like UNet or Mask RCNN because of its light architecture, or because of the OpenVINO functions? In my case with Mask RCNN, I find that OpenVINO doesn't improve the speed much.

    Regards,
    Hoa

  • @PINTO thank you so much for your explanation. I have just one more question: at this stage, how about using a framework other than TensorFlow with OpenVINO? Actually, I am more familiar with PyTorch (fastai) and see that OpenVINO does not yet support PyTorch; to use it, one needs to convert to ONNX. I tried once with SSD and could not run the Model Optimizer. What is your idea about using PyTorch with OpenVINO? Sorry if this is not closely related to this thread, but I'm quite new to this and an expert's opinion is really valuable to me.

  • edited February 13

    @dhoa

    Actually, I am more familiar with PyTorch (fastai) and see that OpenVINO does not yet support PyTorch; to use it, one needs to convert to ONNX.
    What is your idea about using PyTorch with OpenVINO?

    You do not have to stick to ONNX.
    You can try converting to TensorFlow or Caffe, with reference to the link below.
    Since the set of layers supported by OpenVINO varies depending on the source framework of the conversion, I recommend trying multiple conversion paths.
    PyTorch -> Tensorflow -> OpenVINO
    PyTorch -> Caffe -> OpenVINO
    https://github.com/PINTO0309/Keras-OneClassAnomalyDetection#13-model-convert
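    For reference, if you do try the ONNX route you mentioned, the export itself is short in PyTorch (a sketch; the model and input shape are placeholders). The resulting file can then be passed to OpenVINO's mo.py as --input_model model.onnx:

    import torch
    import torchvision

    model = torchvision.models.resnet18(pretrained=True)
    model.eval()

    # OpenVINO needs a fixed input shape, so export with a fixed-size dummy input.
    dummy = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy, "model.onnx",
                      input_names=["input"], output_names=["output"])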
