
Slow inference performance on a custom TensorFlow model built with quantization-aware training.

Hi there -

I am trying to evaluate both the NCS2 and the Google Accelerator on a custom TensorFlow model, using the CIFAR dataset.
Both devices are connected to a freshly installed Raspberry Pi 3B.

I used quantization-aware training with a TensorFlow Estimator to build the eval graph. My estimator fn is detailed in the attached txt.
I saved the eval model of the Estimator with classifier.experimental_export_all_saved_models().
Then I froze the graph as detailed in the attached txt.
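
For reference, the freeze step is roughly along these lines (the export directory path and the SavedModel tag are placeholders here; the exact code is in the attached txt):

    # Freeze the exported eval graph; assumes the fake-quant nodes were already
    # inserted by tf.contrib.quantize.create_eval_graph() before export.
    import tensorflow as tf
    from tensorflow.python.framework import graph_util

    with tf.Session(graph=tf.Graph()) as sess:
        # Load the eval SavedModel written by experimental_export_all_saved_models()
        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.EVAL], "exported_eval_model_dir")
        # Fold variables into constants, keeping only the softmax output node
        frozen_graph_def = graph_util.convert_variables_to_constants(
            sess, sess.graph.as_graph_def(), ["softmax_tensor"])
        with tf.gfile.GFile("frozen.pb", "wb") as f:
            f.write(frozen_graph_def.SerializeToString())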

Interestingly, I was able to use TOCO to build a TFLite model. This TFLite model was then converted successfully with Google's online Edge TPU compiler.
When I run this quantized model on the Google accelerator, the inference time is 0.2 s.
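
The conversion that worked was roughly along these lines (the input/output names match the Model Optimizer command below; the quantized_input_stats values are illustrative and depend on the preprocessing):

    import tensorflow as tf

    # Convert the frozen QAT graph to a fully quantized TFLite model
    converter = tf.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file="frozen.pb",
        input_arrays=["model_input/input"],
        output_arrays=["softmax_tensor"],
        input_shapes={"model_input/input": [1, 32, 32, 3]})
    converter.inference_type = tf.uint8  # fully quantized, as required by the Edge TPU
    converter.quantized_input_stats = {"model_input/input": (128.0, 128.0)}  # (mean, std)
    tflite_model = converter.convert()
    with open("cifar_quant.tflite", "wb") as f:
        f.write(tflite_model)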

I used the Model Optimizer to convert the graph to FP16 (DEBUG log attached) with the following command:

python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model frozen.pb --input model_input/input:0 --input_shape [1,32,32,3] --output softmax_tensor --data_type FP16 --log_level DEBUG

When I run this model on my RPi3, net.forward() takes 3 s.
I first tried running the same model (converted to FP32) on my macOS desktop; it runs in 0.15 s.
The object detection example runs on both devices with no performance issues.
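
For reference, my timing loop on the Pi is roughly the following (the IR file names are placeholders; I request the MYRIAD target explicitly and exclude the first forward pass, which loads the network onto the stick):

    import time
    import cv2
    import numpy as np

    # Load the IR produced by the Model Optimizer
    net = cv2.dnn.readNet("frozen.xml", "frozen.bin")
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)  # ask for the NCS2

    # Dummy CIFAR-sized input (32x32x3)
    blob = cv2.dnn.blobFromImage(np.zeros((32, 32, 3), dtype=np.uint8))
    net.setInput(blob)
    net.forward()  # first call loads the network onto the device, so exclude it

    runs = 100
    start = time.time()
    for _ in range(runs):
        net.forward()
    print("mean inference time: %.3f s" % ((time.time() - start) / runs))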

Three questions:
1. Does this mean that inference is not happening on the NCS2 but on the RPi3 CPU? How can I dig into what's really happening?
2. Am I missing something in the Model Optimizer? Do I need to use a config file or a pipeline config? (I don't see any unsupported operations in the attached log.)
3. Is my methodology for assessing the inference time correct?

Any advice is very welcome!

Thank you!

Comments

Hi @nicmaq,

The inference time does not seem correct. I would like to reproduce the issue; would it be possible to send me your code and model to test? By the way, it doesn't look like there is an attachment on this post.

Regards,
Jesus
