Inference time increased even though FLOPs reduced after reducing network width and parameters

Hi all, I have pruned away some of the least significant filters from my neural network, and I profiled the network before and after pruning. FLOPs are reduced, but the time spent in those layers has increased.
@Tome_at_Intel, is there a hardware concept in your SDK similar to the CUDA interface? Do we have to use layer sizes in powers of two, along the lines of warp sizes, threads, and blocks in CUDA? If documentation is available for maximum performance gain, it would be beneficial.
Please find the attached image: https://drive.google.com/open?id=1SvCrziaF_wHtY-CTboWZWqUpEUVdcsoR
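
For concreteness, "least significant" below assumes the common L1-norm (magnitude) criterion, which the post does not actually specify; the file names and the layer name 'conv1' are placeholders. A minimal pycaffe sketch of such a ranking, together with the standard conv-layer FLOP estimate:

```python
# Sketch only: L1-norm filter ranking with pycaffe, plus the standard
# conv-layer FLOP count. File names and 'conv1' are placeholders.
import numpy as np
import caffe

def filter_l1_norms(net, layer):
    """L1 norm of each output filter; conv weights have shape (out, in, kh, kw)."""
    w = net.params[layer][0].data
    return np.abs(w).reshape(w.shape[0], -1).sum(axis=1)

def conv_flops(c_out, c_in, kh, kw, h_out, w_out):
    """FLOPs for one conv layer, counting each multiply-accumulate as 2 FLOPs."""
    return 2 * c_out * c_in * kh * kw * h_out * w_out

net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
norms = filter_l1_norms(net, 'conv1')
weakest = np.argsort(norms)[:8]  # e.g. the 8 least significant filters
print('candidates to prune from conv1:', weakest)
```

Removing filters shrinks the FLOP count proportionally, but on an accelerator the runtime of a layer also depends on how its dimensions map onto the hardware's vector width, which is presumably why time can rise while FLOPs fall.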

Comments

  • @chinthysl Thanks for reporting this. Can you share how you pruned your model to reduce the MFLOPs? Additionally, can you provide both the original and pruned models so that I may reproduce/debug on my end? Thanks.

    At the moment, we don't have a performance tuning guide for the NCS and NCSDK.

  • @Tome_at_Intel Please find the .caffemodel and .prototxt files I used to generate the Movidius graph files here: https://drive.google.com/open?id=1VDDg8IAtttieVhqzMfOvCLuSeDe4bRDn . Also, the accuracy of the Movidius graph inference drops by 50%, even though the caffemodel inference accuracy drops by only 10%. It seems the Movidius compiler performs some additional reduction on the pruned network as well. If you can analyze this and give us some tips on designing network architectures (e.g. layer sizes) that suit the compiler well, that would be beneficial (see the comparison sketch below).
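
One way to quantify that host-vs-device accuracy gap is to push the same input through pycaffe and through the compiled graph on the Neural Compute Stick, then compare outputs (the NCSDK's mvNCCheck tool performs a comparison along these lines). A minimal sketch against the NCSDK 1.x Python API; the file names, the random test input, and the HWC input layout are assumptions:

```python
# Sketch only (NCSDK 1.x Python API): run one input through host-side Caffe
# and through a compiled NCS graph, then compare outputs. 'deploy.prototxt',
# 'model.caffemodel', and 'graph' are placeholders for the thread's files.
import numpy as np
import caffe
from mvnc import mvncapi as mvnc

# Host-side reference inference with pycaffe.
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
x = np.random.rand(*net.blobs['data'].data.shape).astype(np.float32)  # NCHW
net.blobs['data'].data[...] = x
caffe_out = net.forward()[net.outputs[0]].flatten()

# Device-side inference with the graph produced by mvNCCompile.
devices = mvnc.EnumerateDevices()
if not devices:
    raise RuntimeError('No NCS device found')
device = mvnc.Device(devices[0])
device.OpenDevice()
with open('graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())

# The NCS takes FP16 input; HWC layout assumed here (check your model).
graph.LoadTensor(x[0].transpose(1, 2, 0).astype(np.float16), 'user object')
ncs_out, _ = graph.GetResult()

print('max abs difference:', np.abs(caffe_out - ncs_out.flatten()).max())

graph.DeallocateGraph()
device.CloseDevice()
```

A large gap on even a single input would point at the compiler or the FP16 conversion rather than at the pruning itself.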
