Conversion of frozen TensorFlow Graph to Movidius Graph

I'm getting the following errors when trying to convert a frozen model (.pb file) with the mvNCCompile command on Ubuntu 16.04 with TensorFlow 1.7 and MNCS SDK 2.04.

The first error is:
    Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>;

The model is a GAN that was trained on a GPU and then saved as a frozen model using a CPU-only TensorFlow install.

Graph files are here:
https://drive.google.com/drive/folders/1_v-XhhclGhbrrfVGM7Q0JiQYDDmrO_aP?usp=sharing

Full stacktrace attached.
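For anyone hitting the same `dilations` error: the message means the TensorFlow that loads the graph registers a Conv2D op without a `dilations` attribute, i.e. it is older than the TensorFlow that wrote the .pb. A quick way to see which nodes are affected is to scan the GraphDef for the offending attributes. This is a minimal sketch, not an NCSDK tool: `load_graph_def` and `find_nodes_with_attrs` are hypothetical helper names, it assumes the TensorFlow 1.x `tf.GraphDef` API, and the default attribute list is just the two attributes that come up in this thread.

```python
def load_graph_def(path):
    """Parse a frozen .pb file into a GraphDef (TensorFlow 1.x API)."""
    import tensorflow as tf  # imported here so the scan below works without TF installed
    graph_def = tf.GraphDef()  # tf.compat.v1.GraphDef on TensorFlow 2.x
    with open(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def find_nodes_with_attrs(graph_def, attr_names=("dilations", "index_type")):
    """Return (node name, op type, attr name) for each node carrying one of attr_names."""
    hits = []
    for node in graph_def.node:
        for attr in attr_names:
            if attr in node.attr:
                hits.append((node.name, node.op, attr))
    return hits
```

For example, `find_nodes_with_attrs(load_graph_def("/tmp/frozen_graph.pb"))` lists every node that still carries `dilations` or `index_type` after freezing.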

Comments (11)

  • I thought the errors might be caused by GPU-specific instructions, so I retrained the model on CPU-only TensorFlow instead and got the same errors.

    CPU graph files:
    https://drive.google.com/drive/folders/1JuOM7yh_9pxaM_kt2N4IFED0lwpH3kL4?usp=sharing

    This is a link to the GAN code:
    https://github.com/andrewginns/CycleGAN-Tensorflow-PyTorch

  • I tried again, this time training the network with TF 1.6 (CPU) and Python 2.7. Same error as before.

  • I managed to fix the previous errors by adding some code to my freeze_graph.py to rewrite some ops and strip attributes:

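        for node in output_graph_def.node:
          # Turn ref ops into plain ops; point moving_mean/moving_variance inputs at their '/read' tensors
          if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in range(len(node.input)):  # range works on both Python 2 and 3
              if 'moving_' in node.input[index]:
                node.input[index] = node.input[index] + '/read'
          elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr:
              del node.attr['use_locking']
          # Drop attrs that the older TensorFlow used by mvNCCompile does not recognise
          if "dilations" in node.attr:
            del node.attr["dilations"]
          if "index_type" in node.attr:
            del node.attr["index_type"]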
    

    However, I'm now getting:

        if d.decorator_argspec is not None), _inspect.getargspec(target))
        [Error 5] Toolkit Error: Stage Details Not Supported: FusedBatchNorm inputs mean and variance are not defined.  The graph is not created for inference.
    

    I'm assuming that I need to convert the graph for inference using the TF Graph Transform Tool, as in this thread: https://ncsforum.movidius.com/discussion/590/indexerror-list-index-out-of-range-trying-to-compile-tf-model

    However, I'm a little unclear on how the mean and variance inputs of FusedBatchNorm should be defined.
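On the FusedBatchNorm question: in training mode (`is_training=True`) FusedBatchNorm computes the batch statistics itself and its mean and variance inputs are left empty, whereas in inference mode they are fed from the moving mean and variance. The NCSDK message suggests it only accepts the inference form, which usually means rebuilding the generator with the batch-norm training flag off (for example `is_training=False` for `tf.contrib.layers.batch_norm` or `training=False` for `tf.layers.batch_normalization`) before freezing. Below is a minimal sketch for checking a frozen graph, assuming a GraphDef already parsed from the .pb with TensorFlow 1.x; `training_mode_batch_norms` is a hypothetical helper name.

```python
# Fused batch-norm op types; the V2/V3 variants appear in later TensorFlow releases.
FUSED_BN_OPS = {"FusedBatchNorm", "FusedBatchNormV2", "FusedBatchNormV3"}


def training_mode_batch_norms(graph_def):
    """Return names of fused batch-norm nodes that are still in training mode."""
    names = []
    for node in graph_def.node:
        if node.op not in FUSED_BN_OPS:
            continue
        # is_training defaults to True, so a stripped (absent) attr also means training mode.
        # AttrValue stores booleans in its `b` field.
        is_training = node.attr["is_training"].b if "is_training" in node.attr else True
        if is_training:
            names.append(node.name)
    return names
```

An empty result means every fused batch norm in the frozen graph was exported in inference form.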

  • I went back to the official freeze_graph and transform_graph instructions, building both tools with Bazel as described in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md

    Ubuntu 16.04, Python 2.7, TF 1.6 (CPU), MNCS SDK 2.04

    I added some code to the standard freeze_graph.py to try to work around the following errors:

    1) moving average error:

        ValueError: graph_def is invalid at node 'a2b_generator/Conv/BatchNorm/AssignMovingAvg': Input tensor 'a2b_generator/Conv/BatchNorm/moving_mean:0' Cannot convert a tensor of type float32 to an input of type float32_ref.
    

    2) dilation error:

        NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T;
    

    However, the removal of the dilations attribute doesn't seem to be working, because mvNCCompile still returns the same error as in my first post. Neither the Bazel-built freeze_graph nor my simple_freeze_graph works.

    freeze_graph.py modifications:

        # Rewrite ref ops and strip attrs that the older TensorFlow rejects
        for node in output_graph_def.node:
          if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in range(len(node.input)):  # range works on both Python 2 and 3
              if 'moving_' in node.input[index]:
                node.input[index] = node.input[index] + '/read'
          elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr:
              del node.attr['use_locking']
          if "index_type" in node.attr:
            del node.attr["index_type"]
          if "dilations" in node.attr:
            del node.attr["dilations"]
            print("Removed attr 'dilations'")
    

    My freeze_graph command is:

        bazel-bin/tensorflow/python/tools/freeze_graph \
          --input_graph=graph.pb \
          --input_checkpoint="Epoch_(0)_(100of962).ckpt" \
          --output_graph=/tmp/frozen_graph.pb --output_node_names=a2b_generator/Tanh
    

    My transform_graph command is:

        bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
          --in_graph=/tmp/frozen_graph.pb \
          --out_graph=/tmp/optimized_graph.pb \
          --inputs='Placeholder' \
          --outputs='a2b_generator/Tanh' \
          --transforms='
            strip_unused_nodes(type=float, shape="1,299,299,3")
            remove_nodes(op=Identity, op=CheckNumerics)
            fold_constants(ignore_errors=true)
            fold_batch_norms'
    

    My mvNCCompile command is:

        mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh
    

    All files are here: https://drive.google.com/drive/folders/1QKptbWQPqS974bcSfTo_rFYAbuLkhLIt?usp=sharing

    - graph.pb: the GraphDef proto
    - frozen_graph: the output of freeze_graph
    - optimised_graph: the output of transform_graph and the input to mvNCCompile
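The transform_graph step can also be run from Python, which avoids building the tool with Bazel and makes it easy to run it under exactly the same TensorFlow version as the freeze step. This is a minimal sketch assuming TensorFlow 1.x, where `TransformGraph` is importable from `tensorflow.tools.graph_transforms`; `optimize_frozen_graph` is a hypothetical helper name, and the default node names and transforms are copied from the transform_graph command in this post.

```python
# Transforms copied from the transform_graph command in this post.
TRANSFORMS = [
    'strip_unused_nodes(type=float, shape="1,299,299,3")',
    "remove_nodes(op=Identity, op=CheckNumerics)",
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",
]


def optimize_frozen_graph(in_path, out_path, inputs=("Placeholder",),
                          outputs=("a2b_generator/Tanh",), transforms=TRANSFORMS):
    """Apply Graph Transform Tool passes to a frozen .pb from Python (TensorFlow 1.x)."""
    import tensorflow as tf
    from tensorflow.tools.graph_transforms import TransformGraph

    graph_def = tf.GraphDef()
    with open(in_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    optimized = TransformGraph(graph_def, list(inputs), list(outputs), list(transforms))
    with open(out_path, "wb") as f:
        f.write(optimized.SerializeToString())
    return optimized
```

Note that the `1,299,299,3` shape passed to strip_unused_nodes differs from the `[1, 256, 256, 3]` input shape that mvNCCompile reports later in the thread; if the generator expects 256x256 inputs, it is probably worth making the two agree.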

  • @ginnsandrew At the moment, the NCSDK doesn't support Generative Adversarial Networks.

  • @Tome_at_Intel For all intents and purposes, a GAN is just a way of training a convolutional network.

    Is the error I'm getting specific to the use of a GAN? During inference the network should look like an ordinary convolutional network. As far as I can tell, the error is due to a mismatch between the TF versions used for training and for inference. Does the MNC SDK use something other than Python 2.7 and TF 1.6?

  • @Tome_at_Intel
    It turns out the previous error was caused by using a different TF version for freezing and transforming my graph file. Using TF 1.6 for both freeze_graph and transform_graph fixed it.
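One way to confirm this kind of mismatch is to compare the GraphDef version stamped into the .pb with the one the installed TensorFlow produces. The tracebacks in this thread show mvNCCompile importing TensorFlow from /usr/local/lib/python3.5/dist-packages, so the check is most useful when run with that Python 3.5 interpreter rather than Python 2.7. This is a minimal sketch assuming TensorFlow 1.x; `graph_is_newer_than_runtime` and `report_versions` are hypothetical helper names, and a newer producer version is a warning sign rather than proof that loading will fail.

```python
def graph_is_newer_than_runtime(graph_def, runtime_graph_def_version):
    """True if the graph's producer version is newer than the runtime's GraphDef version."""
    # versions.producer is recorded by the TensorFlow that serialised the graph.
    return graph_def.versions.producer > runtime_graph_def_version


def report_versions(path):
    """Print the installed TensorFlow version next to the .pb file's producer version."""
    import tensorflow as tf  # TensorFlow 1.x API
    graph_def = tf.GraphDef()
    with open(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    print("TensorFlow", tf.__version__, "GraphDef version", tf.GRAPH_DEF_VERSION)
    print("graph producer version", graph_def.versions.producer)
    return graph_is_newer_than_runtime(graph_def, tf.GRAPH_DEF_VERSION)
```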

    I now get a new error:

        mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh
        mvNCCompile v02.00, Copyright @ Intel Corporation 2017

        /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py:871: DeprecationWarning: builtin type EagerTensor has no __module__ attribute
        /usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
        /usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
        shape: [1, 256, 256, 3]
        res.shape:  (1, 256, 256, 3)
        TensorFlow output shape:  (256, 256, 3)
        Traceback (most recent call last):
          File "/usr/local/bin/mvNCCompile", line 156, in <module>
            create_graph(args.network, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights, args.explicit_concat, args.ma2480, args.scheduler, args)
          File "/usr/local/bin/mvNCCompile", line 137, in create_graph
            load_ret = load_network(args, parser, myriad_config)
          File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 95, in load_network
            network.optimize()
          File "/usr/local/bin/ncsdk/Models/Network.py", line 250, in optimize
            self.convert_network_input_to_yxz()
          File "/usr/local/bin/ncsdk/Models/Network.py", line 337, in convert_network_input_to_yxz
            if self.stageslist[0].op in [StageType.fully_connected_layer, StageType.convolution, StageType.max_pooling,
        IndexError: list index out of range
    
    
  • @ginnsandrew Apologies, I meant that we don't have a GAN example for the NCSDK at the moment. The NCSDK can also be used with Python 3.5. I am looking into your issue and I'll get back to you as soon as I find something. Thanks.

  • @Tome_at_Intel Thanks. My latest files are here: https://drive.google.com/drive/folders/1U_sw-P-qYZ4ACtso5HqI0thcmbCmOa1H?usp=sharing

    graph.pb - GraphDef proto
    frozen_graph.pb - Output from freeze_graph
    optimised_graph.pb - Output from transform_graph

    Python 2.7.12, TF 1.6, Bazel 0.11.0, MNC SDK 2.04, Ubuntu 16.04.4

    Commands used:

        bazel-bin/tensorflow/python/tools/freeze_graph \
          --input_graph=graph.pb \
          --input_checkpoint="Epoch_(0)_(100of962).ckpt" \
          --output_graph=/tmp/frozen_graph.pb --output_node_names=a2b_generator/Tanh

        bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
          --in_graph=/tmp/frozen_graph.pb \
          --out_graph=/tmp/optimized_graph.pb \
          --inputs='Placeholder' \
          --outputs='a2b_generator/Tanh' \
          --transforms='
            strip_unused_nodes(type=float, shape="1,299,299,3")
            remove_nodes(op=Identity, op=CheckNumerics)
            fold_constants(ignore_errors=true)
            fold_batch_norms'

        mvNCCompile /tmp/optimized_graph.pb -in Placeholder -on a2b_generator/Tanh
    
  • @ginnsandrew Just wanted to give you an update. It looks like the NCSDK was not able to find any of the ops while parsing the model's graph file. I'm not sure why, because I know for a fact that we support some of these ops. While debugging the model, I printed the nodes in the stageslist list inside Network.py and it was empty, which is why you get the list index out of range error. I found this strange, because when I used a separate script to read and print the nodes from the model, they were all there.
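The separate script mentioned above isn't shown, but a minimal version of that kind of check is to count the op types in the GraphDef, which makes it easy to compare what TensorFlow sees with what the NCSDK parser picks up. This is a sketch assuming `graph_def` is a GraphDef already parsed from the .pb file (for example with `tf.GraphDef().ParseFromString` in TensorFlow 1.x); `op_histogram` and `print_op_histogram` are hypothetical helper names.

```python
from collections import Counter


def op_histogram(graph_def):
    """Count the nodes of each op type in a GraphDef."""
    return Counter(node.op for node in graph_def.node)


def print_op_histogram(graph_def):
    """Print one line per op type, sorted alphabetically by op name."""
    for op, count in sorted(op_histogram(graph_def).items()):
        print("%-24s %d" % (op, count))
```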

  • @Tome_at_Intel Thanks for looking into it, I really appreciate it. I think it was actually a problem with the way I was saving the graphs. For some reason, the standard freeze_graph tools don't seem to work with graphs that contain BatchNorm layers (which mine do).

    With my new files, I no longer get the list index out of range error.

    Instead, my new graph, optimised_graph.pb, produces this error:

        mvNCCompile /media/sf_vBox/optimized_graph.pb -in inputA -on a2b_generator/output_image
        /usr/local/bin/ncsdk/Controllers/Parsers/TensorFlowParser/Convolution.py:44: SyntaxWarning: assertion is always true, perhaps remove parentheses?
          assert(False, "Layer type not supported by Convolution: " + obj.type)
        mvNCCompile v02.00, Copyright @ Intel Corporation 2017

        /usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
        shape: [1, 256, 256, 3]
        Traceback (most recent call last):
          File "/usr/local/bin/mvNCCompile", line 169, in <module>
            create_graph(args.network, args.image, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights, args.explicit_concat, args.ma2480, args.scheduler, args.new_parser, args)
          File "/usr/local/bin/mvNCCompile", line 148, in create_graph
            load_ret = load_network(args, parser, myriad_config)
          File "/usr/local/bin/ncsdk/Controllers/Scheduler.py", line 100, in load_network
            parse_ret = parse_tensor(arguments, myriad_conf)
          File "/usr/local/bin/ncsdk/Controllers/TensorFlowParser.py", line 319, in parse_tensor
            item_shape = output_item.shape.as_list()
          File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 820, in as_list
            raise ValueError("as_list() is not defined on an unknown TensorShape.")
        ValueError: as_list() is not defined on an unknown TensorShape.
    

    The new files can be found here: https://github.com/andrewginns/CycleGAN-Tensorflow-PyTorch/releases/tag/tf1.7-py3.6.4

    Instructions to reproduce what I'm doing here: https://github.com/andrewginns/MSc-Project
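About the `as_list()` error: the traceback shows the NCSDK TensorFlow parser calling `shape.as_list()` on an output tensor whose shape TensorFlow could not infer statically. One possible cause (not confirmed in this thread) is an input placeholder exported without a fully defined shape, such as `[None, None, None, 3]`; exporting it with a fixed shape like the `[1, 256, 256, 3]` that mvNCCompile reports would rule that out. Below is a minimal sketch that flags such placeholders in a GraphDef parsed with TensorFlow 1.x; `placeholders_without_static_shape` is a hypothetical helper name, and it only checks Placeholder nodes, not shapes lost further down the graph.

```python
def placeholders_without_static_shape(graph_def):
    """Names of Placeholder nodes whose 'shape' attr is missing, unknown-rank, or has -1 dims."""
    names = []
    for node in graph_def.node:
        if node.op != "Placeholder":
            continue
        if "shape" not in node.attr:
            names.append(node.name)
            continue
        shape = node.attr["shape"].shape  # a TensorShapeProto
        # Unknown dimensions are stored with size -1.
        if shape.unknown_rank or any(d.size < 0 for d in shape.dim):
            names.append(node.name)
    return names
```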
