NCS temperature issue

Hello,

I am running a Caffe model with the NCS v2 on a Raspberry Pi 3 Model B. I am running a stress test on it; below is the device status after running inference 28370 times (around 1 hour) without any break:

n: 28370
fps: 8.292699107705577
RO_THERMAL_STATS: 61.4762
RO_THERMAL_THROTTLING_LEVEL: 0
RO_CURRENT_MEMORY_USED: 24094840
RO_MEMORY_SIZE : 522073264
RO_MAX_FIFO_NUM : 20
RO_ALLOCATED_FIFO_NUM : 2
RO_MAX_GRAPH_NUM : 10
RO_ALLOCATED_GRAPH_NUM : 1
RO_CLASS_LIMIT : 1
RO_HW_VERSION : 0

My questions are:
1. Is the temperature above (about 60 degrees) safe for the device, assuming it will run 24x7?
2. I have never experienced throttling, i.e. RO_THERMAL_THROTTLING_LEVEL = 1 or 2. When does throttling activate, and can I activate it manually?
3. If I want to manually keep the device's temperature under a certain level, such as 50 degrees, should I run inference for 10 seconds and then let it sleep for 10 seconds, should I add a 0.1 second break after every inference, or is there a better solution?

Comments

  • @Tome_at_Intel @cfdcfc
    I am running a Caffe model with the NCS v2 in an AI Core, and I want to find the temperature limit of the NCS. When the NCS temperature reaches 75 degrees Celsius, the NCS runs slowly and eventually returns the error " E[ 0] distpatcherEventReceive:200 distpatcherEventRecevice() Read failed -4". Could you share your code for reading the NCS temperature? Thanks a lot!

  • Hi @Tome_at_Intel
    We are using the AI Core board from AAEON for our edge computing device as well. It has the same heating issue that happy_sky already posted in this thread.
    At 15 inferences per second it overheats to 80°C, where thermal throttling kicks in and takes a toll on the processing thread. (Using multiple sticks is not an option, as the AI Core X won't be available in the desired time at our company, so we can't use the clustering approach.)

    Questions:
    1. Do you have any recommendations for a passive cooling system that would help prevent abrupt handleDestruction due to overheating?
    2. Can you please explain how to get more control over the thermal throttling? Are the thermal throttling levels set automatically at 70°C and 80°C? (I ask because I only saw longer inference times at around 80°C with the default settings.)
    3. What is the critical temperature at which the device is at risk? (The UP Board has a thermal shutdown at 105°C, for example.)

    Let me know if you need any info regarding our test.

  • I have the same issue with sendPingMessage:164 Failed send ping message: X_LINK_ERROR. Is there any solution?

  • @cfdcfc 60 degrees Celsius should still be fine. I don't think you can manually activate the thermal throttling. I think having it check every other frame, or letting it rest for a number of seconds, should also work.
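
A minimal sketch of the "rest for a number of seconds" idea from the reply above. `run_with_rest`, `burst`, and `rest_seconds` are hypothetical names, and `infer` stands in for whatever call performs one inference in your own program. The default values are placeholders, not tested recommendations; a suitable burst size and rest length would have to be found by watching the temperature on your own setup.

```python
import time


def run_with_rest(infer, frames, burst=10, rest_seconds=1.0, sleep=time.sleep):
    """Run infer() on every frame, pausing after each full burst of inferences.

    infer        -- callable that performs one inference (an assumption here)
    burst        -- number of inferences to run before resting
    rest_seconds -- how long to rest after each burst
    sleep        -- injectable sleep function, so the loop can be tested
    """
    if burst < 1:
        raise ValueError("burst must be at least 1")
    results = []
    for count, frame in enumerate(frames, start=1):
        results.append(infer(frame))
        if count % burst == 0:
            sleep(rest_seconds)  # give the device time to cool down
    return results


if __name__ == "__main__":
    # Stub inference for illustration only: it just doubles the input.
    outputs = run_with_rest(lambda x: x * 2, range(25), burst=10, rest_seconds=0.01)
    print(len(outputs), "inferences completed")
```

Passing `sleep` in as a parameter keeps the loop easy to test without actually waiting.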

  • @happy_sky You can get the temperature using the device.get_option() function. And you can pass the RO_THERMAL_STATS option from the device options list to get the device temperature.

    example:

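    from mvnc import mvncapi as mvnc  # NCSDK v2 Python API
    # `device` is assumed to be an mvnc.Device that has already been opened
    thermal_stats = device.get_option(mvnc.DeviceOption.RO_THERMAL_STATS)
    print("NCS device temps: ", thermal_stats)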

    This will return a 25-element array with the most recently detected device temperatures in 1-second increments.
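
A small sketch of how the returned readings could be summarized once you have them. It assumes the option yields a sequence of floats in degrees Celsius with the most recent reading last (the ordering is an assumption, not something confirmed in this thread), and it also accepts a single number, since the status dump in the first post shows one value. `summarize_temps` is a hypothetical helper, not part of the NCSDK.

```python
from statistics import mean


def summarize_temps(thermal_stats):
    """Return (latest, average, peak) temperature in degrees Celsius.

    Accepts either a sequence of readings or a single number.
    Assumes the last element is the most recent reading.
    """
    try:
        readings = [float(t) for t in thermal_stats]
    except TypeError:
        # Not iterable: treat it as one reading.
        readings = [float(thermal_stats)]
    if not readings:
        raise ValueError("no temperature readings")
    return readings[-1], mean(readings), max(readings)


if __name__ == "__main__":
    # Sample values for illustration only, not measured data.
    latest, average, peak = summarize_temps([58.0, 60.5, 61.5])
    print(f"latest={latest:.1f} avg={average:.1f} peak={peak:.1f}")
```

In a real program the argument would be the value returned by `device.get_option(mvnc.DeviceOption.RO_THERMAL_STATS)`.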

  • @Tome_at_Intel
    If the Movidius NCS keeps running, its temperature keeps rising. What is its maximum working temperature?
    The official documentation (https://software.intel.com/en-us/neural-compute-stick) states: Operating temperature: 0°C to 40°C.

  • @chenzhi 0°C to 40°C is the ambient operating temperature for the NCS device. For the device itself, there is a thermal throttling feature built into the NCSDK that kicks in and sleeps the NCS when it reaches 70°C.
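
For anyone who wants to pause well before the built-in throttling described above, here is a hedged sketch of an application-side check with hysteresis. `ThermalGovernor` and both thresholds are hypothetical; the 65°C and 55°C defaults are placeholders chosen only to sit below the 70°C figure mentioned in this thread, not values recommended by Intel. It does not touch the device: you feed it temperatures you have read yourself and it tells you whether to hold off on the next inference.

```python
class ThermalGovernor:
    """Decide whether to pause inference based on temperature readings.

    Pauses once the temperature reaches pause_above and only resumes after
    it has dropped to resume_below, so the loop does not flip between
    running and pausing on every small fluctuation.
    """

    def __init__(self, pause_above=65.0, resume_below=55.0):
        if resume_below >= pause_above:
            raise ValueError("resume_below must be lower than pause_above")
        self.pause_above = pause_above
        self.resume_below = resume_below
        self.paused = False

    def update(self, temp_c):
        """Record a new reading and return True if inference should pause."""
        if self.paused:
            if temp_c <= self.resume_below:
                self.paused = False
        elif temp_c >= self.pause_above:
            self.paused = True
        return self.paused


if __name__ == "__main__":
    governor = ThermalGovernor(pause_above=65.0, resume_below=55.0)
    # Sample readings for illustration only, not measured data.
    for temp in [60.0, 66.0, 60.0, 54.0, 60.0]:
        state = "pause" if governor.update(temp) else "run"
        print(f"{temp:.1f} C -> {state}")
```

The gap between the two thresholds is the design choice here: a single threshold would resume as soon as the reading dipped slightly, which tends to keep the device hovering at the limit.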

  • @Tome_at_Intel I encountered a similar issue while using an NCS2 (Myriad X) device with OpenVINO 2018 R5.0.1 (sending frames to TinyYOLO as fast as possible in async mode). May I know the temperature limit of the NCS2? Is it possible to return a meaningful error code rather than a USB link disconnection when the NCS2 overheats?

    https://ncsforum.movidius.com/discussion/1557/ncs2-rpi-running-multiple-models-as-a-flask-app#latest

  • Hi @lilohuang

    The maximum temperature for the NCS is about 70°C, with the ideal temperature being between 0°C and 40°C. With OpenVINO, you are not able to get the device temperature (that feature is only available with the NCSDK), so I don't think you would be able to output an error message when the device heats up to a certain temperature.

    Sincerely,
    Sahira

  • @Sahira_at_Intel Could you do me a favor and check the error message below? I got it on my UP Board computer (Win10 1709) with an NCS2 (connected through a USB 3.0 Y cable with an additional power supply) and OpenVINO 2018 R5.0.1. The issue can be easily reproduced with the TinyYOLO object detection Python example (async mode, ~30 fps) after running object detection continuously for 1~2 hours. I suspect it's related to the NCS2 overheating. What do you think? Thank you!

    WinUsb_ReadPipe: System err 2
    E: [xLink] [ 0] handleIncomingEvent:240 handleIncomingEvent() Read failed -2
    WinUsb_WritePipe: System err 22
    W: [xLink] [ 0] dispatcherEventReceive:324
    WinUsb_WritePipe failed with error:=22
    E: [xLink] [ 0] dispatcherEventSend:889 Write failed header -2 | event USB_WRITE_REQ
    Failed to handle incoming event
    WinUsb_SetPipePolicy: System err 22
    E: [xLink] [ 0] dispatcherEventReceive:308 dispatcherEventReceive() Read failed -2 | event 00000010C2B2FEA0 USB_WRITE_REQ
    E: [xLink] [ 0] eventReader:256 eventReader stopped
    E: [ncAPI] [ 0] ncGraphQueueInference:3538 Can't send trigger request
    E: [watchdog] [ 0] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
    E: [watchdog] [ 0] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
    E: [watchdog] [ 0] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
    E: [watchdog] [ 0] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
    E: [watchdog] [ 0] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
    E: [watchdog] [ 0] sendPingMessage:164 Failed send ping message: X_LINK_ERROR
    E: [watchdog] [ 0] sendPingMessage:164 Failed send ping message: X_LINK_ERROR

  • @Jesus_at_Intel  
    @Luis_at_Intel
    @Sahira_at_Intel
    @Tome_at_Intel

    Heavy sigh. I feel frustrated with the Intel NCS2, especially because its instability means it cannot be used for my 24x7 workload (15~30 fps, TinyYOLOv3). My room temperature is under 30°C. I don't even know what the error messages mean. I've bought two more NCS2 sticks for failover, but I don't think adding more NCS2 sticks is the best solution. Could you escalate the issue to the R&D core team? Hopefully they can reproduce the error people have encountered through a torture test (long-running test). Thank you!

  • Hi @lilohuang
    Thank you for your patience!

    I understand your frustration. I am working on this issue and will get back to you with the results. To help me reproduce this issue, can you please link the Object Detection example (and include any code modifications you might have made)?

    If temperature is the issue here, implementing a small sleep between several inferences could help with keeping the temperature low. Here is a program that has been able to run for several hours at AI conferences - it rests for a bit after doing several inferences on a video. Perhaps you can modify an object detection program to do the same.

    In the meantime however, you should open a thread in the Computer Vision forum here where the NCS2 + OpenVINO community is!

    Best Regards,
    Sahira

  • Hi @lilohuang ,

    I see that your issue was resolved on your other thread here. I am posting your findings here just in case other users come across this problem and can see what fixed it. Thanks for sharing your findings!

    Solution:
    From: @lilohuang
    Just an update: the X_LINK_ERROR doesn't occur anymore after running a 48-hour torture test. I think using a powerful AC-powered USB hub is the right solution. Thanks.



    Regards,
    @Luis_at_Intel

This discussion has been closed.