[SOLVED] Issues with New Gateway Shields


#1

We have been getting reports of issues with the Gateway Shields. The basic issue is that once the Photon is programmed with the proper code, plugged into the gateway shield and powered on, the nrf51 does not respond. The normal startup procedure is:

  1. Photon and nrf51 power on. Photon is solid white, then flashing green, then breathing cyan. The nrf51 flashes the LED on D7 once.
  2. Once the Photon is online and breathing cyan, the LED on D7 of the gateway shield should start to blink rapidly for 10-15 seconds, indicating it is connecting.
  3. The LED on D7 changes to a slow blink, once every 2 seconds. This indicates the gateway is connected.

The issue is that step 2 is never reached: the LED on D7 of the gateway shield never blinks again after the first blink at startup.
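If you are timing the D7 blinks to report what you see, the three stages above can be told apart by the gap between blinks. This is only a rough sketch: `classifyD7` and `GatewayState` are made-up names, and the 500 ms cut-off is an assumption, since the steps only say "rapidly" and "once every 2 seconds".

```cpp
#include <cstddef>
#include <vector>

enum class GatewayState { StuckAfterStartup, Connecting, Connected };

// blinkTimesMs: times (ms since power-on) at which the D7 LED turned on,
// including the single blink the nrf51 gives at startup.
// Assumption: gaps under 500 ms count as the "rapid" connecting blink;
// anything slower is treated as the slow ~2 s "connected" blink.
GatewayState classifyD7(const std::vector<long>& blinkTimesMs) {
    // Only the startup blink (or nothing) was seen: step 2 was never reached.
    if (blinkTimesMs.size() <= 1) return GatewayState::StuckAfterStartup;
    // One blink after startup: blinking has begun, but there is no gap to measure yet.
    if (blinkTimesMs.size() == 2) return GatewayState::Connecting;
    // Compare the two most recent blinks, both of which come after startup.
    const std::size_t n = blinkTimesMs.size();
    const long lastGapMs = blinkTimesMs[n - 1] - blinkTimesMs[n - 2];
    return lastGapMs < 500 ? GatewayState::Connecting : GatewayState::Connected;
}
```

A shield showing the problem described here would stay in `StuckAfterStartup` no matter how long you wait.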

If you have this issue, please report it in this thread. Also, if you happen to read this thread and don’t see the issue, please report that also. We would like to get an idea of how many people are affected.

We are working as fast as we can on a resolution. Part of the problem is that this seems to be random; we cannot reproduce it at all on our side. Every single gateway shield is tested for cloud connectivity before it goes out the door, and so far we have not been able to recreate the problem reliably enough to fix it.

Any updates will follow on this thread. Thanks for the help!


Call for Gateway Shield Testers
#2

#3

@Skabe @tangesazen @AndyW Alright, well I have some good news: I was able to recreate the issue, though intermittently. I grabbed an old gateway shield I had lying around and paired it with the new Photon. I do not see the issue every time, far from it actually; I see it maybe 1 time in 10 reboots. Still, I was able to measure the system when it happened, which makes me think I know what is going on.

So, I have one more hopeful solution to this problem. If you have seen this issue repeatedly, could you please add a pull-up resistor (10k) between the Photon’s A2 pin and 3V3? The diagram is below showing the pins:

If this is what I think it is, this should solve the issue. I’ll save the gory details until we find out if this works. As before, this is something I can also fix in the firmware (if it is indeed the problem) and so if someone does run across the problem, the resistor could be temporary while I get everyone updated. Based on experience, though, I don’t think this will be a major problem for everyone. I have also seen other gateways in the logs getting connected and claimed by people, so the issue clearly doesn’t affect everyone.

Anyway, if someone can try that and report back, I would really appreciate it. Thanks so much again for your help!


#4

I am double checking (fw version, etc…) but at first blush adding the 10K resistor does not change any behavior.


#5

Yes, can you make sure the Photon firmware is 0.4.9? I haven’t tested yet with 0.5.0


#6

  1. I am unable to roll back to 0.4.9.
  2. For the first time “EVER” I began receiving a stream of debug strings in my serial monitor. Everything appeared to be working normally. L2 was flashing!!! I am not sure what I did to produce this result.
  3. In all the excitement I did something and now the GW is not working again. Back to square one.
    I have retraced all my steps from memory repeatedly but I cannot produce the same functional result.

#7

The 10K pull-up does not help my hard failing photon work, and the photon that “works” still exhibits the same bad behaviour of spending all its time spinning at the start of spi_retreive().

@Skabe, the failure mode is intermittent with one of my photons, it will fail hard for a few minutes, then decide to start working for a while - it may not be anything you changed.

My hard failing photon just shows “STARTING!” on the debug log, and never anything else.

My intermittent photon sometimes shows output such as:

2958:DEBUG: STARTING!
2960:DEBUG: In SPI Receive
2963:DEBUG: Handshake complete
2969:DEBUG: Receiving SPI data of size 1

and hangs hard at that point, so there are at least two failure modes.
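For anyone else comparing logs, the two failure modes can be told apart from the last debug line printed before the hang. A small sketch, assuming the log lines look exactly like the excerpt above; `classifyLog` and `FailureMode` are hypothetical names, not anything in the gateway firmware:

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class FailureMode { NoOutput, HardFail, HangInSpiReceive, Other };

// Strip the "<millis>:DEBUG: " prefix seen in the excerpt, if present.
static std::string message(const std::string& line) {
    const std::string tag = "DEBUG: ";
    const std::size_t pos = line.find(tag);
    return pos == std::string::npos ? line : line.substr(pos + tag.size());
}

// Classify a captured log by the last line that was printed before it stopped.
FailureMode classifyLog(const std::vector<std::string>& lines) {
    if (lines.empty()) return FailureMode::NoOutput;
    const std::string last = message(lines.back());
    // Mode 1: nothing after the startup banner.
    if (last == "STARTING!") return FailureMode::HardFail;
    // Mode 2: the log stops right after announcing an SPI receive.
    if (last.rfind("Receiving SPI data", 0) == 0) return FailureMode::HangInSpiReceive;
    return FailureMode::Other;
}
```

This only looks at where the output stops; it says nothing about why, and a log that ends on any other line is simply reported as `Other`.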


#8

@AndyW, I have only seen the hard failing photon on my end, just as you had described in the other post (one blink, only Debug “STARTING!”, etc…).

And I think I solved it. I will let you know shortly.


#9

@AndyW Did you also try the pull-down on RX of the Photon as indicated in the other thread? The fact that the debug log says “data size of 1” makes me think you are actually hitting that race condition I mentioned earlier. I think the pull-down on RX would fix that.

There are definitely some startup issues: when the two MCUs start, some pins are floating that shouldn’t be, and that is causing at least some of these problems. These are simple fixes in firmware, but until we get you online, it will be hard to upgrade the firmware on your boards (though not impossible: it can be done through bootloader setup just like the DK, you just need a USB to Serial adapter).


#10

It’s stupid, but I have it (mostly). The GW is stable. When it first started working, Bluz devices all around me were automatically connecting to the Particle Cloud unaided by iPhones. Yet I was unable to claim the GW.

What I do know is:
-The GW cannot be powered through its own USB.
-The 10K resistor between A2 and 3V3 had no effect.
-You do have to pull down on the SLAVE_PTS_PIN in order for the Photon not to flash SOS before you can complete the following procedure:

  • Flash the GatewayFW, with the modified pinMode(SLAVE_PTS_PIN, INPUT_PULLDOWN);, while the Photon is not connected to the GW. Then unplug it from the power source and plug it back in (still not connected to the GW).

  • With the Photon powered through USB and connected to the Particle Cloud, plug it into the GW.

  • Done.
    P.S. Is the Photon supposed to short out when you press on the P0?


#11

@eric: I just tried your suggestion.

I added the 10K resistor between 3.3V and A2.
I still have a 10K resistor between RX and GND.

I’m sorry but it still does not get past L2 flashing once at startup.
I tried all four of my Photons and got the same behavior.


#12

I had removed the pull down, when that didn’t work. Did not understand these are cumulative, will retest.


#13

@eric, I have no pull-up or down resistors installed. There are times the GW photon will go to SOS on bootup but it will clear itself after a few reset cycles. Most of the time, holding reset on the GW for a second or two reboots the devices with the photon going to SOS. In all cases, the GW comes up working as expected. I have the latest firmware on the Photon and on the GW nRF.


#14

Mixed results (but I’d have to say overall positive).

  1. with both pull up & pull down, the photon that previously hard failed now works, and added bonus - works better than the other photon, because the handshake with the nrf51 works as intended (e.g. no spin in spi_retreive()). The DK now happily rides through a gateway reboot or in/out range event. The gateway photon also stays reliably connected to the cloud. This is all goodness.

  2. the previously “working” (albeit with caveats) gateway photon behaves as before: spinning in spi_retreive(), dropping from cloud etc.


#15

Interesting. Are you using the same bluz gateway, but with two different Photons? One works as expected and one has the “spin in spi_retreive” issue?

It is interesting that the issues follow the Photon. I would expect the resistors added would have fixed many of the issues on the gateway side, so I am not sure why the issue would be different with different Photons.

I can definitely see where the added resistors would fix two issues. The pull down on RX would prevent a cascade of problems that could happen where the nrf51 got ahead of itself. That was strictly my fault in FW, I should have configured the nrf51 to use an internal pulldown on the pin. I also should have waited for the buffers to be properly set in the config function; this is how the nrf51 gets ahead of itself.

The pull up on A2 prevents an unintentional SPI transaction. When the Photon boots this pin is floating and it seemed that sometimes it could trigger a SPI Transfer Complete message in the nrf51. I should have configured the select pin in the nrf51 with a pull-up, clearly also my fault, but in all honesty I did check that before shipping and simply read it wrong. We use the SPI Slave driver directly from Nordic and the default is to not use a pull-up, which seems kind of silly to me. Either way, I read it wrong and should have swapped it.
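To make the floating-pin reasoning concrete, here is a toy model of what level an input sees when the other side is not driving the line. It is plain C++, not nRF51 SDK or Photon code, and every name in it is hypothetical. The point is only that an undriven line without a pull resistor has no defined level, while a pull-up or pull-down gives it a known idle state.

```cpp
enum class Drive { High, Low, Floating };    // what the other MCU is doing with the pin
enum class Pull  { None, Up, Down };         // resistor (internal or external) on the line
enum class Level { High, Low, Undefined };   // what the input actually sees

// An actively driven line overrides a weak pull resistor.
// A floating line takes whatever level the pull resistor sets.
// A floating line with no pull resistor can read as anything.
Level lineLevel(Drive drive, Pull pull) {
    if (drive == Drive::High) return Level::High;
    if (drive == Drive::Low)  return Level::Low;
    if (pull == Pull::Up)     return Level::High;
    if (pull == Pull::Down)   return Level::Low;
    return Level::Undefined;
}
```

Assuming the usual active-low SPI select, `lineLevel(Drive::Floating, Pull::Up)` gives `Level::High`, so the slave sees "not selected" while the Photon is still booting. With `Pull::None` the result is `Level::Undefined`, which is the case where a stray transaction can be triggered.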

Seems there could be a few issues that can pop up, but at least some are getting cleared up.


#16

Yes - single production GW board (I have a beta board too, but have no idea if that will work with current firmware, or how two GWs would fight over DKs…) - after the first photon didn’t work in it, I tried another - which behaved differently.

Fixing this in nrf51 firmware would be good, you may be able to fix it in the photon firmware too, but I think that will require a different system mode or other tricks, because right now setup() doesn’t run until after the cloud connection is active, which I suspect is where a great deal of the timing variability is coming from.

I’m not at all surprised that two photons behave differently. Unless the system design (both hw & sw) fails safe, small timing differences frequently cause the kind of problems we’re tracking.


#17

Yes, I made a change last night to move to SEMI_AUTOMATIC mode and then call Particle.connect() right at the end of setup(). This would let setup() run as fast as possible and configure the pins properly quickly. The issue is that it is still too slow: the RGB LED starts out white (I assume that is the bootloader?) and stays on for maybe half a second or so. The nrf51 would blink and move on well before setup() would run, and I was still able to reproduce the issues. I didn’t check it in yet because it didn’t seem to solve any of the issues and I didn’t want to throw another variable into the mix yet.
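For reference, the ordering described above might look roughly like the sketch below. This is not the change itself (which was never checked in), only an illustration. The stand-ins at the top are assumptions added so the call order can be checked on a desktop compiler; on a Photon, `SYSTEM_MODE`, `pinMode`, `INPUT_PULLDOWN` and `Particle.connect()` come from the system firmware, and the value of `SLAVE_PTS_PIN` here is a placeholder.

```cpp
#include <string>
#include <vector>

// --- Host-side stand-ins (assumptions, not Particle code) ---
std::vector<std::string> calls;                    // records the order of calls
enum PinMode { INPUT_PULLDOWN };
void pinMode(int, PinMode) { calls.push_back("pinMode"); }
struct CloudStub { void connect() { calls.push_back("connect"); } } Particle;
#define SYSTEM_MODE(mode)                          // expands to nothing on the host
const int SLAVE_PTS_PIN = 0;                       // placeholder pin number
// --- End of stand-ins ---

SYSTEM_MODE(SEMI_AUTOMATIC);   // setup() runs without waiting for the cloud connection

void setup() {
    // Give the handshake pin a defined level as early as possible...
    pinMode(SLAVE_PTS_PIN, INPUT_PULLDOWN);
    // ...and only then start the cloud connection, at the end of setup().
    Particle.connect();
}
```

Even with this ordering, setup() still runs after the bootloader has finished, so the nrf51 can sample the pins before they are configured. That is why this alone did not remove the race.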

The spi_retrieve issue only seems to pop up on your system with one Photon, I haven’t seen that anywhere else. It could certainly be a timing issue or some other yet-unknown startup inconsistency. It would be good to see if anyone else can reproduce that problem. Based on the trace you sent of the pins, something definitely seemed wrong there.

It is good to know that not everyone has seen these issues. The worst one so far seems to be @tangesazen as we can’t recover that system at all. Everyone else has been able to get to at least a mostly working state. I am still worried that somehow the boards may have gotten damaged in shipping, I think swapping hardware on that one is the best bet.

Progress. Will keep plugging away and I am sure we will get this all straightened out quickly.


#18

Now I have a setup working the way it is supposed to, I can get logic analyser traces comparing the operation. I’ll work on soldering headers down and capturing good/bad traces today, then see if I can code up some pull requests that I am able to verify.

You are right - even the fastest mode to setup() in the particle ecosystem will still put you in a race condition with the nrf51 - so the design needs to fail-safe. If you can achieve that with nrf51 firmware, great - just let me know if you have code you’d like me to try without the additional resistors.


#19

Great, thanks for the offer!

I just checked in the changes I made to the nrf51 firmware, this sets the resistors internally and you shouldn’t need them soldered to the board anymore. You can get latest on the gateway_develop branch here: https://github.com/bluzDK/bluzDK-firmware/tree/gateway_develop

You can compile locally from the /modules folder with:

make PLATFORM=bluz-gw APP=tinker clean all

You need to use gcc 4.9, preferably 4.9.3 as that is what we all test with. I can also get you a pre-compiled binary if you would like, just let me know.

You can get the firmware over to the gateway shield in one of three ways:

  1. OTA, the simplest. Just get the gateway shield online and use ‘particle flash gateway_name system-part1.bin’. The LED will blink like crazy for a while, then the board will reboot and connect back normally.
  2. Bootloader Setup. Follow the exact same steps in the docs for the DK (http://docs.bluz.io/tutorials/bootloader/#updating-firmware), but you will need a 3.3V USB to Serial adapter and need to solder headers onto the gateway shield TX/RX/GND pins to make it work.
  3. Adalink and the STLink v2 programmer hooked to the 10-pin SWD header on the board. If you go this route, you will need to make sure you compile the bootloader as well and copy that over. And you will need to copy over the s120 softdevice file (as well as the user app), so 4 files in total need to be flashed as part of the command. It would look like this from the top level of the git repository:

adalink nrf51822 --programmer stlink --program-hex platform/MCU/NRF51/NRF51_StdPeriph_Driver/inc/softdevice/s120/hex/s120_softdevice.hex --program-hex build/target/bootloader/platform-269-lto/bootloader.hex --program-hex build/target/system-part1/platform-269-m/system-part1.hex --program-hex build/target/user-part/platform-269-m/tinker.hex

Let me know if you have any questions or how it goes with the new firmware.


#20

Best if you can provide me a precompiled binary - removes another whole layer of variables, that I add zero value to. Just point me at a .bin and I can try it (most likely using the OTA/CLI method, only reverting to jtag if things go pear-shaped.)