Possible RAM space overflow


#1

I have an app that exhibits signs of RAM corruption.

This started to happen as I was adding functionality and unless I cut it back again it creates a situation where I have to do a factory reset - aarrgghh.

Obviously this could be some bad coding but I don’t think so. It could also be that some kind of buffer overflow is occurring.

So my question is how can an app determine the amount of RAM available? This would help to see if there is likely to be a problem in execution. Ideally, this would be something that can be called in the main loop to detect a problem before any bad side-effect occurs.


#2

Are you using malloc to dynamically allocate RAM? The user app has a specified block that is assigned to it, so there shouldn’t be any corruption between your code and the system firmware.

It could be a buffer overrun or stack overflow. If you could post the code, that would be helpful.


#3

Thx Eric,

Code attached. Most of the work is done by a contributed library which uses a fair amount of storage.
Simple examples run fine, but the issues started to arise when I added the BLE stuff.
Specifically, digitalRead() on lines 68/69 brings back the wrong value, so I end up with the wrong SYSTEM_MODE.
I verified the wiring using a modified blink sketch.

No mallocs used.
Can you suggest a way to trap overruns/overflows so we can see what’s happening?


#4

I don’t see any attached code? Can you use something like github gist and post it with a link?

C/C++ buffer overruns and other issues can be tricky to debug, especially in embedded environments. The language doesn’t have a way to catch the issue so you would need some kind of third party tool to evaluate things. I am personally not aware of anything and don’t use any tools like this, but others may. If anyone else has a suggestion, that would be good.
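One low-tech option that needs no external tool is to bracket a suspect buffer with known guard words and check them from the main loop. The following is only a minimal sketch of that idea: `GuardedBuffer`, `kGuard`, and `trap_if_clobbered` are illustrative names, not part of any firmware or library API, and the check can only report an overrun after a guard word has already been overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Arbitrary, easily recognised pattern (an assumption; any unlikely value works).
constexpr std::uint32_t kGuard = 0xDEADBEEF;

// A byte buffer with a guard word on each side. A write that runs off either
// end of `data` lands on a guard word first, where it can be detected.
template <std::size_t N>
struct GuardedBuffer {
    // Keep N a multiple of 4 so no padding sits between `data` and `tail`;
    // padding would let a short overrun go unnoticed.
    static_assert(N % 4 == 0, "N must be a multiple of 4");

    std::uint32_t head = kGuard;
    std::uint8_t  data[N] = {};
    std::uint32_t tail = kGuard;

    // True while both guard words still hold the original pattern.
    bool intact() const { return head == kGuard && tail == kGuard; }
};

// Halts via assert() when a guard word has been overwritten.
// How assert() behaves on a given embedded target is platform-specific.
template <std::size_t N>
void trap_if_clobbered(const GuardedBuffer<N>& b) {
    assert(b.intact() && "guard word overwritten");
}
```

Calling `intact()` once per pass through `loop()` and printing a message on the Serial line when it returns false would at least narrow down which buffer is being overrun.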


#5

Sorry, I forgot that replying with attachment is not workable.

BTW, I tried this again today after cutting it back to the minimum needed to reproduce the issue. This time the bluz would not even do a factory reset. I’ll try again later.


#6

What data is being sent down? A possible overflow issue could arise from this line: https://gist.github.com/paultanner/4795851584778ea81a84a9f8aa895fd6#file-buggy-ino-L24

That line doesn’t check that the length coming in is less than the length of the buffer you are trying to fill. That could cause issues, though I can’t say with any certainty that this is the source of your problem. This is just one example of how the problem could manifest.
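To illustrate the kind of check I mean, here is a minimal sketch of a receive callback that clamps the incoming length to the buffer size before copying. The names `on_data`, `rx_buf`, `rx_len`, and `kRxBufSize` are hypothetical and are not taken from the gist or from the bluz API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kRxBufSize = 32;   // hypothetical buffer size
std::uint8_t rx_buf[kRxBufSize];
std::size_t  rx_len = 0;

// Hypothetical data callback: copies at most kRxBufSize bytes, so an
// oversized packet is truncated instead of writing past the end of rx_buf.
void on_data(const std::uint8_t* data, std::size_t length) {
    assert(data != nullptr || length == 0);

    const std::size_t n = (length < kRxBufSize) ? length : kRxBufSize;
    if (n > 0) {
        std::memcpy(rx_buf, data, n);
    }
    rx_len = n;   // record how many bytes are actually valid
}
```

Truncating is only one possible policy; dropping the oversized packet entirely would work just as well for ruling out an overrun.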

When you said the values are getting corrupted, what did you mean? Are the values being output on the Serial line incorrect? Or do they look garbled? What is the exact issue?


#7

OK, thx @eric. I fixed that just in case any data was received (gist is updated). In fact it never gets that far. BLE is not yet connected as a problem occurs in setup() where I am trying to set SYSTEM_MODE.

Regardless of the input state on D1, line 34 prints 0. So the digital input value is somehow getting corrupted.

In addition, the device does not return to the proper state (green or cyan) after loading but instead ends up flashing fast magenta. (The app is executing so the load completed.) So in order to proceed I have to do another factory reset.

These two things make me think that my app is trying to use more RAM than is actually available.

(On other embedded platforms I can print out the remaining RAM by knowing the base address, amount available and current stack pointer. Is that not possible?)
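The calculation I have in mind is roughly the following. This is a sketch only: `RamLayout` and its fields are hypothetical, the real base, size and static usage would have to come from the linker script or map file (which I don't have for this platform), and it assumes a descending stack with static data growing up from the base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of the RAM block assigned to the user app.
struct RamLayout {
    std::uintptr_t base;        // first RAM address available to the app
    std::size_t    size;        // bytes of RAM assigned to the app
    std::size_t    static_use;  // bytes used by .data/.bss (plus heap, if any)
};

// Bytes between the end of static data and the current stack pointer.
// A result of zero or less means the stack has run into static data.
std::ptrdiff_t ram_headroom(const RamLayout& m, std::uintptr_t sp) {
    assert(m.static_use <= m.size);
    const std::uintptr_t low = m.base + m.static_use;
    return static_cast<std::ptrdiff_t>(sp) - static_cast<std::ptrdiff_t>(low);
}

// Rough stand-in for reading the stack pointer: the address of a local
// variable lies close to the current top of the stack.
std::uintptr_t approx_stack_pointer() {
    volatile char marker = 0;
    return reinterpret_cast<std::uintptr_t>(&marker);
}
```

Printing `ram_headroom(layout, approx_stack_pointer())` from the main loop would show whether the margin shrinks as functionality is added.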


#8

I’m not sure that RAM corruption is leading to the issue. Is it possible for you to try a different pin?

I assume you see 0=manual on the Serial line even if you have nothing connected to it?


#9

Yes, I agree that it could be a lot of things.

Anyway, I do see 0=manual regardless of the state of the switch on D1. I have tried other pins. I also use a modified blink sketch to check that D1 and the switch are working.

So I’m looking for something that only happens when I combine BLE with the OPC library, each of which works fine on its own.


#10

So if you comment out only this line, does it work? https://gist.github.com/paultanner/4795851584778ea81a84a9f8aa895fd6#file-buggy-ino-L8

The library simply uses SPI to communicate with the sensor, it seems, so it shouldn’t step on the toes of anything else. The only other possibility I can think of is that the constructor in the line I highlighted above is being called too early. You can try declaring the alpha variable there, but then move the constructor call to the setup() function.
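A minimal sketch of what deferring the constructor could look like, without using malloc: reserve static storage for the object and construct it in place from setup(). `Sensor` here is a hypothetical stand-in for the library class, and the pin number is made up; the real constructor arguments would be whatever the gist already passes.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the library's sensor class.
struct Sensor {
    explicit Sensor(int cs_pin) : cs(cs_pin) { ++constructed; }
    int cs;
    static int constructed;   // counts constructor calls, for illustration only
};
int Sensor::constructed = 0;

// Storage is reserved at link time, but no constructor runs during
// static initialisation, so nothing touches the hardware before setup().
alignas(Sensor) unsigned char alpha_storage[sizeof(Sensor)];
Sensor* alpha = nullptr;

void setup() {
    // Placement new: construct the object inside the reserved storage.
    alpha = new (alpha_storage) Sensor(/*cs_pin=*/10);
}

// Accessor that catches any use before setup() has run.
Sensor& sensor() {
    assert(alpha != nullptr && "setup() must run first");
    return *alpha;
}
```

The rest of the sketch would then use `sensor()` (or `alpha->`) wherever it currently uses the global object directly.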

Sorry for the delay, I will try and run this a little later today. I was out of town last week and am trying to catch up after getting home.


#11

If you comment out all references to alpha it works (but does nothing useful). Just removing the constructor throws lots of errors of course. I started the process with 2 apps, one which does the SPI stuff and the other that does the BLE. Both worked. Combining them does not.

Given that alpha is a complex object (which occupies significant RAM) I’m not sure how to declare it other than using the constructor.

Anyway, it would be interesting to see if you get the same error.


#12

I just tried this code with the latest library.

I can recreate the behavior you see with pins D0 and D1 as MODE. However, if I set MODE to D2 or D3, then things start to work as expected.

I think perhaps there is a bug where enabling SPI is also taking over the I2C pins. There are shared resources in the nRF51 between SPI and I2C and I think the underlying driver may be using both. This is probably fixable and I have opened an issue on it here: https://github.com/bluzDK/bluzDK-firmware/issues/34

For now, if you switch MODE to D2, that should fix the issue.


#13

Thx @eric,

I can confirm that works with the stripped-down example. As I build up to an actual solution I hit some other problem. I will try to find out which step causes the problem. This will be a slow process as each step requires a factory reset. Then I’ll post more details.