potsaid
Joined: 20 Oct 2008 Posts: 25
|
Posted: Fri May 04, 2012 11:10 am Post subject: Sporadic Memory Error when calling VMP.Append(Packet) |
|
|
We have been trying to track down a runtime error we have been getting that looks to be associated with VMP calls to append the Velo Packets. The error occurs sporadically and we get a message of “System.AccessViolationException… Attempted to read or write protected memory”.
The error occurs if we repeatedly run streaming sessions back to back and occurs mid stream. Sometimes the error occurs after only 2-3 acquisitions, but sometimes we can run >400 acquisitions before the error occurs. We wonder if there may be a problem with the packetizer in the FPGA, but are curious if you see anything wrong or have any suggestions.
We see the error in the example Stream.exe and also our custom software modeled after Stream.exe, but using the VMP as described for the “in flight processing” in the following link:
http://www.innovative-dsp.com/forum/viewtopic.php?t=1928
Here are things we have noticed based on the following code:
void IIModuleInterface::HandleDataAvailable(Innovative::VitaPacketStreamDataEvent & Event)
{
    System::Diagnostics::Debug::WriteLine("In Handle Data Available");
    if (Stopped)
        return;

    Innovative::VeloBuffer Packet;
    // Extract the packet from the incoming queue and store it in a VeloBuffer
    Event.Sender->Recv(Packet);
    System::Diagnostics::Debug::WriteLine("About to Append Data --- Packet SizeInInts = " + Packet.SizeInInts());

    // Add the packet to the VMP (the access violation is raised here)
    VMP.Append(Packet);
    System::Diagnostics::Debug::WriteLine("About to Parse Data");
    VMP.Parse();

    // Tally the number of bytes received
    Innovative::IntegerDG Packet_DG(Packet);
    TallyBlock(Packet_DG.size()*sizeof(int));
    FWordCount += Packet.SizeInInts();

    // Per block triggering actions
    Trig.AtBlockProcess(static_cast<unsigned int>(Packet_DG.size()*sizeof(int)));
}
1. If we comment out the two lines of VMP.Append(Packet); and VMP.Parse();, then our code runs for hours and many thousands of streaming processes without error.
2. But if we then restore the two lines VMP.Append(Packet); and VMP.Parse();, we consistently get the error when VMP.Append(Packet) is called, after anywhere from 3 to 400 or so data acquisitions. Associated with this error is an unusually small Velo Packet being received right before the call to VMP.Append(). Look at the last few lines of the attached log files to see examples.
3. Using FPGA logic version 102, Variant 0, Revision 4, subrevision 4, we would typically see the protected memory error after 3-120 acquisitions. After upgrading to FPGA logic version 105, Variant 0, Revision 5, subrevision 0, we typically see the error after 400 or so acquisitions, so the error does seem to occur less frequently with the new logic.
4. Using logic version 102, we only saw the “Attempted to read or write protected memory” error. Using logic version 105, we see the same memory error, but we also see an occasional “Input FIFO overrun” error. During several days of pretty rigorous testing, we never saw the “Input FIFO overrun” error with version 102. Note that we are using the X6-400M fixed 400MSPS internal clock, so this is unexpected and not related to clocking issues.
I was once able to generate the invalid memory error using logic 102 and the Stream.exe binary compiled by Innovative and supplied in the software installation download. It appeared after I pressed the stream button (the one with the figure of the little running man) 30 times, waiting about 2 seconds after each acquisition stopped before pressing it again. Note that you have to have the Vita Merge Parsing enabled to get the error.
More recently we have added a simple functionality to Form1.h in Stream.exe that (1) detects when the streaming session has stopped via Form1::AfterStreamStop(), (2) waits about 2 seconds using a nonCPU intensive wait, then (3) virtually presses the button of the little running man by calling StreamStartBtn_Click. This is only a few lines of code added to Form1.h and creates the effect of repeatedly clicking the Start Streaming button. Using this version of Stream.exe, we consistently see the invalid memory error if Vita Merge Parsing is enabled. The error might occur after 356, 419, or similar acquisitions.
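The restart loop described above can be sketched roughly as follows. This is not the actual change to Form1.h (the thread only describes it); `RunRepeatedAcquisitions`, `start_stream`, and `wait_for_stop` are hypothetical names standing in for StreamStartBtn_Click and the AfterStreamStop notification, and the sketch uses standard C++ rather than C++/CLI.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch: start an acquisition, wait for it to stop,
// pause without spinning the CPU, then start the next one.
// start_stream  - stands in for the "running man" button handler.
// wait_for_stop - stands in for blocking until the stream has stopped.
// Returns the number of acquisitions that were started.
inline int RunRepeatedAcquisitions(int count,
                                   std::chrono::milliseconds pause,
                                   const std::function<void()>& start_stream,
                                   const std::function<void()>& wait_for_stop) {
    int started = 0;
    for (int i = 0; i < count; ++i) {
        start_stream();
        ++started;
        wait_for_stop();
        std::this_thread::sleep_for(pause);  // sleeps; does not busy-wait
    }
    return started;
}
```

In a WinForms application a timer would probably be a better fit than blocking the UI thread; the blocking loop here only keeps the sketch short.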
We have tried different VeloPacket sizes and MergePacket sizes and a few different computers with visual studio 2008, but have not been able to eliminate the error.
We are curious if you see the error on your computers if you continuously repeat (loop) acquisitions for a few hours with the Merge parsing enabled or if you see anything wrong or have any advice.
Thanks again,
-Ben
P.S. Just read Jim's message on the intended hardware method of gating data flow. The approach above repeatedly starts and stops the stream, but it is still useful for us in applications where we don't want to add external hardware, counters, and logic. We also see the error mid-stream, so we think there still might be a problem somewhere in the communications, and we would appreciate any suggestions you may have. |
|
csmith Site Admin
Joined: 13 Apr 2006 Posts: 202
|
Posted: Fri May 04, 2012 4:13 pm Post subject: |
|
|
The Merge Parser is going to expect that every packet, even these little runt packets, contains proper header information and data. If this is violated, then trouble could ensue.
I would suggest that, instead of VMP'ing these little guys, you use LogWithHeader() with a new logger object writing to a separate file, and direct any small packet (less than 4096) to it. Then send that file here so we can see whether it looks like it has a Vita header in the front. _________________ Chris Smith
Innovative Integration
csmith@innovative-dsp.com |
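The size-based routing suggested in this reply can be sketched as below. This is only an illustration: `PacketView`, `Route`, and `ClassifyPacket` are hypothetical names, not part of the Innovative libraries, and the threshold is assumed to be in the same units that Packet.SizeInInts() reports. In the real handler the `ToRuntLog` branch would call LogWithHeader() on the separate logger and the `ToParser` branch would call VMP.Append() and VMP.Parse().

```cpp
#include <cstddef>

// Hypothetical stand-in for a received Velo packet; the real code would
// query Innovative::VeloBuffer for its size instead.
struct PacketView {
    std::size_t size_in_ints;
};

// Packets smaller than this are treated as runts (value from the thread).
constexpr std::size_t kRuntThreshold = 4096;

enum class Route { ToParser, ToRuntLog };

// Decide where a packet should go based only on its size.
inline Route ClassifyPacket(const PacketView& packet) {
    return packet.size_in_ints < kRuntThreshold ? Route::ToRuntLog
                                                : Route::ToParser;
}
```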
|
potsaid
Joined: 20 Oct 2008 Posts: 25
|
Posted: Tue May 08, 2012 7:50 pm Post subject: |
|
|
Chris,
Nice concept. We tried what you said and got some interesting results. In order to help recognize the data, we injected a 150kHz sine wave into the A/D input.
1. We first directed *all* packets to the file, just to be sure we were using the logger correctly. You will see that indeed there is 150kHz sine wave data with header information visible as spikes when viewed with BinView.exe - these are files named "LogAllPackets_X.bin" in the attached zip file and there were no runtime errors generated for these files.
2. Next, we directed only runt packets with size < 4096 to the file. The surprising thing here is that we would get the memory access violation on the next packet received, even if the next packet was of normal size. Further, the runt packets include 150kHz sine data if viewed in BinView.exe. These are experiment 1 and experiment 2 in the zip file.
3. Finally, we introduced simple logic such that if a runt packet with size < 4096 was found, then the runt packet and all data that followed was directed to the file. This is experiment 3 in the zip file.
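The "simple logic" in step 3 amounts to a latch: once one runt is seen, that packet and everything after it is diverted. A minimal sketch, assuming the same 4096 threshold; `RuntLatch` is a hypothetical helper and not the code that was actually run:

```cpp
#include <cstddef>

// Hypothetical latch: trips on the first runt packet and stays tripped
// until Reset() is called (for example at the start of the next acquisition).
class RuntLatch {
public:
    explicit RuntLatch(std::size_t threshold = 4096) : threshold_(threshold) {}

    // Returns true if this packet should be diverted to the log file.
    bool ShouldDivert(std::size_t packet_size) {
        if (packet_size < threshold_) {
            tripped_ = true;
        }
        return tripped_;
    }

    void Reset() { tripped_ = false; }

private:
    std::size_t threshold_;
    bool tripped_ = false;
};
```

Resetting the latch at each stream start keeps a bad acquisition from affecting the next one.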
You will see in "runtpackets_experiment3_780.bin" that the 150kHz sine data is continuous across the runt packet-to-normal packet boundary.
Of 894 repeated acquisitions, numbers 185, 780, and 876 all generated runt packets. Data for these experiments is included in the zip.
This shows that it is not sufficient to just reject runt packets, both because the runt packet contains true data and because doing so does not fix the memory access violation that occurs on the next normal-sized packet.
If I understand correctly, it looks like on occasion the packets may not be being formed correctly in the FPGA. We have the system running and can gather additional data as you suggest.
Thanks again for looking at this.
- Ben |
|
csmith Site Admin
Joined: 13 Apr 2006 Posts: 202
|
Posted: Wed May 09, 2012 1:34 pm Post subject: |
|
|
The problem is a bad Vita header that creates an initial small Velo header - the runt packet. The disconnect between the size of this initial packet and the real size causes an alignment problem that persists until the end of the run.
A workaround is to just stop when you see a runt packet and restart the stream.
Note the 'size' of the runt packet seems to be based on data from the previous run. So if your amplitude exceeds the real size, your runts could be larger than the normal first packet.
Could you send us the source of your test program that repeatedly starts and stops so that we can try and debug the logic with it? We only need the stream engine parts that start/stop/take data. _________________ Chris Smith
Innovative Integration
csmith@innovative-dsp.com |
|
potsaid
Joined: 20 Oct 2008 Posts: 25
|
Posted: Thu May 10, 2012 8:19 am Post subject: |
|
|
Chris,
This is great. Our code is based as closely as possible on the example Stream application. So, it might be best to use Stream as the shared code between us. We made a small change to stream to virtually press the run button after the end of an acquisition. This creates the effect of standing there and pressing the run button with the mouse after waiting a few seconds after the software has reported the acquisition is complete.
The two files we changed are attached. I just verified by looking at dates on the files that these are the only two files we modified.
An important note:
With Logic version 102, we only saw the “Attempted to read or write protected memory” error.
With Logic version 105, we see the same memory error, but we also see an occasional “Input FIFO overrun” error.
The Input FIFO overrun actually occurs more frequently than the memory access error with Logic 105. This is unexpected because we do not have any electrical connectors attached to the card during these tests. We never saw the Input FIFO overrun with Logic 102 during several days of testing.
When the FIFO overrun occurs, just press the stop logging button, then the running man button to resume testing. Eventually, you should encounter the memory access error.
Expect to see the memory access violation somewhere around 100-500 runs. This can take some time with the several second pause we have in the code now between the end of the previous acquisition and the start of the next. Feel free to speed this up.
Let me know if you indeed see the memory access error using the attached modifications or if there is anything else I can do to help.
- Ben |
|
csmith Site Admin
Joined: 13 Apr 2006 Posts: 202
|
Posted: Mon May 14, 2012 10:57 am Post subject: |
|
|
Could you run your test using the new logic Shant Moses sent you in order to see if this issue at least was improved by the changes he made? _________________ Chris Smith
Innovative Integration
csmith@innovative-dsp.com |
|
potsaid
Joined: 20 Oct 2008 Posts: 25
|
Posted: Wed May 16, 2012 11:46 am Post subject: |
|
|
We ran Shant's FPGA logic and compared it to logic 105 from the web download. It looks like the download from the web is more stable than Shant's new logic version (about 5 vs. 31 errors per 2000 start/stop acquisitions).
It is important to note that in our previous implementation, we detected runt packets by looking for a size < 4096. While running these experiments, we noticed that we would also get invalid memory access for intermediate sizes between the standard 4096 and 81920 sizes associated with good packets. For example, see the data for Logic 105 run 1, which produced the following packet sizes that I believe would be problematic:
Experiment 40: 65364
Experiment 227: 65360
Experiment 574: 65364
Experiment 1445: 65368
In this revised experiment, we only accept packets of size 4096 or 81920, and direct all other sizes to disk using LogWithHeader() once a suspect packet is detected. Doing so allows our code to run without the hard crash. Do you think it is right to assume that all packets whose size is not exactly 4096 or 81920 are malformed?
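The exact-size check described here could look something like the sketch below. `IsExpectedSize` and `kExpectedSizes` are hypothetical names; the two sizes are the ones reported in this thread for this particular configuration and would differ for other packet settings.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Sizes reported in this thread for well-formed packets (setup specific).
constexpr std::array<std::size_t, 2> kExpectedSizes = {4096, 81920};

// Returns true only for packets whose size matches a known-good value exactly.
inline bool IsExpectedSize(std::size_t packet_size) {
    return std::find(kExpectedSizes.begin(), kExpectedSizes.end(),
                     packet_size) != kExpectedSizes.end();
}
```

Unlike a simple `< 4096` threshold, this also rejects the intermediate sizes such as 65364.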
Here is a summary of the experiments. We did 2 runs of 2000 start/stop acquisitions with standard logic 105 and Shant's new logic. It looks like the standard logic works better with about 5 errors per 2000, while the updated logic has about 31 errors per 2000 experiments. The log files and the .bin files are attached.
Logic 105 Run 1
4 of 2000 failed
Logic 105 Run 2
6 of 2000 failed
Shant Logic Run 1
33 of 2000 failed
Shant Logic Run 2
29 of 2000 failed
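The per-logic averages follow directly from the four run totals above: (4 + 6) / 2 = 5 and (33 + 29) / 2 = 31 failures per 2000 acquisitions, or roughly 0.25% and 1.55%. A small sketch of that arithmetic (the function names are illustrative only):

```cpp
// Mean number of failures across two runs of equal length.
inline double MeanFailures(int run1_failures, int run2_failures) {
    return (run1_failures + run2_failures) / 2.0;
}

// Failure rate as a percentage of all acquisitions attempted.
inline double FailurePercent(int failures, int acquisitions) {
    return 100.0 * failures / acquisitions;
}
```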
A quick question - are you also seeing similar errors with your hardware at Innovative? |
|
csmith Site Admin
Joined: 13 Apr 2006 Posts: 202
|
Posted: Wed May 16, 2012 12:04 pm Post subject: |
|
|
The runt packet sizes seem to be aliased data words. If the number is negative, it would look much like what you see when treated as unsigned int.
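As an illustration, assuming the size field was overwritten by a 16-bit two's-complement data word (the word width is an assumption, not something confirmed in this thread), the intermediate sizes reported earlier map back to small negative values: 65364 corresponds to -172, 65360 to -176, and 65368 to -168. `AsSigned16` is a hypothetical helper:

```cpp
#include <cstdint>

// Reinterpret the low 16 bits of a value as a two's-complement signed word.
inline std::int32_t AsSigned16(std::uint32_t raw) {
    const std::uint32_t low = raw & 0xFFFFu;
    return low >= 0x8000u ? static_cast<std::int32_t>(low) - 0x10000
                          : static_cast<std::int32_t>(low);
}
```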
In the case I examined, the "real" VITA packet that the runt contained was of size 4096. _________________ Chris Smith
Innovative Integration
csmith@innovative-dsp.com |
|
csmith Site Admin
Joined: 13 Apr 2006 Posts: 202
|
Posted: Wed May 16, 2012 1:13 pm Post subject: |
|
|
I examined a 'large' runt packet run and did verify that the fail condition is exactly the same - the bad data written into the initial VITA packet makes the first packet too large to be aligned rather than too small. _________________ Chris Smith
Innovative Integration
csmith@innovative-dsp.com |
|
© Copyright 2006-2012 Innovative Integration