How Memory Works
 by Dr Ah Clem
The memory in your computer is called RAM (Random Access Memory). There are two basic types of RAM: DRAM (Dynamic RAM) and SRAM (Static RAM). DRAM is the most common and least expensive computer memory. SRAM works on the principle of a switch (a latch) that is turned on or off, and it requires four to six transistors for each bit. (A bit is the smallest unit of memory storage; it has two states, on or off, and is held in one memory cell.) DRAM, on the other hand, is based on a capacitor's ability to hold a charge and requires only one transistor and one capacitor per bit. Because DRAM cells are smaller than SRAM cells, manufacturers can put more memory into the same area, reducing the cost per bit. Let us explore how the popular DRAM works.
 A Bucket with Holes or How a DRAM Memory Cell Works
To understand a DRAM memory cell, imagine a leaky bucket with some holes in the bottom. A DRAM cell is nothing more than a capacitor with some amount of charge on it. If a DRAM cell is full of charge, we imagine a bucket full of water. Like all capacitors, the cell tends to leak its charge, so we think of a leaky bucket with water dripping out over time. Now, let's say that if a bucket is full of water, the bit represents a 1, and if the bucket is empty, a 0.
Because the bucket has holes in it, someone needs to keep adding water to maintain the water level. We call this refreshing the bucket and we send George down to put more water in as needed. When George refreshes a bucket, he needs to know if the bucket has water in it or not, so he looks at the bucket and adds more water if it has some water in it. If George doesn't refresh the bucket often enough, he will not know if it is supposed to contain water and we lose information. We must keep sending George down to refresh at a rate that is faster than the holes can drain the water.
DRAM uses refresh circuitry to maintain the charge, and thus the stored information, while SRAM does not need refresh because it has better and more expensive buckets without holes in them. Should the refresh cycle be interrupted for any length of time, we lose the information in memory. Paying George to keep the buckets full of water is a constant cost, but by hiring George we avoid the higher cost of good buckets.
Sometimes we want to read the data, that is, look and see whether the bucket is full of water (a '1') or empty (a '0'). Because we are working with leaky DRAM buckets, George has to judge the level of water in the bucket to decide whether it is a 1 or a 0. He has to do this because some of the water will have leaked out since the last refresh cycle. If the bucket is half full or less, it is a '0', but if the bucket is more than half full, it is a '1'. George must stop and measure the level of water with a ruler, which takes a little time and decreases the speed at which memory can run. DRAM is not purely a digital device with only the levels 0 and 1; internally it is an analog device with a level detector (the sense amplifier). Level detectors require a small amount of time to determine whether we have a '0' or a '1' in a cell. This extra time is one reason SRAM runs faster than DRAM.
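The bucket analogy can be sketched as a toy simulation. This is a minimal Python model, not a circuit simulation: it assumes the cell loses a fixed fraction of its charge each time step (an invented leak rate), that the level detector reads anything above half full as a 1, and that a refresh simply reads the cell and writes the result back. `LeakyCell` and `survives` are hypothetical names.

```python
class LeakyCell:
    """Toy DRAM cell: charge is a fraction of 'full' (1.0).

    Assumptions for illustration only: a fixed fraction of the charge leaks
    away per time step, and anything above half full reads as a 1.
    """

    THRESHOLD = 0.5

    def __init__(self, leak_per_step=0.1):
        self.leak_per_step = leak_per_step
        self.charge = 0.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0

    def tick(self, steps=1):
        # Water drips out of the bucket: exponential decay toward empty.
        self.charge *= (1.0 - self.leak_per_step) ** steps

    def read(self):
        # The level detector: more than half full is a 1, otherwise a 0.
        return 1 if self.charge > self.THRESHOLD else 0

    def refresh(self):
        # George looks at the bucket and tops it up only if it reads as a 1.
        self.write(self.read())


def survives(refresh_every, total_steps, leak_per_step=0.1):
    """Write a 1, refresh every `refresh_every` steps, report if it is still a 1."""
    cell = LeakyCell(leak_per_step)
    cell.write(1)
    for step in range(1, total_steps + 1):
        cell.tick()
        if step % refresh_every == 0:
            cell.refresh()
    return cell.read() == 1
```

With a 10% leak per step, a stored 1 falls to about 0.53 after six steps and about 0.48 after seven, so `survives(6, 60)` returns True while `survives(7, 60)` returns False: once George arrives late, the 1 is read as a 0 and the refresh faithfully rewrites the wrong value.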
 Addressing Memory
DRAM memory is arranged in an X-Y grid of rows and columns. First the row address is sent to the memory chip and latched; then the column address is sent in the same way. This row-and-column addressing scheme (called multiplexing) lets a large memory address use fewer pins. If we want George to check bucket 1023, we tell him to go to row 10, and once he gets there we tell him to go to column 23. Multiplexing adds to the access time, but it saves pins. Addressing 1 Meg (1,048,576, or 2^20) memory locations with a multiplexed scheme, where each address line is used twice, requires only 10 lines (plus one extra control signal). Without multiplexing, 20 address lines are required.
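In binary hardware the split is done with bit fields rather than decimal digits. A minimal Python sketch, assuming a square 1,024 x 1,024 array (10 row bits and 10 column bits) as in the 1 Meg example; `split_address` and `address_pins` are hypothetical helpers.

```python
def split_address(addr, column_bits=10):
    """Split a flat cell address into (row, column) for a multiplexed DRAM.

    Assumes 2**column_bits columns: the high bits select the row and the low
    bits select the column.
    """
    row = addr >> column_bits
    column = addr & ((1 << column_bits) - 1)
    return row, column


def address_pins(locations, multiplexed=True):
    """Address lines needed to reach `locations` cells (a power of two)."""
    bits = (locations - 1).bit_length()
    # Multiplexing sends the row half and the column half over the same pins;
    # an odd bit count needs one extra pin for the longer half.
    return (bits + 1) // 2 if multiplexed else bits


print(split_address(10 * 1024 + 23))   # (10, 23)
print(address_pins(1_048_576), address_pins(1_048_576, multiplexed=False))   # 10 20
```

Address 10 x 1024 + 23 = 10,263 lands in row 10, column 23, the binary counterpart of George's "row 10, column 23".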
 The nitty-gritty of how Multiplexing Works
To read a memory cell, we place a row address on the address bus lines (all the address lines together are called an address bus), activate the Row Address Strobe (RAS) line, and wait about 15ns while the holding circuitry latches the row address. Then we place the column address on the address bus and activate the Column Address Strobe (CAS) line. Now we have to wait for the level-checking circuitry to determine whether the location contains a 0 or a 1. This information, or data, appears as a high or low voltage on the data output pin.
When they say that George can find a memory location in 70ns, it is 'sort of' true but somewhat misleading (though less misleading than the synchronous memory hype@#$!). George can find a memory location in 70ns, but then he must be allowed to rest for 70ns, so the total access cycle is 140ns. The rest period allows the internal circuitry to refresh the memory in all locations addressed by the row address and to do other internal housekeeping.
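The arithmetic behind the 140ns figure, as a tiny Python helper using the article's example numbers (70ns access plus 70ns rest); the function name is hypothetical.

```python
def random_access_rate(access_ns=70, rest_ns=70):
    """Return (cycle time in ns, random accesses per second).

    Each access must be followed by a rest period before the chip can start
    another one, so the full cycle is access + rest.
    """
    cycle_ns = access_ns + rest_ns
    return cycle_ns, 1e9 / cycle_ns


cycle, rate = random_access_rate()
print(f"cycle = {cycle}ns, about {rate / 1e6:.1f} million random accesses per second")
```

Taking the 70ns rating at face value would suggest about 14.3 million accesses per second, twice the real figure.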
When you access memory with a read or a write (or just to refresh it), every column location in the addressed row gets refreshed. This is very important to remember, because during refresh a lot of power is consumed in a very short time. Think of this as poor old George having to fill 10,000 buckets all at once. He needs lots of water at a very high flow rate to fill the buckets in mere nanoseconds.
Power, in a bucket-filling machine, would be the pressure of water times the rate of flow. In electronics, power is the voltage times the current. Peak power demands during the refresh cycle are extreme, and an inadequate power supply can cause many serious problems: the voltage can sag, and memory locations will not refresh completely.
 There are two basic refresh methods
 RAS Refresh
RAS refresh is accomplished by sequentially addressing the rows and enabling the RAS signal, which causes every memory cell in the addressed row to refresh. This also happens when reading or writing a location in a particular row. The whole sequence has to be repeated within a specified amount of time to ensure valid data.
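RAS-only refresh puts the burden on the controller: it must step through every row address within the refresh period. A minimal Python sketch, assuming example figures of 1,024 rows and a 16ms refresh period (check the datasheet for a real part); `ras_only_refresh` is a hypothetical name.

```python
def ras_only_refresh(rows=1024, period_ms=16.0):
    """Yield (time in microseconds, row address) for one full refresh pass.

    The controller supplies each row address itself and pulses RAS; spacing
    the rows evenly means one row every period / rows.
    """
    interval_us = period_ms * 1000.0 / rows
    for row in range(rows):
        yield row * interval_us, row


events = list(ras_only_refresh())
print(events[:3])   # evenly spaced, one row every 15.625 microseconds
```

With these assumed figures the controller has to fit in a row refresh roughly every 15.6 microseconds, whatever else it is doing.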
 Hidden or CAS-Before-RAS Refresh
When the CAS signal is asserted before the RAS signal, the memory chip uses an internal counter to form the row address, so the controller does not have to supply one. Thus you get a row refreshed while the memory is in the second 70ns of its cycle. This can be meshed with accessing a memory location so that less time is consumed by refreshing. This option has to be built into the memory controller.
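By contrast, with CAS-before-RAS refresh the row address comes from a counter inside the chip. A toy Python model, assuming a simple counter that advances by one row per refresh and wraps around; the class name and the 4-row size in the example are hypothetical.

```python
class CbrRefreshCounter:
    """On-chip row counter used by CAS-before-RAS (CBR) refresh (toy model)."""

    def __init__(self, rows=1024):
        self.rows = rows
        self.next_row = 0

    def cas_before_ras(self):
        """Refresh one row and return its address; the controller sends none."""
        row = self.next_row
        self.next_row = (self.next_row + 1) % self.rows   # wrap after the last row
        return row


chip = CbrRefreshCounter(rows=4)
print([chip.cas_before_ras() for _ in range(6)])   # [0, 1, 2, 3, 0, 1]
```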
 How to Find Memory Problems
Not all memory errors are caused by bad memory. When debugging memory errors in a PC system it is necessary to look at the power-supply, noise, parity system, CMOS settings, and motherboards.
 What is a Parity Error?
Parity is a simple mathematical calculation that provides a check to determine whether the value of a byte (made of 8 bits) has been corrupted. In PC memory, a 9th bit, the parity bit, is set according to how many 1's there are in the byte. With 'odd parity', if there is an even number of 1's in the byte the parity bit is set to a '1', and if there is an odd number of 1's the parity bit is set to a '0'; either way, the nine bits together always contain an odd number of 1's. ('Even parity' is the reverse: the parity bit makes the total count of 1's even.)
 Odd Parity example
- 01011101 (five 1's) has a parity of 0
- 10101011 (five 1's) has a parity of 0
- 10111111 (seven 1's) has a parity of 0
- 10101000 (three 1's) has a parity of 0
- 01010101 (four 1's) has a parity of 1
- 10101010 (four 1's) has a parity of 1
- 10111110 (six 1's) has a parity of 1
- 10001000 (two 1's) has a parity of 1
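The rule behind these examples fits in a few lines of Python. This is a sketch, not a description of any particular parity chip: `parity_bit` is a hypothetical name, and it follows the examples above, setting the bit to 1 when the byte holds an even number of 1's so that all nine bits always hold an odd number (the convention usually called odd parity).

```python
def parity_bit(byte):
    """Return the 9th bit that makes the total count of 1's in all nine bits odd."""
    ones = bin(byte & 0xFF).count("1")   # number of 1 bits in the data byte
    return 1 if ones % 2 == 0 else 0


for pattern in ["01011101", "10101011", "01010101", "10001000"]:
    print(pattern, parity_bit(int(pattern, 2)))
```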
Parity is NOT a foolproof method. A close examination shows that if two bits in a byte flip (or any even number of bits), the parity still checks out even though the value is corrupted. You can thus have memory problems that 'lock up' a system without ever getting a parity error.
When a memory location is written to, a hardware parity generator computes a parity bit, which is stored in an extra memory chip. This extra, 9th bit on a SIMM strip holds the parity bit for later use.
When a location is read, the parity generator calculates a new value and compares it to the stored value; if they don't match, a parity error is generated.
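The write, read and compare steps, and the blind spot described above, can be simulated in Python. A minimal sketch: the stored cell is modeled as a (data byte, parity bit) pair, the parity rule is the one from the examples above, and all function names are hypothetical.

```python
def parity_bit(byte):
    # Same rule as the examples: the nine bits together hold an odd number of 1's.
    return 1 if bin(byte & 0xFF).count("1") % 2 == 0 else 0


def write(byte):
    """Store the data byte together with its generated parity bit (the 9th bit)."""
    return byte, parity_bit(byte)


def read(stored):
    """Regenerate parity on read; return (data, True if it matches the stored bit)."""
    byte, stored_parity = stored
    return byte, parity_bit(byte) == stored_parity


def flip(stored, *positions):
    """Corrupt a stored byte by flipping the given data bit positions (0 = LSB)."""
    byte, parity = stored
    for pos in positions:
        byte ^= 1 << pos
    return byte, parity


cell = write(0b01011101)
print(read(cell))               # (93, True): clean read
print(read(flip(cell, 3)))      # one flipped bit: parity mismatch is reported
print(read(flip(cell, 0, 5)))   # two flipped bits: wrong data, parity still matches
```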
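Some SIMM strips (with a pseudo, or fake, parity bit) contain parity generators in place of the extra 9th-bit memory chip. This is done to save some cost on a memory strip. Because the fake parity bit is generated from the data as it is read, it always matches, so these strips cannot detect a memory error. The false parity bit can also cause timing skews that cause problems on some systems.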
 What does a Parity Error mean?
A parity error tells us we have had an error in memory storage or transfer, but it does NOT necessarily mean that the memory is bad. Bad memory is only one cause of a parity error. When confronted with a parity error, we need to figure out the root cause.
 Four basic candidates for memory failure
- Power Supply
- Mother Board
- Other Boards
- Memory Module
 Power Supply
Your computer uses a switching power supply to convert 120Vac to the voltages (5Vdc, 12Vdc, -12Vdc and -5Vdc) required by your PC. Switching power supplies are nothing less than remarkable; they can accept 90-135VAC at 50-60 Hertz and still hold the output voltages within +/-5%. They do this by first converting the AC line input to DC (rectification), which lets the power supply accept huge input voltage changes and still maintain a constant output.
This DC is then pulsed or chopped to run a transformer for isolation and voltage reduction purposes. On the other side of the transformer, the output is again rectified into DC, filtered, and sent on to power your PC.
While the power supply in a PC provides a power good signal (PUP), this only tells the system that the 5 Volt power is available and within +/-5%, as measured at the power supply. It does not tell the system whether the 5 Volts is noisy or has spikes riding on it.
The power coming out of the wall outlet is far from perfect. If your power supply is defective (or just too cheap) or the power line is out of spec, your power supply may fail to filter out all the noise that can affect your computer system. Bad or glitchy power can affect any part of your PC, but most often it shows up as a memory parity error: a change in the stored value of one bit in a byte will cause your system to detect a parity error. A power glitch can even corrupt the values stored in DRAM as it goes through a refresh cycle. Replacing the memory SIMM when the errors are created by the power supply obviously won't do much good.
 Mother Boards
Many problems can be caused by the motherboard, which, among other things, controls the timing and refresh of the memory:
 Bad Design
Attempting to accommodate a wide range of SIMMs has sometimes caused problems. When motherboard designers ignore the wide range of loads, termination requirements and inter-coupling that different grades of SIMMs present, trouble follows. When three-chip SIMMs first came out, they required less address-line drive than nine-chip SIMMs, and they were more sensitive to noise because of the longer traces needed on the SIMM board to route signals to the card edge. Going back to nine-chip SIMMs, which have shorter traces between the chips and the card edge and a higher address-line load, solved this problem in the short term. The problem went away completely when the motherboards were redesigned with better memory bus termination.
A similar problem occurred with 72-pin SIMMs. Because DRAM chips grew in capacity, SIMMs that used to have up to 36 chips now have only two or four. Instead of driving 36 address-line loads, the address drivers now drive only 4. This overdrive can create a lot of noise or ringing, and the address lines must be stable before memory can be accessed.
 Poor Decoupling (Motherboard Power Filtering)
The power supply can only filter out noise from the power line. Motherboards use decoupling capacitors to filter out noise generated by the processor and the drivers mounted on the board. In the rush to come out with ever cheaper motherboards, some companies use cheaper electrolytic decoupling capacitors. This became enough of a problem that Intel pointed out in an EE Times article that many motherboards were not meeting the power quality specifications required by the Pentium processor. The more expensive tantalum capacitors not only work better, they also have a longer service life; electrolytics tend to slowly lose quality as they age. This wouldn't be much of a problem if they were specified conservatively, but cheaper motherboards are often designed for a short service life. Add to this the trend toward low-power sleep modes, where the draw on the power supply can change by orders of magnitude in a single clock cycle. If there isn't adequate power decoupling on the motherboard, you can have memory errors.
 Bad Chip Creating Noise on the 5V Power Lines
Bad motherboard chips also create noise on the power-supply. A leaky or defective input to a chip may require much more current than intended. As the driver chips attempt to drive into these excessive loads, they put noise onto the power-supply. A bad chip can also cause problems by disturbing memory timing.
When motherboards are designed, the designers try to make them as fast as possible by designing close to the chip specifications. This leaves little or no margin for error. Chips are made with minimum and maximum timing specs, and the actual timing varies between these limits from chip to chip. When making thousands of boards, a few parts will be at the minimum and maximum specifications. Sometimes, in the rush to market and high performance, some worst-case situations are ignored on the assumption that the chips tend to get faster as the IC foundries gain experience. But some chips won't be quite fast enough. The product life of a motherboard may be as short as 6 weeks, giving little time to prove designs in the field.
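Why designing close to the specs leaves no margin can be shown with a worst-case timing check. A Python sketch with entirely hypothetical part names, delays and budget (none come from a real datasheet): the path works with typical parts but fails when every part sits at its maximum delay.

```python
# Hypothetical delays in ns along one memory path: (typical, maximum) for each part.
PATH = {
    "address buffer": (5, 8),
    "decoder": (6, 10),
    "DRAM access": (15, 20),
    "data buffer": (4, 7),
}


def path_delay(path, worst_case=False):
    """Sum the delays along the path using typical or maximum values."""
    index = 1 if worst_case else 0
    return sum(delays[index] for delays in path.values())


def fits(path, budget_ns, worst_case=False):
    """True if the path delay fits within the timing budget."""
    return path_delay(path, worst_case) <= budget_ns


BUDGET_NS = 40   # hypothetical time allowed by the board's clock
print(path_delay(PATH), fits(PATH, BUDGET_NS))              # 30 True
print(path_delay(PATH, True), fits(PATH, BUDGET_NS, True))  # 45 False
```

A board that passes with typical parts can still fail when it happens to collect several slow-corner parts on the same path.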
 Other Boards
Other boards plugged into the motherboard can cause memory problems. If a daughter board sucks too much power or has a bad driver chip, it can affect the entire system. Remove all daughter boards and see if your memory problem goes away. Then put them back one at a time.
 Memory Module
There are several things that can cause a memory module to be bad in a system but still test good. Testing a memory module takes more than that old RAM test program you have been lugging around since the DOS days; these programs often end up testing only the processor's cache memory!
- Minimum Timing Specs
- Options: page mode, fast page mode, EDO (Extended Data Out)
- Speed, Speed, Speed
 CMOS Settings
Newer motherboards let you change memory timing settings for different memory speeds and types using the advanced CMOS settings. We cannot go through all the settings, as they change each time a new BIOS and motherboard come out. The best way to determine the correct settings is to set all of them to the slowest value (usually the highest number). Then, one by one, speed them up and run a memory test (such as 'PC certify pro') on the system overnight at the high-temperature specification. Once problems appear, back off one setting.
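The procedure above is a simple greedy search. A Python sketch under stated assumptions: the setting names and values are hypothetical, each list is ordered slowest first, and `passes_test` stands in for the overnight high-temperature memory test (it is not a real tool's API).

```python
def tune(settings, passes_test):
    """Speed settings up one at a time, backing off as soon as a test fails.

    settings:    {name: [slowest, ..., fastest]} candidate values per setting
    passes_test: callable(config) -> bool, standing in for an overnight test
    """
    config = {name: values[0] for name, values in settings.items()}  # all slowest
    for name, values in settings.items():
        for value in values[1:]:
            trial = {**config, name: value}
            if not passes_test(trial):
                break             # problems appeared: keep the last value that passed
            config = trial
    return config


# Hypothetical example: wait states in clocks, so a higher number is slower.
settings = {"ras_to_cas_delay": [4, 3, 2], "ras_precharge": [5, 4, 3]}


def fake_test(cfg):
    # Pretend this system is stable only at ras_to_cas >= 3 and precharge >= 4.
    return cfg["ras_to_cas_delay"] >= 3 and cfg["ras_precharge"] >= 4


print(tune(settings, fake_test))   # {'ras_to_cas_delay': 3, 'ras_precharge': 4}
```

Tightening one setting at a time keeps any failure attributable to the last change, which is why backing off a single step is enough.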
These settings will cause the most problems (if your BIOS has them):
RAS to CAS Delay
This is the time from when the Row Address Strobe (RAS) is taken low to when the Column Address Strobe (CAS) is taken low; it should be 30ns for most memory.
RAS Precharge
This is how soon the memory can be accessed again after a RAS-initiated access. DRAM has to have a rest, or dead, time between access cycles, and this time is usually about the same as the access time. If the access time is 70ns, then the dead time is 70ns, for a complete cycle time of 140ns.
CAS Precharge
This is how soon the memory can be accessed again after a CAS-initiated access. It applies to Page Mode or Static Column accessing and is the same thing as RAS Precharge, except that the time is shorter.
MUX to CAS Delay
This is the delay from when the address lines change to when you can assert the CAS signal; it is typically 15ns.
SDRAM has a unique (more 1990s spin) way of measuring speed. The '15ns' in 15ns SDRAM is the length of one complete clock cycle (66MHz). The speed no longer refers directly to the access time. It takes one clock cycle to access a memory location in burst mode (in addition to the setup instructions you must send to the SIMM), so a 15ns SDRAM is roughly equivalent to, or slower than, a 60ns DRAM that is interleaved (see more about interleaving below), as with most standard DRAM controllers. (Yeah, the marketing people got hold of the SDRAM specifications.)
An SDRAM has most of the memory controller circuitry on chip, so the system has to talk to it just as it would to a memory controller. This helps clean up timing problems we saw in the past. If you are just going out to get one memory word, it can be slower than comparable DRAM, because there are two dead cycles every time you set up to get data from the memory. This setup time is needed to initiate every memory access cycle, no matter what size is being transferred. Thus 1-, 2-, 8- or 512-byte transfers all have the same 3-cycle overhead to set up the SDRAM, whether in burst mode or not. The popular burst-mode page sizes are 512 and 256 (pretty small), so there is additional setup time between pages. The small page size may have to do with keeping the capacitance of memory lines down, to squeak out more speed in future versions.
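The SDRAM numbers above reduce to two small formulas: the clock period is 1/f, and a transfer costs a fixed setup overhead plus one clock per data transfer. A Python sketch using the article's figures (15ns clock, 3-cycle setup); treat both as assumptions for any particular part. Function names are hypothetical.

```python
def clock_period_ns(mhz):
    """Clock period in ns for a clock frequency in MHz (1 / f)."""
    return 1000.0 / mhz


def transfer_time_ns(transfers, setup_cycles=3, period_ns=15.0):
    """Setup overhead plus one clock per data transfer in the burst."""
    return (setup_cycles + transfers) * period_ns


print(round(clock_period_ns(66.67), 1))   # 15.0: the "15ns" rating is a clock period
for n in (1, 2, 8, 512):
    total = transfer_time_ns(n)
    print(f"{n:4d} transfers: {total:7.1f}ns total, {total / n:5.2f}ns each")
```

A single word costs four clocks (60ns), which is why a lone access can be slower than comparable DRAM, while a 512-transfer burst approaches the 15ns clock period per transfer.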
Interleaving is a scheme where the system spreads its memory accesses across two or more banks of memory. While the first bank is in the second half of its read cycle, the second bank is already in the first half of its own. Thus we have odd and even banks of memory. This cuts the access time in half IF the memory accesses are sequential, as in a burst-mode request. If you are just going for a single word, you still have to wait for the full access time or more (more because of setup time, and because you may have to wait for the right bank of memory to be available). Double data out SDRAM does interleaving on chip.
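A rough timing model shows both effects: sequential accesses speed up with more banks, while a single word or back-to-back accesses to the same bank do not. This is a Python sketch under simplifying assumptions, not a model of any real controller: address a lives in bank a % banks, a bank can restart only a full cycle (70ns access plus 70ns rest) after its last start, and the controller can start at most one access every 10ns (a hypothetical command rate). `finish_time_ns` is a hypothetical name.

```python
def finish_time_ns(addresses, banks, access_ns=70, cycle_ns=140, issue_ns=10):
    """Time at which the data for the last address in `addresses` is available.

    Simplified model (assumptions, not a datasheet): address a lives in bank
    a % banks; a bank may start a new access only cycle_ns after its previous
    start (access plus rest); the controller starts accesses in request order,
    at most one every issue_ns.
    """
    addresses = list(addresses)
    if not addresses:
        return 0
    bank_free = [0] * banks      # earliest time each bank may start again
    start = -issue_ns            # lets the very first access start at t = 0
    for a in addresses:
        bank = a % banks
        start = max(bank_free[bank], start + issue_ns)
        bank_free[bank] = start + cycle_ns
    return start + access_ns


for banks in (1, 2, 4):
    print(banks, "bank(s), 8 sequential words:", finish_time_ns(range(8), banks), "ns")
print("single word, 4 banks:", finish_time_ns([0], 4), "ns")        # still 70ns
print("same bank twice, 2 banks:", finish_time_ns([0, 2], 2), "ns")  # waits for the bank
```

In this model, doubling the number of banks roughly halves the time for a sequential run, as long as the controller can issue commands at least every cycle_ns / banks.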
Now, the sharp-eyed might notice that the speed of the actual DRAM cell hasn't really changed much of late. The true speed of a random word access may actually be slower with the new systems. These limitations are really the reason that fast new processors don't seem that much faster, unless you are running code that always hits inside the processor's on-chip cache and there is no cache thrashing. (Cache thrashing happens when the cache is filled with one block of memory only to be rewritten with another, back and forth.) Multitasking also causes cache thrashing: if we switch back and forth between programs, we guarantee cache misses once we get enough tasks running. The workaround is to make each task's time slot long enough that filling the cache is not a significant part of it.
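Cache thrashing is easy to reproduce with a toy direct-mapped cache. A minimal Python sketch, assuming a hypothetical 8KB cache of 256 lines of 32 bytes each; real caches are usually set-associative, which softens but does not remove the effect. `hit_rate` is a hypothetical name.

```python
def hit_rate(addresses, num_lines=256, line_size=32):
    """Fraction of byte addresses that hit in a direct-mapped cache (toy model)."""
    tags = [None] * num_lines          # which memory block each cache line holds
    hits = 0
    for addr in addresses:
        block = addr // line_size      # the whole line is fetched on a miss
        index = block % num_lines      # each block can live in exactly one line
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: evict whatever was there
    return hits / len(addresses) if addresses else 0.0


loop = list(range(0, 4096, 4)) * 2     # walk a 4KB buffer twice, one 4-byte word at a time
thrash = [0, 8192] * 1000              # two addresses 8KB apart map to the same line
print(hit_rate(loop))                  # 0.9375: only the first touch of each line misses
print(hit_rate(thrash))                # 0.0: each access evicts the other block
```

Multitasking produces the same pattern on a larger scale: each task switch can evict the other task's working set, and a longer time slice spreads the cost of the refill over more useful work.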
 Caching and the future
Caching schemes usually grab a whole block of memory (a cache line) when there is a cache miss (where the memory byte(s) needed are not in the cache). Thus we no longer really have Random Access Memory (RAM) in operation in the new computers; we end up pulling a whole block into cache every time we get any memory. All the new computer specifications are for sequential memory operations; in a way, this is more of a serial memory interface than what is traditionally thought of as RAM. (Reminds me of the very old drum memory.)
These trends are even more evident with the new RAMBUS standard. RAMBUS gets its speed increase in two ways:
- Voltage level: The voltage level, or swing, is much lower: 0-3.3 Volts on SDRAM and 0-2.2 Volts on RAMBUS. It takes less time to charge (or discharge) the DRAM cell data storage capacitor to 2.2V than to 3.3V.
- Proprietary caching scheme: RAMBUS uses a proprietary caching scheme that lets it anticipate which memory location(s) are going to be needed next. This does not always work, and the software and hardware must be adjusted for it to give a real increase in speed. It is still a workaround trying to avoid the speed limits of the DRAM cell. RAMBUS comes with even more overhead operations than SDRAM to set up a memory-to-cache transfer.
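The voltage-swing argument follows from the basic capacitor relation Q = C·ΔV. Assuming the capacitance C is charged by a roughly constant drive current I (a simplification; real drivers are not ideal current sources), the time to swing the voltage by ΔV and the resulting ratio for the two swings quoted above are:

```latex
t = \frac{C\,\Delta V}{I}
\qquad\Longrightarrow\qquad
\frac{t_{\mathrm{RAMBUS}}}{t_{\mathrm{SDRAM}}} = \frac{2.2\ \mathrm{V}}{3.3\ \mathrm{V}} = \frac{2}{3}
```

With the same C and I, the lower swing takes about two thirds of the time.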
RAMBUS is an example of how interleaving can be carried on to more levels. Four banks of interleaved memory, instead of two, will once again double the speed of a sequential memory access, and there is no reason not to go up to 8, 16 or even 32 banks.
Besides the memory cell itself, one limiting factor is stray capacitance in the data lines, which slows the bus down, so look for a trend toward system modules where the RAM die and the processor are placed together in a cartridge, as the Pentium II is with its cache memory.
For users who want more and more speed, it turns out that the one thing that can make a huge difference is the software machine code itself. Newer compilers need to be more memory-system aware. The trend toward huge, sloppy 'C' programs has made this a bigger problem than it needs to be: 'C' programs can be 4 to 10 times larger and slower than a hand-crafted, packed assembly code program. I know of one outfit that is streaming video off hard drives and out as HDTV at 80MB/s with an ordinary computer. They do this by carefully pairing instructions and hand-packing assembler code. If Windows were similarly crafted, today's hardware would be much more than we need for most desktop applications. With the perishable nature of software today, I doubt we will see such a trend; the best we can hope for is the development of sophisticated compilers that take much more of this into account.
 Other Helpful Hints
- Do not mix speeds when you don't have to. Be sure that all your SIMMs are the same speed, within 10ns. If you have SIMMs that test out at 80ns and one that tests at 100ns, try replacing that one with an 80ns SIMM. Use the slower SIMMs in a slower computer.
- Let memory testers run at least 3 passes. This allows the RAM to warm up internally and helps find possible refresh problems. Better yet, warm the SIMMs up to the specified high temperature before testing.
- The speed of a SIMM is determined by the slowest chip on the strip. If all of the chips on a strip pass at 80ns but one only passes at 95ns, then the strip has an access time of 100ns (the next standard speed grade), not 80ns or 90ns.
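The rule in the last hint can be written as: take the slowest chip, then round up to the next standard speed grade. A Python sketch; the list of grades is an assumption for illustration, and `strip_rating` is a hypothetical name.

```python
# Assumed list of common asynchronous DRAM speed grades, in ns.
SPEED_GRADES_NS = (60, 70, 80, 100, 120)


def strip_rating(chip_access_ns, grades=SPEED_GRADES_NS):
    """Rate a SIMM by its slowest chip, rounded up to the next listed grade."""
    slowest = max(chip_access_ns)
    for grade in sorted(grades):
        if grade >= slowest:
            return grade
    raise ValueError(f"{slowest}ns is slower than every listed grade")


print(strip_rating([80] * 8 + [95]))   # 100: one 95ns chip makes it a 100ns strip
```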
About the author: Dr. Ah Clem lives in the subterranean labs at Transtronics.