Is Dual Core Unstable

Discussion in 'Computer Science & Culture' started by Pi-Sudoku, Dec 30, 2005.

Thread Status:
Not open for further replies.
  1. AntonK Technomage Registered Senior Member

    Messages:
    1,083
    Well first, most processors have two levels of on-chip cache, usually referred to as the L1 and L2 caches. In a dual core there are several ways to arrange them. You can give each core its own L1 and its own L2, meaning each processor is really completely independent. Or you can give each core its own L1 but have the two share an L2. The shared L2 is quite common because it allows more cooperation between the processors: if processor 1 pulls in a cache line A, and processor 2 needs to do work on that same cache line, it can use it without having to go out to memory.
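
    If you're curious which layout your own machine uses, a reasonably modern Linux kernel exposes the cache topology through sysfs. A quick sketch in C (the paths are the standard sysfs layout; "index2" is typically the L2, but check your own /sys tree):

        /* Print which CPUs share cpu0's L2 cache on Linux.
         * If the file lists only cpu0, the L2 is private;
         * if it lists several CPUs, the L2 is shared. */
        #include <stdio.h>

        int main(void) {
            char buf[128];
            FILE *f = fopen(
                "/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list", "r");
            if (!f) { perror("fopen"); return 1; }
            if (fgets(buf, sizeof buf, f))
                printf("CPUs sharing cpu0's L2: %s", buf);
            fclose(f);
            return 0;
        }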

    There are other cache issues with a multiprocessor, however, and that is cache coherence. Imagine, if you will, the following scenario. Processor 1 accesses memory address 0x1000 (a made-up address, but just go with it), which gets put into its L2 cache, and from there into its L1 cache, and it does some work with it. Then processor 2 accesses address 0x1004, which is on the SAME cache line as 0x1000, so it pulls that line into its own L2 and L1. Then processor 1 modifies addresses 0x1000-0x1012. This means that processor 1 now has the ONLY up-to-date copy of that data, while processor 2 is working with old, stale data that is incorrect. This is called the coherence problem. There are multiple ways to solve it, but they all complicate the processor.
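
    As a programmer you never actually see the stale data, because the coherence hardware fixes it behind your back, but you can see what the fixing costs. A rough pthreads sketch (the iteration count and the assumption of a 64-byte line are made up for illustration):

        /* Two threads bump two DIFFERENT counters that happen to sit
         * on the SAME cache line, so the line ping-pongs between the
         * cores' caches ("false sharing"). Pad each counter out to
         * its own 64-byte line and this runs several times faster
         * on a dual core. Build with: cc -O2 falseshare.c -lpthread */
        #include <pthread.h>
        #include <stdio.h>

        #define ITERS 100000000L

        static struct { volatile long a, b; } shared;  /* adjacent: one line */

        static void *bump_a(void *p) {
            for (long i = 0; i < ITERS; i++) shared.a++;
            return NULL;
        }
        static void *bump_b(void *p) {
            for (long i = 0; i < ITERS; i++) shared.b++;
            return NULL;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, bump_a, NULL);
            pthread_create(&t2, NULL, bump_b, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            printf("a=%ld b=%ld\n", shared.a, shared.b);
            return 0;
        }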

    As for multiple simultaneous cache accesses, this is actually more of a materials engineering problem, because you simply need to make the cache multiported. That is to say, if two processors need to be able to read lines at the same time, you have to make the cache dual ported. This is not hard; it simply makes the cache larger than it would be single ported. So there are trade-offs to be made: do you want a take-turns approach where only one processor can access the cache at a time, or do you want LESS cache (since each cell is bigger) that both can access at the same time?

    Usually there is no intuitive way to answer these questions, so we build simulators. It turns out that for instructions (since you fetch instructions EVERY cycle) you usually need as many ports as you have processors times the issue width (the "superscalar-ness") of each processor. For instance, if you have two processors and each is 4-way superscalar, a shared instruction cache needs to be at least 8-ported. Of course, if the L1 isn't shared, you only need two 4-ported caches. An L2, on the other hand, is not necessarily accessed every cycle, so it can get away with much less porting; it may be single or just dual ported. Hope that answered some questions.
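
    The port arithmetic is simple enough to jot down; a back-of-the-envelope sketch (the numbers are just the example above):

        #include <stdio.h>

        /* Worst case, every core can issue a fetch every cycle, so a
         * shared instruction cache needs cores * issue_width ports. */
        int main(void) {
            int cores = 2, issue_width = 4;  /* two 4-way superscalar cores */
            printf("shared L1 needs %d ports\n", cores * issue_width); /* 8 */
            printf("private L1s need %d ports each\n", issue_width);   /* 4 */
            return 0;
        }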

    -AntonK
     
    Last edited: Mar 9, 2006
  2. apendrapew Oral defecator Registered Senior Member

    Messages:
    577
    This has become quite an amusing thread.
     
  3. Anomalous Banned Banned

    Messages:
    1,710
    That was a clear, absolutely well-educated answer that went beyond my hopes. Thanks!

    Why isn't it possible for the OS to distribute all the threads equally among the cores, now that you made it quite clear that having separate caches is better? Threads do have separate memory allocations, right?
     
  4. daktaklakpak God is irrelevant! Registered Senior Member

    Messages:
    710
    Dedicating one core to the OS and the other to applications wastes CPU resources, because the OS sits idle most of the time.
     
  5. AntonK Technomage Registered Senior Member

    Messages:
    1,083
    Typically, an OS will have a process (really a thread, if that is the smallest unit of execution in the OS) scheduler that will do its best to schedule equally across the processors. Unfortunately, there are problems with that. The first is that sometimes programs simply aren't written as multiple threads. It takes a lot of overhead to write a program to run on N processors as opposed to 1. This overhead is worth it if you know you'll have a multiprocessor, but until lately, that wasn't the case. Also, you have to remember the processors are still having to fight for access to the main memory. The bus between the memory and the processor is really not as fast as people think. It can quickly become saturated.

    As for your question about separate memory allocations, I'm not quite sure what you're asking. It sounds like an OS question, in which case the answer is: "It depends on the kernel you're using." Perhaps rephrase and I can answer.

    -AntonK
     
  6. one_raven God is a Chinese Whisper Valued Senior Member

    Messages:
    13,433
    This is true.
    No it's not.
    Yes it is, and I know it is because I know what I am talking about and these people, who know what they're talking about, told me so!
    No it's not, and this is exactly why...
    You're a big, dumb stupid head and I don't like you or your dumb facts. So there!

    *sigh*
     
  7. Facial Valued Senior Member

    Messages:
    2,225
    What happened to Dannyboy? He made this thread quite amusing.
     
  8. Anomalous Banned Banned

    Messages:
    1,710
    Wow, this AntonK is damn good and genuinely into answering the topic.

    Yes, but it should still be possible to pin different threads to each core; they are always lurking in memory when we look at the process viewer, right? So if there are 10 services/programs running in memory, the OS should put 5 on one core and 5 on the other for a two-core processor.

    So why not exploit the already existing threads to do the job, and let the OS handle it? If we have a big AI game with different components of the program on, say, 10 different threads, the OS should make sure the threads of a single program are uniformly distributed across the cores. So in our example with a two-core processor, 5 threads would sit on one core and 5 on the other; and the same game should break all performance records on a 10-core processor.

    I thought we already have dual channel. But clearly that could be a reason why we won't see double the performance each time we double the cores. Fast memory = hot memory. Are supercomputers hot? Quad-channel memory? Multichannel memory on a single SIMM?

    Nope, it was related to the initial points about cache coherence.

    Does the OS still use interrupts? There must be some equivalent; if there is, would it be possible to distribute them among the cores, and would that make things any faster? Clearly I am from the old generation.

    Are you getting bored?
     
  9. AntonK Technomage Registered Senior Member

    Messages:
    1,083
    Okay, I'll answer each part separately. Yes, this is entirely possible, but I would have to ask why. I think you may be taking the abstraction of what a thread is a little too far. Instead, let's look at what a thread (or process, whichever is the smallest unit of execution for a given OS) really is. A thread is a place in memory plus a thread_block. Inside this block, which lives in the OS's portion of memory, we store the current PC (program counter) for that thread, its last CPU state (so that we can do a context switch back into the program), and some information about the memory that thread uses, such as its page table or virtual memory translation table.
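
    In C it would look something like this sketch (the field names are mine; every real kernel's version is bigger and messier):

        /* Sketch of a per-thread control block. Names are made up,
         * but every OS keeps roughly this much per thread. */
        #include <stdint.h>

        #define NUM_REGS 16

        struct thread_block {
            uint32_t  pc;              /* saved program counter          */
            uint32_t  regs[NUM_REGS];  /* saved CPU state for the switch */
            uint32_t *page_table;      /* this thread's VM translations  */
            int       state;           /* RUNNING, READY, BLOCKED, ...   */
            long      last_ran;        /* tick when it last ran          */
            struct thread_block *next; /* ready-queue link               */
        };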

    With this in mind, why pin processes to one core or another? Given that the processors share the exact same physical memory (the RAM chips), each processor runs a process/thread scheduler. It looks at all 10 threads (9 if we assume the other processor is already running one) and chooses which will run, based on some algorithm that usually considers how long each has run before, how long since it last ran, etc. This means that process/thread A may run on processor 1, be paused, start again on processor 2, pause again, and so on.
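
    The "some algorithm" part can be as simple as this sketch (a made-up longest-waiting-first policy, not any real kernel's scheduler):

        /* Trimmed-down thread_block from the sketch above. */
        struct thread_block {
            long last_ran;              /* tick when this thread last ran */
            struct thread_block *next;  /* ready-queue link               */
        };

        /* Scan the ready queue and pick the thread that has waited the
         * longest. EITHER core can call this, which is exactly why a
         * thread may pause on one core and resume on the other. */
        struct thread_block *pick_next(struct thread_block *ready) {
            struct thread_block *best = ready;
            for (struct thread_block *t = ready; t != NULL; t = t->next)
                if (t->last_ran < best->last_ran)
                    best = t;
            return best;
        }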

    You may ask yourself: why do we do this run, pause, run, pause? This is what gives a single processor, or even dual processors, the illusion of running dozens of things AND getting input from you at the same time. Technically, every time you press a key or a packet comes into your ethernet card, your OS has to stop whatever program you're running, switch into OS mode, deal with the new input, then switch BACK to a program. (This ignores things such as DMA, but that is another issue.)

    Here I'm confused about what you mean by "already existing threads". You can't just make up threads. A given program must be built from scratch to use threading; if it wasn't, it will NEVER be able to use more than 1 thread. For instance, say I have a program that takes a large matrix, 1000x1000 in size, and I want to sum up every single cell. This is a simple loop in a single-threaded program, right? Well, if we write the program to just use that single loop, how could the OS ever know how to run it with 2 threads? It wouldn't know what to do with the second one. If instead I wrote the program so that it could use N processors, deciding that each processor gets 1000/N rows, then the program would run much faster on a multiprocessor system AND still run fine on a uniprocessor, since 1000/1 = 1000. The problem is, this is a lot more programming, a lot more work, and it wasn't worth it when only 0.1% of the population actually had two processors.
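
    Here's roughly what that looks like with pthreads (N and the matrix size hard-coded to keep the sketch short):

        /* Sum a 1000x1000 matrix with N threads, each taking 1000/N rows.
         * With N=1 this degenerates to the plain single loop.
         * Build with: cc -O2 matsum.c -lpthread */
        #include <pthread.h>
        #include <stdio.h>

        #define ROWS 1000
        #define COLS 1000
        #define N    2                  /* number of threads/processors */

        static int  matrix[ROWS][COLS];
        static long partial[N];         /* one result slot per thread   */

        static void *sum_rows(void *arg) {
            long id = (long)arg;
            long lo = id * (ROWS / N), hi = lo + ROWS / N;
            long s = 0;                 /* local: no sharing in the loop */
            for (long r = lo; r < hi; r++)
                for (long c = 0; c < COLS; c++)
                    s += matrix[r][c];
            partial[id] = s;            /* written once, at the end      */
            return NULL;
        }

        int main(void) {
            pthread_t tid[N];
            for (long r = 0; r < ROWS; r++)   /* fill with something */
                for (long c = 0; c < COLS; c++)
                    matrix[r][c] = 1;

            for (long i = 0; i < N; i++)
                pthread_create(&tid[i], NULL, sum_rows, (void *)i);

            long total = 0;
            for (long i = 0; i < N; i++) {
                pthread_join(tid[i], NULL);
                total += partial[i];
            }
            printf("total = %ld\n", total);   /* 1000000 */
            return 0;
        }

    Notice each thread sums into a local variable and writes its slot exactly once at the end, so the two threads never fight over a cache line while the loops run; the main thread does the final add after the joins.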

    In fact, if you ran this on 1 processor and told it to use 2 threads (thread 1 gets rows 0-499 and thread 2 gets rows 500-999), it would run SLOWER than just using 1 thread over rows 0-999, because the processor would waste time switching back and forth.

    As for your example with the game: no games are currently written for 10 threads. At most, some use 2-3 threads, usually one for game AI, one for game logic, and maybe one to actually push the graphics to the graphics card. This is just an example; there are others, but none actually use 10 threads. In the future this may change, since as more people get multiprocessors, we will have more people PROGRAMMING for multiprocessors. It's kind of a supply-and-demand thing: people won't program for multiprocessors if no one has them, and no one buys multiprocessors if no programs are written for them.

    Let's just say that if the OS could somehow figure out how to take 1 thread (one stream of computer instructions) and knew WHERE and HOW to make it 2, then all our problems would be solved. But it can't; it isn't smart enough, and even if it were, in the time it would take to figure that out, we could have just run the whole thing on 1 processor.

    Not bored, just not exactly sure what you were asking. Dual channel and other new types of RAM, if you look, have not REALLY increased computer performance that much, because we are often still limited by bus width and speed.

    I fail to see how dual channel, quad channel, etc. relate to cache coherence. Can you help me out on what you mean?

    -AntonK
     
  10. Anomalous Banned Banned

    Messages:
    1,710
    Thanks for the effort; I will rarely get someone of your caliber to explain all this. Thanks to SciForums.

    So would it make any difference if we could divide physical memory into N partitions and have N pagefiles, one for each of N threads? No more switching.

    But what if we just let the processes talk to each other instead of mingling with each other's memory? RAM is cheaper these days, so what if we could have, say, 4 RAM SIMMs, one for each of 4 cores, with separate buses?

    Something that doesn't switch seems desirable.



    I meant the existing threading model of programs, but without switching and with separate memory allocations.

    Agreed that the bus is limited, but if memory is divided and the bus fetches data to and from cache separately for each division, then there would be no cache coherence problem; each core could process data simultaneously without the others' knowledge. The divisions would be the OS's headache.

    That's how things were designed for single cores, I guess.

    I have personally seen the performance of dual channel versus single channel; dual channel is indeed a lot faster, as it uses separate buses.
     
  11. AntonK Technomage Registered Senior Member

    Messages:
    1,083
    Okay, I believe I have an idea of what you're talking about now, but what it really amounts to is a bunch of computers sitting next to each other on the same motherboard. Remember, though, that in the end we ARE trying to interact with the system as 1 system. That means we need a single I/O channel, and that would most likely become the bottleneck.

    As for having multiple buses: have you looked at motherboards lately? There's no room for more buses. As it is, motherboard makers are having a hard time fitting the larger 64-bit-wide buses. As for partitioning, it sounds like a good idea until you consider what happens when I'm running a single program that is not parallel at all but completely single threaded. What if that one program needs more memory than I have allocated to one partition? It can't use the other processor's memory space. People have tried static partitioning schemes; they rarely see a benefit in real application use. I'll post more later, just thought I'd respond.

    I will leave you with this. There are two types of people who work on this: people like me, who design the hardware systems and how the OS interacts with them, and the materials people, whose job it is to actually figure out how to build it. We've got PLENTY of ideas and simulations that are killer fast. The problem is, we can't physically build a lot of them. No easy solutions, at least not right now.

    -AntonK
     
