Debate over CPU architecture (cache coerency) and Peterson
Posted: Thu Mar 05, 2015 11:09 pm
Today we had a small debate about CPU architecture in the forum of our classroom 
Basically the teacher proposed a problem:
The following code implements Peterson algorithm, it is prooven to be correct but on a modern x86 processor does not work (code given by the teacher):
the above algorithm infact does not work, and prints "Sum RANDOM NUMBER rather than 1000000000" (when input of program for gloopCount is 500000000).
However he asked to investigate the problem and find possible solutions.
Here's mine (c++11):
the real change in the code is that I use "alignas(4)". But in theory the compiler should already align integers to 4 bytes and hence to cache lines
The debate is that according to teacher's assistant the problem is not alignment of variables and infact applying "alignas(4)" to the original C code does not make the algorithm working.(however I read alignas of C is different from C++.. or that is false?)
So here we have 2 different versions:
1) not working C version (indipendently of the fact someone used alignas macro)
2) working C++ version (however people here a bit lazy, no one tested running my code, so it could be a peculiarity of my CPU)
Wanna join the debate?
Could be possible that code of teacher does not work because we are running it inside a VM? (well I don't know in wich environment the teacher's assistent runned the code, but at least me I runned the first algorithm on a Linux VM (because using pthreads), while I runned the 2nd algoirthm under windows Vista without using any Virtual env. How I love no one tried to run my code. ^^)
Little warning, the test requires ~2 minutes to end
EDIT: another intersting thing.
I was compiling using "-O3" with that flag my code works nice, without it doesn't work (maybe it depends on MMX registers too?)

Basically the teacher proposed a problem:
The following code implements Peterson algorithm, it is prooven to be correct but on a modern x86 processor does not work (code given by the teacher):
Code: Select all
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
static volatile int flag1 = 0;
static volatile int flag2 = 0;
static volatile int turn = 1;
static volatile int gSharedCounter = 0;
int gLoopCount;
void proc1() {
flag1 = 1;
turn = 2;
while((flag2 == 1) && (turn == 2)) ;
// Critical section
gSharedCounter++;
// Let the other task run
flag1 = 0;
}
void proc2() {
flag2 = 1;
turn = 1;
while((flag1 == 1) && (turn == 1)) ;
// critical section
gSharedCounter++;
// leave critical section
flag2 = 0;
}
//
// Tasks, as a level of indirection
//
void *task1(void *arg) {
int i;
printf("Starting task1n");
for(i=0;i<gLoopCount;i++) proc1();
}
void *task2(void *arg) {
int i;
printf("Starting task1n");
for(i=0;i<gLoopCount;i++) proc2();
}
int
main(int argc, char ** argv)
{
int loopCount = 0;
pthread_t dekker_thread_1;
pthread_t dekker_thread_2;
void * returnCode;
int result;
int expected_sum;
/* Check arguments to program*/
if(argc != 2)
{
fprintf(stderr, "USAGE: %s <loopcount>n", argv[0]);
exit(1);
}
/* Parse argument */
loopCount = atoi(argv[1]);
gloopCount = loopCount;
expected_sum = 2*gloopCount;
/* Start the threads */
result = pthread_create(&dekker_thread_1, NULL, task1, NULL);
result = pthread_create(&dekker_thread_2, NULL, task2, NULL);
/* Wait for the threads to end */
result = pthread_join(dekker_thread_1,&returnCode);
result = pthread_join(dekker_thread_2,&returnCode);
printf("Both threads terminatedn");
/* Check result */
if( gSharedCounter != expected_sum ) {
printf("[-] Sum %d rather than %d.n", gSharedCounter, expected_sum);
return 1;
} else {
printf("[+] Okn");
return 0;
}
}
However he asked to investigate the problem and find possible solutions.
Here's mine (c++11):
Code: Select all
#include <thread>
#include <limits>
#include <iostream>
#undef NDEBUG
#include <cassert>
// make globals local to this file (do no interfere with other tests namespace)
namespace {
alignas(4) static volatile int flag1 = 0;
alignas(4) static volatile int flag2 = 0;
alignas(4) static volatile int turn = 1;
alignas(4) static volatile int gSharedCounter = 0;
alignas(4) int gLoopCount = 500000000;
void proc1() {
flag1 = 1;
turn = 2;
while((flag2 == 1) && (turn == 2)) ;
// ENTER
++gSharedCounter;
// LEAVE
flag1 = 0;
}
void proc2() {
flag2 = 1;
turn = 1;
while((flag1 == 1) && (turn == 1)) ;
// ENTER
++gSharedCounter;
// LEAVE
flag2 = 0;
}
void task1(){
for(alignas(4) int i = 0; i<gLoopCount; i++)
proc1();
}
void task2(){
for(alignas(4) int i = 0; i<gLoopCount; i++)
proc2();
}
}
// Runned by CMake Test runner. See CMakeLists.txt
int main(int argc, char ** argv){
std::cout<<"running peterson with iterations: "<<gLoopCount<<std::endl;
std::thread t1(task1);
std::thread t2(task2);
t2.join();
t1.join();
assert(gLoopCount*2==gSharedCounter);
return 0;
}
The debate is that according to teacher's assistant the problem is not alignment of variables and infact applying "alignas(4)" to the original C code does not make the algorithm working.(however I read alignas of C is different from C++.. or that is false?)
So here we have 2 different versions:
1) not working C version (indipendently of the fact someone used alignas macro)
2) working C++ version (however people here a bit lazy, no one tested running my code, so it could be a peculiarity of my CPU)
Wanna join the debate?
Could be possible that code of teacher does not work because we are running it inside a VM? (well I don't know in wich environment the teacher's assistent runned the code, but at least me I runned the first algorithm on a Linux VM (because using pthreads), while I runned the 2nd algoirthm under windows Vista without using any Virtual env. How I love no one tried to run my code. ^^)
Little warning, the test requires ~2 minutes to end
EDIT: another intersting thing.
I was compiling using "-O3" with that flag my code works nice, without it doesn't work (maybe it depends on MMX registers too?)