So, the Measurement and Tools course at UPC concluded with a report: a performance analysis of a NAS benchmark on a simulated supercomputer, plus tests of some parallel applications on different simulated cache architectures.
The tools we used were Intel's Pin tool and Dimemas. From Pin, we took the dcache application and multicache (not sure about this name) and changed it to support multiple cache levels for a parallel application. Then we used it to experiment with changes in cache size, associativity and line size. Dimemas was used to simulate the execution of our application on the MareNostrum supercomputer.
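To give an idea of what those three knobs do, here is a minimal sketch of a multi-level, set-associative cache model in Python. It is not the Pin dcache code and not what we handed in; the names `Cache` and `access_hierarchy`, the LRU replacement policy and the "fill every level on a miss" behaviour are all assumptions made for illustration.

```python
from collections import OrderedDict


class Cache:
    """One set-associative cache level with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, assoc, line_bytes):
        self.line_bytes = line_bytes
        self.assoc = assoc
        # Number of sets follows from total size, associativity and line size.
        self.num_sets = size_bytes // (assoc * line_bytes)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        line = addr // self.line_bytes   # which cache line the address falls in
        index = line % self.num_sets     # which set that line maps to
        tag = line // self.num_sets      # identifies the line within its set
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(ways) >= self.assoc:
            ways.popitem(last=False)     # evict the least recently used line
        ways[tag] = True
        return False


def access_hierarchy(levels, addr):
    """Probe the levels in order (L1 first) and stop at the first hit.

    Returns the index of the level that hit, or len(levels) if the
    access went all the way to memory.
    """
    for i, level in enumerate(levels):
        if level.access(addr):
            return i
    return len(levels)
```

Feeding the same address stream through hierarchies built with different `size_bytes`, `assoc` and `line_bytes` values, and comparing the per-level `hits` and `misses`, is the kind of experiment described above, just on a toy scale.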
How Dimemas actually simulates these computer architectures, I have no idea. But it produced new Paraver traces, which I parsed and used for the performance evaluation.
So, honestly... what I really learned from it was probably scripting and using gnuplot (with which I am in love <3). The fastest and most effective way seemed to be running the experiments and loading all the results into an SQLite database. From there I ran queries and fed their output to gnuplot.
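The SQLite-to-gnuplot step can be sketched roughly like this in Python. The `runs` table, its columns and the two helper functions are hypothetical, made up for this example; the real scripts are the ones in the report's appendix.

```python
import sqlite3


def load_results(conn, rows):
    """Store experiment results; each row is (cache_kb, assoc, line_bytes, miss_rate).

    The table name and schema are assumptions for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "cache_kb INTEGER, assoc INTEGER, line_bytes INTEGER, miss_rate REAL)"
    )
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def gnuplot_data(conn, assoc):
    """Return tab-separated 'cache_kb<TAB>average miss rate' lines for one associativity."""
    cur = conn.execute(
        "SELECT cache_kb, AVG(miss_rate) FROM runs "
        "WHERE assoc = ? GROUP BY cache_kb ORDER BY cache_kb",
        (assoc,),
    )
    return "".join(f"{kb}\t{rate:.4f}\n" for kb, rate in cur)
```

Writing the returned string to a file, say `miss.dat` (a made-up name), gives gnuplot something it can read directly with `plot 'miss.dat' using 1:2 with linespoints`. Keeping everything in one database means a new graph is just a new query, with no need to re-run the experiments.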
I also learned that my L3 cache was very poorly used 😛
Check the report below. Its appendix contains all the scripts used to automate Dimemas and the Pin tool, including configuration and graph generation.