Posted: February 5th, 2015

Computer Architectures

Project description
Part 1 (10 marks)
Instruction Level Parallelism (ILP) and Cycles per Instruction (CPI) measure, respectively, potential and actually attained parallelism. Pipelining is one way to provide parallelism.
Imagine you are tasked with evaluating a new processor to calculate a graphic overlay, that is, adding a graphic to a video image. The basic operation is, for every pixel of the graphic, to add its value to that of the corresponding pixel of the video image and then to replace the image pixel with this calculated value. The processing is to be done in blocks of 2×2 pixels: there is one value for each pixel of the image and of the graphic.
For each of the 4 pixels in the 2×2 pixel block the program will:
•   Load the pixel from the graphic
•   Load the pixel from the image
•   Add the pixel values together
•   Store the calculated value to the image
•   Loop to do it again
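The per-pixel steps above can be sketched in Python as follows. This is only an illustration: the names `overlay_block`, `graphic`, and `image` are hypothetical, pixels are assumed to be plain integers held in 2×2 nested lists, and the brief does not say how overflow of the sum should be handled, so none is applied here.

```python
def overlay_block(graphic, image):
    """Add a 2x2 graphic block onto a 2x2 image block, in place."""
    for row in range(2):
        for col in range(2):
            g = graphic[row][col]      # load the pixel from the graphic
            v = image[row][col]        # load the pixel from the image
            total = g + v              # add the pixel values together
            image[row][col] = total    # store the calculated value to the image
            # the enclosing for-loops stand in for the "loop" instruction
    return image


graphic = [[1, 2], [3, 4]]
image = [[10, 20], [30, 40]]
print(overlay_block(graphic, image))  # [[11, 22], [33, 44]]
```

Each pass through the inner loop body corresponds to the five instructions listed for one pixel.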
The processor has a 4-stage pipeline with the following stages:
Fetch | Decode | Execute | Store
You are to ignore second-order pipeline effects such as latency, forwarding, and stalls.
You are to assume that the loop (branch or jump) has 4 stages just like other instructions, and you are to ignore second-order branch effects such as latency.
1. Draw a pipeline for the add operation, including the units used, and pipeline registers, for each stage. (3 marks)
2. Complete a table showing a row for each instruction and columns for each instruction clock cycle, and in each cell of the table write, for each instruction, which pipeline stage unit is being used. Do this for all instructions to process an entire 2×2 block of 4 pixels. (3 marks)
3. Calculate the ILP, and the CPI averaged over the entire processing for one 4-pixel block. (4 marks)
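One way to check a hand-drawn timing table is to generate it. The sketch below assumes an ideal pipeline in which one instruction enters Fetch every clock cycle and nothing stalls, which is how the brief's instruction to ignore second-order effects is read here. It also assumes the loop instruction is issued after every pixel, including the last, giving 5 instructions per pixel. The names `schedule`, `STAGES`, and `PER_PIXEL` are hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute", "Store"]
PER_PIXEL = ["load graphic", "load image", "add", "store", "loop"]


def schedule(n_pixels=4):
    """Return (rows, total_cycles) for an ideal pipeline, one issue per cycle."""
    instrs = [f"p{p} {op}" for p in range(n_pixels) for op in PER_PIXEL]
    # The last instruction enters at cycle len(instrs) - 1 and needs
    # len(STAGES) cycles to drain.
    total_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instrs):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i occupies stage s in cycle i + s
        rows.append((name, row))
    return rows, total_cycles


rows, total_cycles = schedule()
cpi = total_cycles / len(rows)
print(len(rows), total_cycles, round(cpi, 2))  # 20 23 1.15
```

Under these assumptions the block takes 23 cycles for 20 instructions. How ILP is to be derived from the table depends on the definition used in the course, so only the cycle count and CPI are computed here.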
Part 2 (5 marks)
Cache is a small amount of fast memory: registers are a few memory locations inside the processor. Data that is in registers is accessed fastest of all: cache is slower than registers but faster than memory.
One strategy for using the cache is for the memory unit to keep data that has been read from memory in the cache until it must be overwritten to make room for new data. If the data is in the cache when requested, that is called a cache hit: if it is not in the cache, either because it has not yet been accessed or because it has been overwritten, that is called a cache miss.
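The hit and miss behaviour described above can be illustrated with a toy model. This sketch makes assumptions the brief does not state: each cache entry holds a single address, and when the cache is full the least recently used entry is the one overwritten. The class name `SimpleCache` is hypothetical.

```python
from collections import OrderedDict


class SimpleCache:
    """Toy cache: one entry per address, least recently used entry evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a cache miss."""
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # overwrite the oldest entry
        self.lines[address] = True
        return False


cache = SimpleCache(capacity=2)
results = [cache.access(a) for a in [10, 11, 10, 12, 11]]
print(results)                   # [False, False, True, False, False]
print(cache.hits, cache.misses)  # 1 4
```

The first two misses occur because the data has not yet been accessed; the final miss on address 11 occurs because it was overwritten to make room for address 12.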
In the previous example, imagine that the video image is HD: 1920×1080 pixels, and the graphic is a small sprite icon of 128×128 pixels. Assume each pixel is a single 8-bit value. The graphic is unchanging: the video images are updated completely, in memory, 60 times per second. The graphic overlay is added to each video image once, and only the corresponding block of 128×128 pixels from the video image is accessed to do this.
The processor has a data cache of 64 kbytes. It has 32 data registers. In the following, remember to take into account the necessary memory accesses to the pixels of the video image.
1. Assume the graphic sprite of 128×128 pixels has been accessed from memory at least once. What percentage of future accesses to the graphic sprite pixels do you estimate will result in cache hits? (2 marks)
2. If the cache were only 16 Mbytes, what percentage of future accesses to the graphic sprite pixels do you estimate will result in cache hits? (2 marks)
3. Data that is already in registers is accessed fastest of all. In this example would it be reasonable to use register variables for the data of the graphic sprite? (1 mark)
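A useful first step for these questions is to work out the data sizes involved. The sketch below only does that arithmetic from the figures given in the brief. The names are hypothetical, 1 kbyte is assumed to be 1024 bytes, and for the register comparison one pixel per register is assumed, since the brief does not give a register width.

```python
SPRITE_BYTES = 128 * 128      # 8-bit pixels, so one byte each
BLOCK_BYTES = 128 * 128       # image pixels touched for each frame
FRAME_BYTES = 1920 * 1080     # one full HD frame
REGISTERS = 32
FRAMES_PER_SECOND = 60


def working_set_fits(cache_bytes):
    """True if the sprite and the touched image block fit in the cache together."""
    return SPRITE_BYTES + BLOCK_BYTES <= cache_bytes


print(SPRITE_BYTES)                       # 16384
print(FRAME_BYTES)                        # 2073600
print(working_set_fits(64 * 1024))        # True
print(SPRITE_BYTES // REGISTERS)          # 512
print(SPRITE_BYTES * FRAMES_PER_SECOND)   # 983040
```

Fitting within the cache capacity is necessary for a high hit rate but not sufficient: the result also depends on how the cache maps and replaces data, which the brief does not specify. The image pixels are rewritten in memory for every frame, so their accesses need to be considered separately from those to the unchanging sprite.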

CI5220 in-class test
Example – for practice
Outline
The test is in two parts. The test counts for 15 marks in total.
You should submit a brief written report through StudySpace. Your report should be concise and clear, covering the points raised by the question and no more. The format and detailed layout of your report is up to you and will play no part in its marking.
