Posted: February 6th, 2015

Computer Architectures

Project description
Part 1 (10 marks)
Instruction Level Parallelism (ILP) and Cycles per Instruction (CPI) measure, respectively, potential and actually attained parallelism. Pipelining is one way to provide parallelism.
Imagine you are tasked with evaluating a new processor to calculate a graphic overlay, that is, adding a graphic to a video image. The basic operation is, for every pixel of the graphic, to add its value to that of the corresponding pixel of the video image and then to replace the image pixel with this calculated value. The processing is to be done in blocks of 2×2 pixels: there is one value for each pixel of the image and of the graphic.
For each of the 4 pixels in the 2×2 pixel block the program will:
•   Load the pixel from the graphic
•   Load the pixel from the image
•   Add the pixel values together
•   Store the calculated value to the image
•   Loop to do it again
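The five steps above can be sketched in Python as follows. This is only an illustration of the per-pixel work, not the processor's instruction set: the function name overlay_block, the list-of-lists pixel layout, and the sample values are assumptions, and no clamping of the sum is done because the brief does not mention overflow.

```python
def overlay_block(image, graphic, top, left):
    """Add one 2x2 block of `graphic` onto `image` in place, pixel by pixel."""
    for dy in range(2):
        for dx in range(2):
            y, x = top + dy, left + dx
            g = graphic[y][x]   # load the pixel from the graphic
            v = image[y][x]     # load the pixel from the image
            s = g + v           # add the pixel values together
            image[y][x] = s     # store the calculated value to the image
            # reaching the end of the body is the "loop to do it again" branch
    return image


# Hypothetical sample data: a single 2x2 block.
image = [[10, 20], [30, 40]]
graphic = [[1, 2], [3, 4]]
overlay_block(image, graphic, 0, 0)
print(image)  # [[11, 22], [33, 44]]
```

Each pass through the inner loop corresponds to the five instructions listed above, so one 2×2 block is 4 × 5 = 20 instructions.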
The processor has a 4-stage pipeline with the following stages:
Fetch | Decode | Execute | Store
You are to ignore second-order pipeline effects such as latency, forwarding, and stalls.
You are to assume that the loop (branch or jump) has 4 stages just like other instructions and you are to ignore second-order branch effects such as latency.
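Under these simplifications, stage occupancy can be tabulated mechanically. The sketch below assumes one instruction enters Fetch every cycle and that nothing stalls, which is one reading of "ignore second-order effects"; the names STAGES, OPS and ideal_schedule are illustrative only.

```python
STAGES = ["Fetch", "Decode", "Execute", "Store"]
OPS = ["load graphic", "load image", "add", "store", "loop"]


def ideal_schedule(n_pixels=4):
    """Return (names, table); table[i][c] is the stage used by instruction i in cycle c."""
    names = [f"{op} (pixel {p})" for p in range(1, n_pixels + 1) for op in OPS]
    total_cycles = len(names) + len(STAGES) - 1
    table = []
    for i in range(len(names)):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters Fetch in cycle i (0-based)
        table.append(row)
    return names, table


names, table = ideal_schedule()
# Print the first six rows and ten cycles, one letter per stage, "." for idle.
for name, row in zip(names[:6], table[:6]):
    print(f"{name:22s}", " ".join(cell[:1] or "." for cell in row[:10]))
```

Each row is shifted one cycle to the right of the row above it, giving the familiar staircase pattern of a full pipeline.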
1. Draw a pipeline for the add operation, including the units used, and pipeline registers, for each stage. (3 marks)
2. Complete a table showing a row for each instruction and columns for each instruction clock cycle, and in each cell of the table write, for each instruction, which pipeline stage unit is being used. Do this for all instructions to process an entire 2×2 block of 4 pixels. (3 marks)
3. Calculate the ILP, and the CPI averaged over the entire processing for one 4-pixel block. (4 marks)
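As a rough check on the arithmetic, here is a small sketch of the cycle count for an ideal pipeline. It assumes 5 instructions per pixel (20 per block), one instruction issued per cycle, and no stalls; pipeline_cpi is a hypothetical helper. The "instructions in flight" figure is just total stage-slots divided by cycles and is not necessarily the ILP definition the course expects.

```python
def pipeline_cpi(n_instructions, n_stages):
    """Cycles and CPI for an ideal pipeline: fill once, then one instruction per cycle."""
    cycles = n_instructions + n_stages - 1
    return cycles, cycles / n_instructions


n_instructions = 4 * 5            # 4 pixels x 5 instructions each
cycles, cpi = pipeline_cpi(n_instructions, n_stages=4)
in_flight = n_instructions * 4 / cycles   # average instructions in the pipeline per cycle

print(cycles)               # 23
print(cpi)                  # 1.15
print(round(in_flight, 2))  # 3.48
```

Without pipelining the same 20 instructions would take 20 × 4 = 80 cycles, so the CPI approaches 1 only as the instruction count grows relative to the pipeline depth.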
Part 2 (5 marks)
Cache is a small amount of fast memory: registers are a few memory locations inside the processor. Data that is in registers is accessed fastest of all: cache is slower than registers but faster than memory.
One strategy for using the cache is for the memory unit to keep data that has been read from memory, in the cache until it must be overwritten to make room for new data. If the data is in the cache when requested that is called a cache hit: if it is not in the cache, either because it has not yet been accessed or because it has been overwritten, that is called a cache miss.
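The hit and miss behaviour described above can be illustrated with a toy model. The brief does not say which entry is overwritten first or how large a cache line is, so this sketch assumes least-recently-used replacement and one address per entry; TinyCache and the address sequence are made up for illustration.

```python
from collections import OrderedDict


class TinyCache:
    """Toy fully associative cache with LRU replacement (the policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> present, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)      # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)   # overwrite the least recently used entry
            self.lines[address] = True


cache = TinyCache(capacity=2)
for address in [0, 1, 0, 2, 1]:
    cache.access(address)
print(cache.hits, cache.misses)  # 1 4
```

The first accesses to addresses 0, 1 and 2 miss because the data has not yet been accessed; the final access to address 1 misses because it was overwritten to make room for address 2.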
In the previous example, imagine that the video image is HD: 1920×1080 pixels, and the graphic is a small sprite icon of 128×128 pixels. Assume each pixel is a single 8-bit value. The graphic is unchanging: the video images are updated completely, in memory, 60 times per second. The graphic overlay is added to each video image once, and only the corresponding block of 128×128 pixels from the video image is accessed to do this.
The processor has a data cache of 64 kbytes. It has 32 data registers. In the following, remember to take into account the necessary memory accesses to the pixels of the video image.
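The sizes involved can be worked out directly from the figures above. This sketch assumes 1 kbyte = 1024 bytes and one pixel per data register; whether the data actually stays in the cache also depends on line size, associativity and other memory traffic, none of which the brief specifies.

```python
BYTES_PER_PIXEL = 1                            # "a single 8-bit value"

sprite_bytes = 128 * 128 * BYTES_PER_PIXEL     # the unchanging graphic
frame_bytes = 1920 * 1080 * BYTES_PER_PIXEL    # one full HD video image
cache_bytes = 64 * 1024                        # assumes 1 kbyte = 1024 bytes
touched_per_frame = 2 * sprite_bytes           # sprite plus the matching image block

print(sprite_bytes)                       # 16384
print(frame_bytes)                        # 2073600
print(touched_per_frame)                  # 32768
print(touched_per_frame <= cache_bytes)   # True
print(sprite_bytes // 32)                 # 512 sprite pixels per data register
```

The sprite alone is 16 kbytes, and each frame's overlay touches another 16 kbytes of image pixels that are rewritten in memory 60 times per second.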
1. Assume the graphic sprite of 128×128 pixels has been accessed from memory at least once. What percentage of future accesses to the graphic sprite pixels do you estimate will result in cache hits? (2 marks)
2. If the cache were only 16 Mbytes, what percentage of future accesses to the graphic sprite pixels do you estimate will result in cache hits? (2 marks)
3. Data that is already in registers is accessed fastest of all. In this example would it be reasonable to use register variables for the data of the graphic sprite? (1 mark)

CI5220 in-class test
Example – for practice
Outline
The test is in two parts. The test counts for 15 marks in total.
You should submit a brief written report through StudySpace. Your report should be concise and clear, covering the points raised by the question and no more. The format and detailed layout of your report are up to you and will play no part in its marking.
