Interactive MIPS 32-bit Single Cycle Processor on FPGA

Motivation

Ever since I was a kid I wondered how computers worked. I had disassembled some old PC towers and was amazed by the myriad of wires and printed circuit boards but didn’t really grasp how they all came together to form a functioning computer. Fast forward a few years, I built my first computer and could appreciate each component (processor, ram, motherboard, etc.) and how they worked together on a very abstract level but one of my first questions was how a processor worked. From the outside it was no more a silver square and yet it could carry out almost all of the important operations in a computer but it remained a total mystery as to how it actually accomplished this. Even since then I have gained a fascination for processors and by extension computer chips of all kinds but it was not until I discovered FPGAs and HDLs that I realized I could finally understand how all digital systems work on a fundamental level but also that I could build my own. (I won’t delete all this but I honestly think it could be cut out

With so much extra time granted to time during the summer of 2020, I dedicated a few months to teaching myself Verilog and purchased an FPGA (specifically the FII-PRA006) started designing digital logic purely out of a passion for the subject. I wanted to have an “ultimate” goal to work towards for the summer and it had to be something that tied all the topics of basic digital systems together but also engaging and educational. Naturally, I fell back on my long time interest in computer chips and decided I wanted to create my own processor. Since the MIPS instruction set was still fresh on my mind from the previous semester and I admired its elegance I settled on recreating a 32-bit single cycle variant.

Design Overview

In its simplest form a processor can be described as a device which fetches instructions from memory, decodes the instructions and executes these instructions. This cycle continues (more or less) indefinitely and forms the basis of most processors.

The MIPS processor follows a similar structure so that large groupings of hardware can roughly correspond to this cycle as shown in the diagram below. In single cycle processor, only one instruction is executed in any given clock cycle. Most modern processors use pipelining where multiple steps can be stacked together in a single clock cycle and I do plan on investigating that in another project.

Individual blocks of sequential hardware, referred to as state elements and combinational logic blocks used to determine state transitions. The wires and multiplexers connecting the blocks together provide data paths the processor will use to move data from one block to another or perform some type of operation. My overall plan was to characterize, design and test each block of hardware separately and then once that was complete slowly build up the processor block by block until it was complete and fully functioning.

Instruction Memory

Instruction memory acts as a storage place where all of the instructions that the processor reside in a sequential fashion. It is similar RAM in the sense that it receives an address and returns the contents stored at that address and in many processor designs, the RAM and instruction memory are actually one in the same. For simplicity sake I will keep it as a separate in this implementation.

The verilog code is very simple and only require the declaration of a single memory array. Each word is has a width of 32-bits and same goes for the address. The “readmemb” statement initializes the instruction memory with a text file. The text is later used for entering a program that the processor will run. After the instruction memory is first initialized, it will act as read only and cannot be written to.

module instruction_memory(
	input[31:0] a,
	output[31:0] rd
);

	reg[31:0] i_mem[31:0];
	
	//Used to load program into instruction memory
	
	initial begin
		$readmemb("program.txt",i_mem);
	end
	
	
	assign rd = i_mem[a];
	
endmodule

Register File

The register file can be thought as “scratch pad” where temporary values, intermediates operations and results are stored. On a hardware level, the register file represents a portion of the processor’s on board memory. The register file can receive two addresses simultaneously and output two register contents. One register can also be written to, if the write enable high and an appropriate address and data is provided.

The verilog code is relatively simple and utilizes an always statement to make the block sensitive to the positive edge of clock. Notice that address zero is constantly initialized to zero, this forms the “zero” register which is useful when the processor is executing certain type of instructions.

module register_file(
	input clk,
	input reset,
	input we3,
	input v_f,
	input [4:0] a1 ,
	input [4:0] a2 ,
	input [4:0] a3 ,
	input [31:0] wd3 ,
	output [31:0] rd1 ,
	output [31:0] rd2 
);
	reg [31:0] r_mem[31:0];
	assign rd1 = r_mem[a1];
	assign rd2 = r_mem[a2];
	
	always@(posedge clk)begin
		//Creating $0 register
		r_mem[0] <= 32'b0;
		if(we3&v_f)begin
			r_mem[a3]<=wd3;
		end
	end
endmodule

Data Memory

Data memory is similar to instruction memory except it can be written to and requires a clock input. Data memory takes a single address and can only read or write in single clock cycle. The write enable must be high in order for writing to occur.

The implementation is similar to the register file but only contains a single address input for read or write. The “readmemb” is used to initialize data memory from a text file much like instruction memory.

module data_memory(
	input clk,
	input we,
	input v_f,
	input [31:0] a,
	input [31:0] wd,
	output [31:0] rd,
	output [31:0] out_zero
);
	reg [31:0] d_mem[31:0];
	
	assign rd = d_mem[a];
	assign out_zero = d_mem[0];
	
	always@(posedge clk)begin
		if(we&v_f)begin
			d_mem[a]<=wd;
		end
	end
	
	initial begin
		$readmemb("data_memory.txt",d_mem);
	end
endmodule

Sign Extender

Instructions read from the instruction memory may contain and address offsets (meant for addressing the data memory) which are 16-bits wide, however the data memory takes addresses that are 32-bits wide. Address offsets are added to a specific data memory address as a way to traverse the memory space. The offset can be positive or negative and therefore must conform to two’s complement up to 32-bits. The sign extender will take the most significant bit of the 16-bit value and repeat it’s value another 16 times in order to make a 32-bit offset.

Sign extension is achieved through replicating the most significant bit of input 16 times and them concatenating that with the initial 16-bits.

module sign_extend(
	input [15:0] instr,
	output [31:0] sign_imm
);
	assign sign_imm = {{16{instr[15]}},instr};
endmodule

Arithmetic Logic Unit (ALU)

The arithmetic logic unit or ALU for short can be thought of as the heart of the MIPS processor and performs all of the operations (addition, subtraction, and, or, comparisons, etc) that the instruction set requires. The ALU take two arguments as inputs and performs an operation on them depending on an ALU control signal. It will output the result along with specific “flags” if certain conditions are met, such as a zero result or overflow.

The verilog code uses a case statement to determine what operation to perform depending on the control signal. A zero flag is set high if the result happens to be zero.

module alu(
	input [2:0] alu_control,
	input [31:0] src_a,
	input [31:0] src_b,
	output reg [31:0] alu_result,
	output reg zero
);
	always@(*)begin
		case(alu_control)
			0:alu_result = src_a&src_b;
			1:alu_result = src_a|src_b;
			2:alu_result = src_a+src_b;
			4:alu_result = src_a&~src_b;
			5:alu_result = src_a|~src_b;
			6:alu_result = src_a-src_b;
			7:alu_result = ($signed(src_a)<$signed(src_b)) ? 1:0;
			default:alu_result = 32'b0;
		endcase
		
		if(alu_result == 32'b0)begin
			zero = 1'b1;
		end
		else begin
			zero = 1'b0;
		end
	end
endmodule

Control Unit

The control unit receive and opcode and funct signal from the instructions, decode that specific input and translate it into the appropriate control signals that orchestrate the entire processors. It is special purpose decoder designed around the different instructions a processor might receive. In this case, R-type instructions which involve registers as arguments, I-Type instructions which involve immediate values and branch instructions which manipulate the program counter. Inside the control unit is a smaller decoder dedicated to taking the funct input and translating it into an ALU control signal which the ALU will understand.

The control unit is implemented in verilog as a decoder with separate case statements for the larger opcode decoder and the smaller alu decoder.

module control_unit(
	input [5:0] opcode,
	input [5:0] funct,
	output reg mem_to_reg,
	output reg mem_write,
	output reg branch,
	output reg [2:0] alu_control,
	output reg alu_src,
	output reg reg_dst,
	output reg reg_write,
	output reg jump,
	output reg reg_src
);
	reg [1:0] alu_op;
	always@(*)begin
		case(opcode)
			6'b000000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1100_001000;
			6'b100011:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1010_010000;
			6'b101011:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0010_100000;
			6'b000100:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0001_000100;
			6'b001000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1010_000000;
			6'b000010:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0000_000010;
			6'b010000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1000_000001;
			default:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0000_000000;
		endcase
		casex(alu_op)
			2'b00:alu_control = 3'b010;
			2'bx1:alu_control = 3'b110;
			2'b1x:
			begin
				case(funct)
					6'b100000:alu_control = 3'b010;
					6'b100010:alu_control = 3'b110;
					6'b100100:alu_control = 3'b000;
					6'b100101:alu_control = 3'b001;
					6'b101010:alu_control = 3'b111;
					default:alu_control = 3'b000;
				endcase
			end
		endcase
	end
endmodule

Load Word Datapath

After verifying that all the major blocks of hardware that go into making processors were functioning properly, the next was to connect each block form complete data paths and continue adding blocks one by one until I arrived at the complete design.

The first data path to verify was the load word data path. It requires the program counter, instruction memory, register file, sign extender, alu, control unit and data memory. Once connected as shown in the diagram, the processor is able to read instructions sequentially and execute them. In the case with the available hardware and connections the processor is only able to perform the load word instruction where a 32-bit word from data memory is loaded into a register.

The verilog implementation mostly involved instantiating all the necessary hardware blocks and connecting them together. The program counter required a bit of extra code in order to increment every clock cycle.

module scp_1(
	input inclk,
	input ext_reset,
	output [31:0] out
);
	//Interconnects
	wire clk = inclk;
	wire reset = ext_reset;
	wire [31:0] pc_prime;
	wire [31:0] instr;
	wire [31:0] rd1;
	wire [31:0] rd2;
	wire [31:0] sign_imm;
	wire zero;
	wire [31:0] read_data;
	wire [31:0] alu_result;
	
	//Muxes
	
	//Program counter
	reg [31:0] pc;
	always@(posedge clk)begin
		if(reset)begin
			pc<=32'b0;
		end
		else begin
			pc<=pc_prime;
		end
	end
	
	//Next Instruction
	assign pc_prime = pc + 1;
	
	//Instruction Memory
	instruction_memory instruction_memory_0
	(
		pc,
		instr
	);
	
	//Control Unit
	wire mem_to_reg;
	wire mem_write;
	wire branch;
	wire [2:0] alu_control;
	wire alu_src;
	wire reg_dst;
	wire reg_write;
	control_unit control_unit_0
	(
		instr[31:26],
		instr[5:0],
		mem_to_reg,
		mem_write,
		branch,
		alu_control,
		alu_src,
		reg_dst,
		reg_write
	);
	
	
	//Register File
	register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		instr[25:21],
		5'b0,
		instr[20:16],
		read_data,
		rd1,
		rd2
	);
	
	//Sign Extender
	sign_extend sign_extend_0
	(
		instr[15:0],
		sign_imm
	);
	
	//ALU
	alu alu_0
	(
		alu_control,
		rd1,
		sign_imm,
		alu_result,
		zero
	);
	
	//Data Memory
	data_memory data_memory_0
	(
	clk,
	mem_write,
	alu_result,
	32'b0,
	read_data
	);
	
	assign out = read_data;
endmodule

The program is written in MIPS assembly and then translated into machine by hand. It is three instructions which load some data from data memory into some registers. The data memory were initialized with some values ahead of time since the processor wasn’t capable of writing to the data memory on its own.

/*Assembly
lw $1,0($0)
lw $2,1($0)
lw $3,2($0)
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
//Data memory
00000000000000000000000000000001
00000000000000000000000000000101
00000000000000000000000000001001
//Register
00000000000000000000000000000000 
00000000000000000000000000000001
00000000000000000000000000000101 
00000000000000000000000000001001 

Save Word Datapath

The save word is quite similar to the load word datapath except the second register output of the register file is connected to the write data input of data memory. This allows data stored in the registers to be written to data memory.

A connection was made between the register file and the data memory address input.

register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		instr[25:21],
		instr[20:16],
		instr[20:16],
		read_data,
		rd1,
		rd2
	);

The program starts the same as load word datapath but adds on three more instructions which copy the content from the registers into the next available locations in data memory.

/*Assembly
lw $1,0($0)
lw $2,1($0)
lw $3,2($0)
sw $1,3($0)
sw $2,4($0)
sw $3,5($0)
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
//Data memory
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001 
//Register File
00000000000000000000000000001001 
00000000000000000000000000000101
00000000000000000000000000000001 
00000000000000000000000000000000

R-Type Datapath

R-type instructions take two registers as arguments and store the output in a register. In order to make this possible some modifications to the data path were necessary. A multiplexer was placed between the register file’s write address and the instructions since the register address for r-type instructions are in a different bit location compared to other instructions. A multiplexer was also placed between the output of the register file and input b of the alu, so the processor as able to switch between operations involving a register and an immediate and only registers. Finally, another multiplexer allows the processor to choose where to write back to the registers via the data memory or the alu.

I decided a section in the code to these multiplexers and the future data routing blocks.

//Interconnects
        reg [4:0] write_reg;
	reg [31:0] src_b;
	reg [31:0] result;
	//Muxes
	always@(*)begin
		case(reg_dst)
			0:write_reg = instr[20:16];
			1:write_reg = instr[15:11];
		endcase
	end
	
	always@(*)begin
		case(alu_src)
			0:src_b = rd2;
			1:src_b = sign_imm;
		endcase
	end
	
	always@(*)begin
		case(mem_to_reg)
			0:result = alu_result;
			1:result = read_data;
		endcase
	end

//Register File
	register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		instr[25:21],
		instr[20:16],
		write_reg,
		result,
		rd1,
		rd2
	);

Here I used two add instructions and store the result somewhere else in the register file to verify the result.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
add $1,$2,$3
add $1,$3,$4
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00000000001000100001100000100000
00000000001000110010000000100000
//Data memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000111
00000000000000000000000000000110
00000000000000000000000000000101
00000000000000000000000000000001
00000000000000000000000000000000

Branch if equals Datapath

Branch if equals or BEQ for short allows the processor to compare if the values contained into two registers is equal and if so, to branch to another part of the instruction memory using a given offset. In order to determine if two registers are equal, the alu will subtract one from the other. If the result is zero they are equal and the zero flag goes high. This is ANDed with a branch signal from the control unit and is used to determine whether or not the next value to loaded into the program counter is simply pc+1 or pc+1+offset.

The result of the offsetted address is put onto an intermediate line and a multiplexer is used to determine the next program counter value.

wire [31:0] pc_branch;
reg [31:0] pc_next;
wire pc_src;

always@(*)begin
		case(pc_src)
			0:pc_next = pc_prime;
			1:pc_next = pc_branch;
		endcase
	end

        //Program counter
	reg [31:0] pc;
	always@(posedge clk)begin
		if(reset)begin
			pc<=32'b0;
		end
		else begin
			pc<=pc_next;
		end
	end

	//Branch offset
	assign pc_src = branch&zero;
	assign pc_branch = pc_prime + sign_imm;

There are two BEQ instructions with one meant to fail and the next one to succeed and branch. In this case the last BEQ instruction branches to a previous location in the instructions and the processor is see to be looping for the first time.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
add $1,$2,$3
add $1,$3,$4
beq $1,$3,-1
beq $2,$1,-5
*/
//Program
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00000000001000100001100000100000
00000000001000110010000000100000
10001100000000100000000000000000
00010000001000111111111111111111
00010000001000101111111111111011
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000011 
00000000000000000000000000000010 
00000000000000000000000000000001 
00000000000000000000000000000001 
00000000000000000000000000000000

Add immediate Datapath

Add immediate or addi for short is used to take an immediate value store in the instruction and write it to a specific register. This is useful for initializing registers to a known value. The processor was already capable of performing the addi instruction with the available hardware blocks and connections and only needs a modification to the control unit.

Another line in the case statement of the control unit was added in order to accommodate the addi instruction.

6'b010000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1000_000001;

The program initializes register at address 1 with an immediate value.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
addi $0, $1, 10
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00100000000000010000000000001010
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000000 
00000000000000000000000000001010
00000000000000000000000000000101 
00000000000000000000000000001001

Jump Datapath

The jump instruction takes a specific location in the instruction memory and will jump to that location unconditionally. A small amount of hardware is used to accomplish this and an additional control signal must added to the control unit. When a jump instruction is executed the jump will go high and the next value loaded into the program counter in guaranteed to be the immediate value.

A multiplexer controlled by the jump signal achieves the necessary priority switching between the next, branch and jump values for the program counter.

//Interconnects
reg [31:0] pc_next;
wire jump;

//Multiplexers
always@(*)begin
		case(jump)
			0:pc_inter = pc_next;
			1:pc_inter = instr[25:0];
		endcase
	end

//Program counter
	reg [31:0] pc;
	always@(posedge clk)begin
		if(reset)begin
			pc<=32'b0;
		end
		else if(v_f) begin
			pc<=pc_inter;
		end
	end

The program will jump back to the first instruction (address 0) and continue in an infinite loop.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
addi $0, $1, 10
j 0
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00100000000000010000000000001010
00001000000000000000000000000000
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000000 
00000000000000000000000000001010
00000000000000000000000000000101 
00000000000000000000000000001001

Load Switch Datapath

Eventually I wanted to the processor to be fully interactive using the hardware on the FPGA, so I needed some way for the processor to read data from the outside world and use it. I decided to use the bank of eight dip switches as it made for a extremely easy 8-bit input. There are couple of options to implement i/o into these simple processors. One way is to connect i/o signals to sections of the register or memory as designate them specifically as i/o locations. While this approach is more realistic, it also uses up precious FPGA real estate especially considering the die area of the Cyclone 10 LP. Instead I created a brand new instruction that the processor can execute called load switch or lsw for short. It can load the current value of the dip switches as an 8-bit binary number and store it in a specified register. The only extra hardware required was another multiplexer to bypass the register file’s data input with the switch input and another control signal in the control unit.

Similar to the jump datapath a multiplexer is used and a separate control signal controls what the data input to the register file is. The switch input is simulated as a constant value in the test bench.

//Module
input [7:0] sw

//Interconnects
wire reg_src;
reg [31:0] reg_data_write;

//Multiplexer
always@(*)begin
		case(reg_src)
			0:reg_data_write = result;
			1:reg_data_write = sw;
		endcase
	end

//Register File
	register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		v_f,
		instr[25:21],
		instr[20:16],
		write_reg,
		reg_data_write,
		rd1,
		rd2
	);

//Control Unit
	control_unit control_unit_0
	(
		instr[31:26],
		instr[5:0],
		mem_to_reg,
		mem_write,
		branch,
		alu_control,
		alu_src,
		reg_dst,
		reg_write,
		jump,
		reg_src
	);
//Control Unit
output reg reg_src

6'b010000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1000_000001;
/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
addi $0, $1, 10
lsw $1
*/

//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00100000000000010000000000001010
00001000000000010000000000000000
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000000
00000000000000000000000000011111 
00000000000000000000000000000101 
00000000000000000000000000001001 

Fibonacci Program

After the processor was completed I wanted to it to carry out a program which was simple enough that it could be easily verified and complicated enough that it utilized every block of hardware at least once. So I settled on the all time classic of a fibonacci calculator for that very reason with the added bonus of knowing what the expect result should be at all times. In words the program will calculate each value in the fibonacci sequence, save the current fibonacci number to data memory, stop at the Nth fibonacci number as determined by the dip switches and then repeat infinitely. The lsw instruction was important here as it allows for some level of interactivity.

Addi $4,$0,1 //Initialize registers
Addi $1,$0,0 
Addi $2,$0,1

Add $3,$2,$1 //Calculate next fibonacci number
Add $1,$2,$0
Add $2,$3,$0

Sw $3, 0($0)//Save to data memory

Lsw $5 //Update current switch value
Slt $6,$4,$5 //Check if counter is less than
 switch
Addi $4,$4,1
Beq $6,$0,1 //If counter >= switch, stop

J 3 //Inner Loop
J 0 //Outer Loop

//Machine Code
00100000000001000000000000000001
00100000000000010000000000000000
00100000000000100000000000000001
00000000001000100001100000100000
00000000000000100000100000100000
00000000000000110001000000100000
10101100000000110000000000000000
01000000000001010000000000000000
00000000100001010011000000101010
00100000100001000000000000000001
00010000000001100000000000000001
00001000000000000000000000000011
00001000000000000000000000000000

Interface Overview

As mentioned previously I wanted this processor to be fully interactive, so not only could you see what happening as it’s executing instructions but you can also see use the dip switches as a way to dynamic change the input. The idea is to use the dip switches as a number input and the four push buttons to navigate a “menu” showing each signal inside the processor and adjust its clock speed. Left and right would move through a circular list of different signals and up and down would either double or half the clock speed. Although modern processors can reach into the gigahertz range, I wanted to this project to be more about seeing each part of the processor at a fundamental level. For that reason the clock will range between about 0.25Hz and 32Hz, so it’s slow enough the human eye can read each instruction individually.

The processor itself will become a hardware block alongside other blocks used to constructing the interface. I find it best to take a problem like this and break it down into a series of simpler problems rather than one giant one. Very briefly, the processor has several signal “taps” that are translated into hexadecimal and then pushed to a seven segment display. The output menu allows selection of each signal and the frequency divider allows adjustment of the processor’s speed. The system is controlled by a system clock provided by a phase locked loop (PLL).

Output Menu

The output menu will select an input signal from one of its inputs and depending on its current state will output the selected input signal. The menu is “circular” such that progressing to the last menu option and once more will return back to the first option. Similarly, moving to the previous menu option on the first screen will move to the last option. To achieve this, a register is set to remember the current menu position and holds the select signal for a multiplexer. An adder is wired up such that whenever the enable of the register is high and the clock is on the positive edge, an incremented value will be stored. In effect, the enable line is able to cycle through each menu position. In this case the register is N bits wide and once the register reaches the value 2^N-1 it will roll over back to zero, thus by taking advantage of the properties of binary numbers, the menu can be made circular. They only caveat is that the number of menu items must be a power of 2.

module output_menu(
	input clk,
	input rst,
	input [1:0] button,
	input [23:0] in0,
	input [23:0] in1,
	input [23:0] in2,
	input [23:0] in3,
	input [23:0] in4,
	input [23:0] in5,
	input [23:0] in6,
	input [23:0] in7,
	output reg[23:0] out
);
	reg [2:0] position;
	always@(posedge clk)begin
		if(rst)begin
			position<=3'b0;
		end
		else if(button[0]&~button[1])begin
			position<=position - 1'b1;
		end
		else if(button[1]&~button[0])begin
			position<=position + 1'b1;
		end
	end
	
	always@(*)begin
		case(position)
			0:out = in0;
			1:out = in1;
			2:out = in2;
			3:out = in3;
			4:out = in4;
			5:out = in5;
			6:out = in6;
			7:out = in7;
		endcase
	end
endmodule

Frequency Divider

The frequency divider will take the system clock and output lower frequency flags that are required for other applications. For example the seven segment display uses both the system clock and a 1KHz (1ms period) flag to cycle through each digit. Since I want a processor to have variable speed capability, a separate variable flag is needed which can go faster or slower based on a button input. The frequency divider uses a system of cascading counters starting at the microsecond range and going down to some length of time longer than a millisecond. Each successive counter feeds off the last by counting the flags of the previous counter. Splitting up the counters like this makes for a faster circuit overall, since a large counter rolling over to zero as the signal has to propagate through more flip flops before it’s done.

The variable flag is generated by a variable counter which have a similar structure to a typical counter (like the one used for the millisecond flag) but an external signal which bit shift a register left or right and thus change the amount of time it takes for a flag to be pulsed. Shifting left doubles the time it takes for the counter to roll over making the flag pulse slower. Shifting right halves the time it take for the counter to roll over making the flag pulse faster. This variable flag is then connected to the enable of all the flip flops in the processor, allowing for adjustable speed control in powers of two. By default, the variable flag starts at a moderate frequency of 2Hz and can reach a maximum frequency 1024x that amount and a minimum frequency of 1/8x that amount.

module f_div(
	input clk,
	input rst,
	input [1:0] button,
	output reg m_f,
	output reg v_f,
	output reg dp_f
);
	reg [15:0]us_counter;
	reg [15:0]ms_counter;
	reg [31:0]vf_counter;
	reg [31:0]dp_counter;
	reg [31:0]v_max;
	reg u_f;
	
	always@(posedge clk)begin
		if(rst)begin
			us_counter<=0;
			u_f<=0;
		end
		else begin
			u_f<=0;
			if(us_counter == 49)begin
				us_counter<=0;
				u_f<=1;
			end
			else begin
				us_counter<=us_counter+1;
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			ms_counter<=0;
			m_f<=0;
		end
		else begin
			m_f<=0;
			if(u_f)begin
				if(ms_counter==999)begin
					ms_counter<=0;
					m_f<=1;
				end
				else begin
					ms_counter<=ms_counter+1;
				end
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			vf_counter<=0;
			v_f<=0;
		end
		else begin
			v_f<=0;
			if(u_f)begin
				if(vf_counter >= v_max)begin
					vf_counter<=0;
					v_f<=1;
				end
				else begin
					vf_counter<=vf_counter+1;
				end
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			dp_counter<=0;
			dp_f<=0;
		end
		else begin
			dp_f<=0;
			if(u_f)begin
				if(dp_counter >= v_max)begin
					dp_counter<=0;
					dp_f<=1;
				end
				else begin
					dp_counter<=dp_counter+2;
				end
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			v_max<=32'h0008_0000;
		end
		else if(button[1]&~button[0])begin
			if(v_max < 32'h0400_0000)begin
				v_max <= {v_max[30:0],v_max[31]};
			end
		end
		else if(button[0]&~button[1])begin
			if(v_max > 32'h0000_0200)begin
				v_max <= {v_max[0],v_max[31:1]};
			end
		end
	end
endmodule

Button Debounce

Buttons are mechanical devices and are therefore subject to less than ideal behavior in high speed digital systems. When a button is pressed it appears as though the transition from off to on is smooth and instantaneous but in fact the button makes and breaks contact several times before coming to equilibrium. The same is true when a button is released, the contacts will spend some time making and breaking the circuit before completely stabilizing. This is undesirable as each “bounce” could potentially be registered as an unintentional button press and the button will have an unpredictable behavior.

Instead, most electronic devices that require a button input will employ some kind of debouncing algorithm (in hardware or software) that removes the extraneous bounces. I used a state machine based solution which will detect the initial button press and then wait 10ms for the button to come to equilibrium and then detect the button release and wait another 10ms until it has reached idle again. If at any point the button transitions during the 10ms timer, the timer will reset and guarantees any bounces are ignored. The button debounce will signal high that the button has been successfully pressed for one clock cycle.

The verilog implementation uses a case statement to determine the next state of the system and the millisecond flag from the frequency divider is used for the timer.

module debounce(
	input clk,
	input rst,
	input m_f,
	input button,
	output reg db_button
);
	reg [1:0] state;
	reg [3:0] counter;
	always@(posedge clk)begin
		if(rst)begin
			state<=0;
			counter<=0;
			db_button<=0;
		end
		else begin
			case(state)
				0:
				begin
					counter<=0;
					db_button<=0;
					if(button)begin
						state<=1;
					end
					else begin
						state<=0;
					end
				end
				1:
				begin
					if(~button)begin
						state<=0;
					end
					else begin
						if(counter == 9)begin
							state<=2;
							counter<=0;
						end
						else if(m_f)begin
							counter<=counter+1;
						end
					end
				end
				2:
				begin
					counter<=0;
					if(~button)begin
						state<=3;
					end
					else begin
						state<=2;
					end
				end
				3:
				begin
					if(button)begin
						state<=2;
					end
					else begin
						if(counter == 9)begin
							state<=0;
							counter<=0;
							db_button<=1;
						end
						else if(m_f)begin
							counter<=counter+1;
						end
					end
				end
			endcase
		end
	end
endmodule

Display Driver

The display driver with translate the binary input into something the seven segment display understands and also management which digits to light up when. The simple solution is to wire each segment of the 6 digit display to a single pin of the FPGA, but that require 48 separate traces and 48 pins decided just to the display. This is not efficient, so instead most systems that have a seven segment display use multiplexing to reduce the amount of hardware and i/o required. Every digit’s segments are wired together and the FPGA will choose which digit to illuminate using a select line. Only one digit is on at any given time but if they are cycled through fast enough the persistence of vision gives the illusion of a continuous display.

A state machine will sequentially flash each digit using a 6-bit select line while simultaneously selecting what to display from a hexadecimal to seven segment display decoder. The state machine transitions states every millisecond flag, giving the display a refresh rate of ~167Hz.

A separate decimal point or dp flag was added in later revisions as a way to indicate the status of the variable flag. The dp flag runs at twice the frequency, so as to simulate the rising and falling edge of a clock if it were to be running that fast.

module display_driver(
	input clk,
	input m_f,
	input dp_f,
	input rst,
	input [23:0] hex,
	output reg [7:0] digit,
	output reg [5:0] select
);
	reg [3:0] state;
	reg [3:0] decode;
	always@(posedge clk)begin
		if(rst)begin
			state<=0;
			decode<=0;
			select<=6'b111111;
		end
		else begin
			case(state)
				0:
				begin
					if(m_f)begin
						state<=1;
					end
					decode<=hex[3:0];
					digit <= {dp,num};
					select<=6'b111110;
				end
				1:
				begin
					if(m_f)begin
						state<=2;
					end
					decode<=hex[7:4];
					digit <= {1'b1,num};
					select<=6'b111101;
				end
				2:
				begin
					if(m_f)begin
						state<=3;
					end
					decode<=hex[11:8];
					digit <= {1'b1,num};
					select<=6'b111011;
				end
				3:
				begin
					if(m_f)begin
						state<=4;
					end
					decode<=hex[15:12];
					digit <= {1'b1,num};
					select<=6'b110111;
				end
				4:
				begin
					if(m_f)begin
						state<=5;
					end
					decode<=hex[19:16];
					digit <= {1'b1,num};
					select<=6'b101111;
				end
				5:
				begin
					if(m_f)begin
						state<=0;
					end
					decode<=hex[23:20];
					digit <= {1'b1,num};
					select<=6'b011111;
				end
			endcase
		end
	end
	
	always @ (*)begin
		case (decode)
			0 : num <= 7'b100_0000;
			1 : num <= 7'b111_1001;
			2 : num <= 7'b010_0100;
			3 : num <= 7'b011_0000;
			4 : num <= 7'b001_1001;
			5 : num <= 7'b001_0010;
			6 : num <= 7'b000_0010;
			7 : num <= 7'b111_1000;
			8 : num <= 7'b000_0000;
			9 : num <= 7'b001_0000;
			10 : num <= 7'b000_1000;
			11 : num <= 7'b000_0011;
			12 : num <= 7'b100_0110;
			13 : num <= 7'b010_0001;
			14 : num <= 7'b000_0110;
			15 : num <= 7'b000_1110;
			default : num <= 7'b1000_0000;
		endcase
	end
	
	reg dp;
	reg [6:0] num;
	
	always@(posedge clk)begin
		if(rst)begin
			dp<=1'b0;
		end
		else if(dp_f)begin
			dp<=~dp;
		end
	end
	
endmodule

System Control

The system control contains a phase locked loop (PLL) to provide a stable system clock and a system reset to bring all hardware blocks into a known state. The PLL takes time to stabilize and will signal high on the “locked” line once it has done so. At that time that system reset will be brought low and the FPGA will be responsive to the system clock.

module system_control(
	input clk,
	output sys_clk,
	output reg sys_rst
);
	wire locked;
	
	PLL1 PLL_inst
	(
		.areset(0),
		.inclk0(clk),
		.c0(sys_clk),
		.locked(locked)
	);
	
	always@(posedge sys_clk)begin
		sys_rst<=~locked;
	end
endmodule

Overall Implementation

The following block diagram is summary of how each hardware block comes together to form the complete system. Button presses and switch positions are taken as input, the processor decides what to do and the result is pushed to the seven segment displays.

module top_module(
	input [7:0] sw,
	input [3:0] button,
	input clk,
	input rst,
	output [7:0] digit,
	output [5:0] select
);
	wire sys_clk;
	wire sys_rst;

	wire [23:0] out_zero;
	wire [23:0] control_signal;
	wire [23:0] instr_lower;
	wire [23:0] instr_upper;
	wire [23:0] alu;
	wire [23:0] program_counter;
	wire [23:0] register_1;
	wire [23:0] register_2;
	
	wire [23:0] out;

	wire m_f;
	wire v_f;
	wire dp_f;
	
	wire b_up;
	wire b_down;
	wire b_left;
	wire b_right;
	
	system_control system_control_inst
	(
		.clk(clk),
		.sys_clk(sys_clk),
		.sys_rst(sys_rst)
	);
	
	 scp_1 scp_1_inst
	 (
		.inclk(sys_clk),
		.v_f(v_f),
		.ext_reset(sys_rst),
		.sw(sw),
		.out_zero(out_zero),
		.control_signal(control_signal),
		.instr_lower(instr_lower),
		.instr_upper(instr_upper),
		.alu(alu),
		.program_counter(program_counter),
		.register_1(register_1),
		.register_2(register_2)
	);
	
	f_div f_div_inst
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.button({b_down,b_up}),
		.m_f(m_f),
		.v_f(v_f),
		.dp_f(dp_f)
	);
	
	output_menu output_menu_inst
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.button({b_right,b_left}),
		.in0(out_zero),
		.in1(control_signal),
		.in2(instr_lower),
		.in3(instr_upper),
		.in4(alu),
		.in5(program_counter),
		.in6(register_1),
		.in7(register_2),
		.out(out)
	);
	
	
	display_driver display_driver_inst
	(
		.clk(sys_clk),
		.m_f(m_f),
		.dp_f(dp_f),
		.rst(sys_rst),
		.hex(out),
		.digit(digit),
		.select(select)
	);
	
	
	debounce up
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[0]),
		.db_button(b_up)
	);
	
	debounce down
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[1]),
		.db_button(b_down)
	);
	
	debounce left
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[2]),
		.db_button(b_left)
	);
	
	debounce right
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[3]),
		.db_button(b_right)
	);
endmodule

It was determined mathematically that the largest fibonacci number that could display was 9,227,465 which corresponds to the 35th number in the sequence or 100011 on the dip switches.

Performance Analysis

After compiling the final version of the design, it used 1,128 logic elements out of a total of 6,272 and contained 468 registers.

In the early stages of the processor design itself, the maximum frequency or fmax for a Slow 1200 mV 85C model was ~200 Mhz. Once all the components were added the fmax dropped to ~70 Mhz and once the processor was incorporated in the final design, the fmax was 56.49 MHz. I did not do a detailed timing analysis of the design but what I can observe is that in general, as more components are added the slower the final design will be.

Demonstration