Interactive MIPS 32-bit Single Cycle Processor on FPGA

Motivation

Ever since I was a kid I wondered how computers worked. I had disassembled some old PC towers and was amazed by the myriad of wires and printed circuit boards but didn’t really grasp how they all came together to form a functioning computer. Fast forward a few years, I built my first computer and could appreciate each component (processor, ram, motherboard, etc.) and how they worked together on a very abstract level but one of my first questions was how a processor worked. From the outside it was no more a silver square and yet it could carry out almost all of the important operations in a computer but it remained a total mystery as to how it actually accomplished this. Even since then I have gained a fascination for processors and by extension computer chips of all kinds but it was not until I discovered FPGAs and HDLs that I realized I could finally understand how all digital systems work on a fundamental level but also that I could build my own. (I won’t delete all this but I honestly think it could be cut out

With so much extra time granted to time during the summer of 2020, I dedicated a few months to teaching myself Verilog and purchased an FPGA (specifically the FII-PRA006) started designing digital logic purely out of a passion for the subject. I wanted to have an “ultimate” goal to work towards for the summer and it had to be something that tied all the topics of basic digital systems together but also engaging and educational. Naturally, I fell back on my long time interest in computer chips and decided I wanted to create my own processor. Since the MIPS instruction set was still fresh on my mind from the previous semester and I admired its elegance I settled on recreating a 32-bit single cycle variant.

Design Overview

In its simplest form a processor can be described as a device which fetches instructions from memory, decodes the instructions and executes these instructions. This cycle continues (more or less) indefinitely and forms the basis of most processors.

The MIPS processor follows a similar structure so that large groupings of hardware can roughly correspond to this cycle as shown in the diagram below. In single cycle processor, only one instruction is executed in any given clock cycle. Most modern processors use pipelining where multiple steps can be stacked together in a single clock cycle and I do plan on investigating that in another project.

Individual blocks of sequential hardware, referred to as state elements and combinational logic blocks used to determine state transitions. The wires and multiplexers connecting the blocks together provide data paths the processor will use to move data from one block to another or perform some type of operation. My overall plan was to characterize, design and test each block of hardware separately and then once that was complete slowly build up the processor block by block until it was complete and fully functioning.

Instruction Memory

Instruction memory acts as a storage place where all of the instructions that the processor reside in a sequential fashion. It is similar RAM in the sense that it receives an address and returns the contents stored at that address and in many processor designs, the RAM and instruction memory are actually one in the same. For simplicity sake I will keep it as a separate in this implementation.

The verilog code is very simple and only require the declaration of a single memory array. Each word is has a width of 32-bits and same goes for the address. The “readmemb” statement initializes the instruction memory with a text file. The text is later used for entering a program that the processor will run. After the instruction memory is first initialized, it will act as read only and cannot be written to.

module instruction_memory(
	input[31:0] a,
	output[31:0] rd
);

	reg[31:0] i_mem[31:0];
	
	//Used to load program into instruction memory
	
	initial begin
		$readmemb("program.txt",i_mem);
	end
	
	
	assign rd = i_mem[a];
	
endmodule

Register File

The register file can be thought as “scratch pad” where temporary values, intermediates operations and results are stored. On a hardware level, the register file represents a portion of the processor’s on board memory. The register file can receive two addresses simultaneously and output two register contents. One register can also be written to, if the write enable high and an appropriate address and data is provided.

The verilog code is relatively simple and utilizes an always statement to make the block sensitive to the positive edge of clock. Notice that address zero is constantly initialized to zero, this forms the “zero” register which is useful when the processor is executing certain type of instructions.

module register_file(
	input clk,
	input reset,
	input we3,
	input v_f,
	input [4:0] a1 ,
	input [4:0] a2 ,
	input [4:0] a3 ,
	input [31:0] wd3 ,
	output [31:0] rd1 ,
	output [31:0] rd2 
);
	reg [31:0] r_mem[31:0];
	assign rd1 = r_mem[a1];
	assign rd2 = r_mem[a2];
	
	always@(posedge clk)begin
		//Creating $0 register
		r_mem[0] <= 32'b0;
		if(we3&v_f)begin
			r_mem[a3]<=wd3;
		end
	end
endmodule

Data Memory

Data memory is similar to instruction memory except it can be written to and requires a clock input. Data memory takes a single address and can only read or write in single clock cycle. The write enable must be high in order for writing to occur.

The implementation is similar to the register file but only contains a single address input for read or write. The “readmemb” is used to initialize data memory from a text file much like instruction memory.

module data_memory(
	input clk,
	input we,
	input v_f,
	input [31:0] a,
	input [31:0] wd,
	output [31:0] rd,
	output [31:0] out_zero
);
	reg [31:0] d_mem[31:0];
	
	assign rd = d_mem[a];
	assign out_zero = d_mem[0];
	
	always@(posedge clk)begin
		if(we&v_f)begin
			d_mem[a]<=wd;
		end
	end
	
	initial begin
		$readmemb("data_memory.txt",d_mem);
	end
endmodule

Sign Extender

Instructions read from the instruction memory may contain and address offsets (meant for addressing the data memory) which are 16-bits wide, however the data memory takes addresses that are 32-bits wide. Address offsets are added to a specific data memory address as a way to traverse the memory space. The offset can be positive or negative and therefore must conform to two’s complement up to 32-bits. The sign extender will take the most significant bit of the 16-bit value and repeat it’s value another 16 times in order to make a 32-bit offset.

Sign extension is achieved through replicating the most significant bit of input 16 times and them concatenating that with the initial 16-bits.

module sign_extend(
	input [15:0] instr,
	output [31:0] sign_imm
);
	assign sign_imm = {{16{instr[15]}},instr};
endmodule

Arithmetic Logic Unit (ALU)

The arithmetic logic unit or ALU for short can be thought of as the heart of the MIPS processor and performs all of the operations (addition, subtraction, and, or, comparisons, etc) that the instruction set requires. The ALU take two arguments as inputs and performs an operation on them depending on an ALU control signal. It will output the result along with specific “flags” if certain conditions are met, such as a zero result or overflow.

The verilog code uses a case statement to determine what operation to perform depending on the control signal. A zero flag is set high if the result happens to be zero.

module alu(
	input [2:0] alu_control,
	input [31:0] src_a,
	input [31:0] src_b,
	output reg [31:0] alu_result,
	output reg zero
);
	always@(*)begin
		case(alu_control)
			0:alu_result = src_a&src_b;
			1:alu_result = src_a|src_b;
			2:alu_result = src_a+src_b;
			4:alu_result = src_a&~src_b;
			5:alu_result = src_a|~src_b;
			6:alu_result = src_a-src_b;
			7:alu_result = ($signed(src_a)<$signed(src_b)) ? 1:0;
			default:alu_result = 32'b0;
		endcase
		
		if(alu_result == 32'b0)begin
			zero = 1'b1;
		end
		else begin
			zero = 1'b0;
		end
	end
endmodule

Control Unit

The control unit receive and opcode and funct signal from the instructions, decode that specific input and translate it into the appropriate control signals that orchestrate the entire processors. It is special purpose decoder designed around the different instructions a processor might receive. In this case, R-type instructions which involve registers as arguments, I-Type instructions which involve immediate values and branch instructions which manipulate the program counter. Inside the control unit is a smaller decoder dedicated to taking the funct input and translating it into an ALU control signal which the ALU will understand.

The control unit is implemented in verilog as a decoder with separate case statements for the larger opcode decoder and the smaller alu decoder.

module control_unit(
	input [5:0] opcode,
	input [5:0] funct,
	output reg mem_to_reg,
	output reg mem_write,
	output reg branch,
	output reg [2:0] alu_control,
	output reg alu_src,
	output reg reg_dst,
	output reg reg_write,
	output reg jump,
	output reg reg_src
);
	reg [1:0] alu_op;
	always@(*)begin
		case(opcode)
			6'b000000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1100_001000;
			6'b100011:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1010_010000;
			6'b101011:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0010_100000;
			6'b000100:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0001_000100;
			6'b001000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1010_000000;
			6'b000010:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0000_000010;
			6'b010000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1000_000001;
			default:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b0000_000000;
		endcase
		casex(alu_op)
			2'b00:alu_control = 3'b010;
			2'bx1:alu_control = 3'b110;
			2'b1x:
			begin
				case(funct)
					6'b100000:alu_control = 3'b010;
					6'b100010:alu_control = 3'b110;
					6'b100100:alu_control = 3'b000;
					6'b100101:alu_control = 3'b001;
					6'b101010:alu_control = 3'b111;
					default:alu_control = 3'b000;
				endcase
			end
		endcase
	end
endmodule

Load Word Datapath

After verifying that all the major blocks of hardware that go into making processors were functioning properly, the next was to connect each block form complete data paths and continue adding blocks one by one until I arrived at the complete design.

The first data path to verify was the load word data path. It requires the program counter, instruction memory, register file, sign extender, alu, control unit and data memory. Once connected as shown in the diagram, the processor is able to read instructions sequentially and execute them. In the case with the available hardware and connections the processor is only able to perform the load word instruction where a 32-bit word from data memory is loaded into a register.

The verilog implementation mostly involved instantiating all the necessary hardware blocks and connecting them together. The program counter required a bit of extra code in order to increment every clock cycle.

module scp_1(
	input inclk,
	input ext_reset,
	output [31:0] out
);
	//Interconnects
	wire clk = inclk;
	wire reset = ext_reset;
	wire [31:0] pc_prime;
	wire [31:0] instr;
	wire [31:0] rd1;
	wire [31:0] rd2;
	wire [31:0] sign_imm;
	wire zero;
	wire [31:0] read_data;
	wire [31:0] alu_result;
	
	//Muxes
	
	//Program counter
	reg [31:0] pc;
	always@(posedge clk)begin
		if(reset)begin
			pc<=32'b0;
		end
		else begin
			pc<=pc_prime;
		end
	end
	
	//Next Instruction
	assign pc_prime = pc + 1;
	
	//Instruction Memory
	instruction_memory instruction_memory_0
	(
		pc,
		instr
	);
	
	//Control Unit
	wire mem_to_reg;
	wire mem_write;
	wire branch;
	wire [2:0] alu_control;
	wire alu_src;
	wire reg_dst;
	wire reg_write;
	control_unit control_unit_0
	(
		instr[31:26],
		instr[5:0],
		mem_to_reg,
		mem_write,
		branch,
		alu_control,
		alu_src,
		reg_dst,
		reg_write
	);
	
	
	//Register File
	register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		instr[25:21],
		5'b0,
		instr[20:16],
		read_data,
		rd1,
		rd2
	);
	
	//Sign Extender
	sign_extend sign_extend_0
	(
		instr[15:0],
		sign_imm
	);
	
	//ALU
	alu alu_0
	(
		alu_control,
		rd1,
		sign_imm,
		alu_result,
		zero
	);
	
	//Data Memory
	data_memory data_memory_0
	(
	clk,
	mem_write,
	alu_result,
	32'b0,
	read_data
	);
	
	assign out = read_data;
endmodule

The program is written in MIPS assembly and then translated into machine by hand. It is three instructions which load some data from data memory into some registers. The data memory were initialized with some values ahead of time since the processor wasn’t capable of writing to the data memory on its own.

/*Assembly
lw $1,0($0)
lw $2,1($0)
lw $3,2($0)
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
//Data memory
00000000000000000000000000000001
00000000000000000000000000000101
00000000000000000000000000001001
//Register
00000000000000000000000000000000 
00000000000000000000000000000001
00000000000000000000000000000101 
00000000000000000000000000001001 

Save Word Datapath

The save word is quite similar to the load word datapath except the second register output of the register file is connected to the write data input of data memory. This allows data stored in the registers to be written to data memory.

A connection was made between the register file and the data memory address input.

register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		instr[25:21],
		instr[20:16],
		instr[20:16],
		read_data,
		rd1,
		rd2
	);

The program starts the same as load word datapath but adds on three more instructions which copy the content from the registers into the next available locations in data memory.

/*Assembly
lw $1,0($0)
lw $2,1($0)
lw $3,2($0)
sw $1,3($0)
sw $2,4($0)
sw $3,5($0)
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
//Data memory
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001 
//Register File
00000000000000000000000000001001 
00000000000000000000000000000101
00000000000000000000000000000001 
00000000000000000000000000000000

R-Type Datapath

R-type instructions take two registers as arguments and store the output in a register. In order to make this possible some modifications to the data path were necessary. A multiplexer was placed between the register file’s write address and the instructions since the register address for r-type instructions are in a different bit location compared to other instructions. A multiplexer was also placed between the output of the register file and input b of the alu, so the processor as able to switch between operations involving a register and an immediate and only registers. Finally, another multiplexer allows the processor to choose where to write back to the registers via the data memory or the alu.

I decided a section in the code to these multiplexers and the future data routing blocks.

//Interconnects
        reg [4:0] write_reg;
	reg [31:0] src_b;
	reg [31:0] result;
	//Muxes
	always@(*)begin
		case(reg_dst)
			0:write_reg = instr[20:16];
			1:write_reg = instr[15:11];
		endcase
	end
	
	always@(*)begin
		case(alu_src)
			0:src_b = rd2;
			1:src_b = sign_imm;
		endcase
	end
	
	always@(*)begin
		case(mem_to_reg)
			0:result = alu_result;
			1:result = read_data;
		endcase
	end

//Register File
	register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		instr[25:21],
		instr[20:16],
		write_reg,
		result,
		rd1,
		rd2
	);

Here I used two add instructions and store the result somewhere else in the register file to verify the result.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
add $1,$2,$3
add $1,$3,$4
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00000000001000100001100000100000
00000000001000110010000000100000
//Data memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000111
00000000000000000000000000000110
00000000000000000000000000000101
00000000000000000000000000000001
00000000000000000000000000000000

Branch if equals Datapath

Branch if equals or BEQ for short allows the processor to compare if the values contained into two registers is equal and if so, to branch to another part of the instruction memory using a given offset. In order to determine if two registers are equal, the alu will subtract one from the other. If the result is zero they are equal and the zero flag goes high. This is ANDed with a branch signal from the control unit and is used to determine whether or not the next value to loaded into the program counter is simply pc+1 or pc+1+offset.

The result of the offsetted address is put onto an intermediate line and a multiplexer is used to determine the next program counter value.

wire [31:0] pc_branch;
reg [31:0] pc_next;
wire pc_src;

always@(*)begin
		case(pc_src)
			0:pc_next = pc_prime;
			1:pc_next = pc_branch;
		endcase
	end

        //Program counter
	reg [31:0] pc;
	always@(posedge clk)begin
		if(reset)begin
			pc<=32'b0;
		end
		else begin
			pc<=pc_next;
		end
	end

	//Branch offset
	assign pc_src = branch&zero;
	assign pc_branch = pc_prime + sign_imm;

There are two BEQ instructions with one meant to fail and the next one to succeed and branch. In this case the last BEQ instruction branches to a previous location in the instructions and the processor is see to be looping for the first time.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
add $1,$2,$3
add $1,$3,$4
beq $1,$3,-1
beq $2,$1,-5
*/
//Program
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00000000001000100001100000100000
00000000001000110010000000100000
10001100000000100000000000000000
00010000001000111111111111111111
00010000001000101111111111111011
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000011 
00000000000000000000000000000010 
00000000000000000000000000000001 
00000000000000000000000000000001 
00000000000000000000000000000000

Add immediate Datapath

Add immediate or addi for short is used to take an immediate value store in the instruction and write it to a specific register. This is useful for initializing registers to a known value. The processor was already capable of performing the addi instruction with the available hardware blocks and connections and only needs a modification to the control unit.

Another line in the case statement of the control unit was added in order to accommodate the addi instruction.

6'b010000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1000_000001;

The program initializes register at address 1 with an immediate value.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
addi $0, $1, 10
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00100000000000010000000000001010
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000000 
00000000000000000000000000001010
00000000000000000000000000000101 
00000000000000000000000000001001

Jump Datapath

The jump instruction takes a specific location in the instruction memory and will jump to that location unconditionally. A small amount of hardware is used to accomplish this and an additional control signal must added to the control unit. When a jump instruction is executed the jump will go high and the next value loaded into the program counter in guaranteed to be the immediate value.

A multiplexer controlled by the jump signal achieves the necessary priority switching between the next, branch and jump values for the program counter.

//Interconnects
reg [31:0] pc_next;
wire jump;

//Multiplexers
always@(*)begin
		case(jump)
			0:pc_inter = pc_next;
			1:pc_inter = instr[25:0];
		endcase
	end

//Program counter
	reg [31:0] pc;
	always@(posedge clk)begin
		if(reset)begin
			pc<=32'b0;
		end
		else if(v_f) begin
			pc<=pc_inter;
		end
	end

The program will jump back to the first instruction (address 0) and continue in an infinite loop.

/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
addi $0, $1, 10
j 0
*/
//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00100000000000010000000000001010
00001000000000000000000000000000
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000000 
00000000000000000000000000001010
00000000000000000000000000000101 
00000000000000000000000000001001

Load Switch Datapath

Eventually I wanted to the processor to be fully interactive using the hardware on the FPGA, so I needed some way for the processor to read data from the outside world and use it. I decided to use the bank of eight dip switches as it made for a extremely easy 8-bit input. There are couple of options to implement i/o into these simple processors. One way is to connect i/o signals to sections of the register or memory as designate them specifically as i/o locations. While this approach is more realistic, it also uses up precious FPGA real estate especially considering the die area of the Cyclone 10 LP. Instead I created a brand new instruction that the processor can execute called load switch or lsw for short. It can load the current value of the dip switches as an 8-bit binary number and store it in a specified register. The only extra hardware required was another multiplexer to bypass the register file’s data input with the switch input and another control signal in the control unit.

Similar to the jump datapath a multiplexer is used and a separate control signal controls what the data input to the register file is. The switch input is simulated as a constant value in the test bench.

//Module
input [7:0] sw

//Interconnects
wire reg_src;
reg [31:0] reg_data_write;

//Multiplexer
always@(*)begin
		case(reg_src)
			0:reg_data_write = result;
			1:reg_data_write = sw;
		endcase
	end

//Register File
	register_file register_file_0
	(
		clk,
		reset,
		reg_write,
		v_f,
		instr[25:21],
		instr[20:16],
		write_reg,
		reg_data_write,
		rd1,
		rd2
	);

//Control Unit
	control_unit control_unit_0
	(
		instr[31:26],
		instr[5:0],
		mem_to_reg,
		mem_write,
		branch,
		alu_control,
		alu_src,
		reg_dst,
		reg_write,
		jump,
		reg_src
	);
//Control Unit
output reg reg_src

6'b010000:{reg_write,reg_dst,alu_src,branch,mem_write,mem_to_reg,alu_op,jump,reg_src} = 10'b1000_000001;
/*Assembly
lw $1, 0($0)
lw $2, 1($0)
lw $3, 2($0)
sw $1, 0($0)
sw $2, 1($0)
sw $3, 2($0)
addi $0, $1, 10
lsw $1
*/

//Machine Code
10001100000000010000000000000000
10001100000000100000000000000001
10001100000000110000000000000010
10101100000000010000000000000011
10101100000000100000000000000100
10101100000000110000000000000101
00100000000000010000000000001010
00001000000000010000000000000000
//Data Memory
00000000000000000000000000001001 
00000000000000000000000000000101 
00000000000000000000000000000001 
00000000000000000000000000001001
00000000000000000000000000000101 
00000000000000000000000000000001
//Register File
00000000000000000000000000000000
00000000000000000000000000011111 
00000000000000000000000000000101 
00000000000000000000000000001001 

Fibonacci Program

After the processor was completed I wanted to it to carry out a program which was simple enough that it could be easily verified and complicated enough that it utilized every block of hardware at least once. So I settled on the all time classic of a fibonacci calculator for that very reason with the added bonus of knowing what the expect result should be at all times. In words the program will calculate each value in the fibonacci sequence, save the current fibonacci number to data memory, stop at the Nth fibonacci number as determined by the dip switches and then repeat infinitely. The lsw instruction was important here as it allows for some level of interactivity.

Addi $4,$0,1 //Initialize registers
Addi $1,$0,0 
Addi $2,$0,1

Add $3,$2,$1 //Calculate next fibonacci number
Add $1,$2,$0
Add $2,$3,$0

Sw $3, 0($0)//Save to data memory

Lsw $5 //Update current switch value
Slt $6,$4,$5 //Check if counter is less than
 switch
Addi $4,$4,1
Beq $6,$0,1 //If counter >= switch, stop

J 3 //Inner Loop
J 0 //Outer Loop

//Machine Code
00100000000001000000000000000001
00100000000000010000000000000000
00100000000000100000000000000001
00000000001000100001100000100000
00000000000000100000100000100000
00000000000000110001000000100000
10101100000000110000000000000000
01000000000001010000000000000000
00000000100001010011000000101010
00100000100001000000000000000001
00010000000001100000000000000001
00001000000000000000000000000011
00001000000000000000000000000000

Interface Overview

As mentioned previously I wanted this processor to be fully interactive, so not only could you see what happening as it’s executing instructions but you can also see use the dip switches as a way to dynamic change the input. The idea is to use the dip switches as a number input and the four push buttons to navigate a “menu” showing each signal inside the processor and adjust its clock speed. Left and right would move through a circular list of different signals and up and down would either double or half the clock speed. Although modern processors can reach into the gigahertz range, I wanted to this project to be more about seeing each part of the processor at a fundamental level. For that reason the clock will range between about 0.25Hz and 32Hz, so it’s slow enough the human eye can read each instruction individually.

The processor itself will become a hardware block alongside other blocks used to constructing the interface. I find it best to take a problem like this and break it down into a series of simpler problems rather than one giant one. Very briefly, the processor has several signal “taps” that are translated into hexadecimal and then pushed to a seven segment display. The output menu allows selection of each signal and the frequency divider allows adjustment of the processor’s speed. The system is controlled by a system clock provided by a phase locked loop (PLL).

Output Menu

The output menu will select an input signal from one of its inputs and depending on its current state will output the selected input signal. The menu is “circular” such that progressing to the last menu option and once more will return back to the first option. Similarly, moving to the previous menu option on the first screen will move to the last option. To achieve this, a register is set to remember the current menu position and holds the select signal for a multiplexer. An adder is wired up such that whenever the enable of the register is high and the clock is on the positive edge, an incremented value will be stored. In effect, the enable line is able to cycle through each menu position. In this case the register is N bits wide and once the register reaches the value 2^N-1 it will roll over back to zero, thus by taking advantage of the properties of binary numbers, the menu can be made circular. They only caveat is that the number of menu items must be a power of 2.

module output_menu(
	input clk,
	input rst,
	input [1:0] button,
	input [23:0] in0,
	input [23:0] in1,
	input [23:0] in2,
	input [23:0] in3,
	input [23:0] in4,
	input [23:0] in5,
	input [23:0] in6,
	input [23:0] in7,
	output reg[23:0] out
);
	reg [2:0] position;
	always@(posedge clk)begin
		if(rst)begin
			position<=3'b0;
		end
		else if(button[0]&~button[1])begin
			position<=position - 1'b1;
		end
		else if(button[1]&~button[0])begin
			position<=position + 1'b1;
		end
	end
	
	always@(*)begin
		case(position)
			0:out = in0;
			1:out = in1;
			2:out = in2;
			3:out = in3;
			4:out = in4;
			5:out = in5;
			6:out = in6;
			7:out = in7;
		endcase
	end
endmodule

Frequency Divider

The frequency divider will take the system clock and output lower frequency flags that are required for other applications. For example the seven segment display uses both the system clock and a 1KHz (1ms period) flag to cycle through each digit. Since I want a processor to have variable speed capability, a separate variable flag is needed which can go faster or slower based on a button input. The frequency divider uses a system of cascading counters starting at the microsecond range and going down to some length of time longer than a millisecond. Each successive counter feeds off the last by counting the flags of the previous counter. Splitting up the counters like this makes for a faster circuit overall, since a large counter rolling over to zero as the signal has to propagate through more flip flops before it’s done.

The variable flag is generated by a variable counter which have a similar structure to a typical counter (like the one used for the millisecond flag) but an external signal which bit shift a register left or right and thus change the amount of time it takes for a flag to be pulsed. Shifting left doubles the time it takes for the counter to roll over making the flag pulse slower. Shifting right halves the time it take for the counter to roll over making the flag pulse faster. This variable flag is then connected to the enable of all the flip flops in the processor, allowing for adjustable speed control in powers of two. By default, the variable flag starts at a moderate frequency of 2Hz and can reach a maximum frequency 1024x that amount and a minimum frequency of 1/8x that amount.

module f_div(
	input clk,
	input rst,
	input [1:0] button,
	output reg m_f,
	output reg v_f,
	output reg dp_f
);
	reg [15:0]us_counter;
	reg [15:0]ms_counter;
	reg [31:0]vf_counter;
	reg [31:0]dp_counter;
	reg [31:0]v_max;
	reg u_f;
	
	always@(posedge clk)begin
		if(rst)begin
			us_counter<=0;
			u_f<=0;
		end
		else begin
			u_f<=0;
			if(us_counter == 49)begin
				us_counter<=0;
				u_f<=1;
			end
			else begin
				us_counter<=us_counter+1;
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			ms_counter<=0;
			m_f<=0;
		end
		else begin
			m_f<=0;
			if(u_f)begin
				if(ms_counter==999)begin
					ms_counter<=0;
					m_f<=1;
				end
				else begin
					ms_counter<=ms_counter+1;
				end
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			vf_counter<=0;
			v_f<=0;
		end
		else begin
			v_f<=0;
			if(u_f)begin
				if(vf_counter >= v_max)begin
					vf_counter<=0;
					v_f<=1;
				end
				else begin
					vf_counter<=vf_counter+1;
				end
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			dp_counter<=0;
			dp_f<=0;
		end
		else begin
			dp_f<=0;
			if(u_f)begin
				if(dp_counter >= v_max)begin
					dp_counter<=0;
					dp_f<=1;
				end
				else begin
					dp_counter<=dp_counter+2;
				end
			end
		end
	end
	
	always@(posedge clk)begin
		if(rst)begin
			v_max<=32'h0008_0000;
		end
		else if(button[1]&~button[0])begin
			if(v_max < 32'h0400_0000)begin
				v_max <= {v_max[30:0],v_max[31]};
			end
		end
		else if(button[0]&~button[1])begin
			if(v_max > 32'h0000_0200)begin
				v_max <= {v_max[0],v_max[31:1]};
			end
		end
	end
endmodule

Button Debounce

Buttons are mechanical devices and are therefore subject to less than ideal behavior in high speed digital systems. When a button is pressed it appears as though the transition from off to on is smooth and instantaneous but in fact the button makes and breaks contact several times before coming to equilibrium. The same is true when a button is released, the contacts will spend some time making and breaking the circuit before completely stabilizing. This is undesirable as each “bounce” could potentially be registered as an unintentional button press and the button will have an unpredictable behavior.

Instead, most electronic devices that require a button input will employ some kind of debouncing algorithm (in hardware or software) that removes the extraneous bounces. I used a state machine based solution which will detect the initial button press and then wait 10ms for the button to come to equilibrium and then detect the button release and wait another 10ms until it has reached idle again. If at any point the button transitions during the 10ms timer, the timer will reset and guarantees any bounces are ignored. The button debounce will signal high that the button has been successfully pressed for one clock cycle.

The verilog implementation uses a case statement to determine the next state of the system and the millisecond flag from the frequency divider is used for the timer.

module debounce(
	input clk,
	input rst,
	input m_f,
	input button,
	output reg db_button
);
	reg [1:0] state;
	reg [3:0] counter;
	always@(posedge clk)begin
		if(rst)begin
			state<=0;
			counter<=0;
			db_button<=0;
		end
		else begin
			case(state)
				0:
				begin
					counter<=0;
					db_button<=0;
					if(button)begin
						state<=1;
					end
					else begin
						state<=0;
					end
				end
				1:
				begin
					if(~button)begin
						state<=0;
					end
					else begin
						if(counter == 9)begin
							state<=2;
							counter<=0;
						end
						else if(m_f)begin
							counter<=counter+1;
						end
					end
				end
				2:
				begin
					counter<=0;
					if(~button)begin
						state<=3;
					end
					else begin
						state<=2;
					end
				end
				3:
				begin
					if(button)begin
						state<=2;
					end
					else begin
						if(counter == 9)begin
							state<=0;
							counter<=0;
							db_button<=1;
						end
						else if(m_f)begin
							counter<=counter+1;
						end
					end
				end
			endcase
		end
	end
endmodule

Display Driver

The display driver with translate the binary input into something the seven segment display understands and also management which digits to light up when. The simple solution is to wire each segment of the 6 digit display to a single pin of the FPGA, but that require 48 separate traces and 48 pins decided just to the display. This is not efficient, so instead most systems that have a seven segment display use multiplexing to reduce the amount of hardware and i/o required. Every digit’s segments are wired together and the FPGA will choose which digit to illuminate using a select line. Only one digit is on at any given time but if they are cycled through fast enough the persistence of vision gives the illusion of a continuous display.

A state machine will sequentially flash each digit using a 6-bit select line while simultaneously selecting what to display from a hexadecimal to seven segment display decoder. The state machine transitions states every millisecond flag, giving the display a refresh rate of ~167Hz.

A separate decimal point or dp flag was added in later revisions as a way to indicate the status of the variable flag. The dp flag runs at twice the frequency, so as to simulate the rising and falling edge of a clock if it were to be running that fast.

module display_driver(
	input clk,
	input m_f,
	input dp_f,
	input rst,
	input [23:0] hex,
	output reg [7:0] digit,
	output reg [5:0] select
);
	reg [3:0] state;
	reg [3:0] decode;
	always@(posedge clk)begin
		if(rst)begin
			state<=0;
			decode<=0;
			select<=6'b111111;
		end
		else begin
			case(state)
				0:
				begin
					if(m_f)begin
						state<=1;
					end
					decode<=hex[3:0];
					digit <= {dp,num};
					select<=6'b111110;
				end
				1:
				begin
					if(m_f)begin
						state<=2;
					end
					decode<=hex[7:4];
					digit <= {1'b1,num};
					select<=6'b111101;
				end
				2:
				begin
					if(m_f)begin
						state<=3;
					end
					decode<=hex[11:8];
					digit <= {1'b1,num};
					select<=6'b111011;
				end
				3:
				begin
					if(m_f)begin
						state<=4;
					end
					decode<=hex[15:12];
					digit <= {1'b1,num};
					select<=6'b110111;
				end
				4:
				begin
					if(m_f)begin
						state<=5;
					end
					decode<=hex[19:16];
					digit <= {1'b1,num};
					select<=6'b101111;
				end
				5:
				begin
					if(m_f)begin
						state<=0;
					end
					decode<=hex[23:20];
					digit <= {1'b1,num};
					select<=6'b011111;
				end
			endcase
		end
	end
	
	always @ (*)begin
		case (decode)
			0 : num <= 7'b100_0000;
			1 : num <= 7'b111_1001;
			2 : num <= 7'b010_0100;
			3 : num <= 7'b011_0000;
			4 : num <= 7'b001_1001;
			5 : num <= 7'b001_0010;
			6 : num <= 7'b000_0010;
			7 : num <= 7'b111_1000;
			8 : num <= 7'b000_0000;
			9 : num <= 7'b001_0000;
			10 : num <= 7'b000_1000;
			11 : num <= 7'b000_0011;
			12 : num <= 7'b100_0110;
			13 : num <= 7'b010_0001;
			14 : num <= 7'b000_0110;
			15 : num <= 7'b000_1110;
			default : num <= 7'b1000_0000;
		endcase
	end
	
	reg dp;
	reg [6:0] num;
	
	always@(posedge clk)begin
		if(rst)begin
			dp<=1'b0;
		end
		else if(dp_f)begin
			dp<=~dp;
		end
	end
	
endmodule

System Control

The system control contains a phase locked loop (PLL) to provide a stable system clock and a system reset to bring all hardware blocks into a known state. The PLL takes time to stabilize and will signal high on the “locked” line once it has done so. At that time that system reset will be brought low and the FPGA will be responsive to the system clock.

module system_control(
	input clk,
	output sys_clk,
	output reg sys_rst
);
	wire locked;
	
	PLL1 PLL_inst
	(
		.areset(0),
		.inclk0(clk),
		.c0(sys_clk),
		.locked(locked)
	);
	
	always@(posedge sys_clk)begin
		sys_rst<=~locked;
	end
endmodule

Overall Implementation

The following block diagram is summary of how each hardware block comes together to form the complete system. Button presses and switch positions are taken as input, the processor decides what to do and the result is pushed to the seven segment displays.

module top_module(
	input [7:0] sw,
	input [3:0] button,
	input clk,
	input rst,
	output [7:0] digit,
	output [5:0] select
);
	wire sys_clk;
	wire sys_rst;

	wire [23:0] out_zero;
	wire [23:0] control_signal;
	wire [23:0] instr_lower;
	wire [23:0] instr_upper;
	wire [23:0] alu;
	wire [23:0] program_counter;
	wire [23:0] register_1;
	wire [23:0] register_2;
	
	wire [23:0] out;

	wire m_f;
	wire v_f;
	wire dp_f;
	
	wire b_up;
	wire b_down;
	wire b_left;
	wire b_right;
	
	system_control system_control_inst
	(
		.clk(clk),
		.sys_clk(sys_clk),
		.sys_rst(sys_rst)
	);
	
	 scp_1 scp_1_inst
	 (
		.inclk(sys_clk),
		.v_f(v_f),
		.ext_reset(sys_rst),
		.sw(sw),
		.out_zero(out_zero),
		.control_signal(control_signal),
		.instr_lower(instr_lower),
		.instr_upper(instr_upper),
		.alu(alu),
		.program_counter(program_counter),
		.register_1(register_1),
		.register_2(register_2)
	);
	
	f_div f_div_inst
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.button({b_down,b_up}),
		.m_f(m_f),
		.v_f(v_f),
		.dp_f(dp_f)
	);
	
	output_menu output_menu_inst
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.button({b_right,b_left}),
		.in0(out_zero),
		.in1(control_signal),
		.in2(instr_lower),
		.in3(instr_upper),
		.in4(alu),
		.in5(program_counter),
		.in6(register_1),
		.in7(register_2),
		.out(out)
	);
	
	
	display_driver display_driver_inst
	(
		.clk(sys_clk),
		.m_f(m_f),
		.dp_f(dp_f),
		.rst(sys_rst),
		.hex(out),
		.digit(digit),
		.select(select)
	);
	
	
	debounce up
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[0]),
		.db_button(b_up)
	);
	
	debounce down
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[1]),
		.db_button(b_down)
	);
	
	debounce left
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[2]),
		.db_button(b_left)
	);
	
	debounce right
	(
		.clk(sys_clk),
		.rst(sys_rst),
		.m_f(m_f),
		.button(button[3]),
		.db_button(b_right)
	);
endmodule

It was determined mathematically that the largest fibonacci number that could display was 9,227,465 which corresponds to the 35th number in the sequence or 100011 on the dip switches.

Performance Analysis

After compiling the final version of the design, it used 1,128 logic elements out of a total of 6,272 and contained 468 registers.

In the early stages of the processor design itself, the maximum frequency or fmax for a Slow 1200 mV 85C model was ~200 Mhz. Once all the components were added the fmax dropped to ~70 Mhz and once the processor was incorporated in the final design, the fmax was 56.49 MHz. I did not do a detailed timing analysis of the design but what I can observe is that in general, as more components are added the slower the final design will be.

Demonstration

Password System on Cortex-M3 Dev Board

Motivation

Through EiE, I have been exploring embedded systems, specifically the SAM3U2 CORTEX-M3.

“(EiE) is an extracurricular initiative at post-secondary engineering schools that delivers dynamic and relevant hands-on embedded systems training from an industry perspective. 

I had mastered the basic functionality of the board and the microcontroller, and was given the choice of taking on an assignment that would be used to test the skills I had acquired thus far. I chose to complete a password system. The goal of the project is to create a system which can recognize a correct password through the onboard push buttons and light up the appropriate LEDs to indicate its current status. The password also needed to be reprogrammable up to 10 digits by using a button combination.

Design

Below is a diagram I devised illustrating how the password system should function.

Implementation

The first step was to create a place to store the password entered by the user both for unlocking attempts and for programming a new password. Here, three arrays are used to accomplish the task. One will hold the password itself, and another will act as an input buffer for button-presses. The final array will act as a temporary storage space for a newly programmed password before it is verified.

If a user enters a password, the resulting button input is stored in the buffer and checked up against the actual password before unlocking. Otherwise, a warning message is flashed and the buffer is cleared. If a user enters the code unlocking the programming state, the candidate password is monitored for length and copied into the password array. The result of these array comparisons determines the current state machine function the system is running and what state machine function to transition to next.

After the core implementation was complete, more specific LED indicators, ASCII display messages, and unique buzzer frequencies were added to each state of the password system.

Demonstration

Here is a video demonstration walking through the proper functionality of the password system. The source code is available on Github as well.

Avionics Power Management Board

Motivation

In the Andromeda III rocket, there are many challenges and extremes in terms of its power design. From the low power used for the avionics package to the high current draw of the main engine solenoids, many considerations have to be made in order to create a system that can power the wide range of components on board. This is particularly true during the launch procedure, where the rocket must seamlessly switch its power source from the ground system to the onboard battery supply.

Attempts have been made in past to create a circuit that can gracefully switch and closely monitor these different power supplies but success has been limited by the constants of problem. First of all, any significant drop out in power while switching would prove disastrous for the flight computer as much of its launch-ready capabality relies upon volatile memory. The other challenge is the large amount of current such a switching device would have to pass in order to properly energize the main engine solenoids, which are responsible for controlling ignition of the engine. Earlier boards used electromechanical relays controlled by a small integrated circuit but such boards ultimately failed to cope with the large amount of heat generated by high current loads. (honestly idk how to make this a suggestion other than putting it here but i think this paragraph is unnecessary because it’s just talking about how hard it is to create this thing without giving any useful information relating to how it was actually made)

During the launch, an umbilical cord from the ground system powers the rocket. Its use also conserves power while the rocket is on the launchpad. The umbilical cord is then released from the rocket and, ideally, it should switch seamlessly from the ground system to the battery. It should also switch seamlessly from the primary battery to a number of backup batteries in case of a system failure. The ideal behaviour of the rocket is summarized below.

Circuit Design

Several mechanisms to facilitate switching and voltage sensing were considered while planning the rocket. Single Pole Double Throw (SPDT) relays were familiar, but quite slow, with the fastest available models clocking in at around 100 microseconds. Considering the high current draw and tight voltage tolerances, even a 100-microsecond drop out would require ten of thousands worth of capacitance to compensate, which was far too much added weight. Relays also used a great amount of current to keep circuits closed, a function more useful in other parts of the rocket. Devices like MOSFETs were found to be a superior option as they require very minimal current and can switch rapidly.

The next proposed design was a diode ORing circuit. As the name suggests, a diode ORing circuit functions much like a logical OR gate, but instead will OR two power sources of similar voltage level into a single power source from the perspective of the load. The diode’s cutoff voltages could be selected such that one power source has priority over the other; in this case, the ground system receives higher priority. The diode ORing looked promising, but concerns surrounding the tolerances of high power diodes and the lack of versatility with a passive circuit meant this solution was far from perfect.

After consulting a professor at the University of Calgary who specializes in sensors, biophoniotics, and VLSI, the use of a power multiplexer was considered. The multiplexer outputs a single power source selected from a multitude of available power sources based on a select signal.

Texas instruments makes a line of priority power muxes dubbed the “TPS” series and after doing a bit of research, the TPS2121 fit all the requirements.

Although the TPS2121 functioned as 2 to 1 multiplexer for power, a single IC was not enough to accommodate all possible power sources for the rocket. Instead, two TPS2121s were strung together, effectively expanding the number of ports.

The last step before building any kind of test circuit was determining the required component values. TI’s power muxes are programmed for priority based on the voltage certain pins will receive, which can be calculated via formulas available on the datasheet and then implemented using simple resistor dividers. To aid in the design process of the circuit, a calculator was created in Excel and several scenarios were run before settling on the final values.

Evaluation of TPS2121

Actually using the TPS2121 once it arrived proved to be extremely difficult, as the package itself was extremely small and had metal pads that were impossible to solder using any kind of conventional iron. Instead, a “breakout” board was designed to act as a converter between the minuscule QFN form factor and the breadboard-friendly 0.1″ pitch header pins. Extra vias were placed in pads that carried power in order to increase heat dissipation and trace width.

In order to get a feel for programming a device of this complexity, the circuit found in the “typical applications” section of the datasheet was recreated.

The switching speed was slow, at around 85 microseconds but the switching voltage was recorded at 7.94V versus 7.6V, which is within the tolerance of the resistors being used. After experimenting with the slew rate pin, the switching time was greatly reduced and shortened to 5 microseconds.

Next, the values of the components determined in our calculator were substituted into the circuit and a simulated load was added. Despite the use of 5% tolerance resistors, the measurement results were much closer to the desired result than expected.

Evaluation of Dual TPS2121

(Coming soon!)

PCB Design and Assembly

(Coming soon!)

Final Test

(Coming soon!)

Planar Magnetic Speaker

Motivation

I am a serious audio enthusiast. I have invested in endless hours in to the pursuit of the ‘perfect’ audio system. A good system requires constant tweaking, audition of equipment, measurements, and most importantly listening tests. If you want a taste of what my system of capable visit my other channel:

https://www.youtube.com/channel/UCDc4YhefZPk6IwYizpWqYRA.

There are many videos where I demo different turntable configurations with high quality analog recordings.

One of the aspects that makes for a great audio system is utilizing the right technologies for doing the right thing. This is especially true with transducers where a signal represented in one form of energy is transformed into another form of energy. This obvious example is a loudspeaker where an electrical signal containing variations in voltage level is converted into an acoustical signal containing variations in air pressure level. In an ideal world, the conversion would be perfect and every nuance of the original electrical waveform would be transferred into the air. Unfortunately, that is far from reality. Conventional loudspeaker drivers contain a voice coil, seated around a permanent magnet. When current passes through the voice coil a magnetic field is produced which interacts with the magnetic field of the permanent magnet and thus causes the diaphragm to vibrate. This produces the sound.

This has several disadvantages. The high mass of the driver means the system is slow to response to transients and any overhang due to its inertia results in distortion of the original signal. The voice coil forms an inductor and it means the load is difficult to drive due to its reactive nature. A loudspeaker of this type projects sounds in (more or less) spherical wavefronts, so the sound intensity falls of in accordance with the inverse square law. This may or may not be a disadvantage depending the on the use case. In a serious listening environment, this is a disadvantage as amplifier power is wasted on sound that is not reaching the listener’s ears and causing spurious acoustic reflections around the room.

To solve this, a driver with a planar form factor can be used. Planer drivers are rare and haven’t really been implemented in many commercial products aside from boutique audiophile speakers and headphones. I hope to change that by showing they can be constructed at home. Drivers are inheritedly low mass, have little to no inductance and as the name suggests, project sound in plane wave fronts. This addresses all the shortcoming of a conventional loudspeaker with only introducing a few small disadvantages.

Broadly speaking, there are two kinds of planar designs: magnetostatic and electrostatic drivers. I won’t bore you with the messy details but what I will say is that magnetostatic drivers have several advantages over their electrostatic counterparts. I chose to build a magnetostatic type for this project.

Driver Design

Planar magnetic drivers consist of a thin diaphragm tensioned and suspended over an array of magnets. Conductive material is attached to the diaphragm in such a way that when a current is passed through it, the conductor will move the presence of magnetic field and in turn move the diaphragm.

The planar design works on the principle of the lorentz force (F = IxB*L) and thus a conductor which ‘snakes’ across the diagram must use an array of magnets. These magnets alternate in polarity such that the current in the conductor, which generates the force, all work in the same direction.

After I worked out the basic design, I had to consider the performance specifications I wanted to meet. Typically, bookshelf speakers have a frequency response of 100Hz – 20kHz (±3dB) and can reach a maximum output of level of 105dB. I thought these were good targets to aim for and provided a guide to determine the details of the driver design. I settled on an overall speaker size of 12″ by 12″. Knowing the active area of the driver and target specification gave me enough information to flesh out the rest of the design.

The first consideration was maximum driver excursion as this determines how loud a speaker can play before the diaphragm starts hitting the magnetic array and distorting. The maximum excursion or Xmax for short, occurs at the low frequency and at the maximum desired output level. Plugging those numbers into an online calculator yields about 1 mm.

This means a gap of at least 1 mm will be needed between the diaphragm and the magnet array for proper operation. However, there is trade off to make here because as the conductor moves further away from the magnetic field, the speaker is less efficient but the extra space allows for low frequencies at higher amplitudes. For this first prototype I felt 1 mm was a good comprise as the magnetic fields are unusably weak beyond about 2 mm.

The entire frequency range is difficult to reproduce in a single driver, even for a low mass planar magnetic driver, so often speaker manufacturers will split up the frequency spectrum over multiple drivers. About 75% of the driver area is dedicated to bass/midrange frequencies (100 Hz~1000 Hz) and 25% to upper midrange and high frequencies (1000 Hz to 20000 Hz). The result is woofer/tweeter combination on a single diaphragm. The frequencies have longer wavelengths and are better suited to wires that are farther apather and vice versa. The difference in area is due to increased area requirement to maintain the same sound pressure level at lower frequencies. As a result, tweeters tend to be far more efficient than woofers. This is addressed later.

Driver Construction

Before any of the electronics can be designed, the driver mustbe constructed and characterized. The first step was building the magnet array. The magnets are the rubber type used to seal fridge doors. These magnets are cut to length and epoxied onto a piece of perforated steel. Perforated steel was chosen to allow the pressure between the diagram and the magnet array to be relieved. The 5mm wide strips are for the woofer section and the 3mm wide strips are for the tweeter section.

After completing the entire magnet array, a corrugated plastic spacer was adhered to the steel. It will support diaphragm and provide the required gap between the mylar and the magnets.

A piece of 12 micron thick mylar was stretched over the entire driver. The tension of the mylar was verified by measuring its resonant frequency. Typically, any sort of driver with a suspension or membrane will have a natural resonant frequency that will act as a mechanical high pass filter with a cutoff frequency close to that of the resonant frequency. In this case, a resonant frequency of around 100 Hz is desired and determines the correct amount of tension.

The most difficult step of the project proved to be the gluing of necessary wires to the diaphragm. The wire used is enameled aluminum wire in both 0.315 mm and 0.172 mm thicknesses. Aluminum was chosen due to its low weight and relatively high electrical conductivity. The thicker green wire is used for the woofer to allow high current handling and the thinner yellow wire is used for the tweeter to minimize the amount of weight. The woofer required three passes of wire in order to hit ~4Ω and the tweeter required two passes to hit ~3.5Ω. 4Ω is required to prevent too much current reaching the speaker. The adhesive is a latex-based epoxy that remains pliable after curing. Shown below is one of the unsuccessful attempts at the gluing the wires.

Crossover

Now that the driver was completed, the next part of the project was creating a passive analog filter called a crossover. A crossover performs two very important tasks in a loudspeaker. The first is splitting up signal coming from the amplifier into frequency bands corresponding to the woofer and the tweeter. The second is providing a small amount of equalization and level-matching so the speaker can maintain a smooth frequency response. The driver’s frequency response was measured using a UMIK-1 calibrated measurement microphone and a piece of software called RoomEQWizard (REW.) The woofer and the tweeter were measured separately. The woofer’s frequency response is in red and the tweeter’s response is in blue.

Every crossover circuit has a crossover frequency wherein the woofer hands off the rest of the frequency spectrum to the tweeter. Ideally, the crossover frequency should lie in a region where both drivers are working optimally so the transition between them is less obvious to the ear. The graph shows below demonstrates that the crossover frequency should be somewhere between 800 Hz and 2000 Hz. After simulating this transition in VituixCAD, the flattest frequency response was found to be achieved at a crossover frequency of 900 Hz.

In essence, a simple crossover is a low pass filter whose output is connected to the woofer and a high pass filter whose output is connected to the tweeter. Put together, the speaker is able to output the full frequency range it was designed for. Combining the responses for the woofer/tweeter using this crossover design yields a smooth frequency response. The graph shown is the output from the simulation and was verified with real-world measurements later.

A circuit diagram was then sketched based on the component values given by VituixCAD. A 3rd order Bessel filter was chosen as the phase response because it is very linear, thus preserving the waveform of the original electrical signal. The Bessel diagram is shown below.

The circuit was soldered together on perfboard with large gold-plated taps serving as connection points for the amplifier, woofer, and tweeter. The inductors were placed at right angles to each other in order to minimize interference.

With the crossover finished, the entire speaker was now complete. A final frequency response measurement confirmed that the simulation matched up very closely with actual speaker performance.

Listening test

Compared to a conventional speaker, listening to the planar magnetic speaker yielded a drastically different audio experience. The first thing that is immediately apparent is the highly directional nature of the sound, especially in the high frequencies but obvious down into the bass frequencies as well. Directly facing the speaker, the sound is loud, clear, and crisp, but listening even a few degrees off-axis causes the perceived loudness to drop off. It is akin to a sonic laser beam rather than the conventional sound of a highly diffused wavefront. The music itself is rendered very precisely and it makes a conventional speaker sound sluggish in comparison. The only drawback was the apparent lack of deep bass and impact, but not much can be expected from a speaker of this size. Below is a sound demo to illustrate the speaker’s sound profile.

Future improvements

To summarize, the final measured specifications of the speaker are as follows:

  • Frequency response: 200 Hz – 14 kHz (±3dB)
  • Maximum output: 105dB @ 200 Hz
  • Crossover frequency: 900Hz

Unfortunately, this falls short of the target response on both ends. The 200 Hz frequency response is likely due to a resonance between the overall speaker housing as some strange peaking occurs around 150 Hz. However, this is not to discount its output below 150 Hz; tones are audible through to 80 Hz. The tweeter falls short of 20 kHz, likely due to the high mass of glue used to attach the wires and a lack of field strength from the magnet array. However, this does not prevent the speaker from generating excellent sound within its range capacity.

In the future, I would like to investigate switching to a DSP designed crossover, integrated a subwoofer for improved bass extension, and a ribbon tweeter for improved high-frequency performance.