**A Simple Memory-Mapped Interface**

In this lab we'll look at making a pretty simple memory-mapped piece of IP on the PL and then access it from Python (the PS). This will be a "toy example", so don't expect crazy good speedups or anything, but it should demonstrate the overall pipeline and how we can build some simple Verilog which will then get implemented in the PL. We will wrap this module up so that it becomes AXI-interfaceable IP, which means we can then link it conveniently and scalably to our PS.

Getting Started
=======================

So, I'm assuming you already did the first lab found HERE. If you didn't, make sure you do, since it walks through a lot more of the step-by-step pieces that we skip/assume here.

Start by making a new project targeted for your PYNQ-Z2 board. When setting it up, make sure that you specify the Board File rather than just choosing the correct chip. It starts to matter in this lab (as opposed to the last lab). If you're in the 6th floor digital lab, the files will be there already. If you're working on your own machine and you did a standard Vivado install, the files (found here) will need to be placed in **`/opt/Xilinx/Vivado/2018.2/data/boards/board_files/`** or the Windows equivalent location, and then you'll need to restart Vivado.

Once you're in your new project, create a new block diagram like before, and add a ZYNQ Processing System. Run the connection automation right away.

!!! warning
    Remember to allow it to run the self-automation on the Zynq core!! This configures the clock frequency among other important things.

Set up for AXI
=======================

What we want to do is create a memory-mapped interface that we can access from our Python environment. This will be a more data-friendly means of interfacing to a potential hardware accelerator from the Python environment. For this lab we will utilize one of the General Purpose AXI Interfaces we've heard about in Lecture 2B from this past week.
Specifically we'll use one of the PS Master AXI Ports.

![](./resources/interface.png width="300px")

Chances are your Zynq processing system IP has already included that, but if it didn't, open up your IP and then, in the **`PS/PL`** tab under **`AXI Non Secure Enablement`**, ensure that a General Purpose AXI Master Interface (pick GP0) is activated.

![](./resources/zynq_gp_maxi_enable.png)

Afterwards (or maybe already automatically), your block diagram should look like the following:

![](./resources/zynq_post_setup.png)

Create a Piece of AXI IP
========================

Now we need to build some IP that will interface with an AXI interface! Do the following:

1. Go up to **`Tools > Create and Package New IP`**
1. Click **`Next`**
1. Select **`Create AXI4 Peripheral`**
1. Call it whatever you want at the next window, and feel free to position the IP you're about to write anywhere in your file structure. I usually keep a pile in my home directory under a folder called `joes_ip`, but that's just me.
1. In the next window that comes up it will show you a default module that has a single AXI4-Lite Slave Interface. That's in fact what we want here, and we'll keep all the defaults as they are:
    * The data width will stay at 32 bits (we can't actually change this here, since the wizard fixes the AXI4-Lite data width at 32 bits)
    * We'll keep the number of registers in the memory map at four, though we could certainly add more if we wanted to.
1. Click **`Next`**
1. At the next window we're going to want to immediately go in and start editing our IP, so click on **`Edit IP`**, which will take us to our IP editor.

!!! Tip
    After you create your IP, or if you ever need to edit it in the future, always feel free to right/control click on it in the block diagram view and then select **`Edit in IP Packager`**!

Once the IP editor is up, navigate into the sources menu and you will see a nested set of two Verilog files which represent your default AXI4-Lite wrapper fitted to our specifications.
Double click on the inner file (the outer one calls an instance of the inner one...the inner one is where the good stuff is at). You'll note my IP is called **`joe_5`** below.

![](./resources/axi_where_to_click.png)

This Verilog file, believe it or not, actually takes care of all the AXI-Lite timing and state machine handling for us!

Our goal is to create a module that takes a simple set of three numbers, which we'll call $a$, $b$, and $c$, and then returns $d$, where $d = 3 + c^2\left(a+b\right)$. This is of course a pretty arbitrary mathematical operation, but it would be cool to see if we can do it in hardware. From an input-to-output perspective we're going to want the following memory-mapped interface. As we've requested it from Vivado in creating the IP, our module can be thought of as an AXI-interfaced block with four read/write-accessible memory locations, as shown in the image below:

![](./resources/memory_map.png)

Our $a$, $b$, and $c$ values will be written to relative memory locations `0x00`, `0x04`, and `0x08`, while our answer, $d$, will appear at `0x0C`.

Let's add in our math. For the sake of this lab, we'll provide you the Verilog that does this here. We *pipelined* the calculation, meaning we calculate the square of $c$ on one clock cycle and then the rest of the equation on the next clock cycle. You can see that with the non-blocking statements below. We may or may not need to do that, but I did it just to be safe.

Another thing you'll notice is that in creating my module, I made all `input`s, `output`s, and `reg`s **`signed`**. What this means is that when the Verilog is synthesized, the tools will make sure to implement two's complement-appropriate hardware for the operations (since some operations in two's complement are carried out differently at the bit level than when your numbers are unsigned).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ c linenumbers
module math_op(
    input clk,
    input signed [31:0] a,
    input signed [31:0] b,
    input signed [31:0] c,
    output signed [31:0] d
    );

    reg signed [31:0] d_reg;
    reg signed [31:0] in_between;

    always @(posedge clk) begin
        in_between <= c*c;              // stage 1: square c
        d_reg <= 3 + in_between*(a+b);  // stage 2: rest of the equation
    end

    assign d = d_reg;
endmodule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

!!! WARNING
    Use caution with signed numbers in Verilog. If your intention is for an operation to be signed, make sure all signals that come into contact with the operation are themselves signed. Otherwise sneaky things can happen (some part of your operation might get treated as unsigned and then it gets weird and hard to debug).

In order to incorporate this module into the AXI IP that we're building, while in the IP Editor, create a new source via the **`Add Sources`** button on the left side. Go ahead and make a new Verilog file, call it whatever you want, and after it has been created, edit it and add the module above into the file.

When that is done, go back to the AXI_INST Verilog file in your IP (the one that contains all the AXI line handling), and scroll all the way to the bottom of the module's code until you find an area commented with something like "User IP goes here". In that area I'm going to create an instance of the `math_op` module I just wrote (copy/pasted). The syntax for making an instance of a module in Verilog is shown below:

![](./resources/adding_module.png width="400px")

There are a few things to point out. First, what I'm doing is providing three already-existing register values as the inputs to the module. These registers are called `slv_reg0`, `slv_reg1`, and `slv_reg2`.
These registers correspond to the first, second, and third memory locations in our device, and the wrapper code in this file will make sure they are always filled with the most up-to-date values written to them. You can see where this is done in the neighborhood of lines 219 to 269 of the file.

Above where I create my module, I've also created a wire which will link the module's output (the `d` output port) to an output register, so that our result gets placed into a memory-mapped location and the wrapper code can handle the AXI handshake/transfer out! In order to link that, you must inject your calculation into the appropriate memory/register location. Following our desired memory map from above, we'll want this to go into `slv_reg3`, and this can be done with the following simple replacement:

![](./resources/tying_to_output.png width="400px")

When this is all done, double check that things look ok and all files are updated. You can do this by:

* Making sure your files are saved
* Looking at your IP's source hierarchy and **`Package IP`** window and making sure things are blue and that there are all check marks like shown below:

![](./resources/looking_good.png )

If anything is missing you may need to go back and re-add the Verilog file you just wrote.

If you need to make any changes (now or in the future) to your IP module, you'll always need to check the **`Package IP`** tab. This is important since it'll tell you what needs to be updated. For example, a change in the Verilog file will require you to refresh things in the **`File Groups`** tab, and you'll almost always have to go to the final **`Review and Package`** tab at the bottom and click on **`Re-Package IP`**.

Assuming you've built everything and all is good (and you've packaged the IP), it will ask if you'd like to return to your project. Click OK. If it doesn't and all is saved, you can just exit the IP project (but not Vivado overall).

!!! warning
    Upon exit, it may also pop up a window about Generating Output Products. If that comes up, make sure to click **`Global`**, and then feel free to click on **`Generate`** (you'll need to do that now or in the future).

Back in Block Diagram
======================

When back in the main block diagram, add an instance of your new module. You should be able to just search for it by name under your IP entry field. (Mine was named `joe5`.) You should be prompted to allow the system to automate the AXI wiring. Click on that and let it do its thing:

![](./resources/automation_example.png width="400px")

After it does that, and you've cleaned up your wiring, your overall diagram should look like the following:

![](./resources/overall.png )

Depending on the order of operations from before, you may get prompted about generating output products here (if you weren't earlier). The window looks like the following. It is very important that you make sure **`Global`** is specified:

![](./resources/generate_output_products.png width="300px")

!!! warning
    Just to reiterate, make sure to specify **`Global`** under Synthesis Options when it pops up. If you don't, you'll get errors/critical warnings related to **`Out of Context Build`** issues later on. If you forgot to do this or can't remember, you can always right click on your block diagram, go to **`Generate Output Products`**, change things, and then click **`Generate`**.

!!! Tip
    You are always free to go back and edit IP in the future; just remember to regenerate things at every step along the way (make sure the Verilog is refreshed, saved, etc).

Also note that if you go and update your IP at any point after having integrated it, Vivado will notice and prompt you to update/refresh all your IP. That will usually come up as an error, or a more friendly yellow notifier at the top. See the two images below:

![Ooooh IP has changed. Let me click on `Show IP Status`](./resources/update_ip_1.png)

![Ahhh...just as I suspected. `joe_5` was changed. Let me upgrade the IP!](./resources/update_ip_2.png)

Back in our main project here, if all is good, verify your block diagram with the check-mark symbol. Then right-click on your block diagram file, and go to **`Create HDL wrapper`**. If that builds fine, then save everything, and go to **`Generate Bitstream`**. After that is completed (hopefully with no errors), you'll then be able to (while having the block diagram active) go to **`File>Export>Export Block Design`** to generate your tcl file.

Interacting with it in Python
=========================

Once your bit file and tcl file are in place in the file structure of your system, go ahead and create either a new Python file or a Jupyter notebook. Make sure it is placed in the same relative file path as your bit file and tcl file from this lab. In my particular code I called them `mmio1.bit` and `mmio1.tcl` and you can see that in the code below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python linenumbers
from pynq import Overlay  # import the overlay module

ol = Overlay('./mmio1.bit')  # locate/point to the bit file
j5 = ol.joe5_0  # find the AXI MMIO module which we can talk to (name of IP)

# Now it is time to interface with the j5 IP:
j5.write(0x00, 4)   # write 4 to address location 0x00 (the a value)
j5.write(0x04, -9)  # write -9 to address location 0x04 (the b value)
j5.write(0x08, 2)   # write 2 to address location 0x08 (the c value)
d = j5.read(0x0C)   # should be: 3 + 2*2*(4-9) = -17
if d > 0x7FFFFFFF:  # comes back raw...need to treat as two's complement
    d -= 0x100000000
print(d)  # print output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you run this, you should, amazingly, get out:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python
-17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you didn't, you might have messed something up. But if you did, we're in serious business.
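
The two's complement fix-up in the listing above can be pulled out into a small helper, together with a pure-Python model of the hardware to check readings against. This is only a sketch: the names `to_signed32`, `to_raw32`, and `expected_d` are made up for illustration and are not part of PYNQ, and the model assumes the result is truncated to 32 bits, as the 32-bit registers in `math_op` suggest.

```python
MASK32 = 0xFFFFFFFF  # the memory-mapped registers are 32 bits wide


def to_signed32(raw):
    """Interpret a raw 32-bit register value as a two's complement integer."""
    raw &= MASK32
    return raw - 0x100000000 if raw > 0x7FFFFFFF else raw


def to_raw32(value):
    """Wrap a Python int into the 32-bit pattern a register would hold."""
    return value & MASK32


def expected_d(a, b, c):
    """Software model of math_op: d = 3 + c^2 * (a + b), truncated to 32 bits."""
    return to_signed32(3 + (c * c) * (a + b))


print(to_signed32(0xFFFFFFEF))  # -17
print(expected_d(4, -9, 2))     # -17
```

On the board, `to_signed32(j5.read(0x0C))` would replace the inline `if`, and `expected_d` gives a value to compare the hardware result against.
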
A memory-mapped interface like this provides a much faster, more convenient, and more multiplexable way of sending information down to the PL and then getting it back up to the PS.

This can get even easier if we write custom Python functions (which could eventually be incorporated into libraries or something). Consider the following, where I create a simple "wrapper" function for my hardware operation:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python linenumbers
def mathop(a, b, c):
    j5.write(0x00, a)   # write a to address location 0x00
    j5.write(0x04, b)   # write b to address location 0x04
    j5.write(0x08, c)   # write c to address location 0x08
    d = j5.read(0x0C)   # read back d = 3 + c*c*(a+b)
    if d > 0x7FFFFFFF:  # comes back raw...need to treat as two's complement
        d -= 0x100000000
    return d
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Then in any downstream code in my larger Python I can just do:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python linenumbers
print(mathop(4,-9,2))
print(mathop(12,-50,12))
print(mathop(13,13,13))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

which will poop out:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python
-17
-5469
4397
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ahh, you can smell the layers of abstraction piling up. And they smell good. In the field (or your final project), you could now imagine performing a complex operation quickly in hardware (let's say a Fourier Transform on some audio data, followed by extracting certain peaks and reporting them back) and having all of that wrapped up in a nice little function that anyone could use.

Moving Forward/Wrapping Up
=========================

Most modules which are connected to the system over AXI (and you can have quite a few, since it is designed to be set up like a bus) will appear as `DefaultIP` objects, which are discussed in the PYNQ documentation. Basically what you can do with them is read and write to them as if they were standard external modules with read and write registers. It works really nicely.
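
Because all we ever do with such an object is call `read(offset)` and `write(offset, value)`, a wrapper function can be tried out away from the board against a stand-in object. The sketch below is purely illustrative: `FakeMathIP` is a hypothetical class (not part of PYNQ) that imitates the four-register memory map from this lab, and `mathop` here takes the IP object as an extra argument so that either the real `j5` or the fake can be passed in.

```python
class FakeMathIP:
    """Hypothetical off-board stand-in with the same read/write shape as j5."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]  # slv_reg0..slv_reg3 as raw 32-bit values

    def write(self, offset, value):
        # Registers are 4 bytes apart; store the raw 32-bit pattern.
        self.regs[offset // 4] = value & 0xFFFFFFFF

    def read(self, offset):
        if offset == 0x0C:
            # Emulate the math_op result, truncated to 32 bits.
            a, b, c = self.regs[0], self.regs[1], self.regs[2]
            return (3 + c * c * (a + b)) & 0xFFFFFFFF
        return self.regs[offset // 4]


def mathop(ip, a, b, c):
    """Same wrapper as in the lab, but with the IP object passed in."""
    ip.write(0x00, a)
    ip.write(0x04, b)
    ip.write(0x08, c)
    d = ip.read(0x0C)
    return d - 0x100000000 if d > 0x7FFFFFFF else d


print(mathop(FakeMathIP(), 4, -9, 2))  # -17
```

Passing the IP object in, rather than relying on a global `j5`, is a design choice made only for this sketch; it keeps the wrapper testable without hardware.
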
You can always see what IP has been generated/linked to a given bit/tcl file set by looking at the `ip_dict` attribute of the `Overlay` instance you have for a given project. Running code such as the following (the stuff we generated):...

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python linenumbers
from pynq import Overlay  # import the overlay module

ol = Overlay('./mmio1.bit')  # locate/point to the bit file
print(ol.ip_dict)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

...will yield the following output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python
{'joe5_0': {'addr_range': 65536,
            'driver': pynq.overlay.DefaultIP,
            'fullpath': 'joe5_0',
            'gpio': {},
            'interrupts': {},
            'phys_addr': 1136656384,
            'state': None,
            'type': 'user.org:user:joe5:1.0'}}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here the `phys_addr` of 1136656384 is `0x43C00000` in hex, and the `addr_range` of 65536 means the IP was given a 64 KiB window of the address space. I can then use this information to package up these IPs in convenient, interfaceable ways!

Finishing Up
=======================

Feel free to go back and implement your own more complicated mathematical operation. Either edit/rebuild the IP you wrote, or create one with more interfaces. Dealing with Vivado's IP creator takes a few cycles of attempts to get the pattern down, so repetition can be good here.

What we wrote was a relatively simple module. If you needed to implement something that took lots of clock cycles to do its work, you'd need to do a bit more interfacing with the AXI, depending on how quickly you're asking for data.

*Lab initially inspired by video here*