OpenCores
URL https://opencores.org/ocsvn/core1990_interlaken/core1990_interlaken/trunk

Compare Revisions

  • This comparison shows the changes necessary to convert path
    /core1990_interlaken/trunk/documentation/protocol_survey_report/Sections
    from Rev 10 to Rev 9

Rev 10 → Rev 9

/Hardware.tex
0,0 → 1,154
\section{Proposed hardware implementation}
%\label{sec:hardware}
A lot of information about the functionality of the protocols has been gathered. The assignment also calls for an implementation of a protocol, which is the focus of this section.
 
For the implementation a Xilinx Virtex-7 VC707 Evaluation Board was received. Despite the board being several years old, the hardware is still well suited to the purposes of this assignment. According to the documentation, the VC707 contains 27 accessible transceivers \cite{VC707}. This sounds like a lot of opportunities, but unfortunately only two of these transceivers are really of interest for this purpose. One of the GTX transceivers is connected to the SMA connector on the board, while the second one is connected to the optical SFP/SFP+ (small form-factor pluggable) connector.\\
These so-called GTX transceivers support line rates up to 12.5 Gbps when using the QPLL instead of the CPLL, which is sufficient since 10 Gbps is the target line rate \cite{GTXT}.
 
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{GTXQPLL.png}
\caption{Line rates while using the QPLL with GTX transceivers. \cite{GTXT}}
\label{fig:GTXT}
\end{figure}
 
\subsection{CRC}
One of the first steps in building the protocol is simple CRC encoding and checking of data. The Interlaken protocol, for example, uses CRC-24. The polynomial below comes from the Interlaken specification \cite{InterlakenProtocol}. This CRC is used to check all data that was transmitted in the previous burst, as well as the control word in which the CRC itself is sent. The polynomial can also be written as 0x328B63 in hexadecimal form.
 
\begin{equation*}
X^{24}+X^{21}+X^{20}+X^{17}+X^{15}+X^{11}+X^{9}+X^{8}+X^{6}+X^{5}+X+1
\end{equation*}
 
CRC-32 is another commonly used CRC variant, but it adds several additional redundant bits \cite{CRC32}. The polynomial below is used for CRC-32 in the IEEE 802.3 and PCIe protocols and can also be written as 0x04C11DB7.
 
\begin{displaymath}
X^{32}+X^{26}+X^{23}+X^{22}+X^{16}+X^{12}+X^{11}+X^{10}+X^{8}+X^{7}+X^{5}+X^{4}+X^{2}+X+1
\end{displaymath}
 
Interlaken uses a different CRC-32 polynomial on a per-lane basis. This makes it possible to trace errors to an individual lane when bonding is used. The CRC is calculated over all the words that have been transmitted in a so-called meta frame, before the scrambling and 64b/67b encoding. Framing bits are excluded, but all other data and control words are included in the calculation, including the diagnostic word in which the CRC-32 is placed. The CRC-32 and scrambler state locations are padded with zeros in the calculation. The polynomial Interlaken uses for CRC-32 follows below and can also be written as 0x1EDC6F41 in hexadecimal form.
 
\begin{displaymath}
X^{32}+X^{28}+X^{27}+X^{26}+X^{25}+X^{23}+X^{22}+X^{20}+X^{19}+X^{18}+X^{14}+X^{13}+X^{11}+X^{10}+X^{9}+X^{8}+X^{6}+1
\end{displaymath}
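
The hexadecimal forms quoted above follow directly from the exponents of the polynomials. As a small cross-check, the Python sketch below rebuilds the three constants. The helper name \texttt{poly\_to\_hex} is hypothetical, and the sketch assumes the usual convention in which the leading term ($X^{24}$ or $X^{32}$) is implicit and therefore dropped.

```python
def poly_to_hex(exponents, width):
    """Return the polynomial as an integer, with the leading x^width term dropped."""
    value = 0
    for e in exponents:
        if e != width:  # the x^width term is implicit in the usual notation
            value |= 1 << e
    return value


# Exponents copied from the three polynomials in this subsection.
CRC24_INTERLAKEN = poly_to_hex([24, 21, 20, 17, 15, 11, 9, 8, 6, 5, 1, 0], 24)
CRC32_IEEE = poly_to_hex([32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0], 32)
CRC32_INTERLAKEN = poly_to_hex(
    [32, 28, 27, 26, 25, 23, 22, 20, 19, 18, 14, 13, 11, 10, 9, 8, 6, 0], 32
)

print(f"0x{CRC24_INTERLAKEN:06X} 0x{CRC32_IEEE:08X} 0x{CRC32_INTERLAKEN:08X}")
```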
 
Several sites on the Internet can generate a CRC implementation in VHDL. These files are large, since parallel CRC is preferred over serial CRC in this assignment. If a serial variant is desired, it can be implemented using a for loop. Every clock cycle the bits shift one position and the CRC feedback is applied.
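
The serial variant can be modelled in a few lines of software. The following Python sketch is not the VHDL used in this project; it only illustrates the shift-and-XOR loop, assuming an MSB-first bit order. The function name \texttt{crc\_serial} and the parameters \texttt{init} and \texttt{xor\_out} are assumptions introduced for illustration, so that any preset or output inversion a protocol requires can be passed in.

```python
def crc_serial(data: bytes, width: int, poly: int, init: int = 0, xor_out: int = 0) -> int:
    """Bit-serial, MSB-first CRC: one LFSR shift per input bit."""
    mask = (1 << width) - 1
    reg = init & mask
    for byte in data:
        for i in range(7, -1, -1):  # most significant bit first
            bit = (byte >> i) & 1
            feedback = ((reg >> (width - 1)) & 1) ^ bit
            reg = (reg << 1) & mask  # shift the register one position
            if feedback:
                reg ^= poly  # apply the XOR taps of the polynomial
    return reg ^ xor_out


message = b"Interlaken"
crc24 = crc_serial(message, 24, 0x328B63)
# With a zero preset, appending the CRC to the message gives a zero remainder.
print(hex(crc24), crc_serial(message + crc24.to_bytes(3, "big"), 24, 0x328B63))
```

The inner loop corresponds to one clock cycle of the serial hardware: the register shifts by one bit and the polynomial is XORed in whenever the feedback bit is set.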
 
A much faster way to encode the data is to do so in parallel. Instead of using the for loop, all bits are stored in a register and pass through all the XOR gates at the same time. The output is thus ready within a few clock cycles. Unfortunately this approach is somewhat more complex, and the code can get quite large depending on the width of the data to be encoded and the size of the polynomial.
 
When designing a parallel CRC encoder, the first step is to start with a linear-feedback shift register (LFSR), which is a shift register with a few additional XOR gates. The advantage of these registers is that they require little hardware and can operate at high speeds.\\
A detailed article \cite{CRCpaper} has been written on deriving parallel CRC VHDL code. If the polynomial is written in the form of multiple LFSRs, the output of every register can be determined. From there, two matrices can be drawn up which can easily be combined and translated to code. For small CRCs this method has been checked and reproduced.
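
The idea behind the two matrices can be sketched in software. Because the LFSR update only uses XOR, the next state after a full 64-bit word is the XOR of the contributions of every set state bit and every set data bit. The Python sketch below derives these contributions by running the serial LFSR on single-bit inputs and then compares the parallel result with the serial one. It is a minimal illustration of the principle, not the method of the cited article reproduced line by line; the names \texttt{STATE\_COLS} and \texttt{DATA\_COLS} are hypothetical.

```python
WIDTH, POLY, DATA_BITS = 24, 0x328B63, 64
MASK = (1 << WIDTH) - 1


def serial_step(state, word):
    """Reference model: shift one 64-bit word through the LFSR, MSB first."""
    for i in range(DATA_BITS - 1, -1, -1):
        feedback = ((state >> (WIDTH - 1)) & 1) ^ ((word >> i) & 1)
        state = (state << 1) & MASK
        if feedback:
            state ^= POLY
    return state


# Column j of each matrix is the LFSR response to a single set bit.
STATE_COLS = [serial_step(1 << j, 0) for j in range(WIDTH)]
DATA_COLS = [serial_step(0, 1 << i) for i in range(DATA_BITS)]


def parallel_step(state, word):
    """Next state as the XOR of the columns selected by the set bits."""
    nxt = 0
    for j in range(WIDTH):
        if (state >> j) & 1:
            nxt ^= STATE_COLS[j]
    for i in range(DATA_BITS):
        if (word >> i) & 1:
            nxt ^= DATA_COLS[i]
    return nxt


print(parallel_step(0xFFFFFF, 0x0123456789ABCDEF) == serial_step(0xFFFFFF, 0x0123456789ABCDEF))
```

In hardware every output bit becomes one wide XOR of the state and data bits whose columns have that bit set, which is why the generated VHDL grows quickly with the data width.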
 
In the case of CRC-24 or CRC-32 with a 64-bit data input, the matrices get huge and a generator is required to speed up the process. Fortunately the author of the article has also developed an online generator \cite{CRCgen}. %This resulted in a CRC-24 code that is suitable to be implemented for the protocol.
 
\begin{figure}[H]
\centering
\includegraphics[width=0.9\textwidth]{CRC-24.png}
\caption{Visualization of CRC-24 with LFSR components.}
\label{fig:CRC-24}
\end{figure}
 
Figure~\ref{fig:CRC-24} visualizes the CRC-24. It continuously feeds the data through the loop and generates a 24-bit output. The same applies to CRC-32, but with a different scheme and a 32-bit output.
 
In the end, however, different code was developed which uses a different approach but is easier to use and understand. Instead of designing the parallel CRC by hand, the code has been written using for loops, which are unrolled into parallel logic by the synthesizer built into the FPGA development software.
 
Nevertheless the following sites offer tools to generate a parallel CRC VHDL file:\\
\url{http://www.easics.be/webtools/crctool}\\
\url{http://outputlogic.com/?page_id=321}\\
\url{http://www.sigmatone.com/utilities/crc_generator/crc_generator.htm}
 
\subsection{Scrambler}
The scrambler is a very important part and also makes use of a polynomial. This means the scrambler can also be implemented using LFSR components. The widely accepted polynomial used in combination with a 64-bit input is given below. With the constant term omitted, it can also be written as 0x400008000000000, which has bits 58 and 39 set.
 
\begin{displaymath}
X^{58}+X^{39}+1
\end{displaymath}
 
Although a polynomial is used, the structure in hardware is different because its purpose is to scramble data rather than to encode it. The developer of the earlier mentioned CRC generator also developed a scrambler generator which could prove useful \cite{Scramblergen}.
 
The scrambler can be chosen to be self-synchronous, which offers the advantage that no synchronization is required: the state is determined by the received input data. The disadvantage of a self-synchronous scrambler is that errors in the transmitted data cause even more errors at the receiving side. This occurs because the self-synchronous descrambler uses two feedback taps, so every corrupted bit affects the output again each time it passes a tap, which is undesirable. To prevent this, a synchronous scrambler will be designed.
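
The error multiplication of a self-synchronous scrambler can be demonstrated with a small bit-level model. The Python sketch below is an illustration only and is not part of the implemented design. It assumes tap delays of 39 and 58 bits, matching the bits set in the hexadecimal form 0x400008000000000 quoted above; the function names and the test pattern are hypothetical.

```python
TAPS = (39, 58)  # assumed tap delays, taken from the bits set in 0x400008000000000


def self_sync_scramble(bits, taps=TAPS):
    history = [0] * max(taps)  # previously transmitted bits, newest first
    out = []
    for b in bits:
        s = b ^ history[taps[0] - 1] ^ history[taps[1] - 1]
        out.append(s)
        history = [s] + history[:-1]
    return out


def self_sync_descramble(bits, taps=TAPS):
    history = [0] * max(taps)  # previously received bits, newest first
    out = []
    for r in bits:
        out.append(r ^ history[taps[0] - 1] ^ history[taps[1] - 1])
        history = [r] + history[:-1]
    return out


data = [(i * i + i // 3) % 2 for i in range(200)]  # arbitrary test pattern
line = self_sync_scramble(data)
line[10] ^= 1  # a single bit error on the link
decoded = self_sync_descramble(line)
errors = [i for i, (a, b) in enumerate(zip(decoded, data)) if a != b]
print(errors)
```

The single corrupted bit at position 10 shows up three times after descrambling: once directly and once for each tap, at positions 49 and 68.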
 
One of the difficulties with synchronous scrambling is that the scrambler keeps running continuously, even if the input stops. It uses the state stored in its registers, which is updated every clock cycle. This means the same input can produce very different outputs depending on the previous register contents.
 
In other words, it is absolutely necessary to synchronize the scrambler and descrambler with each other. Otherwise the descrambled data will be completely different and thus corrupt. A solution for this is to start both components with the same starting condition for the registers, also called starting seed. Another solution is to send the current scrambler state to the descrambler which will then be used.
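
The need for a shared starting state can be shown with a simplified model of a synchronous scrambler. The Python sketch below is an assumption-laden illustration: the class name \texttt{SyncScrambler}, the seeds and the bit ordering are hypothetical and do not reproduce the algorithm from the Interlaken specification. The taps follow the bits set in the hexadecimal form 0x400008000000000 quoted above.

```python
MASK58 = (1 << 58) - 1
MASK64 = (1 << 64) - 1


class SyncScrambler:
    """Free-running 58-bit LFSR whose output is XORed onto each 64-bit word."""

    def __init__(self, seed):
        self.state = seed & MASK58

    def _next_bit(self):
        # Assumed taps at register positions 58 and 39 (indices 57 and 38).
        feedback = ((self.state >> 57) ^ (self.state >> 38)) & 1
        self.state = ((self.state << 1) | feedback) & MASK58
        return feedback

    def process(self, word):
        key = 0
        for _ in range(64):  # the state advances regardless of the data
            key = (key << 1) | self._next_bit()
        return (word ^ key) & MASK64


seed = MASK58  # hypothetical seed: all ones
tx, rx, rx_wrong = SyncScrambler(seed), SyncScrambler(seed), SyncScrambler(seed ^ 1)
word = 0x0123456789ABCDEF
scrambled = tx.process(word)
print(rx.process(scrambled) == word, rx_wrong.process(scrambled) == word)
```

Scrambling and descrambling are the same operation here, so a single component can serve both directions as long as both sides hold the same state.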
 
For example, when the Interlaken protocol is used, the synchronization and scrambler state words are transmitted to the other side in preparation for transmitting data. This can happen at the beginning of a transmission or after errors in communication.
 
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{Interlaken_SyncScramWord.png}
\caption{Interlaken synchronization and scrambler state words. \cite{InterlakenProtocol}}
\label{fig:Interlaken_scram}
\end{figure}
 
 
 
 
\subsection{Framing}
The concept of transmitting data is fairly simple, but it is also necessary to send commands and other important information to the receiver. For this reason different kinds of words exist. These are packets containing 64 bits of information that the receiver can read. They can be divided into data words and control words. For data words it is clear what to expect, but control words can be separated into different categories. This increases the protocol's reliability and provides the opportunity to transmit other important information. Interlaken, for example, has data words plus idle/burst and framing layer control words.
The latter contain space for a six-bit so-called block type, which indicates what kind of information the specific word contains. There are four defined values, which correspond to synchronization, scrambler state, skip and diagnostic.
 
As expected, the synchronization word is used to start the communication before data is transmitted. The word is sent multiple times to the receiving side, and after correct synchronization the scrambler state can be sent. These words are sent unscrambled because the descrambler does not yet have the scrambler state of the other side. This is also the reason for sending the scrambler state.
 
Synchronization words are also very useful when bonding is implemented, because the SerDes lanes must be aligned. This can be done by sending the synchronization words across all these lanes simultaneously. The receiver will see these words appear on every lane and can measure the skew between them. Internal logic can then be adjusted to compensate for this skew. These words are sent at a fixed interval so that the data paths can be realigned regularly.
 
Skip words are implemented to enable clock compensation, which is very useful when a repeater is placed between the transmitter and receiver. If the clock frequencies of the two devices do not match, skip words can be added or removed. This has to be done by the repeater, which also has to adjust the structure of the message before forwarding it.
 
Diagnostic words have two functions. Of the 58 available bits, 34 are effectively used; the rest are padded with zeros. Two status bits are used: one indicates the health of the current lane and the other the condition of the entire interface. The remaining 32 bits are reserved for the CRC-32.
 
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{Interlaken_BlockType.png}
\caption{Interlaken framing layer block types. \cite{InterlakenProtocol}}
\label{fig:Interlaken_bt}
\end{figure}
 
Idle/burst control words can also be separated into two different words. The word type is selected by setting a specific control bit in the word.
 
Burst control words are used to indicate the beginning of a data burst. In Interlaken every data burst has to start with a burst control word. The start of packet (SOP) and channel number apply directly to the data that follows. The word is also used to indicate the end of a packet, and the EOP Format field offers the possibility to define the number of valid bytes in the last eight-byte word. The CRC-24 is also included to check the preceding bytes for errors. SOP and channel number can be set in the same control word to prepare for the next burst, so all of this can be included in a single 64-bit word.
 
The idle word is always transmitted when there is no new data available to send. In that case the SOP and channel number fields are invalid and no data bytes follow this word.
 
The control words always contain flow control information, whether the word is an idle or a burst word.
 
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{Interlaken_ControlWord.png}
\caption{Interlaken control word formats. \cite{InterlakenProtocol}}
\label{fig:Interlaken_wf}
\end{figure}
 
The framing words discussed here are sent infrequently and therefore consume a minimal amount of bandwidth, which keeps the framing overhead low. The exact amount of overhead depends largely on the size of the meta frame.
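
As a rough worked example, assume that every meta frame of $L$ words carries exactly one of each of the four framing layer control words, and take $L = 2048$ as an illustrative value. Both are assumptions made here for the sake of the calculation; burst and idle control words are not counted. The framing overhead then becomes:

\begin{displaymath}
\frac{4}{L} = \frac{4}{2048} \approx 0.2\,\%
\end{displaymath}

Halving the meta frame length doubles this overhead, which is why the exact figure depends on the chosen meta frame size.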
 
 
\subsection{64b/67b encoding}
The encoder extends the 64 data bits with three additional bits. Two of these bits indicate whether the word is a data or a control word. The last bit causes an inversion of the complete word when set. As mentioned earlier in subsection \ref{subsec:64b67b}, this additional bit is meant to prevent DC baseline wander. In high-speed communication, timing often does not allow a full voltage swing before the next bit is transmitted. A variable keeps track of the running balance between ones and zeros: a one increases the value and a zero decreases it. The disparity of each new word is compared with this running value, and on that basis it is determined whether or not the data is inverted.
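
The inversion decision can be sketched in Python as follows. This is a simplified model and not the VHDL encoder of this project. It assumes that the framing bits are \texttt{01} for data and \texttt{10} for control, that the inversion bit sits above them, that only the 64 payload bits are inverted, and that the running disparity is updated over all 67 transmitted bits. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def disparity(word, bits):
    """Number of ones minus number of zeros in a word of the given width."""
    ones = bin(word).count("1")
    return ones - (bits - ones)


def encode_64b67b(word, is_control, running):
    """Return the 67-bit word and the updated running disparity."""
    framing = 0b10 if is_control else 0b01  # assumed framing bit values
    invert = 0
    d = disparity(word, 64)
    # Invert when the new word would push the balance further the same way.
    if (d > 0 and running > 0) or (d < 0 and running < 0):
        word = ~word & MASK64
        invert = 1
    encoded = (invert << 66) | (framing << 64) | word
    return encoded, running + disparity(encoded, 67)


running = 0
for payload in (MASK64, MASK64, 0x00000000FFFFFFFF):
    encoded, running = encode_64b67b(payload, False, running)
    print(f"{encoded:017X} inverted={encoded >> 66} running={running}")
```

Sending the all-ones word twice shows the effect: the first word is sent as is, the second is inverted, and the running disparity returns to zero.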
 
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{Interlaken_EncoderPreamble.png}
\caption{The 64b/67b encoding used in the Interlaken protocol. \cite{InterlakenProtocol}}
\label{fig:Interlaken_64b67b}
\end{figure}
 
The two other bits indicate a control or data word. When any other combination is received, the whole word is discarded and an error is reported on all channels. This can only happen when the framing bits of a word have been corrupted.
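
The receiving side can be sketched in the same way. The Python function below checks the framing bits, rejects the two invalid combinations and undoes the inversion. As before, the framing values \texttt{01} (data) and \texttt{10} (control) and the bit positions are assumptions, and the name \texttt{decode\_64b67b} is hypothetical.

```python
MASK64 = (1 << 64) - 1


def decode_64b67b(encoded):
    """Return (payload, is_control); raise ValueError on invalid framing bits."""
    framing = (encoded >> 64) & 0b11
    if framing not in (0b01, 0b10):
        # Combinations 00 and 11 can only result from corrupted framing bits.
        raise ValueError(f"invalid framing bits: {framing:02b}")
    payload = encoded & MASK64
    if (encoded >> 66) & 1:  # inversion bit set: restore the original payload
        payload = ~payload & MASK64
    return payload, framing == 0b10


print(decode_64b67b((0b01 << 64) | 0x1234))
```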
 
\subsection{Flow control}
There are multiple ways of implementing flow control.
Interlaken for example uses out-of-band, in-band and full-packet mode flow control.
The idle/burst control word contains reserved bits to use for flow control data.
 
\newpage
\subsection{Transmission}
 
The Interlaken interface is not designed to transmit arbitrary numbers of bytes. The data is sent to the other side in so-called bursts. These can be sent once or multiple times. The maximum size of a burst is 64 bytes, i.e. eight data words per burst; this is called BurstMax. The minimum size is 32 bytes and is known as BurstShort. Every size in between can also be transmitted, in eight-byte increments since all data words are 64 bits wide. \\
However, if there are 72 bytes to be transmitted, this can be done with a BurstMax burst of eight words followed by a short burst. The problem in this case is that a short burst requires at least 32 bytes, so the last eight bytes are followed by three idle control words to fill it up to 32 bytes. This is very inefficient and wastes a lot of bandwidth.
 
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{InterlakenBurst.png}
\caption{An example of a short burst. \cite{InterlakenProtocol}}
\label{fig:InterlakenBurst}
\end{figure}
 
A solution for this complication is mentioned in the Interlaken documentation. BurstMax and BurstShort are sizes that can be configured when implementing the protocol. Additionally, BurstMin is introduced, which is half the size of BurstMax and greater than or equal to BurstShort. When the payload to be sent is larger than BurstMax but smaller than BurstMax plus BurstShort, too many idle words would again be used. In this case a burst of BurstMax minus BurstMin is sent first. This way it can be guaranteed that the remaining data is enough to fill up BurstShort.
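
The difference between the two scheduling approaches can be made concrete with a short Python sketch. It uses the sizes from this subsection (BurstMax of 64 bytes, BurstMin and BurstShort of 32 bytes) and follows the rule as described above. The function names are hypothetical, and each burst is reported as a pair of payload bytes and padding bytes.

```python
def naive_bursts(payload, burst_max=64, burst_short=32):
    """Cut BurstMax-sized bursts and pad the final burst up to BurstShort."""
    bursts = []
    if payload <= 0:
        return bursts
    while payload > burst_max:
        bursts.append((burst_max, 0))
        payload -= burst_max
    bursts.append((payload, max(0, burst_short - payload)))
    return bursts


def enhanced_bursts(payload, burst_max=64, burst_min=32, burst_short=32):
    """Send BurstMax - BurstMin when the remainder would otherwise be too short."""
    bursts = []
    if payload <= 0:
        return bursts
    while payload > burst_max:
        if payload < burst_max + burst_short:
            size = burst_max - burst_min  # leave enough data for the last burst
        else:
            size = burst_max
        bursts.append((size, 0))
        payload -= size
    bursts.append((payload, max(0, burst_short - payload)))
    return bursts


print(naive_bursts(72))     # (payload bytes, padding bytes) per burst
print(enhanced_bursts(72))
```

For the 72-byte example the naive approach yields a 64-byte burst followed by 8 bytes with 24 bytes of padding, while the enhanced approach yields bursts of 32 and 40 bytes without any padding.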
\newpage
/Progress.tex
0,0 → 1,39
\subsection{Current state of progress}
It is most important to keep track of the current status of development.
This subsection will describe the current progress on the hardware development of the Interlaken Protocol. Two tables will be presented from which Table~\ref{Tab:Transmitter_Status} gives information on the transmitter side and Table~\ref{Tab:Receiver_Status} will provide an overview of the status on the receiver side. This has been done to match the structure in which the hardware will be developed.
%Options to customize the table easily
\taburowcolors[2] 2{tableLineOne .. tableLineTwo}
\tabulinesep = ^2mm_2mm
\everyrow{\tabucline[.3mm white]{}}
\begin{table}[H]
\begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}
\tableHeaderStyle
Function & Simulation & Hardware & Comments \\
Generating bursts& Done & - & Working in simulation as intended (simple packet mode)\\
Meta Framing & Done & - & Working in simulation\\
Scrambling & Done & - & Working in simulation - Amount of input pins may change\\
Encoding & Done & - & Working in simulation\\
CRC generating & Done & - & Working in simulation\\
Flow control & - & - & Not started yet/bits reserved\\
\end{tabu}
\caption{Overview of progress on the transmitter side.}
\label{Tab:Transmitter_Status}
\end{table}
\begin{table}[H]
\begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}
\tableHeaderStyle
Function & Simulation & Hardware & Comments \\
Deframing & Done & - & Working in simulation\\
Descrambler & Done & - & Working as expected in simulation\\
Decoder & Done & - & Working in simulation\\
CRC checking& Done & - & Working in simulation\\
Flow control& - & - & Not started yet/bits read but static\\
\end{tabu}
\caption{Overview of progress on the receiver side.}
\label{Tab:Receiver_Status}
\end{table}
/VHDL.tex
0,0 → 1,62
\section{VHDL}
This section contains multiple subsections which discuss the code written to implement Interlaken on the FPGA. All code is written in VHDL and designed according to the Interlaken Protocol Definition \cite{InterlakenProtocol}. \\
Where borrowed code is used, the original author gave permission for this by publishing the code as open source or by giving personal permission to use it. The original author is mentioned and fully credited. All VHDL files in complete, uninterrupted form can be found in the appendix at the end of this document. \\
Each explanation starts with plain text describing the code, followed by the specific VHDL lines discussed.
 
\subsection{64b/67b Encoder}
The standard libraries are included and several ports are defined in the entity. There is a 64-bit data input and a data control input which defines whether the incoming word is a data word or a control word. There are additional inputs to enable and asynchronously reset the encoder. \\
Two outputs are available. One is the actual encoded data ready for transmission; the other monitors the running balance of transmitted ones and zeros. This was originally designed to check the functionality, but it can also be used for error indication or to find the cause of offsets on the transmission line.
 
\lstinputlisting[numbers=left, firstline=1, lastline=15] {Transmitter/Encoder/Encoder.vhd}
 
The architecture begins with an output process and introduces four variables: one to store the input data, a disparity counter which counts the number of ones in the incoming word, the offset indicator, and a disparity counter which counts the number of ones in the word to be transmitted. This last variable appears to contain the same data as the disparity counter for the incoming word, since they both analyze data\_temp. The difference is that the last one is computed after any inversion has been applied, so it reflects the data that is actually transmitted.
 
\lstinputlisting[numbers=left, firstnumber=17,firstline=17, lastline=25] {Transmitter/Encoder/Encoder.vhd}
 
At the start of the process the asynchronous reset is defined; all actual processing happens on the rising clock edge. The input data is stored in the variable Data\_Temp and the number of ones in the input data is counted.
 
\lstinputlisting[numbers=left, firstnumber=26,firstline=26, lastline=38] {Transmitter/Encoder/Encoder.vhd}
 
Bit 67 is set to zero by default so that the if-statement can be kept short. If the data already transmitted and the current input data both contain more than 32 ones, or both contain fewer than 32 ones (so more zeros), the current data is inverted. This keeps the average number of transmitted ones and zeros balanced.
 
\lstinputlisting[numbers=left, firstnumber=40,firstline=40, lastline=51] {Transmitter/Encoder/Encoder.vhd}
 
The Data\_Control input causes bits 65 and 66 to be swapped, as can be seen. When this input is high a control header is added; otherwise the word is treated as data and receives the usual data header.\\
Another piece of code has been added which delivers the running balance of ones and zeros. All bits in the data\_temp variable are analyzed: when a one is encountered the offset is incremented, and when a zero is encountered the offset is decremented. This makes it very easy to check the functionality of the inversion for debugging and error indication.
 
\lstinputlisting[numbers=left, firstnumber=53,firstline=53, lastline=69] {Transmitter/Encoder/Encoder.vhd}
 
This resets the disparity of the data to be transmitted and counts the number of ones in the word, which is saved for comparison with the disparity of newly incoming data. After this the temporarily stored data is copied to the Data\_Out vector.
 
\lstinputlisting[numbers=left, firstnumber=71, firstline=71, lastline=84] {Transmitter/Encoder/Encoder.vhd}
 
\subsection{Scrambler}
 
The code implementing the scrambler has been written by the author, except for the part implementing the polynomial, which has been copied from Appendix B of the Interlaken Protocol Definition. \\
The scrambler contains a 64-bit data in- and output. Control word input and valid data output indicators are also implemented. A lane number vector input is available because every implemented lane contains its own scrambler and for the sake of preventing possible crosstalk, every lane is reset to another scrambler state.
 
\lstinputlisting[numbers=left, firstnumber=1,firstline=1, lastline=17] {Transmitter/Scrambler/scrambler_interlaken.vhd}
 
The architecture contains two variables for the LFSR to constantly generate the scrambler state. The complete algorithm for this contains many lines and can be found in Appendix \ref{Appendix:Scrambler}.
 
\lstinputlisting[numbers=left, firstnumber=19,firstline=19, lastline=22] {Transmitter/Scrambler/scrambler_interlaken.vhd}
 
The architecture contains a single process called scrambledescramble, because the same algorithm can be used to descramble the scrambled data again. An asynchronous reset has been implemented. The scrambler state is reset to all ones, except for the specific bits that differentiate the starting condition per lane.
 
\lstinputlisting[numbers=left, firstnumber=89,firstline=89, lastline=94] {Transmitter/Scrambler/scrambler_interlaken.vhd}
 
The first thing that happens in the process at the rising edge of the clock is a check of which type of word the incoming data contains. While data words are scrambled immediately, control words are checked for the synchronization and scrambler state words. These two words are not scrambled and are copied to the output right away. All other control words are scrambled.
 
\lstinputlisting[numbers=left, firstnumber=95,firstline=95, lastline=112] {Transmitter/Scrambler/scrambler_interlaken.vhd}
 
\subsection{Burst}
The data to be transmitted has to be packed into bursts. This is done by a separate component which generates the burst control words and keeps the burst sizes between a minimum and maximum size.
 
The CRC-24 that covers the previous data burst also has to be added. Because the CRC also covers the control word that will eventually contain it, pipelining has been used. The CRC calculation takes two clock cycles. By adding two registers in parallel with the CRC calculator, the burst word and the CRC value arrive at the same time. This way the CRC value can be copied into the burst word, which then includes the CRC value.
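
The alignment achieved by the two extra registers can be modelled cycle by cycle. The Python sketch below is a behavioural illustration only: \texttt{crc\_stub} is a hypothetical placeholder for the real CRC-24 calculator, and placing the CRC in the lowest 24 bits of the word is an assumption made for this sketch.

```python
from collections import deque


def crc_stub(word):
    """Placeholder for the CRC-24 calculator; not a real CRC."""
    return (word * 0x328B63) & 0xFFFFFF


def run_pipeline(words, latency=2):
    """Delay each word by the calculator latency, then merge in its CRC."""
    word_delay = deque([None] * latency)  # registers parallel to the calculator
    crc_delay = deque([None] * latency)   # models the latency of the calculator
    out = []
    for w in list(words) + [None] * latency:  # extra cycles flush the pipeline
        word_delay.append(w)
        crc_delay.append(None if w is None else crc_stub(w))
        delayed_word, crc = word_delay.popleft(), crc_delay.popleft()
        if delayed_word is not None:
            # Assumed layout: the CRC occupies the lowest 24 bits of the word.
            out.append((delayed_word & ~0xFFFFFF) | crc)
    return out


words = [0x8000000001000000, 0xC000000002000000, 0x8000000003000000]
for merged in run_pipeline(words):
    print(f"{merged:016X}")
```

Without the delay registers, the CRC of one word would be merged into a word that arrived two cycles later.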
\begin{figure}[ht]
\centering
\includegraphics[width=0.7\textwidth]{Interlaken_CRC.png}
\caption{Hardware flow in the case of CRC calculation}
\end{figure}
 
\newpage