# The binary angle

Implementing operations with integer arithmetic instructions is often, but not always, faster than using the corresponding floating-point instructions. A position for the 'binary point' is chosen for each variable to be represented, and the binary shifts associated with arithmetic operations are adjusted accordingly.

To give an example, a common way to use integer arithmetic to simulate floating point, using 32-bit numbers, is to multiply the coefficients by 65536 (2^16). Using binary scientific notation, this places the binary point at B16. That is to say, the most significant 16 bits represent the integer part and the remaining 16 bits represent the fractional part. Put another way, the B number is the number of integer bits used to represent the number, which defines its value range.
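The B16 layout described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the helper names (`to_fixed`, `integer_part`, `fraction_part`, `to_float`) are hypothetical, and the sketch assumes non-negative values and rounding to nearest on conversion.

```python
FRACTION_BITS = 16            # binary point position assumed for the B16 example
SCALE = 1 << FRACTION_BITS    # 65536

def to_fixed(x: float) -> int:
    """Scale a real value by 2^16 and round to the nearest integer."""
    return round(x * SCALE)

def integer_part(f: int) -> int:
    """Most significant 16 bits: the integer part (for non-negative values)."""
    return f >> FRACTION_BITS

def fraction_part(f: int) -> int:
    """Least significant 16 bits: the fraction, in units of 1/65536."""
    return f & (SCALE - 1)

def to_float(f: int) -> float:
    """Undo the scaling, holding the result as floating point."""
    return f / SCALE

value = to_fixed(2.5)         # 0x28000: integer part 2, fraction 0x8000 (one half)
```

Here 2.5 is stored as 0x28000: the upper half of the word holds 2 and the lower half holds 0x8000, which is one half in units of 1/65536.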

Remaining low bits i. For instance, to represent 1. This gives B16, which, when converted back to a floating-point number (by dividing again by 2^16, but holding the result as floating point), gives 6. The correct floating-point result is 6.

The example above for a B16 multiplication is simplified. Re-scaling depends on both the B scale value and the word size. B16 is often used in 32-bit systems because re-scaling then amounts to multiplying or dividing by 2^16, or equivalently shifting by 16 bits.
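As a rough sketch of a B16 multiplication with re-scaling: the operands below (1.5 and 2.25) are illustrative values chosen here because they are exactly representable, not the values used in the text, and the function names are hypothetical. Python integers do not overflow; in a language with fixed-width integers the intermediate product would need a 64-bit type.

```python
FRACTION_BITS = 16

def to_fixed(x: float) -> int:
    """Scale a real value to B16 (rounded to nearest)."""
    return round(x * (1 << FRACTION_BITS))

def to_float(f: int) -> float:
    """Divide by 2^16 again, holding the result as floating point."""
    return f / (1 << FRACTION_BITS)

def mul_b16(a: int, b: int) -> int:
    """Multiply two B16 values and re-scale the result to B16.

    The raw product carries 32 fractional bits; shifting right by 16
    discards the low 16 of them (truncation, not rounding).
    """
    return (a * b) >> FRACTION_BITS

a = to_fixed(1.5)             # 98304
b = to_fixed(2.25)            # 147456
product = mul_b16(a, b)       # 221184, i.e. 3.375 in B16
```

With operands that are not exactly representable, the discarded low bits show up as a small error in the converted result.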

Consider a 32-bit word size and two variables, one with a B scaling of 2 and the other with a scaling of 4. Note that here the 1. A 32-bit floating-point number has 23 bits to store the fraction in. This is why, for values that stay within the chosen range, B scaling can be more precise than floating point of the same word size. This is especially useful in integrators or repeated summing of small quantities, where rounding error can be a subtle but very dangerous problem when using floating point. The number of bits to store the fraction is 28 bits.

Multiplying the two 32-bit values gives a 64-bit product. This result is in B7 in a 64-bit word. Shifting it down by 32 bits gives the result in B7 in 32 bits.

Because the position of the binary point is entirely conceptual, the logic for adding and subtracting fixed-point numbers is identical to the logic required for adding and subtracting integers.
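The B2 by B4 product described above can be sketched as follows. The sketch assumes a signed word with one sign bit, so a value at scaling B has `word - 1 - B` fraction bits; that convention, the helper names, and the operands (1.5 and 3.0) are assumptions made for illustration.

```python
WORD = 32

def frac_bits(b: int, word: int = WORD) -> int:
    """Fraction bits at scaling B, assuming one sign bit plus B integer bits."""
    return word - 1 - b

def to_b(x: float, b: int) -> int:
    """Scale a real value to the given B scaling in a 32-bit word."""
    return round(x * (1 << frac_bits(b)))

def from_b(v: int, b: int, word: int = WORD) -> float:
    """Convert a scaled integer back to floating point."""
    return v / (1 << frac_bits(b, word))

def mul_b2_b4(a_b2: int, b_b4: int) -> int:
    """Multiply a B2 value by a B4 value, returning B7 in 32 bits."""
    product64 = a_b2 * b_b4   # 29 + 27 = 56 fraction bits: B7 in a 64-bit word
    return product64 >> 32    # 24 fraction bits remain: B7 in a 32-bit word

a = to_b(1.5, 2)              # 0x30000000
b = to_b(3.0, 4)              # 0x18000000
result = mul_b2_b4(a, b)      # 4.5 at B7
```

Under this convention the scalings add up as 2 + 4 integer bits plus the second sign bit of the product, which is why the result lands at B7.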

Thus, when adding one half plus one half in Q3.4 format, we get 0x8 + 0x8 = 0x10, which is equal to one, as we would expect. This applies equally to subtraction.

In other words, when we add or subtract fixed-point numbers, the binary point in the sum or difference will be located in exactly the same place as in the two numbers upon which we are operating.
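A minimal sketch of this, assuming an 8-bit format with four fractional bits (Q3.4) and hypothetical helper names: the addition and subtraction are ordinary integer operations, and only the conversion helpers know where the binary point is.

```python
Q34_FRACTION_BITS = 4         # assumed format: four bits right of the binary point

def to_q34(x: float) -> int:
    """Scale a real value by 2^4 (rounded to nearest)."""
    return round(x * (1 << Q34_FRACTION_BITS))

def from_q34(v: int) -> float:
    """Interpret an integer as a value with four fractional bits."""
    return v / (1 << Q34_FRACTION_BITS)

half = to_q34(0.5)            # 0x08
total = half + half           # plain integer addition: 0x10, still four fractional bits
difference = total - half     # plain integer subtraction: back to 0x08
```

No shift is needed after either operation, because both operands and the result share the same binary point.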

When multiplying two 8-bit fixed-point numbers we will need 16 bits to hold the product. Clearly, since there are a different number of bits in the result as compared to the inputs, the binary point should be expected to move.

However, it works exactly the same way in binary as it does in decimal. When we multiply two numbers in decimal, the decimal point is located N digits to the left of the product's rightmost digit, where N is the sum of the number of digits to the right of the decimal point in the multiplier and the multiplicand.

Thus, in decimal when we multiply 0. The multiplier has one digit to the right of the decimal point, and the multiplicand has two digits to the right of the decimal point. Thus, the product has three digits to the right of the decimal point (which is to say, the decimal point is located three digits to the left).

From the addition example above, we know that the number one half in Q3.4 format is 0x8 in hexadecimal. Since 0x8 times 0x8 in hex is 0x40 (also in hex), the fixed-point result can also be expected to be 0x40, as long as we know where the binary point is located.
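The same bookkeeping can be sketched in code, assuming 8-bit inputs with four fractional bits each. The function names are hypothetical, and the second function, which drops four fractional bits to return to the input format, performs no overflow check.

```python
def mul_full(a: int, b: int) -> int:
    """8-bit by 8-bit product: 16 bits wide, with 4 + 4 = 8 fractional bits."""
    return a * b

def mul_same_format(a: int, b: int) -> int:
    """Product re-scaled to four fractional bits (inputs must be small enough)."""
    return (a * b) >> 4

half = 0x08                           # one half with four fractional bits
full = mul_full(half, half)           # 0x40, read with eight fractional bits
narrow = mul_same_format(half, half)  # 0x04, read with four fractional bits

full_bits = format(full, "016b")      # the product written out in binary
```

Both results denote one quarter: 0x40 divided by 2^8, and 0x04 divided by 2^4.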

Let's write the product out in binary: 0x40 is 0000000001000000 in 16 bits. Since both the multiplier and multiplicand have four bits to the right of the binary point, the binary point in the product is located eight bits to the left of the rightmost bit. Thus, our answer is 00000000.01000000, which is one quarter, as we would expect.

If we want the format of the output to be the same as the format of the input, we must restrict the range of the inputs to prevent overflow. To convert from Q7.8 back to Q3.4, shift the product right by 4 bits.

Many embedded systems that produce sine waves, such as DTMF generators, store a "sine table" in program memory. The table is used for approximating the mathematical sine and cosine functions. Since such systems often have very limited amounts of program memory, fixed-point numbers are often used in two different ways with such tables: for the values stored in the table, and for the angles used to index into it. Typically one quadrant of the sine and cosine functions is stored in the table.
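One way a single stored quadrant can serve the whole circle is sketched below. Every parameter here is an assumption made for illustration: an 8-bit angle with 256 units per turn, 64 table steps per quadrant, values scaled so that 1.0 maps to 32767, and the names `QUARTER_SINE`, `sin_from_table`, and `cos_from_table`. On a real microcontroller the table would be precomputed and stored in program memory rather than built with `math.sin` at start-up.

```python
import math

QUARTER_STEPS = 64            # table steps per quadrant (assumed)
AMPLITUDE = 32767             # assumed scale: 1.0 maps to 32767

# One quadrant of sine, including both endpoints (0 and 90 degrees).
QUARTER_SINE = [
    round(AMPLITUDE * math.sin(i * (math.pi / 2) / QUARTER_STEPS))
    for i in range(QUARTER_STEPS + 1)
]

def sin_from_table(angle: int) -> int:
    """Sine of an 8-bit angle (256 units per turn), scaled by AMPLITUDE."""
    angle &= 0xFF             # wrap to one turn
    quadrant = angle >> 6     # top two bits select the quadrant
    index = angle & 0x3F      # low six bits index within the quadrant
    if quadrant & 1:          # quadrants 1 and 3 read the table backwards
        index = QUARTER_STEPS - index
    value = QUARTER_SINE[index]
    return -value if quadrant & 2 else value   # quadrants 2 and 3 are negative

def cos_from_table(angle: int) -> int:
    """Cosine is sine advanced by a quarter turn (64 units)."""
    return sin_from_table(angle + 64)
```

The mirror and negate steps rely only on the symmetry of the sine function, so the table needs 65 entries instead of 256.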

The values in such tables are usually stored as fixed-point numbers, often bit numbers in unsigned Q0. There seem to be two popular ways to handle the fact that Q0.

A few people draw fairly accurate circles and calculate fairly accurate sine and cosine with a Bézier spline.

Many people prefer to represent rotations (such as angles) in terms of "turns". The integer part of the "turns" tells how many whole revolutions have happened.
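A small sketch of a "turns" value split into its two parts. The 16-bit layout with 8 integer bits and 8 fractional bits is an assumption chosen for illustration, as are the function names; the conversion to degrees is included only to make the fraction readable.

```python
FRACTION_BITS = 8                         # assumed layout: 8 integer, 8 fraction bits
FRACTION_MASK = (1 << FRACTION_BITS) - 1  # 0xFF

def split_turns(turns: int) -> tuple[int, int]:
    """Return (whole revolutions, fraction of a turn in 1/256 units)."""
    whole = turns >> FRACTION_BITS
    fraction = turns & FRACTION_MASK
    return whole, fraction

def fraction_to_degrees(fraction: int) -> float:
    """Express the fractional part of a turn in degrees."""
    return fraction * 360 / (1 << FRACTION_BITS)

whole, fraction = split_turns(0x0240)     # 2.25 turns
```

For 0x0240 this yields two whole revolutions and a fraction of 0x40, which is a quarter turn (90 degrees).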

The main advantage of storing angles as a fixed-point fraction of a turn is speed. Combining some "current position" angle with some positive or negative "incremental angle" to get the "new position" is very fast, even on slow 8-bit microcontrollers: it takes a single integer addition, and any overflow can simply be ignored. Other formats for storing angles require the same addition, plus special cases to handle the edge cases of overflowing 360 degrees or underflowing 0 degrees.
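The contrast can be sketched as follows, assuming a 16-bit angle in which the full word range covers exactly one turn. The names are hypothetical. Python integers are unbounded, so the wraparound is written as an explicit mask; with a fixed-width unsigned 16-bit type the addition alone would wrap.

```python
ANGLE_BITS = 16                       # assumed word size: 65536 units per turn
ANGLE_MASK = (1 << ANGLE_BITS) - 1

def add_angle(position: int, increment: int) -> int:
    """New position as a fraction of a turn; overflow wraps around naturally."""
    return (position + increment) & ANGLE_MASK

def add_degrees(position: int, increment: int) -> int:
    """The same update in whole degrees, with the edge cases spelled out."""
    total = position + increment
    while total >= 360:               # overflowed past 360 degrees
        total -= 360
    while total < 0:                  # underflowed below 0 degrees
        total += 360
    return total

three_quarters = 0xC000               # 270 degrees as a fraction of a turn
half_turn = 0x8000                    # 180 degrees
wrapped = add_angle(three_quarters, half_turn)   # 0x4000, a quarter turn
```

Both functions agree on the result (270 plus 180 degrees is 90 degrees), but only the degree version needs range checks.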

Using a binary angle format in units of "turns" allows us to quickly (using shift-and-mask, avoiding multiply) separate the bits into: