LinearCosine: Benchmarking L-Mul Algorithm

Important Note: This project is an unofficial benchmark implementation focusing on cosine similarity calculations, based on concepts from the research paper "Addition is All You Need for Energy-efficient Language Models" by Hongyin Luo and Wei Sun (2024). It applies their proposed L-Mul algorithm to cosine similarity computations, but is not the original work of the authors and is intended solely for my own exploratory purposes.

Links: GitHub implementation of the benchmark below · Read the original paper (https://arxiv.org/abs/2410.00907)

TLDR: Technical Explanation

The L-Mul algorithm is an efficient approximation of floating-point multiplication that relies primarily on integer addition. Here's a quick breakdown:

Standard Floating-Point Multiplication

The traditional method multiplies two floating-point numbers as follows:

\[\text{Mul}(x,y) = (1 + x_m) \cdot 2^{x_e} \cdot (1 + y_m) \cdot 2^{y_e} = (1 + x_m + y_m + x_m \cdot y_m) \cdot 2^{x_e + y_e}\]

Where \(x_m\) and \(y_m\) are mantissas, and \(x_e\) and \(y_e\) are exponents.
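
To make the decomposition concrete, the following C++ sketch (the struct and function names are illustrative, not taken from the benchmark repository) extracts the sign, unbiased exponent \(x_e\), and fractional mantissa \(x_m\) from an IEEE-754 binary32 value and rebuilds it as \((1 + x_m) \cdot 2^{x_e}\); it handles normal numbers only:

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Pieces used in the formulas above: x = (-1)^sign * (1 + mantissa) * 2^exponent.
    struct Fp32Parts {
        int   sign;      // 0 or 1
        int   exponent;  // unbiased exponent x_e
        float mantissa;  // fractional mantissa x_m in [0, 1)
    };

    // Normal (non-zero, non-subnormal, finite) values only.
    Fp32Parts decompose(float x) {
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);                      // reinterpret the bit pattern
        Fp32Parts p;
        p.sign     = static_cast<int>(bits >> 31);
        p.exponent = static_cast<int>((bits >> 23) & 0xFF) - 127; // remove the exponent bias
        p.mantissa = static_cast<float>(bits & 0x7FFFFF) / static_cast<float>(1u << 23);
        return p;
    }

    int main() {
        Fp32Parts p = decompose(6.5f);   // 6.5 = (1 + 0.625) * 2^2
        std::printf("sign=%d exponent=%d mantissa=%f\n", p.sign, p.exponent, p.mantissa);
        float rebuilt = (p.sign ? -1.0f : 1.0f) * std::ldexp(1.0f + p.mantissa, p.exponent);
        std::printf("rebuilt=%f\n", rebuilt);                     // prints 6.500000
    }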

L-Mul Algorithm

The L-Mul method approximates this multiplication:

\[\mathcal{L}\text{-Mul}(x,y) = (1 + x_m + y_m + 2^{-l(m)}) \cdot 2^{x_e + y_e}\]

Where \(l(m)\) is defined as:

\[l(m) = \begin{cases} m & \text{if } m \leq 3, \\ 3 & \text{if } m = 4, \\ 4 & \text{if } m > 4. \end{cases}\]

Here, \(m\) represents the number of mantissa bits.
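
As a rough illustration, here is a C++ sketch that computes the approximation exactly as written above for positive, normal binary32 inputs. It is a simplified reading of the formula rather than the repository's implementation or the paper's integer-only hardware formulation, and the truncation of the operands to \(m\) mantissa bits is an assumption about how the benchmark varies precision:

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Offset exponent l(m) from the definition above, where m is the number
    // of mantissa bits kept (valid for m in [1, 23]).
    int l_of_m(int m) {
        if (m <= 3) return m;
        if (m == 4) return 3;
        return 4;  // m > 4
    }

    // L-Mul(x, y) = (1 + x_m + y_m + 2^{-l(m)}) * 2^{x_e + y_e}
    // Positive, normal binary32 inputs only; signs, zeros, subnormals, and
    // overflow are deliberately ignored in this sketch.
    float lmul(float x, float y, int m) {
        std::uint32_t xb, yb;
        std::memcpy(&xb, &x, sizeof xb);
        std::memcpy(&yb, &y, sizeof yb);

        int xe = static_cast<int>((xb >> 23) & 0xFF) - 127;  // unbiased exponents
        int ye = static_cast<int>((yb >> 23) & 0xFF) - 127;

        // Keep only the top m mantissa bits -- assumed here to be how the
        // benchmark varies "Mantissa Bits"; the formula itself uses m only via l(m).
        std::uint32_t keep = 0x7FFFFFu & ~((1u << (23 - m)) - 1u);
        float xm = static_cast<float>(xb & keep) / (1u << 23);  // mantissas in [0, 1)
        float ym = static_cast<float>(yb & keep) / (1u << 23);

        float offset = std::ldexp(1.0f, -l_of_m(m));  // 2^{-l(m)} replaces x_m * y_m
        return std::ldexp(1.0f + xm + ym + offset, xe + ye);
    }

    int main() {
        float x = 1.75f, y = 2.5f;
        std::printf("exact = %f\n", x * y);          // 4.375000
        std::printf("l-mul = %f\n", lmul(x, y, 8));  // 4.125000
    }

For example, \(1.75 \times 2.5\) with \(m = 8\) gives \((1 + 0.75 + 0.25 + 2^{-4}) \cdot 2^{1} = 4.125\) against the exact \(4.375\), because the mantissa product \(0.75 \cdot 0.25 = 0.1875\) has been replaced by the constant \(2^{-4} = 0.0625\).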

Key Differences

Instead of computing the mantissa product \(x_m \cdot y_m\), L-Mul replaces it with the constant offset \(2^{-l(m)}\). Dropping that product means the whole operation reduces to additions of mantissas and exponents, which significantly lowers computational complexity and energy consumption while maintaining competitive accuracy, especially at the lower precisions common in many AI tasks.

About This Benchmark

This benchmark explores the Linear-complexity Multiplication (L-Mul) algorithm proposed by Luo and Sun in their 2024 paper. Our goal is to provide a practical implementation of their concepts, focusing on cosine similarity calculations. This project is not affiliated with or endorsed by the original authors.

Experiment Overview

We've implemented the L-Mul algorithm as described by Luo and Sun and compared it against standard multiplication in cosine similarity computations. Our benchmarks cover 1D-to-1D, 1D-to-2D, and 2D-to-2D cosine similarity, each evaluated at mantissa widths from 2 to 23 bits, reporting mean squared error (MSE) and time in nanoseconds for both the L-Mul and standard multiplication paths.
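
As a sketch of what such a comparison involves (the function names, vector size, and single-shot timing below are illustrative assumptions, not the repository's code), cosine similarity can be written over a pluggable element-wise multiply so the exact and approximate paths share the same loop and are timed the same way:

    #include <chrono>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Cosine similarity with a pluggable element-wise multiply, so the exact
    // operator* and an L-Mul-style approximation can be swapped without
    // changing the loop being measured.
    template <typename MulFn>
    double cosine_similarity(const std::vector<float>& a,
                             const std::vector<float>& b,
                             MulFn mul) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            dot += mul(a[i], b[i]);
            na  += mul(a[i], a[i]);
            nb  += mul(b[i], b[i]);
        }
        return dot / (std::sqrt(na) * std::sqrt(nb));
    }

    int main() {
        // Random positive test vectors; the actual benchmark presumably runs
        // many trials and averages the per-call times reported in the tables.
        std::mt19937 rng(42);
        std::uniform_real_distribution<float> dist(0.1f, 1.0f);
        std::vector<float> a(1024), b(1024);
        for (float& v : a) v = dist(rng);
        for (float& v : b) v = dist(rng);

        auto exact_mul  = [](float p, float q) { return p * q; };
        // Stand-in for the approximate path; wrap the lmul() sketch above,
        // e.g. [](float p, float q) { return lmul(p, q, 8); }.
        auto approx_mul = exact_mul;

        auto t0 = std::chrono::steady_clock::now();
        double exact = cosine_similarity(a, b, exact_mul);
        auto t1 = std::chrono::steady_clock::now();
        double approx = cosine_similarity(a, b, approx_mul);
        auto t2 = std::chrono::steady_clock::now();

        double sq_err = (exact - approx) * (exact - approx);  // per-trial squared error
        std::printf("exact=%f approx=%f squared_error=%e\n", exact, approx, sq_err);
        std::printf("exact: %lld ns, approx: %lld ns\n",
                    static_cast<long long>(std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()),
                    static_cast<long long>(std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count()));
    }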

Results

1D to 1D Cosine Similarity

Mantissa Bits | L-Mul MSE | L-Mul Time (ns) | Mul MSE | Mul Time (ns) | Time Reduction (%)
2  | 9.45862e-06 | 507 | 4.07024e-08 | 670 | 24.3284
3  | 1.84347e-05 | 459 | 4.07024e-08 | 597 | 23.1156
4  | 1.84347e-05 | 383 | 4.07024e-08 | 516 | 25.7752
5  | 4.44707e-05 | 477 | 1.93169e-06 | 629 | 24.1653
6  | 1.78036e-05 | 477 | 6.42665e-08 | 625 | 23.68
7  | 2.11236e-05 | 498 | 2.21875e-08 | 654 | 23.8532
8  | 2.0259e-05  | 442 | 1.28071e-09 | 589 | 24.9576
9  | 1.9652e-05  | 460 | 6.87609e-09 | 607 | 24.2175
10 | 1.84467e-05 | 444 | 7.4093e-10  | 588 | 24.4898
11 | 1.82827e-05 | 506 | 6.00017e-10 | 669 | 24.3647
12 | 1.80075e-05 | 433 | 1.11101e-10 | 572 | 24.3007
13 | 1.79965e-05 | 487 | 1.01831e-10 | 643 | 24.2613
14 | 1.79698e-05 | 494 | 2.53293e-11 | 657 | 24.8097
15 | 1.79675e-05 | 460 | 1.13981e-12 | 609 | 24.4663
16 | 1.79675e-05 | 386 | 1.02339e-12 | 520 | 25.7692
17 | 1.79673e-05 | 435 | 3.19744e-14 | 580 | 25.0
18 | 1.79673e-05 | 396 | 3.19744e-14 | 532 | 25.5639
19 | 1.79673e-05 | 402 | 1.42109e-14 | 539 | 25.4174
20 | 1.79673e-05 | 496 | 0           | 651 | 23.8095
21 | 1.79673e-05 | 477 | 0           | 627 | 23.9234
22 | 1.79673e-05 | 431 | 0           | 572 | 24.6503
23 | 1.79673e-05 | 415 | 0           | 555 | 25.2252


1D to 2D Cosine Similarity

Mantissa Bits | L-Mul MSE | L-Mul Time (ns) | Mul MSE | Mul Time (ns) | Time Reduction (%)
2  | 4.67427e-05 | 1676 | 4.55376e-07 | 2017 | 16.9063
3  | 7.48827e-05 | 1677 | 4.55376e-07 | 2016 | 16.8155
4  | 7.48827e-05 | 1710 | 4.55376e-07 | 2091 | 18.2209
5  | 9.02946e-05 | 1773 | 6.61553e-07 | 2165 | 18.1062
6  | 7.6781e-05  | 1794 | 8.51294e-08 | 2200 | 18.4545
7  | 7.59022e-05 | 1748 | 2.00377e-08 | 2134 | 18.0881
8  | 7.49402e-05 | 1760 | 2.28993e-09 | 2148 | 18.0633
9  | 7.32966e-05 | 1730 | 2.9637e-09  | 2116 | 18.242
10 | 7.24353e-05 | 1775 | 5.58836e-10 | 2162 | 17.9001
11 | 7.19966e-05 | 1721 | 2.21817e-10 | 2093 | 17.7735
12 | 7.17846e-05 | 1739 | 4.44593e-11 | 2122 | 18.049
13 | 7.14999e-05 | 1641 | 3.42347e-11 | 2008 | 18.2769
14 | 7.1498e-05  | 1708 | 8.72289e-12 | 2083 | 18.0029
15 | 7.15258e-05 | 1610 | 3.84912e-13 | 1972 | 18.357
16 | 7.15262e-05 | 1661 | 3.46476e-13 | 2035 | 18.3784
17 | 7.15261e-05 | 1663 | 1.53951e-14 | 2034 | 18.2399
18 | 7.15261e-05 | 1646 | 1.53951e-14 | 2013 | 18.2315
19 | 7.15261e-05 | 1598 | 4.73695e-15 | 1963 | 18.594
20 | 7.15256e-05 | 1727 | 1.18424e-15 | 2110 | 18.1517
21 | 7.15261e-05 | 1634 | 5.92119e-15 | 1997 | 18.1773
22 | 7.15256e-05 | 1653 | 4.73695e-15 | 2023 | 18.2897
23 | 7.15256e-05 | 1693 | 0           | 2068 | 18.1335

2D to 2D Cosine Similarity

Mantissa Bits | L-Mul MSE | L-Mul Time (ns) | Mul MSE | Mul Time (ns) | Time Reduction (%)
2  | 2.95e-04 | 6048 | 2.00e-06 | 7742 | 21.88
3  | 4.25e-04 | 6022 | 2.00e-06 | 7723 | 22.03
4  | 4.48e-04 | 6138 | 2.00e-06 | 7931 | 22.61
5  | 5.03e-04 | 6140 | 1.00e-06 | 7934 | 22.61
6  | 4.74e-04 | 6341 | 0.00e+00 | 8162 | 22.31
7  | 4.80e-04 | 6194 | 0.00e+00 | 8003 | 22.60
8  | 4.81e-04 | 5761 | 0.00e+00 | 7482 | 23.00
9  | 4.77e-04 | 5752 | 0.00e+00 | 7447 | 22.76
10 | 4.74e-04 | 6243 | 0.00e+00 | 8059 | 22.53
11 | 4.74e-04 | 6241 | 0.00e+00 | 8075 | 22.71
12 | 4.74e-04 | 5728 | 0.00e+00 | 7447 | 23.08
13 | 4.74e-04 | 5950 | 0.00e+00 | 7720 | 22.93
14 | 4.74e-04 | 6217 | 0.00e+00 | 8029 | 22.57
15 | 4.74e-04 | 5809 | 0.00e+00 | 7541 | 22.97
16 | 4.74e-04 | 6322 | 0.00e+00 | 8163 | 22.55
17 | 4.74e-04 | 5939 | 0.00e+00 | 7681 | 22.68
18 | 4.74e-04 | 5900 | 0.00e+00 | 7649 | 22.87
19 | 4.74e-04 | 5774 | 0.00e+00 | 7491 | 22.92
20 | 4.74e-04 | 5687 | 0.00e+00 | 7397 | 23.12
21 | 4.74e-04 | 5875 | 0.00e+00 | 7628 | 22.98
22 | 4.74e-04 | 6026 | 0.00e+00 | 7815 | 22.89
23 | 4.74e-04 | 5916 | 0.00e+00 | 7645 | 22.62

Key Findings

Across all three settings, L-Mul consistently reduced computation time: roughly 23-26% for 1D-to-1D, 17-19% for 1D-to-2D, and 22-23% for 2D-to-2D cosine similarity. The L-Mul error plateaus once enough mantissa bits are kept (around 1.8e-05 MSE for 1D-to-1D, 7.2e-05 for 1D-to-2D, and 4.7e-04 for 2D-to-2D), whereas the MSE of exact multiplication falls toward zero as precision increases.

Limitations

These figures come from a simplified C++ implementation that emulates L-Mul in software on general-purpose hardware, so they measure a software approximation of the algorithm rather than the hardware-level efficiency gains the paper targets, and they say nothing about energy consumption. The results may also not generalize to larger models or more complex, real-world workloads.

Conclusion

Our benchmark results suggest that the L-Mul algorithm, as proposed by Luo and Sun, shows promise in reducing computation time for cosine similarity calculations across various dimensions. However, it's important to note that these results are based on a simplified C++ implementation and may not fully reflect the algorithm's performance in more complex, real-world scenarios or on specialized hardware.

Acknowledgments

All credit for the L-Mul algorithm and its theoretical foundations goes to Hongyin Luo and Wei Sun, as presented in their 2024 paper. This benchmark is an independent exploration of their concepts and should not be considered an official implementation or extension of their work.

Citation

@misc{luo2024additionneedenergyefficientlanguage,
    title={Addition is All You Need for Energy-efficient Language Models}, 
    author={Hongyin Luo and Wei Sun},
    year={2024},
    eprint={2410.00907},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2410.00907}, 
}

Disclaimer

This project is for educational purposes only. It is not intended for production use and does not claim to accurately represent the full potential or limitations of the L-Mul algorithm as proposed by Luo and Sun. Users are encouraged to refer to the original paper for authoritative information.