Java’s One Billion Row Challenge: Showcasing High-Performance Programming and Community Innovation in 2024

Java’s One Billion Row Challenge: Showcasing High-Performance Programming and Community Innovation in 2024.

On January 1, 2024, Gunnar Morling, a Senior Staff Software Engineer at Decodable, initiated the One Billion Row Challenge (1BRC) for the Java community. This challenge, set to run through the end of January, seeks to identify the fastest Java code capable of processing one billion rows. As of now, the leading algorithms have completed this task in under 2.5 seconds. The competition stipulates that participants can only utilize SDK features available across any Java distribution, thereby excluding external libraries and data stores.

The challenge’s specifics and its underlying motivation. The 1BRC revolves around a seemingly straightforward task: parsing temperature measurements from a text file and calculating the minimum, maximum, and average temperatures for each listed weather station, with the twist being that the file contains one billion entries. Morling’s aim was to foster exploration of high-performance programming, new APIs (like the Vector API which exploits CPU SIMD instructions), and various Java distributions, to demonstrate Java’s enhanced speed capabilities.

Participants can join the challenge by referring to the README file and cloning the repository provided, with Morling emphasizing the importance of learning from both one’s own and others’ implementations.

Morling noted that the solutions submitted often optimized for the specific dataset’s key set, namely weather station names. This approach was effective only for this particular dataset. The solutions showcased the utilization of SIMD, the new Java native memory API, and highly optimized parsing functions, including SWAR (SIMD within a register). The top entries delved deep into native optimization, focusing on CPU instruction counting and branch misprediction analysis.

Eliot Barlas, a Principal Software Engineer at GoTo, partitioned the file in accordance with the number of available processors, assigning separate threads to compute statistics for each weather station, with final results aggregated into a comprehensive statistics table. The data was memory-mapped and accessed via MappedByteBuffer, with weather station names stored as integer sequences using sun.misc.Unsafe.

Roy van Rijn, Director at OpenValue Rotterdam, described his solution as evolutionary, initially employing basic data structures and SDK APIs and gradually incorporating Unsafe for direct memory access. His approach included parallelism, branchless code, and SWAR implementation.

Thomas Wuerthinger, Vice President of Software Development at Oracle and Founder of GraalVM, detailed a solution that involved workload division in line with the target processor’s core count, employing memory mapping for efficient direct memory access, and a branch-avoiding parsing technique.

Regarding potential improvements, Barlas expressed interest in exploring Project Panama’s foreign memory capabilities, particularly the Vector API for weather station name comparisons. Wuerthinger highlighted that further enhancements would largely depend on the target hardware, focusing on memory bandwidth, compute bandwidth, and branch prediction reliance. Van Rijn is exploring the concept of “mechanical sympathy” to optimize instruction execution for the specific testing machine.

In conclusion, Morling praised the Java community’s enthusiasm and participation in the challenge, noting the unexpected breadth of learning and collaboration it fostered. While solutions running on GraalVM dominated, there were notable entries using OpenJDK builds, Amazon Corretto, and Eclipse Temurin. The challenge also saw contributions beyond Java, including Rust, Go, C++, SQL, and Shell. Morling expressed gratitude towards the community and Decodable for providing the evaluation machine, emphasizing the challenge’s role in demonstrating the vitality of Java, its ecosystem, and its community.


The One Billion Row Challenge

Posted at Jan 1, 2024

Update Jan 4: Wow, this thing really took off! 1BRC is discussed at a couple of places on the internet, including Hacker News, lobste.rs, and Reddit.

For folks to show-case non-Java solutions, there is a “Show & Tell” now, check that one out for 1BRC implementations in Rust, Go, C++, and others. Some interesting related write-ups include 1BRC in SQL with DuckDB by Robin Moffatt and 1 billion rows challenge in PostgreSQL and ClickHouse by Francesco Tisiot.

Thanks a lot for all the submissions, this is going way beyond what I’d have expected! I am behind a bit with evalutions due to the sheer amount of entries, I will work through them bit by bit. I have also made a few clarifications to the rules of the challenge; please make sure to read them before submitting any entries.

Let’s kick off 2024 true coder style—​I’m excited to announce the One Billion Row Challenge (1BRC), running from Jan 1 until Jan 31.

Your mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!

The text file has a simple structure with one measurement value per row:

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
...

The program should print out the min, mean, and max values per station, alphabetically ordered like so:

{Abha=5.0/18.0/27.4, Abidjan=15.7/26.0/34.1, Abéché=12.1/29.4/35.6, 
Accra=14.7/26.4/33.1, Addis Ababa=2.1/16.0/24.3, Adelaide=4.1/17.3/29.7, ...}

The goal of the 1BRC challenge is to create the fastest implementation for this task, and while doing so, explore the benefits of modern Java and find out how far you can push this platform. So grab all your (virtual) threads, reach out to the Vector API and SIMD, optimize your GC, leverage AOT compilation, or pull any other trick you can think of.

There’s a few simple rules of engagement for 1BRC (see here for more details):

  • Any submission must be written in Java
  • Any Java distribution available through SDKMan as well as early access builds from openjdk.net may be used, including EA builds for OpenJDK projects like Valhalla
  • No external dependencies may be used

To enter the challenge, clone the 1brc repository from GitHub and follow the instructions in the README file. There is a very basic implementation of the task which you can use as a baseline for comparisons and to make sure that your own implementation emits the correct result. Once you’re satisfied with your work, open a pull request against the upstream repo to submit your implementation to the challenge.

All submissions will be evaluated by running the program on a Hetzner Cloud CCX33 instance (8 dedicated vCPU, 32 GB RAM). The time program is used for measuring execution times, i.e. end-to-end times are measured. Each contender will be run five times in a row. The slowest and the fastest runs are discarded. The mean value of the remaining three runs is the result for that contender and will be added to the leaderboard. If you have any questions or would like to discuss any potential 1BRC optimization techniques, please join the discussion in the GitHub repo.

As for a prize, by entering this challenge, you may learn something new, get to inspire others, and take pride in seeing your name listed in the scoreboard above. Rumor has it that the winner may

receive a unique 1️⃣🐝🏎️ t-shirt, too.

So don’t wait, join this challenge, and find out how fast Java can be—​I’m really curious what the community will come up with for this one. Happy 2024, coder style!

Connected through code, Choose Your Platform!

About the Author: Bernard Aybout

In the land of bytes and bits, a father of three sits, With a heart for tech and coding kits, in IT he never quits. At Magna's door, he took his stance, in Canada's wide expanse, At Karmax Heavy Stamping - Cosma's dance, he gave his career a chance. With a passion deep for teaching code, to the young minds he showed, The path where digital seeds are sowed, in critical thinking mode. But alas, not all was bright and fair, at Magna's lair, oh despair, Harassment, intimidation, a chilling air, made the workplace hard to bear. Management's maze and morale's dip, made our hero's spirit flip, In a demoralizing grip, his well-being began to slip. So he bid adieu to Magna's scene, from the division not so serene, Yet in tech, his interest keen, continues to inspire and convene.