µHash: A Tiny, High-Speed Hashing Algorithm for Embedded Systems

Building a Fast Lookup Service with µHash and Rust

Introduction

A fast lookup service is a foundational component for many systems: caching layers, key-value stores, routing tables, and deduplication services. When performance and low memory footprint matter—embedded devices, edge servers, or high-throughput APIs—selecting the right hashing strategy and implementation language is critical. This article shows how to design and implement a high-performance lookup service using µHash, a compact fast hash function, and Rust, a systems language that provides safety without sacrificing speed.


Why µHash?

µHash is a conceptual family of minimal, high-speed hash functions designed for environments where CPU cycles and memory are constrained. Its goals typically include:

  • Extremely small code and state size
  • Low-latency compute per input byte
  • Acceptable distribution and low collision rate for typical non-adversarial workloads

Use cases where µHash shines:

  • In-memory caches and hash tables on resource-constrained nodes
  • Approximate deduplication for streaming data
  • Fast fingerprinting of short keys (IDs, UUIDs, short strings)
  • Bloom filters and other probabilistic data structures where small, fast hashes are preferable to cryptographic hashes

Why Rust?

Rust combines performance, control, and strong compile-time guarantees:

  • Zero-cost abstractions and predictable performance
  • Ownership model eliminates many classes of memory bugs (useful in long-running services)
  • Portable: compiles to many targets, including embedded platforms
  • Rich ecosystem for async IO and networking (tokio, async-std), and for data structures (hashbrown)

Pairing µHash with Rust gives a compact, safe, and fast lookup service suitable for production use.


Design overview

A lookup service takes keys (strings or binary) and returns values (objects, pointers, or small payloads). Key design aspects:

  1. Hash function: µHash produces a small fixed-size fingerprint (e.g., 64 bits) for keys.
  2. Table structure: open addressing (linear/quadratic probing) or chaining. For speed and cache-efficiency, open addressing with power-of-two table sizes is preferred.
  3. Collision handling: because µHash is not cryptographic, collisions are possible; store full keys or an additional verification hash when correctness matters.
  4. Concurrency: reader-heavy workloads benefit from lock-free or read-optimized designs (concurrent reads with occasional writes).
  5. Persistence and replication: optionally back the in-memory store with a write-ahead log or replication for durability.

Key choices and trade-offs

  • Fingerprint size: 64-bit fingerprints are usually a sweet spot; smaller fingerprints save memory but increase collision probability.
  • Storing full keys: storing the entire key alongside the fingerprint prevents false positives at the cost of memory. For short keys this is inexpensive.
  • Probing strategy: linear probing is simple and cache-friendly, but clustering can degrade performance at high load factors. Quadratic probing reduces clustering but is slightly more complex.
  • Resizing: doubling the table on growth keeps average lookup time O(1) but requires rehashing. Use incremental resizing to avoid pauses.

Implementation outline in Rust

We’ll sketch an implementation focusing on clarity and performance. Key components:

  • µHash implementation (simple, fast 64-bit)
  • Hash table with open addressing and linear probing
  • API: insert(key, value), get(key) -> Option<value>, remove(key)

Note: this article provides a sketch and guidance; production code requires robust testing, benchmarking, and tuning.

µHash: a minimal 64-bit hash

A common pattern for lightweight hashing is a tiny mix of multiplication and XORs with byte-wise processing. Example approach (conceptual):

  • Start with a seed (e.g., a 64-bit constant or derived from a runtime random)
  • For each aligned chunk (u64 or u32), mix with multiply and rotate
  • For remaining bytes, mix in a lighter way
  • Finalize with an avalanche step (xorshift/multiply)

This yields a compact, fast 64-bit fingerprint suitable for non-adversarial use.
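
The sketch below is one minimal realization of that recipe, not a specification of µHash: it processes 8-byte little-endian chunks with a multiply-and-rotate mix, folds in the tail bytes, and finishes with a MurmurHash3-style avalanche. The constants and the seed parameter are illustrative choices.

    fn mu_hash(bytes: &[u8], seed: u64) -> u64 {
        // Golden-ratio multiplier; any odd constant with good bit dispersion works.
        const M: u64 = 0x9E37_79B9_7F4A_7C15;
        let mut h = seed ^ (bytes.len() as u64).wrapping_mul(M);

        // Main loop: mix aligned 8-byte chunks with multiply and rotate.
        let mut chunks = bytes.chunks_exact(8);
        for chunk in &mut chunks {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            let k = u64::from_le_bytes(buf);
            h = (h ^ k.wrapping_mul(M)).rotate_left(27).wrapping_mul(M);
        }

        // Tail: mix the remaining 0..=7 bytes in a lighter way.
        for &b in chunks.remainder() {
            h = (h ^ b as u64).wrapping_mul(M);
        }

        // Avalanche (xorshift/multiply) so every input bit spreads to the output.
        h ^= h >> 33;
        h = h.wrapping_mul(0xFF51_AFD7_ED55_8CCD);
        h ^= h >> 29;
        h = h.wrapping_mul(0xC4CE_B9FE_1A85_EC53);
        h ^ (h >> 32)
    }

Seeding the function at process start (covered under security below) makes collisions harder to precompute.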

Data layout

Use a contiguous vector for buckets. Each bucket contains:

  • fingerprint: u64 (0 reserved for empty)
  • key offset/length or inline key (for small keys)
  • value (or index to values storage)

To simplify, store keys inline up to a small size and otherwise store them in a separate slab.


Example: core structures (conceptual Rust)

    const EMPTY_FINGERPRINT: u64 = 0;

    struct Bucket {
        fp: u64,
        key: Option<String>,     // for clarity; optimize later
        value: Option<Vec<u8>>,
    }

    struct LookupTable {
        buckets: Vec<Bucket>,
        len: usize,
    }

This naive layout prioritizes clarity. For high performance, replace Option fields with manual tagging and use raw arrays or packed structs to minimize padding.
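
As one illustration of that optimization, the 16-byte bucket below packs a key offset, key length, and value index into a single metadata word; the bit layout is an assumption for this sketch, not a fixed format.

    /// 16-byte bucket: fingerprint plus packed metadata.
    /// Assumed layout: bits 0..32 = key offset into a slab,
    /// bits 32..48 = key length, bits 48..64 = index into a values array.
    #[repr(C)]
    #[derive(Clone, Copy)]
    struct CompactBucket {
        fp: u64,   // 0 = empty
        meta: u64,
    }

    impl CompactBucket {
        const EMPTY: CompactBucket = CompactBucket { fp: 0, meta: 0 };

        fn new(fp: u64, key_off: u32, key_len: u16, value_idx: u16) -> Self {
            let meta = key_off as u64
                | ((key_len as u64) << 32)
                | ((value_idx as u64) << 48);
            CompactBucket { fp, meta }
        }

        fn key_off(&self) -> usize { (self.meta & 0xFFFF_FFFF) as usize }
        fn key_len(&self) -> usize { ((self.meta >> 32) & 0xFFFF) as usize }
        fn value_idx(&self) -> usize { (self.meta >> 48) as usize }
    }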


Insert and lookup logic

  • Compute fingerprint = mu_hash(key)
  • Compute initial index = (fingerprint as usize) & (buckets.len() - 1)
  • Probe linearly: if bucket is empty, write fingerprint, key, and value; if fingerprint matches, compare full key to avoid false positive; otherwise continue probing.
  • On removal, use tombstone markers or backwards-shift entries to preserve probe invariants.

Resizing: when load factor exceeds threshold (e.g., 0.7), allocate a new buckets array double the size and reinsert entries.
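
The sketch below fleshes out that logic against the naive Bucket/LookupTable structs from earlier and the mu_hash sketch above. A fixed SEED constant stands in for the runtime-random seed discussed later; treat it as an assumption of the sketch.

    // Illustrative fixed seed; in production derive it from OS randomness at
    // startup (see the security notes below).
    const SEED: u64 = 0x5BD1_E995_2D35_8979;

    fn fingerprint(key: &str) -> u64 {
        let h = mu_hash(key.as_bytes(), SEED);
        if h == 0 { 1 } else { h } // 0 is reserved for "empty bucket"
    }

    impl LookupTable {
        fn with_capacity(cap: usize) -> Self {
            let cap = cap.next_power_of_two().max(8);
            let buckets = (0..cap)
                .map(|_| Bucket { fp: EMPTY_FINGERPRINT, key: None, value: None })
                .collect();
            LookupTable { buckets, len: 0 }
        }

        fn get(&self, key: &str) -> Option<&[u8]> {
            let fp = fingerprint(key);
            let mask = self.buckets.len() - 1;
            let mut i = (fp as usize) & mask;
            loop {
                let b = &self.buckets[i];
                if b.fp == EMPTY_FINGERPRINT {
                    return None; // an empty slot ends the probe: key absent
                }
                // A fingerprint match is only a candidate; verify the full key.
                if b.fp == fp && b.key.as_deref() == Some(key) {
                    return b.value.as_deref();
                }
                i = (i + 1) & mask; // linear probe
            }
        }

        fn insert(&mut self, key: String, value: Vec<u8>) {
            // Grow before exceeding a 0.7 load factor, so probes always terminate.
            if (self.len + 1) * 10 > self.buckets.len() * 7 {
                self.grow();
            }
            let fp = fingerprint(&key);
            let mask = self.buckets.len() - 1;
            let mut i = (fp as usize) & mask;
            loop {
                let b = &mut self.buckets[i];
                if b.fp == EMPTY_FINGERPRINT {
                    *b = Bucket { fp, key: Some(key), value: Some(value) };
                    self.len += 1;
                    return;
                }
                if b.fp == fp && b.key.as_deref() == Some(key.as_str()) {
                    b.value = Some(value); // same key: overwrite in place
                    return;
                }
                i = (i + 1) & mask;
            }
        }

        fn grow(&mut self) {
            // Double the table and reinsert; incremental resizing would avoid
            // the pause but is omitted here for brevity.
            let new_cap = self.buckets.len() * 2;
            let old = std::mem::replace(self, LookupTable::with_capacity(new_cap));
            for b in old.buckets {
                if let (Some(k), Some(v)) = (b.key, b.value) {
                    self.insert(k, v);
                }
            }
        }
    }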


Concurrency model

For a high-throughput service:

  • Use sharding: partition the keyspace across N shards, each an independent LookupTable. A simple per-shard lock (mutex or RwLock) then suffices, and requests for different shards never contend. Select the shard from the top bits of the fingerprint (a sketch follows this list).
  • For lock-free reads with occasional writes: use atomic pointers and copy-on-write per shard. Reads follow an Arc pointer to the current table; writes clone the table and swap the pointer. This is read-optimized but increases write cost and memory churn.
  • For full concurrency: consider using concurrent hash maps (dashmap, chashmap) or design a lock-free open-addressing map—complex and error-prone.
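
A minimal sketch of that sharded layout, assuming one std::sync::RwLock per shard and the LookupTable sketched earlier; the Store and select_shard names match the conceptual handler shown later.

    use std::sync::RwLock;

    /// Sharded store: the top bits of the fingerprint pick a shard, so keys
    /// spread evenly and contention stays local to one shard's lock.
    struct Store {
        shards: Vec<RwLock<LookupTable>>,
        shard_bits: u32, // assumed >= 1
    }

    impl Store {
        fn new(shard_bits: u32, per_shard_capacity: usize) -> Self {
            let shards = (0..(1usize << shard_bits))
                .map(|_| RwLock::new(LookupTable::with_capacity(per_shard_capacity)))
                .collect();
            Store { shards, shard_bits }
        }

        fn select_shard(&self, fp: u64) -> &RwLock<LookupTable> {
            // Top bits choose the shard; the low bits still index buckets
            // inside the chosen table, so the two choices stay independent.
            let idx = (fp >> (64 - self.shard_bits)) as usize;
            &self.shards[idx]
        }

        fn get(&self, key: &str) -> Option<Vec<u8>> {
            let fp = fingerprint(key);
            let guard = self.select_shard(fp).read().unwrap();
            guard.get(key).map(|v| v.to_vec()) // copy out so the lock is held briefly
        }

        fn insert(&self, key: String, value: Vec<u8>) {
            let fp = fingerprint(&key);
            self.select_shard(fp).write().unwrap().insert(key, value);
        }
    }

With shard_bits = 4 this gives 16 shards; a copy-on-write variant would replace the RwLock with an Arc that writers clone and swap.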

Networking and service layer

Use async Rust (tokio) for the service front end:

  • Protocols: simple binary TCP, HTTP (hyper/warp), or gRPC (tonic). Binary TCP is lowest-overhead.
  • Request handling: parse key, compute fingerprint, route to shard, perform lookup, return value or miss.
  • Batch requests: support pipelining or batched lookups to exploit locality and reduce per-request overhead.

Example tokio handler (conceptual):

    async fn handle_request(req: LookupRequest, store: Arc<Store>) -> LookupResponse {
        let fp = mu_hash(&req.key);
        let shard = store.select_shard(fp);
        shard.get(&req.key).await.into()
    }

Benchmarks and tuning

Important metrics:

  • Lookup latency (P50, P95, P99)
  • Throughput (ops/sec)
  • Memory footprint per entry
  • CPU cycles per lookup

Tuning knobs:

  • Table size and load factor threshold
  • Fingerprint size
  • Shard count (tradeoff between lock contention and memory usage)
  • Probing strategy and cache-line alignment

Benchmark suggestions:

  • Use criterion for in-process microbenchmarks (a sketch follows this list), hyperfine for end-to-end command timing, and wrk/hey for network-level tests.
  • Test with realistic key distributions (Zipfian, uniform) and key sizes.
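
A criterion microbenchmark for the hash function might look like the following (assumes criterion as a dev-dependency and the mu_hash sketch from earlier; key sizes and the seed are arbitrary).

    use criterion::{black_box, criterion_group, criterion_main, Criterion};

    fn bench_mu_hash(c: &mut Criterion) {
        let short = b"user:12345678901";       // 16-byte key
        let long = vec![0xABu8; 1024];          // 1 KiB key
        let seed = 0x1234_5678_9ABC_DEF0u64;

        c.bench_function("mu_hash/16B", |b| {
            b.iter(|| mu_hash(black_box(short), black_box(seed)))
        });
        c.bench_function("mu_hash/1KiB", |b| {
            b.iter(|| mu_hash(black_box(&long), black_box(seed)))
        });
    }

    criterion_group!(benches, bench_mu_hash);
    criterion_main!(benches);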

Handling collisions and adversarial inputs

Because µHash is non-cryptographic, an attacker who can choose keys could engineer collisions. Mitigations:

  • Use a randomly chosen runtime seed for µHash to prevent precomputed collisions (a sketch follows this list).
  • Store full keys and verify equality on fingerprint match. This keeps correctness even if collisions occur.
  • Rate-limit or authenticate clients to reduce exposure to malicious workloads.
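
For the runtime seed, one dependency-free sketch is to borrow the OS-seeded randomness behind the standard library's RandomState; a dedicated randomness crate such as getrandom works just as well.

    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hasher};

    /// Per-process 64-bit seed with no extra dependencies: RandomState is keyed
    /// from OS randomness, so finishing an empty hasher yields a value that
    /// differs from run to run.
    fn runtime_seed() -> u64 {
        RandomState::new().build_hasher().finish()
    }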

Persistence and durability

For durability, combine the in-memory table with:

  • Write-ahead log (append-only): append key-value updates to disk before applying them in memory; on restart, replay the log.
  • Snapshotting: periodically dump table to disk and truncate WAL.
  • Replication: replicate updates to one or more peers synchronously or asynchronously.

Design for eventual consistency vs strong consistency depending on application needs.
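
As a sketch of the write-ahead-log idea, the hypothetical WalWriter below appends length-prefixed records and syncs before the in-memory apply; batching syncs and adding checksums are left out for brevity.

    use std::fs::{File, OpenOptions};
    use std::io::{self, Write};

    /// Hypothetical append-only WAL writer: length-prefixed key/value records,
    /// synced before the corresponding in-memory update is applied.
    struct WalWriter {
        file: File,
    }

    impl WalWriter {
        fn open(path: &str) -> io::Result<Self> {
            let file = OpenOptions::new().create(true).append(true).open(path)?;
            Ok(WalWriter { file })
        }

        fn append(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> {
            self.file.write_all(&(key.len() as u32).to_le_bytes())?;
            self.file.write_all(&(value.len() as u32).to_le_bytes())?;
            self.file.write_all(key)?;
            self.file.write_all(value)?;
            self.file.sync_data()?; // durability point; batch syncs in production
            Ok(())
        }
    }

On restart, a matching reader walks the log and replays each record into the table before serving traffic.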


Example real-world optimizations

  • Compact bucket struct to 16 bytes (u64 fingerprint + u64 metadata) and store keys in separate slab to reduce cache miss footprint.
  • Use SIMD or unrolled loops in mu_hash inner loop for long keys.
  • Align buckets to cache lines and place frequently-read fields (fingerprint) first to reduce cache pollution.
  • Use lock striping or per-cpu shards to avoid cross-core contention.

Testing and correctness

  • Unit tests for insert/get/remove with many edge cases (collisions, wraparound).
  • Fuzz tests on hash function and table operations.
  • Property-based tests for invariants like probe-sequence correctness and eventual findability of inserted keys (example after this list).
  • Benchmark tests to detect regressions.
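
For example, the findability invariant could be expressed with the proptest crate against the table sketched earlier (a sketch that assumes that with_capacity/insert/get API).

    use proptest::prelude::*;

    proptest! {
        // Invariant: every inserted key stays findable (last write wins).
        #[test]
        fn inserted_keys_are_findable(
            entries in proptest::collection::vec(("[a-z]{1,16}", any::<u8>()), 1..64)
        ) {
            let mut table = LookupTable::with_capacity(16);
            for (key, byte) in &entries {
                table.insert(key.clone(), vec![*byte]);
            }
            for (key, _) in &entries {
                prop_assert!(table.get(key).is_some());
            }
        }
    }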

Security considerations

  • Treat µHash as a non-cryptographic fingerprint; do not use it for authentication, signatures, or integrity checks.
  • Use runtime seeds to avoid predictable collisions.
  • Validate input sizes and rates to avoid denial-of-service via huge keys or high-frequency writes.

Conclusion

Combining a compact high-speed fingerprint like µHash with Rust’s performance and safety produces a powerful foundation for a fast lookup service. By carefully designing the hash table layout, choosing a probing strategy, and adding practical defenses (full-key verification, runtime seeds), you can achieve low-latency, memory-efficient lookups suitable for embedded or high-throughput environments. Start with a clear API and small, well-tested core; then measure and iterate—optimizing hotspots and tuning shard counts, probe strategies, and memory layouts based on real workloads.
