Fuzzy password authenticated key exchange (fuzzy PAKE) protocols enable two parties to securely exchange a session-key for further communication. The parties only need to share a low entropy password. The passwords do not even need to be identical, but can contain some errors. This may be due to typos, or because the passwords were created from noisy biometric readings. In this paper we provide an overview and comparison of existing fuzzy PAKE protocols. Furthermore, we analyze certain security properties of these protocols and argue that the protocols can be expected to be slightly more secure in practice than can be inferred from their theoretical guarantees.
Despite much progress, general-purpose secure multi-party computation (MPC) with active security may still be prohibitively expensive in settings with large input datasets. This particularly applies to the secure evaluation of graph algorithms, where each party holds a subset of a large graph. Recently, Araki et al. (ACM CCS '21) showed that dedicated solutions may provide significantly better efficiency if the input graph is sparse. In particular, they provide an efficient protocol for the secure evaluation of "message passing" algorithms, such as the PageRank algorithm. Their protocol's computation and communication complexity are both $\tilde{O}(M\cdot B)$ instead of the $O(M^2)$ complexity achieved by general-purpose MPC protocols, where $M$ denotes the number of nodes and $B$ the (average) number of incoming edges per node. On the downside, their approach achieves only a relatively weak security notion; $1$-out-of-$3$ malicious security with selective abort. In this work, we show that PageRank can instead be captured efficiently as a restricted multiplication straight-line (RMS) program, and present a new actively secure MPC protocol tailored to handle RMS programs. In particular, we show that the local knowledge of the participants can be leveraged towards the first maliciously-secure protocol with communication complexity linear in $M$, independently of the sparsity of the graph. We present two variants of our protocol. In our communication-optimized protocol, going from semi-honest to malicious security only introduces a small communication overhead, but results in quadratic computation complexity $O(M^2)$. In our balanced protocol, we still achieve a linear communication complexity $O(M)$, although with worse constants, but a significantly better computational complexity scaling with $O(M\cdot B)$. Additionally, our protocols achieve security with identifiable abort and can tolerate up to $n-1$ corruptions.
Private set intersection (PSI) is a cryptographic functionality for two parties to learn the intersection of their input sets, without leaking any other information. Circuit-PSI is a stronger PSI functionality where the parties learn only a secret-shared form of the desired intersection, thus without revealing the intersection directly. These secret shares can subsequently serve as input to a secure multiparty computation of any function on this intersection. In this paper we consider several settings in which parties take part in multiple Circuit-PSI executions with the same input set, and aim to amortize communications and computations. To that end, we build up a new framework for Circuit-PSI around generalizations of oblivious (programmable) PRFs that are extended with offline setup phases. We present several efficient instantiations of this framework with new security proofs for this setting. As a side result, we obtain a slight improvement in communication and computation complexity over the state-of-the art Circuit-PSI protocol by Bienstock et al. (USENIX '23). Additionally, we present a novel Circuit-PSI protocol from a PRF with secret-shared outputs, which has linear communication and computation complexity in the parties' input set sizes, and incidentally, it realizes ``almost malicious'' security, making it the first major step in this direction since the protocol by Huang et al. (NDSS '12). Lastly, we derive the potential amortizations over multiple protocol executions, and observe that each of the presented instantiations is favorable in at least one of the multiple-execution settings.
Third-party private set intersection (PSI) enables two parties, each holding a private set to compute their intersection and reveal the result only to an inputless third party. In this paper, we present efficient third-party PSI protocols, which significantly lower the computational workload compared to prior work. Our work is motivated by real-world applications such as contact tracing whereby expedition is essential while concurrently preserving privacy. Our construction attains a near-linear computational complexity of $O(n^{1+\varepsilon})$ for large dataset size $n$, where $\varepsilon>0$ is any fixed constant, and achieves post-quantum security. For a quantum-safe third-party PSI protocol, this significantly improves upon the current known best of $O(n^{2.5+o(1)})$. Our improvements stem from algorithmic changes and the incorporation of new techniques along with precise parameter selections to achieve a tight asymptotic bound.
The substitution box (S-box) is often used as the only nonlinear component in symmetric-key ciphers, leading to a significant impact on the implementation performance of ciphers in both classical and quantum application scenarios by S-box circuits. Taking the Pauli-X gate, the CNOT gate, and the Toffoli gate (i.e., the NCT gate set) as the underlying logic gates, this work investigates the quantum circuit implementation of S-boxes based on the SAT solver. Firstly, we propose encoding methods of the logic gates and the NCT-based circuit, based on which we construct STP models for implementing S-boxes. By applying the proposed models to the S-boxes of several well-known cryptographic algorithms, we construct optimal implementations with different criteria for the 4-bit S-boxes and provide the implementation bounds of different criteria for the 5-bit S-boxes. Since S-boxes in the same affine equivalence class share most of the important properties, we then build STP models to further investigate optimizing S-box circuits based on affine equivalence. According to the applications, for almost all the tested 4-bit S-boxes, there always exists an equivalent S-box that can be implemented with half the number of logic gates. Besides, we encode some important cryptographic properties and construct STP models to design S-boxes with given criteria configurations on implementation and properties. As an application, we find an S-box with the same cryptographic properties as the S-box of KECCAK that can be implemented with only 5 NCT gates, even though the application of our models indicates that implementing the KECCAK S-box requires more than 9 NCT gates. Notably, the inputs of the proposed models are tweakable, which makes the models possess some functions not currently available in the public tools for constructing optimized NCT-based circuits for S-boxes.
Let $\star: G \times X \rightarrow X$ be the action of a group $G$ of size $N=|G|$ on a set $X$. Let $y = g \star x \in X$ be a group action dlog instance, where our goal is to compute the unknown group element $g \in G$ from the known set elements $x,y \in X$. The Galbraith-Hess-Smart (GHS) collision finding algorithm solves the group action dlog in $N^{\frac 1 2}$ steps with polynomial memory. We show that group action dlogs are suitable for precomputation attacks. More precisely, for any $s,t$ our precomputation algorithm computes within $st$ steps a hint of space complexity $s$, which allows to solve any group action dlog in an online phase within $t$ steps. A typical instantiation is $s=t=N^{\frac 1 3}$, which gives precomputation time $N^{\frac 2 3}$ and space $N^{\frac 1 3}$, and online time only $N^{\frac 1 3}$. Moreover, we show that solving multiple group action dlog instances $y_1, \ldots , y_m$ allows for speedups. Namely, our collision finding algorithm solves $m$ group action dlogs in $\sqrt{m}N^{\frac 1 2}$ steps, instead of the straight-forward $mN^{\frac 1 2}$ steps required for running $m$ times GHS. Interestingly, our multi instance algorithm (with precomputation) can be seen as a special case of our precomputation algorithm. Our multiple instance approach can be freely combined with our precomputations, allowing for a variety of tradeoffs. Technically, our precomputation and multiple instance group action dlog attacks are adaptations of the techniques from the standard dlog setting in abelian groups. While such an adaptation seems natural, it is per se unclear which techniques transfer from the dlog to the more restricted group dlog setting, for which $X$ does not offer a group structure. Our algorithms have direct implications for all group action based cryptosystems, such as CSIDH and its variants. We provide experimental evidence that our techniques work well in the CSIDH setting.
In this short note we review the technique proposed at ToSC 2018 by Sadeghi et al. for attacks built upon several related-tweakey impossible differential trails. We show that the initial encryption queries are improper and lead the authors to misevaluating a filtering value in the key recovery phase. We identified 4 papers (from Eurocrypt, DCC, ToSC and ePrint) that follow on the results of Sadeghi et al., and in three of them the issue was propagated. We thus present a careful analysis of these types of attacks and give generic complexity formulas similar to the ones proposed by Boura et al. at Asiacrypt 2014. We apply these to the aforementioned papers and provide patched versions of their attacks. The main consequence is an increase in the memory complexity. We show that in many cases (a notable exception being quantum impossible differentials) it is possible to recover the numeric estimates of the flawed analysis, and in all cases we were able to build a correct attack reaching the same number of rounds.
In this work-in-progress, we present a series of protocols to efficiently prove statements about strings in context-free grammars (CFGs). Our main protocol for proving proofs of correct parsing for strings in a CFG flexibly accommodates different instantiations of zero-knowledge proof systems as well as accumulation schemes. While improvements in the modular cryptographic primitives can be carried over for improvements in our protocols, even simpler proof systems, which do not support state-of-the-art techniques such as permutation checks can generate proofs of correct parsing of a string of size $n$ by proving the correctness of a circuit of size $\mathcal{O}(cn)$, where $c$ is the cost of verifying a set membership proof in the chosen accumulation scheme.
Adaptor signatures can be viewed as a generalized form of the standard digital signature schemes where a secret randomness is hidden within a signature. Adaptor signatures are a recent cryptographic primitive and are becoming an important tool for blockchain applications such as cryptocurrencies to reduce on-chain costs, improve fungibility, and contribute to off-chain forms of payment in payment-channel networks, payment-channel hubs, and atomic swaps. However, currently used adaptor signature constructions are vulnerable to quantum adversaries due to Shor’s algorithm. In this work, we introduce SQIAsignHD, a new quantum-resistant adaptor signature scheme based on isogenies of supersingular elliptic curves, using SQIsignHD - as the underlying signature scheme - and exploiting the idea of the artificial orientation on the supersingular isogeny Diffie-Hellman key exchange protocol, SIDH, as the underlying hard relation. We, furthermore, show that our scheme is secure in the Quantum Random Oracle Model (QROM).
Running machine learning algorithms on encrypted data is a way forward to marry functionality needs common in industry with the important concerns for privacy when working with potentially sensitive data. While there is already a growing field on this topic and a variety of protocols, mostly employing fully homomorphic encryption or performing secure multiparty computation (MPC), we are the first to propose a protocol that makes use of a specialized encryption scheme that allows to do secure comparisons on ciphertexts and update these ciphertexts to be encryptions of the same plaintexts but under a new key. We call this notion Updatable Order-Revealing Encryption (uORE) and provide a secure construction using a key-homomorphic pseudorandom function. In a second step, we use this scheme to construct an efficient three-round protocol between two parties to compute a decision tree (or forest) on labeled data provided by both parties. The protocol is in the passively-secure setting and has some leakage on the data that arises from the comparison function on the ciphertexts. We motivate how our protocol can be compiled into an actively-secure protocol with less leakage using secure enclaves, in a graceful degradation manner, e.g. falling back to the uORE leakage, if the enclave becomes fully transparent. We also analyze the leakage of this approach, giving an upper bound on the leaked information. Analyzing the performance of our protocol shows that this approach allows us to be much more efficient (especially w.r.t. the number of rounds) than current MPC-based approaches, hence allowing for an interesting trade-off between security and performance.
Fully Homomorphic Encryption (FHE) is a powerful tool that brings privacy and security to all sorts of applications by allowing us to perform additions and multiplications directly on ciphertexts without the need of the secret key. Some applications of FHE that were previously overlooked but have recently been gaining traction are data compression and image processing. Practically, FHE enables applications such as private satellite searching, private object recognition, or even encrypted video editing. We propose a practical FHE-friendly image compression and processing pipeline where an image can be compressed and encrypted on the client-side, sent to a server which decompresses it homomorphically and then performs image processing in the encrypted domain before returning the encrypted result to the client. Inspired by JPEG, our pipeline also relies on discrete cosine transforms and quantization to simplify the representation of an image in the frequency domain, making it possible to effectively use a compression algorithm. This pipeline is designed to be compatible with existing image-processing techniques in FHE, such as pixel-wise processing and convolutional filters. Using this technique, a high-definition ($1024\times1024$) image can be homomorphically decompressed, processed with a convolutional filter and re-compressed in under $24.7$s, while using ~8GB memory.
Side-channel analysis is an important part of the security evaluations of hardware components and more specifically of those that include cryptographic algorithms. Profiling attacks are among the most powerful attacks as they assume the attacker has access to a clone device of the one under attack. Using the clone device allows the attacker to make a profile of physical leakages linked to the execution of algorithms. This work focuses on the characteristics of this profile and the information that can be extracted from its application to the attack traces. More specifically, looking at unsuccessful attacks, it shows that by carefully ordering the attack traces used and limiting their number, better results can be achieved with the same profile. Using this method allows us to consider the classical attack method, i.e. where the traces are randomly ordered, as the worst case scenario. The best case scenario is when the attacker is able to successfully order and select the best attack traces. A method for identifying efficient ordering when using deep learning models as profiles is also provided. A new loss function "Scoring loss" is dedicated to training machine learning models that give a score to the attack prediction and the score can be used to order the predictions.
Hash chain based password systems are a useful way to guarantee authentication with one-time passwords. The core idea is specified in RFC 1760 as S/Key. At CCS 2017, Kogan et al. introduced T/Key, an improved password system where one-time passwords are only valid for a limited time period. They proved security of their construction in the random oracle model under a basic modeling of the adversary. In this work, we make various advances in the analysis and instantiation of hash chain based password systems. Firstly, we describe a slight generalization called U/Key that allows for more flexibility in the instantiation and analysis, and we develop a security model that refines the adversarial strength into offline and online complexity, that can be used beyond the random oracle model, and that allows to argue multi-user security directly. Secondly, we derive a new security proof of U/Key in the random oracle model, as well as dedicated and tighter security proofs of U/Key instantiated with a sponge construction and a truncated permutation. When applied to T/Key, these results improve significantly over the earlier results: whereas the originally suggested instantiation using SHA-256 achieved 128 bits of security using a hash function with a state size of 384 bits, with a truncated permutation construction one can achieve 128 bits of security already with a state size of 256 bits.
Analyzing user data while protecting the privacy of individuals remains a big challenge. Trusted execution environments (TEEs) are a possible solution as they protect processes and Virtual Machines (VMs) against malicious hosts. However, TEEs can leak access patterns to code and to the data being processed. Furthermore, when data is stored in a TEE database, the data volume required to answer a query is another unwanted side channel that contains sensitive information. Both types of information leaks, access patterns and volume patterns, allow for database reconstruction attacks. In this paper, we present Menhir, an oblivious TEE database that hides access patterns with ORAM guarantees and volume patterns through differential privacy. The database allows range and point queries with SQL-like WHERE-clauses. It builds on the state-of-the-art oblivious AVL tree construction Oblix (S&P'18), which by itself does not protect against volume leakage. We show how volume leakage can be exploited in range queries and improve the construction to mitigate this type of attack. We prove the correctness and obliviousness of Menhir. Our evaluation shows that our approach is feasible and scales well with the number of rows and columns in the database.
We show a polynomial time quantum algorithm for solving the learning with errors problem (LWE) with certain polynomial modulus-noise ratios. Combining with the reductions from lattice problems to LWE shown by Regev [J.ACM 2009], we obtain polynomial time quantum algorithms for solving the decisional shortest vector problem (GapSVP) and the shortest independent vector problem (SIVP) for all $n$-dimensional lattices within approximation factors of $\tilde{\Omega}(n^{4.5})$. Previously, no polynomial or even subexponential time quantum algorithms were known for solving GapSVP or SIVP for all lattices within any polynomial approximation factors. To develop a quantum algorithm for solving LWE, we mainly introduce two new techniques. First, we introduce Gaussian functions with complex variances in the design of quantum algorithms. In particular, we exploit the feature of the Karst wave in the discrete Fourier transform of complex Gaussian functions. Second, we use windowed quantum Fourier transform with complex Gaussian windows, which allows us to combine the information from both time and frequency domains. Using those techniques, we first convert the LWE instance into quantum states with purely imaginary Gaussian amplitudes, then convert purely imaginary Gaussian states into classical linear equations over the LWE secret and error terms, and finally solve the linear system of equations using Gaussian elimination. This gives a polynomial time quantum algorithm for solving LWE.
Structured Encryption (StE) enables a client to securely store and query data stored on an untrusted server. Recent constructions of StE have moved beyond basic queries, and now support large subsets of SQL. However, the security of these constructions is poorly understood, and no systematic analysis has been performed. We address this by providing the first leakage-abuse attacks against StE for SQL schemes. Our attacks can be run by a passive adversary on a server with access to some information about the distribution of underlying data, a common model in prior work. They achieve partial query recovery against select operations and partial plaintext recovery against join operations. We prove the optimality and near-optimality of two new attacks, in a Bayesian inference framework. We complement our theoretical results with an empirical investigation testing the performance of our attacks against real-world data and show they can successfully recover a substantial proportion of queries and plaintexts. In addition to our new attacks, we provide proofs showing that the conditional optimality of a previously proposed leakage-abuse attack and that inference against join operations is NP-hard in general.
In this paper, we introduce a new framework for constructing linkable ring signatures (LRS). Our framework is based purely on signatures of knowledge (SoK) which allows one to issue signatures on behalf of any NP-statement using the corresponding witness. Our framework enjoys the following advantages: (1) the security of the resulting LRS depends only on the security of the underlying SoK; (2) the resulting LRS naturally supports online/offline signing (resp. verification), where the output of the offline signing (resp. verification) can be re-used across signatures of the same ring. For a ring size $n$, our framework requires a SoK of the NP statement with size $\log n$. To instantiate our framework, we adapt the well-known post-quantum secure non-interactive argument of knowledge (NIAoK), ethSTARK, into an SoK. This SoK inherents the post-quantum security and has a signature size poly-logarithmic in the size of the NP statement. Thus, our resulting LRS has a signature size of $O(\text{polylog}(\log n))$. By comparison, existing post-quantum ring signatures, regardless of linkability considerations, have signature sizes of $O(\log n)$ at best. Furthermore, leveraging online/offline verification, part of the verification of signatures on the same ring can be shared, resulting in a state-of-the-art amortized verification cost of $O(\text{polylog}(\log n))$. Our LRS also performs favourably against existing schemes in practical scenarios. Concretely, our scheme has the smallest signature size among all post-quantum ring signatures for any ring size larger than $32$. In our experiment, at $128$-bit security and ring size of $1024$, our LRS has a size of $29$KB, and an amortized verification cost of $0.3$ ms, surpassing the state-of-the-art by a significant margin. Even without considering amortization, the verification time for a single signature is $128$ ms, which is still 10x better than state-of-the-art succinct construction, marking it comparable to those featuring linear signature size. A similar performance advantage can also be seen at signing.
This paper presents an in-depth exploration of the development and deployment of a Layer 1 (L1) blockchain designed to underpin metaverse experiences. As the digital and physical realms become increasingly intertwined, the metaverse emerges as a frontier for innovation, demanding robust, scalable, and secure infrastructure. The core of our investigation centers around the challenges and insights gained from constructing a blockchain framework capable of supporting the vast, dynamic environments of the metaverse. Through the development process, we identified key areas of focus: interoperability, performance and scalability, cost, identity, privacy, security, and accessibility. Our findings indicate that most challenges can be effectively addressed through the implementation of cryptography and subnets (i.e., Avalanche architecture), which allow for segmented, optimized environments within the broader metaverse ecosystem. This approach not only enhances performance but also provides a flexible framework for managing the diverse needs of metaverse applications.
Fault attacks that exploit the propagation of effective/ineffective faults present a richer attack surface than Differential Fault Attacks, in the sense that the adversary depends on a single bit of information to eventually leak secret cryptographic material. In the recent past, a number of propagation-based fault attacks on Lattice-based Key Encapsulation Mechanisms have been proposed; many of which have no known countermeasures. In this work, we propose an orthogonal countermeasure principle that does not follow adhoc strategies (like shuffling operations on secret coefficients), but rather depends on cryptographically-backed guarantees to provide quantifiable defence against aforementioned fault attacks. Concretely, we propose a framework that uses rejection sampling (which has been traditionally used as alternatives to trapdoors) to convert otherwise deterministic algorithms to probabilistic ones. Our specific goals allow careful selection of distributions such that our framework functions with a constant number of retries (around $2-3$) for unfaulted executions. In other words, should a fault be injected, the probability of success is negligible; for correct execution however, the probability of success is overwhelmingly high. Using our framework, we hence enable probabilistic decryptions in Kyber, NewHope, and Masked Kyber, and completely cut-off fault propagation in known attacks on these constructions, allowing a sound defence against known fault attacks in literature.
MRAE security is an important goal for many AEAD applications where the nonce uniqueness cannot be maintained and security risks are significant. However, MRAE schemes can be quite expensive. Two of the SoTA MRAE-secure schemes; Deoxys-II and AES-GCM-SIV rely on internal parallelism and special instructions to achieve competitive performance. However, they both suffer from the same bottleneck, they have at least one call to the underlying primitive that cannot be parallelized to any other call. Romulus-M and LMDAE are two other more recent MRAE secure schemes based on TBCs that target low area hardware. However, they are unparallelizable so they are slower than their counterparts. In this paper, we present two new AEAD modes and four instantiations based on Tweakable Block Ciphers. These new modes target equipping high-speed applications on parallel platforms with nonce misuse resistant AEAD (MRAE). The first mode, LLSIV, targets similar performance on single-core platforms to SCT-2, while eliminating the bottlenecks that make SCT-2 not fully parallelizable. The enhanced parallelism allows LLSIV to encrypt significantly more blocks on parallel platforms, compared to SCT-2, in the same amount of time. LLSIV is based on the NaT MAC, where each ciphertext block can itself be viewed as an instance of NaT when the plaintext is prepended with $0^n$. The trade-off is that LLSIV requires the inverse function of the TBC. However, the inverse function is used only once per message and we demonstrate that for parallel implementations it represents a very small overhead. We give an instantiation of LLSIV based on the SKINNY-128-384 TBC, and a pruned scheme, dubbed pLLSIV, which targets enhanced performance compared both SCT-2 and LLSIV on all platforms, while having reduced security claims. It relies on the recently popularized prove-then-prune methodology to take full advantage of the properties of LLSIV. This leads to a significant performance improvement, making pLLSIV even faster than online TBC-based schemes that are not MRAE-secure. Last but not least, we give an instantiation that uses the primitives used in AES-GCM-SIV: the PolyVal hash function and AES. Our instantiation is faster than AES-GCM-SIV on all platforms and have better bounds. On the other hand, it relies on the ideal cipher model as it uses the ICE TBC proposed as part of the Remus AEAD design. The second mode we describe is LLDFV. It uses ideas from LLSIV combined the Decryption-Fast SIV (DFV) framework proposed recently by Minematsu. The goal is to reduce the number of calls to the TBC by one, while making the scheme as parallelizable as LLSIV. This makes the scheme faster that DFV on all platforms.
FUTURE is a recently proposed lightweight block cipher that achieved a remarkable hardware performance due to careful design decisions. FUTURE is an Advanced Encryption Standard (AES)-like Substitution-Permutation Network (SPN) with 10 rounds, whose round function consists of four components, i.e., SubCell, MixColumn, ShiftRow and AddRoundKey. Unlike AES, it is a 64-bit-size block cipher with a 128-bit secret key, and the state can be arranged into 16 cells. Therefore, the operations of FUTURE including its S-box is defined over $\mathbb{F}_2^4$. The previous studies have shown that the integral properties of 4-bit S-boxes are usually weaker than larger-size S-boxes, thus the number of rounds of FUTURE, i.e., 10 rounds only, might be too aggressive to provide enough resistance against integral cryptanalysis. In this paper, we mount the integral cryptanalysis on FUTURE. With state-of-the-art detection techniques, we identify several integral distinguishers of 7 rounds of FUTURE. By extending this 7-round distinguisher by 3 forward rounds, we manage to recover all the 128 bits secret keys from the full FUTURE cipher without the full codebook for the first time. To further achieve better time complexity, we also present a key recovery attack on full FUTURE with full codebook. Both attacks have better time complexity than existing results.
We present a solution to the open problem of designing an efficient, unbiased and timing attack-resistant shuffling algorithm for NTRU fixed-weight sampling. Although it can be implemented without timing leakages of secret data in any architecture, we illustrate with ARMv7-M and ARMv8-A implementations; for the latter, we take advantage of architectural features such as NEON and conditional instructions, which are representative of features available on architectures targeting similar systems, such as Intel. Our proposed algorithm improves asymptotically upon the current approach, which is based on constant-time sorting networks ($O(n)$ versus $O(n \log^2 n)$), and an implementation of the new algorithm is also faster in practice, by a factor of up to $6.91\ (591\%)$ on ARMv8-A cores and $12.58\ (1158\%)$ on the Cortex-M4; it also requires fewer uniform random bits. This translates into performance improvements for NTRU encapsulation, compared to state-of-the-art implementations, of up to 50% on ARMv8-A cores and 71% on the Cortex-M4, and small improvements to key generation (up to 2.7% on ARMv8-A cores and 6.1% on the Cortex-M4), with negligible impact on code size and a slight improvement in RAM usage for the Cortex-M4.
In this work we define the notion of a permutation correlation $(\pi,A,B,C)$ s.t. $\pi(A)=B+C$ for a random permutation $\pi$ of $n$ elements and vectors $A,B,C\in \mathbb{F}^n$. We demonstrate the utility of this correlation for a wide range of applications. The correlation can be derandomized to obliviously shuffle a secret-shared list, permute a secret-shared list by a secret-shared permutation, and more. Similar techniques have emerged as a popular building block for the honest majority protocols when efficient batched random access is required, e.g. collaborative filtering, sorting, database joins, graph algorithms, and many more. We present the highly flexible notion of permutation correlation and argue that it should be viewed as a first class primitive in the MPC practitioner's toolbox. We give two novel protocols for efficiently generating a random permutation correlation. The first makes use of recent advances in MPC-friendly PRFs to obtain a protocol requiring $O(n\ell)$ OTs/time and constant rounds to permute $n$ $\ell$-bit strings. Unlike the modern OT extension techniques we rely on, this was previously only achievable from relatively more expensive public-key cryptography, e.g. Paillier or LWE. We implement this protocol and demonstrate that it can generate a correlation for $n=2^{20},\ell=128$ in 19 seconds and $\sim2\ell n$ communication, a 15 \& $1.1\times$ improvement over the LWE solution of Juvekar at al. (CCS 2018). The second protocol is based on pseudo-random correlation generators and achieves an overhead that is \emph{sublinear} in the string length $\ell$, i.e. the communication and number of OTs is $O(n\log \ell)$. The latter protocol is ideal for the setting when you need to repeatedly permute secret-shared data by the same permutation, e.g. in graph algorithms. Finally, we present a suite of highly efficient protocols for performing various batched random access operations. These include a class of protocols we refer to as \emph{extraction}, which allow a user to \emph{mark} a subset of $X$ and have this subset obliviously extracted into an output list. Additionally, the parties can specify an \emph{arbitrary} selection function $\sigma:[n]\rightarrow[n]$ and obtain shares of $\sigma(X)=(X_{\sigma(1)},\ldots,X_{\sigma(n)})$ from $X$. We implement these protocols and report on their performance.
Nextcloud is a leading cloud storage platform with more than 20 million users. Nextcloud offers an end-to-end encryption (E2EE) feature that is claimed to be able “to keep extremely sensitive data fully secure even in case of a full server breach”. They also claim that the Nextcloud server “has Zero Knowledge, that is, never has access to any of the data or keys in unencrypted form”. This is achieved by having encryption and decryption operations that are done using file keys that are only available to Nextcloud clients, with those file keys being protected by a key hierarchy that ultimately relies on long passphrases known exclusively to the users. We provide the first detailed documentation and security analysis of Nextcloud's E2EE feature. Nextcloud's strong security claims motivate conducting the analysis in the setting where the server itself is considered malicious. We present three distinct attacks against the E2EE security guarantees in this setting. Each one enables the confidentiality and integrity of all user files to be compromised. All three attacks are fully practical and we have built proof-of-concept implementations for each. The vulnerabilities make it trivial for a malicious Nextcloud server to access and manipulate users' data. We have responsibly disclosed the three vulnerabilities to Nextcloud. The second and third vulnerabilities have been remediated. The first was addressed by temporarily disabling file sharing from the E2EE feature until a redesign of the feature can be made. We reflect on broader lessons that can be learned for designers of E2EE systems.
Despite ensuring both consistency and liveness, state machine replication protocols remain vulnerable to adversaries who manipulate the transaction order. To address this, researchers have proposed order-fairness techniques that rely either on building dependency graphs between transactions, or on assigning sequence numbers to transactions. Existing protocols that handle dependency graphs suffer from sub-optimal performance, resilience, or security. On the other hand, Pompe (OSDI '20) introduced the novel ordering notion of ordering linearizability that uses sequence numbers. However, Pompe's ordering only applies to committed transactions, opening the door to order-fairness violation when there are network delays, and vulnerability to performance downgrade when there are Byzantine attackers. A stronger notion, fair separability, was introduced to require ordering on all observed transactions. However, no implementation of fair separability exists. In this paper, we introduce a protocol for state machine replication with fair separability ($\mathsf{SMRFS}$); moreover, our protocol has communication complexity $\mathcal{O}(n\ell+\lambda n^2)$, where $n$ is the number of processes, $\ell$ is the input (transaction) size, and $\lambda$ is the security parameter. This is optimal when $\ell\geq \lambda n$, while previous works have cubic communication. To the best of our knowledge, $\mathsf{SMRFS}$ is the first protocol to achieve fair separability, and the first implementation of fair ordering that has optimal communication complexity and optimal Byzantine resilience.
A distributed OPRF allows a client to evaluate a pseudorandom function on an input chosen by the client using a distributed key shared among multiple servers. This primitive ensures that the servers learn nothing about the input nor the output, and the client learns nothing about the key. We present a post-quantum OPRF in a distributed server setting, which can be computed in a single round of communication between a client and the servers. The only server-to-server communication occurs during a precomputation phase. The algorithm is based on the Legendre PRF which can be computed from a single MPC multiplication among the servers. To this end we propose two MPC approaches to evaluate the Legendre PRF based on replicated and optimised secret sharing, respectively. Furthermore, we propose two methods that allows us to perform MPC multiplication in an efficient way that are of independent interest. By employing the latter, we are able to evaluate the Legendre OPRF in a fashion that is quantum secure, verifiable and secure against malicious adversaries under a threshold assumption, as well as computable in a single round of interaction. To the best of our knowledge, our proposed distributed OPRFs are the first post-quantum secure offering such properties. We also provide an implementation of our protocols, and benchmark it against existing OPRF constructions.
Common random string model is a popular model in classical cryptography with many constructions proposed in this model. We study a quantum analogue of this model called the common Haar state model, which was also studied in an independent work by Chen, Coladangelo and Sattath (arXiv 2024). In this model, every party in the cryptographic system receives many copies of one or more i.i.d Haar states. Our main result is the construction of a statistically secure PRSG with: (a) the output length of the PRSG is strictly larger than the key size, (b) the security holds even if the adversary receives $O\left(\frac{\lambda}{(\log(\lambda))^{1.01}} \right)$ copies of the pseudorandom state. We show the optimality of our construction by showing a matching lower bound. Our construction is simple and its analysis uses elementary techniques.
At S$\&$P 2023, a family of secure three-party computing protocols called Bicoptor was mainly proposed by Huawei Technology in China, which is used to compute non-linear functions in privacy preserving machine learning. In these protocols, two parties $P_0, P_1$ respectively hold the corresponding shares of the secret, while a third party $P_2$ acts as an assistant. The authors claimed that neither party in the Bicoptor can independently compromise the confidentiality of the input, intermediate, or output. In this paper, we point out that this claim is incorrect. The assistant $P_2$ can recover the secret in the DReLU protocol, which is the basis of Bicoptor. The restoration of its secret will result in the security of the remaining protocols in Bicoptor being compromised. Specifically, we provide two secret recovery attacks regarding the DReLU protocol. The first attack method belongs to a clever enumeration method, which is mainly due to the derivation of the modular equation about the secret and its share. The key of the second attack lies in solving the small integer root problem of a modular equation, as the lattices involved are only 3 or 4 dimensions, the LLL algorithm can effectively work. For the system settings selected by Bicoptor, our experiment shows that the desired secret in the DReLU protocol can be restored within one second on a personal computer. Therefore, when using cryptographic protocols in the field of privacy preserving machine learning, it is not only important to pay attention to design overhead, but also to be particularly careful of potential security threats.
The MPC-in-the-Head (MPCitH) paradigm is widely used for building post-quantum signature schemes, as it provides a versatile way to design proofs of knowledge based on hard problems. Over the years, the MPCitH landscape has changed significantly, with the most recent improvement coming from VOLE-in-the-Head (VOLEitH) and Threshold-Computation-in-the-Head (TCitH). While a straightforward application of these frameworks already improve the existing MPCitH-based signatures, we show in this work that we can adapt the arithmetic constraints representing the underlying security assumptions (here called the modeling) to achieve smaller sizes using these new techniques. More precisely, we explore existing modelings for the rank syndrome decoding (RSD) and MinRank problems and we introduce a new modeling, named dual support decomposition, which achieves better sizes with the VOLEitH and TCitH frameworks by minimizing the size of the witnesses. While this modeling is naturally more efficient than the other ones for a large set of parameters, we show that it is possible to go even further and explore new areas of parameters. With these new modeling and parameters, we obtain low-size witnesses which drastically reduces the size of the ``arithmetic part'' of the signature. We apply our new modeling to both TCitH and VOLEitH frameworks and compare our results to RYDE, MiRitH, and MIRA signature schemes. We obtain signature sizes below 4 kB for 128 bits of security with N=256 parties (a.k.a. leaves in the GGM trees) and going as low as $\approx$ 3.5 kB with N=2048, for both RSD and MinRank. This represents an improvement of more than 1.5 kB compared to the original submissions to the 2023 NIST call for additional signatures. We also note that recent techniques optimizing the sizes of GGM trees are applicable to our schemes and further reduce the signature sizes by a few hundred bytes, bringing them arround 3 kB (for 128 bits of security with N=2048).
Timed cryptography studies primitives that retain their security only for a predetermined amount of time, such as proofs of sequential work and time-lock puzzles. This feature has proven to be useful in a large number of practical applications, e.g. randomness generation, sealed-bid auctions, and fair multi-party computation. However, the current state of affairs in timed cryptography is unsatisfactory: Virtually all efficient constructions rely on a single sequentiality assumption, namely that repeated squaring in unknown order groups cannot be parallelised. This is a single point of failure in the classical setting and is even false against quantum adversaries. In this work we put forward a new sequentiality assumption, which essentially says that a repeated application of the standard lattice-based hash function cannot be parallelised. We provide concrete evidence of the validity of this assumption and perform some initial cryptanalysis. We also propose a new template to construct proofs of sequential work, based on lattice techniques.
In this note we propose a variant (with four sub-variants) of the Charles--Goren--Lauter (CGL) hash function using Lattès maps over finite fields. These maps define dynamical systems on the projective line. The underlying idea is that these maps ``hide'' the $j$-invariants in each step in the isogeny chain, similar to the Merkle--Damgård construction. This might circumvent the problem concerning the knowledge of the starting (or ending) curve's endomorphism ring, which is known to create collisions in the CGL hash function. Let us, already in the abstract, preface this note by remarking that we have not done any explicit computer experiments and benchmarks (apart from a small test on the speed of computing the orbits), nor do we make any security claims. Part of the reason for this is the author's lack of competence in complexity theory and evaluation of security claims. Instead this note is only meant as a presentation of the main idea, the hope being that someone more competent will find it interesting enough to pursue further.
In general the discrete logarithm problem is a hard problem in the elliptic curve cryptography, and the best known solving algorithm have exponential running time. But there exists a class of curves, i.e. supersingular elliptic curves, whose discrete logarithm problem has a subexponential solving algorithm called the MOV attack. In 1999, the cost of the MOV reduction is still computationally expensive due to the power of computers. We analysis the cost of the MOV reduction and the discrete logarithm problem of the curves in \cite{HSSI99} using Magma with an ordinary computer.
With the growing adoption of cloud computing, the ability to store data and delegate computations to powerful and affordable cloud servers have become advantageous for both companies and individual users. However, the security of cloud computing has emerged as a significant concern. Particularly, Cloud Service Providers (CSPs) cannot assure data confidentiality and computations integrity in mission-critical applications. In this paper, we propose a confidential and verifiable delegation scheme that advances and overcomes major performance limitations of existing Secure Multiparty Computation (MPC) and Zero Knowledge Proof (ZKP). Secret-shared Data and delegated computations to multiple cloud servers remain completely confidential as long as there is at least one honest MPC server. Moreover, results are guaranteed to be valid even if all the participating servers are malicious. Specifically, we design an efficient protocol based on interactive proofs, such that most of the computations generating the proof can be done locally on each server. In addition, we propose a special protocol for matrix multiplication where the overhead of generating the proof is asymptotically smaller than the time to evaluate the result in MPC. Experimental evaluation demonstrates that our scheme significantly outperforms prior work, with the online prover time being 1-2 orders of magnitude faster. Notably, in the matrix multiplication protocol, only a minimal 2% of the total time is spent on the proof generation. Furthermore, we conducted tests on machine learning inference tasks. We executed the protocol for a fully-connected neural network with 3 layers on the MNIST dataset and it takes 2.6 seconds to compute the inference in MPC and generate the proof, 88× faster than prior work. We also tested the convolutional neural network of Lenet with 2 convolution layers and 3 dense layers and the running time is less than 300 seconds across three servers.
We study the possibility of schemes whose public parameters have been generated along with a backdoor. We consider the goal of the big-brother adversary to be two-fold: It desires utility (it can break the scheme) but also exclusivity (nobody else can). Starting with hash functions, we give new, strong definitions for these two goals, calling the combination high effectiveness. We then present a construction of a backdoored hash function that is highly effective, meaning provably meets our new definition. As an application, we investigate forgery of X.509 certificates that use this hash function. We then consider signatures, again giving a definition of high effectiveness, and showing that it can be achieved. But we also give some positive results, namely that for the Okamoto and Katz-Wang signature schemes, certain natural backdoor strategies are provably futile. Our backdoored constructions serve to warn that backdoors can be more powerful and damaging than previously conceived, and to help defenders and developers identify potential backdoors by illustrating how they might be built. Our positive results illustrate that some schemes do offer more backdoor resistance than others, which may make them preferable.
The Gradient Boosting Decision Tree (GBDT) is a well-known machine learning algorithm, which achieves high performance and outstanding interpretability in real-world scenes such as fraud detection, online marketing and risk management. Meanwhile, two data owners can jointly train a GBDT model without disclosing their private dataset by executing secure Multi-Party Computation (MPC) protocols. In this work, we propose NodeGuard, a highly efficient two party computation (2PC) framework for large-scale GBDT training and inference. NodeGuard guarantees that no sensitive intermediate results are leaked in the training and inference. The efficiency advantage of NodeGuard is achieved by applying a novel keyed bucket aggregation protocol, which optimizes the communication and computation complexity globally in the training. Additionally, we introduce a probabilistic approximate division protocol with an optimization for re-scaling, when the divisor is publicly known. Finally, we compare NodeGuard to state-of-the-art frameworks, and we show that NodeGuard is extremely efficient. It can improve the privacy preserving GBDT training performance by a factor of 5.0 to 131 in LAN and 2.7 to 457 in WAN.
Cryptographic protocols are hard to design and prove correct, as witnessed by the ever-growing list of attacks even on protocol standards. Symbolic models of cryptography enable automated formal security proofs of such protocols against an idealized cryptographic model, which abstracts away from the algebraic properties of cryptographic schemes and thus misses attacks. Computational models of cryptography yield rigorous guarantees but support at present only interactive proofs and/or restricted classes of protocols (e.g., stateless ones). A promising approach is given by the computationally complete symbolic attacker (CCSA) model, formalized in the BC Logic, which aims at bridging and getting the best of the two worlds, obtaining cryptographic guarantees by symbolic protocol analysis. The BC Logic is supported by a recently developed interactive theorem prover, namely Squirrel, which enables machine-checked interactive security proofs, as opposed to automated ones, thus requiring expert knowledge both in the cryptographic space as well as on the reasoning side. In this paper, we introduce the CryptoVampire cryptographic protocol verifier, which for the first time fully automates proofs of trace properties in the BC Logic. The key technical contribution is a first-order formalization of protocol properties with tailored handling of subterm relations. As such, we overcome the burden of interactive proving in higher-order logic and automatically establish soundness of cryptographic protocols using only first-order reasoning. Our first-order encoding of cryptographic protocols is challenging for various reasons. On the theoretical side, we restrict full first-order logic with cryptographic axioms to ensure that, by losing the expressivity of the higher-order BC Logic, we do not lose soundness of cryptographic protocols in our first-order encoding. On the practical side, CryptoVampire integrates dedicated proof techniques using first-order saturation algorithms and heuristics, which all together enable leveraging the state-of-the-art Vampire first-order automated theorem prover as the underlying proving engine of CryptoVampire. Our experimental results showcase the effectiveness of CryptoVampire as a standalone verifier as well as in terms of automation support for Squirrel.
Cache side-channels are a major threat to cryptographic implementations, particularly block ciphers. Traditional manual hardening methods transform block ciphers into Boolean circuits, a practice refined since the late 90s. The only existing automatic approach based on Boolean circuits achieves security but suffers from performance issues. This paper examines the use of Lookup Tables (LUTs) for automatic hardening of block ciphers against cache side-channel attacks. We present a novel method combining LUT-based synthesis with quantitative static analysis in our HyCaMi framework. Applied to seven block cipher implementations, HyCaMi shows significant improvement in efficiency, being 9.5$\times$ more efficient than previous methods, while effectively protecting against cache side-channel attacks. Additionally, for the first time, we explore balancing speed with security by adjusting LUT sizes, providing faster performance with slightly reduced leakage guarantees, suitable for scenarios where absolute security and speed must be balanced.
We reflect on our experiences analysing cryptography deployed “in the wild” and give recommendations to fellow researchers about this process.
In 2021, Sterner proposed a commitment scheme based on supersingular isogenies. For this scheme to be binding, one relies on a trusted party to generate a starting supersingular elliptic curve of unknown endomorphism ring. In fact, the knowledge of the endomorphism ring allows one to compute an endomorphism of degree a power of a given small prime. Such an endomorphism can then be split into two to obtain two different messages with the same commitment. This is the reason why one needs a curve of unknown endomorphism ring, and the only known way to generate such supersingular curves is to rely on a trusted party or on some expensive multiparty computation. We observe that if the degree of the endomorphism in play is well chosen, then the knowledge of the endomorphism ring is not sufficient to efficiently compute such an endomorphism and in some particular cases, one can even prove that endomorphism of a certain degree do not exist. Leveraging these observations, we adapt Sterner's commitment scheme in such a way that the endomorphism ring of the starting curve can be known and public. This allows us to obtain isogeny-based commitment schemes which can be instantiated without trusted setup requirements.
In this article, we focus on deriving an easily implementable and efficient method of constructing units of the group ring of dihedral group. We provide a necessary and sufficient condition that relates the units in the group ring of dihedral group with the units in the group ring of cyclic group. Using this relation and the methods available for inversion in the group ring of the cyclic group, we introduce an algorithm to construct units efficiently and check its performance experimentally.
This paper introduces a new method for training decision trees and random forests using CKKS homomorphic encryption (HE) in cloud environments, enhancing data privacy from multiple sources. The innovative Homomorphic Binary Decision Tree (HBDT) method utilizes a modified Gini Impurity index (MGI) for node splitting in encrypted data scenarios. Notably, the proposed training approach operates in a single cloud security domain without the need for decryption, addressing key challenges in privacy-preserving machine learning. We also propose an efficient method for inference utilizing only addition for path evaluation even when both models and inputs are encrypted, achieving O(1) multiplicative depth. Experiments demonstrate that this method surpasses the previous study by Akavia et al.'s by at least 3.7 times in the speed of inference. The study also expands to privacy-preserving random forests, with GPU acceleration ensuring feasibly efficient performance in both training and inference.
Determining the complexity of computing Gröbner bases is an important problem both in theory and in practice, and for that the solving degree plays a key role. In this paper, we study the solving degrees of affine semi-regular sequences and their homogenized sequences. Some of our results are considered to give mathematically rigorous proofs of the correctness of methods for computing Gröbner bases of the ideal generated by an affine semi-regular sequence. This paper is a sequel of the authors’ previous work and gives additional results on the solving degrees and important behaviors of Gröbner basis computation.
Fischlin's transform (CRYPTO 2005) is an alternative to the Fiat-Shamir transform that enables straight-line extraction when proving knowledge. In this work we focus on the problem of using the Fischlin transform to construct UC-secure zero-knowledge from Sigma protocols, since UC security -- that guarantees security under general concurrent composition -- requires straight-line (non-rewinding) simulators. We provide a slightly simplified transform that is much easier to understand, and present algorithmic and implementation optimizations that significantly improve the running time. It appears that the main obstacles to the use of Fischlin in practice is its computational cost and implementation complexity (with multiple parameters that need to be chosen). We provide clear guidelines and a simple methodology for choosing parameters, and show that with our optimizations the running-time is far lower than expected. For just one example, on a 2023 MacBook, the cost of proving the knowledge of discrete log with Fischlin is only 0.41ms (on a single core). We also extend the transform so that it can be applied to batch proofs, and show how this can be much more efficient than individually proving each statement. As a contribution of independent interest, we present a new algorithm for polynomial evaluation on any series of sequential points that does not require roots of unity. We hope that this paper will both encourage and help practitioners implement the Fischlin transform where relevant.
Biometric authentication eliminates the need for users to remember secrets and serves as a convenient mechanism for user authentication. Traditional implementations of biometric-based authentication store sensitive user biometry on the server and the server becomes an attractive target of attack and a source of large-scale unintended disclosure of biometric data. To mitigate the problem, we can resort to privacy-preserving computation and store only protected biometrics on the server. While a variety of secure computation techniques is available, our analysis of privacy-preserving biometric computation and biometric authentication constructions revealed that available solutions fall short of addressing the challenges of privacy-preserving biometric authentication. Thus, in this work we put forward new constructions to address the challenges. Our solutions employ a helper server and use strong threat models, where a client is always assumed to be malicious, while the helper server can be semi-honest or malicious. We also determined that standard secure multi-party computation security definitions are insufficient to properly demonstrate security in the two-phase (enrollment and authentication) entity authentication application. We thus extend the model and formally show security in the multi-phase setting, where information can flow from one phase to another and the set of participants can change between the phases. We implement our constructions and show that they exhibit practical performance for authentication in real time.
The sumcheck protocol is an interactive protocol for verifying the sum of a low-degree polynomial over a hypercube. This protocol is widely used in practice, where an efficient implementation of the (honest) prover algorithm is paramount. Prior work contributes highly-efficient prover algorithms for the notable special case of multilinear polynomials (and related settings): [CTY11] uses logarithmic space but runs in superlinear time; in contrast, [VSBW13] runs in linear time but uses linear space. In this short note, we present a family of prover algorithms for the multilinear sumcheck protocol that offer new time-space tradeoffs. In particular, we recover the aforementioned algorithms as special cases. Moreover, we provide an efficient implementation of the new algorithms, and our experiments show that the asymptotics translate into new concrete efficiency tradeoffs.
In "Keeping up with the KEMs" Cremers et al. introduced various binding models for KEMs. The authors show that ML-KEM is LEAK-BIND-K-CT and LEAK-BIND-K-PK, i.e. binding the ciphertext and the public key in the case of an adversary having access, but not being able to manipulate the key material. They further conjecture that ML-KEM also has MAL-BIND-K-PK, but not MAL-BIND-K-CT, the binding of public key or ciphertext to the shared secret in the case of an attacker with the ability to manipulate the key material. This short paper demonstrates that ML-KEM does neither have MALBIND-K-CT nor MAL-BIND-K-PK, due to the attacker being able to produce mal-formed private keys, giving concrete examples for both. We also suggest mitigations, and sketch a proof for binding both ciphertext and public key when the attacker is not able to manipulate the private key as liberally.
In their paper, Wei et al. proposed a lightweight protocol for conditional privacy-preserving authentication in VANET. The protocol aims to achieve ultra-low transmission delay and efficient system secret key (SSK) updating. Their protocol uses a signature scheme with message recovery to authenticate messages. This scheme provides security against adaptively chosen message attacks. However, our analysis reveals a critical vulnerability in the scheme. It is susceptible to replay attacks, meaning a malicious vehicle can replay a message multiple times at different timestamps. This action undermines the overall effectiveness of conditional privacy. We suggest possible solutions to address these vulnerabilities and enhance the security of VANET communication.
Biextensions associated to line bundles on abelian varieties allows to reinterpret the usual Weil, Tate, Ate, optimal Ate, \ldots, pairings as monodromy pairings. We introduce a cubical arithmetic, derived from the canonical cubical torsor structure of these line bundles, to obtain an efficient arithmetic of these biextensions. This unifies and extends Miller's standard algorithm to compute pairings along with other algorithms like elliptic nets and theta functions, and allows to adapt these algorithms to pairings on any model of abelian varieties with a polarisation $\Phi_D$, as long as we have an explicit theorem of the square for $D$. In particular, we give explicit formulas for the arithmetic of the biextension (and cubical torsor structure) associated to the divisor $D=2(0_E)$ on an elliptic curve. We derive very efficient pairing formulas on elliptic curves and Kummer lines. Notably for generic pairings on Montgomery curves, our cubical biextension ladder algorithm to compute pairings costs only $15M$ by bits, which as far as I know is faster than any pairing doubling formula in the literature.
Quantum computers have the potential to solve hard problems that are nearly impossible to solve by classical computers, this has sparked a surge of research to apply quantum technology and algorithm against the cryptographic systems to evaluate for its quantum resistance. In the process of selecting post-quantum standards, NIST categorizes security levels based on the complexity that quantum computers would require to crack AES encryption (levels 1, 3, and 5) and SHA-2 or SHA-3 (levels 2 and 4). In assessing the security strength of cryptographic algorithms against quantum threats, accurate predictions of quantum resources are crucial. Following the work of Jaques et al. in Eurocrypt 2020, NIST estimated security levels 1, 3, and 5, corresponding to quantum circuit size for finding the key for AES-128, AES-192, and AES-256, respectively. This work has been recently followed-up by Huang et al. (Asiacrypt'22) and Liu et al. (Asiacrypt'23). However, for levels 2 and 4, which relate to the collision finding for the SHA-2 and SHA-3 hash functions, quantum attack complexities are probably not well-studied. In this paper, we present novel techniques for optimizing the quantum circuit implementations for SHA-2 and SHA-3 algorithms in all the categories specified by NIST. After that, we evaluate the quantum circuits of target cryptographic hash functions for quantum collision search. Finally, we define the quantum attack complexity for levels 2 and 4, and comment on the security strength of the extended level. We present new concepts to optimize the quantum circuits at the component level and the architecture level.
As we enter 2024, the post-quantum cryptographic algorithm Dilithium, which emerged from the National Institute of Standards and Technology post-quantum cryptography competition, has now reached the deployment stage. This paper focuses on the practical security of Dilithium. We performed practical attacks on Dilithium2 on an STM32F4 platform. Our results indicate that an attack can be executed with just two signatures within five minutes, with a single signature offering a 60% probability of recovering the private key within one hour. Specifically, we analyze the polynomial addition $z=y+\mathbf{cs}_1$. The attack is conducted in two phases: initially applying side-channel analysis to recover the values of $y$ or $\mathbf{cs}_1$, followed by solving an equation system of $\mathbf{cs}_1$ with error. We introduce using Linear Regression-based profiled attack to recover $y$, leveraging the mathematical properties of adding large and small numbers, requiring only one trace to achieve a 40% success rate. In contrast, a CNN-based template attack, trained with leakage from 200 signatures, enables $\mathbf{cs}_1$ recovery from a single trace with a 74% success rate. Further, by exploiting the constraint $z=y+\mathbf{cs}_1$, the combined leakages of $y$ and $\mathbf{cs}_1$ increase the success rate for $\mathbf{cs}_1$ recovery to 92%. Additionally, we propose a constrained optimization-based residual analysis to solve the equation set $\mathbf{cs}_1 = b$ with error. This method can function independently or as a preprocessing step in combination with Belief Propagation or Integer Linear Programming. Experimental results show that with a 95% correctness rate in the equation set, this method can directly recover the private key $\mathbf{s}_1$ with an 83% success rate in just five seconds. Even with a correctness rate as low as 5%, the method can still recover the private key $\mathbf{s}_1$ in 5 minutes using the system of equations generated by about 200 signatures.
Banks publish daily a list of available securities/assets (axe list) to selected clients to help them effectively locate Long (buy) or Short (sell) trades at reduced financing rates. This reduces costs for the bank, as the list aggregates the bank's internal firm inventory per asset for all clients of long as well as short trades. However, this is somewhat problematic: (1) the bank's inventory is revealed; (2) trades of clients who contribute to the aggregated list, particularly those deemed large, are revealed to other clients. Clients conducting sizable trades with the bank and possessing a portion of the aggregated asset exceeding $50\%$ are considered to be concentrated clients. This could potentially reveal a trading concentrated client's activity to their competitors, thus providing an unfair advantage over the market. Atlas-X Axe Obfuscation, powered by new differential private methods, enables a bank to obfuscate its published axe list on a daily basis while under continual observation, thus maintaining an acceptable inventory Profit and Loss (P\&L) cost pertaining to the noisy obfuscated axe list while reducing the clients' trading activity leakage. Our main differential private innovation is a differential private aggregator for streams (time series data) of both positive and negative integers under continual observation. For the last two years, Atlas-X system has been live in production across three major regions—USA, Europe, and Asia—at J.P. Morgan, a major financial institution, facilitating significant profitability. To our knowledge, it is the first differential privacy solution to be deployed in the financial sector. We also report benchmarks of our algorithm based on (anonymous) real and synthetic data to showcase the quality of our obfuscation and its success in production.
In this work we demonstrate for the first time that a full FHE bootstrapping operation can be proven using a SNARK in practice. We do so by designing an arithmetic circuit for the bootstrapping operation and prove it using plonky2. We are able to prove the circuit on an AWS C6i.metal instance in about 20 minutes. Proof size is about 200kB and verification takes less than 10ms. As the basis of our bootstrapping operation we use TFHE's programmable bootstrapping and modify it in a few places to more efficiently represent it as an arithmetic circuit (while maintaining full functionality and security). In order to achieve our results in a memory-efficient way, we take advantage of the structure of the computation and plonky2's ability to efficiently prove its own verification circuit to implement a recursion-based IVC scheme.
The threshold implementation technique has been proposed in 2006 by Nikova et al. as a countermeasure to mitigate cryptographic side-channel attacks on hardware implementations when the effect of glitches is taken into account. This technique is based on Boolean sharing (also called masking) and it was developed for securing symmetric ciphers such as AES. In 2023, Piccione et al. proposed a general construction of threshold implementations that is universal for S-boxes that are bijective vectorial Boolean function (functions from a binary vector space $\mathbb{F}_{2}^n$ into itself). In this paper, we further generalize the construction and we propose a general theory of threshold implementations for any type of S-boxes. We investigate the case of functions (also not necessarily bijective) that are defined between two finite Abelian groups and we use the definition of threshold implementation given by Dhooghe et al. in 2019 with additive sharing. To show that this generalized notion is as useful as the one for Boolean sharing, we prove that many classical results still hold. An important tool in this theory is the notion of functional degree introduced by Aichinger and Moosbauer in 2021 which generalizes the algebraic degree of a vectorial Boolean function. We show that if $F$ has functional degree (at most) $d$ and the cardinality of the domain is divisible by the cardinality of the codomain, then $F$ admits a threshold implementation $\mathcal{F}$ with $s\geq d+2$ shares in input and $d+2$ shares in output. Moreover, we provide a complete overview on which are the available tools for studying the functional degree and how to represent those functions using a Integer-Valued (IV) polynomial representation. Then we apply our theory for the following applications: defining the inner product masking in our setting, providing a threshold implementation of any multiplication map, and computing the functional degree and the IV polynomial representations of the conversion maps between $\mathbb{F}_p^n$ and $\mathbb{Z}_{p^n}$.
The Fiat-Shamir with Aborts paradigm (FSwA) uses rejection sampling to remove a secret’s dependency on a given source distribution. Recent results revealed that unlike the uniform distribution in the hypercube, both the continuous Gaussian and the uniform distribution within the hypersphere minimise the rejection rate and the size of the proof of knowledge. However, in practice both these distributions suffer from the complexity of their sampler. So far, those three distributions are the only available alternatives, but none of them offer the best of all worlds: competitive proof of knowledge size and rejection rate with a simple sampler. We introduce a new generic framework for FSwA using polytope based rejection sampling to enable a wider variety of constructions. As a matter of fact, this framework is the first to generalise these results to integral distributions. To complement the lack of alternatives, we also propose a new polytope construction, whose uniform sampler approaches in simplicity that of the hypercube. At the same time, it provides competitive proof of knowledge size compared to that obtained from the Gaussian distribution. Concurrently, we share some experimental improvements of our construction to further reduce the proof size. Finally, we propose a signature based on the FSwA paradigm using both our framework and construction. We prove it to be competitive with Haetae in signature size and with Dilithium on sampler simplicity.
A compilation of various notes written over the years on some algorithms on abelian varieties.
The XOR of two independent permutations (XoP) is a well-known construction for achieving security beyond the birthday bound when implementing a pseudorandom function using a block cipher (i.e., a pseudorandom permutation). The idealized construction (where the permutations are uniformly chosen and independent) and its variants have been extensively analyzed over nearly 25 years. The best-known asymptotic information-theoretic indistinguishability bound for the XoP construction is $O(q/2^{1.5n})$, derived by Eberhard in 2017, where $q$ is the number of queries and $n$ is the block length. A generalization of the XoP construction outputs the XOR of $r \geq 2$ independent permutations, and has also received significant attention in both the single-user and multi-user settings. In particular, for $r = 3$, the best-known bound (obtained by Choi et al. [ASIACRYPT'22]) is about $q^2/2^{2.5 n}$ in the single-user setting and $\sqrt{u} q_{\max}^2/2^{2.5 n}$ in the multi-user setting (where $u$ is the number of users and $q_{\max}$ is the number of queries per user). In this paper, we prove an indistinguishability bound of $q/2^{(r - 0.5)n}$ for the (generalized) XoP construction in the single-user setting, and a bound of $\sqrt{u} q_{\max}/2^{(r - 0.5)n}$ in the multi-user setting. In particular, for $r=2$, we obtain the bounds $q/2^{1.5n}$ and $\sqrt{u} q_{\max}/2^{1.5n}$ in single-user and multi-user settings, respectively. For $r=3$ the corresponding bounds are $q/2^{2.5n}$ and $\sqrt{u} q_{\max}/2^{2.5n}$. All of these bounds hold assuming $q < 2^{n}/2$ (or $q_{\max} < 2^{n}/2$). Compared to previous works, we improve all the best-known bounds for the (generalized) XoP construction in the multi-user setting, and the best-known bounds for the generalized XoP construction for $r \geq 3$ in the single-user setting (assuming $q \geq 2^{n/2}$). For the basic two-permutation XoP construction in the single-user setting, our concrete bound of $q/2^{1.5n}$ stands in contrast to the asymptotic bound of $O(q/2^{1.5n})$ by Eberhard. Since all of our bounds are matched (up to constant factors) for $q > 2^{n/2}$ by attacks published by Patarin in 2008 (and their generalizations to the multi-user setting), they are all tight. We obtain our results by Fourier analysis of Boolean functions. Most of our technical work involves bounding (sums of) Fourier coefficients of the density function associated with sampling without replacement. While the proof of Eberhard relies on similar bounds, our proof is elementary and simpler.
In 1994, Langford and Hellman introduced differential-linear (DL) cryptanalysis, with the idea of decomposing the block cipher E into two parts, EU and EL, such that EU exhibits a high-probability differential trail, while EL has a high-correlation linear trail.Combining these trails forms a distinguisher for E, assuming independence between EU and EL. The dependency between the two parts of DL distinguishers remained unaddressed until EUROCRYPT 2019, where Bar-On et al. introduced the DLCT framework, resolving the issue up to one S-box layer. However, extending the DLCT framework to formalize the dependency between the two parts for multiple rounds remained an open problem. In this paper, we first tackle this problem from the perspective of boomerang analysis. By examining the relationships between DLCT, DDT, and LAT, we introduce a set of new tables facilitating the formulation of dependencies between the two parts of the DL distinguisher across multiple rounds. Then, as the main contribution, we introduce a highly versatile and easy-to-use automatic tool for exploring DL distinguishers, inspired by automatic tools for boomerang distinguishers. This tool considers the dependency between differential and linear trails across multiple rounds. We apply our tool to various symmetric-key primitives, and in all applications, we either present the first DL distinguishers or enhance the best-known ones. We achieve successful results against Ascon, AES, SERPENT, PRESENT, SKINNY, TWINE, CLEFIA, WARP, LBlock, Simeck, and KNOT. Furthermore, we demonstrate that, in some cases, DL distinguishers outperform boomerang distinguishers significantly.
Secure merge considers the problem of combining two sorted lists into a single sorted secret-shared list. Merge is a fundamental building block for many real-world applications. For example, secure merge can implement a large number of SQL-like database joins, which are essential for almost any data processing task such as privacy-preserving fraud detection, ad conversion rates, data deduplication, and many more. We present two constructions with communication bandwidth and rounds tradeoff. Logstar, our bandwidth-optimized construction, takes inspiration from Falk and Ostrovsky (ITC, 2021) and runs in $O(n\log^*n)$ time and communication with $O(\log n)$ rounds. In particular, for all conceivable $n$, the $\log^*n$ factor will be equal to the constant $2$, and therefore we achieve a near-linear running time. Median, our rounds-optimized construction, builds on the classic parallel medians-based insecure merge approach of Valiant (SIAM J. Comput., 1975), later explored in the secure setting by Blunk et al. (2022), and requires $O(n \log^c n)$, $1<c<2$, communication with $O(\log \log n)$ rounds. We introduce two additional constructions that merge input lists of different sizes. SquareRootMerge, merges lists of sizes $n^{\frac{1}{2}}$ and $n$, and runs in $O(n)$ time and communication with $O(\log n)$ rounds. CubeRootMerge is closely inspired by Blunk et al.'s (2022) construction and merges lists of sizes $n^{\frac{1}{3}}$ and $n$. It runs in $O(n)$ time and communication with $O(1)$ rounds. We optimize our constructions for concrete efficiency. Today, concretely efficient secure merge protocols rely on standard techniques such as GMW or generic sorting. These approaches require an $O(n \log n)$ size circuit of $O(\log n)$ depth. In contrast, our constructions are more efficient and also achieve superior asymptotics. We benchmark our constructions and obtain significant improvements. For example, Logstar reduces bandwidth costs $\approx 3.3\times$ and Median reduces rounds $\approx2.22\times$.
To enable parallel processing, the Directed Acyclic Graph (DAG) structure is introduced to the design of asynchronous Byzantine Fault Tolerant (BFT) consensus protocols, known as DAG-based BFT. Existing DAG-based BFT protocols operate in successive waves, with each wave containing three or four Reliable Broadcast (RBC) rounds to broadcast data, resulting in high latency due to the three communication steps required in each RBC. For instance, Tusk, a state-of-the-art DAG-based BFT protocol, has a good-case latency of 7 communication steps and an expected worst latency of 21 communication steps. To reduce latency, we propose GradedDAG, a new DAG-based BFT consensus protocol based on our adapted RBC called Graded RBC (GRBC) and the Consistent Broadcast (CBC), with each wave consisting of only one GRBC round and one CBC round. Through GRBC, a replica can deliver data with a grade of 1 or 2, and a non-faulty replica delivering the data with grade 2 can ensure that more than 2/3 of replicas have delivered the same data. Meanwhile, through CBC, data delivered by different non-faulty replicas must be identical. In each wave, a block in the GRBC round will be elected as the leader. If a leader block has been delivered with grade 2, it and all its ancestor blocks can be committed. GradedDAG offers a good-case latency of 4 communication steps and an expected worst latency of 7.5 communication steps, significantly lower than the state-of-theart. Experimental results demonstrate GradedDAG’s feasibility and efficiency.
A tweakable wide blockcipher is a construction which behaves in the same way as a tweakable blockcipher, with the difference that the actual block size is flexible. Due to this feature, a tweakable wide blockcipher can be directly used as a strong encryption scheme that provides full diffusion when encrypting plaintexts to ciphertexts and vice versa. Furthermore, it can be the basis of authenticated encryption schemes fulfilling the strongest security notions. In this paper, we present two instantiations of the docked double decker tweakable wide blockcipher: $\mathit{ddd}\text{-}\mathit{AES}$ and $\mathit{bbb}\text{-}\mathit{ddd}\text{-}\mathit{AES}$. Both instances exclusively use similar building blocks as AES-GCM (AES and finite field multiplication), are designed for maximal parallelism, and hence, can make efficient use of existing hardware accelerators. Moreover, $\mathit{bbb}\text{-}\mathit{ddd}\text{-}\mathit{AES}$ builds upon a novel conditionally beyond birthday bound secure pseudorandom function, a tweakable variant of the XOR of permutations, facilitating in the need to include a tweak in the AES evaluations without sacrificing flexibility in docked double decker. We furthermore introduce an authenticated encryption mode $\mathit{aaa}$ specifically tailored to be instantiated with $\mathit{ddd}\text{-}\mathit{AES}$ and $\mathit{bbb}\text{-}\mathit{ddd}\text{-}\mathit{AES}$, where special attentions is given to how the nonce and associated data can be processed. We prove that this mode is secure in the nonce-respecting setting, in the nonce-misuse setting, as well as in the setting where random nonces are used.
BIKE is a post-quantum key encapsulation mechanism (KEM) selected for the 4th round of the NIST's standardization campaign. It relies on the hardness of the syndrome decoding problem for quasi-cyclic codes and on the indistinguishability of the public key from a random element, and provides the most competitive performance among round 4 candidates, which makes it relevant for future real-world use cases. Analyzing its side-channel resistance has been highly encouraged by the community and several works have already outlined various side-channel weaknesses and proposed ad-hoc countermeasures. However, in contrast to the well-documented research line on masking lattice-based algorithms, the possibility of generically protecting code-based algorithms by masking has only been marginally studied in a 2016 paper by Chen et al. in SAC 2015. At this stage of the standardization campaign, it is important to assess the possibility of fully masking BIKE scheme and the resulting cost in terms of performances. In this work, we provide the first high-order masked implementation of a code-based algorithm. We had to tackle many issues such as finding proper ways to handle large sparse polynomials, masking the key-generation algorithm or keeping the benefit of the bitslicing. In this paper, we present all the gadgets necessary to provide a fully masked implementation of BIKE, we discuss our different implementation choices and we propose a full proof of masking in the Ishai Sahai and Wagner (Crypto 2003) model. More practically, we also provide an open C-code masked implementation of the key-generation, encapsulation and decapsulation algorithms with extensive benchmarks. While the obtained performance is slower than existing masked lattice-based algorithms, we show that masking at order 1, 2, 3, 4 and 5 implies a performance penalty of x5.8, x14.2, x24.4, x38 and x55.6 compared to order 0 (unmasked and unoptimized BIKE). This scaling is encouraging and no Boolean to Arithmetic conversion has been used.
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may obtain additional information about the secret when executing cryptographic protocols. This paper aims to answer the question of whether UCs leak information unintentionally, which can be leveraged by the adversary to disclose the configuration bits. In this regard, we propose the first photon emission analysis against UCs relying on computer vision-based approaches. We demonstrate that the adversary can utilize a cost-effective solution to take images to be processed by off-the-shelf algorithms to extract configuration bits. We examine the efficacy of our method in two scenarios: (1) the design is small enough to be captured in a single image during the attack phase, and (2) multiple images should be captured to launch the attack by deploying a divide-and-conquer strategy. To evaluate the effectiveness of our attack, we use metrics commonly applied in side-channel analysis, namely rank and success rate. By doing so, we show that our profiled photon emission analysis achieves a success rate of 1 by employing a few templates (concretely, only 18 images were used as templates).
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy preserving applications such as cryptocurrencies and verifiable computation. Although state-of-the-art zkSNARKs are highly efficient for the verifier, the computational overhead for the prover is still orders of magnitude too high to warrant use in many applications. This overhead arises from several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and especially the multi-scalar multiplication (MSM) which constitutes the largest proportion. Therefore, further efficiency improvements are needed. In this paper, we focus on comprehensive optimization of running time and storage space required by the MSM algorithm on GPUs. Specifically, we propose a novel, modular and adaptive parameter configuration technique—elastic MSM to enable us to adjust the scale of MSM according to our own wishes by performing a corresponding amount of preprocessing. This technique enables us to fully unleash the potential of various efficient parallel MSM algorithms. We have implemented and tested elastic MSM over three prevailing parallel Pippenger algorithms on GPUs. Given a range of practical parameters, across various preprocessing space limitations (across various MSM scales), our constructions achieve up to about 90%, 8% and 36% (2.58×, 39% and 91%) speedup versus three state-of-the-art parallel Pippenger algorithms on GPUs, respectively. From another perspective, elastic MSM could also be regarded as a preprocessing technique over the well-known Pippenger algorithm, which is modular and could be used to accelerate almost all the most advanced parallel Pippenger algorithms on GPUs. Meanwhile, elastic MSM provides an adaptive trade-off between the running time and the extra storage space needed by parallel Pippenger algorithms on GPUs. This is the first preprocessing technique to retain the improved MSM computation brought by preprocessing under varying storage space limitations. Specifically, given a range of practical parameters, across various preprocessing space limitations (across various MSM scales), our constructions achieve up to about 192× and 223× (159× and 174×) speedup versus two state-of-the-art preprocessing parallel Pippenger algorithms on GPUs, respectively.
Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level of processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions, coprocessors and special-purpose cores to accelerate such operations. In particular, Apple ships an undocumented matrix-multiplication coprocessor, AMX, in hundreds of millions of mobile phones, tablets and personal computers. Our work repurposes AMX to implement polynomial multiplication and applies it to the NTRU cryptosystem, setting new speed records on the Apple M1 and M3 systems-on-chip (SoCs): polynomial multiplication, key generation, encapsulation and decapsulation are sped up by $1.54$-$3.07\times$, $1.08$-$1.33\times$, $1.11$-$1.50\times$ and $1.20$-$1.98\times$, respectively, over the previous state-of-the-art.
Key Encapsulation Mechanisms (KEMs) are a critical building block for hybrid encryption and modern security protocols, notably in the post-quantum setting. Given the asymmetric public key of a recipient, the primitive establishes a shared secret key between sender and recipient. In recent years, a large number of abstract designs and concrete implementations of KEMs have been proposed, e.g., in the context of the NIST process for post-quantum primitives. In this work, we (i) establish stronger security notions for KEMs, and (ii) develop a symbolic analysis method to analyze security protocols that use KEMs. First, we generalize existing security notions for KEMs in the computational setting, introduce several stronger security notions, and prove their relations. Our new properties formalize in which sense outputs of the KEM uniquely determine, i.e., bind, other values. Our new binding properties can be used, e.g., to prove the absence of attacks that were not captured by prior security notions, such as re-encapsulation attacks. Second, we develop a family of fine-grained symbolic models that correspond to our hierarchy of computational security notions, and are suitable for the automated analysis of KEM-based security protocols. We encode our models as a library in the framework of the Tamarin prover. Given a KEM-based protocol, our approach can automatically derive the minimal binding properties required from the KEM; or, if also given a concrete KEM, can analyze if the protocols meets its security goals. In case studies, Tamarin automatically discovers, e.g., that the key exchange protocol proposed in the original Kyber paper requires stronger properties from the KEM than were proven in the paper.
The advent of transformers has brought about significant advancements in traditional machine learning tasks. However, their pervasive deployment has raised concerns about the potential leakage of sensitive information during inference. Existing approaches using secure multiparty computation (MPC) face limitations when applied to transformers due to the extensive model size and resource-intensive matrix-matrix multiplications. In this paper, we present BOLT, a privacy-preserving inference framework for transformer models that supports efficient matrix multiplications and nonlinear computations. Combined with our novel machine learning optimizations, BOLT reduces the communication cost by 10.91x. Our evaluation on diverse datasets demonstrates that BOLT maintains comparable accuracy to floating-point models and achieves 4.8-9.5x faster inference across various network settings compared to the state-of-the-art system.
QARMAv2 is a general-purpose and hardware-oriented family of lightweight tweakable block ciphers (TBCs) introduced in ToSC 2023. QARMAv2, as a redesign of QARMAv1 with a longer tweak and tighter security margins, is also designed to be suitable for cryptographic memory protection and control flow integrity. The designers of QARMAv2 provided a relatively comprehensive security analysis in the design specification, e.g., some bounds for the number of attacked rounds in differential and boomerang analysis, together with some concrete impossible differential, zero-correlation, and integral distinguishers. As one of the first third-party cryptanalysis of QARMAv2, Hadipour et al. significantly improved the integral distinguishers of QARMAv2, and provided the longest concrete distinguishers of QARMAv2 up to now. However, they provided no key recovery attack based on their distinguishers. This paper delves into the cryptanalysis of QARMAv2 to enhance our understanding of its security. Given that the integral distinguishers of QARMAv2 are the longest concrete distinguishers for this cipher so far, we focus on integral attack. To this end, we first further improve the automatic tool introduced by Hadipour et al. for finding integral distinguishers of TBCs following the TWEAKEY framework. This new tool exploits the MixColumns property of QARMAv2 to find integral distinguishers more suitable for key recovery attacks. Then, we combine several techniques for integral key recovery attacks, e.g., Meet-in-the-middle and partial-sum techniques to build a fine-grained integral key recovery attack on QARMAv2. Notably, we demonstrate how to leverage the low data complexity of the integral distinguishers of QARMAv2 to reduce the memory complexity of the meet-in-the-middle technique. As a result, we successfully present the first concrete key recovery attacks on reduced-round versions of QARMAv2. This includes attacking 13 rounds of QARMAv2-64-128 with a single tweak block ($\mathscr{T} = 1$), 14 rounds of QARMAv2-64-128 with two independent tweak blocks ($\mathscr{T} = 2$), and 16 rounds of QARMAv2-128-256 with two independent tweak blocks ($\mathscr{T} = 2$), all in an unbalanced setting. Our attacks do not compromise the claimed security of QARMAv2, but they shed more light on the cryptanalysis of this cipher.
DCT is a beyond-birthday-bound~(BBB) deterministic authenticated encryption~(DAE) mode proposed by Forler et al. in ACISP 2016, ensuring integrity by redundancy. The instantiation of DCT employs the BRW polynomial, which is more efficient than the usual polynomial in GCM by reducing half of the multiplication operations. However, we show that DCT suffers from a small stretch problem similar to GCM. When the stretch length $\tau$ is small, choosing a special $m$-block message, we can reduce the number of queries required by a successful forgery to $\mathcal{O}(2^{\tau}/m)$. We emphasize that this attack efficiently balances space and time complexity but does not contradict the security bounds of DCT. Finally, we propose an improved scheme named Robust DCT~(RDCT) with a minor change to DCT, which improves the security when $\tau$ is small and makes it resist the above attack.
A multisignature scheme is used to aggregate signatures by multiple parties on a common message $m$ into a single short signature on $m$. Multisignatures are used widely in practice, most notably, in proof-of-stake consensus protocols. In existing multisignature schemes, the verifier needs the public keys of all the signers in order to verify a multisignature issued by some subset of signers. We construct new practical multisignature schemes with three properties: (i) the verifier only needs to store a constant size public key in order to verify a multisignature by an arbitrary subset of parties, (ii) signature size is constant beyond the description of the signing set, and (iii) signers generate their secret signing keys locally, that is, without a distributed key generation protocol. Existing schemes satisfy properties (ii) and (iii). The new capability is property (i) which dramatically reduces the verifier's memory requirements from linear in the number of signers to constant. We give two pairing-based constructions: one in the random oracle model and one in the plain model. We also show that by relaxing property (iii), that is, allowing for a simple distributed key generation protocol, we can further improve efficiency while continuing to satisfy properties (i) and (ii). We give a pairing-based scheme and a lattice-based scheme in this relaxed model. Our pairing based constructions are closely related to a multisignature scheme due to Boneh, Drijvers, and Neven (Asiacrypt 2018), but with several key differences.
SPHINCS+ is a signature scheme included in the first NIST post-quantum standard, that bases its security on the underlying hash primitive. As most of the runtime of SPHINCS+ is caused by the evaluation of several hash- and pseudo-random functions, instantiated via the hash primitive, offloading this computation to dedicated hardware accelerators is a natural step. In this work, we evaluate different architectures for hardware acceleration of such a hash primitive with respect to its use-case and evaluate them in the context of SPHINCS+. We attach hardware accelerators for different hash primitives (SHAKE256 and Asconxof for both full and round-reduced versions) to CPU interfaces having different transfer speeds. We show, that for most use-cases, data transfer determines the overall performance if accelerators are equipped with FIFOs.
It is well-known that Atomic Broadcast (AB) in asynchronous networks requires randomisation and that at most $t < n/3$ out of $n$ players are Byzantine corrupted. This is opposed to synchronous AB which can tolerate $t < n/2$ corruptions and can be deterministic. We show that these requirements can be conceptually separated by constructing an asynchronous AB protocol which tolerates $t < n/2$ corruptions from blackbox use of Common Coin and Reliable Broadcast (RB). We show the power of this conceptually simple contribution by instantiating RB under various assumptions to get AB under the same assumptions. Using this framework we obtain the first asynchronous AB with sub-quadratic communication and optimal corruption threshold $t < n/3$, and the first network agnostic AB which is optimistically responsive. The latter result is secure in a relaxed synchronous model where parties locally decide timeouts and do not have synchronized clocks. Finally, we provide asynchronous ABs with covert security and mixed adversary structures.
Integral, impossible-differential (ID), and zero-correlation (ZC) attacks are three of the most important attacks on block ciphers. However, manually finding these attacks can be a daunting task, which is why automated methods are becoming increasingly important. Most automatic tools regarding integral, ZC, and ID attacks have focused only on finding distinguishers rather than complete attacks. At EUROCRYPT 2023, Hadipour et al. proposed a generic and efficient constraint programming (CP) model based on satisfiability for finding ID, ZC, and integral distinguishers. This new model can be extended to a unified CP model for finding full key recovery attacks. However, it has limitations, including determining the contradiction location beforehand and a cell-wise model unsuitable for weakly aligned ciphers like Ascon and PRESENT. They also deferred developing a CP model for the partial-sum technique in key recovery as future work. In this paper, we enhance Hadipour et al.'s method in several ways. First, we remove the limitation of determining the contradiction location in advance. Second, we show how to extend the distinguisher model to a bit-wise model, considering the internal structure of S-boxes and keeping the model based on satisfiability. Third, we introduce a CP model for the partial-sum technique for the first time. To show the usefulness and versatility of our approach, we apply it to various designs, from strongly aligned ones like ForkSKINNY and QARMAv2 to weakly aligned ones such as Ascon and PRESENT, yielding significantly improved results. To mention a few of our results, we improve the integral distinguisher of QARMAv2-128 (resp. QARMAv2-64) by 7 (resp. 5) rounds, and the integral distinguisher of ForkSKINNY by 1 round, only thanks to our cell-wise distinguishe modelings. By using our new bit-wise modeling, our tool can find a group of $2^{155}$ 5-round ID and ZC distinguishers for Ascon in only one run, taking a few minutes on a regular laptop. The new CP model for the partial-sum technique enhances integral attacks on all SKINNY variants, notably improving the best attack on SKINNY-$n$-$n$ in the single-key setting by 1 round. We also enhance ID attacks on ForkSKINNY and provide the first analysis of this cipher in a limited reduced-round setting. Our methods are generic and applicable to other block ciphers.
This paper presents a procedure to construct parameterized families of prime-order endomorphism-equipped elliptic curves that are defined over the scalar field of pairing-friendly elliptic curve families such as Barreto–Lynn–Scott (BLS), Barreto–Naehrig (BN) and Kachisa–Schaefer–Scott (KSS), providing general formulas derived from the curves’ seeds. These so-called “embedded curves” are of major interest in SNARK applications that prove statements involving elliptic curve arithmetic i.e. digital signatures. In this paper, the mathematical groundwork is laid, and advantages of these embeddings are discussed. Additionally, practical examples are included at the end.
$\Sigma$-protocols are a widely utilized, relatively simple and well understood type of zero-knowledge proofs. However, the well known Schnorr $\Sigma$-protocol for proving knowledge of discrete logarithm in a cyclic group of known prime order, and similar protocols working over this type of groups, are hard to generalize to dealing with other groups. In particular with hidden order groups, due to the inability of the knowledge extractor to invert elements modulo the order. In this paper, we introduce a universal construction of $\Sigma$-protocols designed to prove knowledge of preimages of group homomorphisms for any abelian finite group. In order to do this, we first establish a general construction of a $\Sigma$-protocol for $\mathfrak{R}$-module homomorphism given only a linear secret sharing scheme over the ring $\mathfrak{R}$, where zero knowledge and special soundness can be related to the privacy and reconstruction properties of the secret sharing scheme. Then, we introduce a new construction of 2-out-of-$n$ packed black-box secret sharing scheme capable of sharing $k$ elements of an arbitrary (abelian, finite) group where each share consists of $k+\log n-3$ group elements. From these two elements we obtain a generic ``batch'' $\Sigma$-protocol for proving knowledge of $k$ preimages of elements via the same group homomorphism, which communicates $k+\lambda-3$ elements of the group to achieve $2^{-\lambda}$ knowledge error. For the case of class groups, we show that our $\Sigma$-protocol improves in several aspects on existing proofs for knowledge of discrete logarithm and other related statements that have been used in a number of works. Finally, we extend our constructions from group homomorphisms to the case of ZK-ready functions, introduced by Cramer and Damg\aa rd in Crypto 09, which in particular include the case of proofs of knowledge of plaintext (and randomness) for some linearly homomorphic encryption schemes such as Joye-Libert encryption. However, in the case of Joye-Libert, we show an even better alternative, using Shamir secret sharing over Galois rings, which achieves $2^{-k}$ knowledge soundness by communicating $k$ ciphertexts to prove $k$ statements.
We present staircase attack, the first attack on the incentive mechanism of the Proof-of-Stake (PoS) protocol used in Ethereum 2.0 beacon chain. Our attack targets the penalty of the incentive mechanism that penalizes inactive participation. Our attack can make honest validators suffer from penalties, even if they strictly follow the specification of the protocol. We show both theoretically and experimentally that if the adversary controls 29.6% stake in a moderate-size system, the attack can be launched continuously, so eventually all honest validators will lose their incentives. In contrast, the adversarial validators can still receive incentives, and the stake owned by the adversary can eventually exceed the $1/3$ threshold (system assumption), posing a threat to the security properties of the system. In practice, the attack feasibility is directly related to two parameters: the number of validators and the parameter MAX_ATTESTATION, the maximum number of attestations (i.e., votes) that can be included in each block. We further modify our attack such that, with current system setup (850,000 validators and MAX_ATTESTATION=128), our attack can be launched continuously with a probability of 80.25%. As a result, the incentives any honest validator receives are only 28.9% of its fair share.
Fully Homomorphic Encryption (FHE) is a powerful tool to achieve non-interactive privacy preserving protocols with optimal computation/communication complexity. However, the main disadvantage is that the actual communication cost (bandwidth) is high due to the large size of FHE ciphertexts. As a solution, a technique called transciphering (also known as Hybrid Homomorphic Encryption) was introduced to achieve almost optimal bandwidth for such protocols. However, all of existing works require clients to fix a precision for the messages or a mathematical structure for the message space beforehand. It results in unwanted constraints on the plaintext size or underlying structure of FHE based applications. In this article, we introduce a new approach for transciphering which does not require fixed message precision decided by the client, for the first time. In more detail, a client uses any kind of FHE-friendly symmetric cipher for $\{0,1\}$ to send its input data encrypted bit-by-bit, then the server can choose a precision $p$ depending on the application and homomorphically transforms the encrypted bits into FHE ciphertexts encrypting integers in $\mathbb{Z}_p$. To illustrate our new technique, we evaluate a transciphering using FiLIP cipher and adapt the most practical homomorphic evaluation technique [CCS'22] to keep the practical latency. As a result, our proof-of-concept implementation for $p$ from $2^2$ to $2^8$ takes only from $13$ ms to $137$ ms.
We present a construction, called Kirby, for building a variable-input-length pseudorandom function (VIL-PRF) from a $b$-bit permutation. For this construction we prove a tight bound of $b/2$ bits of security on the PRF distinguishing advantage in the random permutation model and in the multi-user setting. Similar to full-state keyed sponge/duplex, it supports full-state absorbing and additionally supports full-state squeezing, while the sponge/duplex can squeeze at most $b-c$ bits per permutation call, for a security level of $c$ bits. This advantage is especially relevant on constrained platforms when using a permutation with small width $b$. For instance, for $b=256$ at equal security strength the squeezing rate of Kirby is twice that of keyed sponge/duplex. This construction could be seen as a generalization of the construction underlying the stream cipher family Salsa. Furthermore, we define a simple mode on top of Kirby that turns it into a deck function with parallel expansion. This is similar to Farfalle but it has a much smaller memory footprint. Furthermore we prove that in the Kirby construction, the leakage of intermediate states does not allow recovering the key or earlier states.
Truncated differential cryptanalyses were introduced by Knudsen in 1994. They are a well-known family of attacks that has arguably received less attention than some other variants of differential attacks. This paper gives some new insights into the theory of truncated differential attacks, specifically the provable security of SPN ciphers with MDS diffusion matrices against this type of attack. Furthermore, our study extends to various versions within the QARMA family of block ciphers, unveiling the only valid instances of single-tweak attacks on 10-round QARMAv1-64, 10-round QARMAv1-128, and 10- and 11-round QARMAv2-64. These attacks benefit from the optimal truncated differential distinguishers as well as some evolved key-recovery techniques.
Much recent work has developed efficient protocols for threshold signatures, where $n$ parties share a signing key and some threshold $t$ of those parties must interact to produce a signature. Yet efficient threshold signatures with post-quantum security have been elusive, with the state-of-the-art being a two-round scheme by Damgård et al. (PKC'21) based on lattices that supports only the full threshold case (i.e., $t=n$). We show here a two-round threshold signature scheme based on standard lattice assumptions that supports arbitrary thresholds $t\leq n$. Estimates of our scheme's performance at the $128$-bit security level show that in the 3-out-of-5 case, we obtain signatures of size $46.6$ KB and public keys of size $13.6$ KB. We achieve $\approx 5\times$ improved parameters if only a small number of signatures are ever issued with the same key. As an essential building block and independent contribution, we construct an actively secure threshold (linearly) homomorphic encryption scheme that supports arbitrary thresholds $t \leq n$.
An important consideration in the context of distributed ledger protocols is fairness in terms of transaction ordering. Recent work [Crypto 2020] revealed a connection of (receiver) order fairness to social choice theory and related impossibility results arising from the Condorcet paradox. As a result of the impossibility, various relaxations of order fairness were proposed in prior works. Given that distributed ledger protocols, especially those processing smart contracts, must serialize the input transactions, a natural objective is to minimize the distance (in terms of number of transactions) between any pair of unfairly ordered transactions in the output ledger — a concept we call bounded unfairness. In state machine replication (SMR) parlance this asks for minimizing the number of unfair state updates occurring before the processing of any request. This unfairness minimization objective gives rise to a natural class of parametric order fairness definitions that has not been studied before. As we observe, previous realizable relaxations of order fairness do not yield good unfairness bounds. Achieving optimal order fairness in the sense of bounded unfairness turns out to be connected to the graph theoretic properties of the underlying transaction dependency graph and specifically the bandwidth metric of strongly connected components in this graph. This gives rise to a specific instance of the definition that we call “directed bandwidth order-fairness” which we show that it captures the best possible that any ledger protocol can achieve in terms of bounding unfairness. We prove ordering transactions in this fashion is NP-hard and non-approximable for any constant ratio. Towards realizing the property, we put forth a new distributed ledger protocol called Taxis that achieves directed bandwidth order-fairness. We present two variations, one that matches the property perfectly but (necessarily) lacks in performance and liveness, and another that achieves liveness and better complexity while offering a slightly relaxed version of the property. Finally, we comment on applications of our work to social choice, a direction which we believe to be of independent interest.
As interest in metadata-hiding communication grows in both research and practice, a need exists for stronger abuse reporting features on metadata-hiding platforms. While message franking has been deployed on major end-to-end encrypted platforms as a lightweight and effective abuse reporting feature, there is no comparable technique for metadata-hiding platforms. Existing efforts to support abuse reporting in this setting, such as asymmetric message franking or the Hecate scheme, require order of magnitude increases in client and server computation or fundamental changes to the architecture of messaging systems. As a result, while metadata-hiding communication inches closer to practice, critical content moderation concerns remain unaddressed. This paper demonstrates that, for broad classes of metadata-hiding schemes, lightweight abuse reporting can be deployed with minimal changes to the overall architecture of the system. Our insight is that much of the structure needed to support abuse reporting already exists in these schemes. By taking a non-generic approach, we can reuse this structure to achieve abuse reporting with minimal overhead. In particular, we show how to modify schemes based on secret sharing user inputs to support a message franking-style protocol. Compared to prior work, our shared franking technique more than halves the time to prepare a franked message and gives order of magnitude reductions in server-side message processing times, as well as in the time to decrypt a message and verify a report.
Recently, a number of highly optimized threshold signing protocols for Schnorr signatures have been proposed. While these proposals contain important new techniques, some of them present and analyze these techniques in very specific contexts, making it less than obvious how these techniques can be adapted to other contexts (and perhaps combined with one another). The main goal of this paper is to abstract out and extend in various ways some of these techniques, building a toolbox of techniques that can be easily combined in different ways and in different contexts. To this end, we present security results for various "enhanced" modes of attack on the Schnorr signature scheme in the non-distributed setting. These results support a modular approach to protocol design and analysis in which one reduces the security of a distributed signing protocol to such an enhanced attack mode in the non-distributed setting. We show how these results can be used to easily design new threshold Schnorr protocols that enjoy better security and/or performance properties than existing ones.
{\em Range searching} is the problem of preprocessing a set of points $P$, such that given a query range $\gamma$ we can efficiently compute some function $f(P\cap\gamma)$. For example, in a 1 dimensional {\em range counting} query, $P$ is a set of numbers, $\gamma$ is a segment and we need to count how many numbers of $P$ are in $\gamma$. In higher dimensions, $P$ is a set of $d$ dimensional points and the query range is some volume in $R^d$. In general, we want to compute more than just counting, for example, the average of $P\cap\gamma$. Range searching has applications in databases where some SELECT queries can be translated to range queries. It had received a lot of attention in computational geometry where a data structure called {\em partition tree} was shown to solve range queries in time sub-linear in $|P|$ using space only linear in $|P|$. In this paper we consider partition trees under FHE where we answer range queries without learning the value of the points or the parameters of the range. We show how partition trees can be securely traversed with $O(t \cdot n^{1-\frac{1}{d}+\epsilon} + n^{1+\epsilon})$ operations, where $n=|P|$, $t$ is the number of operations needed to compare to $\gamma$ and $\epsilon>0$ is a parameter. As far as we know, this is the first non-trivial bound on range searching under FHE and it improves over the na\"ive solution that needs $O(t\cdot n)$ operations. Our algorithms are independent of the encryption scheme but as an example we implemented them using the CKKS FHE scheme. Our experiments show that for databases of sizes $2^{23}$ and $2^{25}$, our algorithms run $\times 2.8$ and $\times 4.7$ (respectively) faster than the na\"ive algorithm. The improvement of our algorithm comes from a method we call copy-and-recurse. With it we efficiently traverse a $r$-ary tree (where each inner node has $r$ children) that also has the property that at most $\xi$ of them need to be recursed into when traversing the tree. We believe this method is interesting in its own and can be used to improve traversals in other tree-like structures.
The decentralized finance (DeFi) ecosystem has proven to be popular in facilitating financial operations, such as token exchange and lending. The public availability of DeFi platforms’ code, together with real-time data on all user interactions with them, has given rise to complex tools that find and seize profit opportunities on behalf of users. In this work, we show that both users and the aforementioned tools sometimes act suboptimally. In specific instances which we examine, their profits can be increased by more than 100%, with the highest amount of missed revenue by a suboptimal action reaching 428.14ETH ($517K). To reach these findings, we examine core DeFi primitives which are responsible for a daily volume of over 100 million USD in Ethereum alone: (1) lending and borrowing funds, (2) using flashswaps to close arbitrage opportunities between decentralized exchanges (DEXs), (3) liquidation of insolvent loans using flashswaps. The profit which can be made from each primitive is then cast as an optimization problem that can be solved. We show that missed opportunities to make a profit are noticed by others, and are sometimes followed by back-running transactions which extract profits using similar actions. By analyzing these events, we find that some transactions are circumstantially tied to specific miners, and hypothesize they use their knowledge of private orderflow for a profit. Essentially, this is an instance of miner-extractable value (MEV) “in action”.
This paper studies the hardness of decision Module Learning with Errors (\MLWE) under linear leakage, which has been used as a foundation to derive more efficient lattice-based zero-knowledge proofs in a recent paradigm of Lyubashevsky, Nguyen, and Seiler (PKC 21). Unlike in the plain \LWE~setting, it was unknown whether this problem remains provably hard in the module/ring setting. This work shows a reduction from the search \MLWE~to decision \MLWE~with linear leakage. Thus, the main problem remains hard asymptotically as long as the non-leakage version of \MLWE~is hard. Additionally, we also refine the paradigm of Lyubashevsky, Nguyen, and Seiler (PKC 21) by showing a more fine-grained tradeoff between efficiency and leakage. This can lead to further optimizations of lattice proofs under the paradigm.
We conduct a systematic examination of vector arithmetic for polynomial multiplications in software. Vector instruction sets and extensions typically specify a fixed number of registers, each holding a power-of-two number of bits, and support a wide variety of vector arithmetic on registers. Programmers then try to align mathematical computations with the vector arithmetic supported by the designated instruction set or extension. We delve into the intricacies of this process for polynomial multiplications. In particular, we introduce “vectorization-friendliness” and “permutation-friendliness”, and review “Toeplitz matrix-vector product” to systematically identify suitable mappings from homomorphisms to vectorized implementations. To illustrate how the formalization works, we detail the vectorization of polynomial multiplication in $\mathbb{Z}_{4591}[x]/\left\langle x^{761} − x − 1 \right\rangle$ used in the parameter set sntrup761 of the NTRU Prime key encapsulation mechanism. For practical evaluation, we implement vectorized polynomial multipliers for the ring $\mathbb{Z}_{4591}[x]/\left\langle \Phi_{17} (x^{96}) \right\rangle$ with AVX2 and Neon. We benchmark our AVX2 implementation on Haswell and Skylake and our Neon implementation on Cortex-A72 and the “Firestorm” core of Apple M1 Pro. Our AVX2-optimized implementation is 1.99−2.16 times faster than the state-of-the-art AVX2-optimized implementation by [Bernstein, Brum- ley, Chen, and Tuveri, USENIX Security 2022] on Haswell and Skylake, and our Neon-optimized implementation is 1.29−1.36 times faster than the state-of-the-art Neon-optimized implementation by [Hwang, Liu, and Yang, ACNS 2024] on Cortex-A72 and Apple M1 Pro. For the overall scheme with AVX2, we reduce the batch key generation cycles (amortized with batch size 32) by 7.9%−12.0%, encapsulation cycles by 7.1%−10.3%, and decapsulation cycles by 10.7%−13.3% on Haswell and Skylake. For the overall performance with Neon, we reduce the encapsulation cycles by 3.0%−6.6% and decapsulation cycles by 12.8%−15.1% on Cortex-A72 and Apple M1 Pro.
Private messaging systems that use a bulletin board, like privacy-preserving blockchains, have been a popular topic during the last couple of years. In these systems, typically a private message is posted on the board for a recipient and the privacy requirement is that no one can determine the sender and the recipient of the message. Until recently, the efficiency of these recipients was not considered, and the party had to perform a naive scan of the board to retrieve their messages. More recently, works like Fuzzy Message Detection (FMD), Private Signaling (PS), and Oblivious Message Retrieval (OMR) have studied the problem of protecting recipient privacy by outsourcing the message retrieval process to an untrusted server. However, FMD only provides limited privacy guarantees, and PS and OMR greatly lack scalability. In this work, we present a new construction for private signaling which is both asymptotically superior and concretely orders of magnitude faster than all prior works while providing full privacy. Our constructions make use of a trusted execution environment (TEE) and an Oblivious RAM to improve the computation complexity of the server. We also improve the privacy guarantees by keeping the recipient hidden even during the retrieval of signals from the server. Our proof-of-concept open-source implementation shows that for a server serving a million recipients and ten million messages, it only takes $< 60$ milliseconds to process a sent message, and $< 6$ seconds to process a retrieval (of 100 signals) request from a recipient.
We introduce a novel template attack for secret key recovery in Kyber, leveraging side-channel information from polynomial multiplication during decapsulation. Conceptually, our attack exploits that Kyber's incomplete number-theoretic transform (NTT) causes each secret coefficient to be used multiple times, unlike when performing a complete NTT. Our attack is a single trace \emph{known} ciphertext attack that avoids machine-learning techniques and instead relies on correlation-matching only. Additionally, our template generation method is very simple and easy to replicate, and we describe different attack strategies, varying on the number of templates required. Moreover, our attack applies to both masked implementations as well as designs with multiplication shuffling. We demonstrate its effectiveness by targeting a masked implementation from the \emph{mkm4} repository. We initially perform simulations in the noisy Hamming-Weight model and achieve high success rates with just $13\,316$ templates while tolerating noise values up to $\sigma=0.3$. In a practical setup, we measure power consumption and notice that our attack falls short of expectations. However, we introduce an extension inspired by known online template attacks, enabling us to recover $128$ coefficient pairs from a single polynomial multiplication. Our results provide evidence that the incomplete NTT, which is used in Kyber-768 and similar schemes, introduces an additional side-channel weakness worth further exploration.
OpenPubkey makes a client-side modification to OpenID Connect so that an ID Token issued by an OpenID Provider commits to a user held public key. This transforms an ID Token into a certificate that cryptographically binds an OpenID Connect identity to a public key. We call such an ID Token, a PK Token. The user can then sign messages with their signing key and these signatures can be authenticated and attributed to the user’s OpenID Connect identity. This allows OpenPubkey to upgrade OpenID Connect from Bearer Authentication to Proof-of-Possession, eliminating trust assumptions in OpenID Connect and defeating entire categories of attacks present in OpenID Connect. OpenPubkey was designed to satisfy a decade-long need for this functionality. Prior to OpenPubkey, OpenID Connect did not have a secure way for users to sign statements under their OpenID identities. OpenPubkey is transparent to users and OpenID Providers. An OpenID Provider can not even determine that OpenPubkey is being used. This makes OpenPubkey fully compatible with existing OpenID Providers. In fact a variant of OpenPubkey is currently deployed and used to authenticate signed messages and identities for users with accounts on Google, Microsoft, Okta, and Onelogin. OpenPubkey does not add new trusted parties to OpenID Connect and reduces preexisting trust assumptions. If used in tandem with our MFA-cosigner, OpenPubkey can maintain security even against a malicious OpenID Provider (the most trusted party in OpenID Connect).
Oblivious RAM (ORAM) allows a client to outsource storage to a remote server while hiding the data access pattern from the server. Many ORAM designs have been proposed to reduce the computational overhead and bandwidth blowup for the client. A recent work, Onion Ring ORAM (CCS'19), is able to achieve $O(1)$ bandwidth blowup in the online phase using fully homomorphic encryption (FHE) techniques, at the cost of a computationally expensive client-side offline phase. Furthermore, such a scheme can be categorized as a stateful construction, meaning that the client has to locally maintain a dynamic state representing the order of remote database elements. We present Panacea: a novel design of ORAM based on FHE techniques, which is non-interactive and stateless, achieves $O(1)$ bandwidth blowup, and does not require an expensive offline phase for the client to perform; in that sense, our design is the first of its kind among other ORAM designs. To provide the client with such performance benefits, our design delegates all expensive computation to the resourceful server. We additionally show how to boost the server performance significantly using probabilistic batch codes at the cost of only 1.5x in additional bandwidth blowup and 3x expansion in server storage, but less amortized bandwidth. Our experimental results show that our design, with the batching technique, is practical in terms of server computation overhead as well. Specifically, for a database size of $2^{19}$, it takes only $1.16$ seconds of amortized computation time for a server to respond to a query. As a result of the statelessness and low computational overhead on the client, and reasonable computational overhead on the server, our design is very suitable to be deployed as a cloud-based privacy-preserving storage outsourcing solution with a portable client running on a lightweight device.
Partially Oblivious Pseudorandom Functions (POPRFs) are 2-party protocols that allow a client to learn pseudorandom function (PRF) evaluations on inputs of its choice from a server. The client submits two inputs, one public and one private. The security properties ensure that the server cannot learn the private input, and the client cannot learn more than one evaluation per POPRF query. POPRFs have many applications including password-based key exchange and privacy-preserving authentication mechanisms. However, most constructions are based on classical assumptions, and those with post quantum security suffer from large eﬀiciency drawbacks. In this work, we construct a novel POPRF from lattice assumptions and the “Crypto Dark Matter” PRF candidate (TCC’18) in the random oracle model. At a conceptual level, our scheme exploits the alignment of this family of PRF candidates, relying on mixed modulus computations, and programmable bootstrapping in the torus fully homomorphic encryption scheme (TFHE). We show that our construction achieves malicious client security based on circuit-private FHE, and client privacy from the semantic security of the FHE scheme. We further explore a heuristic approach to extend our scheme to support verifiability, based on the difficulty of computing cheating circuits in low depth. This would yield a verifiable (P)OPRF. We provide a proof-of-concept implementation and preliminary benchmarks of our construction. For the core online OPRF functionality, we require amortised 10.0KB communication per evaluation and a one-time per-client setup communication of 2.5MB.
Profiling side-channel analysis, recognized for its robust attack performance in worst-case scenarios, necessitates adversaries to have a cloned device for profiling measurements and secret information for data labeling. On the other hand, non-profiling attacks eschew these requirements by trying all key guesses. Although more suitable for real-world attack scenarios, they may suffer from mediocre attack performance due to the lack of leakage insight. This paper introduces a novel weakly profiling side-channel analysis method that bridges classical profiling and non-profiling analyses. Our method operates within a profiling framework yet discards the necessity for a cloned device, which relies on the fact that there is (commonly) a bijective relationship between known information, such as plaintext and ciphertext, and secret information. This relationship allows an adversary to label leakage measurements using known information and then profile leakages directly on the attacked device. The empirical results show that the proposed approach reaches at least $3\times$ better attack performance with negligible computational effort than existing non-profiling methods. Moreover, it can rival the performance of state-of-the-art profiling attacks.
While the Weil pairing is geometric, the Tate pairing is arithmetic: its value depends on the base field considered. Nevertheless, the étale topology allows to interpret the Galois action in a geometric manner. In this paper, we discuss this point of view for the Tate pairing: its natural geometric interpretation is that it gives étale $\mu_n$-torsors. While well known to experts, this interpretation is perhaps less known in the cryptographic community. As an application, we explain how to use the Tate pairing to study the fibers of an isogeny, and we prove a conjecture by Castryck and Decru on multiradical isogenies.
Discrete logarithm problem(DLP) is the pillar of many cryptographical schemes. We propose an improvement to the Gaudry-Schost algorithm, for multi-dimensional DLP. We have derived the cost estimates in general and specialized cases, which prove efficiency of our new method. We report the implementation of our algorithm, which confirms the theory. Both theory and experiments val- idate the fact that the advantage of our algorithm increases for large sizes, which helps in practical scenarios. Our method is applicable to speed-up electronic voting, cash schemes, along with other ar- eas associated with multi-dimensional discrete logarithms (point-counting, speeding-up elliptic-curve arithmetic, group-actions, CSIDH etc.).
Homomorphic encryption (HE) allows for computations over ciphertexts while they are encrypted. Because of this, HE supports the outsourcing of computation on private data. Due to the additional risks caused by data outsourcing, the ability to recover from losses is essential, but doing so on data encrypted under an HE scheme introduces additional challenges for recovery and usability. This work introduces X-Cipher, which aims to make HE ciphertexts resilient by ensuring they are private and recoverable simultaneously at all stages during data outsourcing. X-Cipher allows data recovery without requiring the decryption of HE ciphertexts and maintains its ability to recover and keep data private when a cluster server has been compromised. X-Cipher allows for reduced ciphertext storage overhead by introducing novel encoding and leveraging previously introduced ciphertext packing. X-Cipher's capabilities were evaluated on a synthetic dataset to demonstrate that X-Cipher enables secure availability capabilities while enabling privacy-preserving outsourced computations.
Private heavy-hitters is a data-collection task where multiple clients possess private bit strings, and data-collection servers aim to identify the most popular strings without learning anything about the clients' inputs. In this work, we introduce PLASMA: a private analytics framework in the three-server setting that protects the privacy of honest clients and the correctness of the protocol against a coalition of malicious clients and a malicious server. Our core primitives are a verifiable incremental distributed point function (VIDPF) and a batched consistency check, which are of independent interest. Our VIDPF introduces new methods to validate client inputs based on hashing. Meanwhile, our batched consistency check uses Merkle trees to validate multiple client sessions together in a batch. This drastically reduces server communication across multiple client sessions, resulting in significantly less communication compared to related works. Finally, we compare PLASMA with the recent works of Asharov et al. (CCS'22) and Poplar (S&P'21) and compare in terms of monetary cost for different input sizes.
We revisit the question of what the definition of bit security should be, previously answered by Micciancio-Walter (Eurocrypt 2018) and Watanabe-Yasunaga (Asiacrypt 2021). Our new definition is simple, but (i) captures both search and decision primitives in a single framework like Micciancio-Walter, and (ii) has a firm operational meaning like Watanabe-Yasunaga. It also matches intuitive expectations and can be well-formulated regarding Hellinger distance. To support and justify the new definition, we prove several classic security reductions with respect to our bit security. We also provide pathological examples that indicate the ill-definedness of bit security defined in Micciancio-Walter and Watanabe-Yasunaga.
This paper reviews, from bottom to top, a polynomial-time algorithm to correct t errors in classical binary Goppa codes defined by squarefree degree-t polynomials. The proof is factored through a proof of a simple Reed--Solomon decoder, and the algorithm is simpler than Patterson's algorithm. All algorithm layers are expressed as Sage scripts backed by test scripts. All theorems are formally verified. The paper also covers the use of decoding inside the Classic McEliece cryptosystem, including reliable recognition of valid inputs.
Consider key agreement by two parties who start out knowing a common secret (which we refer to as “pass-string”, a generalization of “password”), but face two complications: (1) the pass-string may come from a low-entropy distribution, and (2) the two parties’ copies of the pass-string may have some noise, and thus not match exactly. We provide the first efficient and general solutions to this problem that enable, for example, key agreement based on commonly used biometrics such as iris scans. The problem of key agreement with each of these complications individually has been well studied in literature. Key agreement from low-entropy shared pass-strings is achieved by password-authenticated key exchange (PAKE), and key agreement from noisy but high-entropy shared pass-strings is achieved by information-reconciliation protocols as long as the two secrets are “close enough.” However, the problem of key agreement from noisy low-entropy pass-strings has never been studied. We introduce (universally composable) fuzzy password-authenticated key exchange (fPAKE), which solves exactly this problem. fPAKE does not have any entropy requirements for the pass-strings, and enables secure key agreement as long as the two pass-strings are “close” for some notion of closeness. We also give two constructions. The first construction achieves our fPAKE definition for any (efficiently computable) notion of closeness, including those that could not be handled before even in the high-entropy setting. It uses Yao’s garbled circuits in a way that is only two times more costly than their use against semi-honest adversaries, but that guarantees security against malicious adversaries. The second construction is more efficient, but achieves our fPAKE definition only for pass-strings with low Hamming distance. It builds on very simple primitives: robust secret sharing and PAKE.
Bitcoin is one of the most prominent examples of a distributed cryptographic protocol that is extensively used in reality. Nonetheless, existing security proofs are property-based, and as such they do not support composition. In this work, we put forth a universally composable treatment of the Bitcoin protocol. We specify the goal that Bitcoin aims to achieve as an instance of a parameterizable ledger functionality and present a UC abstraction of the Bitcoin blockchain protocol. Our ideal functionality is weaker than the first proposed candidate by Kiayias, Zhou, and Zikas [EUROCRYPT’16], but unlike the latter suggestion, which is arguably not implementable by the UC Bitcoin protocol, we prove that the one proposed here is securely UC-realized by the protocol assuming access to a global clock, to model time-based executions, a random oracle, to model hash functions, and an idealized network, to model message dissemination. We further show how known property-based approaches can be cast as special instances of our treatment and how their underlying assumptions can be cast in UC as part of the setup functionalities and without restricting the environment or the adversary.