publications
Selected papers
- [CACM] Pengfei Li, Jianyi Yang, Mohammad A. Islam, and Shaolei Ren. Communications of the ACM (accepted), 2024.
The growing carbon footprint of artificial intelligence (AI) models, especially large ones such as GPT-3, has been undergoing public scrutiny. Unfortunately, the equally important and enormous water (withdrawal and consumption) footprint of AI models has remained under the radar. For example, training GPT-3 in Microsoft’s state-of-the-art U.S. data centers can directly evaporate 700,000 liters of clean freshwater, but such information has been kept secret. More critically, global AI demand may account for 4.2 – 6.6 billion cubic meters of water withdrawal in 2027, more than 4 – 6 times the total annual water withdrawal of Denmark, or half that of the United Kingdom. This is very concerning, as freshwater scarcity has become one of the most pressing challenges shared by all of us amid rapidly growing populations, depleting water resources, and aging water infrastructures. To respond to the global water challenges, AI models can, and also must, take social responsibility and lead by example by addressing their own water footprint. In this paper, we provide a principled methodology to estimate the water footprint of AI models, and also discuss the unique spatial-temporal diversities of AI models’ runtime water efficiency. Finally, we highlight the necessity of holistically addressing water footprint along with carbon footprint to enable truly sustainable AI.
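As a rough illustration of the kind of accounting such a methodology involves, the sketch below combines server energy with on-site cooling water efficiency (WUE), power usage effectiveness (PUE), and off-site water embedded in electricity generation. The function name and all default coefficients are illustrative placeholders, not the paper's measured values:

```python
def ai_water_footprint_liters(server_energy_kwh: float,
                              wue_onsite: float = 0.55,   # illustrative on-site water (L/kWh)
                              pue: float = 1.2,           # illustrative power usage effectiveness
                              ewif_offsite: float = 3.1   # illustrative off-site water (L/kWh)
                              ) -> float:
    """Operational water = scope-1 cooling water + scope-2 water embedded in electricity."""
    onsite = server_energy_kwh * wue_onsite               # evaporated on site by cooling
    offsite = server_energy_kwh * pue * ewif_offsite      # embedded in electricity generation
    return onsite + offsite

# e.g., ai_water_footprint_liters(1.3e6) for a hypothetical training-scale energy draw
```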
- [SIGMETRICS] Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. SIGMETRICS, 2025.
This paper studies learning-augmented decentralized online convex optimization in a networked multi-agent system, a challenging setting that has remained under-explored. We first consider a linear learning-augmented decentralized online algorithm (LADO-Lin) that combines a machine learning (ML) policy with a baseline expert policy in a linear manner. We show that, while LADO-Lin can exploit the potential of ML predictions to improve the average cost performance, it cannot guarantee worst-case performance. To address this limitation, we propose a novel online algorithm (LADO) that adaptively combines the ML policy and the expert policy, safeguarding the ML predictions so as to achieve strong competitiveness guarantees. We also prove the average cost bound for LADO, revealing the tradeoff between average performance and worst-case robustness and demonstrating the advantage of training the ML policy by explicitly considering the robustness requirement. Finally, we run an experiment on decentralized battery management for sustainable computing. Our results highlight the potential of ML augmentation to improve the average performance as well as the guaranteed worst-case performance of LADO.
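A minimal single-decision sketch of the adaptive-combination idea, ignoring the switching costs and multi-agent coupling the paper actually handles; `lado_step`, `cost_fn`, and the binary search are illustrative constructions under the assumption of a convex per-step cost, not the paper's algorithm verbatim:

```python
import numpy as np

def lado_step(x_ml, x_expert, cum_cost_algo, cum_cost_expert, lam, cost_fn):
    """One decision round: follow the ML action only while the cumulative cost
    stays within (1 + lam) times the expert's; otherwise pull the action back
    toward the expert's along the segment between the two. Assumes cost_fn is
    convex, so the feasible mixing weights form an interval."""
    x_ml, x_expert = np.asarray(x_ml, float), np.asarray(x_expert, float)
    budget = (1 + lam) * (cum_cost_expert + cost_fn(x_expert)) - cum_cost_algo
    if cost_fn(x_ml) <= budget:
        return x_ml                       # ML prediction is safe to follow as-is
    lo, hi = 0.0, 1.0                     # binary-search the largest safe weight on x_ml
    for _ in range(30):
        mid = (lo + hi) / 2
        if cost_fn(mid * x_ml + (1 - mid) * x_expert) <= budget:
            lo = mid
        else:
            hi = mid
    return lo * x_ml + (1 - lo) * x_expert
```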
- [NeurIPS] Tong Zhou, Xuandong Zhao, Xiaolin Xu, and Shaolei Ren. NeurIPS, 2024.
Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter the meanings of LLM-generated responses or even forge harmful content, potentially misattributing blame to the LLM developer. To overcome this, we introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks (mitigating spoofing attacks) as well as a coarse-grained signal to trace text sources when the signature is invalid (enhancing detectability) via a novel rank-based sampling strategy. Compared to conventional watermark detectors that only output binary results, Bileve can differentiate 5 scenarios during detection, reliably tracing text provenance and regulating LLMs. The experiments conducted on OPT-1.3B and LLaMA-7B demonstrate the effectiveness of Bileve in defeating spoofing attacks with enhanced detectability.
- [ICML] Yejia Liu, Jianyi Yang, Pengfei Li, Tongxin Li, and Shaolei Ren. ICML, 2024.
Public models have played a crucial role in various AI applications, showcasing their proficiency in accurate predictions. However, their exclusive emphasis on prediction accuracy may not align with the diverse end objectives of downstream agents when utilized. Recognizing the public model’s predictions as a service, we advocate for integrating the objectives of downstream agents into the optimization process. Concretely, to address performance disparities and foster fairness among heterogeneous agents in training, we propose a novel q-Equitable objective. This objective, coupled with a policy gradient algorithm, is crafted to train the public model to produce a more equitable/uniform performance distribution across downstream agents, each with their unique concerns. Both theoretical analysis and empirical case studies have proven the effectiveness of our method in advancing performance equity across diverse downstream agents utilizing the public model for their decision-making.
- [SIGMETRICS] Jianyi Yang, Pengfei Li, Mohammad J. Islam, and Shaolei Ren. SIGMETRICS, 2024.
This paper studies online resource allocation with replenishable budgets, where budgets can be replenished on top of the initial budget and an agent sequentially chooses online allocation decisions without violating the available budget constraint at each round. We propose a novel online algorithm, called OACP (Opportunistic Allocation with Conservative Pricing), that conservatively adjusts dual variables while opportunistically utilizing available resources. OACP achieves a bounded asymptotic competitive ratio in adversarial settings as the number of decision rounds T gets large. Importantly, the asymptotic competitive ratio of OACP is optimal in the absence of additional assumptions on budget replenishment. To further improve the competitive ratio, we make a mild assumption that there is budget replenishment every T* ≥ 1 decision rounds and propose OACP+ to dynamically adjust the total budget assignment for online allocation. Next, we move beyond the worst case and propose LA-OACP (Learning-Augmented OACP/OACP+), a novel learning-augmented algorithm for online allocation with replenishable budgets. We prove that LA-OACP can improve the average utility compared to OACP/OACP+ when the ML predictor is properly trained, while still offering worst-case utility guarantees when the ML predictions are arbitrarily wrong. Finally, we run simulation studies of sustainable AI inference powered by renewables, validating our analysis and demonstrating the empirical benefits of LA-OACP.
- [NeurIPS] Jianyi Yang, Pengfei Li, Adam Wierman, and Shaolei Ren. NeurIPS, 2024.
Online Budgeted Matching (OBM) is a classic problem with important applications in online advertising, online service matching, revenue management, and beyond. Traditional online algorithms typically assume a small bid setting, where the maximum bid-to-budget ratio (kappa) is infinitesimally small. While recent algorithms have tried to address scenarios with non-small or general bids, they often rely on the Fractional Last Matching (FLM) assumption, which allows for accepting partial bids when the remaining budget is insufficient. This assumption, however, does not hold for many applications with indivisible bids. In this paper, we remove the FLM assumption and tackle the open problem of OBM with general bids. We first establish an upper bound of 1-kappa on the competitive ratio for any deterministic online algorithm. We then propose a novel meta algorithm, called MetaAd, which reduces to different algorithms with the first known provable competitive ratios parameterized by the maximum bid-to-budget ratio kappa in [0,1]. As a by-product, we extend MetaAd to the FLM setting and obtain provably competitive algorithms. Finally, we apply our competitive analysis to the design of learning-augmented algorithms.
- [NeurIPS] Tongxin Li, Tinashe Handina, Shaolei Ren, and Adam Wierman. NeurIPS, 2024.
The combination of Bayesian games and learning has a rich history: a single agent is controlled in a system composed of multiple agents with unknown behaviors, given a set of types that each specify a possible behavior for the other agents. The idea is to plan the agent’s own actions with respect to those types which it believes are most likely to maximize the payoff. However, the type beliefs are often learned from past actions and are likely to be incorrect. With this perspective in mind, we consider an agent in a game with type predictions of the other components, and investigate the impact of incorrect beliefs on the agent’s payoff. In particular, we formally define a trade-off between risk and opportunity by comparing the payoff obtained against the optimal payoff, which is represented by a gap caused by trusting or distrusting the learned beliefs. Our main results characterize the trade-off by providing upper and lower bounds on the payoff gap for both normal-form and stochastic Bayesian games, with numerical results provided.
- [e-Energy] Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. e-Energy, 2024.
Fueled by the soaring popularity of large language and foundation models, the accelerated growth of artificial intelligence (AI) models’ enormous environmental footprint has come under increased scrutiny. While many approaches have been proposed to make AI more energy-efficient and environmentally friendly, environmental inequity – the fact that AI’s environmental footprint can be disproportionately higher in certain regions than in others – has emerged, raising social-ecological justice concerns. This paper takes a first step toward addressing AI’s environmental inequity by balancing its regional negative environmental impact. Concretely, we focus on the carbon and water footprints of AI model inference and propose equity-aware geographical load balancing (GLB) to explicitly address AI’s environmental impacts on the most disadvantaged regions. We run trace-based simulations by considering a set of 10 geographically-distributed data centers that serve inference requests for a large language AI model. The results demonstrate that existing GLB approaches may amplify environmental inequity while our proposed equity-aware GLB can significantly reduce the regional disparity in terms of carbon and water footprints.
- [e-Energy] Pranjol Sen Gupta, Md Rajib Hossen, Pengfei Li, Shaolei Ren, and Mohammad A. Islam. e-Energy, 2024 (Best Notes Paper Award).
Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To lower this barrier to research in water sustainability, we build a dataset covering both operational direct water usage in cooling systems and indirect water embedded in electricity generation. Our dataset consists of the hourly water efficiency of major U.S. cities and states from 2018 to 2023. We also offer cooling system models that capture the impact of weather on water efficiency. We present a preliminary analysis of our dataset and discuss three potential applications that can benefit from it.
- [NeurIPS] Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, and Shaolei Ren. NeurIPS, 2023.
This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under anytime constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.
- [NeurIPS] Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. NeurIPS, 2023.
We study a general form of Smoothed Online Convex Optimization, a.k.a. SOCO, including multi-step switching costs and feedback delay. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically, we prove that RCL is able to guarantee (1+lambda)-competitiveness against any given expert for any lambda>0, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly, RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay. We demonstrate the improvement of RCL in both robustness and average performance using battery management for electrifying transportation as a case study.
- [NeurIPS] Tongxin Li, Yiheng Lin, Shaolei Ren, and Adam Wierman. NeurIPS, 2023.
We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improve on what can be obtained solely with black-box advice.
- [ASPLOS] Nuntipat Narkthong, Shijin Duan, Shaolei Ren, and Xiaolin Xu. ASPLOS, 2024.
Artificial intelligence has become feasible on tiny edge devices, thanks to the availability of high-performance microcontroller units (MCUs) and the development of highly-efficient machine learning (ML) models. Despite this success, the current design practice of implementing high-performance ML models on high-end MCU devices still presents barriers that prohibit the wider adoption of ML in real-world applications built on low-end devices, where power and available hardware resources are critical factors for including ML functionality in the final product. On the one hand, these low-end MCU devices have very limited computational resources in terms of hardware resources and power budget, which makes ML inference less accessible. On the other hand, these low-end and low-cost MCUs represent the leading physical carrier of real-world smart applications, making small and fast-responsive (i.e., always-on) ML models an urgent need. This paper presents MicroVSA, a library of optimized implementations of a low-dimensional computing (LDC) classifier, a recently proposed variant of vector symbolic architecture (VSA), for 8-bit, 16-bit, and 32-bit MCUs. MicroVSA achieves a 1.33–10.27x speedup over the vanilla LDC, and our sample model achieves 85–93% accuracy on three of the most common classification tasks for always-on inference on MCUs — myocardial infarction detection, human activity recognition, and hot word detection — while requiring only a few tens of bytes of RAM and fitting in the vast majority of tiny 8-bit MCUs. For instance, our model for detecting the phrase "Hey Snapdragon" needs only 6.75KB of flash and 0.02KB of RAM and can complete one inference in 10.6 ms on a low-end 8-bit AVR MCU or 0.68 ms on an ARM Cortex-M4 MCU, outperforming the best neural network-based model by a factor of 7.2x-113x, making it the first hot word detection model to run on 8-bit MCUs and the most efficient one. Our study suggests that ubiquitous ML deployment on extremely low-cost MCUs is possible and that more study on VSA model training, model compression, and implementation techniques is needed to further explore the possibility of lowering the cost and power of ML on edge devices.
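For intuition, the sketch below shows the general VSA-style inference pattern (bipolar encoding followed by a max inner-product class lookup) that an LDC classifier reduces to at inference time. The random codebooks stand in for LDC's trained ones, so this is a schematic of the pattern rather than MicroVSA itself:

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, C = 64, 32, 3                      # hypervector width, input features, classes

# In LDC these codebooks are trained and then binarized; random stand-ins here.
feature_hvs = rng.choice([-1.0, 1.0], size=(F, D))
value_weights = rng.standard_normal(F)   # simplified per-feature value encoding
class_hvs = rng.choice([-1.0, 1.0], size=(C, D))

def ldc_infer(x: np.ndarray) -> int:
    """Encode a sample into a bipolar hypervector, then return the class whose
    hypervector has the largest inner product. On an MCU, the encoding becomes
    fixed-point adds and the inner products become XOR/popcount on packed bits."""
    encoded = np.sign((value_weights * x) @ feature_hvs)   # (D,) in {-1, 0, +1}
    return int(np.argmax(class_hvs @ encoded))

print(ldc_infer(rng.standard_normal(F)))  # predicts one of the C classes
```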
- [ICML] Pengfei Li, Jianyi Yang, and Shaolei Ren. ICML, 2023.
Many problems, such as online ad display, can be formulated as online bipartite matching. The crucial challenge lies in the nature of sequentially-revealed online item information, based on which we make irreversible matching decisions at each step. While numerous expert online algorithms have been proposed with bounded worst-case competitive ratios, they may not offer satisfactory performance in average cases. On the other hand, reinforcement learning (RL) has been applied to improve the average performance, but it lacks robustness and can perform arbitrarily poorly. In this paper, we propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR), achieving both good average-case and worst-case performance. The key novelty of LOMAR is a new online switching operation which, based on a judicious condition to hedge against future uncertainties, decides whether to follow the expert’s decision or the RL decision for each online item. We prove that for any ρ∈[0,1], LOMAR is ρ-competitive against any given expert online algorithm. To improve the average performance, we train the RL policy by explicitly considering the online switching operation. Finally, we run empirical experiments to demonstrate the advantages of LOMAR compared to existing baselines.
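A simplified version of the switching test, assuming a reward-maximization setting; `slack` stands in for the paper's hedging term against future uncertainty, so this is a sketch of the idea rather than LOMAR's exact condition:

```python
def lomar_follow_rl(rl_cum, rl_step, expert_cum, expert_step, rho, slack=0.0):
    """Return True if taking the RL decision for the current item keeps the
    algorithm's cumulative reward within the rho-competitive envelope of the
    expert's; otherwise fall back to the expert's decision for this item."""
    return rl_cum + rl_step + slack >= rho * (expert_cum + expert_step)
```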
- [ICLR] Tong Zhou, Shaolei Ren, and Xiaolin Xu. ICLR, 2024.
Deep neural network (DNN) models, despite their impressive performance, are vulnerable to exploitation by attackers who attempt to adapt them to other tasks for their own benefit. Current defense strategies mainly address this vulnerability at the model parameter level, leaving the potential of architectural-level defense largely unexplored. This paper, for the first time, addresses the issue of model protection by reducing transferability at the architecture level. Specifically, we present a novel neural architecture search (NAS)-enabled algorithm that employs zero-cost proxies and evolutionary search, to design model architectures with low transferability. Our method, namely ArchLock, aims to achieve high performance on the source task, while degrading the performance on target tasks, i.e., locking the transferability of a DNN model. To achieve efficient cross-task search without having access to the training data owned by the attackers, we utilize zero-cost proxies to speed up architecture evaluation and simulate potential target task embeddings to assist cross-task search with a binary performance predictor. Extensive experiments on NAS-Bench-201 and TransNAS-Bench-101 demonstrate that ArchLock reduces transferability by up to 30% and 50%, respectively, with negligible performance degradation on source tasks (<2%).
- [SIGMETRICS] Pengfei Li, Jianyi Yang, and Shaolei Ren. SIGMETRICS, 2022.
We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be highly unsatisfactory. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses — one minimizing the average ML prediction error for better robustness, and the other one minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can be even beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.
- [SIGMETRICS] Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, and Shaolei Ren. SIGMETRICS, 2022.
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity — the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
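Latency monotonicity can be quantified directly with a rank correlation; below is a small sketch using scipy's Spearman rank correlation coefficient (SRCC), which is one natural choice for comparing per-architecture latency rankings across two devices:

```python
from scipy.stats import spearmanr

def latency_monotonicity(latencies_dev_a, latencies_dev_b) -> float:
    """Spearman rank correlation of per-architecture latencies on two devices.
    A value near 1 means both devices rank candidate architectures almost
    identically, so a proxy device's search results transfer to the target."""
    srcc, _ = spearmanr(latencies_dev_a, latencies_dev_b)
    return float(srcc)

# e.g., latency_monotonicity([3.1, 5.2, 4.4, 7.9], [6.0, 9.8, 8.1, 14.3]) -> 1.0
```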
- [ICML] Jianyi Yang and Shaolei Ren. ICML, 2022.
By integrating domain knowledge with labeled samples, informed machine learning has been emerging to improve the learning performance for a wide range of applications. Nonetheless, rigorous understanding of the role of injected domain knowledge has been under-explored. In this paper, we consider an informed deep neural network (DNN) with over-parameterization and domain knowledge integrated into its training objective function, and study how and why domain knowledge benefits the performance. Concretely, we quantitatively demonstrate the two benefits of domain knowledge in informed learning – regularizing the label-based supervision and supplementing the labeled samples – and reveal the trade-off between label and knowledge imperfectness in the bound of the population risk. Based on the theoretical analysis, we propose a generalized informed training objective to better exploit the benefits of knowledge and balance the label and knowledge imperfectness, which is validated by the population risk bound. Our analysis of sampling complexity sheds light on how to choose the hyper-parameters for informed learning, and further justifies the advantages of knowledge-informed learning.
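A minimal sketch of what such a generalized informed training objective can look like, assuming a regression task; `knowledge_residual` is a hypothetical user-supplied callable measuring how strongly the model's outputs violate the domain knowledge, and the weighting is illustrative:

```python
import torch
import torch.nn.functional as F

def informed_loss(model, x_labeled, y_labeled, x_knowledge, knowledge_residual, lam):
    """Weighted sum of label-based supervision and a domain-knowledge penalty
    evaluated on (possibly unlabeled) inputs; lam trades off label vs.
    knowledge imperfectness."""
    label_loss = F.mse_loss(model(x_labeled), y_labeled)
    knowledge_loss = knowledge_residual(model(x_knowledge)).pow(2).mean()
    return label_loss + lam * knowledge_loss
```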
- [TCC] Mohammad A. Islam, Hasan Mahmud, Shaolei Ren, and Xiaorui Wang. IEEE Transactions on Cloud Computing, Jan 2020.
The massive energy consumption of data centers worldwide has resulted in a large carbon footprint, raising serious concerns for sustainable IT initiatives and attracting a great amount of research attention. Nonetheless, the efforts to date, although encouraging, have been primarily centered around owner-operated data centers (e.g., Google data centers), leaving another major segment of the data center industry, colocation data centers, much less explored. As a major hindrance to the carbon efficiency desired by the operator, colocation suffers from a “split incentive”: tenants may not be willing to manage their servers for carbon efficiency. In this paper, we aim at minimizing the carbon footprint of geo-distributed colocation data centers while ensuring that the operator’s cost meets a long-term budget constraint. We overcome the “split incentive” hurdle by devising a novel online carbon-aware incentive mechanism, called GreenColo, in which tenants voluntarily bid for energy reduction at self-determined prices and receive financial rewards if their bids are accepted at runtime. Using trace-based simulation, we show that GreenColo results in a carbon footprint fairly close (23 versus 18 percent) to that of the optimal offline solution with future information, while satisfying the colocation operator’s long-term budget constraint. We demonstrate the effectiveness of GreenColo in practical scenarios via both simulation studies and scaled-down prototype experiments. Our results show that GreenColo can reduce the carbon footprint by up to 24 percent without incurring any additional cost for the colocation operator (compared to the no-incentive baseline case), while tenants receive financial rewards for “free” without violating service level agreements.
- [SIGMETRICS] Mohammad A. Islam, Luting Yang, Kiran Ranganath, and Shaolei Ren. SIGMETRICS, 2018.
The common practice of power infrastructure oversubscription in data centers exposes dangerous vulnerabilities to well-timed power attacks (i.e., maliciously timed power loads to overload the infrastructure capacity), possibly creating outages and resulting in multimillion-dollar losses. In this paper, we focus on the emerging threat of power attacks in a multi-tenant data center, where a malicious tenant (i.e., attacker) aims at compromising the data center availability through power attacks. We discover a novel acoustic side channel resulting from servers’ cooling fan noise, which can help the attacker time power attacks at the moments when benign tenants’ power usage is high. Concretely, we exploit the acoustic side channel by: (1) employing a high-pass filter to filter out the air conditioner’s noise; (2) applying non-negative matrix factorization with sparsity constraint to demix the received aggregate noise and detect periods of high power usage by benign tenants; and (3) designing a state machine to guide power attacks. We run experiments in a practical data center environment as well as simulation studies, and demonstrate that the acoustic side channel can assist the attacker with detecting more than 50% of all attack opportunities, representing state-of-the-art timing accuracy.
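A compact sketch of steps (1)-(2) of this pipeline, assuming scipy and scikit-learn; the cutoff frequency, sparsity weights, frame size, and 90th-percentile detection threshold are illustrative stand-ins for the paper's calibrated values:

```python
import numpy as np
from scipy.signal import butter, sosfilt
from sklearn.decomposition import NMF

def attack_windows(noise: np.ndarray, fs: int, cutoff_hz: float = 1000.0,
                   n_sources: int = 4) -> np.ndarray:
    """High-pass filter out the air conditioner's low-frequency noise, then
    demix the magnitude spectrogram with sparse NMF and flag frames whose
    aggregate source activation is high (candidate high-power periods)."""
    sos = butter(4, cutoff_hz, btype="highpass", fs=fs, output="sos")
    filtered = sosfilt(sos, noise)
    frame = fs // 10                                     # 100 ms frames
    frames = filtered[: len(filtered) // frame * frame].reshape(-1, frame)
    spec = np.abs(np.fft.rfft(frames, axis=1))           # nonnegative spectrogram
    nmf = NMF(n_components=n_sources, init="nndsvda",
              l1_ratio=1.0, alpha_W=0.1, max_iter=500)   # L1 sparsity on activations
    act = nmf.fit_transform(spec).sum(axis=1)            # per-frame total activation
    return act > np.percentile(act, 90)
```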
- [HPCA] Mohammad A. Islam, Xiaoqi Ren, Shaolei Ren, and Adam Wierman. HPCA, 2018.
Despite the common practice of oversubscription, power capacity is largely under-utilized in data centers. A significant factor driving this under-utilization is fluctuation of the aggregate power demand, resulting in unused “spot (power) capacity”. In this paper, we tap into spot capacity for improving power infrastructure utilization in multi-tenant data centers, an important but under-explored type of data center where multiple tenants house their own physical servers. We propose a novel market, called SpotDC, to allocate spot capacity to tenants on demand. Specifically, SpotDC extracts tenants’ rack-level spot capacity demand through an elastic demand function, based on which the operator sets the market price for spot capacity allocation. We evaluate SpotDC using both testbed experiments and simulations, demonstrating that SpotDC improves power infrastructure utilization and creates a “win-win” situation: the data center operator increases its profit (by nearly 10%), while tenants improve their performance (by 1.2-1.8x on average compared to the no-spot-capacity case, yet at a marginal cost).
- [TCC] Mohammad A. Islam, Kishwar Ahmed, Hong Xu, Nguyen H. Tran, Gang Quan, and Shaolei Ren. IEEE Transactions on Cloud Computing, Jul 2018.
As the critical infrastructure for supporting Internet and cloud computing services, massive geo-distributed data centers are notorious for their huge electricity appetites and carbon footprints. Nonetheless, a lesser-known fact is that data centers are also “thirsty”: to operate data centers, millions of gallons of water are required for cooling and electricity production. The existing water-saving techniques primarily focus on improved “engineering” (e.g., upgrading to air economizer cooling, diverting recycled/sea water instead of potable water) and do not apply to all data centers due to high upfront capital costs and/or location restrictions. In this paper, we propose a software-based approach towards water conservation by exploiting the inherent spatio-temporal diversity of water efficiency across geo-distributed data centers. Specifically, we propose a batch job scheduling algorithm, called WACE (minimization of WAter, Carbon and Electricity cost), which dynamically adjusts geographic load balancing and resource provisioning to minimize the water consumption along with carbon emission and electricity cost while satisfying the average delay performance requirement. WACE can be implemented online without foreseeing far-future information and yields a total cost (incorporating electricity cost, water consumption and carbon emission) that is provably close to that of the optimal algorithm with lookahead information. Finally, we validate WACE through a trace-based simulation study and show that WACE outperforms state-of-the-art benchmarks: 25 percent water savings while incurring an acceptable delay increase. We also extend WACE to joint scheduling of batch workloads and delay-sensitive interactive workloads for further water footprint reduction in geo-distributed data centers.
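A greedy single-timestep sketch of the cost-aware geographic dispatch idea (WACE itself is an online algorithm with delay constraints and provable guarantees, which this does not capture); the function, weights, and cost model are illustrative:

```python
import numpy as np

def glb_dispatch(load, elec_price, water_eff, carbon_int, cap,
                 w_water=1.0, w_carbon=1.0):
    """Fill data centers in increasing order of the combined per-unit
    electricity + water + carbon cost, respecting each capacity."""
    unit_cost = (np.asarray(elec_price) + w_water * np.asarray(water_eff)
                 + w_carbon * np.asarray(carbon_int))
    alloc = np.zeros_like(unit_cost, dtype=float)
    remaining = float(load)
    for i in np.argsort(unit_cost):          # cheapest data centers first
        alloc[i] = min(remaining, cap[i])
        remaining -= alloc[i]
        if remaining <= 0:
            break
    return alloc
```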
- [HPCA] Mohammad A. Islam, Xiaoqi Ren, Shaolei Ren, Adam Wierman, and Xiaorui Wang. HPCA, 2016.
Power oversubscription in data centers may occasionally trigger an emergency when the aggregate power demand exceeds the capacity. Handling such an emergency requires a graceful power capping solution that minimizes the performance loss. In this paper, we study power capping in a multi-tenant data center where the operator supplies power to multiple tenants that manage their own servers. Unlike owner-operated data centers, the operator lacks control over tenants’ servers. To address this challenge, we propose a novel market mechanism based on supply function bidding, called COOP, to financially incentivize and coordinate tenants’ power reduction for minimizing total performance loss (quantified in performance cost) while satisfying multiple power capping constraints. We build a prototype to show that COOP is efficient in terms of minimizing the total performance cost, even compared to the ideal but infeasible case that assumes the operator has full control over tenants’ servers. We also demonstrate that COOP is "win-win", increasing the operator’s profit (through oversubscription) and reducing tenants’ cost (through financial compensation for their power reduction during emergencies).
- [e-Energy] Qihang Sun, Shaolei Ren, Chuan Wu, and Zongpeng Li. e-Energy, 2016.
Deferring batch workload in data centers is promising for demand response to enhance the efficiency and reliability of a power grid. Yet operators of multi-tenant colocation data centers still resort to eco-unfriendly diesel generators for demand response, because tenants lack incentives to defer their workloads. This work proposes an online auction mechanism for emergency demand response (EDR) in geo-distributed colocation data centers, which incentivizes tenants to delay and shuffle their workload across multiple data centers by providing monetary rewards. The mechanism, called BatchEDR, decides the tenants’ workload deferment/reduction and diesel usage in each data center upon receiving an EDR signal, for cost minimization throughout the entire EDR event, considering that only a limited amount of batch workload can be deferred throughout EDR as well as across multiple data centers. Without future information, BatchEDR achieves a good competitive ratio compared to an omniscient offline optimal algorithm, while ensuring truthfulness and individual rationality over the auction process. Trace-driven experiments show that BatchEDR outperforms the existing mechanisms and achieves a low social cost.
- [HPCA] Mohammad A. Islam, Hasan Mahmud, Shaolei Ren, and Xiaorui Wang. HPCA, 2015.
Data centers are key participants in demand response programs, including emergency demand response (EDR), where the grid coordinates large electricity consumers for demand reduction in emergency situations to prevent major economic losses. While existing literature concentrates on owner-operated data centers, this work studies EDR in multi-tenant colocation data centers where servers are owned and managed by individual tenants. EDR in colocation data centers is significantly more challenging, due to lack of incentives to reduce energy consumption by tenants who control their servers and are typically on fixed power contracts with the colocation operator. Consequently, to achieve demand reduction goals set by the EDR program, the operator has to rely on the highly expensive and/or environmentally-unfriendly on-site energy backup/generation. To reduce cost and environmental impact, an efficient incentive mechanism is therefore in need, motivating tenants’ voluntary energy reduction in case of EDR. This work proposes a novel incentive mechanism, Truth-DR, which leverages a reverse auction to provide monetary remuneration to tenants according to their agreed energy reduction. Truth-DR is computationally efficient, truthful, and achieves 2-approximation in colocation-wide social cost. Trace-driven simulations verify the efficacy of the proposed auction mechanism.
- [SC] Shaolei Ren and Yuxiong He. SC, 2013.
Due to the enormous energy consumption and associated environmental concerns, data centers have been increasingly pressured to reduce long-term net carbon footprint to zero, i.e., carbon neutrality. In this paper, we propose an online algorithm, called COCA (optimizing for COst minimization and CArbon neutrality), for minimizing data center operational cost while satisfying carbon neutrality without long-term future information. Unlike the existing research, COCA enables distributed server-level resource management: each server autonomously adjusts its processing speed and optimally decides the amount of workloads to process. We prove that COCA achieves a close-to-minimum operational cost (incorporating both electricity and delay costs) compared to the optimal algorithm with future information, while bounding the potential violation of carbon neutrality. We also perform trace-based simulation studies to complement the analysis, and the results show that COCA reduces cost by more than 25% (compared to state of the art) while resulting in a smaller carbon footprint.
Sustainable AI & computing
- [CACM] Pengfei Li, Jianyi Yang, Mohammad A. Islam, and Shaolei Ren. Communications of the ACM (accepted), 2024.
The growing carbon footprint of artificial intelligence (AI) models, especially large ones such as GPT-3, has been undergoing public scrutiny. Unfortunately, the equally important and enormous water (withdrawal and consumption) footprint of AI models has remained under the radar. For example, training GPT-3 in Microsoft’s state-of-the-art U.S. data centers can directly evaporate 700,000 liters of clean freshwater, but such information has been kept secret. More critically, global AI demand may account for 4.2 – 6.6 billion cubic meters of water withdrawal in 2027, more than 4 – 6 times the total annual water withdrawal of Denmark, or half that of the United Kingdom. This is very concerning, as freshwater scarcity has become one of the most pressing challenges shared by all of us amid rapidly growing populations, depleting water resources, and aging water infrastructures. To respond to the global water challenges, AI models can, and also must, take social responsibility and lead by example by addressing their own water footprint. In this paper, we provide a principled methodology to estimate the water footprint of AI models, and also discuss the unique spatial-temporal diversities of AI models’ runtime water efficiency. Finally, we highlight the necessity of holistically addressing water footprint along with carbon footprint to enable truly sustainable AI.
- [SIGMETRICS] Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. SIGMETRICS, 2025.
This paper studies learning-augmented decentralized online convex optimization in a networked multi-agent system, a challenging setting that has remained under-explored. We first consider a linear learning-augmented decentralized online algorithm (LADO-Lin) that combines a machine learning (ML) policy with a baseline expert policy in a linear manner. We show that, while LADO-Lin can exploit the potential of ML predictions to improve the average cost performance, it cannot guarantee worst-case performance. To address this limitation, we propose a novel online algorithm (LADO) that adaptively combines the ML policy and the expert policy, safeguarding the ML predictions so as to achieve strong competitiveness guarantees. We also prove the average cost bound for LADO, revealing the tradeoff between average performance and worst-case robustness and demonstrating the advantage of training the ML policy by explicitly considering the robustness requirement. Finally, we run an experiment on decentralized battery management for sustainable computing. Our results highlight the potential of ML augmentation to improve the average performance as well as the guaranteed worst-case performance of LADO.
- [SIGMETRICS] Jianyi Yang, Pengfei Li, Mohammad J. Islam, and Shaolei Ren. SIGMETRICS, 2024.
This paper studies online resource allocation with replenishable budgets, where budgets can be replenished on top of the initial budget and an agent sequentially chooses online allocation decisions without violating the available budget constraint at each round. We propose a novel online algorithm, called OACP (Opportunistic Allocation with Conservative Pricing), that conservatively adjusts dual variables while opportunistically utilizing available resources. OACP achieves a bounded asymptotic competitive ratio in adversarial settings as the number of decision rounds T gets large. Importantly, the asymptotic competitive ratio of OACP is optimal in the absence of additional assumptions on budget replenishment. To further improve the competitive ratio, we make a mild assumption that there is budget replenishment every T* ≥ 1 decision rounds and propose OACP+ to dynamically adjust the total budget assignment for online allocation. Next, we move beyond the worst case and propose LA-OACP (Learning-Augmented OACP/OACP+), a novel learning-augmented algorithm for online allocation with replenishable budgets. We prove that LA-OACP can improve the average utility compared to OACP/OACP+ when the ML predictor is properly trained, while still offering worst-case utility guarantees when the ML predictions are arbitrarily wrong. Finally, we run simulation studies of sustainable AI inference powered by renewables, validating our analysis and demonstrating the empirical benefits of LA-OACP.
- [NeurIPS] Jianyi Yang, Pengfei Li, Adam Wierman, and Shaolei Ren. NeurIPS, 2024.
Online Budgeted Matching (OBM) is a classic problem with important applications in online advertising, online service matching, revenue management, and beyond. Traditional online algorithms typically assume a small bid setting, where the maximum bid-to-budget ratio (kappa) is infinitesimally small. While recent algorithms have tried to address scenarios with non-small or general bids, they often rely on the Fractional Last Matching (FLM) assumption, which allows for accepting partial bids when the remaining budget is insufficient. This assumption, however, does not hold for many applications with indivisible bids. In this paper, we remove the FLM assumption and tackle the open problem of OBM with general bids. We first establish an upper bound of 1-kappa on the competitive ratio for any deterministic online algorithm. We then propose a novel meta algorithm, called MetaAd, which reduces to different algorithms with the first known provable competitive ratios parameterized by the maximum bid-to-budget ratio kappa in [0,1]. As a by-product, we extend MetaAd to the FLM setting and obtain provably competitive algorithms. Finally, we apply our competitive analysis to the design of learning-augmented algorithms.
- [e-Energy] Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. e-Energy, 2024.
Fueled by the soaring popularity of large language and foundation models, the accelerated growth of artificial intelligence (AI) models’ enormous environmental footprint has come under increased scrutiny. While many approaches have been proposed to make AI more energy-efficient and environmentally friendly, environmental inequity – the fact that AI’s environmental footprint can be disproportionately higher in certain regions than in others – has emerged, raising social-ecological justice concerns. This paper takes a first step toward addressing AI’s environmental inequity by balancing its regional negative environmental impact. Concretely, we focus on the carbon and water footprints of AI model inference and propose equity-aware geographical load balancing (GLB) to explicitly address AI’s environmental impacts on the most disadvantaged regions. We run trace-based simulations by considering a set of 10 geographically-distributed data centers that serve inference requests for a large language AI model. The results demonstrate that existing GLB approaches may amplify environmental inequity while our proposed equity-aware GLB can significantly reduce the regional disparity in terms of carbon and water footprints.
- [e-Energy] Pranjol Sen Gupta, Md Rajib Hossen, Pengfei Li, Shaolei Ren, and Mohammad A. Islam. e-Energy, 2024 (Best Notes Paper Award).
Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To lower this barrier to research in water sustainability, we build a dataset covering both operational direct water usage in cooling systems and indirect water embedded in electricity generation. Our dataset consists of the hourly water efficiency of major U.S. cities and states from 2018 to 2023. We also offer cooling system models that capture the impact of weather on water efficiency. We present a preliminary analysis of our dataset and discuss three potential applications that can benefit from it.
- [HotCarbon] Yejia Liu, Pengfei Li, Daniel Wong, and Shaolei Ren. HotCarbon, 2024.
The enormous growth of AI computing has led to a surging demand for electricity. To stem the resulting energy cost and environmental impact, this paper explores opportunities enabled by the increasing hardware heterogeneity and introduces the concept of Geographical Server Relocation (GSR). Specifically, GSR physically balances the available AI servers across geographically distributed data centers subject to AI computing demand and power capacity constraints in each location. The key idea of GSR is to relocate older and less energy-efficient servers to regions with more renewables, better water efficiencies and/or lower electricity prices. Our case study demonstrates that, even with modest flexibility of relocation, GSR can substantially reduce the total operational environmental footprints and operation costs of AI computing. We conclude this paper by discussing the major challenges of GSR, including service migration, software management, and algorithms.
- [HotEthics] Pengfei Li, Yejia Liu, Jianyi Yang, and Shaolei Ren. HotEthics, 2024.
The sharply increasing sizes of artificial intelligence (AI) models come with significant energy consumption and environmental footprints, which can disproportionately impact certain (often marginalized) regions and hence create environmental inequity concerns. Moreover, concerns with social inequity have also emerged, as AI computing resources may not be equitably distributed across the globe and users from certain disadvantaged regions with severe resource constraints can consistently experience inferior model performance. Importantly, the inequity concerns that encompass both social and environmental dimensions still remain unexplored and have increasingly hindered responsible AI. In this paper, we leverage the spatial flexibility of AI inference workloads and propose equitable geographical load balancing (GLB) to fairly balance AI’s regional social and environmental costs. Concretely, to penalize the disproportionately high social and environmental costs for equity, we introduce Lq norms as novel regularization terms into the optimization objective for GLB decisions. Our empirical results based on real-world AI inference traces demonstrate that while the existing GLB algorithms result in disproportionately large social and environmental costs in certain regions, our proposed equitable GLB can fairly balance AI’s negative social and environmental costs across all the regions.
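A minimal sketch of the Lq-norm-regularized objective described above, assuming a differentiable load-split formulation; the function, variable names, and cost model are illustrative stand-ins for the paper's variables:

```python
import torch

def equitable_glb_objective(x, cost_matrix, q=4.0, mu=0.1):
    """Total cost plus an Lq-norm regularizer over per-region costs. As q
    grows, the penalty concentrates on the worst-off region, steering the
    load split x toward equity. Costs are assumed nonnegative."""
    regional_cost = cost_matrix @ x          # cost borne by each region
    return regional_cost.sum() + mu * regional_cost.pow(q).sum().pow(1.0 / q)
```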
- [e-Energy] Jieming Bian, Lei Wang, Shaolei Ren, and Jie Xu. e-Energy, 2024.
Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which prioritizes model parameter exchange over raw data, ensuring data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to address the unpredictability of future carbon intensity, and devises an efficient algorithm to address the combinatorial complexity of the data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.
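A one-round sketch of a Lyapunov drift-plus-penalty style selection rule of the kind the abstract describes, under the assumption of a scalar virtual queue tracking carbon-budget overuse; all names and shapes are illustrative, not CAFE's exact algorithm:

```python
import numpy as np

def cafe_round(utility, carbon_intensity, queue, V, k):
    """Score each data center by V * predicted learning utility minus the
    queue-weighted carbon cost, then pick the top k. The virtual queue grows
    whenever the carbon budget is overrun; V trades off learning performance
    against budget compliance."""
    score = V * np.asarray(utility) - queue * np.asarray(carbon_intensity)
    return np.argsort(score)[-k:]            # indices of the selected data centers
```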
- [NeurIPS] Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, and Shaolei Ren. NeurIPS, 2023.
This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under anytime constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.
- [NeurIPS] Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. NeurIPS, 2023.
We study a general form of Smoothed Online Convex Optimization, a.k.a. SOCO, including multi-step switching costs and feedback delay. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically, we prove that RCL is able to guarantee (1+lambda)-competitiveness against any given expert for any lambda>0, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly, RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay. We demonstrate the improvement of RCL in both robustness and average performance using battery management for electrifying transportation as a case study.
- [ASPLOS] Nuntipat Narkthong, Shijin Duan, Shaolei Ren, and Xiaolin Xu. ASPLOS, 2024.
Artificial intelligence has become feasible on tiny edge devices, thanks to the availability of high-performance microcontroller units (MCUs) and the development of highly-efficient machine learning (ML) models. Despite this success, the current design practice of implementing high-performance ML models on high-end MCU devices still presents barriers that prohibit the wider adoption of ML in real-world applications built on low-end devices, where power and available hardware resources are critical factors for including ML functionality in the final product. On the one hand, these low-end MCU devices have very limited computational resources in terms of hardware resources and power budget, which makes ML inference less accessible. On the other hand, these low-end and low-cost MCUs represent the leading physical carrier of real-world smart applications, making small and fast-responsive (i.e., always-on) ML models an urgent need. This paper presents MicroVSA, a library of optimized implementations of a low-dimensional computing (LDC) classifier, a recently proposed variant of vector symbolic architecture (VSA), for 8-bit, 16-bit, and 32-bit MCUs. MicroVSA achieves a 1.33–10.27x speedup over the vanilla LDC, and our sample model achieves 85–93% accuracy on three of the most common classification tasks for always-on inference on MCUs — myocardial infarction detection, human activity recognition, and hot word detection — while requiring only a few tens of bytes of RAM and fitting in the vast majority of tiny 8-bit MCUs. For instance, our model for detecting the phrase "Hey Snapdragon" needs only 6.75KB of flash and 0.02KB of RAM and can complete one inference in 10.6 ms on a low-end 8-bit AVR MCU or 0.68 ms on an ARM Cortex-M4 MCU, outperforming the best neural network-based model by a factor of 7.2x-113x, making it the first hot word detection model to run on 8-bit MCUs and the most efficient one. Our study suggests that ubiquitous ML deployment on extremely low-cost MCUs is possible and that more study on VSA model training, model compression, and implementation techniques is needed to further explore the possibility of lowering the cost and power of ML on edge devices.
- [INFOCOM] Pengfei Li, Jianyi Yang, and Shaolei Ren. INFOCOM, 2023.
Online optimization with memory costs has many real-world applications, where sequential actions are made without knowing the future input. Nonetheless, the memory cost couples the actions over time, adding substantial challenges. Conventionally, this problem has been approached by various expert-designed online algorithms with the goal of achieving bounded worst-case competitive ratios, but the resulting average performance is often unsatisfactory. On the other hand, emerging machine learning (ML) based optimizers can improve the average performance, but suffer from the lack of worst-case performance robustness. In this paper, we propose a novel expert-robustified learning (ERL) approach, achieving both good average performance and robustness. More concretely, for robustness, ERL introduces a novel projection operator that robustifies ML actions by utilizing an expert online algorithm; for average performance, ERL trains the ML optimizer based on a recurrent architecture by explicitly considering downstream expert robustification. We prove that, for any lambda ≥ 1, ERL can achieve lambda-competitiveness against the expert algorithm and (lambda·C)-competitiveness against the optimal offline algorithm (where C is the expert’s competitive ratio). Additionally, we extend our analysis to a novel setting of multi-step memory costs. Finally, our analysis is supported by empirical experiments for an energy scheduling application.
- [SIGMETRICS] Pengfei Li, Jianyi Yang, and Shaolei Ren. SIGMETRICS, 2022.
We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be highly unsatisfactory. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses — one minimizing the average ML prediction error for better robustness, and the other one minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can be even beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.
- [SIGMETRICS] Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, and Shaolei Ren. SIGMETRICS, 2022.
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity — the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
- [ICML] Jianyi Yang and Shaolei Ren. ICML, 2022.
By integrating domain knowledge with labeled samples, informed machine learning has been emerging to improve the learning performance for a wide range of applications. Nonetheless, rigorous understanding of the role of injected domain knowledge has been under-explored. In this paper, we consider an informed deep neural network (DNN) with over-parameterization and domain knowledge integrated into its training objective function, and study how and why domain knowledge benefits the performance. Concretely, we quantitatively demonstrate the two benefits of domain knowledge in informed learning – regularizing the label-based supervision and supplementing the labeled samples – and reveal the trade-off between label and knowledge imperfectness in the bound of the population risk. Based on the theoretical analysis, we propose a generalized informed training objective to better exploit the benefits of knowledge and balance the label and knowledge imperfectness, which is validated by the population risk bound. Our analysis of sampling complexity sheds light on how to choose the hyper-parameters for informed learning, and further justifies the advantages of knowledge-informed learning.
- INFOCOMZhihui Shao, Jianyi Yang, Cong Shen, and Shaolei RenINFOCOM, 2022
Learning to optimize (L2O) has recently emerged as a promising approach to solving optimization problems by exploiting the strong prediction power of neural networks and offering lower runtime complexity than conventional solvers. While L2O has been applied to various problems, a crucial yet challenging class of problems – robust combinatorial optimization in the form of minimax optimization – has largely remained under-explored. In addition to the exponentially large decision space, a key challenge for robust combinatorial optimization lies in the inner optimization problem, which is typically non-convex and entangled with the outer optimization. In this paper, we study robust combinatorial optimization and propose a novel learning-based optimizer, called LRCO (Learning for Robust Combinatorial Optimization), which quickly outputs a robust solution in the presence of uncertain context. LRCO leverages a pair of learning-based optimizers — one for the minimizer and the other for the maximizer — that use their respective objective functions as losses and can be trained without the need for labeled training problem instances. To evaluate the performance of LRCO, we perform simulations for the task offloading problem in vehicular edge computing. Our results highlight that LRCO can greatly reduce the worst-case cost and improve robustness, while having a very low runtime complexity.
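The label-free min-max training pattern can be sketched as alternating gradient steps on the two networks, each using the (negated) objective itself as its loss; this is a simplified illustration under assumed callables, not the paper's training loop:

```python
# Sketch of label-free min-max training with a minimizer/maximizer pair
# (names are illustrative). The maximizer ascends the cost for the current
# decision; the minimizer then descends the cost under that worst case.
import torch

def train_step(min_net, max_net, cost_fn, context, opt_min, opt_max):
    # Maximizer update: hold the decision fixed (detached) and ascend cost.
    opt_max.zero_grad()
    decision = min_net(context).detach()
    worst = max_net(context)
    (-cost_fn(decision, worst, context).mean()).backward()
    opt_max.step()

    # Minimizer update: hold the learned worst case fixed and descend cost.
    opt_min.zero_grad()
    worst = max_net(context).detach()
    decision = min_net(context)
    cost_fn(decision, worst, context).mean().backward()
    opt_min.step()
```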
- TCCMohammad A. Islam, Hasan Mahmud, Shaolei Ren, and Xiaorui WangIEEE Transactions on Cloud Computing, Jan 2020
The massive energy consumption of data centers worldwide has resulted in a large carbon footprint, raising serious concerns for sustainable IT initiatives and attracting a great amount of research attention. Nonetheless, the efforts to date, though encouraging, have been primarily centered around owner-operated data centers (e.g., Google data centers), leaving another major segment of the data center industry, colocation data centers, much less explored. As a major hindrance to the carbon efficiency desired by the operator, colocation suffers from a “split incentive”: tenants may not be willing to manage their servers for carbon efficiency. In this paper, we aim at minimizing the carbon footprint of geo-distributed colocation data centers, while ensuring that the operator’s cost meets a long-term budget constraint. We overcome the “split incentive” hurdle by devising a novel online carbon-aware incentive mechanism, called GreenColo, in which tenants voluntarily bid for energy reduction at self-determined prices and receive financial rewards if their bids are accepted at runtime. Using trace-based simulation, we show that GreenColo results in a carbon footprint fairly close (23 versus 18 percent) to the optimal offline solution with future information, while satisfying the colocation operator’s long-term budget constraint. We demonstrate the effectiveness of GreenColo in practical scenarios via both simulation studies and scaled-down prototype experiments. Our results show that GreenColo can reduce the carbon footprint by up to 24 percent without incurring any additional cost for the colocation operator (compared to the no-incentive baseline case), while tenants receive financial rewards for “free” without violating service level agreements.
- HPCAMohammad A. Islam, Xiaoqi Ren, Shaolei Ren, and Adam WiermanHPCA, 2018
Despite the common practice of oversubscription, power capacity is largely under-utilized in data centers. A significant factor driving this under-utilization is fluctuation of the aggregate power demand, resulting in unused “spot (power) capacity”. In this paper, we tap into spot capacity for improving power infrastructure utilization in multi-tenant data centers, an important but under-explored type of data center where multiple tenants house their own physical servers. We propose a novel market, called SpotDC, to allocate spot capacity to tenants on demand. Specifically, SpotDC extracts tenants’ rack-level spot capacity demand through an elastic demand function, based on which the operator sets the market price for spot capacity allocation. We evaluate SpotDC using both testbed experiments and simulations, demonstrating that SpotDC improves power infrastructure utilization and creates a “win-win” situation: the data center operator increases its profit (by nearly 10%), while tenants improve their performance (by 1.2-1.8x on average compared to the no-spot-capacity case, at only a marginal cost).
- TCCMohammad A. Islam, Kishwar Ahmed, Hong Xu, Nguyen H. Tran, Gang Quan, and Shaolei RenIEEE Transactions on Cloud Computing, Jul 2018
As the critical infrastructure for supporting Internet and cloud computing services, massive geo-distributed data centers are notorious for their huge electricity appetites and carbon footprints. Nonetheless, a lesser-known fact is that data centers are also “thirsty”: to operate data centers, millions of gallons of water are required for cooling and electricity production. The existing water-saving techniques primarily focus on improved “engineering” (e.g., upgrading to air economizer cooling, diverting recycled/sea water instead of potable water) and do not apply to all data centers due to high upfront capital costs and/or location restrictions. In this paper, we propose a software-based approach towards water conservation by exploiting the inherent spatio-temporal diversity of water efficiency across geo-distributed data centers. Specifically, we propose a batch job scheduling algorithm, called WACE (minimization of WAter, Carbon and Electricity cost), which dynamically adjusts geographic load balancing and resource provisioning to minimize the water consumption along with carbon emission and electricity cost while satisfying average delay performance requirements. WACE can be implemented online without foreseeing far-future information and yields a total cost (incorporating electricity cost, water consumption and carbon emission) that is provably close to that of the optimal algorithm with lookahead information. Finally, we validate WACE through a trace-based simulation study and show that WACE outperforms state-of-the-art benchmarks: 25 percent water savings with an acceptable delay increase. We also extend WACE to joint scheduling of batch workloads and delay-sensitive interactive workloads for further water footprint reduction in geo-distributed data centers.
- ICShaolei RenIEEE Internet Computing (invited), Jun 2017
Scaling up power infrastructures to accommodate growing data center demand is very costly and one of the biggest challenges faced by data centers today. In this paper, we propose to leverage market approaches for maximizing power capacity utilization in multi-tenant data centers, a crucial but under-explored type of data center. Our study transforms the current practice, which simply allocates power capacity in a fixed manner, into a dynamic, scalable, and coordinated market-based paradigm. To illustrate our design, we consider power oversubscription and study how to gracefully handle occasional power emergencies in oversubscribed multi-tenant data centers. Concretely, we present a market approach, called COOrdinated Power management solution (COOP), which extracts tenants’ power reduction capabilities at runtime and judiciously coordinates tenants’ power reduction at a minimum performance cost.
- HPCAMohammad A. Islam, Xiaoqi Ren, Shaolei Ren, Adam Wierman, and Xiaorui WangHPCA, 2016
Power oversubscription in data centers may occasionally trigger an emergency when the aggregate power demand exceeds the capacity. Handling such an emergency requires a graceful power capping solution that minimizes the performance loss. In this paper, we study power capping in a multi-tenant data center where the operator supplies power to multiple tenants that manage their own servers. Unlike owner-operated data centers, the operator lacks control over tenants’ servers. To address this challenge, we propose a novel market mechanism based on supply function bidding, called COOP, to financially incentivize and coordinate tenants’ power reduction for minimizing total performance loss (quantified in performance cost) while satisfying multiple power capping constraints. We build a prototype to show that COOP is efficient in terms of minimizing the total performance cost, even compared to the ideal but infeasible case that assumes the operator has full control over tenants’ servers. We also demonstrate that COOP is "win-win", increasing the operator’s profit (through oversubscription) and reducing tenants’ cost (through financial compensation for their power reduction during emergencies).
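To make the supply-function-bidding idea concrete, here is a toy market-clearing loop; the parameterization s_i(p) = max(0, d_i − b_i/p) and the bisection are illustrative assumptions, not the paper's exact mechanism:

```python
# Toy supply-function market clearing: each tenant bids an increasing
# supply function s_i(p) = max(0, d_i - b_i / p) (illustrative form), and
# the operator finds the price p* at which total offered power reduction
# meets the capping target, via bisection on the monotone total supply.
def clearing_price(bids, target, lo=1e-6, hi=1e6, iters=100):
    def total(p):
        return sum(max(0.0, d - b / p) for (d, b) in bids)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) < target:
            lo = mid   # offered reduction too low -> raise the price
        else:
            hi = mid
    return hi  # approximate clearing price (assumes target is reachable)
```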
- HPCAMohammad A. Islam, Hasan Mahmud, Shaolei Ren, and Xiaorui WangHPCA, 2015
- SCShaolei Ren, and Yuxiong HeSC, 2013
Due to the enormous energy consumption and associated environmental concerns, data centers have been increasingly pressured to reduce their long-term net carbon footprint to zero, i.e., carbon neutrality. In this paper, we propose an online algorithm, called COCA (optimizing for COst minimization and CArbon neutrality), for minimizing data center operational cost while satisfying carbon neutrality without long-term future information. Unlike the existing research, COCA enables distributed server-level resource management: each server autonomously adjusts its processing speed and optimally decides the amount of workload to process. We prove that COCA achieves a close-to-minimum operational cost (incorporating both electricity and delay costs) compared to the optimal algorithm with future information, while bounding the potential violation of carbon neutrality. We also perform trace-based simulation studies to complement the analysis, and the results show that COCA reduces cost by more than 25% (compared to the state of the art) while resulting in a smaller carbon footprint.
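Long-term constraints of this kind are commonly enforced online with a deficit queue in Lyapunov-style optimization; the following is a deliberately simplified sketch of that generic pattern (the candidate set, cost functions, and the V knob are all illustrative assumptions, not COCA's pseudocode):

```python
# Generic deficit-queue pattern for a long-term (e.g., carbon) constraint:
# each step, pick the action minimizing V*cost + queue*carbon, then push
# the queue toward the long-term budget. Larger V favors lower cost at the
# expense of slower constraint convergence.
def deficit_queue_step(q_carbon, cost_fn, carbon_fn, candidates, budget_rate, V):
    best = min(candidates, key=lambda a: V * cost_fn(a) + q_carbon * carbon_fn(a))
    q_carbon = max(0.0, q_carbon + carbon_fn(best) - budget_rate)
    return best, q_carbon
```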
AI & computing for sustainability
- ICMLYejia Liu, Jianyi Yang, Pengfei Li, Tongxin Li, and Shaolei RenICML, 2024
Public models have played a crucial role in various AI applications, showcasing their proficiency in accurate predictions. However, their exclusive emphasis on prediction accuracy may not align with the diverse end objectives of downstream agents when utilized. Recognizing the public model’s predictions as a service, we advocate for integrating the objectives of downstream agents into the optimization process. Concretely, to address performance disparities and foster fairness among heterogeneous agents in training, we propose a novel q-Equitable objective. This objective, coupled with a policy gradient algorithm, is crafted to train the public model to produce a more equitable/uniform performance distribution across downstream agents, each with their unique concerns. Both theoretical analysis and empirical case studies have proven the effectiveness of our method in advancing performance equity across diverse downstream agents utilizing the public model for their decision-making.
- NeurIPSTongxin Li, Yiheng Lin, Shaolei Ren, and Adam WiermanNeurIPS, 2023
We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees that provably improve on what can be obtained with black-box advice alone.
- ICMLPengfei Li, Jianyi Yang, and Shaolei RenICML, 2023
Many problems, such as online ad display, can be formulated as online bipartite matching. The crucial challenge lies in the nature of sequentially-revealed online item information, based on which we make irreversible matching decisions at each step. While numerous expert online algorithms have been proposed with bounded worst-case competitive ratios, they may not offer satisfactory performance in average cases. On the other hand, reinforcement learning (RL) has been applied to improve the average performance, but it lacks robustness and can perform arbitrarily poorly. In this paper, we propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR), achieving both good average-case and worst-case performance. The key novelty of LOMAR is a new online switching operation which, based on a judicious condition to hedge against future uncertainties, decides whether to follow the expert’s decision or the RL decision for each online item. We prove that for any ρ∈[0,1], LOMAR is ρ-competitive against any given expert online algorithm. To improve the average performance, we train the RL policy by explicitly considering the online switching operation. Finally, we run empirical experiments to demonstrate the advantages of LOMAR compared to existing baselines.
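The switching guard at the heart of this style of algorithm can be sketched in a few lines; this is a simplification (the paper's condition additionally reserves slack to hedge against future items), with illustrative names:

```python
# Simplified LOMAR-style switching guard: follow the RL action only if the
# cumulative reward along the RL path provably stays within a rho factor
# of the expert's; otherwise fall back to the expert's decision.
def choose_action(rl_reward_so_far, expert_reward_so_far,
                  rl_step_reward, expert_step_reward, rho):
    guard_ok = (rl_reward_so_far + rl_step_reward
                >= rho * (expert_reward_so_far + expert_step_reward))
    return "rl" if guard_ok else "expert"
```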
- INFOCOMJianyi Yang, and Shaolei RenINFOCOM, 2021
Contextual bandit learning selects actions (i.e., arms) based on context information to maximize rewards while balancing exploitation and exploration. In many applications (e.g., cloud resource management with dynamic workloads), before arm selection, the agent/learner can either predict context information online based on context history or selectively query the context from an outside expert. Motivated by this practical consideration, we study a novel contextual bandit setting where context information is either predicted online or queried from an expert. First, considering predicted context only, we quantify the impact of context prediction on the cumulative regret (compared to an oracle with perfect context information) by deriving an upper bound on regret, which takes the form of a weighted combination of the regret incurred by standard bandit learning and the context prediction error. Then, inspired by the regret’s structural decomposition, we propose context query algorithms to selectively obtain an outside expert’s input (subject to a total query budget) for more accurate context, decreasing the overall regret. Finally, we apply our algorithms to virtual machine scheduling on cloud platforms. The simulation results validate our regret analysis and show the effectiveness of our selective context query algorithms.
- TSUSCTianshu Wei, Shaolei Ren, and Qi ZhuIEEE Transactions on Sustainable Computing, Jul 2021
The majority of today’s power-hungry datacenters are physically co-located with office rooms in mixed-use buildings (MUBs). The heating, ventilation, and air conditioning (HVAC) system within each MUB is often shared or partially-shared between datacenter rooms and office zones, for removing the heat generated by computing equipment and maintaining desired room temperature for building tenants. To effectively reduce the total energy cost of MUBs, it is important to leverage the scheduling flexibility in both the HVAC system and the datacenter workload. In this work, we formulate both HVAC control and datacenter workload scheduling as a Markov decision process (MDP), and propose a deep reinforcement learning (DRL) based algorithm for minimizing the total energy cost while maintaining desired room temperature and meeting datacenter workload deadline constraints. Moreover, we also develop a heuristic DRL-based algorithm to enable interactive workload allocation among geographically distributed MUBs for further energy reduction. The experiment results demonstrate that our regular DRL-based algorithm can achieve up to 26.9 percent cost reduction for a single MUB, when compared with a baseline strategy. Our heuristic DRL-based algorithm can reduce the total energy cost by an additional 5.5 percent, when intelligently allocating interactive workload for multiple geographically distributed MUBs.
- TOMPECSShutong Chen, Zhi Zhou, Fangming Liu, Zongpeng Li, and Shaolei RenACM Transactions on Modeling and Performance Evaluation of Computing Systems, Jun 2018
Datacenters are major energy consumers and dissipate an enormous amount of waste heat. Simply discharging datacenter heat outdoors is energy-consuming and environmentally unfriendly. By harvesting datacenter waste heat and selling it to the district heating system (DHS), both energy cost compensation and environmental protection can be achieved. To realize such benefits in practice, an efficient market mechanism is required to incentivize the participation of datacenters. This work proposes CloudHeat, an online reverse auction mechanism for the DHS to solicit heat bids from datacenters. To minimize the long-term social operational cost of the DHS and the datacenters, we apply an RFHC approach that decomposes the long-term problem into a series of one-round auctions, guaranteeing a small loss in competitive ratio. The one-round optimization is still NP-hard, so we employ a randomized auction framework to simultaneously guarantee truthfulness, polynomial running time, and an approximation ratio of 2. The performance of CloudHeat is validated through theoretical analysis and trace-driven simulation studies.
- e-EnergyQihang Sun, Shaolei Ren, Chuan Wu, and Zongpeng Lie-Energy, 2016
Deferring batch workload in data centers is promising for demand response to enhance the efficiency and reliability of a power grid. Yet operators of multi-tenant colocation data centers still resort to eco-unfriendly diesel generators for demand response, because tenants lack incentives to defer their workloads. This work proposes an online auction mechanism for emergency demand response (EDR) in geo-distributed colocation data centers, which incentivizes tenants to delay and shuffle their workload across multiple data centers by providing monetary rewards. The mechanism, called BatchEDR, decides the tenants’ workload deferment/reduction and diesel usage in each data center upon receiving an EDR signal, minimizing cost throughout the entire EDR event given that only a limited amount of batch workload can be deferred during EDR as well as across multiple data centers. Without future information, BatchEDR achieves a good competitive ratio compared to an omniscient offline optimal algorithm, while ensuring truthfulness and individual rationality over the auction process. Trace-driven experiments show that BatchEDR outperforms existing mechanisms and achieves a low social cost.
- JSACQihang Sun, Chuan Wu, Zongpeng Li, and Shaolei RenIEEE Journal on Selected Areas in Communications, Dec 2016
- INFOCOMLingquan Zhang, Shaolei Ren, Chuan Wu, and Zongpeng LiINFOCOM, 2015
Data centers are key participants in demand response programs, including emergency demand response (EDR), where the grid coordinates large electricity consumers for demand reduction in emergency situations to prevent major economic losses. While existing literature concentrates on owner-operated data centers, this work studies EDR in multi-tenant colocation data centers where servers are owned and managed by individual tenants. EDR in colocation data centers is significantly more challenging, due to lack of incentives to reduce energy consumption by tenants who control their servers and are typically on fixed power contracts with the colocation operator. Consequently, to achieve demand reduction goals set by the EDR program, the operator has to rely on the highly expensive and/or environmentally-unfriendly on-site energy backup/generation. To reduce cost and environmental impact, an efficient incentive mechanism is therefore in need, motivating tenants’ voluntary energy reduction in case of EDR. This work proposes a novel incentive mechanism, Truth-DR, which leverages a reverse auction to provide monetary remuneration to tenants according to their agreed energy reduction. Truth-DR is computationally efficient, truthful, and achieves 2-approximation in colocation-wide social cost. Trace-driven simulations verify the efficacy of the proposed auction mechanism.
- PerformanceNiangjun Chen, Xiaoqi Ren, Shaolei Ren, and Adam WiermanPerformance Evaluation, Sep 2015
Data centers have emerged as promising resources for demand response, particularly for emergency demand response (EDR), which saves the power grid from incurring blackouts during emergency situations. However, currently, data centers typically participate in EDR by turning on backup (diesel) generators, which is both expensive and environmentally unfriendly. In this paper, we focus on “greening” demand response in multi-tenant data centers, i.e., colocation data centers, by designing a pricing mechanism through which the data center operator can efficiently extract load reductions from tenants during emergency periods for EDR. In particular, we propose a pricing mechanism for both mandatory and voluntary EDR programs, ColoEDR, that is based on parameterized supply function bidding and provides provably near-optimal efficiency guarantees, both when tenants are price-taking and when they are price-anticipating. In addition to analytic results, we extend the literature on supply function mechanism design, and evaluate ColoEDR using trace-based simulation studies. These validate the efficiency analysis and conclude that the pricing mechanism is both beneficial to the environment and to the data center operator (by decreasing the need for backup diesel generation), while also aiding tenants (by providing payments for load reductions).
Secure AI & computing
- NeurIPSTong Zhou, Xuandong Zhao, Xiaolin Xu, and Shaolei RenNeurIPS, 2024
Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter the meanings of LLM-generated responses or even forge harmful content, potentially misattributing blame to the LLM developer. To overcome this, we introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks (mitigating spoofing attacks) as well as a coarse-grained signal to trace text sources when the signature is invalid (enhancing detectability) via a novel rank-based sampling strategy. Compared to conventional watermark detectors that only output binary results, Bileve can differentiate 5 scenarios during detection, reliably tracing text provenance and regulating LLMs. The experiments conducted on OPT-1.3B and LLaMA-7B demonstrate the effectiveness of Bileve in defeating spoofing attacks with enhanced detectability.
- ICLRTong Zhou, Shaolei Ren, and Xiaolin XuICLR, 2024
Deep neural network (DNN) models, despite their impressive performance, are vulnerable to exploitation by attackers who attempt to adapt them to other tasks for their own benefit. Current defense strategies mainly address this vulnerability at the model parameter level, leaving the potential of architectural-level defense largely unexplored. This paper, for the first time, addresses the issue of model protection by reducing transferability at the architecture level. Specifically, we present a novel neural architecture search (NAS)-enabled algorithm that employs zero-cost proxies and evolutionary search to design model architectures with low transferability. Our method, namely ArchLock, aims to achieve high performance on the source task, while degrading the performance on target tasks, i.e., locking the transferability of a DNN model. To achieve efficient cross-task search without having access to the training data owned by the attackers, we utilize zero-cost proxies to speed up architecture evaluation and simulate potential target task embeddings to assist cross-task search with a binary performance predictor. Extensive experiments on NAS-Bench-201 and TransNAS-Bench-101 demonstrate that ArchLock reduces transferability by up to 30% and 50%, respectively, with negligible performance degradation on source tasks (<2%).
- ICMLTong Zhou, Yukui Luo, Shaolei Ren, and Xiaolin XuICML, 2023
As a type of valuable intellectual property (IP), deep neural network (DNN) models have been protected by techniques like watermarking. However, such passive model protection cannot fully prevent model abuse. In this work, we propose an active model IP protection scheme, namely NNSplitter, which actively protects the model by splitting it into two parts: the obfuscated model that performs poorly due to weight obfuscation, and the model secrets consisting of the indexes and original values of the obfuscated weights, which can only be accessed by authorized users with the support of the trusted execution environment. Experimental results demonstrate the effectiveness of NNSplitter, e.g., by only modifying 275 out of over 11 million (i.e., 0.002%) weights, the accuracy of the obfuscated ResNet-18 model on CIFAR-10 can drop to 10%. Moreover, NNSplitter is stealthy and resilient against norm clipping and fine-tuning attacks, making it an appealing solution for DNN model protection.
- HPCAZhihui Shao, Mohammad A. Islam, and Shaolei RenHPCA, 2021
The widespread adoption of Internet of Things and latency-critical applications has fueled the burgeoning development of edge colocation data centers (a.k.a. edge colocations) — small-scale data centers in distributed locations. In an edge colocation, multiple entities/tenants house their own physical servers together, sharing the power and cooling infrastructures for cost efficiency and scalability. In this paper, we discover that the sharing of cooling systems also exposes edge colocations’ potential vulnerabilities to cooling load injection attacks (called thermal attacks), which, if left unaddressed, may create thermal emergencies and even trigger system outages. Importantly, thermal attacks can be launched by leveraging the emerging architecture of built-in batteries integrated with servers, which can conceal the attacker’s actual server power (or cooling load). We consider both one-shot attacks (which aim at creating system outages) and repeated attacks (which aim at causing frequent thermal emergencies). For repeated attacks, we present a foresighted attack strategy which, using reinforcement learning, learns on the fly a good timing for attacks based on the battery state and benign tenants’ load. We also combine prototype experiments with simulations to validate our attacks and show that, for a small 8kW edge colocation, an attacker can potentially cause significant losses. Finally, we suggest effective countermeasures to the potential threat of thermal attacks.
- SIGMETRICSZhihui Shao, Mohammad A. Islam, and Shaolei RenSIGMETRICS, 2020
Attacks based on power analysis have long existed and been studied, with some recent works focused on data exfiltration from victim systems without using conventional communications (e.g., WiFi). Nonetheless, prior works typically rely on intrusive direct power measurement, either by implanting meters in the power outlet or tapping into the power cable, thus jeopardizing the stealthiness of attacks. In this paper, we propose NoDE (Noise for Data Exfiltration), a new system for stealthy data exfiltration from enterprise desktop computers. Specifically, NoDE achieves data exfiltration over a building’s power network by exploiting high-frequency voltage ripples (i.e., switching noises) generated by power factor correction circuits built into today’s computers. Located at a distance and even in a different room, the receiver can non-intrusively measure the voltage of a power outlet to capture the high-frequency switching noises for online information decoding without supervised training/learning. To evaluate NoDE, we run experiments on seven different computers from top vendors, using top-brand power supply units. Our results show that, for a single transmitter, NoDE achieves a rate of up to 28.48 bits/second at a distance of 90 feet (27.4 meters) without line of sight, demonstrating a practically stealthy threat. Based on the orthogonality of switching noise frequencies of different computers, we also demonstrate simultaneous data exfiltration from four computers using only one receiver. Finally, we present a few possible defenses, such as installing noise filters, and discuss their limitations.
- SIGMETRICSMohammad A. Islam, Luting Yang, Kiran Ranganath, and Shaolei RenSIGMETRICS, 2018
The common practice of power infrastructure oversubscription in data centers exposes dangerous vulnerabilities to well-timed power attacks (i.e., maliciously timed power loads to overload the infrastructure capacity), possibly creating outages and resulting in multimillion-dollar losses. In this paper, we focus on the emerging threat of power attacks in a multi-tenant data center, where a malicious tenant (i.e., attacker) aims at compromising the data center availability through power attacks. We discover a novel acoustic side channel resulting from servers’ cooling fan noise, which can help the attacker time power attacks at the moments when benign tenants’ power usage is high. Concretely, we exploit the acoustic side channel by: (1) employing a high-pass filter to filter out the air conditioner’s noise; (2) applying non-negative matrix factorization with sparsity constraint to demix the received aggregate noise and detect periods of high power usage by benign tenants; and (3) designing a state machine to guide power attacks. We run experiments in a practical data center environment as well as simulation studies, and demonstrate that the acoustic side channel can assist the attacker with detecting more than 50% of all attack opportunities, representing state-of-the-art timing accuracy.
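Step (1) of the pipeline above is standard signal processing; a minimal sketch of a high-pass stage that would suppress lower-frequency air-conditioner noise before demixing (the filter order, cutoff, and sampling rate are illustrative, not the paper's settings):

```python
# Illustrative high-pass pre-filter for the recorded fan noise, to attenuate
# the (lower-frequency) air conditioner before NMF-based demixing.
from scipy.signal import butter, sosfilt

def highpass(audio, rate_hz=44100, cutoff_hz=5000):
    # 4th-order Butterworth high-pass in second-order sections for stability.
    sos = butter(4, cutoff_hz, btype="highpass", fs=rate_hz, output="sos")
    return sosfilt(sos, audio)
```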
- CCSMohammad A. Islam, and Shaolei RenCCS, 2018
Maliciously-injected power load, a.k.a. power attack, has recently surfaced as a new egregious attack vector for dangerously compromising the data center availability. This paper focuses on the emerging threat of power attacks in a multi-tenant colocation data center, an important type of data center where multiple tenants house their own servers and share the power distribution system. Concretely, we discover a novel physical side channel — a voltage side channel — which leaks the benign tenants’ power usage information at runtime and helps an attacker precisely time its power attacks. The key idea we exploit is that, due to Ohm’s Law, the high-frequency switching operation (40 – 100 kHz) of the power factor correction circuit universally built into today’s server power supply units creates voltage ripples in the data center power lines. Importantly, without overlapping the grid voltage in the frequency domain, the voltage ripple signals can be easily sensed by the attacker to track the benign tenants’ runtime power usage and precisely time its power attacks. We evaluate the timing accuracy of the voltage side channel in a real data center prototype, demonstrating that the attacker can extract benign tenants’ power patterns with great accuracy (correlation coefficient = 0.90+) and utilize 64% of all the attack opportunities without launching attacks randomly or consecutively. Finally, we highlight a few possible defense strategies and extend our study to more complex three-phase power distribution systems used in large multi-tenant data centers.
- CCSMohammad A. Islam, Shaolei Ren, and Adam WiermanCCS, 2017
The power capacity of multi-tenant data centers is typically oversubscribed in order to increase the utilization of expensive power infrastructure. This practice can create dangerous situations and compromise data center availability if the designed power capacity is exceeded. This paper demonstrates that current safeguards are vulnerable to well-timed power attacks launched by malicious tenants (i.e., attackers). Further, we demonstrate that there is a physical side channel — a thermal side channel due to hot air recirculation — that contains information about the benign tenants’ runtime power usage and can enable a malicious tenant to time power attacks effectively. In particular, we design a state-augmented Kalman filter to extract this information from the side channel and guide an attacker to use its maximum power at moments that coincide with the benign tenants’ high power demand, thus overloading the shared power capacity. Our experimental results show that an attacker can capture 54% of all attack opportunities, significantly compromising the data center availability. Finally, we discuss a set of possible defense strategies to safeguard the data center infrastructure against power attacks.
- CSURCuong T. Do, Nguyen H. Tran, Choongseon Hong, Charles A. Kamhoua, Kevin A. Kwiat, Erik Blasch, Shaolei Ren, Niki Pissinou, and Sundaraja Sitharama IyengarACM Computing Surveys, May 2017
In this survey, we review the existing game-theoretic approaches for cyber security and privacy issues, categorizing their application into two classes, security and privacy. To show how game theory is utilized in cyberspace security and privacy, we select research regarding three main applications: cyber-physical security, communication security, and privacy. We present game models, features, and solutions of the selected works and describe their advantages and limitations from design to implementation of the defense mechanisms. We also identify some emerging trends and topics for future research. This survey not only demonstrates how to employ game-theoretic approaches to security and privacy but also encourages researchers to employ game theory to establish a comprehensive understanding of emerging security and privacy problems in cyberspace and potential solutions.
Others
- NeurIPSTongxin Li, Tinashe Handina, Shaolei Ren, and Adam WiermanNeurIPS, 2024
The combination of the Bayesian game and learning has a rich history, with the idea of controlling a single agent in a system composed of multiple agents with unknown behaviors given a set of types, each specifying a possible behavior for the other agents. The idea is to plan an agent’s own actions with respect to those types which it believes are most likely to maximize the payoff. However, the type beliefs are often learned from past actions and likely to be incorrect. With this perspective in mind, we consider an agent in a game with type predictions of other components, and investigate the impact of incorrect beliefs on the agent’s payoff. In particular, we formally define a trade-off between risk and opportunity by comparing the payoff obtained against the optimal payoff, which is represented by a gap caused by trusting or distrusting the learned beliefs. Our main results characterize the trade-off by providing upper and lower bounds on the payoff gap for both normal-form and stochastic Bayesian games, with numerical results provided.
- TCYang Bai, Lixing Chen, Shaolei Ren, and Jie XuIEEE Transactions on Computers, May 2022
The rapid uptake of intelligent applications is pushing deep learning (DL) capabilities to Internet-of-Things (IoT). Despite the emergence of new tools for embedding deep neural networks (DNNs) into IoT devices, providing satisfactory Quality of Experience (QoE) to users is still challenging due to the heterogeneity in DNN architectures, IoT devices, and user preferences. This paper studies automated customization for DL inference on IoT devices (termed on-thing inference), and our goal is to enhance user QoE by configuring the on-thing inference with an appropriate DNN for users under different usage scenarios. The core of our method is a DNN selection module that learns user QoE patterns on-the-fly and identifies the best-fit DNN for on-thing inference with the learned knowledge. It leverages a novel online learning algorithm, NeuralUCB, that has excellent generalization ability for handling various user QoE patterns. We also embed the knowledge transfer technique in NeuralUCB to expedite the learning process. However, NeuralUCB frequently solicits QoE ratings from users, which incurs non-negligible inconvenience. To address this problem, we design feedback solicitation schemes to reduce the number of QoE solicitations while maintaining the learning efficiency of NeuralUCB. A pragmatic problem, aggregated QoE, is further investigated to improve the practicality of our framework. We conduct experiments on both synthetic and real-world data. The results indicate that our method efficiently learns the user QoE pattern with few solicitations and provides drastic QoE enhancement for IoT devices.
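To show the exploit/explore pattern that NeuralUCB instantiates, here is a deliberately simplified stand-in: plain UCB1 over average QoE ratings per candidate DNN (the paper's algorithm replaces the mean estimate with a neural reward model):

```python
# Toy UCB1 selection over candidate DNNs using past QoE ratings; a
# simplified stand-in for NeuralUCB, just to illustrate the
# exploitation/exploration balance. `t` is the 1-indexed round count.
import math

def pick_dnn(counts, qoe_sums, t):
    def ucb(i):
        if counts[i] == 0:
            return float("inf")          # try every model at least once
        mean = qoe_sums[i] / counts[i]   # exploit: average observed QoE
        bonus = math.sqrt(2 * math.log(t) / counts[i])  # explore term
        return mean + bonus
    return max(range(len(counts)), key=ucb)
```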
- NeurIPSYejia Liu, Wang Zhu, and Shaolei RenNeurIPS, 2022
Continual learning faces a crucial challenge of catastrophic forgetting. To address this challenge, experience replay (ER), which maintains a tiny subset of samples from previous tasks, has been commonly used. Existing ER works usually focus on refining the learning objective for each task with a static memory construction policy. In this paper, we formulate dynamic memory construction in ER as a combinatorial optimization problem, which aims at directly minimizing the global loss across all experienced tasks. We first apply three tactics to solve the problem in the offline setting as a starting point. To provide an approximate solution to this problem in the online continual learning setting, we further propose the Global Pseudo-task Simulation (GPS), which mimics future catastrophic forgetting of the current task by permutation. Our empirical results and analyses suggest that GPS consistently improves accuracy across four commonly used vision benchmarks. We also show that GPS can serve as a unified framework for integrating various memory construction policies in existing ER works.
- IoTBingqian Lu, Jianyi Yang, Jie Xu, and Shaolei RenIEEE Internet of Things Journal, Nov 2022
Edge devices, including, in particular, mobile devices, have been emerging as an increasingly important platform for deep neural network (DNN) inference. Typically, multiple lightweight DNN models generated using different architectures and/or compression schemes can fit into a device, so selecting an optimal one is crucial in order to maximize the users’ Quality of Experience (QoE) for edge inference. The existing approaches to device-aware DNN optimization are usually time-consuming and not scalable in view of extremely diverse edge devices. More importantly, they focus on optimizing standard performance metrics (e.g., accuracy and latency), which may not translate into improvement of the users’ actual subjective QoE. In this article, we propose a novel automated and user-centric DNN selection engine, called Aquaman, which keeps users in a closed loop and leverages their QoE feedback to guide DNN selection decisions. The core of Aquaman is a neural network-based QoE predictor, which is continuously updated online. Additionally, we use neural bandit learning to balance exploitation and exploration, with provably efficient QoE performance. Finally, we evaluate Aquaman on a 15-user experimental study as well as synthetic simulations, demonstrating the effectiveness of Aquaman.
- TCCQihang Sun, Shaolei Ren, and Chuan WuIEEE Transactions on Cloud Computing, Jan 2020
In view of the high capital expense for scaling up power capacity to meet the escalating demand, maximizing the utilization of built capacity has become a top priority for multi-tenant data center operators, where many cloud providers house their physical servers. The traditional power provisioning guarantees a high availability, but is very costly and results in a significant capacity under-utilization. On the other hand, power oversubscription (i.e., deploying more servers than what the capacity allows) improves utilization but offers no availability guarantees due to the necessity of power reduction to handle the resulting power emergencies. Given these limitations, we propose a novel hybrid power provisioning approach, called HyPP, which provides a combination of two different power availabilities to tenants: capacity with a very high availability (100 percent or nearly 100 percent), plus additional capacity with a medium availability that may be unavailable for up to a certain amount during each billing period. For HyPP, we design an online algorithm for the operator to coordinate tenants’ power reduction at runtime when the tenants’ aggregate power demand exceeds the power capacities. Our algorithm aims at achieving long-term fairness in tenants’ power reduction (defined as the ratio of total actual power reduction by a tenant to its contracted reduction budget over a billing period). We analyze the theoretical performance of our online algorithm and derive a good competitive ratio in terms of fairness compared to the offline optimum. We also validate our algorithm through simulations under realistic settings.
- TWCLixing Chen, Jie Xu, Shaolei Ren, and Pan ZhouIEEE Transactions on Wireless Communications, Dec 2018
Shared edge computing platforms deployed at the radio access network are expected to significantly improve the quality-of-service delivered by application service providers (ASPs) in a flexible and economic way. However, placing edge service in every possible edge site by an ASP is practically infeasible due to the ASP’s prohibitive budget requirement. In this paper, we investigate the edge service placement problem of an ASP under a limited budget, where the ASP dynamically rents computing/storage resources in edge sites to host its applications in close proximity to end users. Since the benefit of placing edge service in a specific site is usually unknown to the ASP a priori, optimal placement decisions must be made while learning this benefit. We pose this problem as a novel combinatorial contextual bandit learning problem. It is “combinatorial” because only a limited number of edge sites can be rented to provide the edge service given the ASP’s budget. It is “contextual” because we utilize user context information to enable finer-grained learning and decision-making. To solve this problem and optimize the edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm. Furthermore, SEEN is extended to scenarios with overlapping service coverage by incorporating a disjunctively constrained knapsack problem. In both cases, we prove that our algorithm achieves a sublinear regret bound when compared with an Oracle algorithm that knows the exact benefit information. Simulations are carried out on a real-world dataset, whose results show that SEEN significantly outperforms benchmark solutions.
- TWCMinh NH Nguyen, Nguyen H Tran, Mohammad A. Islam, Chuan Pham, Shaolei Ren, and Choong Seon HongIEEE Transactions on Wireless Communications, Mar 2018
Keeping wireless base stations operating continually and providing uninterrupted communications services can save billions of dollars as well as human lives during natural disasters and/or electricity outages. Toward this end, wireless operators need to install backup power supplies whose capacity is sufficient to support their peak power demand, thus incurring a significant capital expense. Hence, pooling together backup power supplies and sharing them among co-located wireless operators can effectively reduce the capital expense, as the backup power capacity can be sized based on the aggregate demand of co-located operators instead of individual demand. Turning this vision into reality, however, faces a new challenge: how to fairly share the backup power supply? In this paper, we propose fair sharing of backup power supply by multiple wireless operators based on the Nash bargaining solution (NBS). In addition, we extend our analysis to multiple time slots for emergency cases, studying backup energy sharing based on model predictive control and NBS subject to an energy capacity constraint regarding future service availability. Our simulations demonstrate that sharing backup power/energy improves the communications service quality at lower cost and consumes less base station power than the non-sharing approach.
- SIGIRJeong-Min Yun, Yuxiong He, Sameh Elnikety, and Shaolei RenSIGIR, 2015
A web search engine often employs a partition-aggregate architecture, where an aggregator propagates a user query to all index serving nodes (ISNs) and collects the responses from them. An aggregation policy determines how long the aggregators wait for the ISNs before returning aggregated results to users, crucially affecting both query latency and quality. Designing an aggregation policy is, however, challenging: response latency varies significantly among queries and among ISNs, and aggregators lack knowledge of when ISNs will respond. In this paper, we propose aggregation policies that minimize the tail latency of search queries subject to search quality service level agreements (SLAs), combining data-driven offline analysis with online processing. Beginning with a single aggregator, we formally prove the optimality of our policy: it achieves the offline optimal result without knowing future responses of ISNs. We extend our policy to commonly-used hierarchical levels of aggregators and prove its optimality when messaging times between aggregators are known. We also present an empirically-effective policy to address unknown messaging times. We use production traces from a commercial search engine, a commercial advertisement engine, and synthetic workloads to evaluate the aggregation policy. The results show that, compared to prior work, the policy reduces tail latency by up to 40% while satisfying the same quality SLAs.
- INFOCOMShaolei Ren, Jaeok Park, and Mihaela SchaarINFOCOM, 2012
In this paper, we consider a user-generated content platform monetized through advertising and managed by an intermediary. To maximize the intermediary’s profit given the rational decision-making of content viewers and heterogeneous content producers, a payment scheme is proposed in which the intermediary can either tax or subsidize the content producers. First, we use a model with a representative content viewer to determine how the content viewers’ attention is allocated across available content by solving a utility maximization problem. Then, by modeling the content producers as self-interested agents making independent production decisions, we show that there exists a unique equilibrium in the content production stage, and propose a best-response dynamics to model the decision-making process. Next, we study the intermediary’s optimal payment based on decisions made by the representative content viewer and the content producers. In particular, by considering the well-known quality-adjusted Dixit-Stiglitz utility function for the representative content viewer, we derive explicitly the optimal payment maximizing the intermediary’s profit and characterize analytical conditions under which the intermediary should tax or subsidize the content producers. Finally, we generalize the analysis by considering heterogeneity in terms of production costs among the content producers.
- INFOCOMShaolei Ren, Jaeok Park, and Mihaela SchaarINFOCOM, 2011
In order to understand the complex interactions between different technologies in a communications market, it is of fundamental importance to understand how technologies affect the demand of users and competition between network service providers (NSPs). To this end, we analyze user subscription dynamics and revenue maximization in monopoly and duopoly communications markets. First, by considering a monopoly market with only one NSP, we investigate the impact of technologies on the users’ dynamic subscription. It is shown that, for any price charged by the NSP, there exists a unique equilibrium point of the considered user subscription dynamics. We also provide a sufficient condition under which the user subscription dynamics converges to the equilibrium point starting from any initial point. We then derive upper and lower bounds on the optimal price and market share that maximize the NSP’s revenue. Next, we turn to the analysis of a duopoly market and show that, for any charged prices, the equilibrium point of the considered user subscription dynamics exists and is unique. As in a monopoly market, we derive a sufficient condition on the technologies of the NSPs that ensures the user subscription dynamics to reach the equilibrium point. Then, we model the NSP competition using a non-cooperative game, in which the two NSPs choose their market shares independently, and provide a sufficient condition that guarantees the existence of at least one pure Nash equilibrium in the market competition game.
By year
2021 – present
- CACMPengfei Li, Jianyi Yang, Mohammad A. Islam, and Shaolei RenCommunications of the ACM (accepted), 2024
The growing carbon footprint of artificial intelligence (AI) models, especially large ones such as GPT-3, has been undergoing public scrutiny. Unfortunately, however, the equally important and enormous water (withdrawal and consumption) footprint of AI models has remained under the radar. For example, training GPT-3 in Microsoft’s state-of-the-art U.S. data centers can directly evaporate 700,000 liters of clean freshwater, but such information has been kept a secret. More critically, the global AI demand might be accountable for 4.2 – 6.6 billion cubic meters of water withdrawal in 2027, which is more than the total annual water withdrawal of 4 – 6 Denmark or half of the United Kingdom. This is very concerning, as freshwater scarcity has become one of the most pressing challenges shared by all of us in the wake of the rapidly growing population, depleting water resources, and aging water infrastructures. To respond to the global water challenges, AI models can, and also must, take social responsibility and lead by example by addressing their own water footprint. In this paper, we provide a principled methodology to estimate the water footprint of AI models, and also discuss the unique spatial-temporal diversities of AI models’ runtime water efficiency. Finally, we highlight the necessity of holistically addressing water footprint along with carbon footprint to enable truly sustainable AI.
- SIGMETRICSPengfei Li, Jianyi Yang, Adam Wierman, and Shaolei RenSIGMETRICS, 2025
This paper studies learning-augmented decentralized online convex optimization in a networked multi-agent system, a challenging setting that has remained under-explored. We first consider a linear learning-augmented decentralized online algorithm (LADO-Lin) that combines a machine learning (ML) policy with a baseline expert policy in a linear manner. We show that, while LADO-Lin can exploit the potential of ML predictions to improve the average cost performance, it cannot have guaranteed worst-case performance. To address this limitation, we propose a novel online algorithm (LADO) that adaptively combines the ML policy and expert policy to safeguard the ML predictions to achieve strong competitiveness guarantees. We also prove the average cost bound for LADO, revealing the tradeoff between average performance and worst-case robustness and demonstrating the advantage of training the ML policy by explicitly considering the robustness requirement. Finally, we run an experiment on decentralized battery management for sustainable computing. Our results highlight the potential of ML augmentation to improve the average performance as well as the guaranteed worst-case performance of LADO.
- NeurIPSTong Zhou, Xuandong Zhao, Xiaolin Xu, and Shaolei RenNeurIPS, 2024
Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter the meanings of LLM-generated responses or even forge harmful content, potentially misattributing blame to the LLM developer. To overcome this, we introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks (mitigating spoofing attacks) as well as a coarse-grained signal to trace text sources when the signature is invalid (enhancing detectability) via a novel rank-based sampling strategy. Compared to conventional watermark detectors that only output binary results, Bileve can differentiate 5 scenarios during detection, reliably tracing text provenance and regulating LLMs. The experiments conducted on OPT-1.3B and LLaMA-7B demonstrate the effectiveness of Bileve in defeating spoofing attacks with enhanced detectability.
- ICMLYejia Liu, Jianyi Yang, Pengfei Li, Tongxin Li, and Shaolei RenICML, 2024
Public models have played a crucial role in various AI applications, showcasing their proficiency in accurate predictions. However, their exclusive emphasis on prediction accuracy may not align with the diverse end objectives of downstream agents when utilized. Recognizing the public model’s predictions as a service, we advocate for integrating the objectives of downstream agents into the optimization process. Concretely, to address performance disparities and foster fairness among heterogeneous agents in training, we propose a novel q-Equitable objective. This objective, coupled with a policy gradient algorithm, is crafted to train the public model to produce a more equitable/uniform performance distribution across downstream agents, each with their unique concerns. Both theoretical analysis and empirical case studies have proven the effectiveness of our method in advancing performance equity across diverse downstream agents utilizing the public model for their decision-making.
- SIGMETRICSJianyi Yang, Pengfei Li, Mohammad J. Islam, and Shaolei RenSIGMETRICS, 2024
This paper studies online resource allocation with replenishable budgets, where budgets can be replenished on top of the initial budget and an agent sequentially chooses online allocation decisions without violating the available budget constraint at each round. We propose a novel online algorithm, called OACP (Opportunistic Allocation with Conservative Pricing), that conservatively adjusts dual variables while opportunistically utilizing available resources. OACP achieves a bounded asymptotic competitive ratio in adversarial settings as the number of decision rounds T gets large. Importantly, the asymptotic competitive ratio of OACP is optimal in the absence of additional assumptions on budget replenishment. To further improve the competitive ratio, we make a mild assumption that there is budget replenishment every T* ≥ 1 decision rounds and propose OACP+ to dynamically adjust the total budget assignment for online allocation. Next, we move beyond the worst case and propose LA-OACP (Learning-Augmented OACP/OACP+), a novel learning-augmented algorithm for online allocation with replenishable budgets. We prove that LA-OACP can improve the average utility compared to OACP/OACP+ when the ML predictor is properly trained, while still offering worst-case utility guarantees when the ML predictions are arbitrarily wrong. Finally, we run simulation studies of sustainable AI inference powered by renewables, validating our analysis and demonstrating the empirical benefits of LA-OACP.
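For intuition, dual-variable online allocation generally follows an accept-if-worth-the-price loop; below is a rough sketch of that generic pattern (an illustration of the style of algorithm, not the paper's pseudocode; the update rule and parameters are assumptions):

```python
# Generic dual-price online allocation with a replenishable budget:
# accept a request only if its utility beats the current price of the
# resources it consumes, then adjust the price conservatively.
def online_allocate(requests, budget, replenish=0.0, eta=0.05, price=1.0):
    decisions = []
    for utility, demand in requests:
        budget += replenish  # budget replenishment arriving over rounds
        take = utility >= price * demand and demand <= budget
        if take:
            budget -= demand
        decisions.append(take)
        # Conservative multiplicative price update based on consumption.
        price *= (1 + eta) if take else 1 / (1 + eta)
    return decisions
```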
- NeurIPS: Jianyi Yang, Pengfei Li, Adam Wierman, and Shaolei Ren. NeurIPS, 2024
Online Budgeted Matching (OBM) is a classic problem with important applications in online advertising, online service matching, revenue management, and beyond. Traditional online algorithms typically assume a small bid setting, where the maximum bid-to-budget ratio (kappa) is infinitesimally small. While recent algorithms have tried to address scenarios with non-small or general bids, they often rely on the Fractional Last Matching (FLM) assumption, which allows for accepting partial bids when the remaining budget is insufficient. This assumption, however, does not hold for many applications with indivisible bids. In this paper, we remove the FLM assumption and tackle the open problem of OBM with general bids. We first establish an upper bound of 1-kappa on the competitive ratio for any deterministic online algorithm. We then propose a novel meta algorithm, called MetaAd, which reduces to different algorithms with the first known provable competitive ratios parameterized by the maximum bid-to-budget ratio kappa in [0,1]. As a by-product, we extend MetaAd to the FLM setting and derive provably competitive algorithms. Finally, we apply our competitive analysis to the design of learning-augmented algorithms.
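To make the setting concrete, here is a minimal toy baseline (a plain greedy rule, not MetaAd itself): each arriving item carries indivisible bids, and an advertiser can be matched only if its remaining budget covers the full bid, which is exactly where the FLM assumption is dropped.

```python
def greedy_obm(arrivals, budgets):
    """Toy online budgeted matching without the FLM assumption: assign each
    arriving item to the affordable advertiser with the largest bid; items
    nobody can afford are rejected outright (no partial matching)."""
    remaining = dict(budgets)
    total = 0.0
    for bids in arrivals:  # bids: {advertiser: bid}, revealed online
        feasible = {a: b for a, b in bids.items() if b <= remaining.get(a, 0)}
        if not feasible:
            continue
        winner = max(feasible, key=feasible.get)
        remaining[winner] -= feasible[winner]
        total += feasible[winner]
    return total

print(greedy_obm([{"x": 3, "y": 2}, {"x": 8}], {"x": 10, "y": 5}))  # 3.0
```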
- NeurIPS: Tongxin Li, Tinashe Handina, Shaolei Ren, and Adam Wierman. NeurIPS, 2024
The combination of Bayesian games and learning has a rich history: a single agent is controlled in a system of multiple agents whose unknown behaviors are described by a set of types, each specifying a possible behavior of the other agents. The agent plans its own actions with respect to the types it believes are most likely, so as to maximize its payoff. However, type beliefs are often learned from past actions and are likely to be incorrect. With this perspective in mind, we consider an agent in a game with type predictions of the other agents, and investigate the impact of incorrect beliefs on the agent's payoff. In particular, we formally define a trade-off between risk and opportunity by comparing the payoff obtained against the optimal payoff, represented by the gap caused by trusting or distrusting the learned beliefs. Our main results characterize this trade-off by providing upper and lower bounds on the payoff gap for both normal-form and stochastic Bayesian games, with numerical results provided.
- e-Energy: Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. e-Energy, 2024
Fueled by the soaring popularity of large language and foundation models, the accelerated growth of artificial intelligence (AI) models’ enormous environmental footprint has come under increased scrutiny. While many approaches have been proposed to make AI more energy-efficient and environmentally friendly, environmental inequity – the fact that AI’s environmental footprint can be disproportionately higher in certain regions than in others – has emerged, raising social-ecological justice concerns. This paper takes a first step toward addressing AI’s environmental inequity by balancing its regional negative environmental impact. Concretely, we focus on the carbon and water footprints of AI model inference and propose equity-aware geographical load balancing (GLB) to explicitly address AI’s environmental impacts on the most disadvantaged regions. We run trace-based simulations by considering a set of 10 geographically-distributed data centers that serve inference requests for a large language AI model. The results demonstrate that existing GLB approaches may amplify environmental inequity while our proposed equity-aware GLB can significantly reduce the regional disparity in terms of carbon and water footprints.
- e-Energy: Pranjol Sen Gupta, Md Rajib Hossen, Pengfei Li, Shaolei Ren, and Mohammad A. Islam. e-Energy, 2024 (Best Notes Paper Award)
Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To lower this barrier to research in water sustainability, we build a dataset of operational water usage, covering direct water usage in cooling systems and indirect water embedded in electricity generation. Our dataset consists of the hourly water efficiency of major U.S. cities and states from 2018 to 2023. We also offer cooling system models that capture the impact of weather on water efficiency. We present a preliminary analysis of our dataset and discuss three potential applications that can benefit from it.
- HotCarbon: Yejia Liu, Pengfei Li, Daniel Wong, and Shaolei Ren. HotCarbon, 2024
The enormous growth of AI computing has led to a surging demand for electricity. To stem the resulting energy cost and environmental impact, this paper explores opportunities enabled by the increasing hardware heterogeneity and introduces the concept of Geographical Server Relocation (GSR). Specifically, GSR physically balances the available AI servers across geographically distributed data centers subject to AI computing demand and power capacity constraints in each location. The key idea of GSR is to relocate older and less energy-efficient servers to regions with more renewables, better water efficiencies and/or lower electricity prices. Our case study demonstrates that, even with modest flexibility of relocation, GSR can substantially reduce the total operational environmental footprints and operation costs of AI computing. We conclude this paper by discussing the major challenges of GSR, including service migration, software management, and algorithms.
- HotEthics: Pengfei Li, Yejia Liu, Jianyi Yang, and Shaolei Ren. HotEthics, 2024
The sharply increasing sizes of artificial intelligence (AI) models come with significant energy consumption and environmental footprints, which can disproportionately impact certain (often marginalized) regions and hence create environmental inequity concerns. Moreover, concerns with social inequity have also emerged, as AI computing resources may not be equitably distributed across the globe and users from certain disadvantaged regions with severe resource constraints can consistently experience inferior model performance. Importantly, the inequity concerns that encompass both social and environmental dimensions still remain unexplored and have increasingly hindered responsible AI. In this paper, we leverage the spatial flexibility of AI inference workloads and propose equitable geographical load balancing (GLB) to fairly balance AI’s regional social and environmental costs. Concretely, to penalize the disproportionately high social and environmental costs for equity, we introduce Lq norms as novel regularization terms into the optimization objective for GLB decisions. Our empirical results based on real-world AI inference traces demonstrate that while the existing GLB algorithms result in disproportionately large social and environmental costs in certain regions, our proposed equitable GLB can fairly balance AI’s negative social and environmental costs across all the regions.
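A minimal sketch of the Lq-regularized objective described above, using SciPy and made-up per-region cost rates (the weight mu, the exponent q, and the single-workload setup are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np
from scipy.optimize import minimize

unit_cost = np.array([1.0, 0.6, 1.4])  # per-region cost rates (made up)

def objective(x, mu=2.0, q=4.0):
    regional = unit_cost * x  # per-region cost under load split x
    # Total cost plus an Lq-norm penalty on disproportionately costly regions.
    return regional.sum() + mu * np.linalg.norm(regional, ord=q)

constraints = ({"type": "eq", "fun": lambda x: x.sum() - 1.0},)  # serve all load
result = minimize(objective, x0=np.ones(3) / 3, bounds=[(0, 1)] * 3,
                  constraints=constraints)
print(result.x.round(3))  # load shares across the three regions
```

Raising q shifts the solution from minimizing the total cost toward equalizing the regional costs.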
- e-Energy: Jieming Bian, Lei Wang, Shaolei Ren, and Jie Xu. e-Energy, 2024
Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which prioritizes model parameter exchange over raw data, ensuring data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to address the unpredictability of future carbon intensity, and devises an efficient algorithm to address the combinatorial complexity of the data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.
- NeurIPS: Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, and Shaolei Ren. NeurIPS, 2023
This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under anytime constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.
- NeurIPS: Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. NeurIPS, 2023
We study a general form of Smoothed Online Convex Optimization, a.k.a. SOCO, including multi-step switching costs and feedback delay. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically, we prove that RCL is able to guarantee (1+lambda)-competitiveness against any given expert for any lambda>0, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly, RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay. We demonstrate the improvement of RCL in both robustness and average performance using battery management for electrifying transportation as a case study.
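As a 1-D illustration of the robustification step (a simplified stand-in for RCL's constrained projection), one can project the untrusted ML action onto a feasible set around the trusted expert's action:

```python
import numpy as np

def robustify(ml_action, expert_action, budget):
    """Project the ML action onto an interval around the expert action;
    `budget` caps the allowed deviation and would be derived from the
    (1+lambda) robustness constraint in a full implementation."""
    return float(np.clip(ml_action, expert_action - budget, expert_action + budget))

print(robustify(ml_action=5.0, expert_action=3.0, budget=0.8))  # -> 3.8
```

Training the ML model with this projection in the loop, rather than as a standalone optimizer, is what the abstract refers to as robustification-aware training.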
- NeurIPS: Tongxin Li, Yiheng Lin, Shaolei Ren, and Adam Wierman. NeurIPS, 2023
We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees that provably improve upon what can be obtained solely with black-box advice.
- ASPLOS: Nuntipat Narkthong, Shijin Duan, Shaolei Ren, and Xiaolin Xu. ASPLOS, 2024
Artificial intelligence has become feasible on tiny edge devices, thanks to the availability of high-performance microcontroller units (MCUs) and the development of highly-efficient machine learning (ML) models. Although successful, the current design practice of implementing high-performance ML models on high-end MCU devices still presents barriers to the wider adoption of ML in real-world applications built on low-end devices, where power and available hardware resources are critical factors for including ML functionality in the final product. On the one hand, these low-end MCU devices have very limited computational resources in terms of hardware and power budget, which makes ML inference less accessible. On the other hand, these low-end and low-cost MCUs represent the leading physical carrier of real-world smart applications, making small and fast-responding (i.e., always-on) ML models an urgent need. This paper presents MicroVSA, a library of optimized implementations of a low-dimensional computing (LDC) classifier, a recently proposed variant of vector symbolic architecture (VSA), for 8-bit, 16-bit, and 32-bit MCUs. MicroVSA achieves a 1.33–10.27x speedup over the vanilla LDC, and our sample models achieve 85–93% accuracy on three of the most common classification tasks for always-on inference on MCUs — myocardial infarction detection, human activity recognition, and hot word detection — while requiring only a few tens of bytes of RAM and fitting in the vast majority of tiny 8-bit MCUs. For instance, our model for detecting the phrase "Hey Snapdragon" needs only 6.75KB of flash and 0.02KB of RAM and can complete one inference in 10.6 ms on a low-end 8-bit AVR MCU or 0.68 ms on an ARM Cortex-M4 MCU, outperforming the best neural network-based model by a factor of 7.2–113x, making it the first hot word detection model to run on an 8-bit MCU and the most efficient one. Our study suggests that ubiquitous ML deployment on extremely low-cost MCUs is possible, and that more study on VSA model training, model compression, and implementation techniques is needed to further lower the cost and power of ML on edge devices.
- INFOCOM: Pengfei Li, Jianyi Yang, and Shaolei Ren. INFOCOM, 2023
Online optimization with memory costs has many real-world applications, where sequential actions are made without knowing the future input. Nonetheless, the memory cost couples the actions over time, adding substantial challenges. Conventionally, this problem has been approached by various expert-designed online algorithms with the goal of achieving bounded worst-case competitive ratios, but the resulting average performance is often unsatisfactory. On the other hand, emerging machine learning (ML) based optimizers can improve the average performance, but suffer from the lack of worst-case performance robustness. In this paper, we propose a novel expert-robustified learning (ERL) approach, achieving both good average performance and robustness. More concretely, for robustness, ERL introduces a novel projection operator that robustifies ML actions by utilizing an expert online algorithm; for average performance, ERL trains the ML optimizer based on a recurrent architecture by explicitly considering downstream expert robustification. We prove that, for any lambda≥1, ERL is lambda-competitive against the expert algorithm and (lambda*C)-competitive against the optimal offline algorithm (where C is the expert's competitive ratio). Additionally, we extend our analysis to a novel setting of multi-step memory costs. Finally, our analysis is supported by empirical experiments for an energy scheduling application.
- ICML: Pengfei Li, Jianyi Yang, and Shaolei Ren. ICML, 2023
Many problems, such as online ad display, can be formulated as online bipartite matching. The crucial challenge lies in the nature of sequentially-revealed online item information, based on which we make irreversible matching decisions at each step. While numerous expert online algorithms have been proposed with bounded worst-case competitive ratios, they may not offer satisfactory performance in average cases. On the other hand, reinforcement learning (RL) has been applied to improve the average performance, but it lacks robustness and can perform arbitrarily poorly. In this paper, we propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR), achieving both good average-case and worst-case performance. The key novelty of LOMAR is a new online switching operation which, based on a judicious condition to hedge against future uncertainties, decides whether to follow the expert’s decision or the RL decision for each online item. We prove that for any ρ∈[0,1], LOMAR is ρ-competitive against any given expert online algorithm. To improve the average performance, we train the RL policy by explicitly considering the online switching operation. Finally, we run empirical experiments to demonstrate the advantages of LOMAR compared to existing baselines.
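A hedged sketch of the switching idea (the condition below is illustrative, not LOMAR's exact test): follow the RL decision only while doing so keeps the cumulative reward within a rho-fraction of the expert's.

```python
def choose_decision(rl_gain, expert_gain, rl_cum, expert_cum, rho=0.8):
    """Follow RL only if the algorithm's cumulative reward after this step
    stays at least rho times the expert's cumulative reward; otherwise fall
    back to the expert's decision to hedge against future uncertainty."""
    if rl_cum + rl_gain >= rho * (expert_cum + expert_gain):
        return "rl"
    return "expert"
```

Training the RL policy with this switch in place, as the abstract notes, keeps the average-case gains while preserving the rho-competitiveness guarantee.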
- ICLR: Tong Zhou, Shaolei Ren, and Xiaolin Xu. ICLR, 2024
Deep neural network (DNN) models, despite their impressive performance, are vulnerable to exploitation by attackers who attempt to adapt them to other tasks for their own benefit. Current defense strategies mainly address this vulnerability at the model parameter level, leaving the potential of architectural-level defense largely unexplored. This paper, for the first time, addresses the issue of model protection by reducing transferability at the architecture level. Specifically, we present a novel neural architecture search (NAS)-enabled algorithm that employs zero-cost proxies and evolutionary search to design model architectures with low transferability. Our method, namely ArchLock, aims to achieve high performance on the source task, while degrading the performance on target tasks, i.e., locking the transferability of a DNN model. To achieve efficient cross-task search without having access to the training data owned by the attackers, we utilize zero-cost proxies to speed up architecture evaluation and simulate potential target task embeddings to assist cross-task search with a binary performance predictor. Extensive experiments on NAS-Bench-201 and TransNAS-Bench-101 demonstrate that ArchLock reduces transferability by up to 30% and 50%, respectively, with negligible performance degradation on source tasks (<2%).
- ICML: Tong Zhou, Yukui Luo, Shaolei Ren, and Xiaolin Xu. ICML, 2023
As a type of valuable intellectual property (IP), deep neural network (DNN) models have been protected by techniques like watermarking. However, such passive model protection cannot fully prevent model abuse. In this work, we propose an active model IP protection scheme, namely NNSplitter, which actively protects the model by splitting it into two parts: the obfuscated model that performs poorly due to weight obfuscation, and the model secrets consisting of the indexes and original values of the obfuscated weights, which can only be accessed by authorized users with the support of the trusted execution environment. Experimental results demonstrate the effectiveness of NNSplitter, e.g., by only modifying 275 out of over 11 million (i.e., 0.002%) weights, the accuracy of the obfuscated ResNet-18 model on CIFAR-10 can drop to 10%. Moreover, NNSplitter is stealthy and resilient against norm clipping and fine-tuning attacks, making it an appealing solution for DNN model protection.
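A toy sketch of the split-and-obfuscate idea with NumPy (the random index selection, noise scale, and storage format are illustrative assumptions; the paper selects weights via an optimization and protects the secrets with a trusted execution environment):

```python
import numpy as np

def split_model(weights, n_secrets=8, rng=np.random.default_rng(0)):
    """Perturb a few weights and keep (index, original value) pairs as the
    model secret; the released model performs poorly without them."""
    w = weights.copy()
    idx = rng.choice(w.size, size=n_secrets, replace=False)
    secrets = [(int(i), float(w.flat[i])) for i in idx]
    w.flat[idx] = rng.normal(0.0, 5.0, size=n_secrets)  # obfuscate in place
    return w, secrets

def restore(obfuscated, secrets):
    w = obfuscated.copy()
    for i, v in secrets:
        w.flat[i] = v
    return w

w = np.random.default_rng(1).normal(size=(4, 4))
obf, sec = split_model(w)
assert np.allclose(restore(obf, sec), w)  # authorized users recover the model
```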
- SIGMETRICS: Pengfei Li, Jianyi Yang, and Shaolei Ren. SIGMETRICS, 2022
We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be highly unsatisfactory. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses — one minimizing the average ML prediction error for better robustness, and the other one minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can be even beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.
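The two-term training loss described above can be sketched in a few lines of PyTorch (the calibrator, cost function, and weight beta below are placeholders for illustration, not the paper's exact components):

```python
import torch

def calibration_aware_loss(pred, target, calibrator, cost_fn, beta=0.5):
    """Weighted sum of (i) the ML prediction error, for robustness, and
    (ii) the average cost after the differentiable downstream calibrator."""
    prediction_error = torch.mean((pred - target) ** 2)
    post_calibration_cost = torch.mean(cost_fn(calibrator(pred)))
    return beta * prediction_error + (1 - beta) * post_calibration_cost

pred = torch.randn(8, requires_grad=True)
loss = calibration_aware_loss(pred, torch.zeros(8), torch.nn.Identity(),
                              lambda y: y ** 2)
loss.backward()  # end-to-end trainable because the calibrator is differentiable
```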
- SIGMETRICS: Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, and Shaolei Ren. SIGMETRICS, 2022
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in the state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity — the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
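Latency monotonicity between two devices can be checked with a rank correlation; here is a minimal SciPy sketch with made-up latency measurements:

```python
from scipy.stats import spearmanr

# Per-architecture inference latencies (ms) on two devices (made-up numbers).
latency_proxy = [12.1, 8.4, 15.0, 9.9, 11.2]
latency_target = [30.5, 21.0, 39.8, 24.7, 28.9]

rho, _ = spearmanr(latency_proxy, latency_target)
print(f"Spearman rho = {rho:.2f}")  # close to 1.0 indicates strong monotonicity
```

A rho close to 1 suggests architectures searched on the proxy device keep their latency ranking on the target device, which is the property the paper exploits.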
- ICML: Jianyi Yang, and Shaolei Ren. ICML, 2022
By integrating domain knowledge with labeled samples, informed machine learning has been emerging to improve the learning performance for a wide range of applications. Nonetheless, a rigorous understanding of the role of injected domain knowledge has been under-explored. In this paper, we consider an informed deep neural network (DNN) with over-parameterization and domain knowledge integrated into its training objective function, and study how and why domain knowledge benefits the performance. Concretely, we quantitatively demonstrate the two benefits of domain knowledge in informed learning – regularizing the label-based supervision and supplementing the labeled samples – and reveal the trade-off between label and knowledge imperfectness in the bound of the population risk. Based on the theoretical analysis, we propose a generalized informed training objective to better exploit the benefits of knowledge and balance the label and knowledge imperfectness, which is validated by the population risk bound. Our analysis of sampling complexity sheds light on how to choose the hyper-parameters for informed learning, and further justifies the advantages of knowledge-informed learning.
- INFOCOM: Zhihui Shao, Jianyi Yang, Cong Shen, and Shaolei Ren. INFOCOM, 2022
Learning to optimize (L2O) has recently emerged as a promising approach to solving optimization problems by exploiting the strong prediction power of neural networks and offering lower runtime complexity than conventional solvers. While L2O has been applied to various problems, a crucial yet challenging class of problems – robust combinatorial optimization in the form of minimax optimization – has largely remained under-explored. In addition to the exponentially large decision space, a key challenge for robust combinatorial optimization lies in the inner optimization problem, which is typically non-convex and entangled with the outer optimization. In this paper, we study robust combinatorial optimization and propose a novel learning-based optimizer, called LRCO (Learning for Robust Combinatorial Optimization), which quickly outputs a robust solution in the presence of uncertain context. LRCO leverages a pair of learning-based optimizers — one for the minimizer and the other for the maximizer — that use their respective objective functions as losses and can be trained without the need for labels on training problem instances. To evaluate the performance of LRCO, we perform simulations for the task offloading problem in vehicular edge computing. Our results highlight that LRCO can greatly reduce the worst-case cost and improve robustness, while having a very low runtime complexity.
- TC: Yang Bai, Lixing Chen, Shaolei Ren, and Jie Xu. IEEE Transactions on Computers, May 2022
The rapid uptake of intelligent applications is pushing deep learning (DL) capabilities to the Internet-of-Things (IoT). Despite the emergence of new tools for embedding deep neural networks (DNNs) into IoT devices, providing satisfactory Quality of Experience (QoE) to users is still challenging due to the heterogeneity in DNN architectures, IoT devices, and user preferences. This paper studies automated customization for DL inference on IoT devices (termed on-thing inference), and our goal is to enhance user QoE by configuring the on-thing inference with an appropriate DNN for users under different usage scenarios. The core of our method is a DNN selection module that learns user QoE patterns on-the-fly and identifies the best-fit DNN for on-thing inference with the learned knowledge. It leverages a novel online learning algorithm, NeuralUCB, that has excellent generalization ability for handling various user QoE patterns. We also embed the knowledge transfer technique in NeuralUCB to expedite the learning process. However, NeuralUCB frequently solicits QoE ratings from users, which incurs non-negligible inconvenience. To address this problem, we design feedback solicitation schemes to reduce the number of QoE solicitations while maintaining the learning efficiency of NeuralUCB. A pragmatic problem, aggregated QoE, is further investigated to improve the practicality of our framework. We conduct experiments on both synthetic and real-world data. The results indicate that our method efficiently learns the user QoE pattern with few solicitations and provides drastic QoE enhancement for IoT devices.
- NeurIPS: Yejia Liu, Wang Zhu, and Shaolei Ren. NeurIPS, 2022
Continual learning faces a crucial challenge of catastrophic forgetting. To address this challenge, experience replay (ER), which maintains a tiny subset of samples from previous tasks, has been commonly used. Existing ER works usually focus on refining the learning objective for each task with a static memory construction policy. In this paper, we formulate the dynamic memory construction in ER as a combinatorial optimization problem, which aims at directly minimizing the global loss across all experienced tasks. We first apply three tactics to solve the problem in the offline setting as a starting point. To provide an approximate solution to this problem in the online continual learning setting, we further propose the Global Pseudo-task Simulation (GPS), which mimics future catastrophic forgetting of the current task by permutation. Our empirical results and analyses suggest that GPS consistently improves accuracy across four commonly used vision benchmarks. We have also shown that GPS can serve as a unified framework for integrating various memory construction policies in existing ER works.
- INFOCOM: Jianyi Yang, and Shaolei Ren. INFOCOM, 2021
Contextual bandit learning selects actions (i.e., arms) based on context information to maximize rewards while balancing exploitation and exploration. In many applications (e.g., cloud resource management with dynamic workloads), before arm selection, the agent/learner can either predict context information online based on context history or selectively query the context from an outside expert. Motivated by this practical consideration, we study a novel contextual bandit setting where context information is either predicted online or queried from an expert. First, considering predicted context only, we quantify the impact of context prediction on the cumulative regret (compared to an oracle with perfect context information) by deriving an upper bound on regret, which takes the form of a weighted combination of regret incurred by standard bandit learning and the context prediction error. Then, inspired by the regret's structural decomposition, we propose context query algorithms to selectively obtain the outside expert's input (subject to a total query budget) for more accurate context, decreasing the overall regret. Finally, we apply our algorithms to virtual machine scheduling on cloud platforms. The simulation results validate our regret analysis and show the effectiveness of our selective context query algorithms.
- IoT: Bingqian Lu, Jianyi Yang, Jie Xu, and Shaolei Ren. IEEE Internet of Things Journal, Nov 2022
Edge devices, including, in particular, mobile devices, have been emerging as an increasingly important platform for deep neural network (DNN) inference. Typically, multiple lightweight DNN models generated using different architectures and/or compression schemes can fit into a device, thus selecting an optimal one is crucial in order to maximize the users' Quality of Experience (QoE) for edge inference. The existing approaches to device-aware DNN optimization are usually time-consuming and not scalable in view of extremely diverse edge devices. More importantly, they focus on optimizing standard performance metrics (e.g., accuracy and latency), which may not translate into improvement of the users' actual subjective QoE. In this article, we propose a novel automated and user-centric DNN selection engine, called Aquaman, which keeps users in a closed loop and leverages their QoE feedback to guide DNN selection decisions. The core of Aquaman is a neural network-based QoE predictor, which is continuously updated online. Additionally, we use neural bandit learning to balance exploitation and exploration, with provably efficient QoE performance. Finally, we evaluate Aquaman in a 15-user experimental study as well as synthetic simulations, demonstrating the effectiveness of Aquaman.
- HPCA: Zhihui Shao, Mohammad A. Islam, and Shaolei Ren. HPCA, 2021
The widespread adoption of Internet of Things and latency-critical applications has fueled the burgeoning development of edge colocation data centers (a.k.a. edge colocations) — small-scale data centers in distributed locations. In an edge colocation, multiple entities/tenants house their own physical servers together, sharing the power and cooling infrastructures for cost efficiency and scalability. In this paper, we discover that the sharing of cooling systems also exposes edge colocations' potential vulnerabilities to cooling load injection attacks (called thermal attacks), which, if left unchecked, may create thermal emergencies and even trigger system outages. Importantly, thermal attacks can be launched by leveraging the emerging architecture of built-in batteries integrated with servers, which can conceal the attacker's actual server power (or cooling load). We consider both one-shot attacks (which aim at creating system outages) and repeated attacks (which aim at causing frequent thermal emergencies). For repeated attacks, we present a foresighted attack strategy which, using reinforcement learning, learns on the fly a good timing for attacks based on the battery state and benign tenants' load. We also combine prototype experiments with simulations to validate our attacks and show that, for a small 8kW edge colocation, an attacker can potentially cause significant losses. Finally, we suggest effective countermeasures to the potential threat of thermal attacks.
- TSUSC: Tianshu Wei, Shaolei Ren, and Qi Zhu. IEEE Transactions on Sustainable Computing, Jul 2021
The majority of today’s power-hungry datacenters are physically co-located with office rooms in mixed-use buildings (MUBs). The heating, ventilation, and air conditioning (HVAC) system within each MUB is often shared or partially-shared between datacenter rooms and office zones, for removing the heat generated by computing equipment and maintaining desired room temperature for building tenants. To effectively reduce the total energy cost of MUBs, it is important to leverage the scheduling flexibility in both the HVAC system and the datacenter workload. In this work, we formulate both HVAC control and datacenter workload scheduling as a Markov decision process (MDP), and propose a deep reinforcement learning (DRL) based algorithm for minimizing the total energy cost while maintaining desired room temperature and meeting datacenter workload deadline constraints. Moreover, we also develop a heuristic DRL-based algorithm to enable interactive workload allocation among geographically distributed MUBs for further energy reduction. The experiment results demonstrate that our regular DRL-based algorithm can achieve up to 26.9 percent cost reduction for a single MUB, when compared with a baseline strategy. Our heuristic DRL-based algorithm can reduce the total energy cost by an additional 5.5 percent, when intelligently allocating interactive workload for multiple geographically distributed MUBs.
2016 – 2020
- SIGMETRICS: Zhihui Shao, Mohammad A. Islam, and Shaolei Ren. SIGMETRICS, 2020
Attacks based on power analysis have long existed and been studied, with some recent works focused on data exfiltration from victim systems without using conventional communications (e.g., WiFi). Nonetheless, prior works typically rely on intrusive direct power measurement, either by implanting meters in the power outlet or tapping into the power cable, thus jeopardizing the stealthiness of attacks. In this paper, we propose NoDE (Noise for Data Exfiltration), a new system for stealthy data exfiltration from enterprise desktop computers. Specifically, NoDE achieves data exfiltration over a building's power network by exploiting high-frequency voltage ripples (i.e., switching noises) generated by power factor correction circuits built into today's computers. Located at a distance and even in a different room, the receiver can non-intrusively measure the voltage of a power outlet to capture the high-frequency switching noises for online information decoding without supervised training/learning. To evaluate NoDE, we run experiments on seven different computers from top vendors, using top-brand power supply units. Our results show that, for a single transmitter, NoDE achieves a rate of up to 28.48 bits/second at a distance of 90 feet (27.4 meters) without line of sight, demonstrating a practically stealthy threat. Based on the orthogonality of the switching noise frequencies of different computers, we also demonstrate simultaneous data exfiltration from four computers using only one receiver. Finally, we present a few possible defenses, such as installing noise filters, and discuss their limitations.
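A minimal decoding sketch in the spirit of the attack: band-pass the measured outlet voltage around the transmitter's switching frequency and threshold the per-symbol band energy. All rates, frequencies, and thresholds below are illustrative assumptions, not measured values from the paper.

```python
import numpy as np

fs = 250_000           # receiver sampling rate in Hz (assumed)
f_tx = 55_000          # transmitter's switching-noise frequency in Hz (assumed)
symbol_len = fs // 25  # samples per symbol at an assumed 25 symbols/second

def decode(trace):
    """On/off-keying decoder: a symbol is 1 if the energy near f_tx
    dominates the window's spectrum."""
    bits = []
    for k in range(len(trace) // symbol_len):
        win = trace[k * symbol_len:(k + 1) * symbol_len]
        spec = np.abs(np.fft.rfft(win))
        freqs = np.fft.rfftfreq(len(win), 1 / fs)
        band = spec[(freqs > f_tx - 1e3) & (freqs < f_tx + 1e3)].sum()
        bits.append(1 if band > 0.5 * spec.sum() else 0)
    return bits

t = np.arange(4 * symbol_len) / fs
trace = np.sin(2 * np.pi * f_tx * t) * np.repeat([1, 0, 1, 0], symbol_len)
print(decode(trace))  # -> [1, 0, 1, 0]
```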
- TCC: Mohammad A. Islam, Hasan Mahmud, Shaolei Ren, and Xiaorui Wang. IEEE Transactions on Cloud Computing, Jan 2020
The massive energy consumption of data centers worldwide has resulted in a large carbon footprint, raising serious concerns for sustainable IT initiatives and attracting a great amount of research attention. Nonetheless, the efforts to date, despite being encouraging, have been primarily centered around owner-operated data centers (e.g., Google data centers), leaving another major segment of the data center industry, colocation data centers, much less explored. As a major hindrance to the carbon efficiency desired by the operator, colocation suffers from "split incentive": tenants may not be willing to manage their servers for carbon efficiency. In this paper, we aim at minimizing the carbon footprint of geo-distributed colocation data centers, while ensuring that the operator's cost meets a long-term budget constraint. We overcome the "split incentive" hurdle by devising a novel online carbon-aware incentive mechanism, called GreenColo, in which tenants voluntarily bid for energy reduction at self-determined prices and receive financial rewards if their bids are accepted at runtime. Using trace-based simulation, we show that GreenColo results in a carbon footprint fairly close (23 versus 18 percent) to the optimal offline solution with future information, while satisfying the colocation operator's long-term budget constraint. We demonstrate the effectiveness of GreenColo in practical scenarios via both simulation studies and scaled-down prototype experiments. Our results show that GreenColo can reduce the carbon footprint by up to 24 percent without incurring any additional cost for the colocation operator (compared to the no-incentive baseline case), while tenants receive financial rewards for "free" without violating service level agreements.
- TCC: Qihang Sun, Shaolei Ren, and Chuan Wu. IEEE Transactions on Cloud Computing, Jan 2020
In view of the high capital expense of scaling up power capacity to meet escalating demand, maximizing the utilization of built capacity has become a top priority for multi-tenant data center operators, where many cloud providers house their physical servers. Traditional power provisioning guarantees high availability but is very costly and results in significant capacity under-utilization. On the other hand, power oversubscription (i.e., deploying more servers than the capacity allows) improves utilization but offers no availability guarantees, due to the necessity of power reduction to handle the resulting power emergencies. Given these limitations, we propose a novel hybrid power provisioning approach, called HyPP, which provides a combination of two different power availabilities to tenants: capacity with a very high availability (100 percent or nearly 100 percent), plus additional capacity with a medium availability that may be unavailable for up to a certain amount during each billing period. For HyPP, we design an online algorithm for the operator to coordinate tenants' power reduction at runtime when the tenants' aggregate power demand exceeds the power capacities. Our algorithm aims at achieving long-term fairness in tenants' power reduction (defined as the ratio of a tenant's total actual power reduction to its contracted reduction budget over a billing period). We analyze the theoretical performance of our online algorithm and derive a good competitive ratio in terms of fairness compared to the offline optimum. We also validate our algorithm through simulations under realistic settings.
- SIGMETRICS: Mohammad A. Islam, Luting Yang, Kiran Ranganath, and Shaolei Ren. SIGMETRICS, 2018
The common practice of power infrastructure oversubscription in data centers exposes dangerous vulnerabilities to well-timed power attacks (i.e., maliciously timed power loads to overload the infrastructure capacity), possibly creating outages and resulting in multimillion-dollar losses. In this paper, we focus on the emerging threat of power attacks in a multi-tenant data center, where a malicious tenant (i.e., attacker) aims at compromising the data center availability through power attacks. We discover a novel acoustic side channel resulting from servers’ cooling fan noise, which can help the attacker time power attacks at the moments when benign tenants’ power usage is high. Concretely, we exploit the acoustic side channel by: (1) employing a high-pass filter to filter out the air conditioner’s noise; (2) applying non-negative matrix factorization with sparsity constraint to demix the received aggregate noise and detect periods of high power usage by benign tenants; and (3) designing a state machine to guide power attacks. We run experiments in a practical data center environment as well as simulation studies, and demonstrate that the acoustic side channel can assist the attacker with detecting more than 50% of all attack opportunities, representing state-of-the-art timing accuracy.
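The demixing step can be sketched with scikit-learn's sparse NMF on a magnitude spectrogram (synthetic data here; the component count and regularization strengths are illustrative choices, not the paper's tuned values):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
spectrogram = np.abs(rng.normal(size=(128, 200)))  # stand-in for real fan noise

# Factor the (frequency x time) spectrogram into a few additive sources,
# with an L1 penalty encouraging sparse spectral signatures.
model = NMF(n_components=3, init="nndsvda", l1_ratio=1.0, alpha_W=0.1,
            max_iter=500)
W = model.fit_transform(spectrogram)  # per-source spectral signatures
H = model.components_                 # per-source activations over time

activity = H.sum(axis=0)
busy = activity > np.percentile(activity, 80)  # candidate high-power windows
print(busy[:20])
```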
- CCS: Mohammad A. Islam, and Shaolei Ren. CCS, 2018
Maliciously-injected power load, a.k.a. power attack, has recently surfaced as a new egregious attack vector for dangerously compromising the data center availability. This paper focuses on the emerging threat of power attacks in a multi-tenant colocation data center, an important type of data center where multiple tenants house their own servers and share the power distribution system. Concretely, we discover a novel physical side channel — a voltage side channel — which leaks the benign tenants' power usage information at runtime and helps an attacker precisely time its power attacks. The key idea we exploit is that, by Ohm's Law, the high-frequency switching operation (40–100 kHz) of the power factor correction circuit universally built into today's server power supply units creates voltage ripples in the data center power lines. Importantly, because they do not overlap with the grid voltage in the frequency domain, the voltage ripple signals can be easily sensed by the attacker to track the benign tenants' runtime power usage and precisely time its power attacks. We evaluate the timing accuracy of the voltage side channel in a real data center prototype, demonstrating that the attacker can extract benign tenants' power pattern with great accuracy (correlation coefficient = 0.90+) and utilize 64% of all the attack opportunities without launching attacks randomly or consecutively. Finally, we highlight a few possible defense strategies and extend our study to more complex three-phase power distribution systems used in large multi-tenant data centers.
- HPCA: Mohammad A. Islam, Xiaoqi Ren, Shaolei Ren, and Adam Wierman. HPCA, 2018
Despite the common practice of oversubscription, power capacity is largely under-utilized in data centers. A significant factor driving this under-utilization is fluctuation of the aggregate power demand, resulting in unused "spot (power) capacity". In this paper, we tap into spot capacity for improving power infrastructure utilization in multi-tenant data centers, an important but under-explored type of data center where multiple tenants house their own physical servers. We propose a novel market, called SpotDC, to allocate spot capacity to tenants on demand. Specifically, SpotDC extracts tenants' rack-level spot capacity demand through an elastic demand function, based on which the operator sets the market price for spot capacity allocation. We evaluate SpotDC using both testbed experiments and simulations, demonstrating that SpotDC improves power infrastructure utilization and creates a "win-win" situation: the data center operator increases its profit (by nearly 10%), while tenants improve their performance (by 1.2–1.8x on average compared to the no-spot-capacity case, yet at a marginal cost).
- TCC: Mohammad A. Islam, Kishwar Ahmed, Hong Xu, Nguyen H. Tran, Gang Quan, and Shaolei Ren. IEEE Transactions on Cloud Computing, Jul 2018
As the critical infrastructure for supporting Internet and cloud computing services, massive geo-distributed data centers are notorious for their huge electricity appetites and carbon footprints. Nonetheless, a lesser-known fact is that data centers are also "thirsty": to operate data centers, millions of gallons of water are required for cooling and electricity production. The existing water-saving techniques primarily focus on improved "engineering" (e.g., upgrading to air economizer cooling, diverting recycled/sea water instead of potable water) and do not apply to all data centers due to high upfront capital costs and/or location restrictions. In this paper, we propose a software-based approach towards water conservation by exploiting the inherent spatio-temporal diversity of water efficiency across geo-distributed data centers. Specifically, we propose a batch job scheduling algorithm, called WACE (minimization of WAter, Carbon and Electricity cost), which dynamically adjusts geographic load balancing and resource provisioning to minimize the water consumption along with carbon emission and electricity cost while satisfying the average delay performance requirement. WACE can be implemented online without foreseeing far-future information and yields a total cost (incorporating electricity cost, water consumption and carbon emission) that is provably close to the optimal algorithm with lookahead information. Finally, we validate WACE through a trace-based simulation study and show that WACE outperforms state-of-the-art benchmarks: 25 percent water saving while incurring an acceptable delay increase. We also extend WACE to joint scheduling of batch workloads and delay-sensitive interactive workloads for further water footprint reduction in geo-distributed data centers.
- TOMPECS: Shutong Chen, Zhi Zhou, Fangming Liu, Zongpeng Li, and Shaolei Ren. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Jun 2018
Datacenters are major energy consumers and dissipate an enormous amount of waste heat. Simply discharging datacenter heat outdoors is energy-consuming and environmentally unfriendly. By harvesting datacenter waste heat and selling it to the district heating system (DHS), both energy cost compensation and environmental protection can be achieved. To realize such benefits in practice, an efficient market mechanism is required to incentivize the participation of datacenters. This work proposes CloudHeat, an online reverse auction mechanism for the DHS to solicit heat bids from datacenters. To minimize the long-term social operational cost of the DHS and the datacenters, we apply an RFHC approach to decompose the long-term problem into a series of one-round auctions, guaranteeing a small loss in competitive ratio. The one-round optimization is still NP-hard, so we employ a randomized auction framework to simultaneously guarantee truthfulness, polynomial running time, and an approximation ratio of 2. The performance of CloudHeat is validated through theoretical analysis and trace-driven simulation studies.
- TWC: Lixing Chen, Jie Xu, Shaolei Ren, and Pan Zhou. IEEE Transactions on Wireless Communications, Dec 2018
Shared edge computing platforms deployed at the radio access network are expected to significantly improve the quality-of-service delivered by application service providers (ASPs) in a flexible and economic way. However, placing edge service in every possible edge site by an ASP is practically infeasible due to the ASP's prohibitive budget requirement. In this paper, we investigate the edge service placement problem of an ASP under a limited budget, where the ASP dynamically rents computing/storage resources in edge sites to host its applications in close proximity to end users. Since the benefit of placing edge service in a specific site is usually unknown to the ASP a priori, optimal placement decisions must be made while learning this benefit. We pose this problem as a novel combinatorial contextual bandit learning problem. It is "combinatorial" because only a limited number of edge sites can be rented to provide the edge service given the ASP's budget. It is "contextual" because we utilize user context information to enable finer-grained learning and decision-making. To solve this problem and optimize the edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm. Furthermore, SEEN is extended to scenarios with overlapping service coverage by incorporating a disjunctively constrained knapsack problem. In both cases, we prove that our algorithm achieves a sublinear regret bound when it is compared with an Oracle algorithm that knows the exact benefit information. Simulations are carried out on a real-world dataset, whose results show that SEEN significantly outperforms benchmark solutions.
- TWC: Minh NH Nguyen, Nguyen H Tran, Mohammad A. Islam, Chuan Pham, Shaolei Ren, and Choong Seon Hong. IEEE Transactions on Wireless Communications, Mar 2018
Keeping wireless base stations operating continually and providing uninterrupted communications services can save billions of dollars as well as human lives during natural disasters and/or electricity outages. Toward this end, wireless operators need to install backup power supplies whose capacity is sufficient to support their peak power demand, thus incurring a significant capital expense. Hence, pooling backup power supplies together and sharing them among co-located wireless operators can effectively reduce the capital expense, as the backup power capacity can be sized based on the aggregate demand of co-located operators instead of individual demand. Turning this vision into reality, however, faces a new challenge: how to fairly share the backup power supply? In this paper, we propose fair sharing of backup power supply by multiple wireless operators based on the Nash bargaining solution (NBS). In addition, we extend our analysis to multiple time slots for emergency cases, studying backup energy sharing based on model predictive control and NBS subject to an energy capacity constraint regarding future service availability. Our simulations demonstrate that sharing backup power/energy improves the communications service quality at lower cost and consumes less base station power than the non-sharing approach.
- CCS: Mohammad A. Islam, Shaolei Ren, and Adam Wierman. CCS, Mar 2017
The power capacity of multi-tenant data centers is typically oversubscribed in order to increase the utilization of expensive power infrastructure. This practice can create dangerous situations and compromise data center availability if the designed power capacity is exceeded. This paper demonstrates that current safeguards are vulnerable to well-timed power attacks launched by malicious tenants (i.e., attackers). Further, we demonstrate that there is a physical side channel — a thermal side channel due to hot air recirculation — that contains information about the benign tenants’ runtime power usage and can enable a malicious tenant to time power attacks effectively. In particular, we design a state-augmented Kalman filter to extract this information from the side channel and guide an attacker to use its maximum power at moments that coincide with the benign tenants’ high power demand, thus overloading the shared power capacity. Our experimental results show that an attacker can capture 54% of all attack opportunities, significantly compromising the data center availability. Finally, we discuss a set of possible defense strategies to safeguard the data center infrastructure against power attacks.
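A minimal 1-D Kalman filter conveys the tracking idea (the paper's filter is state-augmented; the process and measurement noise values here are illustrative):

```python
def kalman_1d(observations, q=0.05, r=1.0):
    """Track a slowly varying signal from noisy side-channel readings:
    q is the assumed process-noise variance, r the measurement-noise variance."""
    x, p = observations[0], 1.0  # initial state estimate and its variance
    estimates = []
    for z in observations:
        p += q                 # predict: uncertainty grows between steps
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # update toward the new measurement
        p *= 1 - k
        estimates.append(x)
    return estimates

# Smooths the noisy trace and tracks the jump with some lag.
print(kalman_1d([1.0, 1.2, 0.9, 5.0, 5.2, 5.1]))
```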
- IC: Shaolei Ren. IEEE Internet Computing (invited), Jun 2017
Scaling up power infrastructures to accommodate growing data center demand is very costly and one of the biggest challenges faced by data centers today. In this paper, we propose to leverage market approaches for maximizing power capacity utilization in multi-tenant data centers, a crucial but under-explored type of data center. Our study transforms the current practice of simply allocating power capacity in a fixed manner into a dynamic, scalable, and coordinated market-based paradigm. To illustrate our design, we consider power oversubscription and study how to gracefully handle occasional power emergencies in oversubscribed multi-tenant data centers. Concretely, we present a market approach, called COOrdinated Power management solution (COOP), which extracts tenants' power reduction capabilities at runtime and judiciously coordinates tenants' power reduction at a minimum performance cost.
- CSUR: Cuong T. Do, Nguyen H. Tran, Choongseon Hong, Charles A. Kamhoua, Kevin A. Kwiat, Erik Blasch, Shaolei Ren, Niki Pissinou, and Sundaraja Sitharama Iyengar. ACM Computing Surveys, May 2017
In this survey, we review the existing game-theoretic approaches for cyber security and privacy issues, categorizing their application into two classes, security and privacy. To show how game theory is utilized in cyberspace security and privacy, we select research regarding three main applications: cyber-physical security, communication security, and privacy. We present game models, features, and solutions of the selected works and describe their advantages and limitations from design to implementation of the defense mechanisms. We also identify some emerging trends and topics for future research. This survey not only demonstrates how to employ game-theoretic approaches to security and privacy but also encourages researchers to employ game theory to establish a comprehensive understanding of emerging security and privacy problems in cyberspace and potential solutions.
- HPCA: Mohammad A. Islam, Xiaoqi Ren, Shaolei Ren, Adam Wierman, and Xiaorui Wang. HPCA, 2016
Power oversubscription in data centers may occasionally trigger an emergency when the aggregate power demand exceeds the capacity. Handling such an emergency requires a graceful power capping solution that minimizes the performance loss. In this paper, we study power capping in a multi-tenant data center where the operator supplies power to multiple tenants that manage their own servers. Unlike owner-operated data centers, the operator lacks control over tenants’ servers. To address this challenge, we propose a novel market mechanism based on supply function bidding, called COOP, to financially incentivize and coordinate tenants’ power reduction for minimizing total performance loss (quantified in performance cost) while satisfying multiple power capping constraints. We build a prototype to show that COOP is efficient in terms of minimizing the total performance cost, even compared to the ideal but infeasible case that assumes the operator has full control over tenants’ servers. We also demonstrate that COOP is "win-win", increasing the operator’s profit (through oversubscription) and reducing tenants’ cost (through financial compensation for their power reduction during emergencies).
- e-Energy: Qihang Sun, Shaolei Ren, Chuan Wu, and Zongpeng Li. e-Energy, 2016
Deferring batch workload in data centers is promising for demand response to enhance the efficiency and reliability of a power grid. Yet operators of multi-tenant colocation data centers still resort to eco-unfriendly diesel generators for demand response, because tenants lack incentives to defer their workloads. This work proposes an online auction mechanism for emergency demand response (EDR) in geo-distributed colocation data centers, which incentivizes tenants to delay and shuffle their workload across multiple data centers by providing monetary rewards. The mechanism, called BatchEDR, decides the tenants' workload deferment/reduction and diesel usage in each data center upon receiving an EDR signal, for cost minimization throughout the entire EDR event, considering that only a limited amount of batch workloads can be deferred throughout EDR as well as across multiple data centers. Without future information, BatchEDR achieves a good competitive ratio compared to an omniscient offline optimal algorithm, while ensuring truthfulness and individual rationality over the auction process. Trace-driven experiments show that BatchEDR outperforms the existing mechanisms and achieves a low social cost.
- JSAC: Qihang Sun, Chuan Wu, Zongpeng Li, and Shaolei Ren. IEEE Journal on Selected Areas in Communications, Dec 2016
2015 & before
- HPCAMohammad A. Islam, Hasan Mahmud, Shaolei Ren, and Xiaorui WangHPCA, 2015
- INFOCOMLingquan Zhang, Shaolei Ren, Chuan Wu, and Zongpeng LiINFOCOM, 2015
Data centers are key participants in demand response programs, including emergency demand response (EDR), where the grid coordinates large electricity consumers for demand reduction in emergency situations to prevent major economic losses. While the existing literature concentrates on owner-operated data centers, this work studies EDR in multi-tenant colocation data centers where servers are owned and managed by individual tenants. EDR in colocation data centers is significantly more challenging due to the lack of incentives for tenants, who control their own servers and are typically on fixed power contracts with the colocation operator, to reduce energy consumption. Consequently, to achieve the demand reduction goals set by the EDR program, the operator has to rely on expensive and/or environmentally unfriendly on-site energy backup/generation. To reduce cost and environmental impact, an efficient incentive mechanism is therefore needed to motivate tenants’ voluntary energy reduction during EDR events. This work proposes a novel incentive mechanism, Truth-DR, which leverages a reverse auction to provide monetary remuneration to tenants according to their agreed energy reduction. Truth-DR is computationally efficient, truthful, and achieves a 2-approximation of the colocation-wide social cost. Trace-driven simulations verify the efficacy of the proposed auction mechanism.
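As a flavor of how a truthful reverse auction can work, here is a sketch using the standard critical-payment construction: select the cheapest offers until the reduction target is covered, and pay winners the unit price of the first losing offer, so no tenant gains by misreporting its cost. This is a textbook-style illustration under simplifying assumptions (divisible offers, uniform payment), not Truth-DR’s exact allocation or payment rule.

```python
# Sketch of a critical-payment reverse auction for energy reduction
# (standard construction for illustration; Truth-DR's payment rule and
# 2-approximation analysis are in the paper).

def reverse_auction(offers, target_kw):
    """offers: (tenant, kw, unit_cost). Winners are the cheapest offers
    needed to reach target_kw; each winner is paid, per kW, the unit
    cost of the first losing offer (its 'critical' price)."""
    ranked = sorted(offers, key=lambda o: o[2])
    winners, covered = [], 0.0
    for tenant, kw, cost in ranked:
        if covered >= target_kw:
            break
        winners.append((tenant, min(kw, target_kw - covered)))
        covered += kw
    # critical price = first rejected offer, else the highest accepted one
    crit = ranked[len(winners)][2] if len(winners) < len(ranked) else ranked[-1][2]
    payments = {t: kw * crit for t, kw in winners}
    return winners, payments

offers = [("A", 60, 0.10), ("B", 50, 0.20), ("C", 70, 0.15)]
print(reverse_auction(offers, target_kw=100))
```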
- SIGIRJeong-Min Yun, Yuxiong He, Sameh Elnikety, and Shaolei RenSIGIR, 2015
A web search engine often employs a partition-aggregate architecture, where an aggregator propagates a user query to all index serving nodes (ISNs) and collects their responses. An aggregation policy determines how long the aggregator waits for the ISNs before returning aggregated results to users, crucially affecting both query latency and quality. Designing an aggregation policy is, however, challenging: response latency varies significantly among queries and among ISNs, and aggregators lack knowledge of when ISNs will respond. In this paper, we propose aggregation policies that minimize the tail latency of search queries subject to search quality service level agreements (SLAs), combining data-driven offline analysis with online processing. Beginning with a single aggregator, we formally prove the optimality of our policy: it achieves the offline optimal result without knowing future responses of ISNs. We extend our policy to the commonly used hierarchy of aggregators and prove its optimality when messaging times between aggregators are known. We also present an empirically effective policy to address unknown messaging times. We use production traces from a commercial search engine, a commercial advertisement engine, and synthetic workloads to evaluate the aggregation policy. The results show that, compared to prior work, the policy reduces tail latency by up to 40% while satisfying the same quality SLAs.
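The offline, data-driven half of such a policy can be approximated in a few lines: from historical latency traces, pick the smallest wait deadline whose expected fraction of collected ISN responses still meets the quality SLA. The paper’s optimal policy additionally adapts online per query; the synthetic lognormal latencies below are purely an assumption for illustration.

```python
# Minimal sketch of a data-driven aggregation deadline: choose the smallest
# wait time such that, on historical traces, the average fraction of ISN
# responses collected in time meets the quality SLA. (Offline flavor only;
# the paper's optimal policy also adapts online per query.)
import numpy as np

def pick_deadline(latency_samples, quality_sla, grid_ms):
    """latency_samples: array (queries x ISNs) of response latencies in ms."""
    for t in grid_ms:
        frac_responded = (latency_samples <= t).mean()  # avg fraction of ISNs in time
        if frac_responded >= quality_sla:
            return t
    return grid_ms[-1]

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=3.0, sigma=0.6, size=(1000, 40))  # synthetic ISN latencies
print(pick_deadline(samples, quality_sla=0.95, grid_ms=np.arange(5, 200, 5)))
```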
- PerformanceNiangjun Chen, Xiaoqi Ren, Shaolei Ren, and Adam WiermanPerformance Evaluation, Sep 2015
Data centers have emerged as promising resources for demand response, particularly for emergency demand response (EDR), which saves the power grid from incurring blackouts during emergency situations. However, currently, data centers typically participate in EDR by turning on backup (diesel) generators, which is both expensive and environmentally unfriendly. In this paper, we focus on “greening” demand response in multi-tenant data centers, i.e., colocation data centers, by designing a pricing mechanism through which the data center operator can efficiently extract load reductions from tenants during emergency periods for EDR. In particular, we propose a pricing mechanism for both mandatory and voluntary EDR programs, ColoEDR, that is based on parameterized supply function bidding and provides provably near-optimal efficiency guarantees, both when tenants are price-taking and when they are price-anticipating. In addition to analytic results, we extend the literature on supply function mechanism design and evaluate ColoEDR using trace-based simulation studies. These validate the efficiency analysis and conclude that the pricing mechanism is beneficial both to the environment and to the data center operator (by decreasing the need for backup diesel generation), while also aiding tenants (by providing payments for load reductions).
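One way to see why supply function bidding is natural here: a price-taking tenant with a convex discomfort cost chooses its load reduction to maximize payment minus cost, and a quadratic cost s²/(2b) yields exactly a linear supply function s*(p) = b·p. The sketch below verifies this numerically; the quadratic cost is an illustrative assumption, and ColoEDR’s parameterized family and price-anticipating analysis go well beyond it.

```python
# Why a linear supply function arises (toy derivation): a price-taking tenant
# with quadratic discomfort cost s^2 / (2*b) picks its reduction s to maximize
# p*s - s^2/(2*b), giving s*(p) = b*p. (Illustrative; ColoEDR's parameterized
# family and equilibrium analysis are in the paper.)
import numpy as np

def best_reduction(price, b, grid):
    profit = price * grid - grid**2 / (2 * b)  # payment minus discomfort cost
    return grid[np.argmax(profit)]

grid = np.linspace(0, 100, 10001)
for p in [0.5, 1.0, 2.0]:
    print(p, best_reduction(p, b=10.0, grid=grid))  # ~ b*p: 5, 10, 20
```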
- SCShaolei Ren, and Yuxiong HeSC, 2013
Due to the enormous energy consumption and associated environmental concerns, data centers have been increasingly pressured to reduce their long-term net carbon footprint to zero, i.e., to achieve carbon neutrality. In this paper, we propose an online algorithm, called COCA (optimizing for COst minimization and CArbon neutrality), for minimizing data center operational cost while satisfying carbon neutrality without long-term future information. Unlike the existing research, COCA enables distributed server-level resource management: each server autonomously adjusts its processing speed and optimally decides the amount of workload to process. We prove that COCA achieves a close-to-minimum operational cost (incorporating both electricity and delay costs) compared to the optimal algorithm with future information, while bounding the potential violation of carbon neutrality. We also perform trace-based simulation studies to complement the analysis, and the results show that COCA reduces cost by more than 25% (compared to the state of the art) while resulting in a smaller carbon footprint.
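The flavor of this kind of online control can be sketched with a drift-plus-penalty-style rule: maintain a “carbon deficit” queue and, in each slot, pick the speed minimizing the immediate (electricity + delay) cost plus the queue-weighted carbon term, so a growing deficit steers decisions toward carbon neutrality. The cubic power model, the delay proxy, and all constants below are illustrative assumptions rather than the paper’s exact formulation.

```python
# Minimal sketch of Lyapunov-style control for carbon-neutral cost
# minimization: a larger carbon deficit q pushes the speed choice toward
# lower-carbon operation. (Toy model; not COCA's exact formulation.)
import numpy as np

speeds = np.linspace(0.2, 1.0, 9)            # normalized processing speeds

def step(q, load, budget_per_slot):
    power = speeds ** 3                      # cubic dynamic power (common model)
    delay = load / speeds                    # crude delay-cost proxy
    carbon = 0.5 * power                     # carbon scales with energy here
    s = speeds[np.argmin(power + delay + q * carbon)]
    q = max(q + 0.5 * s**3 - budget_per_slot, 0.0)   # update carbon deficit
    return s, q

q = 0.0
for load in [0.3, 0.9, 0.6, 1.0]:
    s, q = step(q, load, budget_per_slot=0.2)
    print(f"speed={s:.2f}, carbon deficit={q:.2f}")
```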
- TNETShaolei Ren, Jaeok Park, and Mihaela SchaarIEEE/ACM Transactions on Networking, Feb 2013
Focusing on a femtocell communications market, we study the entrant network service provider’s (NSP’s) long-term decision: whether to enter the market and which spectrum sharing technology to select to maximize its profit. This long-term decision is closely related to the entrant’s pricing strategy and the users’ aggregate demand, which we model as medium-term and short-term decisions, respectively. We consider two markets, one with no incumbent and the other with one incumbent. For both markets, we show the existence and uniqueness of an equilibrium point in the user subscription dynamics and provide a sufficient condition for the convergence of the dynamics. For the market with no incumbent, we derive upper and lower bounds on the optimal price and market share that maximize the entrant’s revenue, based on which the entrant selects an available technology to maximize its long-term profit. For the market with one incumbent, we model competition between the two NSPs as a noncooperative game, in which the incumbent and the entrant choose their market shares independently, and provide a sufficient condition that guarantees the existence of at least one pure Nash equilibrium. Finally, we formalize the problem of entry and spectrum-sharing scheme selection for the entrant and provide numerical results to complement our analysis.
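The short-term layer here, user subscription dynamics, is essentially a fixed-point iteration: the subscriber share evolves as x_{t+1} = f(x_t; p), and a contraction-type condition guarantees convergence to a unique equilibrium. The affine-plus-clipping map below is an illustrative stand-in for the paper’s model, chosen so the contraction is easy to see.

```python
# Sketch of user subscription dynamics converging to a unique equilibrium:
# utility grows with network effects and shrinks with price. The specific f
# below is an illustrative contraction, not the paper's model.
def f(x, price, network_effect=0.6, base_value=0.5):
    utility = base_value + network_effect * x - price
    return min(max(utility, 0.0), 1.0)      # share of users who subscribe

x = 0.1                                     # initial market share
for t in range(30):
    x = f(x, price=0.3)
print(f"equilibrium share = {x:.4f}")       # fixed point of x = 0.2 + 0.6x -> 0.5
```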
- JSACByung-Gook Kim, Shaolei Ren, Mihaela Schaar, and Jang-Won LeeIEEE Journal on Selected Areas in Communications, Jul 2013
Electric vehicles (EVs) will play an important role in the future smart grid because of their capability to store electrical energy in their batteries during off-peak hours and supply the stored energy to the power grid during peak hours. In this paper, we consider a power system with an aggregator and multiple customers with EVs, and propose novel electricity load scheduling algorithms which, unlike previous works, jointly consider the load scheduling for appliances and the energy trading using EVs. Specifically, we allow customers to determine how much energy to purchase from or sell to the aggregator while taking into consideration the load demands of their residential appliances and the associated electricity bill. We propose two different approaches: a collaborative and a non-collaborative approach. In the collaborative approach, we develop an optimal distributed load scheduling algorithm that maximizes the social welfare of the power system. In the non-collaborative approach, we model the energy scheduling problem as a non-cooperative game among self-interested customers, where each customer determines its own load scheduling and energy trading to maximize its own profit. To resolve the unfairness between heavy and light customers in the non-collaborative approach, we propose a tiered billing scheme that can adjust the electricity rates for customers according to their different energy consumption levels. In both approaches, we also consider the uncertainty in load demands, under which customers’ actual energy consumption may deviate from the scheduled consumption. To study the impact of this uncertainty, we adopt the worst-case-uncertainty approach and develop distributed load scheduling algorithms that provide guaranteed minimum performance in uncertain environments. Subsequently, we show when energy trading leads to an increase in social welfare and determine the customers’ incentives to participate in energy trading in various usage scenarios, including practical environments with uncertain load demands.
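A minimal sketch of the distributed, price-coordinated flavor of such scheduling: the aggregator posts a price, each customer independently best-responds with its own load, and the price is adjusted toward a capacity balance (dual ascent). The quadratic utilities and all constants are assumptions for illustration; the paper’s algorithms additionally handle energy trading, tiered billing, and demand uncertainty.

```python
# Sketch of distributed price coordination for load scheduling: customers
# best-respond to a posted price; the aggregator nudges the price until
# total load matches capacity (dual ascent). Illustrative model only.
import numpy as np

a = np.array([1.0, 1.5, 2.0, 2.5])       # customers' marginal-utility levels

def best_response(p):
    # each customer maximizes a_i*e - e^2/2 - p*e  =>  e_i = max(a_i - p, 0)
    return np.maximum(a - p, 0.0)

p, capacity, step = 0.0, 3.0, 0.2
for _ in range(200):
    e = best_response(p)
    p = max(p + step * (e.sum() - capacity), 0.0)   # raise price if overloaded
print("price:", round(p, 3), "loads:", np.round(best_response(p), 3))
```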
- INFOCOMShaolei Ren, Jaeok Park, and Mihaela SchaarINFOCOM, 2012
In this paper, we consider a user-generated content platform monetized through advertising and managed by an intermediary. To maximize the intermediary’s profit given the rational decision-making of content viewers and heterogeneous content producers, a payment scheme is proposed in which the intermediary can either tax or subsidize the content producers. First, we use a model with a representative content viewer to determine how the content viewers’ attention is allocated across available content by solving a utility maximization problem. Then, by modeling the content producers as self-interested agents making independent production decisions, we show that there exists a unique equilibrium in the content production stage, and propose best-response dynamics to model the decision-making process. Next, we study the intermediary’s optimal payment based on the decisions made by the representative content viewer and the content producers. In particular, by considering the well-known quality-adjusted Dixit-Stiglitz utility function for the representative content viewer, we explicitly derive the optimal payment maximizing the intermediary’s profit and characterize analytical conditions under which the intermediary should tax or subsidize the content producers. Finally, we generalize the analysis by considering heterogeneity in production costs among the content producers.
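The production stage can be illustrated with round-robin best-response dynamics in a toy attention-sharing model: viewer attention splits in proportion to production levels, ad revenue is shared by attention, and each producer best-responds to the others’ total output. This Tullock-style sketch is an assumption for illustration; the paper instead uses a quality-adjusted Dixit-Stiglitz viewer model with the tax/subsidy payment layered on top.

```python
# Sketch of best-response dynamics among self-interested content producers.
# Producer i maximizes R * q_i / (q_i + Q) - c_i * q_i, where Q is the
# others' total production; the best response is sqrt(R*Q/c_i) - Q.
# (Toy contest model for illustration only.)
import numpy as np

R, costs = 10.0, np.array([1.0, 1.5, 2.0])   # ad revenue and per-unit costs

def best_response(q, i):
    Q = q.sum() - q[i]                        # others' total production
    return max(np.sqrt(R * Q / costs[i]) - Q, 0.0)

q = np.ones(3)
for _ in range(100):                          # round-robin best responses
    for i in range(3):
        q[i] = best_response(q, i)
print("equilibrium production:", np.round(q, 3))
```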
- INFOCOMShaolei Ren, Jaeok Park, and Mihaela SchaarINFOCOM, 2011
To understand the complex interactions between different technologies in a communications market, it is of fundamental importance to understand how technologies affect the demand of users and the competition between network service providers (NSPs). To this end, we analyze user subscription dynamics and revenue maximization in monopoly and duopoly communications markets. First, considering a monopoly market with only one NSP, we investigate the impact of technologies on users’ dynamic subscription. We show that, for any price charged by the NSP, there exists a unique equilibrium point of the considered user subscription dynamics, and we provide a sufficient condition under which the dynamics converge to the equilibrium point from any initial point. We then derive upper and lower bounds on the optimal price and market share that maximize the NSP’s revenue. Next, we turn to the analysis of a duopoly market and show that, for any charged prices, the equilibrium point of the considered user subscription dynamics exists and is unique. As in the monopoly market, we derive a sufficient condition on the NSPs’ technologies that ensures the user subscription dynamics reach the equilibrium point. Then, we model the NSP competition as a non-cooperative game, in which the two NSPs choose their market shares independently, and provide a sufficient condition that guarantees the existence of at least one pure Nash equilibrium in the market competition game.
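For the monopoly case, the nested structure (a long-term price decision atop short-term subscription dynamics) suggests a simple numerical sketch: for each candidate price, iterate the dynamics to its fixed point, then pick the price maximizing revenue p · x(p). The adoption map and constants are illustrative assumptions, not the paper’s model.

```python
# Sketch of monopoly revenue maximization over a subscription equilibrium:
# for each price, run x_{t+1} = f(x_t; p) to its fixed point, then maximize
# p * x(p) over a price grid. (Illustrative contraction map only.)
import numpy as np

def equilibrium_share(p, network_effect=0.5, base_value=0.6, iters=200):
    x = 0.5
    for _ in range(iters):
        x = min(max(base_value + network_effect * x - p, 0.0), 1.0)
    return x

prices = np.linspace(0.0, 1.0, 101)
revenue = np.array([p * equilibrium_share(p) for p in prices])
best = prices[np.argmax(revenue)]
print(f"optimal price = {best:.2f}, share = {equilibrium_share(best):.2f}")
```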