1
Czumaj A, Davies-Peck P, Parter M. Component stability in low-space massively parallel computation. DISTRIBUTED COMPUTING 2024; 37:35-64. [PMID: 38370529 PMCID: PMC10873458 DOI: 10.1007/s00446-024-00461-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 01/11/2024] [Indexed: 02/20/2024]
Abstract
In this paper, we study the power and limitations of component-stable algorithms in the low-space model of massively parallel computation (MPC). Recently Ghaffari, Kuhn and Uitto (FOCS 2019) introduced the class of component-stable low-space MPC algorithms, which are, informally, those algorithms for which the outputs reported by the nodes in different connected components are required to be independent. This very natural notion was introduced to capture most (if not all) of the known efficient MPC algorithms to date, and it was the first general class of MPC algorithms for which one can show non-trivial conditional lower bounds. In this paper we enhance the framework of component-stable algorithms and investigate its effect on the complexity of randomized and deterministic low-space MPC. Our key contributions include:
1. We revise and formalize the lifting approach of Ghaffari, Kuhn and Uitto. This requires a very delicate amendment of the notion of component stability, which allows us to fill in gaps in the earlier arguments.
2. We also extend the framework to obtain conditional lower bounds for deterministic algorithms and fine-grained lower bounds that depend on the maximum degree Δ.
3. We demonstrate a collection of natural graph problems for which deterministic component-unstable algorithms break the conditional lower bound obtained for component-stable algorithms. This implies that, in the context of deterministic algorithms, component-stable algorithms are conditionally weaker than the component-unstable ones.
4. We also show that the restriction to component-stable algorithms has an impact in the randomized setting. We present a natural problem which can be solved in O(1) rounds by a component-unstable MPC algorithm, but requires Ω(log log* n) rounds for any component-stable algorithm, conditioned on the connectivity conjecture.
Altogether our results imply that component-stability might limit the computational power of the low-space MPC model, at least in certain contexts, paving the way for improved upper bounds that escape the conditional lower bound setting of Ghaffari, Kuhn, and Uitto.
Affiliation(s)
- Artur Czumaj
- Computer Science and Centre for Discrete Mathematics and its Applications (DIMAP), University of Warwick, Coventry, CV4 7AL UK
- Merav Parter
- Computer Science, Weizmann Institute, Rehovot, 7610001 Israel
2
A repository for the publication and sharing of heterogeneous materials data. Sci Data 2022; 9:787. [PMID: 36575234 PMCID: PMC9794830 DOI: 10.1038/s41597-022-01897-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 12/14/2022] [Indexed: 12/28/2022] Open
Abstract
National Materials Data Management and Service platform (NMDMS) is a materials data repository for the publication and sharing of heterogeneous materials scientific data and follows the FAIR principles: Findable, Accessible, Interoperable, and Reusable. To ensure data are 'Interoperable', NMDMS uses a user-friendly semi-structured scientific data model, named 'dynamic container', to define, exchange, and store heterogeneous scientific data. Then, a personalized yet standardized data submission subsystem, a rigorous project data review and publication subsystem, and a multi-granularity data query and retrieval subsystem collaboratively make data 'Reusable', 'Findable', and 'Accessible'. Finally, China's "National Key R&D Program: Material Genetic Engineering Key Special Project" has adopted NMDMS to publish and share its project data. There are 12,251,040 pieces of data published in NMDMS since 2018, under 87 categories and 1,912 user-defined schemas from 45 projects. The platform has been accessed 908,875 times, and 2,403,208 pieces of data have been downloaded. In short, NMDMS effectively accelerates the publication and sharing of material project data in China.
3
Dall'Alba G, Casa PL, Abreu FPD, Notari DL, de Avila E Silva S. A Survey of Biological Data in a Big Data Perspective. BIG DATA 2022; 10:279-297. [PMID: 35394342 DOI: 10.1089/big.2020.0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
The amount of available data is continuously growing. This phenomenon promotes a new concept, named big data. The highlight technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques present promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information from an atomic level to interactions between organisms or their environment. Such characteristics make most bioinformatic-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. As such, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Affiliation(s)
- Gabriel Dall'Alba
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Genome Science and Technology Program, Faculty of Science, The University of British Columbia, Vancouver, Canada
- Pedro Lenz Casa
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Fernanda Pessi de Abreu
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Daniel Luis Notari
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Scheila de Avila E Silva
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
4
Áika: A Distributed Edge System for AI Inference. BIG DATA AND COGNITIVE COMPUTING 2022. [DOI: 10.3390/bdcc6020068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
Video monitoring and surveillance of commercial fisheries in world oceans has been proposed by the governing bodies of several nations as a response to crimes such as overfishing. Traditional video monitoring systems may not be suitable due to limitations in the offshore fishing environment, including low bandwidth, unstable satellite network connections and issues of preserving the privacy of crew members. In this paper, we present Áika, a robust system for executing distributed Artificial Intelligence (AI) applications on the edge. Áika provides engineers and researchers with several building blocks in the form of Agents, which enable the expression of computation pipelines and distributed applications with robustness and privacy guarantees. Agents are continuously monitored by dedicated monitoring nodes, and provide applications with a distributed checkpointing and replication scheme. Áika is designed for monitoring and surveillance in privacy-sensitive and unstable offshore environments, where flexible access policies at the storage level can provide privacy guarantees for data transfer and access.
5
Nanongkai D, Scquizzato M. Equivalence classes and conditional hardness in massively parallel computations. DISTRIBUTED COMPUTING 2022; 35:165-183. [PMID: 35300185 PMCID: PMC8907129 DOI: 10.1007/s00446-021-00418-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2020] [Accepted: 12/17/2021] [Indexed: 06/14/2023]
Abstract
The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale data processing frameworks, and has been receiving increasingly more attention over the past few years, especially in the context of classical graph problems. So far, the only way to argue lower bounds for this model is to condition on conjectures about the hardness of some specific problems, such as graph connectivity on promise graphs that are either one cycle or two cycles, usually called the one cycle versus two cycles problem. This is unlike the traditional arguments based on conjectures about complexity classes (e.g., P ≠ NP), which are often more robust in the sense that refuting them would lead to groundbreaking algorithms for a whole bunch of problems. In this paper we present connections between problems and classes of problems that allow the latter type of arguments. These connections concern the class of problems solvable in a sublogarithmic amount of rounds in the MPC model, denoted by MPC(o(log N)), and the standard space complexity classes L and NL, and suggest conjectures that are robust in the sense that refuting them would lead to many surprisingly fast new algorithms in the MPC model. We also obtain new conditional lower bounds, and prove new reductions and equivalences between problems in the MPC model. Specifically, our main results are as follows.
- Lower bounds conditioned on the one cycle versus two cycles conjecture can be instead argued under the L ⊈ MPC(o(log N)) conjecture: these two assumptions are equivalent, and refuting either of them would lead to o(log N)-round MPC algorithms for a large number of challenging problems, including list ranking, minimum cut, and planarity testing. In fact, we show that these problems and many others require asymptotically the same number of rounds as the seemingly much easier problem of distinguishing between a graph being one cycle or two cycles.
- Many lower bounds previously argued under the one cycle versus two cycles conjecture can be argued under an even more robust (thus harder to refute) conjecture, namely NL ⊈ MPC(o(log N)). Refuting this conjecture would lead to o(log N)-round MPC algorithms for an even larger set of problems, including all-pairs shortest paths, betweenness centrality, and all aforementioned ones. Lower bounds under this conjecture hold for problems such as perfect matching and network flow.
Affiliation(s)
- Danupon Nanongkai
- University of Copenhagen, Copenhagen, Denmark
- KTH Royal Institute of Technology, Stockholm, Sweden
6
Nagy RC, Balch JK, Bissell EK, Cattau ME, Glenn NF, Halpern BS, Ilangakoon N, Johnson B, Joseph MB, Marconi S, O’Riordan C, Sanovia J, Swetnam TL, Travis WR, Wasser LA, Woolner E, Zarnetske P, Abdulrahim M, Adler J, Barnes G, Bartowitz KJ, Blake RE, Bombaci SP, Brun J, Buchanan JD, Chadwick KD, Chapman MS, Chong SS, Chung YA, Corman JR, Couret J, Crispo E, Doak TG, Donnelly A, Duffy KA, Dunning KH, Duran SM, Edmonds JW, Fairbanks DE, Felton AJ, Florian CR, Gann D, Gebhardt M, Gill NS, Gram WK, Guo JS, Harvey BJ, Hayes KR, Helmus MR, Hensley RT, Hondula KL, Huang T, Hundertmark WJ, Iglesias V, Jacinthe P, Jansen LS, Jarzyna MA, Johnson TM, Jones KD, Jones MA, Just MG, Kaddoura YO, Kagawa‐Vivani AK, Kaushik A, Keller AB, King KBS, Kitzes J, Koontz MJ, Kouba PV, Kwan W, LaMontagne JM, LaRue EA, Li D, Li B, Lin Y, Liptzin D, Long WA, Mahood AL, Malloy SS, Malone SL, McGlinchy JM, Meier CL, Melbourne BA, Mietkiewicz N, Morisette JT, Moustapha M, Muscarella C, Musinsky J, Muthukrishnan R, Naithani K, Neely M, Norman K, Parker SM, Perez Rocha M, Petri L, Ramey CA, Record S, Rossi MW, SanClements M, Scholl VM, Schweiger AK, Seyednasrollah B, Sihi D, Smith KR, Sokol ER, Spaulding SA, Spiers AI, St. Denis LA, Staccone AP, Stack Whitney K, Stanitski DM, Stricker E, Surasinghe TD, Thomsen SK, Vasek PM, Xiaolu L, Yang D, Yu R, Yule KM, Zhu K. Harnessing the NEON data revolution to advance open environmental science with a diverse and data‐capable community. Ecosphere 2021. [DOI: 10.1002/ecs2.3833] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022] Open
Affiliation(s)
- R. Chelsea Nagy
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Jennifer K. Balch
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Department of Geography University of Colorado Boulder Boulder Colorado USA
- Erin K. Bissell
- Biology Department Metropolitan State University of Denver Denver Colorado USA
- Megan E. Cattau
- Human‐Environment Systems Boise State University Boise Idaho USA
- Nancy F. Glenn
- Human‐Environment Systems Boise State University Boise Idaho USA
- University of New South Wales Sydney Sydney New South Wales Australia
- Benjamin S. Halpern
- National Center for Ecological Analysis and Synthesis (NCEAS) Santa Barbara California USA
- University of California Santa Barbara Santa Barbara California USA
- Nayani Ilangakoon
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Brian Johnson
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Maxwell B. Joseph
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Sergio Marconi
- School of Natural Resources & Environment University of Florida Gainesville Florida USA
- James Sanovia
- Department of Math, Science, and Technology Oglala Lakota College Kyle South Dakota USA
- William R. Travis
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Department of Geography University of Colorado Boulder Boulder Colorado USA
- Leah A. Wasser
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Department of Geography University of Colorado Boulder Boulder Colorado USA
- Elizabeth Woolner
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Phoebe Zarnetske
- Department of Integrative Biology Michigan State University East Lansing Michigan USA
- Mujahid Abdulrahim
- Department of Civil and Mechanical Engineering University of Missouri Kansas City Kansas City Missouri USA
- John Adler
- Department of Geography University of Colorado Boulder Boulder Colorado USA
- CIRES University of Colorado Boulder Boulder Colorado USA
- Grenville Barnes
- Department of Forest, Fisheries and Geomatics Sciences University of Florida Gainesville Florida USA
- Kristina J. Bartowitz
- Department of Forest, Rangeland, and Fire Sciences University of Idaho Moscow Idaho USA
- Rachael E. Blake
- National Socio‐Environmental Synthesis Center University of Maryland Annapolis Maryland USA
- Sara P. Bombaci
- Department of Fish, Wildlife, and Conservation Biology Colorado State University Fort Collins Colorado USA
- Julien Brun
- National Center for Ecological Analysis and Synthesis (NCEAS) Santa Barbara California USA
- University of California Santa Barbara Santa Barbara California USA
- Jacob D. Buchanan
- Department of Biological Sciences Bowling Green State University Bowling Green Ohio USA
- K. Dana Chadwick
- Department of Geological Sciences University of Texas Austin Austin Texas USA
- Department of Integrative Biology University of Texas Austin Austin Texas USA
- Melissa S. Chapman
- Department of Environmental Science, Policy, and Management University of California Berkeley Berkeley California USA
- Steven S. Chong
- National Center for Ecological Analysis and Synthesis (NCEAS) Santa Barbara California USA
- University of California Santa Barbara Santa Barbara California USA
- University of California Berkeley Library University of California Berkeley Berkeley California USA
- Y. Anny Chung
- Departments of Plant Biology and Plant Pathology University of Georgia Athens Georgia USA
- Jessica R. Corman
- School of Natural Resources University of Nebraska Lincoln Lincoln Nebraska USA
- Jannelle Couret
- Department of Biological Sciences University of Rhode Island Kingston Rhode Island USA
- Erika Crispo
- Department of Biology Pace University New York City New York USA
- Thomas G. Doak
- Department of Biology Indiana University Bloomington Indiana USA
- Alison Donnelly
- Department of Geography University of Wisconsin‐Milwaukee Milwaukee Wisconsin USA
- Katharyn A. Duffy
- School of Informatics, Computing & Cyber Systems Northern Arizona University Flagstaff Arizona USA
- Kelly H. Dunning
- School of Forestry and Wildlife Auburn University Auburn Alabama USA
- Sandra M. Duran
- Department of Ecology and Evolutionary Biology University of Arizona Tucson Arizona USA
- Jennifer W. Edmonds
- Department of Physical and Life Sciences Nevada State College Henderson Nevada USA
- Dawson E. Fairbanks
- Department of Environmental Science University of Arizona Tucson Arizona USA
- Andrew J. Felton
- Department of Wildland Resources Utah State University Logan Utah USA
- Daniel Gann
- Department of Biological Sciences Florida International University Miami Florida USA
- Martha Gebhardt
- School of Natural Resources and the Environment University of Arizona Tucson Arizona USA
- Nathan S. Gill
- Department of Natural Resources Management Texas Tech University Lubbock Texas USA
- Wendy K. Gram
- University Corporation for Atmospheric Research Boulder Colorado USA
- Jessica S. Guo
- College of Agriculture and Life Sciences University of Arizona Tucson Arizona USA
- Brian J. Harvey
- School of Environmental and Forest Sciences University of Washington Seattle Washington USA
- Katherine R. Hayes
- Department of Integrative and Systems Biology University of Colorado Denver Denver Colorado USA
- Matthew R. Helmus
- Department of Biology Temple University Philadelphia Pennsylvania USA
- Robert T. Hensley
- Battelle National Ecological Observatory Network Boulder Colorado USA
- Kelly L. Hondula
- National Socio‐Environmental Synthesis Center University of Maryland Annapolis Maryland USA
- Tao Huang
- Human‐Environment Systems Boise State University Boise Idaho USA
- Cary Institute of Ecosystem Services Millbrook New York USA
- Virginia Iglesias
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Pierre‐Andre Jacinthe
- Department of Earth Sciences Indiana University Purdue University Indianapolis Indiana USA
- Lara S. Jansen
- Department of Environmental Science & Management Portland State University Portland Oregon USA
- Marta A. Jarzyna
- Department of Evolution, Ecology, and Organismal Biology The Ohio State University Columbus Ohio USA
- Translational Data Analytics Institute The Ohio State University Columbus Ohio USA
- Youssef O. Kaddoura
- Department of Forest, Fisheries and Geomatics Sciences University of Florida Gainesville Florida USA
- Aleya Kaushik
- National Oceanic and Atmospheric Administration Boulder Colorado USA
- Adrienne B. Keller
- Department of Ecology, Evolution, and Behavior University of Minnesota Twin Cities St. Paul Minnesota USA
- Katelyn B. S. King
- Department of Fisheries and Wildlife Michigan State University East Lansing Michigan USA
- Justin Kitzes
- Department of Biological Sciences University of Pittsburgh Pittsburgh Pennsylvania USA
- Michael J. Koontz
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Paige V. Kouba
- Department of Plant Sciences University of California Davis Davis California USA
- Wai‐Yin Kwan
- CALeDNA University of California Los Angeles Los Angeles California USA
- Elizabeth A. LaRue
- Department of Forestry and Natural Resources Purdue University West Lafayette Indiana USA
- Daijiang Li
- Department of Biological Sciences Louisiana State University Baton Rouge Louisiana USA
- Center for Computation & Technology Louisiana State University Baton Rouge Louisiana USA
- Bonan Li
- Department of Biological & Ecological Engineering Oregon State University Corvallis Oregon USA
- Yang Lin
- Soil and Water Sciences Department University of Florida Gainesville Florida USA
- William Alex Long
- Science and Technology Innovation Program Woodrow Wilson International Center for Scholars Washington D.C. USA
- Adam L. Mahood
- Department of Geography University of Colorado Boulder Boulder Colorado USA
- Samuel S. Malloy
- Battelle Center for Science, Engineering and Public Policy in the John Glenn College of Public Affairs Ohio State University Columbus Ohio USA
- Sparkle L. Malone
- Department of Biological Sciences Florida International University Miami Florida USA
- Courtney L. Meier
- Battelle National Ecological Observatory Network Boulder Colorado USA
- Brett A. Melbourne
- Department of Ecology and Evolutionary Biology University of Colorado Boulder Boulder Colorado USA
- Jeffery T. Morisette
- U.S. Department of Agriculture Forest Service Rocky Mountain Research Station Fort Collins Colorado USA
- Moussa Moustapha
- Department of Biological Science University of Ngaoundere Ngaoundere Adamawa Cameroon
- Chance Muscarella
- Department of Environmental Science University of Arizona Tucson Arizona USA
- John Musinsky
- Battelle National Ecological Observatory Network Boulder Colorado USA
- Kusum Naithani
- Department of Biological Sciences University of Arkansas‐Fayetteville Fayetteville Arkansas USA
- Merrie Neely
- GEO AquaWatch Clearwater Florida USA
- Global Science and Technology, Inc Greenbelt Maryland USA
- Kari Norman
- Department of Environmental Science, Policy, and Management University of California Berkeley Berkeley California USA
- Laís Petri
- School for Environment and Sustainability University of Michigan East Lansing Michigan USA
- Colette A. Ramey
- Biology Department Metropolitan State University of Denver Denver Colorado USA
- Sydne Record
- Department of Biology Bryn Mawr College Bryn Mawr Pennsylvania USA
- Matthew W. Rossi
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Victoria M. Scholl
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Department of Geography University of Colorado Boulder Boulder Colorado USA
- Anna K. Schweiger
- Remote Sensing Laboratories Department of Geography University of Zurich Zurich Switzerland
- Bijan Seyednasrollah
- School of Informatics, Computing & Cyber Systems Northern Arizona University Flagstaff Arizona USA
- Debjani Sihi
- Department of Environmental Sciences Emory University Atlanta Georgia USA
- Kathleen R. Smith
- Biology Department Metropolitan State University of Denver Denver Colorado USA
- Eric R. Sokol
- Battelle National Ecological Observatory Network Boulder Colorado USA
- INSTAAR University of Colorado Boulder Boulder Colorado USA
- Anna I. Spiers
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Department of Ecology and Evolutionary Biology University of Colorado Boulder Boulder Colorado USA
- Lise A. St. Denis
- Earth Lab, CIRES University of Colorado Boulder Boulder Colorado USA
- Anika P. Staccone
- Department of Ecology, Evolution, & Environmental Biology Columbia University New York New York USA
- Kaitlin Stack Whitney
- Department of Science, Technology, and Society Rochester Institute of Technology Henrietta New York USA
- Eva Stricker
- Department of Biology University of New Mexico Albuquerque New Mexico USA
- Thilina D. Surasinghe
- Department of Biological Sciences Bridgewater State University Bridgewater Massachusetts USA
- Sarah K. Thomsen
- Department of Integrative Biology Oregon State University Corvallis Oregon USA
- Patrisse M. Vasek
- Department of Math, Science, and Technology Oglala Lakota College Kyle South Dakota USA
- Li Xiaolu
- Department of Earth and Atmospheric Sciences Cornell University Ithaca New York USA
- Di Yang
- Wyoming GIS Center University of Wyoming Laramie Wyoming USA
- Rong Yu
- Department of Geography University of Wisconsin‐Milwaukee Milwaukee Wisconsin USA
- Kelsey M. Yule
- Biodiversity Knowledge Integration Center Arizona State University Tempe Arizona USA
- Kai Zhu
- Department of Environmental Studies University of California, Santa Cruz Santa Cruz California USA
7
Xu R, Li W, Li K, Zhou X, Qi H. Scheduling Mix-Coflows in Datacenter Networks. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2021. [DOI: 10.1109/tnsm.2020.3027498] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
8
Feng H, Deng Y, Qin X, Min G. Criso: An Incremental Scalable and Cost-Effective Network Architecture for Data Centers. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2021. [DOI: 10.1109/tnsm.2020.3036875] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
9
Development of an Innovative ICT Infrastructure for an Eco-Cost System with Life Cycle Assessment. SUSTAINABILITY 2021. [DOI: 10.3390/su13063118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
A novel Internet-based information communication technology (ICT) infrastructure for an eco-accounting system was successfully developed to deliver “EcoCosts”, which are the values of environmental impact throughout the product life cycle. The ICT infrastructure manages its internal elements and interacts with operation modules in the supply chain via Web-based service interfaces. The infrastructure consists of upperware, middleware, and resource layers. The upperware layer manipulates the middleware elements (cloud-based load balancing, life cycle assessment, Web-based services, and Radio Frequency Identification (RFID)-enabled mobile access), and manages the associated resources within the eco-accounting system. As novel features of the ICT infrastructure, load balancing is used to handle large volumes of data and to allocate the computing load across the eco-accounting network nodes, and life cycle assessment is conducted to analyse product footprints, which are the core of “EcoCost”, to help consumers compare the environmental impacts of different products. A case study was conducted by transmitting product EcoCosts from businesses to consumers through the Internet, successfully verifying the system developed in this research. Because this research focuses on the ICT aspects, the EcoCost is represented using a single value, hence simplifying the related calculation. This research provides a novel solution for dealing with the large volumes of data and computing loads required to manage EcoCost data throughout the product life cycle and to transmit EcoCosts from businesses to consumers.
10
Zhang J, Jiang Y, Liu Y. Variable Expanding Structure for Data Center Interconnection Networks. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2021. [DOI: 10.20965/jaciii.2021.p0013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Data centers are fundamental facilities that support high-performance computing and large-scale data processing. To guarantee that a data center can provide excellent properties of expanding and routing, the interconnection network of a data center should be designed elaborately. Herein, we propose a novel structure for the interconnection network of data centers that can be expanded with a variable coefficient, also known as a variable expanding structure (VES). A VES is designed in a hierarchical manner and built iteratively. A VES can include hundreds of thousands and millions of servers with only a few layers. Meanwhile, a VES has an extremely short diameter, which implies better performance on routing between every pair of servers. Furthermore, we design an address space for the servers and switches in a VES. In addition, we propose a construction algorithm and routing algorithm associated with the address space. The results and analysis of simulations verify that the expanding rate of a VES depends on three factors: n, m, and k, where n is the number of ports on a switch, m is the expanding speed, and k is the number of layers. However, the factor m yields the optimal effect. Hence, a VES can be designed with factor m to achieve the expected expanding rate and server scale based on the initial planning objectives.
11
Advances in MapReduce Big Data Processing: Platform, Tools, and Algorithms. STUDIES IN BIG DATA 2021. [DOI: 10.1007/978-981-33-6400-4_6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
12
AlJame M, Ahmad I. DNA short read alignment on apache spark. APPLIED COMPUTING AND INFORMATICS 2020. [DOI: 10.1016/j.aci.2019.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that involves data parallelism and fault tolerance. In this article, we proposed a Spark-based algorithm, called Spark-DNAligning, to accelerate DNA short read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations such as broadcast variables, join after partitioning, caching, and in-memory computations. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All the experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools by providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.
|
13
|
Zhang Z, Deng Y, Min G, Xie J, Yang LT, Zhou Y. HSDC: A Highly Scalable Data Center Network Architecture for Greater Incremental Scalability. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2019; 30:1105-1119. [DOI: 10.1109/tpds.2018.2874659] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 08/04/2023]
|
14
|
An efficient cost-based algorithm for scheduling workflow tasks in cloud computing systems. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3610-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/28/2022]
|
15
|
Sebei H, Hadj Taieb MA, Ben Aouicha M. Review of social media analytics process and Big Data pipeline. SOCIAL NETWORK ANALYSIS AND MINING 2018. [DOI: 10.1007/s13278-018-0507-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 11/25/2022]
|
16
|
Zhang Z, Deng Y, Min G, Xie J, Huang S. ExCCC-DCN: A Highly Scalable, Cost-Effective and Energy-Efficient Data Center Structure. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2017; 28:1046-1060. [DOI: 10.1109/tpds.2016.2609428] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 08/04/2023]
|
17
|
|
18
|
Ghaleb AM, Khalifa T, Ayoubi S, Shaban KB, Assi C. Surviving Multiple Failures in Multicast Virtual Networks With Virtual Machines Migration. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2016. [DOI: 10.1109/tnsm.2016.2616283] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
|
19
|
Huang M, Wu D, Yu CH, Fang Z, Interlandi M, Condie T, Cong J. Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale. PROCEEDINGS OF THE ... ACM SYMPOSIUM ON CLOUD COMPUTING [ELECTRONIC RESOURCE] : SOCC ... ... SOCC (CONFERENCE) 2016; 2016:456-469. [PMID: 28317049 DOI: 10.1145/2987550.2987569] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 10/21/2022]
Abstract
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems (like Apache Spark and Hadoop) to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster.
Affiliation(s)
- Muhuan Huang: University of California Los Angeles; Falcon Computing Solutions, Inc
- Di Wu: University of California Los Angeles; Falcon Computing Solutions, Inc
|
20
|
|
21
|
Affiliation(s)
- Anthony Lee: Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Nick Whiteley: School of Mathematics, University of Bristol, Bristol BS8 1TH, UK
|
22
|
Mohammed EA, Far BH, Naugler C. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends. BioData Min 2014; 7:22. [PMID: 25383096 PMCID: PMC4224309 DOI: 10.1186/1756-0381-7-22] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Received: 06/05/2014] [Accepted: 10/18/2014] [Indexed: 12/23/2022] Open
Abstract
The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools.
This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.
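The two tasks named in the abstract above, Map and Reduce, can be illustrated with a minimal single-process sketch. This is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration only.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: turn one input record into a list of (key, 1) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values, 0)


def run_job(records):
    """Run map, shuffle, and reduce sequentially over an in-memory list."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(run_job(["fever cough", "Cough headache"]))
```

In a distributed deployment the map and reduce calls would run on different nodes and the grouping step would move data across the network; the sketch only shows the data flow between the phases.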
Affiliation(s)
- Emad A Mohammed: Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada
- Behrouz H Far: Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada
- Christopher Naugler: Department of Pathology and Laboratory Medicine, University of Calgary and Calgary Laboratory Services, Calgary, AB, Canada
|
23
|
|
24
|
Zhang G, Li C, Zhang Y, Xing C. A Semantic++ MapReduce Parallel Programming Model. INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING 2014. [DOI: 10.1142/s1793351x14400091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Big data is playing a more and more important role in every area such as medical health, internet finance, culture and education etc. How to process these big data efficiently is a huge challenge. MapReduce is a good parallel programming language to process big data. However, it has lots of shortcomings. For example, it cannot process complex computing. It cannot suit real-time computing. In order to overcome these shortcomings of MapReduce and its variants, in this paper, we propose a Semantic++ MapReduce parallel programming model. This study includes the following parts. (1) Semantic++ MapReduce parallel programming model. It includes physical framework of semantic++ MapReduce parallel programming model and logic framework of semantic++ MapReduce parallel programming model; (2) Semantic++ extraction and management method for big data; (3) Semantic++ MapReduce parallel programming computing framework. It includes semantic++ map, semantic++ reduce and semantic++ shuffle; (4) Semantic++ MapReduce for multi-data centers. It includes basic framework of semantic++ MapReduce for multi-data centers and semantic++ MapReduce application framework for multi-data centers; (5) A Case Study of semantic++ MapReduce across multi-data centers.
Affiliation(s)
- Guigang Zhang: Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, P. R. China; Research Institute of Information Technology, Tsinghua University, Beijing 100190, P. R. China
- Chao Li: Research Institute of Information Technology, Tsinghua University, Beijing 100190, P. R. China
- Yong Zhang: Research Institute of Information Technology, Tsinghua University, Beijing 100190, P. R. China
- Chunxiao Xing: Research Institute of Information Technology, Tsinghua University, Beijing 100190, P. R. China
|
25
|
Philip Chen C, Zhang CY. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Inf Sci (N Y) 2014. [DOI: 10.1016/j.ins.2014.01.015] [Citation(s) in RCA: 1722] [Impact Index Per Article: 172.2] [Indexed: 01/09/2023]
|
26
|
Freeman J, Vladimirov N, Kawashima T, Mu Y, Sofroniew NJ, Bennett DV, Rosen J, Yang CT, Looger LL, Ahrens MB. Mapping brain activity at scale with cluster computing. Nat Methods 2014; 11:941-50. [DOI: 10.1038/nmeth.3041] [Citation(s) in RCA: 205] [Impact Index Per Article: 20.5] [Received: 03/21/2014] [Accepted: 06/23/2014] [Indexed: 12/18/2022]
|
27
|
Risk intelligence: making profit from uncertainty in data processing system. ScientificWorldJournal 2014; 2014:398235. [PMID: 24883392 PMCID: PMC4030500 DOI: 10.1155/2014/398235] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/27/2014] [Accepted: 03/19/2014] [Indexed: 11/17/2022] Open
Abstract
In extreme scale data processing systems, fault tolerance is an essential and indispensable part. Proactive fault tolerance schemes (such as the speculative execution in the MapReduce framework) were introduced to dramatically improve the response time of job executions when failure becomes a norm rather than an exception. Efficient proactive fault tolerance schemes require precise knowledge of the task executions, which has been an open challenge for decades. To address the issue, in this paper we design and implement RiskI, a profile-based prediction algorithm in conjunction with a risk-aware task assignment algorithm, to accelerate task executions, taking the uncertain nature of tasks into account. Our design demonstrates that this inherent uncertainty brings not only great challenges, but also new opportunities. With a careful design, we can benefit from such uncertainties. We implement the idea in Hadoop 0.21.0 systems and the experimental results show that, compared with the traditional LATE algorithm, the response time can be improved by 46% with the same system throughput.
|
28
|
Liang F, Feng C, Lu X, Xu Z. Performance Benefits of DataMPI: A Case Study with BigDataBench. BIG DATA BENCHMARKS, PERFORMANCE OPTIMIZATION, AND EMERGING HARDWARE 2014. [DOI: 10.1007/978-3-319-13021-7_9] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/06/2023]
|
29
|
Ravindra P, Anyanwu K. Nesting Strategies for Enabling Nimble MapReduce Dataflows for Large RDF Data. INT J SEMANT WEB INF 2014. [DOI: 10.4018/ijswis.2014010101] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Graph and semi-structured data are usually modeled in relational processing frameworks as “thin” relations (node, edge, node) and processing such data involves a lot of join operations. Intermediate results of joins with multi-valued attributes or relationships, contain redundant subtuples due to repetition of single-valued attributes. The amount of redundant content is high for real-world multi-valued relationships in social network (millions of Twitter followers of popular celebrities) or biological (multiple references to related proteins) datasets. In MapReduce-based platforms such as Apache Hive and Pig, redundancy in intermediate results contributes avoidable costs to the overall I/O, sorting, and network transfer overhead of join-intensive workloads due to longer workflows. Consequently, providing techniques for dealing with such redundancy will enable more nimble execution of such workflows. This paper argues for the use of a nested data model for representing intermediate data concisely using nesting-aware dataflow operators that allow for lazy and partial unnesting strategies. This approach reduces the overall I/O and network footprint of a workflow by concisely representing intermediate results during most of a workflow's execution, until complete unnesting is absolutely necessary. The proposed strategies are integrated into Apache Pig and experimental evaluation over real-world and synthetic benchmark datasets confirms their superiority over relational-style MapReduce systems such as Apache Pig and Hive.
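The redundancy argument in the abstract above can be made concrete with a small sketch that contrasts a flat join result with a nested one and unnests only on demand. This is plain Python rather than Apache Pig or Hive code; the function names and the toy follower data are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def flat_join(users, follows):
    """Relational-style join: one tuple per (user, follower) pair,
    so the single-valued attribute `name` is repeated for every follower."""
    return [(uid, name, f) for uid, name in users for u, f in follows if u == uid]


def nested_join(users, follows):
    """Nested result: one tuple per user, with followers kept as a list,
    so `name` is stored once regardless of the number of followers."""
    by_user = defaultdict(list)
    for u, f in follows:
        by_user[u].append(f)
    return [(uid, name, by_user[uid]) for uid, name in users if uid in by_user]


def unnest(nested):
    """Late, complete unnesting: expand to flat tuples only when required."""
    return [(uid, name, f) for uid, name, followers in nested for f in followers]


if __name__ == "__main__":
    users = [(1, "celebrity"), (2, "lurker")]
    follows = [(1, "ann"), (1, "bob"), (1, "cy")]
    print(flat_join(users, follows))
    print(nested_join(users, follows))
```

With three followers the flat form repeats `"celebrity"` three times while the nested form stores it once; this gap grows with the fan-out of the multi-valued relationship, which is the overhead the abstract refers to.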
Affiliation(s)
- Padmashree Ravindra: Department of Computer Science, North Carolina State University, Raleigh, NC, USA
- Kemafor Anyanwu: Department of Computer Science, North Carolina State University, Raleigh, NC, USA
|
30
|
Ding L, Wang G, Xin J, Wang X, Huang S, Zhang R. ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms. DATA KNOWL ENG 2013. [DOI: 10.1016/j.datak.2013.04.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/26/2022]
|
31
|
Miklošík A, Hvizdová E. Knowledge base cloud - a new approach to knowledge management systems architecture. ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS 2013. [DOI: 10.11118/actaun201260040267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022] Open
|
32
|
Qiu J, Ekanayake J, Gunarathne T, Choi JY, Bae SH, Ruan Y, Ekanayake S, Wu S, Beason S, Fox G, Rho M, Tang H. Data Intensive Computing for Bioinformatics. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
Data intensive computing, cloud computing, and multicore computing are converging as frontiers to address massive data problems with hybrid programming models and/or runtimes including MapReduce, MPI, and parallel threading on multicore platforms. A major challenge is to utilize these technologies and large-scale computing resources effectively to advance fundamental science discoveries such as those in Life Sciences. The recently developed next-generation sequencers have enabled large-scale genome sequencing in areas such as environmental sample sequencing leading to metagenomic studies of collections of genes. Metagenomic research is just one of the areas that present a significant computational challenge because of the amount and complexity of data to be processed. This chapter discusses the use of innovative data-mining algorithms and new programming models for several Life Sciences applications. The authors particularly focus on methods that are applicable to large data sets coming from high throughput devices of steadily increasing power. They show results for both clustering and dimension reduction algorithms, and the use of MapReduce on modest size problems. They identify two key areas where further research is essential, and propose to develop new O(N log N) complexity algorithms suitable for the analysis of millions of sequences. They suggest Iterative MapReduce as a promising programming model combining the best features of MapReduce with those of high performance environments such as MPI.
Affiliation(s)
- Judy Qiu: Indiana University - Bloomington, USA
- Yang Ruan: Indiana University - Bloomington, USA
- Mina Rho: Indiana University - Bloomington, USA
|
33
|
Xie J, Tian Y, Yin S, Zhang J, Ruan X, Qin X. Adaptive Preshuffling in Hadoop Clusters. ACTA ACUST UNITED AC 2013. [DOI: 10.1016/j.procs.2013.05.422] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
|
34
|
ETLMR: A Highly Scalable Dimensional ETL Framework Based on MapReduce. LECTURE NOTES IN COMPUTER SCIENCE 2013. [DOI: 10.1007/978-3-642-37574-3_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
|
35
|
Abstract
We present Cloud Haskell, a domain-specific language for developing programs for a distributed computing environment. Implemented as a shallow embedding in Haskell, it provides a message-passing communication model, inspired by Erlang, without introducing incompatibility with Haskell's established shared-memory concurrency. A key contribution is a method for serializing function closures for transmission across the network. Cloud Haskell has been implemented; we present example code and some preliminary performance measurements.
Affiliation(s)
- Jeff Epstein: University of Cambridge, Cambridge, United Kingdom
- Andrew P. Black: Portland State University & Microsoft Research, Portland, OR, USA
|
36
|
Aksanli B, Venkatesh J, Zhang L, Rosing T. Utilizing green energy prediction to schedule mixed batch and service jobs in data centers. ACTA ACUST UNITED AC 2012. [DOI: 10.1145/2094091.2094105] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 10/14/2022]
Abstract
As brown energy costs grow, renewable energy becomes more widely used. Previous work focused on using immediately available green energy to supplement the non-renewable, or brown energy at the cost of canceling and rescheduling jobs whenever the green energy availability is too low [16]. In this paper we design an adaptive data center job scheduler which utilizes short term prediction of solar and wind energy production. This enables us to scale the number of jobs to the expected energy availability, thus reducing the number of cancelled jobs by 4x and improving green energy usage efficiency by 3x over just utilizing the immediately available green energy.
Affiliation(s)
- Liuyi Zhang: University of California, San Diego, La Jolla, CA
|
37
|
|
38
|
Abstract
As our society becomes more information-driven, we have begun to amass data at an astounding and accelerating rate. At the same time, power concerns have made it difficult to bring the necessary processing power to bear on querying, processing, and understanding this data. We describe Gordon, a system architecture for data-centric applications that combines low-power processors, flash memory, and data-centric programming systems to improve performance for data-centric applications while reducing power consumption. The paper presents an exhaustive analysis of the design space of Gordon systems, focusing on the trade-offs between power, energy, and performance that Gordon must make. It analyzes the impact of flash-storage and the Gordon architecture on the performance and power efficiency of data-centric applications. It also describes a novel flash translation layer tailored to data intensive workloads and large flash storage arrays. Our data show that, using technologies available in the near future, Gordon systems can out-perform disk-based clusters by 1.5× and deliver up to 2.5× more performance per Watt.
|
39
|
Hari P, Ko K, Koukoumidis E, Kremer U, Martonosi M, Ottoni D, Peh LS, Zhang P. SARANA: language, compiler and run-time system support for spatially aware and resource-aware mobile computing. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2008; 366:3699-3708. [PMID: 18672455 DOI: 10.1098/rsta.2008.0127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
Increasingly, spatial awareness plays a central role in many distributed and mobile computing applications. Spatially aware applications rely on information about the geographical position of compute devices and their supported services in order to support novel functionality. While many spatial application drivers already exist in mobile and distributed computing, very little systems research has explored how best to program these applications, to express their spatial and temporal constraints, and to allow efficient implementations on highly dynamic real-world platforms. This paper proposes the SARANA system architecture, which includes language and run-time system support for spatially aware and resource-aware applications. SARANA allows users to express spatial regions of interest, as well as trade-offs between quality of result (QoR), latency and cost. The goal is to produce applications that use resources efficiently and that can be run on diverse resource-constrained platforms ranging from laptops to personal digital assistants and to smart phones. SARANA's run-time system manages QoR and cost trade-offs dynamically by tracking resource availability and locations, brokering usage/pricing agreements and migrating programs to nodes accordingly. A resource cost model permeates the SARANA system layers, permitting users to express their resource needs and QoR expectations in units that make sense to them. Although we are still early in the system development, initial versions have been demonstrated on a nine-node system prototype.
Affiliation(s)
- Pradip Hari: Department of Computer Science, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
|