Active memory controller
Abstract: Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.
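The core idea in the abstract — shipping a short operation to the data's home memory controller instead of migrating the cache line to each requesting processor — can be illustrated with a minimal toy model. All class and method names below are illustrative assumptions, not the paper's actual AMC interface; the message counts are rough stand-ins for the coherence traffic the two approaches generate.

```python
class HomeMemoryController:
    """Toy model of a home-node memory controller that can execute
    shipped Active Memory Operations (AMOs) on resident data."""

    def __init__(self):
        self.memory = {}    # address -> value
        self.messages = 0   # network messages counted

    def store(self, addr, value):
        self.memory[addr] = value

    def amo_fetch_add(self, addr, delta):
        # AMO style: one request in, one reply out. The add executes
        # at the home controller, so the cache line never migrates.
        self.messages += 2
        old = self.memory[addr]
        self.memory[addr] = old + delta
        return old

    def coherent_rmw(self, addr, delta):
        # Conventional remote read-modify-write: fetch the line in
        # exclusive state (request + data reply), modify it locally,
        # and write it back later (~3 messages per requester).
        self.messages += 3
        old = self.memory[addr]
        self.memory[addr] = old + delta
        return old


# Eight processors incrementing a shared counter, e.g. at a barrier:
ctrl = HomeMemoryController()
ctrl.store(0x100, 0)
for _ in range(8):
    ctrl.amo_fetch_add(0x100, 1)
amo_msgs = ctrl.messages

ctrl2 = HomeMemoryController()
ctrl2.store(0x100, 0)
for _ in range(8):
    ctrl2.coherent_rmw(0x100, 1)

print(ctrl.memory[0x100], amo_msgs, ctrl2.messages)  # -> 8 16 24
```

Under these toy message costs the AMO version reaches the same final counter value with a third less traffic, and the gap widens as contention grows, since the conventional version also ping-pongs the line between sharers; this is the intuition behind the barrier and spinlock speedups the abstract reports.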
Detailed description

Author: Fang, Zhen [author]
Format: Article
Language: English
Published: 2012
Subjects: Distributed shared memory; Cache coherence; Memory architecture; Interprocessor synchronization; DRAM organization
Note: © Springer Science+Business Media, LLC 2012
Contained in: The journal of supercomputing - Springer US, 1987, 62(2012), 1, 17 Jan., pages 510-549
Contained in: volume:62 ; year:2012 ; number:1 ; day:17 ; month:01 ; pages:510-549
DOI: 10.1007/s11227-011-0735-9
Catalog ID: OLC2033939782
LEADER 01000caa a22002652 4500
001 OLC2033939782
003 DE-627
005 20230504053732.0
007 tu
008 200819s2012 xx ||||| 00| ||eng c
024 7  |a 10.1007/s11227-011-0735-9 |2 doi
035    |a (DE-627)OLC2033939782
035    |a (DE-He213)s11227-011-0735-9-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 004 |a 620 |q VZ
100 1  |a Fang, Zhen |e verfasserin |4 aut
245 10 |a Active memory controller
264  1 |c 2012
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © Springer Science+Business Media, LLC 2012
520    |a Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.
650  4 |a Distributed shared memory
650  4 |a Cache coherence
650  4 |a Memory architecture
650  4 |a Interprocessor synchronization
650  4 |a DRAM organization
700 1  |a Zhang, Lixin |4 aut
700 1  |a Carter, John B. |4 aut
700 1  |a McKee, Sally A. |4 aut
700 1  |a Ibrahim, Ali |4 aut
700 1  |a Parker, Michael A. |4 aut
700 1  |a Jiang, Xiaowei |4 aut
773 08 |i Enthalten in |t The journal of supercomputing |d Springer US, 1987 |g 62(2012), 1 vom: 17. Jan., Seite 510-549 |w (DE-627)13046466X |w (DE-600)740510-8 |w (DE-576)018667775 |x 0920-8542 |7 nnns
773 18 |g volume:62 |g year:2012 |g number:1 |g day:17 |g month:01 |g pages:510-549
856 41 |u https://doi.org/10.1007/s11227-011-0735-9 |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-TEC
912    |a SSG-OLC-MAT
912    |a GBV_ILN_70
951    |a AR
952    |d 62 |j 2012 |e 1 |b 17 |c 01 |h 510-549
author_variant |
z f zf l z lz j b c jb jbc s a m sa sam a i ai m a p ma map x j xj |
matchkey_str |
article:09208542:2012----::cieeoyo |
hierarchy_sort_str |
2012 |
publishDate |
2012 |
allfields |
10.1007/s11227-011-0735-9 doi (DE-627)OLC2033939782 (DE-He213)s11227-011-0735-9-p DE-627 ger DE-627 rakwb eng 004 620 VZ Fang, Zhen verfasserin aut Active memory controller 2012 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media, LLC 2012 Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation. Distributed shared memory Cache coherence Memory architecture Interprocessor synchronization DRAM organization Zhang, Lixin aut Carter, John B. aut McKee, Sally A. 
aut Ibrahim, Ali aut Parker, Michael A. aut Jiang, Xiaowei aut Enthalten in The journal of supercomputing Springer US, 1987 62(2012), 1 vom: 17. Jan., Seite 510-549 (DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 0920-8542 nnns volume:62 year:2012 number:1 day:17 month:01 pages:510-549 https://doi.org/10.1007/s11227-011-0735-9 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_70 AR 62 2012 1 17 01 510-549 |
language |
English |
source |
Enthalten in The journal of supercomputing 62(2012), 1 vom: 17. Jan., Seite 510-549 volume:62 year:2012 number:1 day:17 month:01 pages:510-549 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Distributed shared memory Cache coherence Memory architecture Interprocessor synchronization DRAM organization |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
The journal of supercomputing |
authorswithroles_txt_mv |
Fang, Zhen @@aut@@ Zhang, Lixin @@aut@@ Carter, John B. @@aut@@ McKee, Sally A. @@aut@@ Ibrahim, Ali @@aut@@ Parker, Michael A. @@aut@@ Jiang, Xiaowei @@aut@@ |
publishDateDaySort_date |
2012-01-17T00:00:00Z |
hierarchy_top_id |
13046466X |
dewey-sort |
14 |
id |
OLC2033939782 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2033939782</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504053732.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2012 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-011-0735-9</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2033939782</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-011-0735-9-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Fang, Zhen</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Active memory controller</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2012</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2012</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. 
The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Distributed shared memory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache coherence</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory architecture</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Interprocessor synchronization</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">DRAM organization</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Zhang, Lixin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Carter, John B.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">McKee, Sally A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ibrahim, Ali</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Parker, Michael A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Jiang, Xiaowei</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">62(2012), 1 vom: 17. 
Jan., Seite 510-549</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:62</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:1</subfield><subfield code="g">day:17</subfield><subfield code="g">month:01</subfield><subfield code="g">pages:510-549</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-011-0735-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">62</subfield><subfield code="j">2012</subfield><subfield code="e">1</subfield><subfield code="b">17</subfield><subfield code="c">01</subfield><subfield code="h">510-549</subfield></datafield></record></collection>
author |
Fang, Zhen |
spellingShingle |
Fang, Zhen ddc 004 misc Distributed shared memory misc Cache coherence misc Memory architecture misc Interprocessor synchronization misc DRAM organization Active memory controller |
authorStr |
Fang, Zhen |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)13046466X |
format |
Article |
dewey-ones |
004 - Data processing & computer science 620 - Engineering & allied operations |
delete_txt_mv |
keep |
author_role |
aut aut aut aut aut aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0920-8542 |
topic_title |
004 620 VZ Active memory controller Distributed shared memory Cache coherence Memory architecture Interprocessor synchronization DRAM organization |
topic |
ddc 004 misc Distributed shared memory misc Cache coherence misc Memory architecture misc Interprocessor synchronization misc DRAM organization |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
hierarchy_parent_title |
The journal of supercomputing |
hierarchy_parent_id |
13046466X |
dewey-tens |
000 - Computer science, knowledge & systems 620 - Engineering |
hierarchy_top_title |
The journal of supercomputing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 |
title |
Active memory controller |
ctrlnum |
(DE-627)OLC2033939782 (DE-He213)s11227-011-0735-9-p |
title_full |
Active memory controller |
author_sort |
Fang, Zhen |
journal |
The journal of supercomputing |
journalStr |
The journal of supercomputing |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works 600 - Technology |
recordtype |
marc |
publishDateSort |
2012 |
contenttype_str_mv |
txt |
container_start_page |
510 |
author_browse |
Fang, Zhen Zhang, Lixin Carter, John B. McKee, Sally A. Ibrahim, Ali Parker, Michael A. Jiang, Xiaowei |
container_volume |
62 |
class |
004 620 VZ |
format_se |
Aufsätze |
author-letter |
Fang, Zhen |
doi_str_mv |
10.1007/s11227-011-0735-9 |
dewey-full |
004 620 |
title_sort |
active memory controller |
title_auth |
Active memory controller |
abstract |
Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation. © Springer Science+Business Media, LLC 2012 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_70 |
container_issue |
1 |
title_short |
Active memory controller |
url |
https://doi.org/10.1007/s11227-011-0735-9 |
remote_bool |
false |
author2 |
Zhang, Lixin Carter, John B. McKee, Sally A. Ibrahim, Ali Parker, Michael A. Jiang, Xiaowei |
author2Str |
Zhang, Lixin Carter, John B. McKee, Sally A. Ibrahim, Ali Parker, Michael A. Jiang, Xiaowei |
ppnlink |
13046466X |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s11227-011-0735-9 |
up_date |
2024-07-03T18:59:12.239Z |
_version_ |
1803585482642161665 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2033939782</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504053732.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2012 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-011-0735-9</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2033939782</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-011-0735-9-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Fang, Zhen</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Active memory controller</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2012</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2012</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. 
The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Distributed shared memory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache coherence</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory architecture</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Interprocessor synchronization</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">DRAM organization</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Zhang, Lixin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Carter, John B.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">McKee, Sally A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ibrahim, Ali</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Parker, Michael A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Jiang, Xiaowei</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">62(2012), 1 vom: 17. 
Jan., Seite 510-549</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:62</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:1</subfield><subfield code="g">day:17</subfield><subfield code="g">month:01</subfield><subfield code="g">pages:510-549</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-011-0735-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">62</subfield><subfield code="j">2012</subfield><subfield code="e">1</subfield><subfield code="b">17</subfield><subfield code="c">01</subfield><subfield code="h">510-549</subfield></datafield></record></collection>
|