Benchmarking Nexus devices running AOSP master built with gcc 4.9 vs. clang 3.6

These benchmarks were run at the end of December 2014, shortly after clang-built AOSP started working on 64-bit devices and after AOSP updated its clang to a 3.6 snapshot. Some results from 64-bit runs do not accurately reflect the compilers' aarch64 code generators, because binary-only benchmarks have not been rebuilt with aarch64 support and therefore run in 32-bit mode. Some of the bad numbers on clang's side are caused by components that do not work yet and trigger timeouts. There is still room for improvement in the clang results: compiler flags and similar settings have already been tuned for gcc, but not yet for clang. All tests were run 3 times; the numbers listed here are the averages of those 3 runs. Builds used the default versions of gcc and clang present in AOSP (master branch as of the end of December 2014), so that the results are easy to reproduce upstream. Unless indicated otherwise, compiler flags were unmodified from AOSP.

Compile time

make droidcore -j12

        gcc Nexus 9     clang Nexus 9
real    112m50.341s     99m30.494s
user    700m20.836s     629m55.376s
sys     82m5.260s       67m46.403s
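To put the build times in relative terms, the `time` values can be converted to seconds and compared. The following is a minimal sketch: the helper names `to_seconds` and `relative_change` are made up for this illustration and are not part of any AOSP tooling, and the inputs are simply the Nexus 9 figures reported here.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a shell `time` value such as '112m50.341s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

def relative_change(baseline: float, candidate: float) -> float:
    """Return (candidate - baseline) / baseline; negative means less time."""
    return (candidate - baseline) / baseline

# Values copied from the compile time results (Nexus 9, make droidcore -j12).
gcc_times = {"real": "112m50.341s", "user": "700m20.836s", "sys": "82m5.260s"}
clang_times = {"real": "99m30.494s", "user": "629m55.376s", "sys": "67m46.403s"}

for key in ("real", "user", "sys"):
    change = relative_change(to_seconds(gcc_times[key]), to_seconds(clang_times[key]))
    print(f"{key}: clang vs. gcc {change:+.1%}")
```

With these inputs the clang build needs roughly 12% less wall-clock ("real") time than the gcc build.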

Binary size

ls -lR |grep -v ':$' |grep -v '^total.*' |grep -v '^[dl]' |grep -v '^$' |awk '{ print $5; }' |while read r; do S=$((S+r)); echo $S; done

                      gcc Nexus 9     clang Nexus 9   gcc Nexus 10    clang Nexus 10
Total size (bytes)    448890009       447810775       299380121       301400856
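The shell pipeline used for this measurement prints a running total, so the last line of its output is the figure reported. A roughly equivalent sketch in Python follows. It assumes that only regular files should be counted (the pipeline drops directories and symlinks, but would still pick up other entry types), and the function names `total_size` and `percent_diff` are hypothetical.

```python
import os
import stat

def total_size(root: str) -> int:
    """Sum the sizes in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat does not follow symlinks, so links are not counted.
            info = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total

def percent_diff(gcc_bytes: int, clang_bytes: int) -> float:
    """Size of the clang build relative to the gcc build, in percent."""
    return (clang_bytes - gcc_bytes) / gcc_bytes * 100.0

# Totals copied from the binary size results.
print(f"Nexus 9:  {percent_diff(448890009, 447810775):+.2f}%")
print(f"Nexus 10: {percent_diff(299380121, 301400856):+.2f}%")
```

For the reported totals this gives a clang build that is about 0.2% smaller on the Nexus 9 and about 0.7% larger on the Nexus 10.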

Benchmark results (unless indicated otherwise, higher numbers are better)

                                      gcc Nexus 9     clang Nexus 9   gcc Nexus 10    clang Nexus 10
AndEMark Native                       10739           10527           6553            6497
AndEMark Java                         1402            1355            873             866
BenchmarkPi (in ms, lower is better)  142             142             192             196
CaffeineMark                          69045           71470           43494           42487
- Sieve                               53603           66060           43156           42624
- Loop                                125691          135810          61155           56940
- Logic                               138207          128329          66275           63570
- String                              87096           89738           39822           42174
- Float                               39395           40587           35875           35302
- Method                              32347           32006           27095           26676
CF-Bench                              18832           18217           14982           19130
- Native MIPS                         1856            1687            1700            1189
- Java MIPS                           1190            1113            1222            1155
- Native MSFLOPS                      929             915             1558            1498
- Java MSFLOPS                        927             930             1560            1486
- Native MDFLOPS                      859             818             1552            1546
- Java MDFLOPS                        865             813             1459            1409
- Native MALLOCS                      32288           33443           24421           190844
- Native Memory Read                  9056            9833            4861            3610
- Java Memory Read                    4461            4669            1564            1519
- Native Memory Write                 6942            7205            2983            2940
- Java Memory Write                   5659            4898            1755            1745
- Native Disk Read                    1122            1192            1054            1354
- Native Disk Write                   549             550             104             85
- Java Efficiency MIPS                64%             66%             98%             94%
- Java Efficiency MSFLOPS             99%             101%            100%            95%
- Java Efficiency MDFLOPS             100%            99%             94%             88%
- Java Efficiency Memory Read         49%             47%             32%             42%
- Java Efficiency Memory Write        81%             68%             58%             60%
- Native Score                        23529           22963           16624           20320
- Java Score                          15701           15055           13888           18338
Geekbench 2                           3844            3646            2605            2596
- Integer                             3313            3266            1651            1645
- Floating Point                      4108            4025            3861            3766
- Memory                              4738            3954            2842            2956
- Stream                              3043            3038            1081            1212
Geekbench 3 (ST/MT)                   1695/2909       1723/2872       901/1615        891/1559
- Integer                             2007/3882       2096/3849       903/1836        894/1806
- Floating Point                      1134/2236       1119/2168       812/1614        785/1543
- Memory                              2193/2312       2186/2326       1078/1176       1101/1101
Quadrant Pro                          14050           14011           7797            7848
- CPU                                 43140           43556           26706           26033
- Memory                              9878            9520            6992            8048
- I/O                                 14367           14167           3025            2882
- 2D                                  420             367             168             202
- 3D                                  2444            2446            2095            2077
Smartbench 2012 Productivity/Gaming   3713/4447       5659/4422       3870/2825       3861/2771
SQLite bench                          19279.155       15306.15        6826.28         7501.072
- Insert 200                          140.944         136.548         117.925         110.497
- Insert 15000 TA                     35460.993       26674.28        12274.959       13786.765
- Update 500                          138.16          150.03          146.413         144.342
- Update 15000 TA                     29821.074       25439.82        11820.331       12798.635
- Select 15000                        11645.963       10579.91        8130.081        7285.09
- Delete 200                          155.642         123.39          143.885         367.647
- Delete 15000 TA                     35545.024       28734.5         8324.084        13404.826
Vellamo 1.0.6                         839             832             1210            1215

Vellamo 3.1 WebView

5222

2548

2587

- Deep Crossfader

1179

306

323

- Kruptein

229

151

142

- Image Re-focus

246

279

269

- Pixel Blender

520

251

253

- Aquarium Canvas

588

225

229

- CSS 3D Fish

346

101

99

- WebGL Jellyfish

151

0

0

- DOM Node Surfer

338

201

204

- Surf Wax Binder

529

202

205

- See the Sun

374

247

230

- Ocean Scroller

124

64

60

- Page Load Performance

197

204

206

- Text Reflow

0

0

0

- SunSpider 1.0.2

147

148

150

Vellamo 3.1 Browser

6086

2892

2971

- Deep Crossfader

1186

316

345

- Kruptein

232

153

140

- Image Re-focus

256

283

288

- Pixel Blender

601

246

228

- Aquarium Canvas

563

287

306

- CSS 3D Fish

358

136

132

- WebGL Jellyfish

196

0

0

- DOM Node Surfer

348

201

196

- Surf Wax Binder

581

199

202

- See the Sun

393

227

229

- Ocean Scroller

122

79

133

- Page Load Performance

198

193

185

- Text Reflow

591

208

209

- SunSpider 1.0.2

151

139

156

- Octane v1

310

224

218

Vellamo 3.1 Multicore

'

1339

1353

- MT Linpack native

'

138

122

- MT Linpack Java

'

96

104

- MT Stream 5.10

'

260

255

- Membench

'

187

194

- Sysbench

'

216

225

- Threadbench

'

140

141

- Parsec

'

134

132

- Inter Process Communication

'

168

180

Vellamo 3.1 Metal

2797

1498

1485

- Dhrystone 2.1

710

249

245

- Linpack

449

255

259

- Branch-K

253

202

202

- Stream 5.9

607

292

297

- RamJam

514

221

215

- Storage I/O

263

279

268

                                      gcc Nexus 9     clang Nexus 9   gcc Nexus 10    clang Nexus 10
Linpack Pro ST (MFLOPS/s)             662.134         665.125         183.845         181.351
Linpack Pro MT (MFLOPS/s)             884.312         843.041         218.764         224.123
Octane 2.0 in browser                 8269            8732            6123            5947

AnTuTu 5.3

55897

29520

27918

- UX MT

11081

6094

5605

- UX RT

4550

3900

3739

- RAM Operation

1231

888

915

- RAM Speed

5056

3216

3093

- CPU MT integer

3026

1422

863

- CPU MT floating point

1830

1636

1663

- CPU ST integer

2886

1595

1523

- CPU ST floating point

2649

1906

1401

- GPU 2D

741

1643

1643

- GPU 3D

[2048x1536] 20065

[2560x1600] 5043

[2560x1600] 5218

- IO Storage

2082

1477

1555

- IO Database

700

700

700

Notes

  • Compared to the last (32-bit only) Clang build prior to the L release, Clang has caught up a lot in multithreaded code. Clang and gcc are roughly on par on pretty much every test. Overall gcc still has a slight edge.
  • On 64-bit systems, Clang generates slightly smaller code than gcc. (On 32-bit systems, the opposite is true.)
  • The build system should be tweaked to make use of more clang-specific features (e.g. -mcpu=krait on a Nexus 7-2013; AOSP's assumption that a Krait is a Cortex-A15 is suboptimal, since in some ways a Krait is closer to a Cortex-A9). -Oz may be interesting for binary size.
  • Most Nexus 9 benchmarks are running in 32-bit mode because we can't recompile the NDK components of non-free benchmarks. We need to find or write some (preferably Open Source) benchmarks with 64-bit NDK support.
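
One way to sanity-check the "roughly on par" observation is to average the clang/gcc score ratios with a geometric mean, which is a common choice for averaging ratios. The sketch below does this for a small, hand-picked subset of headline scores from the Nexus 10 columns. The selection of rows and the name `geometric_mean_ratio` are assumptions made for illustration; this is not how the notes above were derived.

```python
import math

# (benchmark, gcc Nexus 10, clang Nexus 10): headline scores copied from the
# results, all "higher is better". The choice of rows is illustrative only.
NEXUS10_SCORES = [
    ("AndEMark Native", 6553, 6497),
    ("CaffeineMark", 43494, 42487),
    ("Geekbench 2", 2605, 2596),
    ("Quadrant Pro", 7797, 7848),
]

def geometric_mean_ratio(rows):
    """Geometric mean of clang/gcc score ratios; above 1 favours clang."""
    log_sum = sum(math.log(clang / gcc) for _name, gcc, clang in rows)
    return math.exp(log_sum / len(rows))

for name, gcc, clang in NEXUS10_SCORES:
    print(f"{name}: clang/gcc = {clang / gcc:.3f}")
print(f"geometric mean: {geometric_mean_ratio(NEXUS10_SCORES):.3f}")
```

For this subset the mean ratio comes out just below 1, which is consistent with a slight gcc edge; a different selection of rows could shift the figure.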

Platform/Android/GccClangBenchmark-2014-12 (last modified 2015-01-01 03:13:34)