Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.5k views
in Technique[技术] by (71.8m points)

performance - First time a Java loop is run SLOW, why? [Sun HotSpot 1.5, sparc]

In benchmarking some Java code on a Solaris SPARC box, I noticed that the first time I call the benchmarked function it runs EXTREMELY slowly (10x difference):

  • First | 1 | 25295.979 ms
  • Second | 1 | 2256.990 ms
  • Third | 1 | 2250.575 ms

Why is this? I suspect the JIT compiler, is there any way to verify this?

Edit: In light of some answers I wanted to clarify that this code is the simplest possible test-case I could find exhibiting this behavior. So my goal isn't to get it to run fast, but to understand what's going on so I can avoid it in my real benchmarks.

Solved: Tom Hawtin correctly pointed out that my "SLOW" time was actually reasonable. Following this observation, I attached a debugger to the Java process. During the first, the inner loop looks like this:

0xf9037218:     cmp      %l0, 100
0xf903721c:     bge,pn   %icc,0xf90371f4        ! 0xf90371f4
0xf9037220:     nop
0xf9037224:     ld       [%l3 + 92], %l2
0xf9037228:     ld       [%l2 + 8], %l6
0xf903722c:     add      %l6, 1, %l5
0xf9037230:     st       %l5, [%l2 + 8]
0xf9037234:     inc      %l0
0xf9037238:     ld       [%l1], %g0
0xf903723c:     ba,pt    %icc,0xf9037218        ! 0xf9037218

On the following iterations, the loop looks like this:

0xf90377d4:     sub      %l2, %l0, %l3
0xf90377d8:     add      %l3, %l0, %l2
0xf90377dc:     add      %l2, 1, %l4
0xf90377e0:     inc      %l0
0xf90377e4:     cmp      %l0, 100
0xf90377e8:     bl,pn    %icc,0xf90377d8        ! 0xf90377d8

So HotSpot removed memory accesses from the inner loop, speeding it up by an order of magnitude.

Lesson: Do the math! I should have done Tom's calculation myself.

Benchmark Java code:

    private int counter;
    private int nThreads;

    private void measure(String tag) throws Exception {
            MyThread threads[] = new MyThread[nThreads];
            int i;

            counter = 0;

            for (i = 0; i < nThreads; i++)
                    threads[i] = new MyThread();

            long start = System.nanoTime();

            for (i = 0; i < nThreads; i++)
                    threads[i].start();

            for (i = 0; i < nThreads; i++)
                    threads[i].join();

            if (tag != null)
                    System.out.format("%-20s | %-2d | %.3f ms 
", tag, nThreads,
                                     new Double((System.nanoTime() - start) / 1000000.0));
    }
    public MyBench() {
            try {
                    this.nThreads = 1;
                    measure("First");
                    measure("Second");
                    measure("Third");
            } catch (Exception e) {
                    System.out.println("Error: " + e);
            }
    }

    private class MyThread extends Thread {
            public void run() {
                    while (counter < 10000000) {
                            // work
                            for (int j = 0; j < 100; j++)
                                    counter++;
                            counter -= 99;
                    }
            }
    }
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Some ugly, unrealistic code (the stuff of microbenchmarks):

                while (counter < 10000000) {
                        // work
                        for (int j = 0; j < 100; j++)
                                counter++;
                        counter -= 99;
                }

So what is this doing and how fast should it run.

The inner loop increments counter 100 times, then the counter is decremented by 99. So an increment of 1. Note counter is a member variable of an outer class, so some overhead there. This is then run 10,000,000 times. So the inner loop is run 1,000,000,000 times.

A loop using to accessor methods, call it 25 cycles. 1,000,000,000 times at 1 GHz, gives 25s.

Hey, we predicted the SLOW time. The slow time is fast. The fast times are after the benchmark has been broken in some way - 2.5 cycles an iteration? Use -server and you might find it gets even more silly.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...