Java Concurrency and Performance Training in Pune, India

Java Concurrency and Performance

With the advent of multi-core processors, single-threaded programs are fast becoming obsolete. Java was built to do many things at once; in computing terms, this is called "concurrency", and it is one of the main reasons Java is so useful. Many applications now run on multiple cores, and concurrent Java programs that use multiple threads are the way to get good performance and stability on such hardware. Concurrency is among the biggest worries for newcomers to Java programming, but there is no reason to let it deter you. Excellent documentation is available, along with pictorial representations of each topic that make the material easier to understand. Java threads have also become easier to work with as the Java platform has evolved. To learn multithreaded programming in Java 6 and 7, you need some building blocks. Our training expert, drawing on rich training and consulting experience, illustrates them with case studies based on real applications.

Intended Audience:

This course is aimed at programmers who want to learn the foundations of concurrent programming and the existing concurrent programming environments in order to develop, now or in the future, multithreaded applications for multi-core processors and shared-memory multiprocessors.

Key Skills:

Dealing with threads and collections on multi-core and multiprocessor machines. Quickly identifying the root cause of poor performance in your applications. Eliminating conditions that prevent you from finding performance bottlenecks. Using the features of JDK 5, 6 and 7 that harness the power of the underlying hardware.

Prerequisites:

Basic knowledge of Java (introductory course or equivalent practical experience).

Instructional Method:

This instructor-led course provides lecture topics and the practical application of JEE5.0 and the underlying technologies. Most concepts are presented pictorially, and a detailed case study strings together the technologies, patterns and design.

Producer Consumer(Basic Hand-Off) ( Day 1 )
Why wait-notify requires synchronization
- notifyAll used as work around
- Structural modification to hidden queue by wait-notify
- Lock handling done by OS
- Use cases for notify-notifyAll
- Hidden queue
- design issues with synchronization
- Common Issues with thread
  - Uncaught Exception Handler
  - problem with stop
  - Dealing with InterruptedStatus
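
As an illustration of the basic hand-off topic, here is a minimal sketch of a bounded buffer built on wait/notifyAll. The class names `HandOffBuffer` and `HandOffDemo` are hypothetical and are not taken from the course material. The sketch shows why both calls sit inside `synchronized` methods and why the condition is rechecked in a loop after every wake-up.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical bounded hand-off buffer guarded by its own intrinsic lock. */
class HandOffBuffer<T> {
    private final Queue<T> items = new ArrayDeque<T>();
    private final int capacity;

    HandOffBuffer(int capacity) {
        this.capacity = capacity;
    }

    public synchronized void put(T item) throws InterruptedException {
        // wait() releases the monitor, so the condition must be rechecked
        // in a loop once the thread has reacquired it.
        while (items.size() == capacity) {
            wait();
        }
        items.add(item);
        // Producers and consumers share one wait set, so wake everyone.
        notifyAll();
    }

    public synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        T item = items.remove();
        notifyAll();
        return item;
    }
}

class HandOffDemo {
    /** Moves the numbers 1..count through a buffer of size 2 and returns their sum. */
    static int sumThroughBuffer(final int count) {
        final HandOffBuffer<Integer> buffer = new HandOffBuffer<Integer>(2);
        Thread producer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    for (int i = 1; i <= count; i++) {
                        buffer.put(i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the status
                }
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += buffer.take();
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumThroughBuffer(100)); // sum of 1..100 is 5050
    }
}
```

Using `notify()` instead of `notifyAll()` here could wake a thread waiting on the wrong condition (a producer instead of a consumer), which is one reason `notifyAll()` is commonly used as the workaround.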

Java Memory Model(JMM)

Real Meaning and effect of synchronization
Volatile
Sequential Consistency would disallow common optimizations
The changes in JMM
Final
Shortcomings of the original JMM
- Finals not really final
- Prevents effective compiler optimizations
- Processor executes operations out of order
- Compiler is free to reorder certain instructions
- Cache reorders writes
- Old JMM surprising and confusing
Instruction Reordering
- What is the limit of reordering
- Programmatic Control
- super-scalar processors
- heavily pipelined processors
- As-if-serial-semantics
- Why is reordering done
Cache Coherency
- Write-back Caching explained
- What is cache coherence?
- How does it affect Java programs?
- Software based Cache Coherency
- NUMA(Non uniform memory access)
- Caching explained
- Cache incoherency
New JMM and goals of JSR-133
- Simple, intuitive and feasible
- Out-of-thin-air safety
- High performance JVM implementations across architectures
- Minimal impact on existing code
- Initialization safety
- Preserve existing safety guarantees and type-safety

Applied Threading techniques

Safe construction techniques
Thread Local Storage
Thread safety levels
Unsafe construction techniques

Building Blocks for Highly Concurrent Design

CAS
- Wait-free Queue implementation
- Optimistic Design
- Wait-free Stack implementation
- Hardware based locking
- ABA problem
- Markable reference
- weakCompareAndSet
- Stamped reference
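
The optimistic, CAS-based design listed above can be sketched with a small nonblocking stack. This is the well-known Treiber stack, which is lock-free rather than strictly wait-free; the class name `LockFreeStack` is hypothetical, and the sketch is not necessarily the implementation used in the course.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lock-free stack: every update is a CAS retry loop on the top pointer. */
class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<Node<T>>();

    public void push(T value) {
        Node<T> oldTop;
        Node<T> newTop;
        do {
            oldTop = top.get();
            newTop = new Node<T>(value, oldTop);
            // If another thread changed top in the meantime, the CAS fails
            // and the loop retries with the fresh value.
        } while (!top.compareAndSet(oldTop, newTop));
    }

    /** Returns the most recently pushed value, or null if the stack is empty. */
    public T pop() {
        Node<T> oldTop;
        do {
            oldTop = top.get();
            if (oldTop == null) {
                return null;
            }
        } while (!top.compareAndSet(oldTop, oldTop.next));
        return oldTop.value;
    }
}
```

Because each push allocates a fresh immutable node and the garbage collector will not recycle a node that is still referenced, the ABA problem does not bite in this particular sketch. Designs that reuse nodes typically need something like `AtomicStampedReference`.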

Reentrant Lock

ReentrantReadWriteLock
ReentrantLock

Lock Striping
- Lock Striping on LinkNodes
- Lock Striping on table
- Identifying scalability bottlenecks in java.util.Collection
- segregating them based on Thread safety levels
- Lock Implementation
- Multiple user conditions and wait queues
- Lock Polling techniques
- Based on CAS
- Design issues with synchronization

Highly Concurrent Data Structures-Part1 ( Day 2 )

Weakly Consistent Iterators vs Fail Fast Iterators
ConcurrentHashMap
Structure
remove/put/resize lock
Almost immutability
Using volatile to detect interference
Read does not block in common code path

Designing For Concurrency

Atomicity
Confinement
Immutability
Visibility
Almost Immutability
Restructuring and refactoring

Sharing Objects

Thread confinement
Stack confinement
ThreadLocal
Unshared objects are safe
Ad-hoc thread confinement

Visibility

Synchronization and visibility
Non-atomic 64-bit numeric operations
Problems that state data can cause
Volatile vs synchronized
Single-threaded write safety
Volatile flushing
Making fields visible with volatile
Reason why changes are not visible

Immutability

Definition of immutable
Immutable is always thread safe
Immutable containing mutable object
Final fields

Safe publication

Making objects and their state visible
Safe publication idioms
How to share objects safely
"Effectively" immutable objects

Publication and escape

Publishing objects to alien methods
Publishing objects as method returns
Implicit links to outer class
Ways we might let object escape
Publishing objects via fields

Composing Objects

Instance confinement
Split locks
Example of fleet management
Java monitor pattern
Lock confinement
Encapsulation
How instance confinement is good
State guarded by private fields

Documenting synchronization policies

Examples from the JDK
Documentation checklist
What should be documented
Synchronization policies
Interpreting vague documentation

Adding functionality to existing thread-safe classes

Benefits of reuse
Using composition to add functionality
Subclassing to add functionality
Modifying existing code
Client-side locking

Designing a thread-safe class

Pre-condition
Thread-safe counter with invariant
Primitive vs object fields
Encapsulation
Post-conditions
Waiting for pre-condition to become true

Delegating thread safety

Independent fields
Publishing underlying fields
Delegating safety to ConcurrentMap
Invariants and delegation
Using thread safe components
Delegation with vehicle tracker

Canned Synchronizers

Semaphore
Latches
SynchronousQueue
Future
Exchanger
Synchronous Queue Framework
Mutex
Barrier

Structuring Concurrent Applications

Finding exploitable parallelism
Callable controlling lifecycle
CompletionService
Limitations of parallelizing heterogeneous tasks
Callable and Future
Time limited tasks
Example showing page renderer with future
Sequential vs parallel
Breaking up a single client request

The Executor framework

Memory leaks with ThreadLocal
Delayed and periodic tasks
Thread pool structure
Motivation for using Executor
Executor lifecycle, state machine
Difference between java.util.Timer and ScheduledExecutor
ThreadPoolExecutor
Decoupling task submission from execution
shutdown() vs shutdownNow()
Executor interface
Thread pool benefits
Standard ExecutorService configurations

Execution policies

Various sizing options for number of threads and queue length
In which order? (FIFO, LIFO, by priority)
Who will execute it?

Executing tasks in threads

Disadvantage of unbounded thread creation
Single-threaded vs multi-threaded
Explicitly creating tasks
Independence of tasks
Identifying tasks
Task boundaries

Cancellation and Shutdown ( Day 3 )

Stopping a thread-based service
Graceful shutdown
ExecutorService shutdown
Providing lifecycle methods
Asynchronous logging caveats
Example: A logging service
Poison pills
One-shot execution service

Task cancellation

Cancellation policies
Using flags to signal cancellation
Reasons for wanting to cancel a task
Cooperative vs preemptive cancellation

Interruption

Origins of interruptions
WAITING state of thread
How does interrupt work?
Methods that put thread in WAITING state
Policies in dealing with InterruptedException
Thread.interrupted() method

Dealing with non-interruptible blocking

Interrupting locks
Reactions of IO libraries to interrupts

Responding to interruption

Letting the method throw the exception
Saving the interrupt for later
Ignoring the interrupt status
Restoring the interrupt and exiting
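
One of the responses listed above, restoring the interrupt and exiting, might look like the following sketch. The class and method names are hypothetical. The worker blocks on an empty queue, is interrupted, and re-asserts its interrupt status so that code further up the stack can still observe it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptDemo {
    /** Returns true if the worker still sees its interrupt status after catching the exception. */
    static boolean restoresInterrupt() {
        final AtomicBoolean statusSeen = new AtomicBoolean(false);
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
                try {
                    queue.take(); // nothing is ever added, so this blocks until interrupted
                } catch (InterruptedException e) {
                    // Throwing InterruptedException clears the status flag,
                    // so set it again before exiting.
                    Thread.currentThread().interrupt();
                }
                statusSeen.set(Thread.currentThread().isInterrupted());
            }
        });
        worker.start();
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return statusSeen.get();
    }

    public static void main(String[] args) {
        System.out.println(restoresInterrupt());
    }
}
```

Swallowing the exception without the `interrupt()` call would leave the flag cleared, which hides the cancellation request from callers.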

Interruption policies

Task vs Thread
Different meanings of interrupt
Preserving the interrupt status

Example: timed run

Telling a long run to eventually give up
Canceling busy jobs

Handling abnormal thread termination

Using UncaughtExceptionHandler
Dealing with exceptions in Swing
ThreadGroup for uncaught exceptions

JVM shutdown

Shutdown hooks
Orderly shutdown
Daemon threads
Finalizers
Abrupt shutdown

Applying Thread Pools

Configuring ThreadPoolExecutor
Thread factories
corePoolSize
Customizing thread pool executor after construction
Using default Executors.new* methods
Managing queued tasks
maximumPoolSize
keepAliveTime
PriorityBlockingQueue

Saturation policies

Discard
Caller runs
Abort
Discard oldest

Sizing thread pools

Examples of various pool sizes
Determining the maximum allowed threads on your operating system
CPU-intensive vs IO-intensive task sizing
Danger of hardcoding worker number
Problems when pool is too large or small
Formula for calculating how many threads to use
Mixing different types of tasks

Tasks and Execution Policies

Long-running tasks
Homogeneous, independent and thread-agnostic tasks
Thread starvation deadlock

Extending ThreadPoolExecutor

terminated
Using hooks for extension
afterExecute
beforeExecute

Parallelizing recursive algorithms

Using Fork/Join to execute tasks
Converting sequential tasks to parallel
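
A minimal sketch of converting a sequential loop into a Fork/Join task, assuming Java 7 or later. The `SumTask` class and its threshold of 1,000 elements are hypothetical choices for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical divide-and-conquer task that sums a slice of an int array. */
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000; // below this, sum sequentially

    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long parallelSum(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Forking one half and computing the other in the current thread keeps the worker busy and leaves the forked half available for work stealing.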

Liveness, Performance, and Testing

Avoiding Liveness Hazards
Other liveness hazards
Poor responsiveness
Livelock

Starvation

ReadWriteLock in Java 5 vs Java 6
Detecting thread starvation

Avoiding and diagnosing deadlocks

Adding a sleep to cause deadlocks
"TryLock" with synchronized
Using open calls
Verifying thread deadlocks
Avoiding multiple locks
Timed lock attempts
Stopping deadlock victims
DeadlockArbitrator
Deadlock analysis with thread dumps
Unit testing for lock ordering deadlocks

Deadlock

Thread-starvation deadlocks
Discovering deadlocks
Checking whether locks are held
Resource deadlocks
The drinking philosophers
Lock-ordering deadlocks
Defining a global ordering
Resolving deadlocks
Causing a deadlock amongst philosophers
Deadlock between cooperating objects
Imposing a natural order
Dynamic lock order deadlocks
Defining order on dynamic locks
Open calls and alien methods
Example in Vector

Performance and Scalability

Thinking about performance
Mistakes in traditional performance optimizations
2-tier vs multi-tier
Evaluating performance tradeoffs
Performance vs scalability
Effects of serial sections and locking
How fast vs how much

Reducing lock contention

How to monitor CPU utilization
Performance comparisons
ReadWriteLock
Using CopyOnWrite collections
Immutable objects
Atomic fields
Using ConcurrentHashMap
Narrowing lock scope
Avoiding "hot fields"
Hotspot options for lock performance
Reasons why CPUs might not be loaded
How to find "hot locks"
Lock splitting
Dangers of object pooling
Safety first!
Reducing lock granularity
Exclusive locks

Lock striping

In ConcurrentHashMap
In ConcurrentLinkedQueue

Amdahl's and Little's laws

Formula for Amdahl's Law
Problems with Amdahl's law in practice
Applying Little's Law in practice
Utilization according to Amdahl
Maximum useful cores
How threading relates to Little's Law
Formula for Little's Law
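
For reference, the two formulas in their standard textbook form, written as a small sketch. Amdahl's Law bounds the speedup on N processors by 1 / (F + (1 - F) / N), where F is the fraction of work that must run serially. Little's Law states L = λW: the average number of items in a system equals the arrival rate times the average time each item spends in it. The class and method names are hypothetical.

```java
class ScalingLaws {
    /** Amdahl's Law: upper bound on speedup with the given serial fraction and processor count. */
    static double amdahlSpeedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    /** Little's Law: average number of requests in the system (L = lambda * W). */
    static double littlesLaw(double arrivalsPerSecond, double avgSecondsInSystem) {
        return arrivalsPerSecond * avgSecondsInSystem;
    }

    public static void main(String[] args) {
        // With 10% serial work, 10 processors give a speedup of at most about 5.26.
        System.out.println(amdahlSpeedup(0.1, 10));
        // 200 requests/s, each taking 50 ms on average, keep about 10 requests in flight.
        System.out.println(littlesLaw(200.0, 0.05));
    }
}
```

As N grows, the Amdahl bound approaches 1 / F, so a 10% serial section caps the speedup at 10 regardless of how many cores are added.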

Costs introduced by threads

Context switching
Locking and unlocking
Cache invalidation
Spinning before actual blocking
Lock elision
Memory barriers
Escape analysis and uncontended locks

Explicit Locks

Lock and ReentrantLock
Using try-finally
Memory visibility semantics
Using try-lock to avoid deadlocks
tryLock and timed locks
Interruptible locking
Non-block-structured locking
ReentrantLock implementation
Using the explicit lock

Synchronized vs ReentrantLock

Memory semantics
Prefer synchronized
Ease of use

Performance considerations

Heavily contended locks
Java 5 vs Java 6 performance
Throughput on contended locks
Uncontended performance

Fairness

Standard non-fair mechanisms
Throughput of fair locks
Round-robin by OS
Barging
Fair explicit locks in Java

Read-write locks

ReadWriteLock interface
Understanding system to avoid starvation

ReadWriteLock implementation options

Release preference
Downgrading
Reader barging
Upgrading
Reentrancy

Building Custom Synchronizers ( Day 4 )

Explicit condition objects
Condition interface
Timed conditions
Benefits of explicit condition queues

AbstractQueuedSynchronizer (AQS)

Basis for other synchronizers

Managing state dependence

Exceptions on pre-condition fails
Structure of blocking state-dependent actions
Crude blocking by polling and sleeping
Example using bounded queues
Single-threaded vs multi-threaded

Introducing condition queues

With intrinsic locks

Using condition queues

Waking up too soon
Conditional waits
Condition queue
Encapsulating condition queues
State-dependence
notify() vs notifyAll()
Condition predicate
Lock
Waiting for a specific timeout

Missed signals

InterruptedException

Atomic Variables and Nonblocking Synchronization

Hardware support for concurrency
Using "Unsafe" to access memory directly
CAS support in the JVM
Compare-and-Set
Performance advantage of padding
Nonblocking counter
Simulation of CAS
Managing conflicts with CAS
Compare-and-Swap (CAS)
Shared cache lines
Optimistic locking

Atomic variable classes

Optimistic locking classes
How do atomics work?
Atomic array classes
Performance comparisons: Locks vs atomics
Cost of atomic spin loops
Very fast when not too much contention
Types of atomic classes

Disadvantages of locking

Priority inversion
Elimination of uncontended intrinsic locks
Volatile vs locking performance

Nonblocking algorithms

Scalability problems with lock-based algorithms
Atomic field updaters
Doing speculative work
AtomicStampedReference
Nonblocking stack
Definition of nonblocking and lock-free
Highly scalable hash table
The ABA problem
Using sun.misc.Unsafe
Dangers
Reasons why we need it

Fork and Join Framework

Fork-join decomposition
Fork and Join
ParallelArray
Divide and conquer
Hardware shapes programming idiom
Exposing fine grained parallelism
Anatomy of Fork and Join
Limitations
Work Stealing

Crash course in modern hardware

Amdahl's Law

Cache

cache controller
write
Direct mapped
read
Address mapping in cache

Memory Architectures

NUMA
UMA

Designing for multi-core/processor environment

Concurrent Stack
Harsh Realities of parallelism
Parallel Programming

Concurrent Objects

Sequential Consistency
Linearizability
Concurrency and Correctness
Progress Conditions
Quiescent Consistency

Concurrency Patterns

Lazy Synchronization
Lock free Synchronization
Optimistic Synchronization
Fine grained Synchronization

Priority Queues

Heap Based Unbounded Priority Queue
Skiplist based Unbounded priority Queue
Array Based bounded Priority Queue
Tree based Bounded Priority Queue

Lists

Coarse Grained Synchronization
Lazy Synchronization
Optimistic Synchronization
Non Blocking Synchronization
Fine Grained Synchronization

Skiplists

Spinlocks

Lock suitable for NUMA systems

Concurrent Queues

Unbounded lock-free Queue
Bounded Partial Queue
Unbounded Total Queue
Concurrent Hashing
Open Address Hashing
Closed Address Hashing
Lock Free Hashing

Highly Concurrent Data Structures-Part2

CopyOnWriteArray(List/Set)

NonBlockingHashMap

For systems with more than 100 cpus/cores
State based Reasoning
All CAS spin loops bounded
Constant-time key-value mapping
Faster than ConcurrentHashMap
No locks even during resize

Queue interfaces

Queue
BlockingQueue
Deque
BlockingDeque

Queue Implementations

ArrayDeque and ArrayBlockingDeque

WorkStealing using Deques

LinkedBlockingQueue

LinkedBlockingDeque

ConcurrentLinkedQueue

GC unlinking
Michael and Scott algorithm
Tails and heads are allowed to lag
Support for interior removals
Relaxed writes

ConcurrentLinkedDeque

Same as ConcurrentLinkedQueue except bidirectional pointers

LinkedTransferQueue

Internal removes handled differently
Heuristics based spinning/blocking on number of processors
Behavior differs based on method calls
Usual ConcurrentLinkedQueue optimizations
Normal and Dual Queue

Skiplist

Lock free Skiplist
Sequential Skiplist
Lock based Concurrent Skiplist

ConcurrentSkipListMap(and Set)

Indexes are allowed to race
Iteration
Problems with AtomicMarkableReference
Probabilistic Data Structure
Marking and nulling
Different Way to mark

Anika Technologies