Java Concurrency and Performance Training
Java Concurrency and Performance
- With the advent of multi-core processors the usage of single threaded programs is soon becoming obsolete. Java was built to be able to do many things at once. In computer lingo, we call this "concurrency". This is the main reason why Java is so useful. Today we see a lot of our applications running on multiple cores, concurrent java programs with multiple threads is the answer for effective performance and stability on multi-core based applications. Concurrency is among the utmost worries for newcomers to Java programming but there's no reason to let it deter you. Not only is excellent documentation available but also pictorial representations of each topic to make understanding much graceful and enhanced. Java threads have become easier to work with as the Java platform has evolved. In order to learn how to do multithreaded programming in Java 6 and 7, you need some building blocks. Our training expert with his rich training and consulting experience illustrates with real application based case studies. Intended Audience:
- The target group is programmers who want to know foundations of concurrent programming and existing concurrent programming environments, in order, now or in future, to develop multithreaded applications for multi-core processors and shared memory multiprocessors
- Dealing with threads and collections on a multi-core/ multiprocessor. To quickly identify the root cause of poor performance in your applications. Eliminate conditions that will prevent you from finding performance bottlenecks. JDK 5, 6, 7 which have features to harness the power of the underlying hardware.
- Basic knowledge of Java (introductory course or equivalent practical experience).
- This is an instructor led course provides lecture topics and the practical application of JEE5.0 and the underlying technologies. It pictorially presents most concepts and there is a detailed case study that strings together the technologies, patterns and design
- Producer Consumer(Basic Hand-Off) ( Day 1 )
- Why wait-notify require Synchronization
- notifyAll used as work around
- Structural modification to hidden queue by wait-notify
- locking handling done by OS use cases for notify-notifyAll
- Hidden queue
- design issues with synchronization
- Common Issues with thread
- Uncaught Exception Handler
- problem with stop
- Dealing with InterruptedStatus
- Real Meaning and effect of synchronization
- Volatile
- Sequential Consistency would disallow common optimizations
- The changes in JMM
- Final
- Shortcomings of the original JMM
- Finals not really final
- Prevents effective compiler optimizations
- Processor executes operations out of order
- Compiler is free to reorder certain instructions
- Cache reorders writes
- Old JMM surprising and confusing
- Instruction Reordering
- What is the limit of reordering
- Programmatic Control
- super-scalar processors
- heavily pipelines processors
- As-if-serial-semantics
- Why is reordering done
- Cache Coherency
- Write-back Caching explained
- What is cache Coherence.
- How does it effect java programs
- Software based Cache Coherency
- NUMA(Non uniform memory access)
- Caching explained
- Cache incoherency
- New JMM and goals of JSR-133
- Simple,intuitive and, feasible
- Out-of-thin-air safety
- High performance JVM implementations across architectures
- Minimal impact on existing code
- Initialization safety
- Preserve existing safety guarantees and type-safety
- safe Construction techniques
- Thread Local Storage
- Thread safety levels
- UnSafe Construction techniques
- CAS
- Wait-free Queue implementation
- Optimistic Design
- Wait-free Stack implementation
- Hardware based locking
- ABA problem
- Markable reference
- weakCompareAndSet
- Stamped reference
Reentrant Lock
- ReentrantReadWriteLock
- ReentrantLock
- Lock Striping
- Lock Striping on LinkNodes
- Lock Striping on table
- Indentifying scalability bottlenecks in java.util.Collection
- segregating them based on Thread safety levels
- Lock Implementation
- Multiple user conditions and wait queues
- Lock Polling techniques
- Based on CAS
- Design issues with synchronization
Highly Concurrent Data Structures-Part1 ( Day 2 )
- Weakly Consistent Iterators vs Fail Fast Iterators
- ConcurrentHashMap
- Structure
- remove/put/resize lock
- Almost immutability
- Using volatile to detect interference
- Read does not block in common code path
- Atomicity
- Confinement
- Immutability
- Visibility
- Almost Immutability
- Restructuring and refactoring
- Thread confinement
- Stack confinement
- ThreadLocal
- Unshared objects are safe
- Ad-hoc thread confinement
- Synchronization and visibility
- Non-atomic 64-bit numeric operations
- Problems that state data can cause
- Volatile vs synchronized
- Single-threaded write safety
- Volatile flushing
- Making fields visible with volatile
- Reason why changes are not visible
- Definition of immutable
- Immutable is always thread safe
- Immutable containing mutable object
- Final fields
- Making objects and their state visible
- Safe publication idioms
- How to share objects safely
- "Effectively" immutable objects
- Publishing objects to alien methods
- Publishing objects as method returns
- Implicit links to outer class
- Ways we might let object escape
- Publishing objects via fields
- Instance confinement
- Split locks
- Example of fleet management
- Java monitor pattern
- Lock confinement
- Encapsulation
- How instance confinement is good
- State guarded by private fields
- Examples from the JDK
- Documentation checklist
- What should be documented
- Synchronization policies
- Interpreting vague documentation
- Benefits of reuse
- Using composition to add functionality
- Subclassing to add functionality
- Modifying existing code
- Client-side locking
- Pre-condition
- Thread-safe counter with invariant
- Primitive vs object fields
- Encapsulation
- Post-conditions
- Waiting for pre-condition to become true
- Independent fields
- Publishing underlying fields
- Delegating safety to ConcurrentMap
- Invariables and delegation
- Using thread safe components
- Delegation with vehicle tracker
- Semaphore
- Latches
- SynchronousQueue
- Future
- Exchanger
- Synchronous Queue Framework
- Mutex
- Barrier
- Finding exploitable parallelism
- Callable controlling lifecycle
- CompletionService
- Limitations of parallelizing heterogeneous tasks
- Callable and Future
- Time limited tasks
- Example showing page renderer with future
- Sequential vs parallel
- Breaking up a single client request
- Memory leaks with ThreadLocal
- Delayed and periodic tasks
- Thread pool structure
- Motivation for using Executor
- Executor lifecycle, state machine
- Difference between java.util.Timer and ScheduledExecutor
- ThreadPoolExecutor
- Decoupling task submission from execution
- Shutdown() vs ShutdownNow()
- Executor interface
- Thread pool benefits
- Standard ExecutorService configurations
- Various sizing options for number of threads and queue length
- In which order? (FIFO, LIFO, by priority)
- Who will execute it?
- Disadvantage of unbounded thread creation
- Single-threaded vs multi-threaded
- Explicitely creating tasks
- Indepence of tasks
- Identifying tasks
- Task boundaries
- Stopping a thread-based service
- Graceful shutdown
- ExecutorService shutdown
- Providing lifecycle methods
- Asynchronous logging caveats
- Example: A logging service
- Poison pills
- One-shot execution service
- Cancellation policies
- Using flags to signal cancellation
- Reasons for wanting to cancel a task
- Cooperative vs preemptive cancellation
- Origins of interruptions
- WAITING state of thread
- How does interrupt work?
- Methods that put thread in WAITING state
- Policies in dealing with InterruptedException
- Thread.interrupted() method
- Interrupting locks
- Reactions of IO libraries to interrupts
- Letting the method throw the exception
- Saving the interrupt for later
- Ignoring the interrupt status
- Restoring the interrupt and exiting
- Task vs Thread
- Different meanings of interrupt
- Preserving the interrupt status
- Telling a long run to eventually give up
- Canceling busy jobs
- Using UncaughtExceptionHandler
- Dealing with exceptions in Swing
- ThreadGroup for uncaught exceptions
- Shutdown hooks
- Orderly shutdown
- Daemon threads
- Finalizers
- Abrupt shutdown
- Configuring ThreadPoolExecutor
- Thread factories
- corePoolSize
- Customizing thread pool executor after construction
- Using default Executors.new* methods
- Managing queued tasks
- maximumPoolSize
- keepAliveTime
- PriorityBlockingQueue
- Discard
- Caller runs
- Abort
- Discard oldest
- Examples of various pool sizes
- Determining the maximum allowed threads on your operating system
- CPU-intensiv vs IO-intensive task sizing
- Danger of hardcoding worker number
- Problems when pool is too large or small
- Formula for calculating how many threads to use
- Mixing different types of tasks
- Long-running tasks
- Homogenous, independent and thread-agnostic tasks
- Thread starvation deadlock
- terminate
- Using hooks for extension
- afterExecute
- beforeExecute
- Using Fork/Join to execute tasks
- Converting sequential tasks to parallel
- Avoiding Liveness Hazards
- Other liveness hazards
- Poor responsiveness
- Livelock
- ReadWriteLock in Java 5 vs Java 6
- Detecting thread starvation
- Adding a sleep to cause deadlocks
- "TryLock" with synchronized
- Using open calls
- Verifying thread deadlocks
- Avoiding multiple locks
- Timed lock attempts
- Stopping deadlock victims
- DeadlockArbitrator
- Deadlock analysis with thread dumps
- Unit testing for lock ordering deadlocks
- Thread-starvation deadlocks
- Discovering deadlocks
- Checking whether locks are held
- Resource deadlocks
- The drinking philosophers
- Lock-ordering deadlocks
- Defining a global ordering
- Resolving deadlocks
- Causing a deadlock amongst philosophers
- Deadlock between cooperating objects
- Imposing a natural order
- Dynamic lock order deadlocks
- Defining order on dynamic locks
- Open calls and alien methods
- Example in Vector
- Thinking about performance
- Mistakes in traditional performance optimizations
- 2-tier vs multi-tier
- Evaluating performance tradeoffs
- Performance vs scalability
- Effects of serial sections and locking
- How fast vs how much
- How to monitor CPU utilization
- Performance comparisons
- ReadWriteLock
- Using CopyOnWrite collections
- Immutable objects
- Atomic fields
- Using ConcurrentHashMap
- Narrowing lock scope
- Avoiding "hot fields"
- Hotspot options for lock performance
- Reasons why CPUs might not be loaded
- How to find "hot locks"
- Lock splitting
- Dangers of object pooling
- Safety first!
- Reducing lock granularity
- Exclusive locks
- In ConcurrentHashMap
- In ConcurrentLinkedQueue
- Formula for Amdahl's Law
- Problems with Amdahl's law in practice
- Applying Little's Law in practice
- Utilization according to Amdahl
- Maximum useful cores
- How threading relates to Little's Law
- Formula for Little's Law
- Context switching
- Locking and unlocking
- Cache invalidation
- Spinning before actual blocking
- Lock elision
- Memory barriers
- Escape analysis and uncontended locks
- Lock and ReentrantLock
- Using try-finally
- Memory visibility semantics
- Using try-lock to avoid deadlocks
- tryLock and timed locks
- Interruptible locking
- Non-block-structured locking
- ReentrantLock implementation
- Using the explicit lock
- Memory semantics
- Prefer synchronized
- Ease of use
- Heavily contended locks
- Java 5 vs Java 6 performance
- Throughput on contended locks
- Uncontended performance
- Standard non-fair mechanisms
- Throughput of fair locks
- Round-robin by OS
- Barging
- Fair explicit locks in Java
- ReadWriteLock interface
- Understanding system to avoid starvation
- Release preference
- Downgrading
- Reader barging
- Upgrading
- Reentrancy
- Explicit condition objects
- Condition interface
- Timed conditions
- Benefits of explicit condition queues
- Basis for other synchronizers
- Exceptions on pre-condition fails
- Structure of blocking state-dependent actions
- Crude blocking by polling and sleeping
- Example using bounded queues
- Single-threaded vs multi-threaded
- With intrinsic locks
- Waking up too soon
- Conditional waits
- Condition queue
- Encapsulating condition queues
- State-dependence
- notify() vs notifyAll()
- Condition predicate
- Lock
- Waiting for a specific timeout
- InterruptedException
- Hardware support for concurrency
- Using "Unsafe" to access memory directly
- CAS support in the JVM
- Compare-and-Set
- Performance advantage of padding
- Nonblocking counter
- Simulation of CAS
- Managing conflicts with CAS
- Compare-and-Swap (CAS)
- Shared cache lines
- Optimistic locking
- Optimistic locking classes
- How do atomics work?
- Atomic array classes
- Performance comparisons: Locks vs atomics
- Cost of atomic spin loops
- Very fast when not too much contention
- Types of atomic classes
- Priority inversion
- Elimination of uncontended intrinsic locks
- Volatile vs locking performance
- Scalability problems with lock-based algorithms
- Atomic field updaters
- Doing speculative work
- AtomicStampedReference
- Nonblocking stack
- Definition of nonblocking and lock-free
- Highly scalable hash table
- The ABA problem
- Using sun.misc.Unsafe
- Dangers
- Reasons why we need it
- Fork -join decomposition
- Fork and Join
- ParallelArray
- Divide and conquer
- Hardware shapes programming idiom
- Exposing fine grained parallelism
- Anatomy of Fork and Join
- Limitations
- Work Stealing
- Amdahl's Law
- cache controller
- write
- Direct mapped
- read
- Address mapping in cache
- NUMA
- UMA
- Concurrent Stack
- Harsh Realities of parallelism
- Parallel Programming
- Sequential Consistency
- Linearizability
- Concurrency and Correctness
- Progress Conditions
- Quiescent Consistency
- Lazy Synchronization
- Lock free Synchronization
- Optimistic Synchronization
- Fine grained Synchronization
- Heap Based Unbounded Priority Queue
- Skiplist based Unbounded priority Queue
- Array Based bounded Priority Queue
- Tree based Bounded Priority Queue
- Coarse Grained Synchronization
- Lazy Synchronization
- Optimistic Synchronization
- Non Blocking Synchronization
- Fine Grained Synchronization
- Spinlocks
- Lock suitable for NUMA systems
- Unbounded lock-free Queue
- Bounded Partial Queue
- Unbounded Total Queue
- Concurrent Hashing
- Open Address Hashing
- Closed Address Hashing
- Lock Free Hashing
- CopyOnWriteArray(List/Set)
- For systems with more than 100 cpus/cores
- State based Reasoning
- all CAS spin loop bounded
- Constant Time key-value mapping
- faster than ConcurrentHashMap
- no locks even during resize
- Queue
- BlockingQueue
- Deque
- BlockingDeque
- WorkStealing using Deques
- GC unlinking
- Michael and Scott algorithm
- Tails and heads are allowed to lag
- Support for interior removals
- Relaxed writes
- Same as ConcurrentLinkedQueue except bidirectional pointers
- Internal removes handled differently
- Heuristics based spinning/blocking on number of processors
- Behavior differs based on method calls
- Usual ConcurrentLinkedQueue optimizations
- Normal and Dual Queue
- Lock free Skiplist
- Sequential Skiplist
- Lock based Concurrent Skiplist
- Indexes are allowed to race
- Iteration
- Problems with AtomicMarkableReference
- Probabilistic Data Structure
- Marking and nulling
- Different Way to mark