🤖
hacktricks
  • 👾Welcome!
    • HackTricks
    • HackTricks Values & FAQ
    • About the author
  • 🤩Generic Methodologies & Resources
    • Pentesting Methodology
    • External Recon Methodology
      • Wide Source Code Search
      • Github Dorks & Leaks
    • Pentesting Network
      • DHCPv6
      • EIGRP Attacks
      • GLBP & HSRP Attacks
      • IDS and IPS Evasion
      • Lateral VLAN Segmentation Bypass
      • Network Protocols Explained (ESP)
      • Nmap Summary (ESP)
      • Pentesting IPv6
      • WebRTC DoS
      • Spoofing LLMNR, NBT-NS, mDNS/DNS and WPAD and Relay Attacks
      • Spoofing SSDP and UPnP Devices with EvilSSDP
    • Pentesting Wifi
      • Evil Twin EAP-TLS
    • Phishing Methodology
      • Clone a Website
      • Detecting Phishing
      • Phishing Files & Documents
    • Basic Forensic Methodology
      • Baseline Monitoring
      • Anti-Forensic Techniques
      • Docker Forensics
      • Image Acquisition & Mount
      • Linux Forensics
      • Malware Analysis
      • Memory dump analysis
        • Volatility - CheatSheet
      • Partitions/File Systems/Carving
        • File/Data Carving & Recovery Tools
      • Pcap Inspection
        • DNSCat pcap analysis
        • Suricata & Iptables cheatsheet
        • USB Keystrokes
        • Wifi Pcap Analysis
        • Wireshark tricks
      • Specific Software/File-Type Tricks
        • Decompile compiled python binaries (exe, elf) - Retreive from .pyc
        • Browser Artifacts
        • Deofuscation vbs (cscript.exe)
        • Local Cloud Storage
        • Office file analysis
        • PDF File analysis
        • PNG tricks
        • Video and Audio file analysis
        • ZIPs tricks
      • Windows Artifacts
        • Interesting Windows Registry Keys
    • Brute Force - CheatSheet
    • Python Sandbox Escape & Pyscript
      • Bypass Python sandboxes
        • LOAD_NAME / LOAD_CONST opcode OOB Read
      • Class Pollution (Python's Prototype Pollution)
      • Python Internal Read Gadgets
      • Pyscript
      • venv
      • Web Requests
      • Bruteforce hash (few chars)
      • Basic Python
    • Exfiltration
    • Tunneling and Port Forwarding
    • Threat Modeling
    • Search Exploits
    • Reverse Shells (Linux, Windows, MSFVenom)
      • MSFVenom - CheatSheet
      • Reverse Shells - Windows
      • Reverse Shells - Linux
      • Full TTYs
  • 🐧Linux Hardening
    • Checklist - Linux Privilege Escalation
    • Linux Privilege Escalation
      • Arbitrary File Write to Root
      • Cisco - vmanage
      • Containerd (ctr) Privilege Escalation
      • D-Bus Enumeration & Command Injection Privilege Escalation
      • Docker Security
        • Abusing Docker Socket for Privilege Escalation
        • AppArmor
        • AuthZ& AuthN - Docker Access Authorization Plugin
        • CGroups
        • Docker --privileged
        • Docker Breakout / Privilege Escalation
          • release_agent exploit - Relative Paths to PIDs
          • Docker release_agent cgroups escape
          • Sensitive Mounts
        • Namespaces
          • CGroup Namespace
          • IPC Namespace
          • PID Namespace
          • Mount Namespace
          • Network Namespace
          • Time Namespace
          • User Namespace
          • UTS Namespace
        • Seccomp
        • Weaponizing Distroless
      • Escaping from Jails
      • euid, ruid, suid
      • Interesting Groups - Linux Privesc
        • lxd/lxc Group - Privilege escalation
      • Logstash
      • ld.so privesc exploit example
      • Linux Active Directory
      • Linux Capabilities
      • NFS no_root_squash/no_all_squash misconfiguration PE
      • Node inspector/CEF debug abuse
      • Payloads to execute
      • RunC Privilege Escalation
      • SELinux
      • Socket Command Injection
      • Splunk LPE and Persistence
      • SSH Forward Agent exploitation
      • Wildcards Spare tricks
    • Useful Linux Commands
    • Bypass Linux Restrictions
      • Bypass FS protections: read-only / no-exec / Distroless
        • DDexec / EverythingExec
    • Linux Environment Variables
    • Linux Post-Exploitation
      • PAM - Pluggable Authentication Modules
    • FreeIPA Pentesting
  • 🍏MacOS Hardening
    • macOS Security & Privilege Escalation
      • macOS Apps - Inspecting, debugging and Fuzzing
        • Objects in memory
        • Introduction to x64
        • Introduction to ARM64v8
      • macOS AppleFS
      • macOS Bypassing Firewalls
      • macOS Defensive Apps
      • macOS GCD - Grand Central Dispatch
      • macOS Kernel & System Extensions
        • macOS IOKit
        • macOS Kernel Extensions & Debugging
        • macOS Kernel Vulnerabilities
        • macOS System Extensions
      • macOS Network Services & Protocols
      • macOS File Extension & URL scheme app handlers
      • macOS Files, Folders, Binaries & Memory
        • macOS Bundles
        • macOS Installers Abuse
        • macOS Memory Dumping
        • macOS Sensitive Locations & Interesting Daemons
        • macOS Universal binaries & Mach-O Format
      • macOS Objective-C
      • macOS Privilege Escalation
      • macOS Process Abuse
        • macOS Dirty NIB
        • macOS Chromium Injection
        • macOS Electron Applications Injection
        • macOS Function Hooking
        • macOS IPC - Inter Process Communication
          • macOS MIG - Mach Interface Generator
          • macOS XPC
            • macOS XPC Authorization
            • macOS XPC Connecting Process Check
              • macOS PID Reuse
              • macOS xpc_connection_get_audit_token Attack
          • macOS Thread Injection via Task port
        • macOS Java Applications Injection
        • macOS Library Injection
          • macOS Dyld Hijacking & DYLD_INSERT_LIBRARIES
          • macOS Dyld Process
        • macOS Perl Applications Injection
        • macOS Python Applications Injection
        • macOS Ruby Applications Injection
        • macOS .Net Applications Injection
      • macOS Security Protections
        • macOS Gatekeeper / Quarantine / XProtect
        • macOS Launch/Environment Constraints & Trust Cache
        • macOS Sandbox
          • macOS Default Sandbox Debug
          • macOS Sandbox Debug & Bypass
            • macOS Office Sandbox Bypasses
        • macOS Authorizations DB & Authd
        • macOS SIP
        • macOS TCC
          • macOS Apple Events
          • macOS TCC Bypasses
            • macOS Apple Scripts
          • macOS TCC Payloads
        • macOS Dangerous Entitlements & TCC perms
        • macOS - AMFI - AppleMobileFileIntegrity
        • macOS MACF - Mandatory Access Control Framework
        • macOS Code Signing
        • macOS FS Tricks
          • macOS xattr-acls extra stuff
      • macOS Users & External Accounts
    • macOS Red Teaming
      • macOS MDM
        • Enrolling Devices in Other Organisations
        • macOS Serial Number
      • macOS Keychain
    • macOS Useful Commands
    • macOS Auto Start
  • 🪟Windows Hardening
    • Checklist - Local Windows Privilege Escalation
    • Windows Local Privilege Escalation
      • Abusing Tokens
      • Access Tokens
      • ACLs - DACLs/SACLs/ACEs
      • AppendData/AddSubdirectory permission over service registry
      • Create MSI with WIX
      • COM Hijacking
      • Dll Hijacking
        • Writable Sys Path +Dll Hijacking Privesc
      • DPAPI - Extracting Passwords
      • From High Integrity to SYSTEM with Name Pipes
      • Integrity Levels
      • JuicyPotato
      • Leaked Handle Exploitation
      • MSI Wrapper
      • Named Pipe Client Impersonation
      • Privilege Escalation with Autoruns
      • RoguePotato, PrintSpoofer, SharpEfsPotato, GodPotato
      • SeDebug + SeImpersonate copy token
      • SeImpersonate from High To System
      • Windows C Payloads
    • Active Directory Methodology
      • Abusing Active Directory ACLs/ACEs
        • Shadow Credentials
      • AD Certificates
        • AD CS Account Persistence
        • AD CS Domain Escalation
        • AD CS Domain Persistence
        • AD CS Certificate Theft
      • AD information in printers
      • AD DNS Records
      • ASREPRoast
      • BloodHound & Other AD Enum Tools
      • Constrained Delegation
      • Custom SSP
      • DCShadow
      • DCSync
      • Diamond Ticket
      • DSRM Credentials
      • External Forest Domain - OneWay (Inbound) or bidirectional
      • External Forest Domain - One-Way (Outbound)
      • Golden Ticket
      • Kerberoast
      • Kerberos Authentication
      • Kerberos Double Hop Problem
      • LAPS
      • MSSQL AD Abuse
      • Over Pass the Hash/Pass the Key
      • Pass the Ticket
      • Password Spraying / Brute Force
      • PrintNightmare
      • Force NTLM Privileged Authentication
      • Privileged Groups
      • RDP Sessions Abuse
      • Resource-based Constrained Delegation
      • Security Descriptors
      • SID-History Injection
      • Silver Ticket
      • Skeleton Key
      • Unconstrained Delegation
    • Windows Security Controls
      • UAC - User Account Control
    • NTLM
      • Places to steal NTLM creds
    • Lateral Movement
      • AtExec / SchtasksExec
      • DCOM Exec
      • PsExec/Winexec/ScExec
      • SmbExec/ScExec
      • WinRM
      • WmiExec
    • Pivoting to the Cloud
    • Stealing Windows Credentials
      • Windows Credentials Protections
      • Mimikatz
      • WTS Impersonator
    • Basic Win CMD for Pentesters
    • Basic PowerShell for Pentesters
      • PowerView/SharpView
    • Antivirus (AV) Bypass
  • 📱Mobile Pentesting
    • Android APK Checklist
    • Android Applications Pentesting
      • Android Applications Basics
      • Android Task Hijacking
      • ADB Commands
      • APK decompilers
      • AVD - Android Virtual Device
      • Bypass Biometric Authentication (Android)
      • content:// protocol
      • Drozer Tutorial
        • Exploiting Content Providers
      • Exploiting a debuggeable application
      • Frida Tutorial
        • Frida Tutorial 1
        • Frida Tutorial 2
        • Frida Tutorial 3
        • Objection Tutorial
      • Google CTF 2018 - Shall We Play a Game?
      • Install Burp Certificate
      • Intent Injection
      • Make APK Accept CA Certificate
      • Manual DeObfuscation
      • React Native Application
      • Reversing Native Libraries
      • Smali - Decompiling/[Modifying]/Compiling
      • Spoofing your location in Play Store
      • Tapjacking
      • Webview Attacks
    • iOS Pentesting Checklist
    • iOS Pentesting
      • iOS App Extensions
      • iOS Basics
      • iOS Basic Testing Operations
      • iOS Burp Suite Configuration
      • iOS Custom URI Handlers / Deeplinks / Custom Schemes
      • iOS Extracting Entitlements From Compiled Application
      • iOS Frida Configuration
      • iOS Hooking With Objection
      • iOS Protocol Handlers
      • iOS Serialisation and Encoding
      • iOS Testing Environment
      • iOS UIActivity Sharing
      • iOS Universal Links
      • iOS UIPasteboard
      • iOS WebViews
    • Cordova Apps
    • Xamarin Apps
  • 👽Network Services Pentesting
    • Pentesting JDWP - Java Debug Wire Protocol
    • Pentesting Printers
    • Pentesting SAP
    • Pentesting VoIP
      • Basic VoIP Protocols
        • SIP (Session Initiation Protocol)
    • Pentesting Remote GdbServer
    • 7/tcp/udp - Pentesting Echo
    • 21 - Pentesting FTP
      • FTP Bounce attack - Scan
      • FTP Bounce - Download 2ºFTP file
    • 22 - Pentesting SSH/SFTP
    • 23 - Pentesting Telnet
    • 25,465,587 - Pentesting SMTP/s
      • SMTP Smuggling
      • SMTP - Commands
    • 43 - Pentesting WHOIS
    • 49 - Pentesting TACACS+
    • 53 - Pentesting DNS
    • 69/UDP TFTP/Bittorrent-tracker
    • 79 - Pentesting Finger
    • 80,443 - Pentesting Web Methodology
      • 403 & 401 Bypasses
      • AEM - Adobe Experience Cloud
      • Angular
      • Apache
      • Artifactory Hacking guide
      • Bolt CMS
      • Buckets
        • Firebase Database
      • CGI
      • DotNetNuke (DNN)
      • Drupal
        • Drupal RCE
      • Electron Desktop Apps
        • Electron contextIsolation RCE via preload code
        • Electron contextIsolation RCE via Electron internal code
        • Electron contextIsolation RCE via IPC
      • Flask
      • NodeJS Express
      • Git
      • Golang
      • GWT - Google Web Toolkit
      • Grafana
      • GraphQL
      • H2 - Java SQL database
      • IIS - Internet Information Services
      • ImageMagick Security
      • JBOSS
      • Jira & Confluence
      • Joomla
      • JSP
      • Laravel
      • Moodle
      • Nginx
      • NextJS
      • PHP Tricks
        • PHP - Useful Functions & disable_functions/open_basedir bypass
          • disable_functions bypass - php-fpm/FastCGI
          • disable_functions bypass - dl function
          • disable_functions bypass - PHP 7.0-7.4 (*nix only)
          • disable_functions bypass - Imagick <= 3.3.0 PHP >= 5.4 Exploit
          • disable_functions - PHP 5.x Shellshock Exploit
          • disable_functions - PHP 5.2.4 ionCube extension Exploit
          • disable_functions bypass - PHP <= 5.2.9 on windows
          • disable_functions bypass - PHP 5.2.4 and 5.2.5 PHP cURL
          • disable_functions bypass - PHP safe_mode bypass via proc_open() and custom environment Exploit
          • disable_functions bypass - PHP Perl Extension Safe_mode Bypass Exploit
          • disable_functions bypass - PHP 5.2.3 - Win32std ext Protections Bypass
          • disable_functions bypass - PHP 5.2 - FOpen Exploit
          • disable_functions bypass - via mem
          • disable_functions bypass - mod_cgi
          • disable_functions bypass - PHP 4 >= 4.2.0, PHP 5 pcntl_exec
        • PHP - RCE abusing object creation: new $_GET["a"]($_GET["b"])
        • PHP SSRF
      • PrestaShop
      • Python
      • Rocket Chat
      • Special HTTP headers
      • Source code Review / SAST Tools
      • Spring Actuators
      • Symfony
      • Tomcat
        • Basic Tomcat Info
      • Uncovering CloudFlare
      • VMWare (ESX, VCenter...)
      • Web API Pentesting
      • WebDav
      • Werkzeug / Flask Debug
      • Wordpress
    • 88tcp/udp - Pentesting Kerberos
      • Harvesting tickets from Windows
      • Harvesting tickets from Linux
    • 110,995 - Pentesting POP
    • 111/TCP/UDP - Pentesting Portmapper
    • 113 - Pentesting Ident
    • 123/udp - Pentesting NTP
    • 135, 593 - Pentesting MSRPC
    • 137,138,139 - Pentesting NetBios
    • 139,445 - Pentesting SMB
      • rpcclient enumeration
    • 143,993 - Pentesting IMAP
    • 161,162,10161,10162/udp - Pentesting SNMP
      • Cisco SNMP
      • SNMP RCE
    • 194,6667,6660-7000 - Pentesting IRC
    • 264 - Pentesting Check Point FireWall-1
    • 389, 636, 3268, 3269 - Pentesting LDAP
    • 500/udp - Pentesting IPsec/IKE VPN
    • 502 - Pentesting Modbus
    • 512 - Pentesting Rexec
    • 513 - Pentesting Rlogin
    • 514 - Pentesting Rsh
    • 515 - Pentesting Line Printer Daemon (LPD)
    • 548 - Pentesting Apple Filing Protocol (AFP)
    • 554,8554 - Pentesting RTSP
    • 623/UDP/TCP - IPMI
    • 631 - Internet Printing Protocol(IPP)
    • 700 - Pentesting EPP
    • 873 - Pentesting Rsync
    • 1026 - Pentesting Rusersd
    • 1080 - Pentesting Socks
    • 1098/1099/1050 - Pentesting Java RMI - RMI-IIOP
    • 1414 - Pentesting IBM MQ
    • 1433 - Pentesting MSSQL - Microsoft SQL Server
      • Types of MSSQL Users
    • 1521,1522-1529 - Pentesting Oracle TNS Listener
    • 1723 - Pentesting PPTP
    • 1883 - Pentesting MQTT (Mosquitto)
    • 2049 - Pentesting NFS Service
    • 2301,2381 - Pentesting Compaq/HP Insight Manager
    • 2375, 2376 Pentesting Docker
    • 3128 - Pentesting Squid
    • 3260 - Pentesting ISCSI
    • 3299 - Pentesting SAPRouter
    • 3306 - Pentesting Mysql
    • 3389 - Pentesting RDP
    • 3632 - Pentesting distcc
    • 3690 - Pentesting Subversion (svn server)
    • 3702/UDP - Pentesting WS-Discovery
    • 4369 - Pentesting Erlang Port Mapper Daemon (epmd)
    • 4786 - Cisco Smart Install
    • 4840 - OPC Unified Architecture
    • 5000 - Pentesting Docker Registry
    • 5353/UDP Multicast DNS (mDNS) and DNS-SD
    • 5432,5433 - Pentesting Postgresql
    • 5439 - Pentesting Redshift
    • 5555 - Android Debug Bridge
    • 5601 - Pentesting Kibana
    • 5671,5672 - Pentesting AMQP
    • 5800,5801,5900,5901 - Pentesting VNC
    • 5984,6984 - Pentesting CouchDB
    • 5985,5986 - Pentesting WinRM
    • 5985,5986 - Pentesting OMI
    • 6000 - Pentesting X11
    • 6379 - Pentesting Redis
    • 8009 - Pentesting Apache JServ Protocol (AJP)
    • 8086 - Pentesting InfluxDB
    • 8089 - Pentesting Splunkd
    • 8333,18333,38333,18444 - Pentesting Bitcoin
    • 9000 - Pentesting FastCGI
    • 9001 - Pentesting HSQLDB
    • 9042/9160 - Pentesting Cassandra
    • 9100 - Pentesting Raw Printing (JetDirect, AppSocket, PDL-datastream)
    • 9200 - Pentesting Elasticsearch
    • 10000 - Pentesting Network Data Management Protocol (ndmp)
    • 11211 - Pentesting Memcache
      • Memcache Commands
    • 15672 - Pentesting RabbitMQ Management
    • 24007,24008,24009,49152 - Pentesting GlusterFS
    • 27017,27018 - Pentesting MongoDB
    • 44134 - Pentesting Tiller (Helm)
    • 44818/UDP/TCP - Pentesting EthernetIP
    • 47808/udp - Pentesting BACNet
    • 50030,50060,50070,50075,50090 - Pentesting Hadoop
  • 🕸️Pentesting Web
    • Web Vulnerabilities Methodology
    • Reflecting Techniques - PoCs and Polygloths CheatSheet
      • Web Vulns List
    • 2FA/MFA/OTP Bypass
    • Account Takeover
    • Browser Extension Pentesting Methodology
      • BrowExt - ClickJacking
      • BrowExt - permissions & host_permissions
      • BrowExt - XSS Example
    • Bypass Payment Process
    • Captcha Bypass
    • Cache Poisoning and Cache Deception
      • Cache Poisoning via URL discrepancies
      • Cache Poisoning to DoS
    • Clickjacking
    • Client Side Template Injection (CSTI)
    • Client Side Path Traversal
    • Command Injection
    • Content Security Policy (CSP) Bypass
      • CSP bypass: self + 'unsafe-inline' with Iframes
    • Cookies Hacking
      • Cookie Tossing
      • Cookie Jar Overflow
      • Cookie Bomb
    • CORS - Misconfigurations & Bypass
    • CRLF (%0D%0A) Injection
    • CSRF (Cross Site Request Forgery)
    • Dangling Markup - HTML scriptless injection
      • SS-Leaks
    • Dependency Confusion
    • Deserialization
      • NodeJS - __proto__ & prototype Pollution
        • Client Side Prototype Pollution
        • Express Prototype Pollution Gadgets
        • Prototype Pollution to RCE
      • Java JSF ViewState (.faces) Deserialization
      • Java DNS Deserialization, GadgetProbe and Java Deserialization Scanner
      • Basic Java Deserialization (ObjectInputStream, readObject)
      • PHP - Deserialization + Autoload Classes
      • CommonsCollection1 Payload - Java Transformers to Rutime exec() and Thread Sleep
      • Basic .Net deserialization (ObjectDataProvider gadget, ExpandedWrapper, and Json.Net)
      • Exploiting __VIEWSTATE knowing the secrets
      • Exploiting __VIEWSTATE without knowing the secrets
      • Python Yaml Deserialization
      • JNDI - Java Naming and Directory Interface & Log4Shell
      • Ruby Class Pollution
    • Domain/Subdomain takeover
    • Email Injections
    • File Inclusion/Path traversal
      • phar:// deserialization
      • LFI2RCE via PHP Filters
      • LFI2RCE via Nginx temp files
      • LFI2RCE via PHP_SESSION_UPLOAD_PROGRESS
      • LFI2RCE via Segmentation Fault
      • LFI2RCE via phpinfo()
      • LFI2RCE Via temp file uploads
      • LFI2RCE via Eternal waiting
      • LFI2RCE Via compress.zlib + PHP_STREAM_PREFER_STUDIO + Path Disclosure
    • File Upload
      • PDF Upload - XXE and CORS bypass
    • Formula/CSV/Doc/LaTeX/GhostScript Injection
    • gRPC-Web Pentest
    • HTTP Connection Contamination
    • HTTP Connection Request Smuggling
    • HTTP Request Smuggling / HTTP Desync Attack
      • Browser HTTP Request Smuggling
      • Request Smuggling in HTTP/2 Downgrades
    • HTTP Response Smuggling / Desync
    • Upgrade Header Smuggling
    • hop-by-hop headers
    • IDOR
    • JWT Vulnerabilities (Json Web Tokens)
    • LDAP Injection
    • Login Bypass
      • Login bypass List
    • NoSQL injection
    • OAuth to Account takeover
    • Open Redirect
    • ORM Injection
    • Parameter Pollution
    • Phone Number Injections
    • PostMessage Vulnerabilities
      • Blocking main page to steal postmessage
      • Bypassing SOP with Iframes - 1
      • Bypassing SOP with Iframes - 2
      • Steal postmessage modifying iframe location
    • Proxy / WAF Protections Bypass
    • Race Condition
    • Rate Limit Bypass
    • Registration & Takeover Vulnerabilities
    • Regular expression Denial of Service - ReDoS
    • Reset/Forgotten Password Bypass
    • Reverse Tab Nabbing
    • SAML Attacks
      • SAML Basics
    • Server Side Inclusion/Edge Side Inclusion Injection
    • SQL Injection
      • MS Access SQL Injection
      • MSSQL Injection
      • MySQL injection
        • MySQL File priv to SSRF/RCE
      • Oracle injection
      • Cypher Injection (neo4j)
      • PostgreSQL injection
        • dblink/lo_import data exfiltration
        • PL/pgSQL Password Bruteforce
        • Network - Privesc, Port Scanner and NTLM chanllenge response disclosure
        • Big Binary Files Upload (PostgreSQL)
        • RCE with PostgreSQL Languages
        • RCE with PostgreSQL Extensions
      • SQLMap - CheatSheet
        • Second Order Injection - SQLMap
    • SSRF (Server Side Request Forgery)
      • URL Format Bypass
      • SSRF Vulnerable Platforms
      • Cloud SSRF
    • SSTI (Server Side Template Injection)
      • EL - Expression Language
      • Jinja2 SSTI
    • Timing Attacks
    • Unicode Injection
      • Unicode Normalization
    • UUID Insecurities
    • WebSocket Attacks
    • Web Tool - WFuzz
    • XPATH injection
    • XSLT Server Side Injection (Extensible Stylesheet Language Transformations)
    • XXE - XEE - XML External Entity
    • XSS (Cross Site Scripting)
      • Abusing Service Workers
      • Chrome Cache to XSS
      • Debugging Client Side JS
      • Dom Clobbering
      • DOM Invader
      • DOM XSS
      • Iframes in XSS, CSP and SOP
      • Integer Overflow
      • JS Hoisting
      • Misc JS Tricks & Relevant Info
      • PDF Injection
      • Server Side XSS (Dynamic PDF)
      • Shadow DOM
      • SOME - Same Origin Method Execution
      • Sniff Leak
      • Steal Info JS
      • XSS in Markdown
    • XSSI (Cross-Site Script Inclusion)
    • XS-Search/XS-Leaks
      • Connection Pool Examples
      • Connection Pool by Destination Example
      • Cookie Bomb + Onerror XS Leak
      • URL Max Length - Client Side
      • performance.now example
      • performance.now + Force heavy task
      • Event Loop Blocking + Lazy images
      • JavaScript Execution XS Leak
      • CSS Injection
        • CSS Injection Code
    • Iframe Traps
  • ⛈️Cloud Security
    • Pentesting Kubernetes
    • Pentesting Cloud (AWS, GCP, Az...)
    • Pentesting CI/CD (Github, Jenkins, Terraform...)
  • 😎Hardware/Physical Access
    • Physical Attacks
    • Escaping from KIOSKs
    • Firmware Analysis
      • Bootloader testing
      • Firmware Integrity
  • 🎯Binary Exploitation
    • Basic Stack Binary Exploitation Methodology
      • ELF Basic Information
      • Exploiting Tools
        • PwnTools
    • Stack Overflow
      • Pointer Redirecting
      • Ret2win
        • Ret2win - arm64
      • Stack Shellcode
        • Stack Shellcode - arm64
      • Stack Pivoting - EBP2Ret - EBP chaining
      • Uninitialized Variables
    • ROP - Return Oriented Programing
      • BROP - Blind Return Oriented Programming
      • Ret2csu
      • Ret2dlresolve
      • Ret2esp / Ret2reg
      • Ret2lib
        • Leaking libc address with ROP
          • Leaking libc - template
        • One Gadget
        • Ret2lib + Printf leak - arm64
      • Ret2syscall
        • Ret2syscall - ARM64
      • Ret2vDSO
      • SROP - Sigreturn-Oriented Programming
        • SROP - ARM64
    • Array Indexing
    • Integer Overflow
    • Format Strings
      • Format Strings - Arbitrary Read Example
      • Format Strings Template
    • Libc Heap
      • Bins & Memory Allocations
      • Heap Memory Functions
        • free
        • malloc & sysmalloc
        • unlink
        • Heap Functions Security Checks
      • Use After Free
        • First Fit
      • Double Free
      • Overwriting a freed chunk
      • Heap Overflow
      • Unlink Attack
      • Fast Bin Attack
      • Unsorted Bin Attack
      • Large Bin Attack
      • Tcache Bin Attack
      • Off by one overflow
      • House of Spirit
      • House of Lore | Small bin Attack
      • House of Einherjar
      • House of Force
      • House of Orange
      • House of Rabbit
      • House of Roman
    • Common Binary Exploitation Protections & Bypasses
      • ASLR
        • Ret2plt
        • Ret2ret & Reo2pop
      • CET & Shadow Stack
      • Libc Protections
      • Memory Tagging Extension (MTE)
      • No-exec / NX
      • PIE
        • BF Addresses in the Stack
      • Relro
      • Stack Canaries
        • BF Forked & Threaded Stack Canaries
        • Print Stack Canary
    • Write What Where 2 Exec
      • WWW2Exec - atexit()
      • WWW2Exec - .dtors & .fini_array
      • WWW2Exec - GOT/PLT
      • WWW2Exec - __malloc_hook & __free_hook
    • Common Exploiting Problems
    • Windows Exploiting (Basic Guide - OSCP lvl)
    • iOS Exploiting
  • 🔩Reversing
    • Reversing Tools & Basic Methods
      • Angr
        • Angr - Examples
      • Z3 - Satisfiability Modulo Theories (SMT)
      • Cheat Engine
      • Blobrunner
    • Common API used in Malware
    • Word Macros
  • 🔮Crypto & Stego
    • Cryptographic/Compression Algorithms
      • Unpacking binaries
    • Certificates
    • Cipher Block Chaining CBC-MAC
    • Crypto CTFs Tricks
    • Electronic Code Book (ECB)
    • Hash Length Extension Attack
    • Padding Oracle
    • RC4 - Encrypt&Decrypt
    • Stego Tricks
    • Esoteric languages
    • Blockchain & Crypto Currencies
  • 🦂C2
    • Salseo
    • ICMPsh
    • Cobalt Strike
  • ✍️TODO
    • Other Big References
    • Rust Basics
    • More Tools
    • MISC
    • Pentesting DNS
    • Hardware Hacking
      • I2C
      • UART
      • Radio
      • JTAG
      • SPI
    • Industrial Control Systems Hacking
      • Modbus Protocol
    • Radio Hacking
      • Pentesting RFID
      • Infrared
      • Sub-GHz RF
      • iButton
      • Flipper Zero
        • FZ - NFC
        • FZ - Sub-GHz
        • FZ - Infrared
        • FZ - iButton
        • FZ - 125kHz RFID
      • Proxmark 3
      • FISSURE - The RF Framework
      • Low-Power Wide Area Network
      • Pentesting BLE - Bluetooth Low Energy
    • Industrial Control Systems Hacking
    • Test LLMs
    • LLM Training
      • 0. Basic LLM Concepts
      • 1. Tokenizing
      • 2. Data Sampling
      • 3. Token Embeddings
      • 4. Attention Mechanisms
      • 5. LLM Architecture
      • 6. Pre-training & Loading models
      • 7.0. LoRA Improvements in fine-tuning
      • 7.1. Fine-Tuning for Classification
      • 7.2. Fine-Tuning to follow instructions
    • Burp Suite
    • Other Web Tricks
    • Interesting HTTP
    • Android Forensics
    • TR-069
    • 6881/udp - Pentesting BitTorrent
    • Online Platforms with API
    • Stealing Sensitive Information Disclosure from a Web
    • Post Exploitation
    • Investment Terms
    • Cookies Policy
Powered by GitBook
On this page
  • Attention Mechanisms and Self-Attention in Neural Networks
  • Understanding Attention Mechanisms
  • Introduction to Self-Attention
  • Calculating Attention Weights: A Step-by-Step Example
  • Summary of the Process
  • Self-Attention with Trainable Weights
  • Code Example
  • Causal Attention: Hiding Future Words
  • Applying a Causal Attention Mask
  • Masking Additional Attention Weights with Dropout
  • Code Example
  • Extending Single-Head Attention to Multi-Head Attention
  • Code Example
  • References
Edit on GitHub
  1. TODO
  2. LLM Training

4. Attention Mechanisms

Attention Mechanisms and Self-Attention in Neural Networks

Attention mechanisms allow neural networks to focus on specific parts of the input when generating each part of the output. They assign different weights to different inputs, helping the model decide which inputs are most relevant to the task at hand. This is crucial in tasks like machine translation, where understanding the context of the entire sentence is necessary for accurate translation.

The goal of this fourth phase is very simple: Apply some attetion mechanisms. These are going to be a lot of repeated layers that are going to capture the relation of a word in the vocabulary with its neighbours in the current sentence being used to train the LLM. A lot of layers are used for this, so a lot of trainable parameters are going to be capturing this information.

Understanding Attention Mechanisms

In traditional sequence-to-sequence models used for language translation, the model encodes an input sequence into a fixed-size context vector. However, this approach struggles with long sentences because the fixed-size context vector may not capture all necessary information. Attention mechanisms address this limitation by allowing the model to consider all input tokens when generating each output token.

Example: Machine Translation

Consider translating the German sentence "Kannst du mir helfen diesen Satz zu übersetzen" into English. A word-by-word translation would not produce a grammatically correct English sentence due to differences in grammatical structures between languages. An attention mechanism enables the model to focus on relevant parts of the input sentence when generating each word of the output sentence, leading to a more accurate and coherent translation.

Introduction to Self-Attention

Self-attention, or intra-attention, is a mechanism where attention is applied within a single sequence to compute a representation of that sequence. It allows each token in the sequence to attend to all other tokens, helping the model capture dependencies between tokens regardless of their distance in the sequence.

Key Concepts

  • Tokens: Individual elements of the input sequence (e.g., words in a sentence).

  • Embeddings: Vector representations of tokens, capturing semantic information.

  • Attention Weights: Values that determine the importance of each token relative to others.

Calculating Attention Weights: A Step-by-Step Example

Let's consider the sentence "Hello shiny sun!" and represent each word with a 3-dimensional embedding:

  • Hello: [0.34, 0.22, 0.54]

  • shiny: [0.53, 0.34, 0.98]

  • sun: [0.29, 0.54, 0.93]

Our goal is to compute the context vector for the word "shiny" using self-attention.

Step 1: Compute Attention Scores

Just multiply each dimension value of the query with the relevant one of each token and add the results. You get 1 value per pair of tokens.

For each word in the sentence, compute the attention score with respect to "shiny" by calculating the dot product of their embeddings.

Attention Score between "Hello" and "shiny"

Attention Score between "shiny" and "shiny"

Attention Score between "sun" and "shiny"

Step 2: Normalize Attention Scores to Obtain Attention Weights

Don't get lost in the mathematical terms, the goal of this function is simple, normalize all the weights so they sum 1 in total.

Moreover, softmax function is used because it accentuates differences due to the exponential part, making easier to detect useful values.

Apply the softmax function to the attention scores to convert them into attention weights that sum to 1.

Calculating the exponentials:

Calculating the sum:

Calculating attention weights:

Step 3: Compute the Context Vector

Just get each attention weight and multiply it to the related token dimensions and then sum all the dimensions to get just 1 vector (the context vector)

The context vector is computed as the weighted sum of the embeddings of all words, using the attention weights.

Calculating each component:

  • Weighted Embedding of "Hello":

  • Weighted Embedding of "shiny":

  • Weighted Embedding of "sun":

Summing the weighted embeddings:

context vector=[0.0779+0.2156+0.1057, 0.0504+0.1382+0.1972, 0.1237+0.3983+0.3390]=[0.3992,0.3858,0.8610]

This context vector represents the enriched embedding for the word "shiny," incorporating information from all words in the sentence.

Summary of the Process

  1. Compute Attention Scores: Use the dot product between the embedding of the target word and the embeddings of all words in the sequence.

  2. Normalize Scores to Get Attention Weights: Apply the softmax function to the attention scores to obtain weights that sum to 1.

  3. Compute Context Vector: Multiply each word's embedding by its attention weight and sum the results.

Self-Attention with Trainable Weights

In practice, self-attention mechanisms use trainable weights to learn the best representations for queries, keys, and values. This involves introducing three weight matrices:

The query is the data to use like before, while the keys and values matrices are just random-trainable matrices.

Step 1: Compute Queries, Keys, and Values

Each token will have its own query, key and value matrix by multiplying its dimension values by the defined matrices:

These matrices transform the original embeddings into a new space suitable for computing attention.

Example

Assuming:

  • Input dimension din=3 (embedding size)

  • Output dimension dout=2 (desired dimension for queries, keys, and values)

Initialize the weight matrices:

import torch.nn as nn

d_in = 3
d_out = 2

W_query = nn.Parameter(torch.rand(d_in, d_out))
W_key = nn.Parameter(torch.rand(d_in, d_out))
W_value = nn.Parameter(torch.rand(d_in, d_out))

Compute queries, keys, and values:

queries = torch.matmul(inputs, W_query)
keys = torch.matmul(inputs, W_key)
values = torch.matmul(inputs, W_value)

Step 2: Compute Scaled Dot-Product Attention

Compute Attention Scores

Similar to the example from before, but this time, instead of using the values of the dimensions of the tokens, we use the key matrix of the token (calculated already using the dimensions):. So, for each query qi​ and key kj​:

Scale the Scores

To prevent the dot products from becoming too large, scale them by the square root of the key dimension dk​:

The score is divided by the square root of the dimensions because dot products might become very large and this helps to regulate them.

Apply Softmax to Obtain Attention Weights: Like in the initial example, normalize all the values so they sum 1.

Step 3: Compute Context Vectors

Like in the initial example, just sum all the values matrices multiplying each one by its attention weight:

Code Example

import torch

inputs = torch.tensor(
  [[0.43, 0.15, 0.89], # Your     (x^1)
   [0.55, 0.87, 0.66], # journey  (x^2)
   [0.57, 0.85, 0.64], # starts   (x^3)
   [0.22, 0.58, 0.33], # with     (x^4)
   [0.77, 0.25, 0.10], # one      (x^5)
   [0.05, 0.80, 0.55]] # step     (x^6)
)

import torch.nn as nn
class SelfAttention_v2(nn.Module):

    def __init__(self, d_in, d_out, qkv_bias=False):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)

    def forward(self, x):
        keys = self.W_key(x)
        queries = self.W_query(x)
        values = self.W_value(x)
        
        attn_scores = queries @ keys.T
        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)

        context_vec = attn_weights @ values
        return context_vec

d_in=3
d_out=2
torch.manual_seed(789)
sa_v2 = SelfAttention_v2(d_in, d_out)
print(sa_v2(inputs))

Note that instead of initializing the matrices with random values, nn.Linear is used to mark all the wights as parameters to train.

Causal Attention: Hiding Future Words

For LLMs we want the model to consider only the tokens that appear before the current position in order to predict the next token. Causal attention, also known as masked attention, achieves this by modifying the attention mechanism to prevent access to future tokens.

Applying a Causal Attention Mask

To implement causal attention, we apply a mask to the attention scores before the softmax operation so the reminding ones will still sum 1. This mask sets the attention scores of future tokens to negative infinity, ensuring that after the softmax, their attention weights are zero.

Steps

  1. Compute Attention Scores: Same as before.

  2. Apply Mask: Use an upper triangular matrix filled with negative infinity above the diagonal.

    mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1) * float('-inf')
    masked_scores = attention_scores + mask
  3. Apply Softmax: Compute attention weights using the masked scores.

    attention_weights = torch.softmax(masked_scores, dim=-1)

Masking Additional Attention Weights with Dropout

To prevent overfitting, we can apply dropout to the attention weights after the softmax operation. Dropout randomly zeroes some of the attention weights during training.

dropout = nn.Dropout(p=0.5)
attention_weights = dropout(attention_weights)

A regular dropout is about 10-20%.

Code Example

import torch
import torch.nn as nn

inputs = torch.tensor(
  [[0.43, 0.15, 0.89], # Your     (x^1)
   [0.55, 0.87, 0.66], # journey  (x^2)
   [0.57, 0.85, 0.64], # starts   (x^3)
   [0.22, 0.58, 0.33], # with     (x^4)
   [0.77, 0.25, 0.10], # one      (x^5)
   [0.05, 0.80, 0.55]] # step     (x^6)
)

batch = torch.stack((inputs, inputs), dim=0)
print(batch.shape)

class CausalAttention(nn.Module):

    def __init__(self, d_in, d_out, context_length,
                 dropout, qkv_bias=False):
        super().__init__()
        self.d_out = d_out
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer('mask', torch.triu(torch.ones(context_length, context_length), diagonal=1)) # New

    def forward(self, x):
        b, num_tokens, d_in = x.shape
        # b is the num of batches
        # num_tokens is the number of tokens per batch
        # d_in is the dimensions er token
        
        keys = self.W_key(x) # This generates the keys of the tokens
        queries = self.W_query(x)
        values = self.W_value(x)

        attn_scores = queries @ keys.transpose(1, 2) # Moves the third dimension to the second one and the second one to the third one to be able to multiply
        attn_scores.masked_fill_(  # New, _ ops are in-place
            self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)  # `:num_tokens` to account for cases where the number of tokens in the batch is smaller than the supported context_size
        attn_weights = torch.softmax(
            attn_scores / keys.shape[-1]**0.5, dim=-1
        )
        attn_weights = self.dropout(attn_weights)

        context_vec = attn_weights @ values
        return context_vec

torch.manual_seed(123)

context_length = batch.shape[1]
d_in = 3
d_out = 2
ca = CausalAttention(d_in, d_out, context_length, 0.0)

context_vecs = ca(batch)

print(context_vecs)
print("context_vecs.shape:", context_vecs.shape)

Extending Single-Head Attention to Multi-Head Attention

Multi-head attention in practical terms consist on executing multiple instances of the self-attention function each of them with their own weights so different final vectors are calculated.

Code Example

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert (d_out % num_heads == 0), \
            "d_out must be divisible by num_heads"

        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads # Reduce the projection dim to match desired output dim

        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)  # Linear layer to combine head outputs
        self.dropout = nn.Dropout(dropout)
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(context_length, context_length),
                       diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, d_in = x.shape
        # b is the num of batches
        # num_tokens is the number of tokens per batch
        # d_in is the dimensions er token

        keys = self.W_key(x) # Shape: (b, num_tokens, d_out)
        queries = self.W_query(x)
        values = self.W_value(x)

        # We implicitly split the matrix by adding a `num_heads` dimension
        # Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim)
        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim) 
        values = values.view(b, num_tokens, self.num_heads, self.head_dim)
        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)

        # Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
        keys = keys.transpose(1, 2)
        queries = queries.transpose(1, 2)
        values = values.transpose(1, 2)

        # Compute scaled dot-product attention (aka self-attention) with a causal mask
        attn_scores = queries @ keys.transpose(2, 3)  # Dot product for each head

        # Original mask truncated to the number of tokens and converted to boolean
        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]

        # Use the mask to fill attention scores
        attn_scores.masked_fill_(mask_bool, -torch.inf)
        
        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        # Shape: (b, num_tokens, num_heads, head_dim)
        context_vec = (attn_weights @ values).transpose(1, 2) 
        
        # Combine heads, where self.d_out = self.num_heads * self.head_dim
        context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out)
        context_vec = self.out_proj(context_vec) # optional projection

        return context_vec

torch.manual_seed(123)

batch_size, context_length, d_in = batch.shape
d_out = 2
mha = MultiHeadAttention(d_in, d_out, context_length, 0.0, num_heads=2)

context_vecs = mha(batch)

print(context_vecs)
print("context_vecs.shape:", context_vecs.shape)

Short answer of ChatGPT about why it's better to divide dimensions of tokens among the heads instead of having each head check all the dimensions of all the tokens:

While allowing each head to process all embedding dimensions might seem advantageous because each head would have access to the full information, the standard practice is to divide the embedding dimensions among the heads. This approach balances computational efficiency with model performance and encourages each head to learn diverse representations. Therefore, splitting the embedding dimensions is generally preferred over having each head check all dimensions.

References

Previous3. Token EmbeddingsNext5. LLM Architecture

Last updated 8 months ago

Grabbing an example from you can check this class that implements the self-attendant functionality we talked about:

Code example from :

It could be possible to reuse the previous code and just add a wrapper that launches it several time, but this is a more optimised version from that processes all the heads at the same time (reducing the number of expensive for loops). As you can see in the code, the dimensions of each token is divided in different dimensions according to the number of heads. This way if token have 8 dimensions and we want to use 3 heads, the dimensions will be divided in 2 arrays of 4 dimensions and each head will use one of them:

For another compact and efficient implementation you could use the class in PyTorch.

✍️
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb
torch.nn.MultiheadAttention
https://www.manning.com/books/build-a-large-language-model-from-scratch