🤖
hacktricks
  • 👾Welcome!
    • HackTricks
    • HackTricks Values & FAQ
    • About the author
  • 🤩Generic Methodologies & Resources
    • Pentesting Methodology
    • External Recon Methodology
      • Wide Source Code Search
      • Github Dorks & Leaks
    • Pentesting Network
      • DHCPv6
      • EIGRP Attacks
      • GLBP & HSRP Attacks
      • IDS and IPS Evasion
      • Lateral VLAN Segmentation Bypass
      • Network Protocols Explained (ESP)
      • Nmap Summary (ESP)
      • Pentesting IPv6
      • WebRTC DoS
      • Spoofing LLMNR, NBT-NS, mDNS/DNS and WPAD and Relay Attacks
      • Spoofing SSDP and UPnP Devices with EvilSSDP
    • Pentesting Wifi
      • Evil Twin EAP-TLS
    • Phishing Methodology
      • Clone a Website
      • Detecting Phishing
      • Phishing Files & Documents
    • Basic Forensic Methodology
      • Baseline Monitoring
      • Anti-Forensic Techniques
      • Docker Forensics
      • Image Acquisition & Mount
      • Linux Forensics
      • Malware Analysis
      • Memory dump analysis
        • Volatility - CheatSheet
      • Partitions/File Systems/Carving
        • File/Data Carving & Recovery Tools
      • Pcap Inspection
        • DNSCat pcap analysis
        • Suricata & Iptables cheatsheet
        • USB Keystrokes
        • Wifi Pcap Analysis
        • Wireshark tricks
      • Specific Software/File-Type Tricks
        • Decompile compiled python binaries (exe, elf) - Retreive from .pyc
        • Browser Artifacts
        • Deofuscation vbs (cscript.exe)
        • Local Cloud Storage
        • Office file analysis
        • PDF File analysis
        • PNG tricks
        • Video and Audio file analysis
        • ZIPs tricks
      • Windows Artifacts
        • Interesting Windows Registry Keys
    • Brute Force - CheatSheet
    • Python Sandbox Escape & Pyscript
      • Bypass Python sandboxes
        • LOAD_NAME / LOAD_CONST opcode OOB Read
      • Class Pollution (Python's Prototype Pollution)
      • Python Internal Read Gadgets
      • Pyscript
      • venv
      • Web Requests
      • Bruteforce hash (few chars)
      • Basic Python
    • Exfiltration
    • Tunneling and Port Forwarding
    • Threat Modeling
    • Search Exploits
    • Reverse Shells (Linux, Windows, MSFVenom)
      • MSFVenom - CheatSheet
      • Reverse Shells - Windows
      • Reverse Shells - Linux
      • Full TTYs
  • 🐧Linux Hardening
    • Checklist - Linux Privilege Escalation
    • Linux Privilege Escalation
      • Arbitrary File Write to Root
      • Cisco - vmanage
      • Containerd (ctr) Privilege Escalation
      • D-Bus Enumeration & Command Injection Privilege Escalation
      • Docker Security
        • Abusing Docker Socket for Privilege Escalation
        • AppArmor
        • AuthZ& AuthN - Docker Access Authorization Plugin
        • CGroups
        • Docker --privileged
        • Docker Breakout / Privilege Escalation
          • release_agent exploit - Relative Paths to PIDs
          • Docker release_agent cgroups escape
          • Sensitive Mounts
        • Namespaces
          • CGroup Namespace
          • IPC Namespace
          • PID Namespace
          • Mount Namespace
          • Network Namespace
          • Time Namespace
          • User Namespace
          • UTS Namespace
        • Seccomp
        • Weaponizing Distroless
      • Escaping from Jails
      • euid, ruid, suid
      • Interesting Groups - Linux Privesc
        • lxd/lxc Group - Privilege escalation
      • Logstash
      • ld.so privesc exploit example
      • Linux Active Directory
      • Linux Capabilities
      • NFS no_root_squash/no_all_squash misconfiguration PE
      • Node inspector/CEF debug abuse
      • Payloads to execute
      • RunC Privilege Escalation
      • SELinux
      • Socket Command Injection
      • Splunk LPE and Persistence
      • SSH Forward Agent exploitation
      • Wildcards Spare tricks
    • Useful Linux Commands
    • Bypass Linux Restrictions
      • Bypass FS protections: read-only / no-exec / Distroless
        • DDexec / EverythingExec
    • Linux Environment Variables
    • Linux Post-Exploitation
      • PAM - Pluggable Authentication Modules
    • FreeIPA Pentesting
  • 🍏MacOS Hardening
    • macOS Security & Privilege Escalation
      • macOS Apps - Inspecting, debugging and Fuzzing
        • Objects in memory
        • Introduction to x64
        • Introduction to ARM64v8
      • macOS AppleFS
      • macOS Bypassing Firewalls
      • macOS Defensive Apps
      • macOS GCD - Grand Central Dispatch
      • macOS Kernel & System Extensions
        • macOS IOKit
        • macOS Kernel Extensions & Debugging
        • macOS Kernel Vulnerabilities
        • macOS System Extensions
      • macOS Network Services & Protocols
      • macOS File Extension & URL scheme app handlers
      • macOS Files, Folders, Binaries & Memory
        • macOS Bundles
        • macOS Installers Abuse
        • macOS Memory Dumping
        • macOS Sensitive Locations & Interesting Daemons
        • macOS Universal binaries & Mach-O Format
      • macOS Objective-C
      • macOS Privilege Escalation
      • macOS Process Abuse
        • macOS Dirty NIB
        • macOS Chromium Injection
        • macOS Electron Applications Injection
        • macOS Function Hooking
        • macOS IPC - Inter Process Communication
          • macOS MIG - Mach Interface Generator
          • macOS XPC
            • macOS XPC Authorization
            • macOS XPC Connecting Process Check
              • macOS PID Reuse
              • macOS xpc_connection_get_audit_token Attack
          • macOS Thread Injection via Task port
        • macOS Java Applications Injection
        • macOS Library Injection
          • macOS Dyld Hijacking & DYLD_INSERT_LIBRARIES
          • macOS Dyld Process
        • macOS Perl Applications Injection
        • macOS Python Applications Injection
        • macOS Ruby Applications Injection
        • macOS .Net Applications Injection
      • macOS Security Protections
        • macOS Gatekeeper / Quarantine / XProtect
        • macOS Launch/Environment Constraints & Trust Cache
        • macOS Sandbox
          • macOS Default Sandbox Debug
          • macOS Sandbox Debug & Bypass
            • macOS Office Sandbox Bypasses
        • macOS Authorizations DB & Authd
        • macOS SIP
        • macOS TCC
          • macOS Apple Events
          • macOS TCC Bypasses
            • macOS Apple Scripts
          • macOS TCC Payloads
        • macOS Dangerous Entitlements & TCC perms
        • macOS - AMFI - AppleMobileFileIntegrity
        • macOS MACF - Mandatory Access Control Framework
        • macOS Code Signing
        • macOS FS Tricks
          • macOS xattr-acls extra stuff
      • macOS Users & External Accounts
    • macOS Red Teaming
      • macOS MDM
        • Enrolling Devices in Other Organisations
        • macOS Serial Number
      • macOS Keychain
    • macOS Useful Commands
    • macOS Auto Start
  • 🪟Windows Hardening
    • Checklist - Local Windows Privilege Escalation
    • Windows Local Privilege Escalation
      • Abusing Tokens
      • Access Tokens
      • ACLs - DACLs/SACLs/ACEs
      • AppendData/AddSubdirectory permission over service registry
      • Create MSI with WIX
      • COM Hijacking
      • Dll Hijacking
        • Writable Sys Path +Dll Hijacking Privesc
      • DPAPI - Extracting Passwords
      • From High Integrity to SYSTEM with Name Pipes
      • Integrity Levels
      • JuicyPotato
      • Leaked Handle Exploitation
      • MSI Wrapper
      • Named Pipe Client Impersonation
      • Privilege Escalation with Autoruns
      • RoguePotato, PrintSpoofer, SharpEfsPotato, GodPotato
      • SeDebug + SeImpersonate copy token
      • SeImpersonate from High To System
      • Windows C Payloads
    • Active Directory Methodology
      • Abusing Active Directory ACLs/ACEs
        • Shadow Credentials
      • AD Certificates
        • AD CS Account Persistence
        • AD CS Domain Escalation
        • AD CS Domain Persistence
        • AD CS Certificate Theft
      • AD information in printers
      • AD DNS Records
      • ASREPRoast
      • BloodHound & Other AD Enum Tools
      • Constrained Delegation
      • Custom SSP
      • DCShadow
      • DCSync
      • Diamond Ticket
      • DSRM Credentials
      • External Forest Domain - OneWay (Inbound) or bidirectional
      • External Forest Domain - One-Way (Outbound)
      • Golden Ticket
      • Kerberoast
      • Kerberos Authentication
      • Kerberos Double Hop Problem
      • LAPS
      • MSSQL AD Abuse
      • Over Pass the Hash/Pass the Key
      • Pass the Ticket
      • Password Spraying / Brute Force
      • PrintNightmare
      • Force NTLM Privileged Authentication
      • Privileged Groups
      • RDP Sessions Abuse
      • Resource-based Constrained Delegation
      • Security Descriptors
      • SID-History Injection
      • Silver Ticket
      • Skeleton Key
      • Unconstrained Delegation
    • Windows Security Controls
      • UAC - User Account Control
    • NTLM
      • Places to steal NTLM creds
    • Lateral Movement
      • AtExec / SchtasksExec
      • DCOM Exec
      • PsExec/Winexec/ScExec
      • SmbExec/ScExec
      • WinRM
      • WmiExec
    • Pivoting to the Cloud
    • Stealing Windows Credentials
      • Windows Credentials Protections
      • Mimikatz
      • WTS Impersonator
    • Basic Win CMD for Pentesters
    • Basic PowerShell for Pentesters
      • PowerView/SharpView
    • Antivirus (AV) Bypass
  • 📱Mobile Pentesting
    • Android APK Checklist
    • Android Applications Pentesting
      • Android Applications Basics
      • Android Task Hijacking
      • ADB Commands
      • APK decompilers
      • AVD - Android Virtual Device
      • Bypass Biometric Authentication (Android)
      • content:// protocol
      • Drozer Tutorial
        • Exploiting Content Providers
      • Exploiting a debuggeable application
      • Frida Tutorial
        • Frida Tutorial 1
        • Frida Tutorial 2
        • Frida Tutorial 3
        • Objection Tutorial
      • Google CTF 2018 - Shall We Play a Game?
      • Install Burp Certificate
      • Intent Injection
      • Make APK Accept CA Certificate
      • Manual DeObfuscation
      • React Native Application
      • Reversing Native Libraries
      • Smali - Decompiling/[Modifying]/Compiling
      • Spoofing your location in Play Store
      • Tapjacking
      • Webview Attacks
    • iOS Pentesting Checklist
    • iOS Pentesting
      • iOS App Extensions
      • iOS Basics
      • iOS Basic Testing Operations
      • iOS Burp Suite Configuration
      • iOS Custom URI Handlers / Deeplinks / Custom Schemes
      • iOS Extracting Entitlements From Compiled Application
      • iOS Frida Configuration
      • iOS Hooking With Objection
      • iOS Protocol Handlers
      • iOS Serialisation and Encoding
      • iOS Testing Environment
      • iOS UIActivity Sharing
      • iOS Universal Links
      • iOS UIPasteboard
      • iOS WebViews
    • Cordova Apps
    • Xamarin Apps
  • 👽Network Services Pentesting
    • Pentesting JDWP - Java Debug Wire Protocol
    • Pentesting Printers
    • Pentesting SAP
    • Pentesting VoIP
      • Basic VoIP Protocols
        • SIP (Session Initiation Protocol)
    • Pentesting Remote GdbServer
    • 7/tcp/udp - Pentesting Echo
    • 21 - Pentesting FTP
      • FTP Bounce attack - Scan
      • FTP Bounce - Download 2ºFTP file
    • 22 - Pentesting SSH/SFTP
    • 23 - Pentesting Telnet
    • 25,465,587 - Pentesting SMTP/s
      • SMTP Smuggling
      • SMTP - Commands
    • 43 - Pentesting WHOIS
    • 49 - Pentesting TACACS+
    • 53 - Pentesting DNS
    • 69/UDP TFTP/Bittorrent-tracker
    • 79 - Pentesting Finger
    • 80,443 - Pentesting Web Methodology
      • 403 & 401 Bypasses
      • AEM - Adobe Experience Cloud
      • Angular
      • Apache
      • Artifactory Hacking guide
      • Bolt CMS
      • Buckets
        • Firebase Database
      • CGI
      • DotNetNuke (DNN)
      • Drupal
        • Drupal RCE
      • Electron Desktop Apps
        • Electron contextIsolation RCE via preload code
        • Electron contextIsolation RCE via Electron internal code
        • Electron contextIsolation RCE via IPC
      • Flask
      • NodeJS Express
      • Git
      • Golang
      • GWT - Google Web Toolkit
      • Grafana
      • GraphQL
      • H2 - Java SQL database
      • IIS - Internet Information Services
      • ImageMagick Security
      • JBOSS
      • Jira & Confluence
      • Joomla
      • JSP
      • Laravel
      • Moodle
      • Nginx
      • NextJS
      • PHP Tricks
        • PHP - Useful Functions & disable_functions/open_basedir bypass
          • disable_functions bypass - php-fpm/FastCGI
          • disable_functions bypass - dl function
          • disable_functions bypass - PHP 7.0-7.4 (*nix only)
          • disable_functions bypass - Imagick <= 3.3.0 PHP >= 5.4 Exploit
          • disable_functions - PHP 5.x Shellshock Exploit
          • disable_functions - PHP 5.2.4 ionCube extension Exploit
          • disable_functions bypass - PHP <= 5.2.9 on windows
          • disable_functions bypass - PHP 5.2.4 and 5.2.5 PHP cURL
          • disable_functions bypass - PHP safe_mode bypass via proc_open() and custom environment Exploit
          • disable_functions bypass - PHP Perl Extension Safe_mode Bypass Exploit
          • disable_functions bypass - PHP 5.2.3 - Win32std ext Protections Bypass
          • disable_functions bypass - PHP 5.2 - FOpen Exploit
          • disable_functions bypass - via mem
          • disable_functions bypass - mod_cgi
          • disable_functions bypass - PHP 4 >= 4.2.0, PHP 5 pcntl_exec
        • PHP - RCE abusing object creation: new $_GET["a"]($_GET["b"])
        • PHP SSRF
      • PrestaShop
      • Python
      • Rocket Chat
      • Special HTTP headers
      • Source code Review / SAST Tools
      • Spring Actuators
      • Symfony
      • Tomcat
        • Basic Tomcat Info
      • Uncovering CloudFlare
      • VMWare (ESX, VCenter...)
      • Web API Pentesting
      • WebDav
      • Werkzeug / Flask Debug
      • Wordpress
    • 88tcp/udp - Pentesting Kerberos
      • Harvesting tickets from Windows
      • Harvesting tickets from Linux
    • 110,995 - Pentesting POP
    • 111/TCP/UDP - Pentesting Portmapper
    • 113 - Pentesting Ident
    • 123/udp - Pentesting NTP
    • 135, 593 - Pentesting MSRPC
    • 137,138,139 - Pentesting NetBios
    • 139,445 - Pentesting SMB
      • rpcclient enumeration
    • 143,993 - Pentesting IMAP
    • 161,162,10161,10162/udp - Pentesting SNMP
      • Cisco SNMP
      • SNMP RCE
    • 194,6667,6660-7000 - Pentesting IRC
    • 264 - Pentesting Check Point FireWall-1
    • 389, 636, 3268, 3269 - Pentesting LDAP
    • 500/udp - Pentesting IPsec/IKE VPN
    • 502 - Pentesting Modbus
    • 512 - Pentesting Rexec
    • 513 - Pentesting Rlogin
    • 514 - Pentesting Rsh
    • 515 - Pentesting Line Printer Daemon (LPD)
    • 548 - Pentesting Apple Filing Protocol (AFP)
    • 554,8554 - Pentesting RTSP
    • 623/UDP/TCP - IPMI
    • 631 - Internet Printing Protocol(IPP)
    • 700 - Pentesting EPP
    • 873 - Pentesting Rsync
    • 1026 - Pentesting Rusersd
    • 1080 - Pentesting Socks
    • 1098/1099/1050 - Pentesting Java RMI - RMI-IIOP
    • 1414 - Pentesting IBM MQ
    • 1433 - Pentesting MSSQL - Microsoft SQL Server
      • Types of MSSQL Users
    • 1521,1522-1529 - Pentesting Oracle TNS Listener
    • 1723 - Pentesting PPTP
    • 1883 - Pentesting MQTT (Mosquitto)
    • 2049 - Pentesting NFS Service
    • 2301,2381 - Pentesting Compaq/HP Insight Manager
    • 2375, 2376 Pentesting Docker
    • 3128 - Pentesting Squid
    • 3260 - Pentesting ISCSI
    • 3299 - Pentesting SAPRouter
    • 3306 - Pentesting Mysql
    • 3389 - Pentesting RDP
    • 3632 - Pentesting distcc
    • 3690 - Pentesting Subversion (svn server)
    • 3702/UDP - Pentesting WS-Discovery
    • 4369 - Pentesting Erlang Port Mapper Daemon (epmd)
    • 4786 - Cisco Smart Install
    • 4840 - OPC Unified Architecture
    • 5000 - Pentesting Docker Registry
    • 5353/UDP Multicast DNS (mDNS) and DNS-SD
    • 5432,5433 - Pentesting Postgresql
    • 5439 - Pentesting Redshift
    • 5555 - Android Debug Bridge
    • 5601 - Pentesting Kibana
    • 5671,5672 - Pentesting AMQP
    • 5800,5801,5900,5901 - Pentesting VNC
    • 5984,6984 - Pentesting CouchDB
    • 5985,5986 - Pentesting WinRM
    • 5985,5986 - Pentesting OMI
    • 6000 - Pentesting X11
    • 6379 - Pentesting Redis
    • 8009 - Pentesting Apache JServ Protocol (AJP)
    • 8086 - Pentesting InfluxDB
    • 8089 - Pentesting Splunkd
    • 8333,18333,38333,18444 - Pentesting Bitcoin
    • 9000 - Pentesting FastCGI
    • 9001 - Pentesting HSQLDB
    • 9042/9160 - Pentesting Cassandra
    • 9100 - Pentesting Raw Printing (JetDirect, AppSocket, PDL-datastream)
    • 9200 - Pentesting Elasticsearch
    • 10000 - Pentesting Network Data Management Protocol (ndmp)
    • 11211 - Pentesting Memcache
      • Memcache Commands
    • 15672 - Pentesting RabbitMQ Management
    • 24007,24008,24009,49152 - Pentesting GlusterFS
    • 27017,27018 - Pentesting MongoDB
    • 44134 - Pentesting Tiller (Helm)
    • 44818/UDP/TCP - Pentesting EthernetIP
    • 47808/udp - Pentesting BACNet
    • 50030,50060,50070,50075,50090 - Pentesting Hadoop
  • 🕸️Pentesting Web
    • Web Vulnerabilities Methodology
    • Reflecting Techniques - PoCs and Polygloths CheatSheet
      • Web Vulns List
    • 2FA/MFA/OTP Bypass
    • Account Takeover
    • Browser Extension Pentesting Methodology
      • BrowExt - ClickJacking
      • BrowExt - permissions & host_permissions
      • BrowExt - XSS Example
    • Bypass Payment Process
    • Captcha Bypass
    • Cache Poisoning and Cache Deception
      • Cache Poisoning via URL discrepancies
      • Cache Poisoning to DoS
    • Clickjacking
    • Client Side Template Injection (CSTI)
    • Client Side Path Traversal
    • Command Injection
    • Content Security Policy (CSP) Bypass
      • CSP bypass: self + 'unsafe-inline' with Iframes
    • Cookies Hacking
      • Cookie Tossing
      • Cookie Jar Overflow
      • Cookie Bomb
    • CORS - Misconfigurations & Bypass
    • CRLF (%0D%0A) Injection
    • CSRF (Cross Site Request Forgery)
    • Dangling Markup - HTML scriptless injection
      • SS-Leaks
    • Dependency Confusion
    • Deserialization
      • NodeJS - __proto__ & prototype Pollution
        • Client Side Prototype Pollution
        • Express Prototype Pollution Gadgets
        • Prototype Pollution to RCE
      • Java JSF ViewState (.faces) Deserialization
      • Java DNS Deserialization, GadgetProbe and Java Deserialization Scanner
      • Basic Java Deserialization (ObjectInputStream, readObject)
      • PHP - Deserialization + Autoload Classes
      • CommonsCollection1 Payload - Java Transformers to Rutime exec() and Thread Sleep
      • Basic .Net deserialization (ObjectDataProvider gadget, ExpandedWrapper, and Json.Net)
      • Exploiting __VIEWSTATE knowing the secrets
      • Exploiting __VIEWSTATE without knowing the secrets
      • Python Yaml Deserialization
      • JNDI - Java Naming and Directory Interface & Log4Shell
      • Ruby Class Pollution
    • Domain/Subdomain takeover
    • Email Injections
    • File Inclusion/Path traversal
      • phar:// deserialization
      • LFI2RCE via PHP Filters
      • LFI2RCE via Nginx temp files
      • LFI2RCE via PHP_SESSION_UPLOAD_PROGRESS
      • LFI2RCE via Segmentation Fault
      • LFI2RCE via phpinfo()
      • LFI2RCE Via temp file uploads
      • LFI2RCE via Eternal waiting
      • LFI2RCE Via compress.zlib + PHP_STREAM_PREFER_STUDIO + Path Disclosure
    • File Upload
      • PDF Upload - XXE and CORS bypass
    • Formula/CSV/Doc/LaTeX/GhostScript Injection
    • gRPC-Web Pentest
    • HTTP Connection Contamination
    • HTTP Connection Request Smuggling
    • HTTP Request Smuggling / HTTP Desync Attack
      • Browser HTTP Request Smuggling
      • Request Smuggling in HTTP/2 Downgrades
    • HTTP Response Smuggling / Desync
    • Upgrade Header Smuggling
    • hop-by-hop headers
    • IDOR
    • JWT Vulnerabilities (Json Web Tokens)
    • LDAP Injection
    • Login Bypass
      • Login bypass List
    • NoSQL injection
    • OAuth to Account takeover
    • Open Redirect
    • ORM Injection
    • Parameter Pollution
    • Phone Number Injections
    • PostMessage Vulnerabilities
      • Blocking main page to steal postmessage
      • Bypassing SOP with Iframes - 1
      • Bypassing SOP with Iframes - 2
      • Steal postmessage modifying iframe location
    • Proxy / WAF Protections Bypass
    • Race Condition
    • Rate Limit Bypass
    • Registration & Takeover Vulnerabilities
    • Regular expression Denial of Service - ReDoS
    • Reset/Forgotten Password Bypass
    • Reverse Tab Nabbing
    • SAML Attacks
      • SAML Basics
    • Server Side Inclusion/Edge Side Inclusion Injection
    • SQL Injection
      • MS Access SQL Injection
      • MSSQL Injection
      • MySQL injection
        • MySQL File priv to SSRF/RCE
      • Oracle injection
      • Cypher Injection (neo4j)
      • PostgreSQL injection
        • dblink/lo_import data exfiltration
        • PL/pgSQL Password Bruteforce
        • Network - Privesc, Port Scanner and NTLM chanllenge response disclosure
        • Big Binary Files Upload (PostgreSQL)
        • RCE with PostgreSQL Languages
        • RCE with PostgreSQL Extensions
      • SQLMap - CheatSheet
        • Second Order Injection - SQLMap
    • SSRF (Server Side Request Forgery)
      • URL Format Bypass
      • SSRF Vulnerable Platforms
      • Cloud SSRF
    • SSTI (Server Side Template Injection)
      • EL - Expression Language
      • Jinja2 SSTI
    • Timing Attacks
    • Unicode Injection
      • Unicode Normalization
    • UUID Insecurities
    • WebSocket Attacks
    • Web Tool - WFuzz
    • XPATH injection
    • XSLT Server Side Injection (Extensible Stylesheet Language Transformations)
    • XXE - XEE - XML External Entity
    • XSS (Cross Site Scripting)
      • Abusing Service Workers
      • Chrome Cache to XSS
      • Debugging Client Side JS
      • Dom Clobbering
      • DOM Invader
      • DOM XSS
      • Iframes in XSS, CSP and SOP
      • Integer Overflow
      • JS Hoisting
      • Misc JS Tricks & Relevant Info
      • PDF Injection
      • Server Side XSS (Dynamic PDF)
      • Shadow DOM
      • SOME - Same Origin Method Execution
      • Sniff Leak
      • Steal Info JS
      • XSS in Markdown
    • XSSI (Cross-Site Script Inclusion)
    • XS-Search/XS-Leaks
      • Connection Pool Examples
      • Connection Pool by Destination Example
      • Cookie Bomb + Onerror XS Leak
      • URL Max Length - Client Side
      • performance.now example
      • performance.now + Force heavy task
      • Event Loop Blocking + Lazy images
      • JavaScript Execution XS Leak
      • CSS Injection
        • CSS Injection Code
    • Iframe Traps
  • ⛈️Cloud Security
    • Pentesting Kubernetes
    • Pentesting Cloud (AWS, GCP, Az...)
    • Pentesting CI/CD (Github, Jenkins, Terraform...)
  • 😎Hardware/Physical Access
    • Physical Attacks
    • Escaping from KIOSKs
    • Firmware Analysis
      • Bootloader testing
      • Firmware Integrity
  • 🎯Binary Exploitation
    • Basic Stack Binary Exploitation Methodology
      • ELF Basic Information
      • Exploiting Tools
        • PwnTools
    • Stack Overflow
      • Pointer Redirecting
      • Ret2win
        • Ret2win - arm64
      • Stack Shellcode
        • Stack Shellcode - arm64
      • Stack Pivoting - EBP2Ret - EBP chaining
      • Uninitialized Variables
    • ROP - Return Oriented Programing
      • BROP - Blind Return Oriented Programming
      • Ret2csu
      • Ret2dlresolve
      • Ret2esp / Ret2reg
      • Ret2lib
        • Leaking libc address with ROP
          • Leaking libc - template
        • One Gadget
        • Ret2lib + Printf leak - arm64
      • Ret2syscall
        • Ret2syscall - ARM64
      • Ret2vDSO
      • SROP - Sigreturn-Oriented Programming
        • SROP - ARM64
    • Array Indexing
    • Integer Overflow
    • Format Strings
      • Format Strings - Arbitrary Read Example
      • Format Strings Template
    • Libc Heap
      • Bins & Memory Allocations
      • Heap Memory Functions
        • free
        • malloc & sysmalloc
        • unlink
        • Heap Functions Security Checks
      • Use After Free
        • First Fit
      • Double Free
      • Overwriting a freed chunk
      • Heap Overflow
      • Unlink Attack
      • Fast Bin Attack
      • Unsorted Bin Attack
      • Large Bin Attack
      • Tcache Bin Attack
      • Off by one overflow
      • House of Spirit
      • House of Lore | Small bin Attack
      • House of Einherjar
      • House of Force
      • House of Orange
      • House of Rabbit
      • House of Roman
    • Common Binary Exploitation Protections & Bypasses
      • ASLR
        • Ret2plt
        • Ret2ret & Reo2pop
      • CET & Shadow Stack
      • Libc Protections
      • Memory Tagging Extension (MTE)
      • No-exec / NX
      • PIE
        • BF Addresses in the Stack
      • Relro
      • Stack Canaries
        • BF Forked & Threaded Stack Canaries
        • Print Stack Canary
    • Write What Where 2 Exec
      • WWW2Exec - atexit()
      • WWW2Exec - .dtors & .fini_array
      • WWW2Exec - GOT/PLT
      • WWW2Exec - __malloc_hook & __free_hook
    • Common Exploiting Problems
    • Windows Exploiting (Basic Guide - OSCP lvl)
    • iOS Exploiting
  • 🔩Reversing
    • Reversing Tools & Basic Methods
      • Angr
        • Angr - Examples
      • Z3 - Satisfiability Modulo Theories (SMT)
      • Cheat Engine
      • Blobrunner
    • Common API used in Malware
    • Word Macros
  • 🔮Crypto & Stego
    • Cryptographic/Compression Algorithms
      • Unpacking binaries
    • Certificates
    • Cipher Block Chaining CBC-MAC
    • Crypto CTFs Tricks
    • Electronic Code Book (ECB)
    • Hash Length Extension Attack
    • Padding Oracle
    • RC4 - Encrypt&Decrypt
    • Stego Tricks
    • Esoteric languages
    • Blockchain & Crypto Currencies
  • 🦂C2
    • Salseo
    • ICMPsh
    • Cobalt Strike
  • ✍️TODO
    • Other Big References
    • Rust Basics
    • More Tools
    • MISC
    • Pentesting DNS
    • Hardware Hacking
      • I2C
      • UART
      • Radio
      • JTAG
      • SPI
    • Industrial Control Systems Hacking
      • Modbus Protocol
    • Radio Hacking
      • Pentesting RFID
      • Infrared
      • Sub-GHz RF
      • iButton
      • Flipper Zero
        • FZ - NFC
        • FZ - Sub-GHz
        • FZ - Infrared
        • FZ - iButton
        • FZ - 125kHz RFID
      • Proxmark 3
      • FISSURE - The RF Framework
      • Low-Power Wide Area Network
      • Pentesting BLE - Bluetooth Low Energy
    • Industrial Control Systems Hacking
    • Test LLMs
    • LLM Training
      • 0. Basic LLM Concepts
      • 1. Tokenizing
      • 2. Data Sampling
      • 3. Token Embeddings
      • 4. Attention Mechanisms
      • 5. LLM Architecture
      • 6. Pre-training & Loading models
      • 7.0. LoRA Improvements in fine-tuning
      • 7.1. Fine-Tuning for Classification
      • 7.2. Fine-Tuning to follow instructions
    • Burp Suite
    • Other Web Tricks
    • Interesting HTTP
    • Android Forensics
    • TR-069
    • 6881/udp - Pentesting BitTorrent
    • Online Platforms with API
    • Stealing Sensitive Information Disclosure from a Web
    • Post Exploitation
    • Investment Terms
    • Cookies Policy
Powered by GitBook
On this page
  • Pretraining
  • Main LLM components
  • Tensors in PyTorch
  • Mathematical Concept of Tensors
  • Tensors as Data Containers
  • PyTorch Tensors vs. NumPy Arrays
  • Creating Tensors in PyTorch
  • Tensor Data Types
  • Common Tensor Operations
  • Importance in Deep Learning
  • Automatic Differentiation
  • Mathematical Explanation of Automatic Differentiation
  • Implementing Automatic Differentiation in PyTorch
  • Backpropagation in Bigger Neural Networks
  • 1.Extending to Multilayer Networks
  • 2. Backpropagation Algorithm
  • 3. Mathematical Representation
  • 4. PyTorch Implementation
  • 5. Understanding Backward Pass
  • 6. Advantages of Automatic Differentiation
Edit on GitHub
  1. TODO
  2. LLM Training

0. Basic LLM Concepts

Pretraining

Pretraining is the foundational phase in developing a large language model (LLM) where the model is exposed to vast and diverse amounts of text data. During this stage, the LLM learns the fundamental structures, patterns, and nuances of language, including grammar, vocabulary, syntax, and contextual relationships. By processing this extensive data, the model acquires a broad understanding of language and general world knowledge. This comprehensive base enables the LLM to generate coherent and contextually relevant text. Subsequently, this pretrained model can undergo fine-tuning, where it is further trained on specialized datasets to adapt its capabilities for specific tasks or domains, enhancing its performance and relevance in targeted applications.

Main LLM components

Usually a LLM is characterised for the configuration used to train it. This are the common components when training a LLM:

  • Parameters: Parameters are the learnable weights and biases in the neural network. These are the numbers that the training process adjusts to minimize the loss function and improve the model's performance on the task. LLMs usually use millions of parameters.

  • Context Length: This is the maximum length of each sentence used to pre-train the LLM.

  • Embedding Dimension: The size of the vector used to represent each token or word. LLMs usually sue billions of dimensions.

  • Hidden Dimension: The size of the hidden layers in the neural network.

  • Number of Layers (Depth): How many layers the model has. LLMs usually use tens of layers.

  • Number of Attention Heads: In transformer models, this is how many separate attention mechanisms are used in each layer. LLMs usually use tens of heads.

  • Dropout: Dropout is something like the percentage of data that is removed (probabilities turn to 0) during training used to prevent overfitting. LLMs usually use between 0-20%.

Configuration of the GPT-2 model:

GPT_CONFIG_124M = {
    "vocab_size": 50257,  // Vocabulary size of the BPE tokenizer
    "context_length": 1024, // Context length
    "emb_dim": 768,       // Embedding dimension
    "n_heads": 12,        // Number of attention heads
    "n_layers": 12,       // Number of layers
    "drop_rate": 0.1,     // Dropout rate: 10%
    "qkv_bias": False     // Query-Key-Value bias
}

Tensors in PyTorch

In PyTorch, a tensor is a fundamental data structure that serves as a multi-dimensional array, generalizing concepts like scalars, vectors, and matrices to potentially higher dimensions. Tensors are the primary way data is represented and manipulated in PyTorch, especially in the context of deep learning and neural networks.

Mathematical Concept of Tensors

  • Scalars: Tensors of rank 0, representing a single number (zero-dimensional). Like: 5

  • Vectors: Tensors of rank 1, representing a one-dimensional array of numbers. Like: [5,1]

  • Matrices: Tensors of rank 2, representing two-dimensional arrays with rows and columns. Like: [[1,3], [5,2]]

  • Higher-Rank Tensors: Tensors of rank 3 or more, representing data in higher dimensions (e.g., 3D tensors for color images).

Tensors as Data Containers

From a computational perspective, tensors act as containers for multi-dimensional data, where each dimension can represent different features or aspects of the data. This makes tensors highly suitable for handling complex datasets in machine learning tasks.

PyTorch Tensors vs. NumPy Arrays

While PyTorch tensors are similar to NumPy arrays in their ability to store and manipulate numerical data, they offer additional functionalities crucial for deep learning:

  • Automatic Differentiation: PyTorch tensors support automatic calculation of gradients (autograd), which simplifies the process of computing derivatives required for training neural networks.

  • GPU Acceleration: Tensors in PyTorch can be moved to and computed on GPUs, significantly speeding up large-scale computations.

Creating Tensors in PyTorch

You can create tensors using the torch.tensor function:

pythonCopy codeimport torch

# Scalar (0D tensor)
tensor0d = torch.tensor(1)

# Vector (1D tensor)
tensor1d = torch.tensor([1, 2, 3])

# Matrix (2D tensor)
tensor2d = torch.tensor([[1, 2],
                         [3, 4]])

# 3D Tensor
tensor3d = torch.tensor([[[1, 2], [3, 4]],
                         [[5, 6], [7, 8]]])

Tensor Data Types

PyTorch tensors can store data of various types, such as integers and floating-point numbers.

You can check a tensor's data type using the .dtype attribute:

tensor1d = torch.tensor([1, 2, 3])
print(tensor1d.dtype)  # Output: torch.int64
  • Tensors created from Python integers are of type torch.int64.

  • Tensors created from Python floats are of type torch.float32.

To change a tensor's data type, use the .to() method:

float_tensor = tensor1d.to(torch.float32)
print(float_tensor.dtype)  # Output: torch.float32

Common Tensor Operations

PyTorch provides a variety of operations to manipulate tensors:

  • Accessing Shape: Use .shape to get the dimensions of a tensor.

    print(tensor2d.shape)  # Output: torch.Size([2, 2])
  • Reshaping Tensors: Use .reshape() or .view() to change the shape.

    reshaped = tensor2d.reshape(4, 1)
  • Transposing Tensors: Use .T to transpose a 2D tensor.

    transposed = tensor2d.T
  • Matrix Multiplication: Use .matmul() or the @ operator.

    result = tensor2d @ tensor2d.T

Importance in Deep Learning

Tensors are essential in PyTorch for building and training neural networks:

  • They store input data, weights, and biases.

  • They facilitate operations required for forward and backward passes in training algorithms.

  • With autograd, tensors enable automatic computation of gradients, streamlining the optimization process.

Automatic Differentiation

Automatic differentiation (AD) is a computational technique used to evaluate the derivatives (gradients) of functions efficiently and accurately. In the context of neural networks, AD enables the calculation of gradients required for optimization algorithms like gradient descent. PyTorch provides an automatic differentiation engine called autograd that simplifies this process.

Mathematical Explanation of Automatic Differentiation

1. The Chain Rule

At the heart of automatic differentiation is the chain rule from calculus. The chain rule states that if you have a composition of functions, the derivative of the composite function is the product of the derivatives of the composed functions.

Mathematically, if y=f(u) and u=g(x), then the derivative of y with respect to x is:

2. Computational Graph

In AD, computations are represented as nodes in a computational graph, where each node corresponds to an operation or a variable. By traversing this graph, we can compute derivatives efficiently.

  1. Example

Let's consider a simple function:

Where:

  • σ(z) is the sigmoid function.

  • y=1.0 is the target label.

  • L is the loss.

We want to compute the gradient of the loss L with respect to the weight w and bias b.

4. Computing Gradients Manually

5. Numerical Calculation

Implementing Automatic Differentiation in PyTorch

Now, let's see how PyTorch automates this process.

pythonCopy codeimport torch
import torch.nn.functional as F

# Define input and target
x = torch.tensor([1.1])
y = torch.tensor([1.0])

# Initialize weights with requires_grad=True to track computations
w = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Forward pass
z = x * w + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)

# Backward pass
loss.backward()

# Gradients
print("Gradient w.r.t w:", w.grad)
print("Gradient w.r.t b:", b.grad)

Output:

cssCopy codeGradient w.r.t w: tensor([-0.0898])
Gradient w.r.t b: tensor([-0.0817])

Backpropagation in Bigger Neural Networks

1.Extending to Multilayer Networks

In larger neural networks with multiple layers, the process of computing gradients becomes more complex due to the increased number of parameters and operations. However, the fundamental principles remain the same:

  • Forward Pass: Compute the output of the network by passing inputs through each layer.

  • Compute Loss: Evaluate the loss function using the network's output and the target labels.

  • Backward Pass (Backpropagation): Compute the gradients of the loss with respect to each parameter in the network by applying the chain rule recursively from the output layer back to the input layer.

2. Backpropagation Algorithm

  • Step 1: Initialize the network parameters (weights and biases).

  • Step 2: For each training example, perform a forward pass to compute the outputs.

  • Step 3: Compute the loss.

  • Step 4: Compute the gradients of the loss with respect to each parameter using the chain rule.

  • Step 5: Update the parameters using an optimization algorithm (e.g., gradient descent).

3. Mathematical Representation

Consider a simple neural network with one hidden layer:

4. PyTorch Implementation

PyTorch simplifies this process with its autograd engine.

import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 5)  # Input layer to hidden layer
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(5, 1)   # Hidden layer to output layer
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        h = self.relu(self.fc1(x))
        y_hat = self.sigmoid(self.fc2(h))
        return y_hat

# Instantiate the network
net = SimpleNet()

# Define loss function and optimizer
criterion = nn.BCELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Sample data
inputs = torch.randn(1, 10)
labels = torch.tensor([1.0])

# Training loop
optimizer.zero_grad()          # Clear gradients
outputs = net(inputs)          # Forward pass
loss = criterion(outputs, labels)  # Compute loss
loss.backward()                # Backward pass (compute gradients)
optimizer.step()               # Update parameters

# Accessing gradients
for name, param in net.named_parameters():
    if param.requires_grad:
        print(f"Gradient of {name}: {param.grad}")

In this code:

  • Forward Pass: Computes the outputs of the network.

  • Backward Pass: loss.backward() computes the gradients of the loss with respect to all parameters.

  • Parameter Update: optimizer.step() updates the parameters based on the computed gradients.

5. Understanding Backward Pass

During the backward pass:

  • PyTorch traverses the computational graph in reverse order.

  • For each operation, it applies the chain rule to compute gradients.

  • Gradients are accumulated in the .grad attribute of each parameter tensor.

6. Advantages of Automatic Differentiation

  • Efficiency: Avoids redundant calculations by reusing intermediate results.

  • Accuracy: Provides exact derivatives up to machine precision.

  • Ease of Use: Eliminates manual computation of derivatives.

PreviousLLM TrainingNext1. Tokenizing

Last updated 8 months ago

✍️