The paper investigates matrix-free high-order implementation of finite element discretization with p-multigrid preconditioning for the compressible Neo-Hookean hyperelasticity problem at finite strain on unstructured 3D meshes in parallel. We consider two formulations for the matrix-free action of the Jacobian in Neo-Hookean hyperelasticity: (i) working in the reference configuration to define the second Piola-Kirchhoff tensor as a function of the Green-Lagrange strain S(E) (or equivalently, the right Cauchy-Green tensor C = I+2E), and (ii) working in the current configuration to define the Kirchhoff stress in terms of the left Cauchy-Green tensor τ(b). The proposed efficient algorithm utilizes the Portable, Extensible Toolkit for Scientific Computation (PETSc), along with the libCEED library for efficient compiler optimized tensor-product-basis computation to demonstrate an efficient nonlinear solution algorithm. We utilize p-multigrid preconditioning on the high-order problem with algebraic multigrid (AMG) on the assembled linear Q1 coarse grid operator. In contrast to classical geometric multigrid, also known as h-multigrid, each level in p-multigrid is related to a different approximation polynomial order p, instead of the element size h. A Chebyshev polynomial smoother is used on each multigrid level. AMG is then applied to the assembled Q1 (trilinear hexahedral elements), which allows low storage that can be efficiently used to accelerate convergence to a solution. For the compressible Neo-Hookean hyperelastic constitutive model we exploit the stored energy density function to compute the stored elastic energy density of the Neo-Hookean material as it relates to the deformation gradient. Based on our formulation, we consider four different algorithms each with different storage strategies. Algorithms 1 and 3 are implemented in the reference and current configurations respectively and store ∇Xξ, det(∇ξX), and ∇Xu. Algorithm 2 in the reference configuration stores, ∇Xξ, det(∇ξX), ∇Xu, C−1, and λ log (J). Algorithm 4, in the current configuration, stores det(∇ξX), ∇xξ, τ, and μ – λ log(J). x refers to the current coordinates, X to the reference coordinates, and ξ to the natural coordinates. We perform 3D bending simulations of a tube composed of aluminum (modulus of elasticity E = 69 GPa, Poisson’s ratio v = 0.3) using unstructured meshes and polynomials of order p = 1 through p = 4 under mesh refinement. We explore accuracy-time-cost tradeoffs for the prediction of strain energy across the range of polynomial degrees and Jacobian representations. In all cases, Algorithm 4 using the current configuration formulation outperforms the other three algorithms and requires less storage. Similar simulations for large deformation compressible Neo-Hookean hyperelasticity as applied to the same aluminum material are conducted with ABAQUS, a commercial finite element software package which is a state-of-the-art engineering software package for finite element simulations involving large deformation. The best results from the proposed implementations and ABAQUS are compared in the case of p = 2 on an Intel system with @2.4 GHz and 128 GB RAM. Algorithm 4 outperforms ABAQUS for polynomial degree p = 2.