TPII: tracking personally identifiable information via user behaviors in HTTP traffic
- Source :
- Frontiers of Computer Science. 14
- Publication Year :
- 2019
- Publisher :
- Springer Science and Business Media LLC, 2019.
- Abstract :
- Mobile applications commonly collect non-critical personally identifiable information (PII) from users’ devices and proactively send it to the cloud, where application service providers (ASPs) use it to deliver precise recommendation services. Meanwhile, Internet service providers (ISPs) and local network providers also have strong incentives to collect PII for finer-grained traffic control and security services. However, accurately locating PII in massive volumes of network traffic is like looking for a needle in a haystack. In this paper, we address this challenge by presenting an efficient and lightweight approach, TPII, which locates and tracks PII in the HTTP layer reconstructed from raw network traffic. The approach collects only three features from HTTP fields as users’ behaviors and then builds a tree-based decision model to mine PII efficiently and accurately. Without any prior knowledge, TPII can identify any type of PII from any mobile application, which gives it broad applicability. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51%, respectively, and that a parallel implementation of TPII can mine and label 213 million records within one hour, coming close to supporting 1 Gbps wire-speed inspection in practice. Our approach gives network service providers a practical way to collect PII for better services.
- Subjects :
- General Computer Science
Computer science
business.industry
Local area network
020207 software engineering
Cloud computing
Application service provider
02 engineering and technology
Theoretical Computer Science
World Wide Web
Tree (data structure)
Campus network
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Haystack
business
Personally identifiable information
Decision model
Details
- ISSN :
- 2095-2236 and 2095-2228
- Volume :
- 14
- Database :
- OpenAIRE
- Journal :
- Frontiers of Computer Science
- Accession number :
- edsair.doi...........1448fe0c131ce6cba1e7b9704d7ffc96
- Full Text :
- https://doi.org/10.1007/s11704-018-7451-z