16 results for "Michael L. Nelson"
Search Results
2. Extending Chromium: Memento-Aware Browser
- Author
-
Abigail Mabe, Michael L. Nelson, and Michele C. Weigle
- Published
- 2021
- Full Text
- View/download PDF
3. Modeling Updates of Scholarly Webpages Using Archived Data
- Author
-
C. Lee Giles, Yasith Jayawardana, Alexander C. Nwala, Michael L. Nelson, Sampath Jayarathna, Gavindya Jayawardena, and Jian Wu
- Subjects
Computer and information sciences, Information retrieval, Computer science, Big data, Digital Libraries (cs.DL), Information Retrieval (cs.IR), Crawling, Data modeling, Web page, Web crawler, The Internet
- Abstract
The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources. Crawl frontiers thus need to be optimized to improve the coverage and freshness of crawled content. In this paper, we propose an approach for modeling the dynamics of change in the web using archived copies of webpages. To evaluate its utility, we conduct a preliminary study on the scholarly web using 19,977 seed URLs of authors' homepages obtained from their Google Scholar profiles. We first obtain archived copies of these webpages from the Internet Archive (IA), and estimate when their actual updates occurred. Next, we apply maximum likelihood to estimate their mean update frequency ($\lambda$) values. Our evaluation shows that $\lambda$ values derived from a short history of archived data provide a good estimate for the true update frequency in the short term, and that our method provides better estimations of updates at a fraction of the resources compared to the baseline models. Based on this, we demonstrate the utility of archived data to optimize the crawling strategy of web crawlers, and uncover important challenges that inspire future research directions. (An illustrative sketch of the rate estimation follows this entry.)
- Comment
- 12 pages, 2 appendix pages, 18 figures, to be published in Proceedings of IEEE Big Data 2020 - 5th Computational Archival Science (CAS) Workshop
- Published
- 2020
- Full Text
- View/download PDF
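Aside: the estimation step in the abstract above admits a compact illustration. Under a homogeneous Poisson model of page updates, the maximum-likelihood estimate of the mean update frequency $\lambda$ is simply the number of observed updates divided by the length of the observation window. The Python sketch below assumes exactly that model; the function name and inputs are hypothetical, not taken from the paper.

```python
from typing import List

def estimate_update_rate(update_times: List[float],
                         window_start: float,
                         window_end: float) -> float:
    """Maximum-likelihood estimate of the mean update frequency (lambda)
    for a homogeneous Poisson process: observed updates / elapsed time."""
    if window_end <= window_start:
        raise ValueError("observation window must have positive length")
    n_updates = sum(window_start <= t <= window_end for t in update_times)
    return n_updates / (window_end - window_start)

# Example: 6 updates detected from archived snapshots over a 2-year
# window gives an estimated rate of 3 updates per year.
updates = [0.1, 0.4, 0.9, 1.2, 1.6, 1.9]  # years since first snapshot
print(estimate_update_rate(updates, 0.0, 2.0))  # 3.0
```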
4. Algorithms on Compressed Time-Evolving Graphs
- Author
-
Sridhar Radhakrishnan, Michael L. Nelson, and Chandra N. Sekharan
- Subjects
Computer science, Discrete mathematics, Algorithm, Graph
- Abstract
Time-evolving graphs are structures that encapsulate how a graph changes over time. Thus, we not only have to deal with large graphs consisting of nodes and edges in the billions, but we must also keep track of when these edges activate and deactivate over long lifetimes. In this age of big historical data, we must make use of efficient time-evolving graph compressions, or we will quickly find ourselves out of main memory. These time-evolving graph compressions must not only be space efficient, but must also facilitate fast querying directly on the compressed graph. In this paper, we define several novel time-evolving graph problems and develop algorithms to solve them directly on various massive, synthetic and real-world time-evolving graphs compressed using our technique. Our experiments provide details of the compressed graph sizes, algorithm run times, and other metrics.
- Published
- 2019
- Full Text
- View/download PDF
5. Billion-Scale Matrix Compression and Multiplication with Implications in Data Mining
- Author
-
Chandra N. Sekharan, Sridhar Radhakrishnan, and Michael L. Nelson
- Subjects
Computer science, Big data, Binary tree, Sparse matrix, Adjacency matrix, Matrix multiplication, Compression, Data mining
- Abstract
Billion-scale Boolean matrices in the era of big data occupy storage measured in hundreds of petabytes to zettabytes. The fundamental operation on these matrices for data mining is multiplication, which suffers a significant slow-down because the required data cannot fit in most main memories. In this paper, we propose new algorithms to perform Matrix-Vector and Matrix-Matrix operations directly on compressed Boolean matrices using innovative techniques extended from our previous work on compression. Our extension involves the development of a row-by-row differential compression technique which reduces the overall space requirement and the number of matrix operations. We provide extensive empirical results on billion-scale Boolean matrices that are the adjacency matrices of web graphs. Our work has significant implications for key problems such as page ranking and itemset mining that use matrix multiplication. (An illustrative sketch follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
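Aside: the abstract above names a row-by-row differential compression scheme without specifying its format. As a hedged stand-in, the sketch below performs Boolean matrix-vector multiplication over a simple sparse row representation (each row a set of column indices); it illustrates multiplying without materializing the dense matrix, not the authors' actual encoding.

```python
from typing import Dict, Set

# Each row stored as the set of column indices holding a 1 (a simple
# sparse stand-in for the paper's differential row compression).
BoolMatrix = Dict[int, Set[int]]

def bool_mat_vec(rows: BoolMatrix, x: Set[int], n_rows: int) -> Set[int]:
    """Boolean matrix-vector product: output bit i is 1 iff row i
    shares at least one set column with the input vector x."""
    return {i for i in range(n_rows) if rows.get(i, set()) & x}

# 3x3 adjacency matrix of a tiny web graph: 0->1, 1->2, 2->0
m: BoolMatrix = {0: {1}, 1: {2}, 2: {0}}
print(bool_mat_vec(m, {1}, 3))  # {0}: only page 0 links to page 1
```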
6. Using Micro-Collections in Social Media to Generate Seeds for Web Archive Collections
- Author
-
Michele C. Weigle, Michael L. Nelson, and Alexander C. Nwala
- Subjects
Computer science, Web archiving, Digital Libraries (cs.DL), Information Retrieval (cs.IR), Social media, World Wide Web, Web page, Web resource, Identifier, Vocabulary, Subject-matter expert
- Abstract
In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events ranging from elections to disease outbreaks. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high-quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but collecting these seeds is time consuming. Two main strategies adopted by curators for discovering seeds include scraping Web (e.g., Google) Search Engine Result Pages (SERPs) and social media (e.g., Twitter) SERPs. In this work, we studied three social media platforms in order to provide insight on the characteristics of seeds generated from different sources. First, we developed a simple vocabulary for describing social media posts across different platforms. Second, we introduced a novel source for generating seeds from URIs in the threaded conversations of social media posts created by single or multiple users. Users on social media sites routinely create and share posts about news events consisting of hand-selected URIs of news stories, tweets, videos, etc. In this work, we call these posts micro-collections, and we consider them an important source for seeds because the effort taken to create micro-collections is an indication of editorial activity and a demonstration of domain expertise. Third, we generated 23,112 seed collections with text and hashtag queries from 449,347 social media posts from Reddit, Twitter, and Scoop.it. We collected in total 120,444 URIs from the conventional scraped SERP posts and micro-collections. We characterized the resultant seed collections across multiple dimensions including the distribution of URIs, precision, ages, diversity of webpages, etc. (An illustrative sketch of seed extraction follows this entry.)
- Comment
- This is an extended version of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2019) full paper. Some figures have been enlarged, and appendices of additional figures included.
- Published
- 2019
- Full Text
- View/download PDF
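Aside: a minimal, hedged sketch of the seed-generation idea from the entry above: collect the unique URIs shared in a threaded conversation (a micro-collection). The regex and input format are illustrative assumptions, not the paper's pipeline.

```python
import re
from typing import Iterable, List

URI_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def extract_seed_uris(posts: Iterable[str]) -> List[str]:
    """Collect unique URIs, in first-seen order, from the text of a
    threaded conversation (a 'micro-collection' of posts)."""
    seen, seeds = set(), []
    for text in posts:
        for uri in URI_PATTERN.findall(text):
            uri = uri.rstrip(".,);")  # strip trailing punctuation
            if uri not in seen:
                seen.add(uri)
                seeds.append(uri)
    return seeds

thread = [
    "Election coverage: https://example.com/story1 and https://example.com/story2",
    "Follow-up: https://example.com/story1 plus https://example.org/analysis",
]
print(extract_seed_uris(thread))
```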
7. Queryable Compression on Time-Evolving Social Networks with Streaming
- Author
-
Michael L. Nelson, Sridhar Radhakrishnan, and Chandra N. Sekharan
- Subjects
Computer science, Theoretical computer science, Matrix (mathematics), Binary tree, Adjacency list, Data structure, Graph
- Abstract
Time-evolving graphs represent a set of individuals (nodes) and their edges (relationships) over time. How these graphs are represented in data structures determines what information is easy to obtain from them. Now that we have such massive social networks with dynamic lifetimes, even basic data structures are too large to fit into main memory. Clearly, this poses a problem to areas such as time-evolving graph pattern analysis. Therefore, it is an interesting field of study to design time-evolving graph compressions that can efficiently answer certain queries about the graph at any given point in time. If a single snapshot of a graph at a moment in time can be considered a 2D matrix, then we can visualize these time-evolving graphs as 3D matrices and use a novel technique to compress the entire graph over time. Our technique is based on our previous work using compressed binary trees. In this work, we adapt our strategy to compress time-evolving graphs, rather than static ones. We maintain our minimal main-memory overhead by not requiring an intermediate structure (e.g., an adjacency list) to compress. This compression is queryable, meaning that the data can be read without decompression. It is also streaming, meaning that the data can be changed without decompression; this includes adding/removing edges in individual frames. We test our algorithms on public, anonymized, massive, time-evolving graphs such as Flickr, Yahoo!, and Wikipedia. Our empirical evaluation is based on several parameters including time to compress, size of the compressed graph, and time to execute queries. Our compression rates are highly competitive: we achieve a representation as small as 4.9 GB on our largest dataset, which spans only three days yet occupies 21.5 GB uncompressed. (An illustrative sketch of the query/update interface follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
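Aside: a hedged sketch of the queryable/streaming interface described in the entry above. The paper works on a compressed binary-tree structure; here each time frame is a plain edge set purely to show point-in-time queries and in-place updates. All class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

class TimeEvolvingGraph:
    """Toy queryable/streaming stand-in: one edge set per time frame.
    (The paper compresses these frames; here they are stored plainly
    to illustrate the query/update interface only.)"""

    def __init__(self) -> None:
        self.frames: Dict[int, Set[Tuple[int, int]]] = defaultdict(set)

    def add_edge(self, t: int, u: int, v: int) -> None:
        self.frames[t].add((u, v))       # streaming insert into frame t

    def remove_edge(self, t: int, u: int, v: int) -> None:
        self.frames[t].discard((u, v))   # streaming delete from frame t

    def has_edge(self, t: int, u: int, v: int) -> bool:
        return (u, v) in self.frames[t]  # point-in-time edge query

g = TimeEvolvingGraph()
g.add_edge(0, 1, 2)
print(g.has_edge(0, 1, 2), g.has_edge(1, 1, 2))  # True False
```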
8. Client-Side Reconstruction of Composite Mementos Using ServiceWorker
- Author
-
Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
- Published
- 2017
- Full Text
- View/download PDF
9. WAIL: Collection-Based Personal Web Archiving
- Author
-
John A. Berlin, Mat Kelly, Michael L. Nelson, and Michele C. Weigle
- Published
- 2017
- Full Text
- View/download PDF
10. Archival Crawlers and JavaScript: Discover More Stuff but Crawl More Slowly
- Author
-
Justin F. Brunelle, Michele C. Weigle, and Michael L. Nelson
- Published
- 2017
- Full Text
- View/download PDF
11. Exploiting topological structures for graph compression based on quadtrees
- Author
-
M. Zerrudo, Michael L. Nelson, Sridhar Radhakrishnan, M. Levan, Amlan Chatterjee, and C. Lanham
- Subjects
Computer science, Theoretical computer science, Data processing, Computation, Big data, Data structure, Computer data storage, Quadtree, Adjacency matrix, Graph property, Data mining
- Abstract
In the age of big data, the need for efficient data processing and computation has been at the forefront of research endeavors. Extracting information from huge data sets requires novel storage techniques that help computing devices perform the necessary computation. With the pervasive use of heterogeneous systems and the advent of non-traditional computing units like GPUs, which have limited memory, data storage has become an especially important concern. Graphs contain a plethora of information and can represent data from a broad range of domains; real-world big data sets are effectively represented by graphs. Efficient graph compression is therefore essential for performing computations on large data sets. Quadtrees, generally used to represent images, can serve as an effective compression technique. Using additional topological information that captures certain patterns in the data sets, further improvements can be made to the space complexity of storing graph data. In this paper we describe algorithms that take the properties of graphs into consideration and perform compression based on quadtrees. The introduced techniques achieve up to 70% compression compared to the adjacency matrix representation; compared to an existing quadtree-based compression method, the proposed algorithms achieve an additional 50% improvement. Techniques both to compress data and to perform queries on the compressed data itself are introduced and discussed in detail. (An illustrative sketch of quadtree compression follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
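Aside: a minimal sketch of quadtree compression of a Boolean adjacency matrix, assuming the common variant in which a uniform quadrant (all zeros or all ones) collapses to a single leaf. The paper's topology-aware refinements are not shown.

```python
from typing import List, Union

Quad = Union[int, List["Quad"]]  # leaf bit (0/1) or four child quadrants

def build_quadtree(m: List[List[int]], r: int, c: int, size: int) -> Quad:
    """Compress a size x size Boolean submatrix: a quadrant that is all
    zeros (or all ones) collapses to a single leaf; otherwise recurse
    into its four quadrants (NW, NE, SW, SE)."""
    bits = {m[i][j] for i in range(r, r + size) for j in range(c, c + size)}
    if len(bits) == 1:
        return bits.pop()
    h = size // 2
    return [build_quadtree(m, r, c, h),     build_quadtree(m, r, c + h, h),
            build_quadtree(m, r + h, c, h), build_quadtree(m, r + h, c + h, h)]

# 4x4 adjacency matrix with an empty lower half: both bottom quadrants
# collapse to single 0 leaves.
adj = [[0, 1, 0, 0],
       [1, 0, 0, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
print(build_quadtree(adj, 0, 0, 4))  # [[0, 1, 1, 0], [0, 0, 0, 1], 0, 0]
```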
12. On compressing massive streaming graphs with Quadtrees
- Author
-
Chandra N. Sekharan, Amlan Chatterjee, Sridhar Radhakrishnan, and Michael L. Nelson
- Subjects
Computer science, Theoretical computer science, Data stream, Graph database, Graph (abstract data type), Quadtree, Data compression, Graph
- Abstract
Social networks are constantly changing as new members join, existing members leave, and ‘followers’ or ‘friends’ are formed and disappear. The model that captures this constantly changing graph is the streaming graph model. Given a massive graph data stream wherein the number of nodes is in the order of millions and the number of edges is in the tens of millions, we propose a simple algorithm to compress this graph without reading the entire graph into main memory. Our algorithm uses the quadtree data structure, which is implicitly constructed to produce the compressed graph output. As a result of this implicit construction, our algorithm allows node and edge additions/deletions that directly modify the output compressed graph. We further develop algorithms to solve edge queries (is there an edge between two nodes?) and node queries (for a given node, list all its neighbors) that operate directly on the compressed graph. We have performed extensive empirical evaluations of our algorithms using publicly available, large social networks such as LiveJournal, Pokec, Twitter, and others. Our empirical evaluation is based on several parameters including time to compress, memory required by the compression algorithm, size of the compressed graph, and time and memory required to execute queries. We also present extensions to the compression algorithm we have developed. (An illustrative sketch of querying the compressed form follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
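Aside: complementing the quadtree sketch after entry 11, this hedged sketch answers an edge query directly on the compressed tree, descending only the quadrants that cover the queried cell. The tree encoding and names are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Union

Quad = Union[int, List["Quad"]]  # same shape as the sketch after entry 11

def has_edge(node: Quad, size: int, u: int, v: int) -> bool:
    """Edge query answered directly on the compressed quadtree: descend
    toward cell (u, v), stopping early at any uniform leaf."""
    while not isinstance(node, int):
        size //= 2
        node = node[(u >= size) * 2 + (v >= size)]  # pick NW/NE/SW/SE child
        u, v = u % size, v % size
    return node == 1

tree = [[0, 1, 1, 0], [0, 0, 0, 1], 0, 0]  # the 4x4 example from above
print(has_edge(tree, 4, 0, 1), has_edge(tree, 4, 3, 3))  # True False
```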
13. When should I make preservation copies of myself?
- Author
-
Charles L. Cartledge and Michael L. Nelson
- Published
- 2014
- Full Text
- View/download PDF
14. Not all mementos are created equal: Measuring the impact of missing resources
- Author
-
Michael L. Nelson, Justin F. Brunelle, Mat Kelly, Michele C. Weigle, and Hany M. SalahEldeen
- Subjects
Computer science, Information retrieval, Library and Information Sciences, World Wide Web, The Internet, Web crawler, JavaScript, User perception
- Abstract
Web archives do not always capture every resource on every page that they attempt to archive. This results in archived pages missing a portion of their embedded resources. These embedded resources have varying historic, utility, and importance values. The proportion of missing embedded resources does not provide an accurate measure of their impact on the Web page; some embedded resources are more important to the utility of a page than others. We propose a method to measure the relative value of embedded resources and assign a damage rating to archived pages as a way to evaluate archival success. In this paper, we show that Web users’ perceptions of damage are not accurately estimated by the proportion of missing embedded resources. In fact, the proportion of missing embedded resources is a less accurate estimate of resource damage than a random selection. We propose a damage rating algorithm that aligns more closely with Web user perception, improving overall agreement with users on memento damage by 17%, and by 51% if the mementos have a damage rating delta > 0.30. We use our algorithm to measure damage in the Internet Archive, showing that it is getting better at mitigating damage over time (going from a damage rating of 0.16 in 1998 to 0.13 in 2013). However, we show that a greater number of important embedded resources (2.05 per memento on average) are missing over time. Alternatively, the damage in WebCite is increasing over time (going from 0.375 in 2007 to 0.475 in 2014), while the missing embedded resources remain constant (13% of the resources are missing on average). Finally, we investigate the impact of JavaScript on the damage of the archives, showing that a crawler that can archive JavaScript-dependent representations will reduce memento damage by 13.5%. (An illustrative sketch of a weighted damage rating follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
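Aside: a hedged sketch of the weighting idea behind a damage rating: missing embedded resources count in proportion to their importance rather than uniformly. The per-type weights below are hypothetical placeholders; the paper derives importance from the resources themselves, not from a fixed type table.

```python
from typing import Iterable, Tuple

# Hypothetical importance weights by resource type, for illustration only.
WEIGHTS = {"image": 1.0, "css": 2.0, "js": 1.5, "multimedia": 3.0}

def damage_rating(resources: Iterable[Tuple[str, bool]]) -> float:
    """Weighted fraction of missing embedded resources, from 0 (intact)
    to 1 (every important resource missing)."""
    total = missing = 0.0
    for rtype, is_missing in resources:
        w = WEIGHTS.get(rtype, 1.0)
        total += w
        if is_missing:
            missing += w
    return missing / total if total else 0.0

# 2 of 4 resources missing, but the weighted rating is 0.545, not 0.5,
# because the missing stylesheet matters more than a missing image.
page = [("image", False), ("css", True), ("js", False), ("image", True)]
print(round(damage_rating(page), 3))  # 0.545
```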
15. An experimental comparison of hierarchical and subsumption software architectures for control of an autonomous underwater vehicle
- Author
-
Ronald B. Byrnes, Robert B. McGhee, D.L. MacPherson, Se-Hung Kwak, and Michael L. Nelson
- Subjects
Computer science, Programming language, Software, Software design, Control theory, Backward chaining, Forward chaining, Prolog
- Abstract
A three-level hybrid architecture is used to model both a hierarchical and a subsumption controller for an autonomous underwater vehicle (AUV). The hierarchical model uses a backward chaining language (PROLOG) while the subsumption model uses a forward chaining language (CLIPS). The details of the backward chaining hierarchical implementation of the strategic level of mission control are first presented. This is followed by a similar description of the functionally equivalent subsumption controller. Experimental results and the advantages and disadvantages of each approach are discussed. CLIPS and PROLOG ran virtually even in terms of execution time. Also, the total run time of the experiment was dominated by the graphical simulator. A repeat of the experiment in which the controllers were decoupled from the simulator resulted in execution times between three and four seconds. This reinforces the view of PROLOG and CLIPS as viable language alternatives for the mission control of complex systems. However, PROLOG is the more concise of the two languages and is easier to read.
- Published
- 2003
- Full Text
- View/download PDF
16. Evaluating the Plus Program: a hybrid on-line/conventional classroom approach
- Author
-
Michael L. Nelson and D. Rice
- Subjects
Computer science, Computers and education, Blended learning, Mathematics education, Educational technology, Just in Time Teaching, Open learning, Experiential learning, Multimedia
- Abstract
The Plus Program combines on-line and independent learning with the conventional classroom approach. The typical Plus Program class meets every other week during the semester. This allows for the maximization of both classroom and instructor resources, and also allows both the student and the instructor to spend fifty percent less time in the classroom. It is ideal for those students and subjects that are not prime candidates for on-line learning, but that also do not require the full-time conventional classroom approach. The Plus Program is now in its second semester and has been generally successful. Primary challenges include identifying students who will most benefit from and be successful with this approach, identifying courses that can best be adapted to this approach, and identifying and/or developing appropriate resources for students to use outside of the classroom.
- Published
- 2002
- Full Text
- View/download PDF