Liu Yu

DfounderLiu

Reviews on Knowledge Graph Research

NETWORK-COMPUTATION IN NEURAL SYSTEMS(NCNS)

With the arrival of the big data era, questions such as how to rapidly retrieve useful knowledge from massive data and how to describe the concepts of the objective world and their relationships in a structured manner have gained increasing attention ...

Research on Random Template Anti-crawler Strategy for MTV Development Pattern

IEEE Access

A web crawler is a program or script that automatically collects information from the Internet. Web crawlers can usually be divided into regular crawlers and irregular crawlers. A regular crawler obtains data in accordance with the website's data capture protocol, whereas an irregular crawler, also known as a malicious crawler, is mainly used to steal data illegally ...

Review of the Classification of Massive Chinese Texts Based on Spark

EITCE 2018

As the Internet develops rapidly, the volume of text grows with it. Emails, online novels and other literary content, news reports, personal blogs, Weibo posts, and comments all add to the amount of text continuously. However, most of this data is neither classified nor processed, which leaves a great deal of spam, junk information, meaningless articles, and advertisements. Such content not only consumes Internet resources but also degrades users' online experience and reduces their work and study efficiency. It is therefore vital to classify large amounts of text accurately, judge its nature from the classification result, and handle it accordingly. This paper reviews the classification of massive texts based on the Spark framework.

Research on General Programming Environment Technology Based on Web

EITCE 2018

As MOOCs develop, many users have come to like and get used to learning programming on MOOC platforms, so virtual online experiments, also known as online IDEs or online programming environments, have become a hot spot in the online programming education industry. Compared with a traditional programming environment, an online programming environment does not require a complicated local setup, which makes it easy to use. It can also be integrated into a MOOC for learning and combined with an online judge (OJ) for evaluation. However, existing online programming environments usually support only a few programming languages, such as C and Java, and some online platforms offer online compiling for a single language only. To address this, this paper proposes a general online programming solution and, based on that idea, builds a platform that supports multi-language online compiling through simple code and the construction of a server environment.
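One way to picture the multi-language idea is a table that maps each language to its build and run commands, with a single function that writes the submitted source to a temporary directory and executes it. The sketch below is only an illustration under stated assumptions: the `LANGUAGES` table, the `run_code` function, and the choice of `gcc` for C are hypothetical and are not taken from the paper, and a real service would also need sandboxing and resource limits.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical language table: source file name plus the commands to build and run it.
# "{src}" and "{exe}" are placeholders that run_code fills in.
LANGUAGES = {
    "python": {"file": "main.py", "build": None, "run": [sys.executable, "{src}"]},
    "c": {"file": "main.c", "build": ["gcc", "{src}", "-o", "{exe}"], "run": ["{exe}"]},
}


def run_code(language, source, stdin="", timeout=5):
    """Write the source to a temporary directory, build it if needed, and run it."""
    spec = LANGUAGES[language]
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / spec["file"]
        exe = Path(workdir) / "program"
        src.write_text(source, encoding="utf-8")

        def fill(command):
            # Substitute the real paths for the placeholders.
            return [part.format(src=src, exe=exe) for part in command]

        if spec["build"]:
            build = subprocess.run(fill(spec["build"]), capture_output=True,
                                   text=True, timeout=timeout)
            if build.returncode != 0:
                # Compilation failed: report the compiler messages to the user.
                return {"ok": False, "stdout": "", "stderr": build.stderr}

        result = subprocess.run(fill(spec["run"]), input=stdin, capture_output=True,
                                text=True, timeout=timeout)
        return {"ok": result.returncode == 0,
                "stdout": result.stdout, "stderr": result.stderr}
```

Under this design, supporting another language means adding one entry to the table rather than writing new server code, which is one plausible reading of "general" in the abstract.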

Matlab Programming Environment Based on Web

ITOEC 2018

Matlab is mathematics software and the industry leader in numerical calculation, but it takes up a lot of system memory and is extremely cumbersome to install. As modeling of various kinds becomes more common and disciplines increasingly penetrate one another, Matlab has gradually come to the attention of more people. In order to simplify the programming environment and improve the convenience ...

The design of the majority voting algorithm based on search engine for the text copyright detection crawler

PGMEE 2018

A web crawler is a program or script that automatically collects information from the Internet. Web crawlers can usually be divided into regular crawlers and irregular crawlers. A regular crawler obtains data in accordance with the website's data capture protocol, whereas an irregular crawler, also known as a malicious crawler, is mainly used to steal data illegally ...

Research on the Rational Allocation of Classroom Resources Based on a Genetic Algorithm

计算机与网络 (Computer & Network)

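Efficient use of classroom resources improves both the utilization of teaching resources and teachers' teaching efficiency. With the rise of the "walking class" (走班制) teaching model, the traditional fixed teaching approach organized around class units has become outdated, and manual scheduling of classrooms and courses can no longer meet the needs of the new model. To address the classroom allocation problem, a classroom allocation method based on a genetic algorithm is proposed. Starting from constraints on students, teachers, classrooms, time, and courses, the method optimizes combinations of different objectives under specific conditions and reasonably reconciles hard and soft constraints.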
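To make the interplay of hard and soft constraints concrete, here is a minimal genetic-algorithm sketch for a toy room-allocation problem. Everything in it is an assumption made for illustration: the course and room data, the two hard constraints (no double-booked room, class must fit), the single soft constraint (few empty seats), and the penalty weights are hypothetical and are not the paper's model.

```python
import random

# Hypothetical data: (course name, enrolment) and (room name, capacity).
COURSES = [("Math", 40), ("Physics", 30), ("Chemistry", 25), ("English", 45)]
ROOMS = [("R101", 50), ("R102", 30)]
SLOTS = 2  # number of teaching periods available


def cost(chromosome):
    """Lower is better. Each gene is a (room_index, slot) pair for one course."""
    penalty = 0
    seen = set()
    for (_, size), (room, slot) in zip(COURSES, chromosome):
        capacity = ROOMS[room][1]
        if (room, slot) in seen:   # hard constraint: room already booked in this slot
            penalty += 1000
        seen.add((room, slot))
        if size > capacity:        # hard constraint: class does not fit in the room
            penalty += 1000
        else:                      # soft constraint: prefer few empty seats
            penalty += capacity - size
    return penalty


def random_gene(rng):
    return (rng.randrange(len(ROOMS)), rng.randrange(SLOTS))


def evolve(generations=200, pop_size=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [[random_gene(rng) for _ in COURSES] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]      # truncation selection, best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(COURSES))   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:       # mutate one randomly chosen gene
                child[rng.randrange(len(child))] = random_gene(rng)
            children.append(child)
        population = parents + children
    return min(population, key=cost)
```

Giving hard-constraint violations a penalty far larger than any soft cost means a feasible timetable always scores better than an infeasible one, while the soft term still ranks the feasible timetables among themselves.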

Analysis of the Flight Delay Problem Based on Petri Nets

信息与电脑(理论版) (Information & Computer, Theory Edition)

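A flight delay is a case in which a flight's landing time (the actual in-block time on arrival) is more than 15 minutes later than its planned landing time (the time in the flight schedule), or in which the flight is cancelled. Flight delays are taken very seriously by every country and every airline, because they affect not only passengers' travel plans and mood but also airline efficiency and even a country's reputation. Through basic data collection and systematic organization, the author carries out a hierarchical analysis, builds a colored Petri net model and a timed Petri net model, and uses chain analysis and propagation analysis to produce an optimized model solution. The paper comprehensively analyzes whether China's flight delay problem is the most severe worldwide and what causes flight delays, and finally proposes improvement measures.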

Crawler Identification Technology Based on the Decision Tree Algorithm

软件 (Software)

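A web crawler is a program or script that automatically captures World Wide Web information according to certain rules [1]. In practice, crawlers are further divided into regular and irregular crawlers. A regular crawler obtains website information and data through legitimate channels and means, whereas an irregular crawler, also called a malicious crawler, is mainly used to steal data illegally, add load to website servers, and pry into sensitive information. This paper designs a new crawler detection technique based on the decision tree algorithm and, based on the detection results, provides anti-crawler mechanisms such as blocking malicious crawlers, thereby protecting websites, servers, and some data and information and reducing the overlapping of Internet resources.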
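As a rough illustration of decision-tree-based crawler detection, the sketch below grows a small CART-style tree using Gini impurity and classifies sessions as "crawler" or "human". The two features (requests per minute and the share of requests that fetch static assets), the sample sessions, and all function names are hypothetical assumptions, not the features or data used in the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) of the best split, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """A leaf is a label; an inner node is (feature_index, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature has two distinct values left
        return majority
    _, f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical sessions: (requests per minute, share of requests fetching static assets).
SESSIONS = [(120, 0.02), (300, 0.00), (90, 0.05), (200, 0.01),
            (6, 0.60), (12, 0.55), (4, 0.70), (9, 0.65)]
LABELS = ["crawler"] * 4 + ["human"] * 4

tree = build_tree(SESSIONS, LABELS)
```

A session predicted as "crawler" could then be passed to a blocking step; how the paper's anti-crawler mechanism acts on the prediction is not specified in the abstract.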

Research on Deep Web Crawlers Based on Scrapy

软件 (Software)

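With the arrival of the big data era, web crawlers have become a very common technology; whether for projects, research, start-ups, or writing papers, obtaining large amounts of data and analyzing it is indispensable. However, the Deep Web currently holds hundreds or even thousands of times as much data as the Surface Web. Traditional crawlers that collect Surface Web data can no longer meet our needs, and because Deep Web data usually lacks complicated tag structures, it is clearer and cleaner. It is therefore necessary to study Deep Web crawlers in depth. This paper studies Deep Web crawlers using Python's Scrapy framework, formulates a suitable Scrapy crawling strategy by analyzing the characteristics of the Deep Web, and finally verifies the chosen strategy through practical experiments.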

A Study, Based on Factor Analysis, of How the Changchun-Jilin-Hunchun High-Speed Railway Drives Tourism Economic Development

信息与电脑(理论版) (Information & Computer, Theory Edition)

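Eight factors influence tourists' choice of destination: transportation, food and accommodation, spending, entertainment facilities, opinions of relatives and friends, spatial distance, tourism atmosphere, and the reputation of the scenery. After scoring each factor and computing with Matlab, the overall KMO value is 0.65, which is greater than 0.5. This indicates that the indicators are correlated and suitable for factor analysis, so factor analysis is then used to study how the Changchun-Jilin-Hunchun high-speed railway drives tourism economic development.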
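For reference, the overall Kaiser-Meyer-Olkin (KMO) measure is commonly defined as below. The notation is an assumption introduced here, not the paper's: $R = (r_{ij})$ is the correlation matrix of the eight factor scores, $A = (a_{ij})$ is its inverse, and $p_{ij}$ is the partial correlation between indicators $i$ and $j$.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}},
\qquad
p_{ij} = -\frac{a_{ij}}{\sqrt{a_{ii}\, a_{jj}}},
\qquad
A = (a_{ij}) = R^{-1}
```

The value lies between 0 and 1 and approaches 1 when the partial correlations are small relative to the ordinary correlations, which is why a value of 0.65, above the 0.5 cutoff used in the abstract, is read as support for applying factor analysis.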