Data Cleaning Framework for Healthcare Applications

Bonfring International Journal of Research in Communication Engineering, Volume 1, Inaugural Special Issue, 2011
Bonfring International Journal of Research in Communication Engineering, Vol. 1, Special Issue, December 2011. ISSN 2250-110X. © 2011 Bonfring
Abstract: RFID technologies are used in a wide range of applications, from traditional ones such as access control, electronic toll collection and e-ID documents to newer ones such as asset management, baggage handling, cargo tracking and security, contactless payment and ticketing, supply chain management and healthcare. Of these, medical and healthcare applications are especially important, because even minor errors can cause heavy financial and personal losses. Data captured by RFID readers often contains errors, including false negatives, false positives and duplicates. To provide reliable data to RFID applications, the collected data must be cleaned. In this paper we suggest physical solutions for missed readings, middleware solutions for anomalies arising within the reader and, finally, rule-based solutions for correcting anomalies that already exist in the database. Drawbacks of these methodologies are also discussed and some solutions are suggested. With the aid of the proposed data cleaning technique we can reduce healthcare costs, optimize business processes, streamline patient identification and improve patient safety. The security and privacy issues of RFID and their solutions are also discussed.
Keywords: RFID, Data Observations, Granularity, Antenna
I. INTRODUCTION
Radio Frequency Identification (RFID) is a means of identifying and tracking objects using radio frequency transmission. It is far more advanced than traditional barcodes. An RFID system is composed of three components: an interrogator (reader), passive tag(s) and a host, as shown in Figure 1.1. Of the three types of tags (passive, active and semi-passive), passive tags are in the greatest demand because of their low system cost and long life.
A. Anne Leema, Karpagam University, Coimbatore; Dr. M. Hemalatha, Karpagam University, Coimbatore.
The tag is composed of an antenna coil and a silicon chip that includes basic modulation circuitry and non-volatile memory. The tag is energized by a time-varying electromagnetic radio frequency (RF) wave transmitted by the reader. When the RF field passes through the antenna coil, an AC voltage is generated across the coil, which is rectified to power the tag. Using backscattering, the tag transmits its ID to the reader, which detects the backscattered signal and demodulates it to retrieve the tag's ID. A reader can also be fitted with an additional interface to transmit its stored data to a computer or a programmable logic controller. An RFID system allows great freedom of movement and includes hardware, middleware and software components. The technology is widely used in diverse applications such as supply chain automation, asset tracking, medical/healthcare applications, people tracking, manufacturing, retail, warehouses and livestock timing.
II. SYSTEM ARCHITECTURE
The architecture of an RFID system contains four important components (Chawathe et al., 2004): the RFID tag, the RFID reader, the RFID middleware and the database storage, as depicted in Figure 2. The RFID tag is the simplest, lowest-level component of the architecture. Tags come in three types: passive, semi-passive and active. Passive tags are the most error-prone but, because they need no battery, also the most cost-effective and long-lasting: electromagnetic pulses emitted by the readers give a passive tag enough energy to transmit its identification back. The semi-passive tag, in comparison, has a battery, but uses it only to extend its read range, which results in a shorter life-span but increased observation integrity.
The final type is the active tag, which uses its battery not only to extend its range but also to transmit its identification number. Because of its heavy reliance on the battery, the active tag has the highest cost and the shortest life-span of all the tags currently available (Chawathe et al., 2004). RFID readers are the devices that record tag identifiers and attach a timestamp to each observation. The data transmitted through the antenna (RF wave) is then received by the RFID system PC, which acts as a middleware communication gateway between items, readers and the system database. This middleware, also called the Savant or Edge System, is where raw RFID readings are cleaned and filtered to make the data more application-friendly. It receives the information passed from the readers and applies techniques such as anti-collision and smoothing algorithms to correct simple missing and duplicate anomalies (Jeffery et al., 2006; Shih et al., 2006). The filtered observation records, each containing the tag and reader identifiers along with the timestamp at which the reading was taken, are then passed on to the database storage, where they are kept in RFID databases for fault checking and further processing.
Figure 2: Architecture of RFID
The final destination of all observation records is a collection of readings taken from all connected RFID readers. This component, known as the database storage, holds all information streamed from the readers. Because tags are interrogated continuously so that every tag is read at all times, this can result in massive floods of data, for example 7 TB of data generated daily (Schuman, 2005).
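The smoothing step performed by the middleware can be illustrated with a fixed-window filter. The sketch below is a minimal illustration, not the specific algorithm of the works cited above: it assumes readings arrive as hypothetical (EPC, epoch) pairs, collapses duplicate reads within one epoch, and reports a tag as present in every epoch that lies within `window` epochs after one of its reads, which fills short gaps caused by missed reads. The function name `smooth_readings` and the `window` parameter are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def smooth_readings(readings: Iterable[Tuple[str, int]], window: int) -> Dict[str, List[int]]:
    """Fixed-window smoothing of raw (epc, epoch) reads (hypothetical format).

    A tag is reported present at epoch t if it was read at least once in
    [t - window + 1, t]. Returns EPC -> sorted list of epochs where present.
    """
    if window < 1:
        raise ValueError("window must be at least 1 epoch")

    epochs_by_tag = defaultdict(set)
    for epc, epoch in readings:
        epochs_by_tag[epc].add(epoch)  # a set collapses duplicate reads in one epoch

    smoothed = {}
    for epc, epochs in epochs_by_tag.items():
        present = set()
        for e in epochs:
            # A read at epoch e covers epochs e .. e + window - 1,
            # so a gap shorter than the window is filled in.
            present.update(range(e, e + window))
        smoothed[epc] = sorted(present)
    return smoothed
```

For example, with `window=2`, reads of T1 at epochs 0, 1 and 3 report T1 as present at epochs 0 through 4, so the missed read at epoch 2 is filled. A larger window fills longer gaps but also keeps a tag "present" for longer after it has actually left, which is the usual trade-off when choosing the window size.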
A software architecture is therefore required to collect, filter and organize this data and to answer online queries. Storing all information in a central database also enables higher-level processes such as data cleaning, data mining and analytical evaluation.
A. Format of Data Observations
The record stored in the database after a tag has been read consists of three primary pieces of information: the Electronic Product Code, the reader identifier and the timestamp of the reading. The Electronic Product Code (EPC) is a unique identification number, introduced by the Auto-ID Center, that is assigned to each RFID tag. It is a 96-bit code, conventionally written as 24 hexadecimal characters (digits and letters). As seen in Figure 3, it is made up of an 8-bit Header, a 28-bit EPC Manager, a 24-bit Object Class and a 36-bit Serial Number.
B. Characteristics of RFID Data
RFID applications generate ten to a hundred times the data volume of typical barcode applications, so the resulting databases are much larger. Managing such a high volume of data poses challenges both for applications and for back-end databases. Even though filtering is performed at the edge, a significant portion of incorrect data is still inserted into the database, causing inconsistencies. RFID data has the following characteristics:
C. Temporal and Dynamic
RFID applications are mostly temporally oriented: every event is associated with the timestamp at which it happened. Observations may represent different semantics, including i) location changes, ii) aggregation, iii) start or end of operations and iv) occurrences of new events.
D. Inaccuracy of Data
A non-existent tag may be incorrectly read (false positive reads), or a reader may miss tags that were in its vicinity (false negatives). A reader may also read the same tag more than once.
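The 8/28/24/36-bit EPC layout and the three-part observation record described above can be unpacked with shifts and masks. A minimal sketch, assuming the EPC is supplied as a 24-character hexadecimal string; `EPC96`, `Observation` and `decode_epc96` are hypothetical names, and the header-dependent encoding schemes of the EPC standards are deliberately ignored, so the layout is treated as fixed.

```python
from datetime import datetime
from typing import NamedTuple

# (field name, width in bits) in the order given in the text: 8 + 28 + 24 + 36 = 96.
EPC96_FIELDS = (("header", 8), ("manager", 28), ("object_class", 24), ("serial", 36))


class EPC96(NamedTuple):
    header: int
    manager: int
    object_class: int
    serial: int


class Observation(NamedTuple):
    """One stored reading: the three primary pieces of information."""
    epc: str             # 96-bit EPC written as 24 hexadecimal characters
    reader_id: str
    timestamp: datetime


def decode_epc96(hex_code: str) -> EPC96:
    """Split a 96-bit EPC written as 24 hex characters into its four fields."""
    if len(hex_code) != 24:
        raise ValueError("expected 24 hexadecimal characters (96 bits)")
    value = int(hex_code, 16)
    fields = []
    shift = 96
    for _name, width in EPC96_FIELDS:
        shift -= width  # number of bits that lie to the right of this field
        fields.append((value >> shift) & ((1 << width) - 1))  # mask off this field
    return EPC96(*fields)
```

For instance, `decode_epc96("35123456789ABCD000000001")` yields header 0x35, manager 0x1234567, object class 0x89ABCD and serial 1. Because every field width is a multiple of 4 bits, each field also corresponds to a whole run of hex digits, which makes the decoded values easy to check by eye.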
Such erroneous data must be semantically filtered.
E. Continuous Streaming
The number of RFID tags is proportional to the number of items being tracked, and the number of readers is proportional to the number of strategic areas covered. In typical scenarios, tagged objects stay in place for long periods while readers record their presence continuously at periodic intervals. A tuple is inserted into the database each time a tag is read, so these simple observations keep piling up in the database and produce redundancy. This continuous stream must be filtered, and some form of compression is also required to reduce the data volume without loss of information.
F. Granularity
Granularity depends on the application for which the RFID system is implemented. For example, if the system is deployed at an airport, the unit of granularity is the piece of luggage.
G. RFID in Healthcare
RFID in medical and healthcare applications is especially important, because even minor errors can cause heavy financial and personal losses. For hospitals and healthcare systems, increasing operational efficiency is the primary target, and it is a tough task to maintain effectiveness while monitoring every patient [6]. Using RFID technology, in addition to reducing healthcare costs, facilitates automating and streamlining patient identification in hospitals, the use of mobile devices such as PDAs and smartphones, and the design of healthcare management systems [5]. RFID technology can become an essential part of healthcare, playing a vital role in all of its sub-domains, and it dominates in tracking patients under treatment. However, the RFID data obtained from the readers may contain errors and redundancies.
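The compression called for under Continuous Streaming can be sketched by collapsing runs of periodic readings into stay records (tag, reader, time in, time out). This is one common approach rather than the method of this paper, and it rests on stated assumptions: readings are hypothetical (EPC, reader ID, time) tuples with integer times, and `max_gap` is a hypothetical tolerance for how far apart two consecutive readings of a tag at the same reader may be and still belong to one stay.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Stay(NamedTuple):
    epc: str
    reader_id: str
    time_in: int
    time_out: int


def compress(readings: Iterable[Tuple[str, str, int]], max_gap: int) -> List[Stay]:
    """Collapse periodic (epc, reader_id, time) readings into stay records.

    A reading extends the tag's most recent stay only if it comes from the
    same reader and no more than max_gap time units after that stay ended;
    otherwise it opens a new stay.
    """
    stays: List[Stay] = []
    last = {}  # epc -> index in `stays` of that tag's most recent stay
    for epc, reader_id, t in sorted(readings, key=lambda r: r[2]):
        idx = last.get(epc)
        if (idx is not None and stays[idx].reader_id == reader_id
                and t - stays[idx].time_out <= max_gap):
            stays[idx] = stays[idx]._replace(time_out=t)  # extend the current stay
        else:
            last[epc] = len(stays)
            stays.append(Stay(epc, reader_id, t, t))  # start a new stay
    return stays
```

With `max_gap=10`, readings of T1 at R1 at times 0, 10 and 20, at R2 at time 30, and at R1 again at time 100 compress from five tuples to three stays. Because a stay is extended only while the tag's most recent reading came from the same reader, a tag read alternately by two overlapping readers would be split into many short stays; that situation is a duplicate anomaly and is better removed by a duplicate filter first.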
Effective cleaning of RFID data in the healthcare sector remains a concern, even though a number of works on the subject are available in the literature.
H. Significance of Data Quality
Data quality has become increasingly important to many firms as they build data warehouses and focus more on customer relationship management. For healthcare organizations, data is central both to effective care and to financial survival. Data quality is concerned with accuracy, precision and timeliness, and is best defined as "fitness for use" [18]. The Institute of Medicine shocked the public with a report that as many as 98,000 people die every year due to medical errors [8]. Some of these errors result from missing or bad information about drugs, orders and treatments. Poor data quality has adverse effects at the operational, tactical and strategic levels of an organization.
I. RFID Anomalies
The following table and figure depict the three types of errors. Wrong readings, also known as false positives, are observations of tags in the data store that were not physically present at that location or time. They may be produced when tags outside the normal reader range are captured, or when there is a problem with the environmental setup. Duplicate readings occur when an RFID tag is recorded more than once in the database instead of just once. Like wrong readings, duplicates fall into the category of false positive observations, since they record data that does not accurately depict reality. They may occur, for example, when more than one reader covers an area and a tag passes through the overlapping region, or when a scanned item stays within a reader's range for a long period of time.
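A simple rule for the two duplicate situations just described (overlapping readers and long stays) is to keep a reading only if the same tag has not already been kept within a short time threshold. A minimal sketch, assuming readings are hypothetical (EPC, timestamp, reader ID) records and that the threshold is chosen per deployment; `Reading` and `remove_duplicates` are illustrative names. When several readers see a tag at the same instant, the reading listed first is kept, which is an arbitrary tie-break.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Reading(NamedTuple):
    epc: str
    timestamp: datetime
    reader_id: str


def remove_duplicates(readings: List[Reading], threshold: timedelta) -> List[Reading]:
    """Keep a reading only if the same tag was not already kept within `threshold`.

    sorted() is stable, so readings with equal timestamps stay in input order
    and the first listed reader wins. A tag that stays in range is kept at most
    once per threshold interval.
    """
    last_kept = {}  # epc -> timestamp of the last reading that was kept
    cleaned = []
    for r in sorted(readings, key=lambda r: r.timestamp):
        prev = last_kept.get(r.epc)
        if prev is None or r.timestamp - prev > threshold:
            cleaned.append(r)
            last_kept[r.epc] = r.timestamp
    return cleaned
```

Applied to the five recorded rows of the table below with a threshold of a few seconds, this keeps T1 at R1, T2 at R3 and T3 at R3. It cannot produce the T4 reading among the values that should be recorded, because a missed reading calls for a different kind of rule.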
Missed readings, also known as false negative observations, refer to tagged objects that were not scanned although they were actually present. A missed read occurs when the object is outside the scanning range.
Recorded value
Tag EPC   TimeStamp             Reader ID
T1        13/11/2011 14:31:05   R1
T2        13/11/2011 14:31:05   R3
T3        13/11/2011 14:31:05   R3
T3        13/11/2011 14:31:05   R4
T3        13/11/2011 14:31:05   R5
Value to be Recorded
Tag EPC   TimeStamp             Reader ID
T1        13/11/2011 14:31:05   R1
T2        13/11/2011 14:31:05   R3
T3        13/11/2011 14:31:05   R3
T4        13/11/2011 14:31:05   R5
1. Service Rate (μ): the number of tags identified per second can be calculated as,
2. Arrival Rate (λ): the average number of tags arriving per second can be estimated as,
3. Steady state versus explosive state: the system is said to be stable only if λ < μ, that is, only if tags arrive more slowly than they can be identified.
REFERENCES
[1] Belal Chowdhury and Rajiv Khosla, "RFID-based Hospital Real-time Patient Management System," in Proceedings of the 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), pp. 363-368, 2007.
[2] Sudarshan S. Chawathe, Venkat Krishnamurthy, Sridhar Ramachandran and Sanjay Sarma, "Managing RFID Data," in Proceedings of the 30th VLDB Conference, pp. 1189-1195, 2004.
[3] Angela M. Wicks, John K. Visich and Suhong Li, "Radio Frequency Identification Applications in Hospital Environments," Heldref Publications, Vol. 84, No. 3, pp. 3-9, 2006.
[4] Hector Gonzalez, Jiawei Han, Hong Cheng, Xiaolei Li, Diego Klabjan and Tianyi Wu, "Modeling Massive RFID Data Sets: A Gateway-Based Movement Graph Approach," IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 1, pp. 90-104, Jan. 2010.