
PointAR: Efficient Lighting Estimation for Mobile Augmented Reality

An analysis of PointAR, an efficient pipeline for spatially varying lighting estimation on mobile devices using point clouds and spherical harmonics.

1. Introduction

This paper addresses the challenges of lighting estimation for mobile Augmented Reality (AR) in indoor environments. Rendering virtual objects requires accurate lighting information at the specific location where the object is placed. Commodity mobile phones lack 360° cameras, so the lighting cannot be captured directly. The task is further complicated by three main constraints: 1) estimating lighting at a rendering location that differs from the camera viewpoint, 2) inferring lighting outside the camera's limited field of view (FoV), and 3) running the estimation fast enough to keep up with rendering frame rates.

Existing learning-based methods [12,13,25] are typically large, computationally complex, and poorly suited to mobile deployment. PointAR is presented as an efficient alternative that splits the problem into a geometry-aware view transformation and a point-cloud-based learning module, which greatly reduces complexity while maintaining accuracy.

2. Methodology

2.1. Problem Formulation & Pipeline Overview

The goal of PointAR is to estimate the 2nd order Spherical Harmonics (SH) coefficients that represent the incident lighting at a target location within a single RGB-D image. The input is a single RGB-D frame and a 2D pixel coordinate. The output is a vector of SH coefficients (e.g., 27 coefficients for 2nd order RGB). The pipeline has two main stages:

  1. Geometry-Aware View Transformation: Converts the camera-centered point cloud into a representation centered on the target location.
  2. Point-Cloud-Based Learning: A neural network processes the transformed point cloud to predict the SH coefficients.

2.2. Geometry-Aware View Transformation

Rather than using a neural network to learn spatial relationships implicitly (as in [12,13]), PointAR uses an explicit mathematical model. Given the camera intrinsics and the depth map, a 3D point cloud is generated. For the target pixel $(u, v)$, its 3D location $P_{target}$ is computed. The whole point cloud is then translated so that $P_{target}$ becomes the new origin. This step addresses the spatial-variance challenge directly by aligning the coordinate system with the rendering location, giving the learning module a geometrically consistent input.
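The transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes a standard pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, a depth map in metres where 0 marks a missing reading, and a pure translation to the target point. The function names are hypothetical.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into camera-space points of shape (H, W, 3).

    Assumes a pinhole camera: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def recenter_on_target(points, target_uv):
    """Translate the cloud so the 3D point behind pixel (u, v) becomes the origin."""
    u, v = target_uv
    p_target = points[v, u]                 # rows are indexed by v, columns by u
    if p_target[2] <= 0:
        raise ValueError("no depth reading at the target pixel")
    valid = points[..., 2] > 0              # drop pixels with missing depth
    return points[valid] - p_target, p_target

# Example with synthetic values: a flat wall 2 m in front of the camera.
depth = np.full((4, 6), 2.0)
points = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
cloud, p_target = recenter_on_target(points, target_uv=(3, 2))
```

The returned `cloud` is an `(N, 3)` array expressed relative to the rendering location, which is the form the learning stage is described as consuming.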

2.3. Point-Cloud-Based Learning

Inspired by the Monte Carlo integration used in real-time SH lighting, PointAR formulates lighting estimation as a learning problem directly on point clouds. The point cloud, which represents a partial view of the scene, serves as a set of point samples of the environment. A neural network (e.g., based on PointNet or a lightweight variant) learns to aggregate information from these points to infer the complete lighting environment. This approach is more efficient than processing dense RGB images and is consistent with the physics of light transport.
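For context, the Monte Carlo estimator that classical real-time SH lighting uses to project an environment onto the SH basis can be written as follows. This is the standard textbook formulation, not an equation quoted from the PointAR paper. Here $L(\omega)$ denotes the incoming radiance from direction $\omega$, and the $N$ sample directions $\omega_i$ are assumed to be drawn uniformly over the sphere:

$$L_l^m = \int_{S^2} L(\omega)\, Y_l^m(\omega)\, d\omega \approx \frac{4\pi}{N} \sum_{i=1}^{N} L(\omega_i)\, Y_l^m(\omega_i)$$

After recentering, each point $P_i$ with color $c_i$ can be read as one such sample, with direction $\omega_i = P_i / \lVert P_i \rVert$ and radiance $c_i$. A single RGB-D frame covers only part of the sphere and its samples are not uniform, so the estimator cannot be applied as is. That is one way to understand why a learned model is used to produce the coefficients instead.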

3. Technical Details

3.1. Spherical Harmonics Representation

Lighting is represented using 2nd order Spherical Harmonics. The irradiance $E(\mathbf{n})$ at a surface point with normal $\mathbf{n}$ is approximated as: $$E(\mathbf{n}) \approx \sum_{l=0}^{2} \sum_{m=-l}^{l} L_l^m Y_l^m(\mathbf{n})$$ where $L_l^m$ are the SH coefficients to be predicted and $Y_l^m$ are the SH basis functions. This compact representation (9 coefficients per color channel, 27 values for RGB) is standard in real-time rendering, which makes PointAR's output directly usable by mobile AR engines.
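To make the formula concrete, the sketch below evaluates the double sum for one surface normal using the standard real SH basis constants for $l \le 2$. The layout of the 27 values as 9 basis functions by 3 color channels is an assumption made for illustration, and so are the function names. The summary also does not say whether the cosine-lobe factors used for diffuse shading are already folded into the coefficients, so this code simply evaluates the sum as written.

```python
import numpy as np

def sh_basis_order2(n):
    """Real SH basis values Y_l^m(n) for l <= 2.

    Order: (0,0), (1,-1), (1,0), (1,1), (2,-2), (2,-1), (2,0), (2,1), (2,2).
    """
    x, y, z = n / np.linalg.norm(n)  # the basis is defined on unit vectors
    return np.array([
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ])

def shade(sh_coeffs, normal):
    """Evaluate sum_l sum_m L_l^m * Y_l^m(n) for each color channel.

    sh_coeffs: 27 values, assumed to be laid out as (9 basis functions, 3 channels).
    Returns an array of 3 values (R, G, B).
    """
    coeffs = np.asarray(sh_coeffs, dtype=float).reshape(9, 3)
    basis = sh_basis_order2(np.asarray(normal, dtype=float))
    return basis @ coeffs

# Example: purely ambient lighting (only the l = 0 term is non-zero).
ambient = np.zeros((9, 3))
ambient[0] = [1.0, 0.5, 0.25]
rgb = shade(ambient.ravel(), normal=[0.0, 1.0, 0.0])
```

With only the $l = 0$ term set, the result is the same for every normal, which is a quick sanity check on any SH evaluation routine.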

3.2. Network Architecture

The paper indicates the use of a lightweight network suited to point clouds. Although the exact architecture is not detailed in the abstract, it likely involves per-point feature extraction (using MLPs), a symmetric aggregation function (such as max-pooling) to form a global scene descriptor, and final regression layers that output the SH coefficients. The guiding design principle is mobile-first efficiency, prioritizing a low parameter count and low FLOPs.
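The three ingredients listed above (shared per-point MLP, symmetric pooling, regression head) can be illustrated with a toy forward pass in NumPy. Everything specific here is an assumption for illustration: the layer sizes, the 6-value input per point (recentered xyz plus rgb), and the random, untrained weights. It is a PointNet-style sketch, not the architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random, untrained weights for a stack of dense layers (sizes are illustrative)."""
    return [(rng.normal(0.0, 0.1, (i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers, final_relu=True):
    """Apply dense layers with ReLU; optionally leave the last layer linear."""
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if final_relu or k < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

point_mlp = init_mlp([6, 32, 64])   # shared across points: (x, y, z, r, g, b) -> 64 features
head = init_mlp([64, 32, 27])       # regression head: global feature -> 27 SH coefficients

def predict_sh(cloud):
    """cloud: (N, 6) array of recentered xyz plus rgb. Returns 27 SH coefficients."""
    features = mlp(cloud, point_mlp)        # (N, 64) per-point features
    global_feat = features.max(axis=0)      # symmetric pooling, independent of point order
    return mlp(global_feat, head, final_relu=False)

# Total number of weights and biases in this toy configuration.
n_params = sum(w.size + b.size for w, b in point_mlp + head)

# Example with a synthetic cloud of 1024 points.
sh = predict_sh(rng.normal(size=(1024, 6)))
```

Because max-pooling is symmetric, shuffling the points leaves the output unchanged, which is the property that makes this family of networks suitable for unordered point sets. The toy configuration has only a few thousand parameters, which hints at why such designs can be light, although the real model's size is whatever the paper reports.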

4. Experiments & Results

4.1. Quantitative Evaluation

PointAR is evaluated against state-of-the-art methods such as those of Gardner et al. [12] and Garon et al. [13]. The metrics likely include the angular error between predicted and ground-truth SH vectors, or perceptual metrics on rendered objects. The paper claims that PointAR achieves lower lighting estimation error than these baselines, which indicates that its efficiency does not come at the cost of accuracy.
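If the comparison is indeed made directly on coefficient vectors, two simple error measures could look like the sketch below. Neither function is taken from the paper. Both are generic vector metrics with hypothetical names, shown only to make "error between SH vectors" concrete.

```python
import math

def sh_angular_error_deg(pred, truth):
    """Angle in degrees between two SH coefficient vectors treated as plain vectors.

    Insensitive to overall scale, so it measures the 'shape' of the lighting.
    """
    dot = sum(p * t for p, t in zip(pred, truth))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in truth))
    cos = max(-1.0, min(1.0, dot / norm))   # clamp against floating-point drift
    return math.degrees(math.acos(cos))

def sh_mse(pred, truth):
    """Mean squared error over the coefficients; sensitive to intensity as well."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)

# Example with made-up 3-value vectors (real SH vectors would have 27 values).
angle = sh_angular_error_deg([1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
mse = sh_mse([1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
```

The two measures answer different questions: the angular error ignores a uniform brightness offset, while the mean squared error penalizes it.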

Performance Highlights

  • Accuracy: Lower estimation error than SOTA methods.
  • Efficiency: Much lower resource usage.
  • Speed: Designed for mobile frame rates.

4.2. Qualitative Evaluation & Visuals

Figure 1 in the PDF (described as showing Stanford bunnies) provides qualitative results. Row 1 shows virtual objects (bunnies) lit with the SH coefficients predicted by PointAR under spatially varying conditions. Row 2 shows the ground-truth renderings. The visual similarity between the two rows demonstrates PointAR's ability to produce realistic shading, shadows, and color bleeding that match the actual lighting environment.

4.3. Resource Efficiency Analysis

This is PointAR's standout claim. The pipeline requires far fewer resources (in model size, memory footprint, and computation) than prior CNN-based methods. Its complexity is described as comparable to that of state-of-the-art mobile-specific deep neural networks (DNNs), which makes real-time on-device execution practical.

5. Analysis Framework & Case Study

Core Insight: The paper's ingenuity lies in its decomposition. While the field was racing to build large, monolithic image-to-lighting CNNs (a trend reminiscent of the early GAN/CNN arms race), Zhao and Guo took a step back. They recognized that the "spatially varying" problem is fundamentally geometric, not merely perceptual. By offloading it to a simple, explicit geometric transformation, they freed the neural network to focus solely on the core inference task, working from a better-suited data representation: the point cloud. This is a "good hybrid system" design principle that is often overlooked in purely deep-learning research.

Logical Flow: The logic is sound: 1) Mobile AR needs fast, spatially aware lighting. 2) Images are data-heavy and carry no explicit geometry. 3) Point clouds are the native 3D representation produced by RGB-D sensors and relate directly to light sampling. 4) Therefore, learn from point clouds after geometric alignment. This flow resembles best practice in robotics (sense -> model -> plan) more than conventional computer vision.

Strengths & Flaws: Its main strength is practical efficiency, which addresses the deployment problem directly. The explicit geometric stage is interpretable and robust. A potential flaw, however, is its dependence on good depth data. Noisy or missing depth from mobile sensors (e.g., iPhone LiDAR in challenging conditions) can corrupt the view transformation. The paper, as presented in the abstract, may not fully address this robustness issue, which is critical for real-world AR. In addition, the choice of 2nd order SH, while efficient, limits the representation of high-frequency lighting detail (sharp shadows), a trade-off that deserves explicit discussion.

Actionable Insights: For practitioners, this work is a blueprint: always separate geometry from appearance learning in 3D tasks. For researchers, it opens several avenues: 1) Developing more efficient point cloud learners (building on work such as PointNeXt). 2) Investigating robustness to depth noise through learned refinement modules. 3) Exploring adaptive SH order selection based on scene content. The main takeaway is that in mobile AR, the winning solution will be a hybrid of classical geometry and lightweight AI, not a brute-force neural network. This aligns with the broader industry shift toward "Neural Rendering" pipelines that combine traditional graphics with learned components, as seen in work such as NeRF, but with a sharp focus on mobile constraints.

Original Analysis: PointAR represents an important and needed course correction in the pursuit of realistic mobile AR. For years, the dominant paradigm, driven by the success of CNNs in image synthesis (e.g., Pix2Pix, CycleGAN), was to treat lighting estimation as an image-to-image or image-to-parameter translation problem. This produced architectures that were powerful but very heavy, ignoring the particular constraints of the mobile domain: limited compute, thermal budgets, and the need for low latency. Zhao and Guo's work is a sharp critique of this trend, delivered not in words but in architecture.

Their core insight, to use point clouds, has several facets. First, it acknowledges that lighting is a 3D, volumetric phenomenon. As established in foundational graphics literature and in Debevec's seminal work on environment maps, lighting is tied to the 3D structure of the scene. A point cloud is a direct, sparse sample of that structure. Second, it connects to the physical basis of Spherical Harmonics lighting itself, which relies on Monte Carlo integration over the sphere. A point cloud from a depth sensor can be viewed as a set of importance-sampled directions with associated radiance values (from the RGB image), which makes the learning task better grounded. This approach recalls the "analysis by synthesis" or inverse rendering philosophy, in which one tries to invert a forward model (rendering) by exploiting its structure. Compared with the black-box approach of earlier methods, PointAR's pipeline is highly interpretable: the geometric stage handles the viewpoint change, and the network handles inference from partial data. This modularity is a strength for debugging and optimization.

However, the work also exposes a critical dependency: the quality of commodity RGB-D sensors. The recent spread of depth sensors (LiDAR, time-of-flight) on high-end phones (Apple, Huawei) makes PointAR timely, but its performance on depth obtained from stereo or SLAM systems, which are more common, needs investigation. Future work could explore joint depth and lighting estimation, or use a network to refine the initially noisy point cloud. Ultimately, PointAR's contribution is to show that state-of-the-art accuracy in a perception task does not require state-of-the-art complexity when domain knowledge is incorporated properly. It is a lesson the mobile AI community would do well to heed.

6. Future Applications & Directions

  • Real-Time Dynamic Lighting: Extending PointAR to handle dynamic light sources (e.g., a lamp being switched on or off) by incorporating temporal information or point cloud sequences.
  • Outdoor Lighting Estimation: Adapting the pipeline to outdoor AR, handling the sun's high dynamic range and unbounded depth.
  • Neural Rendering Integration: Using the lighting predicted by PointAR as a conditioning input to on-device neural radiance fields (tiny-NeRF) for more realistic object insertion.
  • Sensor Fusion: Incorporating data from other mobile sensors (inertial measurement units, ambient light sensors) to improve robustness and to handle cases where depth is unreliable.
  • Edge-Cloud Collaboration: Deploying a lightweight version on-device for real-time use, with a heavier, more accurate model in the cloud for periodic refinement or offline processing.
  • Material Estimation: Jointly estimating scene lighting and surface material properties (reflectance) for more physically consistent compositing.

7. References

  1. Zhao, Y., & Guo, T. (2020). PointAR: Efficient Lighting Estimation for Mobile Augmented Reality. arXiv preprint arXiv:2004.00006.
  2. Gardner, M.-A., et al. (2017). Learning to Predict Indoor Illumination from a Single Image. ACM TOG.
  3. Garon, M., et al. (2019). Fast Spatially-Varying Indoor Lighting Estimation. CVPR.
  4. Song, S., et al. (2019). Deep Lighting Environment Map Estimation from Spherical Panoramas. CVPR Workshops.
  5. Debevec, P. (1998). Rendering Synthetic Objects into Real Scenes. SIGGRAPH.
  6. Zhu, J., et al. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV. (CycleGAN)
  7. Qi, C. R., et al. (2017). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. CVPR.
  8. Mildenhall, B., et al. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV.